Glossary

Aggregated Data
A combination of unit records created with the objective that individual details are not disclosed.
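
As a rough illustration of aggregation, the Python sketch below combines a handful of unit records into per-group counts and averages. The records, the field layout and the `aggregate_by_region` helper are invented for this example and do not come from any real dataset.

```python
from collections import defaultdict

# Hypothetical unit records as (region, age) pairs, invented for illustration.
unit_records = [
    ("North", 34),
    ("North", 41),
    ("South", 29),
    ("South", 52),
    ("South", 47),
]


def aggregate_by_region(records):
    """Return {region: (count, mean_age)} so that no single record is shown."""
    ages = defaultdict(list)
    for region, age in records:
        ages[region].append(age)
    return {region: (len(v), sum(v) / len(v)) for region, v in ages.items()}


summary = aggregate_by_region(unit_records)
print(summary)
```

In practice, disclosure control usually involves further steps, such as suppressing very small groups, which this sketch does not attempt.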

Anonymisation
The process of adapting data so that individuals cannot be identified from it.

Application Programming Interface (API)
A way computer programs talk to one another. It can be understood in terms of how a programmer sends instructions between programs.

Attribution Licence
A licence that requires that the original source of the licensed material is cited (attributed).

Authoritative
Able to be trusted as being accurate or true; reliable: e.g. “clear, authoritative information”.

Authoritative Data Source
A recognised or official data production source with a designated mission statement or source/product to publish reliable and accurate data for subsequent use by customers. An authoritative data source may be the functional combination of multiple, separate data sources.

Big Data
A loose term, not formally defined, for high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing in order to give enhanced insight and decision making.

Big Data Analytics
The process of examining and interrogating big data assets to derive insights of value for decision making.

BitTorrent
A protocol for distributing the bandwidth needed to transfer very large files among the computers that are participating in the transfer. Rather than downloading a file from a single source, BitTorrent allows peers to download from each other.

Comma-separated values (CSV)
A file type used to store tabular data (numbers and text) in plain-text form.
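
A minimal sketch of reading CSV with Python's standard `csv` module. The sample content (`item`, `quantity` and the two rows) is invented for illustration.

```python
import csv
import io

# Hypothetical CSV content: a header line followed by two data rows.
csv_text = "item,quantity\napples,3\npears,5\n"

# DictReader uses the first line as field names and yields one dict per row.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in csv_rows:
    # Every value arrives as text; convert numbers explicitly.
    print(row["item"], int(row["quantity"]))
```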

Commercial Use/Re-Use
Use that is intended for or directed toward commercial advantage or private monetary compensation.

Connectivity
The ability of communities to connect to the Internet, especially the World Wide Web.

Content
The collection of information stored for a purpose in a file, folder or electronic message.

Copyright
A right for the creators of creative works to restrict others’ use of those works. An owner of copyright is entitled to determine how others may use that work.

Creative Commons
A non-profit US organisation that enables the sharing and use of creativity and knowledge through free legal tools.

Data (Can be singular or plural in common usage)
The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media. (The terms data, information and knowledge are frequently used for overlapping concepts. The main difference is in the level of abstraction being considered. Data is a broad term, embracing others, but is often the lowest level of abstraction, information is the next level and, finally, knowledge is the highest level.)

Data Access Protocol
A system that allows outsiders to be granted access to databases without overloading either system.

Data Discovery
The process of finding out what data exists and how it can be accessed.

Data Sharing
The transfer, by agreement, of data collected for a specific purpose between two or more parties.

Dataset
A collection of data, usually in tabular form, presented either electronically or in other formats.

De-Anonymisation
The technical process of attempting to determine the identity of the individual to whom a pseudonymised dataset relates.

Definitive
Of recognised authority or excellence.

Derived Data
A data element or dataset adapted from other data sources using a mathematical, logical, or other type of transformation, e.g. arithmetic formula, composition, aggregation. See Value-added data.

Digital Rights Management
A class of access control technologies that are used by hardware manufacturers, publishers, copyright holders and individuals with the intent to limit the use of digital content and devices after sale.

Disclosive
Data is potentially disclosive if, despite the removal of obvious identifiers, characteristics of the dataset, in isolation or in conjunction with other datasets, might lead to identification of the individual to whom a record belongs.

Document
Any content whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording).

Extensible Markup Language (XML)
A markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable.
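
To show what "both human-readable and machine-readable" can mean in practice, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The `<dataset>` document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document describing a dataset.
xml_text = "<dataset><title>Bus stops</title><format>CSV</format></dataset>"

# Parse the text into an element tree and read values by element name.
root = ET.fromstring(xml_text)
title = root.find("title").text
file_format = root.find("format").text

print(root.tag, title, file_format)
```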

Geospatial Data
Also known as spatial data or geographic information, it is the data that represents the geographic location of natural and man-made features on Earth. Spatial data is usually stored as coordinates of points, lines and areas and may include their topological relationship and attributes.

HTML
See HyperText Markup Language

HyperText Markup Language (HTML)
The standard markup language used to create web pages.

Information
Interpretation and analysis of data that when presented in context represents added value, message or meaning.

Information Asset Register
Information Asset Registers (IARs) are registers specifically set up to capture and organise metadata about the vast quantities of information held by government departments and agencies. A comprehensive IAR includes databases, old sets of files, recent electronic files, collections of statistics, research and so forth.

Intellectual property rights
Monopolies granted to individuals for intellectual creations.

JSON (JavaScript Object Notation)
An open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
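
A short Python sketch of the round trip described above, using the standard `json` module. The `record` contents are invented for illustration.

```python
import json

# Hypothetical data object made of attribute-value pairs.
record = {"title": "Bus stops", "format": "CSV", "rows": 120}

# Serialise to human-readable text, as a server might before sending it.
json_text = json.dumps(record)

# Parse the text back into a data object, as a web application might.
restored = json.loads(json_text)

print(json_text)
```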

Licence (Noun)
A legal document giving permission to use information.

Linked Data
The technical term used to describe the best practice of exposing, sharing and connecting items of data on the Semantic Web using Uniform Resource Identifiers (URIs) and the Resource Description Framework (RDF). Not to be confused with data linking.

Machine-readable
Machine-readable formats are ones from which computer programs can easily extract data. PDF documents are generally not machine-readable: computers can display the text nicely, but have great difficulty understanding the context that surrounds the text.

Metadata
Data that describes or defines other data. Anything that users need to know to make proper and correct use of the real data, in terms of reading, processing, interpreting, analysing and presenting the information. Thus metadata includes file descriptions, codebooks, processing details, sample designs, fieldwork reports, conceptual motivations, etc., in other words, anything that might influence the way in which the information is used.

Modelled Data
Information created by mathematical representation of data relationships; sometimes used to simulate environments that are difficult to observe reliably or consistently.

Non-Commercial Use
Use that is not intended for or directed toward commercial advantage or private monetary compensation.

Ontology
Formal representation of knowledge as a set of concepts within a domain, and the relationships among those concepts.

Open Data
Data is open if anyone is free to access, use, modify, and share it, subject at most to measures that preserve provenance and openness.

Open Government Data
Open data produced by the government. This is generally accepted to be data gathered during the course of business-as-usual activities that does not identify individuals or breach commercial sensitivity. Open government data is a subset of Public Sector Information, which is broader in scope. See http://opengovernmentdata.org for details.

Open standards
Generally understood as technical standards which are free from licensing restrictions. Can also be interpreted to mean standards which are developed in a vendor-neutral manner.

Personal Data
Data which relate to a living individual who can be identified – (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller, and includes any expression of opinion about the individual and any indication of the intentions of the data controller or any other person in respect of the individual.

Plain text
The contents of an ordinary sequential file readable as textual material without much processing. Plain text is different from formatted text, where style information is included, and “binary files” in which some portions must be interpreted as binary objects (encoded integers, real numbers, images, etc.).

Pseudonymised Data
Data relating to a specific individual where the identifiers have been replaced by artificial identifiers to prevent identification of the individual.
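
The replacement step can be sketched in Python as follows. This is only an illustration under stated assumptions: records are dictionaries, and the `pseudonymise` helper, the sample names and the `P0001`-style identifiers are all invented here.

```python
import itertools


def pseudonymise(records, id_field):
    """Replace each distinct value of id_field with an artificial identifier.

    Returns the new records plus the lookup table, which would need to be
    stored separately and securely (or discarded).
    """
    counter = itertools.count(1)
    mapping = {}
    output = []
    for record in records:
        original = record[id_field]
        if original not in mapping:
            mapping[original] = f"P{next(counter):04d}"
        # Copy the record so the input list is left unchanged.
        output.append({**record, id_field: mapping[original]})
    return output, mapping


# Hypothetical records, invented for illustration.
records = [
    {"name": "Alice Example", "visits": 3},
    {"name": "Bob Example", "visits": 1},
    {"name": "Alice Example", "visits": 2},
]

pseudonymised, lookup = pseudonymise(records, "name")
print(pseudonymised)
```

The same individual always receives the same artificial identifier, so records can still be grouped. The remaining fields may still be disclosive, so this step alone should not be assumed to anonymise the data.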

Public domain
Works that are publicly available and in which the intellectual property rights have expired or been waived.

Public Sector Bodies
State, regional or local authorities, bodies governed by public law and associations formed by one or several such authorities or one or several such bodies governed by public law.

Public Sector Information
Information collected or controlled by the public sector.

Raw Data
Data collected which has not been subjected to processing or any other manipulation beyond that necessary for its first use. Raw data, i.e. unprocessed data, is a relative term; data processing commonly occurs by stages, and the ‘processed data’ from one stage may be considered the ‘raw data’ of the next.

Re-use
Use of content outside of its original intention.

Resource Description Framework (RDF)
RDF, a W3C standard, is the foundation of several technologies for modelling distributed knowledge and is meant to be used as the basis of the Semantic Web.

RSS Feed
Rich Site Summary (RSS) uses a family of standard web feed formats to publish frequently updated information: blog entries, news headlines, audio, video.

Semantic Web
A web of data that can be processed directly and indirectly by machines, providing a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is based on the Resource Description Framework (RDF).

Share-alike Licence
A licence that requires users of a work to provide the content under the same or similar conditions as the original.

Tab-separated values (TSV)
A very common text file format for sharing tabular data. The format is extremely simple and highly machine-readable.
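
As a sketch of how simple the format is to process, Python's standard `csv` module can read tab-separated text when given a tab delimiter. The sample content is invented for illustration.

```python
import csv
import io

# Hypothetical TSV content: fields on each line are separated by tab characters.
tsv_text = "item\tquantity\napples\t3\npears\t5\n"

# csv.reader splits each line on the given delimiter and yields lists of strings.
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))

print(tsv_rows)
```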

Taxonomy
The science or technique of classification.

Uniform Resource Identifier (URI)
The generic term for all types of names and addresses that refer to objects on the World Wide Web. A URL is one kind of URI.

Uniform Resource Locator (URL)
A type of URI that identifies a resource via a representation of its network location.
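
To make the "network location" idea concrete, the Python sketch below splits a URL into its parts with the standard `urllib.parse` module. The URL is the Open Data Handbook address cited in the acknowledgements; the variable names are chosen only for this example.

```python
from urllib.parse import urlparse

# Break a URL into its components.
parts = urlparse("http://opendatahandbook.org/en/glossary.html")

print(parts.scheme)  # how to reach the resource: http
print(parts.netloc)  # network location: opendatahandbook.org
print(parts.path)    # resource on that host: /en/glossary.html
```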

Unit Records
Individual items of information from surveys or observations that often contain confidential details.

Value-Added Information (or Data)
Data to which value has been added to enhance and facilitate its use and effectiveness by or for users.

Web API
An API that is designed to work over the Internet.

XML
See Extensible Markup Language

Acknowledgements:

  • The Open Data Handbook. 2010. The Open Data Handbook. [ONLINE]: http://opendatahandbook.org/en/glossary.html. [Accessed 02 April 15].
  • data.gov.uk. 2010. data.gov.uk. [ONLINE]: http://data.gov.uk/. [Accessed 02 April 15].

South Australian node of GovHack