Module 16 – Metadata

This module will discuss metadata.

In our previous module we:

Opened up the floor to learning about controlled vocabularies. Controlled vocabularies help to standardize data, and this goes hand in hand with standard metadata, that is, metadata that follows a metadata schema.

Metadata is data about data! Which sounds confusing, we know, but put more simply, metadata are the words we use to describe data.

Here’s a simple example:

This book has a lot of data that needs to be categorized, such as the title of the book, the page numbers, the author, the publishing company, and the first publication date.

As a human, you can look at this book and understand from context who wrote it and what the title is, simply because you know what books are and how they are typically formatted. You see the words ‘written by’ and understand who the author is. But machines are unable to parse that context unless we give them the rules to do so. Even feeding the words ‘Moby-Dick’ and ‘Herman Melville’ into a machine won’t be enough; it needs standards to be able to make sense of that information.

This is where a metadata schema comes in: it describes precisely what information is required, so that machines can understand it.

Here is a very simple example of metadata that could be made about the book above:

Title: Moby-Dick
Author: Herman Melville

The metadata properties described in this case are the title of the book and the author. The actual metadata values for those properties are Moby-Dick and Herman Melville, respectively. This type of metadata allows books (the data) to be categorized and organized properly within a library catalogue, where they are searchable and findable.

With this metadata, you would be able to use a filter in a catalogue that searches books by only the title or by only the author, because the metadata tells a machine which part of the book's description is the title and which is the author. This makes your data more FAIR by making it ‘Findable’ by machines on the web.
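
The filtering described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular library catalogue works: the `catalogue` list, the `search` function, and the second book record are made up for the example.

```python
# A tiny, hypothetical catalogue: every book is described by the same
# two metadata properties, "title" and "author".
catalogue = [
    {"title": "Moby-Dick", "author": "Herman Melville"},
    {"title": "Frankenstein", "author": "Mary Shelley"},
]


def search(records, field, text):
    """Return the records whose given metadata field contains the text."""
    needle = text.lower()
    return [r for r in records if needle in r.get(field, "").lower()]


# Search only the author field, then only the title field.
by_author = search(catalogue, "author", "melville")
by_title = search(catalogue, "title", "melville")

print(by_author)  # the Moby-Dick record
print(by_title)   # an empty list: no title contains "melville"
```

Because each value is stored under a named property, the same word can be matched against authors without accidentally matching titles.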

Metadata schemas are basically a list of questions that you answer to bring your metadata up to a standard. This is especially true if a metadata schema has already been chosen by your funder, or if the repository you plan to submit to uses a standard one. Consider reviewing your metadata schema requirements before you start your research. Knowing what metadata you need to fill in will help you feel more comfortable throughout the entire project, because you will already know what data about your data you need to record and what you don’t.
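
One way to picture the "list of questions" idea is a short check that reports which required properties are still unanswered. The sketch below is an assumption-laden illustration: the property names in `REQUIRED_PROPERTIES`, the `missing_properties` helper, and the sample record are all hypothetical and do not come from any real funder or repository schema.

```python
# Hypothetical schema: the "questions" a funder or repository might ask.
# These property names are illustrative, not taken from a real schema.
REQUIRED_PROPERTIES = ["title", "creator", "date_collected", "location"]


def missing_properties(record, required=REQUIRED_PROPERTIES):
    """List the required properties that are absent or left blank."""
    return [name for name in required if not str(record.get(name, "")).strip()]


# A made-up record with one blank answer and one missing answer.
record = {
    "title": "Tide pool survey",
    "creator": "A. Researcher",
    "date_collected": "",
}

print(missing_properties(record))  # ['date_collected', 'location']
```

Running a check like this at the start of a project, rather than at submission time, shows which details need to be recorded while the data is being collected.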

So, now that you have a basic understanding of what metadata is, the next lesson explores standardized metadata schemas in more detail. These are particular metadata structures that vary based on the type of data that they describe. Metadata schemas are often built around the properties that the creators of the schema have decided are most important.

Some of the most common are Dublin Core and MODS (Metadata Object Description Schema), which are both general schemas. There are also many scientific schemas, which are grouped by category, such as social sciences or biological sciences.
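
As a rough illustration of a general schema, the book from the earlier example could be described using element names from the Dublin Core Metadata Element Set. The storage format (a Python dictionary) and the `unknown_elements` helper are assumptions made for this sketch; Dublin Core itself does not prescribe them.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# The book described with Dublin Core element names. The author goes
# in "creator"; Dublin Core has no element called "author".
book = {
    "title": "Moby-Dick",
    "creator": "Herman Melville",
    "date": "1851",
    "type": "Text",
}


def unknown_elements(record):
    """Return any keys that are not Dublin Core element names."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


print(unknown_elements(book))                       # []
print(unknown_elements({"author": "H. Melville"}))  # ['author']
```

The second call shows why an agreed schema matters: a property name that seems natural to a human ("author") is not one the schema recognizes.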

One of the most common, which you might’ve heard of before, is Darwin Core. Darwin Core is an extension of Dublin Core, with an emphasis on biodiversity and natural sciences. There are also the Climate and Forecast (CF) Metadata Conventions, which we’ve mentioned in some of our other modules, as well as the Ecological Metadata Language (EML). EML is an XML markup syntax used for Earth and environmental sciences; its use in other research disciplines continues to grow. CF metadata focuses on the spatial and temporal properties of climate and forecast data. Spatial and temporal properties are, as mentioned previously, geographic properties, and all ocean data is geographic in nature.

Here is an example of a few Darwin Core Terms and what you would put in the fields:

Label: Organism Name
Definition: A textual name or label assigned to an Organism instance.
Examples: Huberta; Boab Prison Tree; J pod

Label: Organism Remarks
Definition: Comments or notes about the Organism instance.
Examples: One of a litter of six

Example taken from the list of Darwin Core terms.

As you can see from these examples, Darwin Core Terms allow for the easy labeling and sharing of biological data. Following a schema means there is no arguing about definitions, and no extra time spent deciding where each piece of data fits or what to call it, as there would be if you were creating your own.

Understanding metadata and the role it plays in your research project will make your project easier and more efficient.
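
To show what filling in those fields might look like in practice, here is a minimal sketch that writes the two terms above as columns of a CSV table. The column headers `organismName` and `organismRemarks` are the Darwin Core term names behind the labels "Organism Name" and "Organism Remarks". The rows themselves are assumptions for illustration: "Pup 3" is a hypothetical organism name, and pairing it with that remark is invented.

```python
import csv
import io

# Column headers use the Darwin Core term names for the two terms.
fieldnames = ["organismName", "organismRemarks"]

# Illustrative rows only. "J pod" and the remark text come from the
# Darwin Core examples; "Pup 3" is a made-up name for this sketch.
rows = [
    {"organismName": "J pod", "organismRemarks": ""},
    {"organismName": "Pup 3", "organismRemarks": "One of a litter of six"},
]

# Write to an in-memory buffer so the sketch leaves no files behind.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Anyone who knows Darwin Core can read this table without asking what each column means, which is the practical benefit of using shared term names.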

In our next module we will look at how to translate one metadata schema to another.

Before you go! Things to consider for the next module:

If you were to organize a collection or set of objects in your room, what metadata properties would you want in their metadata schema? Imagine your collection has a hundred unique objects, or a thousand! What schema would you use to make sure it would be easy to find things with a machine or search engine?