In this module we will explore controlled vocabularies: what they are, how they work, and a little about one of the controlled vocabularies that CIOOS uses. Controlled vocabularies are all around us, and you’ve probably used them already without even realizing it.
A controlled vocabulary is a set of words organized in a particular way for the purpose of indexing information, whether by human or machine indexers. Controlled vocabularies shape how you organize your data and are a defining factor in how it is formatted to a standard: they define how variables are labelled and how their values are recorded (units, standard taxonomy, etc.).
A simple example of a controlled vocabulary is the set of subject headings you would see at a library, such as History, Science and Politics. Another example, for those who are still in post-secondary school, is your ‘Major’, or primary area of study. The words used to connect and define your major are a controlled vocabulary. Here is a web to show what we mean:
Of course, the full controlled vocabulary has many, many more elements, but this simplified model represents a controlled vocabulary of post-secondary courses.
The most important part of a controlled vocabulary is the ‘controlled’ part. The definition and organization of the different majors in each faculty are laid out and confirmed by the authority of the post-secondary institution that uses them. Other controlled vocabularies are created and controlled by different entities. A controlled vocabulary cannot exist without some sort of organizing body and mutual agreement among the people who use it.
Why are controlled vocabularies so important?
Controlled vocabularies help advance the FAIR principles for research data (Findable, Accessible, Interoperable, Reusable), specifically the ‘I’ for interoperability. Having a controlled set of variable names makes it much easier for datasets from different places to work together, because they all follow a standard format.
From 2018 to 2021, a project called ‘Big Ocean, Big Data’ (BOBD) set out to understand how much ocean data there is. Each year breaks the previous year’s record for the amount of ocean data created. The BOBD team states that they had limited success cataloguing the vast swathes of data because of:
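To make the interoperability point concrete, here is a minimal sketch in Python. It assumes two hypothetical data providers that describe the same measurements under different local column names. The provider names, the values and the `harmonize` helper are all invented for illustration, and the shared names are just one possible agreed set.

```python
# Hypothetical records from two providers measuring the same things
# under different, ad hoc column names.
dataset_a = [{"lat": 44.65, "lon": -63.57, "temp_c": 8.2}]
dataset_b = [{"Latitude": 48.42, "Longitude": -123.37, "water_temperature": 10.1}]

# Each local name mapped onto one agreed, shared term (illustrative choice).
TO_SHARED = {
    "lat": "latitude",
    "Latitude": "latitude",
    "lon": "longitude",
    "Longitude": "longitude",
    "temp_c": "sea_water_temperature",
    "water_temperature": "sea_water_temperature",
}


def harmonize(records, mapping):
    """Rename every key to its shared term; fail loudly on unmapped names."""
    harmonized = []
    for record in records:
        renamed = {}
        for key, value in record.items():
            if key not in mapping:
                raise KeyError(f"no shared term for {key!r}")
            renamed[mapping[key]] = value
        harmonized.append(renamed)
    return harmonized


# Once both datasets use the same terms, they can simply be combined.
combined = harmonize(dataset_a, TO_SHARED) + harmonize(dataset_b, TO_SHARED)
for row in combined:
    print(row)
```

Without the shared terms, someone would have to work out by hand that `temp_c` and `water_temperature` mean the same thing. With a controlled vocabulary, that agreement is made once, by the body that maintains the vocabulary.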
- A lack of data set standardization;
- Sparse annotation tools for the wider oceanographic community; and
- Insufficient formatting of existing, expertly curated imagery for use by data scientists.
The first point in the list above is the most relevant to controlled vocabularies. With the kind of standardization that controlled vocabularies provide, the barrage of ocean data can be more easily collected, curated and disseminated.
Two examples of controlled vocabularies used in the ocean sector are Darwin Core Terms and the CF Conventions vocabulary.
Darwin Core is a body of standards for biodiversity informatics. It provides stable terms and vocabularies for sharing biodiversity data. Darwin Core includes terms relevant to taxon, identification, occurrence, record level, location, and event. All ocean data is geospatial, so the location terms are important; they include, for example, decimalLatitude, decimalLongitude, minimumDepthInMeters, maximumDepthInMeters and locationID. You may not need every Darwin Core term, but you should use all that are relevant. Using controlled vocabularies to standardize your data will make it easier to find and more interoperable, meeting two of the four tenets of FAIR (‘Findable’ and ‘Interoperable’).
Another controlled vocabulary is the CF Conventions (Climate and Forecast metadata conventions). This controlled vocabulary is Earth-science focused. It is used heavily not only in the ocean sector but also by weather-related organizations, and even for some data from NASA’s Jet Propulsion Laboratory.
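As a rough illustration of checking your own field names against the vocabulary, the Python sketch below compares a record to the five location terms named above. The record itself, the `depth_m` field and the `unknown_terms` helper are hypothetical, and the term set is only a small subset of Darwin Core, so a real check would use the full term list.

```python
# The location terms named in the text: a small subset of Darwin Core.
DWC_LOCATION_TERMS = {
    "decimalLatitude",
    "decimalLongitude",
    "minimumDepthInMeters",
    "maximumDepthInMeters",
    "locationID",
}


def unknown_terms(record, vocabulary=DWC_LOCATION_TERMS):
    """Return the field names in `record` that are not in the vocabulary."""
    return sorted(set(record) - vocabulary)


# Hypothetical record; "depth_m" is an ad hoc name, not a Darwin Core term.
record = {
    "decimalLatitude": 44.65,
    "decimalLongitude": -63.57,
    "depth_m": 12.0,
    "locationID": "station-01",
}

print(unknown_terms(record))  # ['depth_m']
```

Here the check flags `depth_m`, which could be replaced with `minimumDepthInMeters` and `maximumDepthInMeters` so that other Darwin Core users can read the record without guessing.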
Controlled vocabularies may seem daunting, but in most cases following one is like using a checklist: do you have the title? Do you have the authors? Working through the list will help you feel confident that you have accounted for everything the vocabulary asks for.
Compliance checkers exist so that you can do your own quality control against a controlled vocabulary. For the CF Conventions, the Integrated Ocean Observing System (IOOS), the American counterpart to CIOOS, provides a compliance checker.
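To give a feel for the checklist idea, here is a toy check written in plain Python. It is not the IOOS compliance checker and does not use its interface. The rule it applies, that every variable should carry both a `standard_name` and a `units` attribute, is a deliberate simplification assumed for this sketch, and the variable names and the `missing_attributes` helper are hypothetical.

```python
# Simplified rule assumed for this sketch: each variable should carry
# both of these attributes. Real CF checks are far more detailed.
REQUIRED_ATTRS = ("standard_name", "units")


def missing_attributes(variables):
    """Map each variable name to the required attributes it lacks."""
    problems = {}
    for name, attrs in variables.items():
        missing = [attr for attr in REQUIRED_ATTRS if not attrs.get(attr)]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical dataset description: variable name -> metadata attributes.
variables = {
    "temp": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
    "sal": {"standard_name": "sea_water_practical_salinity"},  # units left out
}

print(missing_attributes(variables))  # {'sal': ['units']}
```

A real compliance checker works the same way in spirit: it walks through the dataset, compares what it finds against the rules of the vocabulary, and reports what is missing so that you can fix it before sharing the data.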
Before you go! Things to consider for the next module:
As you go through the rest of your week until the next module, look for controlled vocabularies in your daily life. Where are they, and what is their purpose? Is it a good controlled vocabulary, or does it have elements that don’t make sense?