In this module we will go over data crosswalking.
In our previous module we discussed metadata schemas. As you know, each schema has its own rules and its own required elements. In some cases you may need to translate your metadata from one schema to another, for example if you want to submit your data to another repository that requires a different schema.
This translation is called crosswalking.
A common childhood experience is learning how to use a crosswalk. In a car-centric world, the painted lines and signal lights let pedestrians cross safely from one side of the street to the other. Part of learning was having someone walk you across.
Unlike humans, however, machines hit a wall when it comes to dynamically learning and understanding certain concepts. There is only so much they can do, which means some tasks must still be done by humans for machines to function.
One of these tasks, ‘walking the crosswalk’, involves metadata schemas: it means going from one schema to another. For example, imagine that the Metadata Object Description Schema (MODS) is on one side of the street, and you need to take all of its terms and fit them properly into the terms on the other side of the street, in the CF Metadata Convention. Crossing the crosswalk is that act of sorting.
Another way to think of it is as translating between languages: you are translating one metadata schema into another.
Here is an example of crosswalking, using common standards you’ve likely encountered before:
List A: Provinces by their full name
- Nova Scotia
- British Columbia
List A lists a few provinces by their full names, but they need to be translated into List B, where each one must appear as its two-letter province code for mailing purposes.
This involves knowing the agreed-upon abbreviation for each province and using it in List B. (Some codes use the first letter of each of two words, some use the first two letters of the name, and some use its first and last letters.)
This is a good example of standardization in metadata schemas, because the schema for List B was created and agreed upon by a particular institution, in this case Canada Post. These mailing codes make sorting mail more efficient for both machines and humans, and they save space on small envelopes and postcards, since the full province name does not need to be written out.
List B: Provinces by their official mailing code
- NS
- BC
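Because these codes are set by Canada Post rather than produced by a rule, a program can only do this translation with an explicit lookup table. The minimal Python sketch below assumes a hypothetical `PROVINCE_CODES` dictionary (filled in with the ten provinces and three territories) and a hypothetical `crosswalk_provinces` function that turns List A into List B.

```python
# Hypothetical lookup table: full name (List A) -> Canada Post code (List B).
PROVINCE_CODES = {
    "Alberta": "AB",
    "British Columbia": "BC",
    "Manitoba": "MB",
    "New Brunswick": "NB",
    "Newfoundland and Labrador": "NL",
    "Northwest Territories": "NT",
    "Nova Scotia": "NS",
    "Nunavut": "NU",
    "Ontario": "ON",
    "Prince Edward Island": "PE",
    "Quebec": "QC",
    "Saskatchewan": "SK",
    "Yukon": "YT",
}


def crosswalk_provinces(names):
    """Translate full province names into two-letter mailing codes.

    Raises KeyError for a name that is not in the table, so a person can
    decide how to handle it instead of the program guessing.
    """
    return [PROVINCE_CODES[name.strip()] for name in names]


list_a = ["Nova Scotia", "British Columbia"]
list_b = crosswalk_provinces(list_a)
print(list_b)
```

Raising an error on unknown names, rather than returning a guess, is a deliberate choice here: an unrecognized value is exactly the kind of case a human should look at.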
One big reason crosswalking is still done by humans is that humans can more easily make contextual judgement calls that machines might not be able to make, or find missing information that machines might not be able to read. In the example above, there is no algorithm you could give a machine to work out the code for each province; the codes would simply have to be programmed in individually. (Alberta’s code is AB, the first and third letters of its name, whereas Manitoba’s code is MB, its first and second-to-last letters. There is no consistent pattern for a machine to follow.) This is why crosswalking is useful: without it, the machine would not be able to read the data.
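To make the point about patterns concrete, the sketch below tries four simple letter rules (hypothetical, written only for illustration) against a few real Canada Post codes. Each rule gets at least one province right, but every rule also gets at least one wrong.

```python
# A few real Canada Post codes, used as the answer key.
KNOWN_CODES = {
    "Alberta": "AB",
    "Manitoba": "MB",
    "Nova Scotia": "NS",
    "Ontario": "ON",
    "Quebec": "QC",
}

# Hypothetical rules a programmer might try instead of a lookup table.
RULES = {
    "first two letters": lambda name: name[:2].upper(),
    "first and third letters": lambda name: (name[0] + name[2]).upper(),
    "first and last letters": lambda name: (name[0] + name[-1]).upper(),
    "first letter of each word": lambda name: "".join(w[0] for w in name.split()).upper(),
}

# For each rule, list the provinces whose real code it gets wrong.
failures = {
    rule_name: [name for name, code in KNOWN_CODES.items() if rule(name) != code]
    for rule_name, rule in RULES.items()
}

for rule_name, missed in failures.items():
    print(f"{rule_name}: wrong for {', '.join(missed)}")
```

The "first and third letters" rule works for Alberta but fails for Manitoba, which is the same problem described above.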
Walking the crosswalk is also an important first step toward understanding basic web design and coding, because much of your time writing XSLT and XML style sheets will be spent translating properties from one design schema to the next, such as color, border size, or text font and style, from the descriptive properties of one page to another.
XML stands for Extensible Markup Language. It is a file format for storing, reconstructing, and sharing data, and it encodes documents so that they are both machine-readable and human-readable. XML is used in many places across the web.
Here is a metadata page of data from CIOOS’s own data explorer. XML organizes information into trees whose branches become more and more specific. This is why we emphasize a basic knowledge of coding and web design: understanding how XML trees work will help you understand metadata better.
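XML’s tree structure is easy to explore in code. This sketch uses Python’s standard `xml.etree.ElementTree` module on a small, made-up metadata record (not an actual CIOOS record; the element names and values are hypothetical) and prints the path to every element, showing how each level of the tree gets more specific.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical metadata record; element names are made up for illustration.
RECORD = """
<record>
  <title>Example ocean temperature survey</title>
  <location>
    <country>Canada</country>
    <province>Nova Scotia</province>
    <coordinates>
      <latitude>44.65</latitude>
      <longitude>-63.57</longitude>
    </coordinates>
  </location>
</record>
"""


def walk(element, path=""):
    """Yield (path, text) for every element in the tree, depth first."""
    path = f"{path}/{element.tag}"
    text = (element.text or "").strip()
    yield path, text
    for child in element:
        yield from walk(child, path)


root = ET.fromstring(RECORD.strip())
for path, text in walk(root):
    print(path, "=", text if text else "(container)")
```

Elements such as `location` hold no text of their own; they only group more specific elements beneath them.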
Crosswalking most often needs to be done by humans because some metadata schemas are much more complex than others. For example, Darwin Core allows location to be specified down to GPS coordinates, and it covers almost every level of biological classification, from Kingdom down to Species. (This is useful because, in large databases of datasets, searches can be as granular as the species level, which allows comparison between species.)
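As a rough illustration of that granularity, the sketch below stores two made-up occurrence records as Python dictionaries keyed by real Darwin Core taxonomic term names (`kingdom` through `genus`, plus `specificEpithet`; Darwin Core has separate `genus` and `specificEpithet` terms, which together name a species). The `search` helper is hypothetical.

```python
# Made-up occurrence records using Darwin Core term names.
records = [
    {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
     "order": "Carnivora", "family": "Phocidae",
     "genus": "Halichoerus", "specificEpithet": "grypus"},
    {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
     "order": "Carnivora", "family": "Phocidae",
     "genus": "Phoca", "specificEpithet": "vitulina"},
]


def search(records, **criteria):
    """Return the records whose terms match every given value (hypothetical helper)."""
    return [r for r in records
            if all(r.get(term) == value for term, value in criteria.items())]


by_family = search(records, family="Phocidae")                             # both seals
by_species = search(records, genus="Halichoerus", specificEpithet="grypus")  # one species
print(len(by_family), len(by_species))
```

A schema without these separate taxonomic terms could only support the coarser search, which is one reason crosswalking into a simpler schema loses information.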
However, not all databases or schemas are this complex, and in some cases it can be useful to add your data to a less complex repository so that it can be shared.
This is where crosswalking comes in.
Here is a crosswalk from the more complex Darwin Core to MODS, which is less biologically detailed:
Darwin Core has more than 50 different ways to describe just the spatial context of your data, and even more for its temporal context. MODS, however, records only very broad information.
| Darwin Core | MODS |
|---|---|
| Year, Month, Day | `<subject><temporal>` |
| Country, State Province, Island Group, Island | `<subject><hierarchicalGeographic>` |
| Decimal Latitude and Decimal Longitude | `<subject><cartographics><coordinates>` |
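The table can also be expressed as a small program. This is a minimal sketch using Python’s standard `xml.etree.ElementTree` module. The function name `dwc_to_mods_subjects` is hypothetical; the inputs use real Darwin Core term names (`year`, `stateProvince`, `islandGroup`, `decimalLatitude`, and so on). Several placement choices are our own assumptions, not an official mapping: state/province goes into MODS `<province>`, island group into `<area>`, the date is written as YYYY-MM-DD, and the coordinates are written as a plain "latitude, longitude" string. Real MODS records also use the `http://www.loc.gov/mods/v3` namespace, which is left out here for readability.

```python
import xml.etree.ElementTree as ET


def dwc_to_mods_subjects(dwc):
    """Crosswalk a few Darwin Core terms (a dict) into MODS <subject> elements.

    The placement of each value is an illustrative assumption; check the
    MODS guidelines before relying on it.
    """
    mods = ET.Element("mods")

    # Year, Month, Day -> one date string in <subject><temporal>.
    if all(term in dwc for term in ("year", "month", "day")):
        temporal = ET.SubElement(ET.SubElement(mods, "subject"), "temporal")
        temporal.text = f"{int(dwc['year']):04d}-{int(dwc['month']):02d}-{int(dwc['day']):02d}"

    # Country, State Province, Island Group, Island -> <hierarchicalGeographic>,
    # broadest to narrowest. The MODS tag chosen for each term is an assumption.
    geo_map = [("country", "country"), ("stateProvince", "province"),
               ("islandGroup", "area"), ("island", "island")]
    present = [(term, tag) for term, tag in geo_map if dwc.get(term)]
    if present:
        hier = ET.SubElement(ET.SubElement(mods, "subject"), "hierarchicalGeographic")
        for term, tag in present:
            ET.SubElement(hier, tag).text = str(dwc[term])

    # Decimal Latitude and Decimal Longitude -> one free-text <coordinates> value.
    if "decimalLatitude" in dwc and "decimalLongitude" in dwc:
        carto = ET.SubElement(ET.SubElement(mods, "subject"), "cartographics")
        ET.SubElement(carto, "coordinates").text = (
            f"{dwc['decimalLatitude']}, {dwc['decimalLongitude']}")

    return mods


# A made-up occurrence record.
occurrence = {"year": 2019, "month": 7, "day": 4, "country": "Canada",
              "stateProvince": "Nova Scotia", "island": "Sable Island",
              "decimalLatitude": 43.93, "decimalLongitude": -59.91}
print(ET.tostring(dwc_to_mods_subjects(occurrence), encoding="unicode"))
```

Notice that three separate Darwin Core values collapse into one `<temporal>` string, and two into one `<coordinates>` string: the same many-to-one pattern the table describes.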
As you can see, several specific Darwin Core attributes would need to be combined into a single MODS element. This is also just a small sample of Darwin Core, which has many more terms in its schema. It even includes a term for coordinate uncertainty (coordinateUncertaintyInMeters), along with extremely granular temporal and spatial terms.
This example shows why human-led crosswalking is necessary: the terms don’t always line up, and decisions about which data to crosswalk where depend on context, so they are best made by humans.
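A program can apply the parts of a crosswalk that people have already agreed on, but it cannot make the judgement calls. The sketch below, built on a hypothetical `CROSSWALK` dictionary and `plan_crosswalk` function, groups Darwin Core terms that share one MODS target (showing how several terms collapse into one element) and sets aside every term with no agreed target so that a person can decide what to do with it. `coordinateUncertaintyInMeters` and `habitat` are real Darwin Core terms; leaving them unmapped is our own illustrative choice.

```python
# Hypothetical crosswalk map: Darwin Core term -> MODS target path.
# None means "no agreed target yet".
CROSSWALK = {
    "year": "subject/temporal",
    "month": "subject/temporal",
    "day": "subject/temporal",
    "country": "subject/hierarchicalGeographic/country",
    "stateProvince": "subject/hierarchicalGeographic/province",
    "decimalLatitude": "subject/cartographics/coordinates",
    "decimalLongitude": "subject/cartographics/coordinates",
    "coordinateUncertaintyInMeters": None,
}


def plan_crosswalk(record, crosswalk):
    """Split a record's terms into mapped groups and terms needing human review."""
    mapped, needs_review = {}, []
    for term in record:
        target = crosswalk.get(term)  # None for unknown terms too
        if target is None:
            needs_review.append(term)
        else:
            mapped.setdefault(target, []).append(term)
    return mapped, needs_review


# A made-up record, including two terms the map cannot place.
record = {"year": 2019, "month": 7, "day": 4, "country": "Canada",
          "decimalLatitude": 43.93, "decimalLongitude": -59.91,
          "coordinateUncertaintyInMeters": 30, "habitat": "sand dunes"}
mapped, needs_review = plan_crosswalk(record, CROSSWALK)
print(mapped)
print("Needs a human decision:", needs_review)
```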
Crosswalking allows data to be shared even further, including with repositories that use a different metadata schema from your own.
Before you go! Things to consider for the next module:
In our module on metadata schemas we asked you to think about what kind of schema you would use for your own collection of objects. If you know someone else taking this tutorial who came up with their own metadata schema, take turns trying to crosswalk your data into the other person’s schema.