Module 10 – Data Sharing

This module will focus on data sharing processes.

In our previous module:

We discussed search practices and how to use them to search databases and repositories. Putting even a few of those tips into practice should make it easier to find somewhere to share your data.

Data Sharing

Did you know that sharing your data is an opt-in process rather than opt-out? Essentially, your work will not be shared automatically; it is up to you to put the right processes in place and find a repository where you can share your work openly.

The right process starts with making data sharing part of your initial data management plan, at the beginning of the research lifecycle. It is important to focus on this early because data sharing is driven by metadata. To reiterate from a previous module: no matter how good you are at searching, data with poorly implemented metadata cannot be found. And if you can’t find it, it’s unlikely that other researchers will be able to find it either.

Deciding early where you want to share your data will help you decide on a metadata schema and controlled vocabularies for your data. This is especially important if your data needs to be open access, as best practice dictates that open access data be interoperable, which is, again, where metadata comes in.
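As a minimal sketch of how a controlled vocabulary keeps variable names consistent, the Python below checks a dataset's variable names against an agreed list. The vocabulary terms (written in the style of CF standard names) and the function name find_uncontrolled_terms are hypothetical and chosen only for illustration; a real project would take its terms from the vocabulary recommended by its chosen repository.

```python
# Hypothetical controlled vocabulary: illustrative terms only, not an official list.
CONTROLLED_VOCABULARY = {
    "sea_water_temperature",
    "sea_water_salinity",
    "depth",
}


def find_uncontrolled_terms(variable_names, vocabulary=CONTROLLED_VOCABULARY):
    """Return the variable names that are not in the controlled vocabulary, sorted."""
    return sorted(name for name in variable_names if name not in vocabulary)


if __name__ == "__main__":
    # Example variable names as they might appear in a raw data file (made up).
    dataset_variables = ["sea_water_temperature", "temp", "depth", "Salinity"]
    print(find_uncontrolled_terms(dataset_variables))
```

Any name the check returns would need to be renamed to an agreed term, or the vocabulary extended, before the data is shared.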

Many grants from the Tri-Agency require that you make your data open access to fully comply with the terms they have set out.

For this reason, it is important to know ahead of time whether this is required, and to state in your data management plan how you will make your data available when the project ends.

One tip for finding good repositories, if your organization or institution has not already picked one, is to search for data that is similar in function or thesis to yours and see where it is hosted. Even if the host isn’t an ocean data repository, it may be worth a closer look if it has worked well with ocean data in the past or has hosted the kind of data files you will be using.

There are also databases that specifically link to multiple databases (a database of databases, if you will), with search functions similar to those of databases of journals and data. The Directory of Open Access Journals (DOAJ) is one such example: it indexes only open access journals and articles. This is a good place to look for open access journals to submit your data to, should you need to find one.

Here are some data sharing best practices:

These best practices are similar to data management best practices, so the more confident you are in one, the easier a time you will have with the other.

  • Make sure your data is well organized and properly labelled. Consider the tips from earlier modules, especially to make sure your data is re-usable for both yourself and future researchers. This is where controlled vocabularies come in! (More on those in a later module.) Controlled vocabularies help you index and sort your data. Along with controlled vocabularies, which are usually used for variables within your dataset, metadata conventions such as the CF Conventions are best practice for keeping your data organized.
  • Commit to a metadata schema and make sure your data is complete within that schema (more on this in a later module). Metadata schemas allow your work to be more easily categorized and found by machines on the web. Metadata schemas also ensure that both you and others have detailed and correct information about your work, so that it can be used again in the future.
  • Research the journal or repository you are going to submit to, and make sure its data sharing rules align with the ones laid out by the project funder. Different journals might have different levels of open access or different licensing (see Module 14 for more on licenses). Once you submit to a repository, you will be expected to comply with its submission standards and licensing practices. For example, if its rules state that data must be fully open access, you consent to your own data being open access by submitting. Thus, it is important to be aware of the license requirements ahead of time.
  • While it is not necessary to put your data through an official Quality Control process before it can be shared, any quality control that is done needs to be documented. Who did it, what process did they use, and when?
  • Networking! Creating a community of other researchers and getting to know them will help you find journals and repositories that might be of interest to your project. If they’ve submitted to a journal before, they could have invaluable knowledge about the process preferred by that journal.
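
The metadata and quality control points above can be sketched in Python. This is only a minimal sketch under stated assumptions: the required field names, the missing_metadata_fields helper and the QualityControlRecord class are all hypothetical, and a real metadata schema would define its own required fields.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical required fields; a real metadata schema defines its own list.
REQUIRED_FIELDS = ("title", "creator", "license", "date_collected")


def missing_metadata_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata record."""
    return [
        field for field in required
        if not str(metadata.get(field) or "").strip()
    ]


@dataclass
class QualityControlRecord:
    """Documents who performed quality control, what process they used, and when."""
    performed_by: str
    process: str
    performed_on: date


if __name__ == "__main__":
    # Made-up example record: the license is blank and the collection date is absent.
    record = {"title": "Example survey", "creator": "A. Researcher", "license": ""}
    print(missing_metadata_fields(record))

    # Made-up example of documenting a quality control step.
    qc = QualityControlRecord("A. Researcher", "Range check on all sensors", date(2024, 1, 15))
    print(qc)
```

A check like this can be run before submission so that gaps in the metadata are caught while the people who collected the data are still available to fill them in.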

While sharing is opt-in only, opting in is fairly simple. By submitting your data to an open access journal or repository, you automatically opt in to sharing your data. Do not be alarmed, however: you still retain copyright and will still be credited for your work; it is simply that more people will have access to your data! Following the best practices outlined above does add some complexity to the process, which must be kept in mind.

Activity:

In previous modules, we asked you to talk data with a larger organization or stakeholder that you were interested in. For this activity, we want you to do something similar, but on a smaller, more intimate scale. Get to know your fellow researchers if you haven’t already. What are they working on? What is interesting to them? Are they planning on submitting their data to an open access journal? If both of you work with open access data, are there ways your data sets could enrich each other?

Before you go! Things to consider for the next module:

Consider ways that your data could be useful to someone else, or for yourself in the future. Is this data you’d want to come back to and study again? How many years down the road do you think your data will still be useful?