Module 12: Creation and Preservation of Data

As we’ve mentioned in previous modules, short-term, active data management is an important skill for early career researchers to have. It is just as important to think about what you’ll do with your data after a project as it is to think about what you’ll do with it while the project is in progress.

In this module, we will discuss data management and storage during your project. Specifically, we will outline a few best practice principles of active data management and suggest some useful computer programs. Good data storage will help you feel more comfortable with data sharing!

Data Storage vs Data Preservation

First, let’s discuss the difference between data storage and data preservation. Data storage is simply putting the data somewhere while you work on your project (the active data management that we mentioned). Data preservation is about deciding which specific items of data are maintained over time, and how, so that they can still be accessed and understood through changes in technology.

Data preservation means following the FAIR principles (Findable, Accessible, Interoperable, Reusable), making sure that data stays findable, accessible, and usable in the long term. It also means building into your data management plan the steps that will help you put your data in a repository.

What is a data repository? Data repositories, like CIOOS, collect, store, and manage large numbers of datasets for data sharing and analysis.

Data repositories are useful because you don’t have to stress about your data forever! In an earlier module we talked about data pitfalls, and nearly all of those pitfalls could have been avoided if the data had been in a repository. The two researchers wouldn’t even have had to talk to each other. Repositories also have best-practice guidelines for submissions, which gives you an easy-to-follow checklist to make sure your data is FAIR.

Choose a repository early, as part of your data management plan. That choice will help inform your controlled vocabulary, your metadata schema, and of course how you will actively manage your data.

Consider some of these active data management best practices that will help you prepare your data for eventual submission to a repository:

Do’s and Don’ts of Active Data Management

Don’t: be the only person with a copy of the data or the data manipulations. Don’t keep the work only on your laptop or on a flash drive that could be lost.

Do this instead: Use Google Drive, Dropbox, or SharePoint to make sure that all relevant members of your team are kept in the loop about the project. Many of these programs also allow multiple people to edit a document at once, which means more efficient collaboration. (The programs listed here are merely examples; your institution may have other cloud services.)

Don’t: rely solely on the cloud to back up your data. While it is not a good idea to have your data solely on a flash drive or laptop that could become lost or damaged, it is equally risky to have your work solely on the cloud.

Do this instead: Follow the 3-2-1 principle for backing up: 3 separate copies, with 2 of them on-site/local but on different mediums, and 1 in an offsite location.

An example of this 3-2-1 principle would be: 1 copy in the cloud, 1 copy on someone’s computer, and the 3rd copy printed out.
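For those who like to see a rule written out, here is a minimal Python sketch of how a backup plan could be checked against the 3-2-1 principle as described above. The names `BackupCopy` and `check_3_2_1` are hypothetical, invented for this illustration, and the sketch assumes that a cloud copy counts as the offsite copy, as in the example above. It only checks a description of your plan; it does not copy or inspect any files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical structure for illustration)."""
    label: str     # human-readable name for the copy
    medium: str    # e.g. "cloud", "laptop disk", "paper"
    offsite: bool  # True if the copy lives away from where you work


def check_3_2_1(copies):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []

    # 3: at least three separate copies
    if len(copies) < 3:
        problems.append(f"need at least 3 copies, found {len(copies)}")

    # 2: at least two local copies, on different mediums
    local = [c for c in copies if not c.offsite]
    if len(local) < 2:
        problems.append(f"need at least 2 local copies, found {len(local)}")
    elif len({c.medium for c in local}) < 2:
        problems.append("the local copies should be on different mediums")

    # 1: at least one copy in an offsite location
    if not any(c.offsite for c in copies):
        problems.append("need at least 1 offsite copy")

    return problems


# The plan from the example above; treating the cloud as offsite is an assumption.
plan = [
    BackupCopy("shared drive", "cloud", offsite=True),
    BackupCopy("team computer", "laptop disk", offsite=False),
    BackupCopy("printout", "paper", offsite=False),
]

problems = check_3_2_1(plan)
print(problems if problems else "plan satisfies 3-2-1")
```

A plan with a single laptop copy would come back with a list of problems instead, which could serve as a to-do list when you write the backup section of your data management plan.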

These are all things that can and should be discussed in the data management plan as the project is getting started. They can also be included in the budget, if money is needed to print a backup or pay for offsite storage.

For those of you at a post-secondary institution or workplace, see what resources it has to offer. Many have systems in place for offsite storage, or will take care of the data for you.

Now, I’m sure that many of you have recognized one of the concerns of using the 3-2-1 method during the project: having 3 copies can raise ethical concerns when the data needs to be moved to long-term storage. This also needs to be accounted for in your data management plan: who is going to make sure that all copies are correctly stored, and disposed of if ethical requirements call for it? Whoever does so will need to have access to all 3 copies.

How to choose a repository:

First, see if your institution or funder has specified a repository for you to use. If so, use that one; choosing a different repository would be unprofessional.

If there is not a specified repository, here are the recommended characteristics to look for:

  1. Is discipline specific. For ocean research in Canada, CIOOS is a fantastic option.
  2. Follows the FAIR principles
  3. Has a clear policy about data retention and how it will store your data long term
  4. Uses persistent, unique identifiers, like DOIs

Before you go! Things to consider for the next module:

What is the best form of active data management? Do you think it is important to have one of the three storage formats be a printed, physical form of storage? How long do you think each storage format lasts (comparing paper vs flash drive vs cloud storage)? How much upkeep does each storage format need?