Module 5 – Data in the Research Lifecycle

In this module we will look at data management across the whole research lifecycle and why it is important no matter what stage your research project is at.

In our previous module we asked:

What is the most important element of data management?

Properly managed data is important to:

  • You! The better managed your data is, the easier a time you will have with your project.
  • Anyone you work with on a research project in the future.
  • The three main governing bodies that provide funding to the majority of publicly funded research in Canada.

These three main governing bodies are colloquially referred to as the Tri-Agency or Tri-Council and are made up of the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council (NSERC) and the Social Sciences and Humanities Research Council (SSHRC). These three agencies are the main way the federal government of Canada supports research in post-secondary institutions. Even outside of post-secondary institutions, an employee or government research entity that is paid through grants from one of these councils is subject to the same rules.

Compliance with the Tri-Agency guidelines for receiving funding is incredibly important! Not only can non-compliance affect you (making it harder to conduct future research) but it can affect the institution you work for too, making it harder for them to do research as well.

The Tri-Agency now requires that, to receive full funding for a research project, the team must meet specific guidelines around data management plans. (These guidelines differ for those doing research in and with Indigenous Communities; more on that in a later module.)

See the full guidelines of the new policy.

These new guidelines are intended to make research more efficient and more ethical, and to make it easier to comply with the FAIR principles. (Learn more about FAIR)

FAIR stands for:

Findability – this means your research data and its metadata can be found by both people and machines, which gives it the widest possible reach online

Accessibility – this means that your research is open access or otherwise easy to get access to

Interoperability – this means your data can be read by many different programs and can be integrated with other datasets seamlessly. More specifically, this means your data can be read by machines! Machines use their own language to find and index items on the web, and interoperability means that information about your data (metadata) is ‘translated’ into a language they can read.

Reusability – this means your data can be re-used in other projects, which also supports replicability

The ultimate goal of FAIR principles is to increase sharing amongst researchers!
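As a rough illustration of what machine-readable metadata can look like, here is a minimal Python sketch. The field names, the list of required fields and every value are assumptions made up for this example; they do not come from the Tri-Agency policy or from any particular metadata standard, and a real project would follow the standard its repository asks for.

```python
import json

# Hypothetical description of a dataset. Every value is a placeholder.
metadata = {
    "title": "Example coastal water temperature readings",
    "creator": "A. Researcher",
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "license": "CC-BY-4.0",
    "format": "text/csv",
    "keywords": ["ocean", "temperature"],
}

# Assumed minimum set of fields for this sketch only.
REQUIRED_FIELDS = ["title", "creator", "identifier", "license", "format"]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# JSON is a plain-text format that most programs can parse, which is
# one simple way of making metadata readable by machines.
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
print(metadata_json)
print("Missing fields:", missing_fields(metadata))
```

A record like this touches several of the principles at once: the identifier helps with findability, the format with interoperability, and the license with re-use.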

Here are just some of the most important elements of data management across the research lifecycle. These elements should all be considered before the project begins in earnest; however, they are grouped here by when you will most likely end up implementing each one.

Before the project starts:

This step takes place during the planning and proposal-writing stage.

Coming in with a data-first mindset and using data creatively

Sometimes people collect their data first and then figure out what to do with it. Most people in a post-secondary institution are unlikely to have this problem, since post-secondary research is structured around a data-first mindset. However, not every organization in the business sector has one, and researchers who have only worked in industry may not either.

Cultivating a data-first mindset means that all aspects of data collection and curation that will be needed for a project are considered before the project has even begun. It means thinking ahead and calculating what your data needs will be, while also building in safeguards in case something unexpected happens. Practicing this skill and keeping it fresh, even if you move into an industry or institution that doesn’t prioritize it, will give you a huge advantage on future projects. It looks great on a resume to be able to point out that you know how to think about data from this point of view.
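To make "calculating what your data needs will be" a little more concrete, here is a minimal Python sketch of a storage estimate made at the planning stage. The function name, its parameters, the default of two backup copies and the 25% safety margin are all assumptions chosen for illustration, not recommendations from the Tri-Agency or any other source.

```python
def estimate_storage_gb(files_per_day, avg_file_mb, collection_days,
                        backup_copies=2, safety_margin=0.25):
    """Estimate total storage in GB, including backups and a safety margin.

    All defaults are illustrative assumptions, not recommended values.
    """
    # Size of the data you expect to collect, in megabytes.
    raw_mb = files_per_day * avg_file_mb * collection_days
    # One working copy plus the backup copies, then headroom for surprises.
    total_mb = raw_mb * (1 + backup_copies) * (1 + safety_margin)
    return total_mb / 1024


# Made-up project: 20 files a day, about 50 MB each, for 90 days.
estimate = estimate_storage_gb(files_per_day=20, avg_file_mb=50, collection_days=90)
print(f"Plan for roughly {estimate:.1f} GB of storage")
```

The backup copies and the safety margin are the "safeguards" part of the calculation: they budget for lost files and for collecting more data than expected.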

During the project:

This step takes place while you are collecting, curating and manipulating the data.

Active data management

This is what happens to your data during the research project, while you are still collecting and manipulating it. Active data management is very important because waiting until the end of the project to manage your data can be detrimental to it. This ties back to having a data-first mindset, as active data management should be considered before the project has begun and be an important part of your data management plan (DMP).

Data Analysis

It is important to understand how to analyze the data you will be collecting. One skill that can help you become a better data analyst is understanding the basics of coding and how automatic analysis tools work. You do not need to be an expert coder: even a small amount of coding knowledge is useful, especially knowledge of the underlying theories behind good coding. Analytical skill is also important when translating data across spreadsheets. Being able to work out where a translation from spreadsheet to spreadsheet went wrong, or where a program made a mistake, will save you time in the long run. Basic coding can make those translations more efficient, especially using Python and .csv files, but analysis is still necessary to catch mistakes.
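As one possible illustration of using Python and .csv files to catch translation mistakes, here is a minimal sketch that compares an original export with a translated copy. The column names (sample_id, depth_m, temp_c), the data and the function names are all made up for this example, and values are compared as plain text, so "3.90" and "3.9" would be reported as different.

```python
import csv
import io


def read_rows(text, key):
    """Parse CSV text into a dict mapping each row's key column to the row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def compare_csv(original_text, translated_text, key="sample_id"):
    """List the differences between two CSV exports that share a key column."""
    original = read_rows(original_text, key)
    translated = read_rows(translated_text, key)
    problems = []
    for row_id, row in original.items():
        if row_id not in translated:
            problems.append(f"{row_id}: missing after translation")
            continue
        for column, value in row.items():
            new_value = translated[row_id].get(column)
            if new_value != value:
                problems.append(
                    f"{row_id}: {column} changed from {value!r} to {new_value!r}"
                )
    for row_id in translated:
        if row_id not in original:
            problems.append(f"{row_id}: unexpected new row")
    return problems


# Made-up data: the translated copy lost a decimal point and a whole row.
original_csv = "sample_id,depth_m,temp_c\nS1,10,4.5\nS2,20,3.9\nS3,30,3.1\n"
translated_csv = "sample_id,depth_m,temp_c\nS1,10,4.5\nS2,20,39\n"

for problem in compare_csv(original_csv, translated_csv):
    print(problem)
```

With real files, the same functions could be fed the text read from each .csv file. The script only points to where the two copies disagree; deciding which copy is correct, and why the change happened, is still analysis that you have to do yourself.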

After the project:

This step takes place during the publishing phase, after you have completed the main objectives of your research.

Data sharing

Data management can help you share your data effectively. Good data management planning helps you choose licensing, metadata and standardized formats that follow the FAIR principles, which aim to increase data sharing.

Open data is data that is licensed for re-use, in good faith, by anyone who wants it. Open data is one of the easiest and most effective methods of data sharing when it is done correctly, following data and metadata standards. Finding and using your open data should be simple for others.

Why is data sharing important? Data sharing is beneficial for all researchers, you included! It means that less money has to be spent making datasets able to interact with each other. It means fewer repeated experiments (except for replicability studies), so that you are not just doing the same research over and over again because none of it is accessible. It also allows for greater collaboration, and it increases the efficiency and quality of research by giving a benchmark of best practices.

This course will touch on all of the most important aspects to help you become the best data manager and early career researcher you can be! Our next module will focus specifically on publishers and funders within the ocean sector.

Before you go! Please consider for the next module:

What do you think is the most important part of the research lifecycle? How does this connect more broadly to data, and are there any connections you’d like to explore in more detail?