Module 13: Citation Culture

In this module we will discuss another aspect of data: citing the data you use! Citation can become a heavy burden on a project if it is left to the end.

In our previous module we talked about active data management, and left you with a number of questions about data storage and preservation.

Another important aspect of preservation is citation. You'll want to make citing your datasets a habit: plan for it during the initial phases of your project, and keep thinking about it as the project continues.

For example, imagine realizing that a citation in your bibliography isn't properly connected to the in-text citation in a report of one hundred pages or more. Tracking it down could take hours of re-reading. Chipping away at citations bit by bit, and planning for them ahead of time, helps make sure nothing gets missed. The same is true for data citation! If you cite part of a dataset, or only some of its variables, you will still need the full citation for it.

Missing a citation won't end your project, but chasing it down will cause headaches, and every work you use must be cited for your project to be considered professional and to make sure you receive full funding. Plagiarism, even accidental, is serious, and without a plan in place to avoid it, accidental plagiarism is easy.

As well, if you use multiple datasets with similar variables, you could wind up mixing up your citations if you're not careful! A good practice is to cite everything as you go rather than saving any of it for later; this also helps you remember what you meant to cite in the first place. Automatic citation generators make this practice much easier. Many databases have them built in, including CIOOS, and if none is available natively, you can use a program like Zotero or a website like Citation Machine. Double-check every machine-generated citation, since these tools occasionally make errors, but they do take some of the load off you, especially when you have many citations to make.

Getting into a culture of citation is important! Put simply, this means practicing and training yourself until citing things becomes second nature. I'm sure the importance of citing photos, graphs, research papers, and videos has been drilled into your head by now, but what about the importance of citing the raw data in a project?

Many of you may be wondering why a culture of citation matters so much. The reason is that summarizing data in a picture or graph without also providing the data makes claims look verifiable when readers cannot actually check them, and, if one is not careful, this can contribute to ocean misinformation.

Here's what we mean: have you ever seen a graph or infographic summarizing a research topic go viral on Reddit, Twitter, or Tumblr? If you have checked the comment sections, you have likely noticed many people drawing conclusions from the graph alone, without looking for further context. Worse, sometimes that context is not available to them at all, because it is behind a paywall or was never published alongside the journal article, since citing and sharing datasets is not yet common practice. Science headlines work the same way: people read the headline and react without the proper context.

A data citation culture will help make it more common for this context to be available to the average person, which is one way to help combat misinformation.

Activity:

How quickly can you find people taking a scientific graph, infographic, or headline out of context and spreading misinformation? When you find an example, consider the source: How well cited is it? Is there a citation for the raw data itself? Can members of the audience see the raw data for themselves?

Citation is important not only because it gives proper acknowledgement to the authors and everyone else who worked on the project, but also because it carries the information others need to find, verify, and use the data themselves.

While every citation style presents the information in a different order and with different aesthetics, they all contain the same basic information: who authored the work, where it was originally found, and when it was published or accessed. The information a citation contains, and its order, can depend not only on the style guide or standard you follow but also on the license applied to your dataset. More on that in the next module.

Thankfully, many databases, repositories, and catalogues have created systems to make citation easier, such as a button that generates the citation for you. CIOOS Catalogues have one such system, which lets you select a citation style and either copy-paste or download the citation in that format. The CIOOS citation generator is usually found one-third to halfway down the page that contains the metadata record.

One of the most important parts of a citation is the Digital Object Identifier (DOI). (See Module 18 for more information.) A DOI is a permanent, unchanging identifier for the cited data that keeps working even if the dataset's URL changes. This is extremely important when citing datasets that are only available online, since without one they may become unfindable, which makes your citations look unprofessional.
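Because a DOI is resolved through the doi.org service rather than tied to one website, a bare DOI can always be turned into a working link by prefixing it with https://doi.org/. The sketch below normalizes the forms you are likely to run into (a bare DOI, one prefixed with "doi:", or a full doi.org link) into that single form. The matching pattern is a simplified assumption based on DOIs starting with "10." followed by a registrant code and a slash; it is not a complete validator, and the DOI used here is a hypothetical placeholder, not a real dataset.

```python
import re

# Simplified pattern (an assumption, not a full validator): a DOI starts
# with "10.", then a numeric registrant code, a slash, and a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(text: str) -> str:
    """Return the https://doi.org/ link for the DOI found in `text`."""
    match = DOI_PATTERN.search(text.strip())
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    return "https://doi.org/" + match.group(0)


# Hypothetical DOI written three different ways; all normalize to one link.
for raw in ("10.1234/example-dataset",
            "doi:10.1234/example-dataset",
            "https://doi.org/10.1234/example-dataset"):
    print(doi_to_url(raw))  # https://doi.org/10.1234/example-dataset
```

Storing the bare DOI and building the link only when you format the citation means the citation keeps pointing at the data even if the repository reorganizes its web pages.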

Here is the general format for citing a dataset in APA style, which some of you may already be familiar with:

Author(s). (Date). Title [Data set]. URL or DOI

Authors: These are the authors of the dataset. List all authors in the order they appear in the dataset, or alphabetically if there is no obvious order. The format should be Last Name, First Initial. Middle Initial.

Date: Be as specific as possible; otherwise, list just the year.

Title: This is the title of the dataset. [Data set] must be included after the title to indicate that the work is a dataset rather than something else.

URL or DOI: A URL is acceptable, but a DOI provides a stable link back to the dataset and is best practice whenever one is available.

And here is an example of a properly cited dataset:

Bullock, T., McClintock, J. D., & Peddle, A. (2023). Grand Banks Wave Buoy [Data set]. https://catalogue.cioosatlantic.ca/dataset/ca-cioos_f3e3e156-2a39-4f77-8d3c-9031b79168e7?local=en  
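If you keep track of your sources programmatically, the template and field rules above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not an official APA tool: the names `Author` and `apa_dataset_citation` are hypothetical, the title cannot be italicized in plain text, and APA's special rule for works with more than 20 authors is not handled. Feeding in the CIOOS example above reproduces the citation exactly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Author:
    """One dataset author (hypothetical helper). Given names may be initials."""
    last: str
    first: str
    middle: str = ""

    def apa(self) -> str:
        # APA writes authors as "Last, F. M.": keep only the first letter
        # of each given name and follow it with a period.
        initials = f"{self.first[0]}."
        if self.middle:
            initials += f" {self.middle[0]}."
        return f"{self.last}, {initials}"


def join_authors(authors: List[Author]) -> str:
    """Join authors APA-style: commas between names, ', &' before the last."""
    names = [a.apa() for a in authors]
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + ", & " + names[-1]


def apa_dataset_citation(
    authors: List[Author],
    date: str,
    title: str,
    url: Optional[str] = None,
    doi: Optional[str] = None,
    alphabetical: bool = False,
) -> str:
    """Build 'Author(s). (Date). Title [Data set]. URL or DOI'."""
    if not authors:
        raise ValueError("a citation needs at least one author")
    # Keep the dataset's own author order unless asked to sort by last name.
    if alphabetical:
        authors = sorted(authors, key=lambda a: a.last.lower())
    # Prefer the DOI (best practice) and fall back to the URL.
    if doi:
        link = f"https://doi.org/{doi}"
    elif url:
        link = url
    else:
        raise ValueError("a citation needs a URL or a DOI")
    return f"{join_authors(authors)} ({date}). {title} [Data set]. {link}"


print(apa_dataset_citation(
    [Author("Bullock", "T"), Author("McClintock", "J", "D"), Author("Peddle", "A")],
    "2023",
    "Grand Banks Wave Buoy",
    url="https://catalogue.cioosatlantic.ca/dataset/ca-cioos_f3e3e156-2a39-4f77-8d3c-9031b79168e7?local=en",
))
```

Keeping each field separate until the last moment is what makes it easy to swap in a DOI later, or to re-render the same record in another style, without retyping the citation.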

Citation can be a lot of work, but having a system in place will save your project from many headaches as it progresses.

Before you go! Things to consider for the next module:

How important do you feel it is to cite the raw data of a project? What tools and terms do you think you’d need to properly cite data? Do you feel confident in your ability to cite data? And if you don’t, what would make you more confident?