Previous modules have strongly emphasized data sharing, in part because our world is increasingly moving towards a data sharing culture. This culture is driven by the growing popularity of open access repositories, Tri-Agency grants for open data, and better indexing and machine reading capabilities for finding data. Data sharing grows year after year. This module details some of the advantages of data sharing and some of the concerns you might have!
There are valid concerns about data sharing, including data being used without credit to gain a scholastic or economic advantage. For instance, someone could get ahead in their academic career using work they didn’t do, or an AI system could be trained on a dataset without crediting you for the work of collecting and curating that data.
There are also ethical concerns about sharing data that relate to the people and places the data is about. For instance, if you have a dataset of medical data, its release could cause anything from embarrassment to harassment or worse for the people whose data you collected. A release of medical data is also a huge breach of privacy, and organizations associated with breaches of privacy lose the trust of their clients and researchers, so making sure data is shared properly is incredibly important. These ethical considerations should be clearly defined in an ethics review and clearly laid out in your DMP. An example from the ocean context: within the ocean industry there is a profit motive for not sharing data about where the biggest catch is taken. If everyone knew where the most abundant fishing spot was, certain industries could lose money!
Important things to remember:
- Anonymize your data if it is required for your project.
- Make sure the repository or journal you are submitting to has ethical sharing requirements listed.
How to do each of those things:
When anonymizing data, target anything that could allow someone to figure out who the data is about. For example, removing last names and replacing each first name with its first letter followed by a unique number string is one way of anonymizing name data. Making locations or workplaces as generalized as possible is another option. Instead of saying ‘this participant who lived in Halifax’, you could say ‘this participant who was from the Maritimes region of Canada’, which allows some specificity but not enough to identify someone. Similarly, if workplaces are mentioned, say ‘a fast food chain restaurant in Canada’ instead of ‘McDonald’s’.
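The name, location, and workplace steps described above can be sketched in Python. This is a minimal illustration, not a complete anonymization procedure: the field names, the `anonymize` function, the two lookup tables, and the sample records are all hypothetical, and the sequential four-digit number is just one assumed way of producing a unique number string.

```python
from itertools import count

# Hypothetical lookup tables: a real project would define its own categories.
REGION_BY_CITY = {"Halifax": "Maritimes region of Canada"}
WORKPLACE_TYPE = {"McDonald's": "fast food chain restaurant in Canada"}


def anonymize(records):
    """Return (shareable_rows, key); the key maps real names to codes."""
    numbers = count(1)
    key = {}   # real name -> code; never share this alongside the data
    rows = []
    for rec in records:
        person = (rec["first_name"], rec["last_name"])
        if person not in key:
            # First letter of the first name plus a unique number string.
            key[person] = f"{rec['first_name'][0].upper()}{next(numbers):04d}"
        rows.append({
            "participant": key[person],
            # Anything not in the lookup tables is withheld rather than guessed.
            "region": REGION_BY_CITY.get(rec["city"], "location withheld"),
            "workplace": WORKPLACE_TYPE.get(rec["workplace"], "workplace withheld"),
        })
    return rows, key


# Made-up sample records, for illustration only.
records = [
    {"first_name": "Alex", "last_name": "Example", "city": "Halifax", "workplace": "McDonald's"},
    {"first_name": "Ana", "last_name": "Sample", "city": "Halifax", "workplace": "McDonald's"},
]
shared_rows, key = anonymize(records)
print(shared_rows)
```

Only `shared_rows` would be shared. The `key` dictionary can re-identify every participant, so it needs to be stored separately and securely.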
A critical step early in the anonymization process is deciding who will hold the keys to the data and keep it safe. If you are in a university environment, there is likely already a procedure in place for the safekeeping of the data.
Anonymization is usually less of a concern within the ocean sector, because most of the data does not concern people, but there are a few instances where it is relevant, particularly around endangered ocean species. Anonymizing the locations and migration patterns of these species can help prevent poaching.
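One common way to anonymize a location is to reduce its precision before sharing. The sketch below snaps a latitude/longitude pair to the corner of a grid cell. The `coarsen` function and the one-degree default cell size are assumptions made for illustration; how coarse the grid needs to be depends on the species and on any guidance that applies to your project.

```python
import math


def coarsen(lat, lon, cell_deg=1.0):
    """Snap a coordinate to the south-west corner of its grid cell."""
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


# A made-up sighting position, for illustration only.
sighting = (44.6488, -63.5752)
print(coarsen(*sighting))                # → (44.0, -64.0)
print(coarsen(*sighting, cell_deg=0.5))  # → (44.5, -64.0)
```

Every sighting inside the same cell is reported at the same point, so the shared dataset still shows roughly where the animals are without revealing exact positions.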
Anonymize your own data! Take your name, home address, workplace, and school (if you attend one) and think about how you would anonymize each piece of information while still keeping it useful in a dataset.
Another aspect of data sharing concerns the OCAP principles and the CARE principles, which are complementary to the FAIR principles for open data.
OCAP and CARE
OCAP and CARE concern Indigenous Data Sovereignty. Historically, many Indigenous communities have not been included in decisions about how their important and personal data is collected, used and shared. The First Nations principles of OCAP (ownership, control, access, and possession) assert that First Nations have control over data collection processes, and that they own and control how this information can be used. The CARE Principles for Indigenous Data Governance are important when you work with Indigenous data. CARE helps to advance Indigenous self-determination and innovation. Indigenous communities want to ensure that their data is not used in a way that they consider harmful; they are seeking “greater control over the application and use of Indigenous data.” See more info here.
OCAP stands for:
“Ownership refers to the relationship of First Nations to their cultural knowledge, data, and information. This principle states that a community or group owns information collectively in the same way that an individual owns his or her personal information.
Control affirms that First Nations, their communities, and representative bodies are within their rights to seek control over all aspects of research and information management processes that impact them. First Nations control of research can include all stages of a particular research project-from start to finish. The principle extends to the control of resources and review processes, the planning process, management of the information and so on.
Access refers to the fact that First Nations must have access to information and data about themselves and their communities regardless of where it is held. The principle of access also refers to the right of First Nations’ communities and organizations to manage and make decisions regarding access to their collective information. This may be achieved, in practice, through standardized, formal protocols.
Possession While ownership identifies the relationship between a people and their information in principle, possession or stewardship is more concrete: it refers to the physical control of data. Possession is the mechanism by which ownership can be asserted and protected.”
OCAP also “asserts that First Nations alone have control over data collection processes in their communities, and that they own and control how this information can be stored, interpreted, used, or shared.”
CARE stands for:
Collective Benefit: The data should have benefit to Indigenous peoples.
Authority to Control: Indigenous communities have control over how their data is used and represented. This includes withdrawing consent at any time should a project take a turn that they think will cause harm.
Responsibility: When working with Indigenous data, the work must be shown to support Indigenous self-determination. Resources must be provided that allow the Indigenous community full access to the data in the project, including translation into their native language, specific Indigenous repositories and storage methods, and open access for all Indigenous members of the community.
Ethics: Indigenous rights should be the first and primary concern across all facets of the research project.
Why might Indigenous communities want control over what is shared? Indigenous communities want to make sure that knowledge that is sacred and internal to the community stays within their stewardship. For this reason, the CARE principles were created!
Keeping the CARE principles in mind for sharing and working with Indigenous data will help create a more open and inclusive data culture!
See our next module on data licensing to learn more about how to share your data!
Before you go! Things to consider for the next module:
Can you think of ways people might accidentally be discovered by connecting different pieces of data in a dataset together? How would you go about making sure that those connections are harder to make?