Research data is regularly lost over time, whether through accidental deletion, storage failure, or simply forgetting what the data records mean. Even when data is not lost, locating old files can be a nightmare if they are not clearly organized, labeled, and kept with the proper metadata. Research projects often extend over multiple years, even decades, and making sense of decade-old data can be a massive challenge.
Good research data management (RDM) practices make the research process easier. Researchers who manage their data well are at lower risk of losing it and spend less time repeatedly deciphering old files. They also benefit from increased recognition and greater research impact through data sharing. In some cases, funders or journals require researchers to follow specific data management practices.
In addition, good research data management helps the whole research community and the public through open data sharing. Data sharing increases collaboration and supports scientific validation, verification, and reproducibility. Open data sharing also recognizes that research data is a publicly supported good and makes research accessible to the general public.
The Portage Network, a Canadian national network of RDM professionals, has created a useful document, A Brief Guide: Research Data Management. Its key recommendations for researchers are:
- Save the raw data separately from the data you’re working on
- Back up your data – see the 3-2-1 rule in How do I back up my data?
- Describe your data – include metadata that describes all the data you produce
- Preserve and share your data – upload final data to a data repository – see What do I do with all this data now that my project is over? Should I share my data?
Many research funders worldwide, including the NSF and NIH (starting in 2023) in the USA and the Wellcome Trust and others in the UK, require researchers to submit data management plans (DMPs) as part of their funding applications. The Tri-Agencies will begin to require DMPs for some grants in Spring 2022, with requirements expanding over time.
Even if a funder does not require a DMP, it is still useful to gather all the information about a project's data in one place. This gives researchers a document to refer back to as the project progresses. See our Data Management Planning page for more details.
Where appropriate, we encourage researchers to openly share their research data so that it can be reused. Many scientific journals now require data (and software code) to be openly shared. The main benefits to researchers of sharing their data are:
- Increased collaboration
- Increased confidence in research results (since others can verify analyses)
- Increased recognition and citation of datasets
- Contributing to a culture in which other researchers are more likely to share data that you may find useful
We encourage researchers to deposit their data in a domain-specific repository if one exists for their field. Otherwise, we run a data repository called McMaster Dataverse where researchers can upload their data for others to use. See our pages on Publishing Data and on Archiving Data for more details. If you have questions about what is appropriate to deposit in a data repository, or want help getting started with Dataverse or another data repository, contact us at rdm@mcmaster.ca.
A simple guideline for backups is the 3-2-1 rule: keep at least 3 copies of your data, on 2 different storage devices or platforms, with 1 copy off site. Some storage services, including OneDrive and MacDrive, provide automatic backups and let you view previous versions of a file. See our page on Data Storage & Backups for more.
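Because the 3-2-1 rule has three concrete conditions, it can be checked mechanically against a list of where your copies live. The Python sketch below is a hypothetical helper for illustration only, not a McMaster tool: the `BackupCopy` structure, its fields, and the `meets_3_2_1` function are assumptions. It approximates "2 different storage devices or platforms" by counting distinct `medium` labels, so the result is only as good as how you label each copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset (hypothetical structure for illustration)."""
    location: str   # where the copy lives, e.g. "laptop" or "OneDrive"
    medium: str     # kind of storage device or platform, e.g. "cloud"
    offsite: bool   # True if the copy is stored away from your main work site


def meets_3_2_1(copies: list[BackupCopy]) -> bool:
    """Return True if the copies satisfy the 3-2-1 backup rule."""
    enough_copies = len(copies) >= 3                     # at least 3 copies
    enough_media = len({c.medium for c in copies}) >= 2  # on 2+ distinct media
    has_offsite = any(c.offsite for c in copies)         # at least 1 off site
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    plan = [
        BackupCopy("laptop", "internal drive", offsite=False),
        BackupCopy("lab external drive", "external drive", offsite=False),
        BackupCopy("OneDrive", "cloud", offsite=True),
    ]
    print(meets_3_2_1(plan))      # True
    print(meets_3_2_1(plan[:2]))  # False: only 2 copies and none off site
```

Note that two copies on the same cloud service count as one medium here, which matches the intent of the rule: a single platform failure should not take out every copy.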
Yes, working with Indigenous data raises many data management concerns, rooted in the historically problematic relationship between Indigenous Peoples and researchers, academics, and other data collectors.
The Tri-Agency RDM Policy states:
In line with the concept of Indigenous self-determination and in an effort to support Indigenous communities to conduct research and partner with the broader research community, the agencies recognize that data related to research by and with the First Nations, Métis, or Inuit whose traditional and ancestral territories are in Canada must be managed in accordance with data management principles developed and approved by these communities, and on the basis of free, prior and informed consent. This includes, but is not limited to, considerations of Indigenous data sovereignty, as well as data collection, ownership, protection, use, and sharing.
The principles of Ownership, Control, Access and Possession (OCAP®) are one model for First Nations data governance, but this model does not necessarily respond to the needs and values of distinct First Nations, Métis, and Inuit communities, collectives and organizations. The agencies recognize that a distinctions-based approach is needed to ensure that the unique rights, interests and circumstances of the First Nations, Métis and Inuit are acknowledged, affirmed, and implemented.
The First Nations Information Governance Centre provides guidance for researchers on the OCAP principles: Ownership, Control, Access, and Possession. The CARE Principles for Indigenous Data Governance are another framework to consider.
We recommend three main survey tools: Microsoft Forms, LimeSurvey, and REDCap. Microsoft Forms is useful for simple one-off forms and is supported on campus as part of Office365. LimeSurvey is an excellent tool for more complex surveys with branching conditions and is supported campus-wide through RHPCS. REDCap is designed for longitudinal studies in which participants may need to fill out multiple surveys, or the same survey multiple times. The Computer Services Unit (CSU) in the Faculty of Health Sciences (FHS) coordinates REDCap for FHS and can provide more information to researchers in other faculties. Qualtrics is also available through Spark in the Faculty of Social Sciences. We do not recommend using other publicly available survey tools such as SurveyMonkey or Google Forms.
For a more in-depth comparison of survey tools, take a look at the Survey Software Options Matrix developed by RHPCS.
You can deposit your data in Dataverse as soon as it is complete and you have ethics approval! All we ask is that you add the persistent identifier (PID) or Handle of your thesis once it has been submitted to MacSphere, our institutional publication repository.
If you submit your thesis first, you can add its PID when you deposit your data. If you deposit your data first, you can update your dataset with the thesis Handle once it has been minted.
Once your dataset is published, it will be assigned a DOI, which you can link to your thesis in MacSphere as related data.
When you deposit data in Dataverse, you can group a set of files into a single dataset. The dataset receives one DOI for the whole collection, but it can contain many files.