Services
Good data deserves to be remembered. The best way to keep data from being lost is to make sure it is properly archived. Many important datasets have been lost over time because storage devices became corrupted, failed, went missing, or were destroyed.
Research data repositories
Wherever possible, the best way to archive important research data is to upload it to a reputable data repository. When data is in your custody as a researcher, you often have to pay for storage and maintain appropriate regular backups to ensure files are not corrupted. Depositing in a research data repository means the repository curators take care of your dataset...so you don't have to! If you're depositing data, you can also easily make it openly available or share it with restrictions. Even if your dataset is too sensitive to be published openly, it can often be de-identified and deposited under restricted access to archive it.
Careful long-term storage
However, some data is so sensitive that it can never be shared publicly, even in a de-identified format. Even if you're not able to share or publish data openly, it's important to make sure it is carefully archived once a project is complete. You don't want a dusty hard drive somewhere, full of disorganized files that are confusing when you open them up again. Hard drives and external storage devices also have limited lifetimes and will eventually fail, so they aren't suitable for long-term archiving. Our Research Data Storage Finder Tool has an overview of storage options.
Before archiving data for long-term storage, prepare your files as follows:
- Ensure that the data is stored in sustainable, open file formats. Open file formats help ensure access to your data over the long term, because proprietary software can become unavailable when companies go out of business.
- Clean up your files. Delete any files that are no longer relevant. Has anyone had their parents give them the entire contents of their childhood bedrooms? Make sure this doesn’t happen with your datasets.
- Give your files understandable names and organize them into folders to create an archive-ready package. Research data should be packaged with metadata or ‘data about data’. Metadata helps your future self or another researcher unpack and understand your files and how they were created. Data documentation is commonly included in readme files, codebooks, or data dictionaries. Metadata should include information about:
  - File title, file format, language, creator, and date
  - Data variable descriptions, including data type, allowable values, and calculations used (if applicable)
- Determine what data goes where. Datasets can be archived in a long-term data repository such as Dataverse, along with documentation and any software or code developed for the project (including statistical analyses). Text and media files that comprise a qualitative dataset can also be deposited in Dataverse. Non-data text (such as manuscripts, questionnaires, etc.) and media files (like figures and tables, photos, conference presentations, etc.) can be archived in MacSphere. Other information like consent forms, administrative documents, and duplicate files can be kept as needed on your McMaster research data storage platform until you’re ready to delete them. Both MacSphere and Dataverse will provide DOIs for the material deposited, so if one project spans both repositories, make sure to cross-reference!
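As one way to get started on the file-level metadata described above, the following is a minimal Python sketch that walks an archive-ready folder and writes a simple manifest listing each file's name, format, language, creator, and date. It is an illustration, not a McMaster, Dataverse, or MacSphere tool: the function names, the `MANIFEST.csv` file name, and the column names are all hypothetical, and the SHA-256 checksum column is an optional extra (not mentioned above) that can help you confirm later that a file has not been corrupted. The date recorded is the file's last-modified date, which may differ from the date the data was collected.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical manifest file name and columns, chosen for illustration only.
MANIFEST_NAME = "MANIFEST.csv"
FIELDS = ["file", "format", "language", "creator", "date_modified", "sha256"]


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path, creator: str, language: str = "en") -> list[dict]:
    """Collect one metadata row per file found under package_dir."""
    rows = []
    for path in sorted(package_dir.rglob("*")):
        # Skip folders and any manifest left over from an earlier run.
        if not path.is_file() or path.name == MANIFEST_NAME:
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(package_dir).as_posix(),
            "format": path.suffix.lstrip(".").lower() or "unknown",
            "language": language,
            "creator": creator,
            "date_modified": modified.date().isoformat(),
            "sha256": sha256_of(path),
        })
    return rows


def write_manifest(rows: list[dict], out_path: Path) -> None:
    """Write the manifest rows to a CSV file, itself an open format."""
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

For a folder called `my_project_archive` (a placeholder name), you could call `rows = build_manifest(Path("my_project_archive"), creator="Your Name")` and then `write_manifest(rows, Path("my_project_archive") / MANIFEST_NAME)`. A manifest like this complements, but does not replace, a readme file or data dictionary describing your variables.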