Research Data Management

Why are file formats important?

The choice of file format(s) in which you record, store, and transmit your data will have a large impact on the ability of others to use your data in the future. Because technology changes rapidly, researchers should always consider the possibility of both hardware and software obsolescence. How will your data be read if the software used to produce it becomes unavailable? A good data management plan (DMP) will take these possibilities into account, list all of the software involved in the project and, if possible, plan to have that software stored along with the data.
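One way to list the software involved in a project is to generate a plain-text manifest and store it next to the data. The following is a minimal sketch, assuming the analysis was done in Python; the function names, the `SOFTWARE.txt` file name, and the package list passed in are illustrative assumptions, not part of any standard or DMP template.

```python
import platform
import sys
from importlib import metadata
from pathlib import Path


def software_manifest_lines(packages):
    """Return plain-text lines describing the software environment."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Operating system: {platform.system()} {platform.release()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than silently skipping the package
            version = "not installed"
        lines.append(f"{name}: {version}")
    return lines


def write_software_manifest(data_dir, packages, filename="SOFTWARE.txt"):
    """Write the manifest as a UTF-8 text file inside data_dir (hypothetical layout)."""
    path = Path(data_dir) / filename
    path.write_text("\n".join(software_manifest_lines(packages)) + "\n", encoding="utf-8")
    return path
```

Calling `write_software_manifest("my_dataset", ["numpy", "pandas"])` would write one line per package into `my_dataset/SOFTWARE.txt`. Plain text is used here so that the manifest itself remains readable without special software.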

Guides to File Formats

The following are links to various sites that will provide information on different types of file formats.

Open vs. Closed Formats

Formats can be either closed or open (open formats are also called free formats). Open formats are free or low cost, widely distributed, and not tied to particular vendors. They are maintained by organizations that publish the specifications, and anyone can use them.

Closed formats are controlled by organizations and vendors that do not publish the specifications. A closed format is covered by copyright, trademark, or patents and comes with a variety of restrictions on use. Closed formats usually require a specific device or application to use them.

File formats for archiving

While you may wish to work with your data using proprietary software such as Word or SPSS, you should archive your data in formats most likely to survive the test of time. File formats most likely to remain usable in the future share the following characteristics:

* complete and open documentation
* platform-independence
* non-proprietary (vendor-independent)
* no "lossy" or proprietary compression
* no embedded files, programs or scripts
* no full or partial encryption
* no password protection

Finally, consider using data formats that allow for re-use (e.g., .txt or .csv rather than .pdf). If you do store data in proprietary formats, be sure to document the software necessary to view the data. If data will be stored in one format during collection and analysis and then transferred to another format for preservation, consider documenting features that may be lost in data conversion, such as system-specific labels.
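To illustrate documenting features that a conversion would otherwise lose, the sketch below writes tabular data to .csv and saves the variable labels in a separate sidecar file. It assumes the data has already been read into Python as a list of dictionaries; the function name `export_with_codebook`, the file names, and the example labels are hypothetical.

```python
import csv
import json
from pathlib import Path


def export_with_codebook(rows, labels, csv_path, codebook_path):
    """Write rows to CSV and keep variable labels in a sidecar JSON file.

    rows:   list of dicts that all share the same keys (column names)
    labels: dict mapping column name -> human-readable label
    """
    fieldnames = list(rows[0].keys())
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # CSV has no place for variable labels, so record them separately;
    # columns without a label get an empty string
    codebook = {name: labels.get(name, "") for name in fieldnames}
    Path(codebook_path).write_text(
        json.dumps(codebook, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


# Hypothetical example data
rows = [{"id": 1, "q1": 3}, {"id": 2, "q1": 5}]
labels = {"id": "Respondent ID", "q1": "Overall satisfaction (1-5)"}
# export_with_codebook(rows, labels, "survey.csv", "survey_codebook.json")
```

Keeping the labels in a plain-text sidecar file means the .csv stays simple while the information that existed only in the original software is not lost.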

For more information, the UK Data Service provides a table of file formats it recommends and accepts for data sharing, reuse, and preservation.