Research Data Management (RDM)

This LibGuide provides guidance on research data management (RDM) throughout the research lifecycle.

File formats

Proprietary (closed) file formats

Proprietary file formats can typically only be opened with the specific software that created them, e.g. Microsoft Word. They can provide rich, highly specified functionality, but they may limit the usability of your data and carry a high long-term risk, because they are commercial products, available only under license and prone to obsolescence.

The file format you choose affects whether you and other researchers can open a file at a later stage. Non-proprietary, or open, formats are more interoperable and thus more durable.

If you can only save your data in a proprietary file format, consider providing the following information for future users in an accompanying readme.txt file:

  • Software name
  • Software version
  • Parent company of the software
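
The three items above could be recorded with a short script. The following is a minimal sketch in Python, not a prescribed method: the function name write_readme and the "Label: value" layout are assumptions made for illustration, and any values passed in are placeholders for your own software details.

```python
from pathlib import Path


def write_readme(folder, software_name, software_version, parent_company):
    """Write a readme.txt naming the software needed to open the data.

    The function name and the "Label: value" layout are illustrative
    assumptions; the guide only lists which three items to record.
    """
    lines = [
        f"Software name: {software_name}",
        f"Software version: {software_version}",
        f"Parent company: {parent_company}",
    ]
    readme_path = Path(folder) / "readme.txt"
    # UTF-8 plain text keeps the readme itself in an open format.
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme_path
```

Calling write_readme("my_dataset", ...) with your own details places the readme.txt next to the data files, provided the folder already exists.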

 

Open file formats

Open formats may lack rich functionality and be more generic, but they have the following advantages:

  • They provide high usability
  • They carry a low risk over the long term because there are no license fees
  • Their specifications are publicly available
  • They can be rendered by multiple software packages

For long-term preservation, where possible, you should store data in open or widely-used formats, and plan for conversion from proprietary formats where necessary. 

For a more in-depth discussion, see the Library of Congress's Sustainability of Digital Formats website.

Type of data

Recommended formats

Text
  • Plain text (.txt)
  • Portable Document Format (.pdf)
  • LaTeX documents (.tex)
  • Hypertext Markup Language (.html)
  • Open Document Format (.odt)
  • Extensible Markup Language (.xml)
Tables, spreadsheets, and databases
  • Tab-separated tables (.txt — sometimes .tsv or .tab)
  • Comma-separated tables (.csv or .txt)
  • Other standard delimiter (e.g. colon, pipe)
  • Fixed-width
  • OpenDocument Spreadsheet (.ods)
  • OpenDocument Database (.odb)
Image Files
  • TIFF (.tiff or .tif)
  • JPEG (.jpg) or JPEG 2000 (.jp2)
  • Portable Network Graphics (.png)
  • Scalable Vector Graphics (.svg)
  • Portable Document Format (.pdf)
  • Graphics Interchange Format (.gif)
  • Microsoft Windows Bitmap Format (.bmp)
Sound Files
  • WAVE (.wav)
  • FLAC (.flac)
  • MP3, i.e. MPEG Audio Layer III (.mp3: usually suitable for human voice and moderate-quality audio, but may not be suitable for high-fidelity audio)
  • Audio Interchange File Format (.aiff)
Video Files
  • MPEG-4 (.mp4)
  • Material Exchange Format (.mxf)
Databases
  • Extensible Markup Language (.xml)
  • Comma-separated tables (.csv)
Geospatial Data
  • Georeferenced TIFF, also known as GeoTIFF (.tif or .tiff)
  • ESRI Shapefile (.shp, .shx, .dbf)
  • Keyhole Markup Language (.kml)
  • Network Common Data Format (.nc)
Web Data
  • JavaScript Object Notation (.json)
  • Extensible Markup Language (.xml)
  • Hypertext Markup Language (.html)
Web Archive
  • Web ARChive (.warc)
Multidimensional Arrays
  • Common Data Format (.cdf)
  • Network Common Data Format (.nc)
  • Hierarchical Data Format (usually .hdf or .h5)
E-books
  • Electronic Publication (.epub)

Source: Ohio State University. University libraries. 2022. Research data management: Best practises. 

https://guides.osu.edu/c.php?g=707751&p=5027409
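
The recommended formats listed above can be turned into a quick check of your own files. The following is a rough sketch in Python and is not part of the cited source: the names RECOMMENDED_EXTENSIONS, is_recommended and files_needing_attention are hypothetical, the extension set is simply copied from the list above, and a file extension is only a hint about a file's real format.

```python
from pathlib import Path

# Extensions copied from the recommended-formats list above (not exhaustive).
# Note that .txt covers plain text as well as delimited tables.
RECOMMENDED_EXTENSIONS = {
    ".txt", ".pdf", ".tex", ".html", ".odt", ".xml",
    ".tsv", ".tab", ".csv", ".ods", ".odb",
    ".tiff", ".tif", ".jpg", ".jp2", ".png", ".svg", ".gif", ".bmp",
    ".wav", ".flac", ".mp3", ".aiff",
    ".mp4", ".mxf",
    ".shp", ".shx", ".dbf", ".kml", ".nc",
    ".json", ".warc", ".cdf", ".hdf", ".h5", ".epub",
}


def is_recommended(filename):
    """Return True if the file's extension appears in the list above."""
    return Path(filename).suffix.lower() in RECOMMENDED_EXTENSIONS


def files_needing_attention(filenames):
    """Return the names whose extension is not in the recommended set.

    These are candidates for conversion to an open format, or for an
    accompanying readme.txt if conversion is not possible.
    """
    return [name for name in filenames if not is_recommended(name)]
```

For example, files_needing_attention(["a.csv", "b.xlsx"]) flags only "b.xlsx", which you could then convert or document.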

 
