Working with Data

What kind of data are you looking for?

Data or Statistics?

When using data in your research, think about whether you need data or statistics. Data is what you would use to run your own analyses or create your own visualizations, for example, an export from your Fitbit or Apple Watch that you examine to see when you are most active. Statistics are already summarized or analyzed in some way. A chart in your Fitbit app showing your average steps per day is an example of statistics. This difference becomes important when choosing a path towards data analysis.
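To make the distinction concrete, here is a minimal Python sketch. The step counts are invented for illustration and do not come from any real device export. The list stands in for raw data, and the computed average is a statistic derived from it.

```python
from statistics import mean

# Hypothetical raw data: one step count per day (made-up values).
daily_steps = [4200, 8100, 6500, 10300, 7400]

# A statistic summarizes the raw data into a single figure.
average_steps = mean(daily_steps)

print(f"Data: {daily_steps}")
print(f"Statistic (average steps per day): {average_steps}")
```

With the raw list you could go on to ask other questions, such as which day was most active. With only the average, you could not.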

Structured and unstructured data 

Data generally falls into two classes: structured and unstructured. Spreadsheets are a common example of structured data and are suitable in many instances; free text, such as a document, is unstructured. Your method of analysis or data visualization will determine which type of data you need.

Interoperable data formats

Interoperable data formats are formats that can be used across different research tools or platforms. Proprietary file formats specific to a certain software program may not work with other programs.

With spreadsheet data, always try to get a CSV (comma-separated values) file. Avoid proprietary file formats like Excel (file extension .xls or .xlsx).
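As an illustration of why CSV travels well between tools, the sketch below reads CSV content using only Python's standard library, with no spreadsheet software involved. The column names and values are hypothetical, and an in-memory string stands in for a real .csv file.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a .csv file.
csv_text = "date,steps\n2024-01-01,4200\n2024-01-02,8100\n"

# csv.DictReader uses the header row as the keys for each record.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    # Values are read as text; convert them before doing arithmetic.
    print(row["date"], int(row["steps"]))
```

To read an actual file, you would pass an opened file object to csv.DictReader in place of the io.StringIO wrapper.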

With unstructured text data, always try to get a plain text file (file extension .txt). Avoid proprietary file formats like Word (file extension .doc or .docx). PDF files are sometimes not machine-readable, but this can sometimes be fixed in a PDF editor like Adobe Acrobat.

Tracking your Data Journey

When working on a data project, keeping track of your research and analysis process is crucial. Not only does it help others understand how you arrived at your conclusion, it helps your future self remember what worked well, what didn't, and what changes you made yourself. Key questions to keep in mind are:

  • Where did you find this data? How did you find it (e.g., which search interface, search terms, and parameters)?
  • Did you make any changes to the data? Did you omit or add fields? Did you run statistics on it? 
    • How and why did you do those things?
  • What else would a future you, or someone else, need to know to do a similar project or reuse your data? 
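If a structured record suits you, the questions above could be captured as a small log entry. This is only a sketch of one possible approach: every field name and value below is hypothetical, and the source name is a placeholder rather than a real portal.

```python
import json

# Hypothetical log entry; all field names and values are illustrative.
journey_entry = {
    "source": "Example Open Data Portal",
    "search_interface": "portal keyword search",
    "search_terms": ["daily steps", "2024"],
    "changes": [
        {
            "what": "removed rows with missing step counts",
            "why": "incomplete days skewed the average",
        },
    ],
    "notes_for_reuse": "Dates are in YYYY-MM-DD format.",
}

# Serialize to JSON text so the entry can be saved next to the data.
log_text = json.dumps(journey_entry, indent=2)
print(log_text)
```

JSON is plain text, so a log kept this way stays readable in any editor and can be stored alongside the data it describes.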

Your actual documentation method is whatever works best for you. What are you most likely to do after a few stressful hours of research? For some, that's a Google Doc with messy notes, a research notebook, or another spreadsheet or structured list of data sources. Just make sure it is easily findable and saved in multiple locations.

[Figure: Graphical representation of data journey questions]

The questions above can be broken down into four basic steps of data analysis: gathering data, processing data, analyzing data, and sharing data. When gathering data, you'll answer "how did you find it?". When processing data, you'll answer "how did you change it?". When analyzing data, you'll answer "how did you analyze it?". When sharing data, you'll answer "what would others need to know to reuse it?".