Almost everyone uses or creates data. We tend to assume all data is numeric, like baseball statistics or sales figures, but data can also be text from qualitative research, images, points on a map, information about your research sources, or lab notes. Data is used by historians, social science researchers, biologists, data scientists, journalists, politicians, and more. Data looks different depending on the domain, but all of it should be managed and understood in the same way.
Wondering how to use datasets to enhance your research, or find data to make an infographic for your class? This guide will introduce you to data, including how to find it, how to prepare it, and how to analyze it.
At its most basic level, data is a representation of information from the real world. It is important to remember that data is collected by people who make decisions about what to include, omit, visualize, analyze, and present. As with any form of information, when you encounter a claim backed up by data, consider the authority of the source before accepting the claim.
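Not all data is numeric. Much of the data we see on a regular basis is numeric (also known as quantitative), but data can also be text, images, geospatial data such as points on a map, or notes and records about research sources.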
Different types of data appear in different fields of study, but many can be useful in multiple domains. For example, if you're a historian who usually works with text data, how might you incorporate geospatial data? Or, if you usually work with financial data, what kinds of text data could you incorporate in your research?
Data can generally be classified into two categories: structured and unstructured.
Just like it sounds, structured data is organized according to a consistent, predictable layout. Structured data is commonly found in spreadsheet formats (file extensions like CSV, XLSX, or TSV). Spreadsheet data is also called tabular data. Other structured formats include HTML, XML, and JSON.
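To make the idea of structured data concrete, here is a minimal Python sketch that reads a small piece of tabular (CSV) data and converts it to JSON. The city names and population figures are invented for illustration and do not come from any real dataset; only Python's standard library is used.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
csv_text = "city,year,population\nSpringfield,2020,1000\nShelbyville,2020,800\n"

# Each row of the table becomes a dictionary keyed by column name.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# The same records written as JSON, another structured format,
# and then read back in.
json_text = json.dumps(rows)
records = json.loads(json_text)

# Because every record has the same fields, a program can work
# with a column directly, for example by summing it.
total_population = sum(int(r["population"]) for r in records)

print(rows[0]["city"])
print(total_population)
```

Because each record has the same named fields, software can locate and combine values without a person reading the file first, which is what makes structured data convenient for analysis.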
Unstructured data has no consistent, predefined organization. This can include groups of images (file extensions like JPG, PNG, or TIFF) or plain text files (file extensions like TXT). For example, a JPG image of a data table may not be machine-readable, which limits its usefulness for data analysis and visualization. Many computational methods require data to be put into a structured format before analysis can be done, but some data analysis tools will do this step for you.
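As one possible illustration of putting unstructured data into a structured format, the Python sketch below turns a short passage of plain text into a small table of word counts. The passage is invented for this example, and word counting is only one of many ways to structure text; it is not a method prescribed by this guide.

```python
import re
from collections import Counter

# Hypothetical excerpt of unstructured text, invented for illustration.
notes = (
    "The river flooded in spring. "
    "The town rebuilt the bridge after the river receded."
)

# Lowercase the text and pull out each run of letters as a word.
words = re.findall(r"[a-z]+", notes.lower())

# Count how often each word appears.
counts = Counter(words)

# Store the most frequent words as rows with named fields,
# which is a structured (tabular) form of the original text.
table = [{"word": w, "count": c} for w, c in counts.most_common(3)]

for row in table:
    print(row["word"], row["count"])
```

The resulting rows could be saved as a CSV file and opened in a spreadsheet, at which point the text has become tabular data.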