Remember that all data is gathered by people who make decisions about what to collect. A good way to evaluate a dataset is to look at its source. Data from non-profit or governmental organizations is generally reliable. Data from private sources or data collection firms should be examined more closely to determine whether it is suitable for your study. Here are some questions you can ask of a dataset:
The answers to these questions can often be found in data documentation or by web searching.
Ethical data use involves keeping an eye on privacy and reuse restrictions, and interrogating how and why data was collected.
Data can include information that is potentially harmful if made public. For example, if a social scientist collects information from people addicted to drugs and shares that information without appropriately anonymizing the dataset, that could affect someone's ability to get a loan or a job, or cause family issues. Ethical data use almost always includes anonymizing data to limit these risks. Similarly, if you are reusing data that contains potentially harmful information, think about what you might be able to omit from your analysis to protect privacy.
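As a rough illustration of omitting identifying fields before sharing or analyzing data, here is a minimal Python sketch. The column names (name, email, phone, participant_id) and the function name pseudonymize are hypothetical and chosen only for this example. Note that this is pseudonymization, not full anonymization: other columns, such as a zip code combined with a birth date, may still identify someone, so treat this as a starting point rather than a complete safeguard.

```python
import hashlib
import secrets

# Hypothetical column names that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(records, id_field="participant_id", salt=None):
    """Return copies of the records with direct identifiers removed
    and the ID field replaced by a salted hash."""
    if salt is None:
        # A random salt makes the hashes hard to reverse by guessing IDs.
        salt = secrets.token_bytes(16)
    cleaned = []
    for row in records:
        # Keep only the fields that are not direct identifiers.
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        raw_id = str(row[id_field]).encode("utf-8")
        digest = hashlib.sha256(salt + raw_id).hexdigest()
        # A shortened hash stands in for the original ID.
        kept[id_field] = digest[:12]
        cleaned.append(kept)
    return cleaned
```

The original records are left unchanged, so you can keep a protected copy of the raw data while working from the cleaned version. Keep the salt private: anyone who has it and can guess the original IDs can recompute the hashes.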
Remember that data is only as good as its collection methods, and interrogate why data was collected in a certain way. Do you notice that certain groups or factors are conspicuously missing? Could the data collection method have violated anyone's privacy?
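One simple way to check whether groups are missing is to count how many records fall into each group you expected to see. The sketch below assumes the data is a list of dictionaries; the field name and the list of expected groups are hypothetical and would come from your own knowledge of the population being studied.

```python
from collections import Counter


def group_coverage(records, field, expected_groups):
    """Count how many records fall into each expected group."""
    counts = Counter(row.get(field) for row in records)
    return {group: counts.get(group, 0) for group in expected_groups}


def missing_groups(coverage):
    """List the expected groups that have no records at all."""
    return [group for group, n in coverage.items() if n == 0]
```

For example, calling group_coverage(records, "age_group", ["18-29", "30-49", "65+"]) and passing the result to missing_groups would list any age groups that never appear. A count of zero does not explain why a group is absent, but it tells you where to look in the data documentation.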