When you begin an empirical research project, you should attempt to determine whether the data or statistics you need for your study have already been compiled. The terms "data" and "statistics" are often used interchangeably, but they are not really the same. Data is the raw information from which statistics are created. Datasets are files containing raw data that can be loaded into software like Excel, SAS, or SPSS for analysis. Statistics are the result of analyzing and interpreting data.
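The distinction can be illustrated with a minimal Python sketch. The records below are invented purely for illustration (they do not come from any real survey), and the field names are hypothetical. The list of records plays the role of a dataset; the mean and median computed from it are statistics.

```python
import statistics

# Hypothetical raw data: one record per survey respondent.
# All values are invented for illustration only.
dataset = [
    {"respondent": 1, "state": "OH", "household_income": 42000},
    {"respondent": 2, "state": "OH", "household_income": 58000},
    {"respondent": 3, "state": "MI", "household_income": 51000},
    {"respondent": 4, "state": "MI", "household_income": 71000},
]

# Statistics are what you get after analyzing the raw data.
incomes = [row["household_income"] for row in dataset]
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"Mean household income: {mean_income}")
print(f"Median household income: {median_income}")
```

If you only had the published mean and median, you could not recover the individual records; with the dataset, you can compute these or any other statistics yourself.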
If you need statistics, you may find a source by using the reference book Statistics Sources or by searching the database Statista. You may also find references to compiled and analyzed statistics by searching for articles, as discussed in our Articles for Legal and Non-Legal Research Guide. Also consider who might already produce the type of statistics you want: a relevant government agency, non-governmental organization, think tank, or trade organization may publish them.
Datasets are often compiled in the context of original empirical research that is reported in scholarly articles. Research articles related to your topic of interest may include a "data availability statement" that tells where and how the data can be accessed. The datasets analyzed in relevant articles may have been archived for future analysis and reuse in one of several Research Data Archives. Also consider who might already collect the type of data you want: a relevant government agency, non-governmental organization, think tank, or trade organization may make its data available.
If you find an existing dataset on your topic, you still need to evaluate the dataset before you use it. Who collected the data, and why? What variables were observed, and how were they defined? What level(s) of data aggregation are provided (e.g., are the data reported at the level of the individual, family, county, state, or country)? Have the researchers who collected the data provided enough information about the collection process for you to reproduce it? Look for documentation of the procedures used to collect and code the data. If the documentation is not sufficient to let you reproduce the data collection process, or if variables were defined differently than you would define them, you should probably not rely on that data.
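The question about levels of aggregation matters because data can be rolled up from a finer level to a coarser one, but not the other way around. The following Python sketch shows this under stated assumptions: the records, county names, and the helper function employment_rate_by are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical individual-level records (invented for illustration).
records = [
    {"person_id": 1, "county": "Adams", "state": "OH", "employed": True},
    {"person_id": 2, "county": "Adams", "state": "OH", "employed": False},
    {"person_id": 3, "county": "Brown", "state": "OH", "employed": True},
    {"person_id": 4, "county": "Brown", "state": "OH", "employed": True},
    {"person_id": 5, "county": "Clare", "state": "MI", "employed": False},
    {"person_id": 6, "county": "Clare", "state": "MI", "employed": True},
]

def employment_rate_by(rows, level):
    """Aggregate individual records to the given level ("county" or "state")."""
    totals = defaultdict(int)
    employed = defaultdict(int)
    for row in rows:
        key = row[level]
        totals[key] += 1
        employed[key] += int(row["employed"])
    # Share of employed people within each group.
    return {key: employed[key] / totals[key] for key in totals}

by_county = employment_rate_by(records, "county")
by_state = employment_rate_by(records, "state")

print(by_county)
print(by_state)
```

A dataset published only at the state level would hide the difference between the two Ohio counties in this example, so check that the level of aggregation provided is at least as fine as the level your research question requires.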
The data sources listed in this guide (see the list of subjects on the left) are necessarily limited. For help finding data-producing organizations not listed here, please consult the following resources: