
IPUMS (census and survey data)

About IPUMS Data

IPUMS data are harmonized across time and space.  Why does this matter?  If you've worked with survey or census data, you may have noticed that variable names and response categories change between census or survey years, and depend on who conducted the survey.  For example, the 1850 US Census asks whether the respondent was "married within the past year," while the 1940 US Census asks for "Times married."  Even when the same question, such as birthplace, appears on both surveys, the responses may be coded differently for a variety of reasons.  The same questions may also appear on censuses for other countries, perhaps with slightly different responses.

IPUMS harmonizes the data to general codes as much as possible and integrates them, so it's easier to conduct research using data from different time periods and places.
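To make the idea concrete, here is a minimal Python sketch of what harmonizing codes means. Everything in it is an assumption for illustration: the sample names, the source codes, the harmonized codes, and the `harmonize` helper are all invented and do not reflect actual IPUMS codes or software.

```python
# Hypothetical codebooks: how two samples coded the same birthplace answers.
# These codes are invented for illustration and are NOT real IPUMS codes.
SOURCE_CODEBOOKS = {
    "sample_a": {"01": "Ohio", "02": "Ireland"},
    "sample_b": {"OH": "Ohio", "IRL": "Ireland", "GER": "Germany"},
}

# Hypothetical harmonized scheme: one code per answer, shared by all samples.
HARMONIZED_CODES = {"Ohio": 100, "Ireland": 200, "Germany": 300}


def harmonize(sample, source_code):
    """Translate a sample-specific code into the shared code (None if unknown)."""
    label = SOURCE_CODEBOOKS.get(sample, {}).get(source_code)
    return HARMONIZED_CODES.get(label)


# The same answer ("Ireland") is coded differently in each sample,
# but maps to a single harmonized code.
records = [("sample_a", "02"), ("sample_b", "IRL"), ("sample_b", "GER")]
harmonized = [harmonize(sample, code) for sample, code in records]
print(harmonized)
```

Once every sample shares one set of codes, records from different years or countries can be pooled and compared without consulting a separate codebook for each.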

Links from the variables listed in the extract builder take you to a data dictionary that provides more information about each variable, such as its description, its availability across datasets, questionnaire text, source variables, and available codes (with frequencies).  Not all of these headings are available in all data collections.

Things You Might Ignore, But Shouldn't

The main page for each IPUMS data project (USA, CPS, International, etc.) has a left-side menu of 15 or more links under the headings Data, Documentation, Support, and Research (and sometimes Supplemental Data).  If you're like me, you will ignore those at first and dive straight into the "Get Data" extract builder.  But if you want to use the data well, the FAQ and Documentation will help: they describe the weights used in each sample, any tricky aspects of the data, major limitations, and so on.
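As a rough illustration of why the weights documentation matters, the Python sketch below compares an unweighted mean with a weighted one. The records and weights are invented, and the function names are hypothetical; which weight variable to use in a real extract depends on the sample, so check that project's documentation.

```python
# Hypothetical records: (age, person_weight). All values are invented.
# A weight says roughly how many people in the population a record represents.
records = [(30, 100.0), (40, 100.0), (70, 400.0)]


def unweighted_mean(rows):
    """Average that treats every sampled record equally."""
    return sum(value for value, _ in rows) / len(rows)


def weighted_mean(rows):
    """Average that counts each record in proportion to its weight."""
    total_weight = sum(weight for _, weight in rows)
    return sum(value * weight for value, weight in rows) / total_weight


print(round(unweighted_mean(records), 1))
print(round(weighted_mean(records), 1))
```

With these made-up numbers the unweighted mean is about 46.7 and the weighted mean is about 58.3, because the heavily weighted record stands in for more of the population.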

A Quick Introduction to IPUMS (2020)