IPUMS TERRA FAQ

General

Do I have to combine data types, or can I get just one type?

You do not have to combine data types. You can get an extract containing just microdata without attached contextual variables. You can also get an area-level extract containing only original area-level data, or a raster extract containing just original raster data. If you do not wish to include any data of a particular type, just use the Skip button in the step for selecting that type of data. (The only data selection step you cannot skip is selecting microdata for a microdata extract.)

Can I get IPUMS Terra data without having international microdata access?

Yes, area-level and raster data are accessible without international microdata access. In addition, U.S. microdata are public data and are available to all IPUMS Terra users. You can register for IPUMS Terra, leaving the “Check here if you would like access to Microdata” box unchecked. If you do request access to microdata, your request may take a day or two to be approved, but you can begin accessing U.S. microdata, area-level and raster data right away.

How do I suggest additional data for inclusion in IPUMS Terra?

Click the 'Provide feedback' link in the top section of the extract builder, or send e-mail to ipums@umn.edu.

Boundaries and Geographic Units

What is the difference between harmonized and year-specific geographic levels?

Harmonized geographic units have boundaries that are consistent over a specific time period. These units simplify spatio-temporal analysis by removing the complications of shifting boundaries. Units are harmonized by aggregating units whose boundaries changed, so that each harmonized unit represents the smallest possible consistent footprint. If a unit splits into two or more smaller units, the smaller units are recombined to create the harmonized unit. If the border between two units changes, the two units are combined to create the harmonized unit.

If your research only requires one time step, year-specific units are available. Year-specific geographic units are the units associated with a particular census, though some units may be combined for confidentiality purposes. Year-specific geographic levels generally have finer geographic detail than harmonized geographic levels.

Both year-specific geographic levels and harmonized geographic levels (where applicable) are available for use with microdata and area-level data.
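The aggregation rule for harmonized units can be sketched in a few lines. This is only an illustration, not the procedure IPUMS Terra actually runs: it assumes each boundary change is recorded as a pair of unit codes that must end up in the same harmonized unit (the codes and the `harmonize` function are hypothetical), and it groups linked units with a simple union-find.

```python
def harmonize(units, changes):
    """Group unit codes into harmonized units.

    units:   iterable of unit codes (hypothetical identifiers).
    changes: iterable of (a, b) pairs meaning a and b were involved in the
             same boundary change (a split or a shifted border) and so
             must share one harmonized unit. This encoding is an assumption.
    Returns a dict mapping each code to the sorted tuple of codes that make
    up its harmonized unit.
    """
    parent = {u: u for u in units}

    def find(u):
        # Walk up to the group's representative, shortening the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in changes:
        parent[find(a)] = find(b)

    groups = {}
    for u in parent:
        groups.setdefault(find(u), []).append(u)
    return {u: tuple(sorted(groups[find(u)])) for u in parent}


# "B1" and "B2" came from a split; the border between "C" and "D" moved.
units = ["A", "B1", "B2", "C", "D"]
changes = [("B1", "B2"), ("C", "D")]
harmonized = harmonize(units, changes)
```

Units that never changed, like "A" here, remain their own harmonized unit, which is why each harmonized unit is the smallest footprint that stays consistent.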

Why are some geographic units combined in a year-specific geographic level?

For confidentiality purposes, microdata records cannot be associated with a geographic unit that has a population of fewer than 20,000. To ensure that this doesn’t occur, IPUMS Terra merges units with fewer than 20,000 individuals with neighboring units until the resulting unit has at least 20,000 individuals. Units below the population threshold are merged with the neighboring units that are most similar in population density. This process is called ‘regionalization’.
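As a rough illustration of the merging rule, the sketch below repeatedly merges an under-threshold unit into the neighbor whose population density is closest. It is not the actual IPUMS Terra algorithm: the input structures, the `regionalize` function, and the choice to process the smallest unit first are all assumptions made for the example.

```python
THRESHOLD = 20_000


def regionalize(units, neighbors, threshold=THRESHOLD):
    """Merge small units into neighbors with the most similar density.

    units:     {code: {"pop": int, "area": float}} (hypothetical layout)
    neighbors: {code: set of adjacent codes}
    Returns a list of regions, each a sorted list of member codes.
    """
    regions = {c: {"members": {c}, "pop": u["pop"], "area": u["area"]}
               for c, u in units.items()}
    adj = {c: set(neighbors.get(c, ())) for c in units}

    def density(code):
        return regions[code]["pop"] / regions[code]["area"]

    while True:
        small = [c for c, r in regions.items()
                 if r["pop"] < threshold and adj[c]]
        if not small:
            break
        # Assumed ordering: handle the least populous unit first.
        c = min(small, key=lambda k: regions[k]["pop"])
        # Pick the neighbor whose density is closest to this unit's density.
        target = min(sorted(adj[c]),
                     key=lambda n: abs(density(n) - density(c)))
        merged = regions.pop(c)
        regions[target]["members"] |= merged["members"]
        regions[target]["pop"] += merged["pop"]
        regions[target]["area"] += merged["area"]
        # The merged region inherits the absorbed unit's neighbors.
        for n in adj.pop(c):
            adj[n].discard(c)
            if n != target:
                adj[n].add(target)
                adj[target].add(n)
    return [sorted(r["members"]) for r in regions.values()]


units = {
    "A": {"pop": 50_000, "area": 100.0},
    "B": {"pop": 5_000, "area": 50.0},
    "C": {"pop": 30_000, "area": 400.0},
    "D": {"pop": 8_000, "area": 20.0},
}
neighbors = {"A": {"B", "D"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"A", "B"}}
result = regionalize(units, neighbors)
```

In this made-up example, "B" joins "C" (both are low density) and "D" joins "A" (both are high density), so every resulting region clears the threshold.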

Why are some geographic units combined in a harmonized geographic level when I know they do not change during the given time period?

For confidentiality purposes, microdata records cannot be associated with a geographic unit that has a population of fewer than 20,000. To ensure that this doesn’t occur, IPUMS Terra merges units with fewer than 20,000 individuals with neighboring units until the resulting unit has at least 20,000 individuals. Units below the population threshold are merged with the neighboring units that are most similar in population density. This process is called ‘regionalization’.

Microdata

What are microdata?

Microdata are individual records, with each record representing the responses of one person to the questions asked in a census. Individual records are organized into households, enabling the study of household relationships. Household-level variables (e.g., access to piped water) will have the same value for every person in the household.

Each microdata dataset consists of a sample of records from among all the individual responses to the census. Samples are drawn by household, so each household in the microdata includes all of the individuals within the household.

IPUMS Terra can attach area-level data, as well as raster data that have been summarized to geographic units, to microdata records. For these attached contextual variables, the value will be the same for all individuals in a given geographic unit.

Why do I have to register separately for international microdata?

Microdata for IPUMS Terra are provided by IPUMS International. IPUMS International has its own usage license, to which researchers wishing to use its data must agree. Because microdata are individual person records, the usage license includes security guidelines that are not usually included for area-level data. The user registration requirements are stipulated in the agreements made with the statistical agencies in our partner countries that provide the data.

What types of files will I get in my microdata extract?

Your extract will include:
.csv.gz: Compressed comma-separated value file containing person records with the variables you selected (including contextual variables)
.txt: Text file codebook describing your extract
.do, .sas, .sps: Syntax files for importing data into Stata, SAS, and SPSS, respectively
.xml: XML structured DDI metadata for your extract
_info.txt: Brief technical information about your extract
boundaries*.zip: (If you included area-level and/or raster data and checked the Include GIS boundary files box) Geographic boundary data for the geographic levels included in your extract. Each .zip will contain the files constituting a shapefile for one geographic level.
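If you prefer to read the compressed CSV directly rather than through the supplied syntax files, a minimal Python sketch using only the standard library might look like the following. The column names and the in-memory stand-in file are made up for illustration; your extract's actual columns are described in its codebook.

```python
import csv
import gzip
import io


def read_person_records(source):
    """Yield one dict per person record from a gzip-compressed CSV.

    source may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as text:
        yield from csv.DictReader(text)


# Tiny in-memory stand-in for a .csv.gz file (hypothetical columns).
raw = "SERIAL,PERNUM,AGE\n1,1,34\n1,2,31\n2,1,58\n"
buffer = io.BytesIO(gzip.compress(raw.encode("utf-8")))

records = list(read_person_records(buffer))
```

With a real extract, you would pass the path to the .csv.gz file instead of the in-memory buffer. Note that `csv.DictReader` returns every value as a string, so numeric variables need to be converted before analysis.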

How do the microdata in IPUMS Terra differ from IPUMS International data?

The microdata in IPUMS Terra are obtained directly from IPUMS International, so the data are the same in both systems. IPUMS Terra includes only the integrated variables from IPUMS International. Unharmonized variables, applicable only to specific samples, are only available through IPUMS International. In addition, IPUMS Terra does not currently include the case selection options available in IPUMS International. When creating a microdata extract in IPUMS Terra, you will always receive all records in the dataset.

Area-Level Data

What are area-level data?

Area-level data describe places. These data are compiled statistics, such as number of males in a given geographic unit. Therefore, the unit of observation is the geographic unit (e.g. state, county), and not the individual males. Geographic units are grouped into geographic levels. The units in a geographic level cover the entire country. Units at finer geographic levels generally nest within units at coarser geographic levels. For example, in Mexico, municipalities are nested within states.

For countries participating in IPUMS International, the area-level data in IPUMS Terra are tabulated from the microdata. IPUMS International microdata are samples drawn from the full census. While individual records are weighted to be nationally representative, the weighted totals tabulated from the microdata sample may not match values published by the national statistical agency based on full count data. For countries not participating in IPUMS International, the area-level data in IPUMS Terra are published values.
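To make the idea of tabulating area-level data from a weighted sample concrete, here is a small sketch. The record layout, field names, and weights are invented for the example, and it is not a description of the production tabulation code: it simply sums person weights within each geographic unit for the records that match a condition.

```python
from collections import defaultdict


def tabulate_weighted(records, unit_key, weight_key, predicate):
    """Sum the weights of matching person records within each unit."""
    totals = defaultdict(float)
    for rec in records:
        if predicate(rec):
            totals[rec[unit_key]] += rec[weight_key]
    return dict(totals)


# Hypothetical sample: each record stands for `weight` people in the census.
sample = [
    {"unit": "001", "sex": "male", "weight": 10.0},
    {"unit": "001", "sex": "female", "weight": 10.0},
    {"unit": "001", "sex": "male", "weight": 12.5},
    {"unit": "002", "sex": "male", "weight": 10.0},
]

males_by_unit = tabulate_weighted(
    sample, "unit", "weight", lambda r: r["sex"] == "male"
)
```

Because the totals are estimates built from a sample, they can differ from the full-count figures published by a statistical agency.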

IPUMS Terra can derive area-level data from raster data by summarizing the values of the raster cells that fall within each geographic unit. IPUMS Terra can also transform area-level data into raster data by distributing data about geographic units over the grid cells within each unit. Area-level data can also be attached to microdata as contextual variables describing the geographic units in which individuals and households are located.

What types of files will I get in my area-level extract?

Your extract will include:
.csv: Comma-separated value file containing geographic unit records with the variables you selected
.txt: Text file codebook describing your extract
.do, .sas, .sps: Syntax files for importing data into Stata, SAS, and SPSS, respectively
.xml: XML structured DDI metadata for your extract
boundaryfiles*.zip: (If you checked the Include GIS boundary files box) Geographic boundary data for the geographic levels included in your extract. Each .zip will contain the files constituting a shapefile for one geographic level. You can join the data in the CSV to the shapefile using the geographic unit codes.

How do U.S. area-level data in IPUMS Terra differ from NHGIS data?

Like the other IPUMS Terra area-level population data for countries in IPUMS International, the U.S. area-level data are tabulated from microdata. NHGIS data are published aggregate data tables. IPUMS Terra includes the same set of tabulations for the U.S. as for other IPUMS International countries. NHGIS includes all tables published by the Census Bureau, and therefore has a much wider set of tables available. IPUMS Terra U.S. area-level data are available for states and PUMAs (public use microdata areas). NHGIS includes tables for many more geographic levels.

Raster Data

What are rasters?

A raster is a matrix of cells (pixels), each of which represents a location on the earth. Each cell holds a single data value. The value can represent something categorical, like land cover, or something numeric, like average annual precipitation. Values can be integers or floating-point (decimal) numbers. Raster data may be derived from satellite imagery or interpolated from data collected at points such as weather stations.

Each cell within a raster is the same size (in the raster’s unit of measurement). The cell size is a raster’s spatial resolution. For example, a 30-meter raster is a raster whose cells are 30m x 30m, and a 1-kilometer raster has 1km x 1km cells. The smaller the cell size, the higher the spatial resolution. In rasters in geographic coordinate systems (lat/long), resolution is measured in portions of degrees of latitude and longitude (e.g. 0.5 degrees, 30 arc seconds). The actual surface area covered by each cell varies with latitude, as the distance covered by a degree of longitude approaches zero at the poles.
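The effect of latitude on cell area can be checked with a short calculation. The sketch below treats the earth as a sphere with a mean radius of 6,371 km (an approximation chosen for this example, not a value taken from IPUMS Terra) and computes the surface area of a cell that spans the same number of degrees in latitude and longitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def cell_area_km2(lat_south_deg, cell_size_deg):
    """Approximate area of a lat/long cell whose southern edge is at lat_south_deg."""
    lat1 = math.radians(lat_south_deg)
    lat2 = math.radians(lat_south_deg + cell_size_deg)
    dlon = math.radians(cell_size_deg)
    # Area of a spherical "rectangle": R^2 * dlon * (sin(lat2) - sin(lat1)).
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


# The same 0.5-degree cell at three latitudes.
area_equator = cell_area_km2(0.0, 0.5)
area_lat60 = cell_area_km2(60.0, 0.5)
area_near_pole = cell_area_km2(89.5, 0.5)
```

At 60 degrees latitude the cell covers roughly half the area it covers at the equator, and next to the pole it covers only a small fraction, even though all three cells are 0.5 x 0.5 degrees.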

IPUMS Terra raster data come from academic, government, and other research organizations and are available as TIFFs.

IPUMS Terra can transform area-level data into raster data by distributing data about geographic units over the grid cells within each unit. IPUMS Terra can also derive area-level data from raster data by summarizing the values of the raster cells that fall within each geographic unit. Area-level data derived from rasters can be attached to microdata as contextual variables describing the geographic units in which individuals and households are located.

What types of files will I get in my raster extract?

Your extract will include:

.tiff: GeoTIFF raster files. You will receive one GeoTIFF for each unique combination of variable, country, and time point in your extract.
.tiff.xml: ISO 19115 metadata accompanying each GeoTIFF
.txt: Text file codebook describing your extract

Data Transformation & Integration

What transformations can IPUMS Terra perform?

Attaching area-level & raster contextual variables to microdata - Area-level and raster-based contextual variables describe the geographic unit in which a person lives. Contextual variables can be drawn from area-level data (e.g. % unemployment in a district) or derived from raster data (e.g. % of district area with tree cover).
Raster to area-level - IPUMS Terra can summarize the values of the raster cells that fall within each geographic unit to create area-level data.
Area-level to raster - IPUMS Terra can transform area-level data into raster data by distributing data about geographic units over the grid cells within each unit.

How does IPUMS Terra attach area-level data and raster data to microdata?

Microdata records include codes identifying the geographic unit within which an individual lives and area-level data include similar codes identifying the unit described by the record. These codes can be used to attach area-level data to microdata and to connect both types of data to real-world locations through boundary data (shapefiles). To attach contextual variables derived from raster data, the raster data are first summarized to create area-level data, and the resulting area-level data are attached to the microdata using the geographic unit codes. The value for a contextual variable will be the same for all individuals in a given geographic unit.
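The code-based attachment described above amounts to a lookup join. The following sketch shows the idea with plain Python dictionaries; the field names ("geo_code", "pct_unemployed") and the `attach_contextual` function are hypothetical, and the real system may handle it differently.

```python
def attach_contextual(person_records, area_records, unit_key, variables):
    """Copy area-level variables onto each person record by unit code."""
    lookup = {area[unit_key]: area for area in area_records}
    attached = []
    for person in person_records:
        area = lookup.get(person[unit_key], {})
        row = dict(person)
        for var in variables:
            # None when no area-level record exists for the person's unit.
            row[var] = area.get(var)
        attached.append(row)
    return attached


people = [
    {"person_id": 1, "geo_code": "001"},
    {"person_id": 2, "geo_code": "001"},
    {"person_id": 3, "geo_code": "002"},
]
areas = [
    {"geo_code": "001", "pct_unemployed": 6.5},
    {"geo_code": "002", "pct_unemployed": 4.0},
]

with_context = attach_contextual(people, areas, "geo_code", ["pct_unemployed"])
```

Persons 1 and 2 live in the same unit, so they receive the same contextual value.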

How does IPUMS Terra transform rasters to area-level data?

IPUMS Terra overlays the boundaries of a geographic unit on top of raster data like a cookie cutter to identify the raster cells that fall within the unit. The values of those cells are then summarized to generate a single value for the unit. Operations that can be used to summarize the cell values depend on the type of raster data. For continuous raster variables (e.g., temperature, precipitation), available operations are minimum, maximum, and mean. For categorical variables (e.g., land cover classifications), available operations are mode (most common class) and number of classes. For variables related to particular land cover/land use categories, like cropland, available operations are % area and total area.
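The summary operations can be illustrated on a toy grid. In this sketch the "cookie cutter" is represented as a boolean mask with the same shape as the raster, and every cell is assumed to cover the same area, which is a simplification. The function names are made up for the example, and this is not the code IPUMS Terra runs.

```python
from collections import Counter


def cells_in_unit(raster, unit_mask):
    """Return the values of the raster cells where the mask is True."""
    return [value
            for row, mask_row in zip(raster, unit_mask)
            for value, inside in zip(row, mask_row)
            if inside]


def summarize_continuous(values):
    return {"min": min(values), "max": max(values),
            "mean": sum(values) / len(values)}


def summarize_categorical(values):
    counts = Counter(values)
    return {"mode": counts.most_common(1)[0][0], "num_classes": len(counts)}


def percent_area(values, target_class):
    # Assumes equal-area cells, so cell share stands in for area share.
    return 100.0 * sum(1 for v in values if v == target_class) / len(values)


temperature = [[10.0, 12.0, 14.0],
               [11.0, 13.0, 15.0]]
land_cover = [["crop", "crop", "forest"],
              ["crop", "urban", "forest"]]
# The unit covers the two left-hand columns of the grid.
unit_mask = [[True, True, False],
             [True, True, False]]

temp_summary = summarize_continuous(cells_in_unit(temperature, unit_mask))
cover_summary = summarize_categorical(cells_in_unit(land_cover, unit_mask))
crop_share = percent_area(cells_in_unit(land_cover, unit_mask), "crop")
```

For the masked unit, the temperature summary is min 10.0, max 13.0, mean 11.5; the land cover mode is "crop" with 2 classes present; and cropland makes up 75% of the unit's cells.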

How does IPUMS Terra transform area-level data to rasters?

IPUMS Terra overlays the boundaries of a geographic unit on a template of raster grid cells to identify the cells that fall within the unit. Each of those cells is then assigned a value based on the area-level variable value for the unit. IPUMS Terra currently uses a uniform distribution assumption, meaning that every grid cell in a given geographic unit is assigned the same value. For percentage or median-type variables, such as % unemployment, the value for the geographic unit is assigned to every cell in the unit. For count variables, like total population, the total count for the geographic unit is divided by the number of cells in the geographic unit and that value is assigned to each cell.
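The uniform distribution assumption can be shown with a small grid in which each cell is labeled with the code of the unit it falls in. The grid layout, the `rasterize_uniform` function, and the "rate"/"count" labels are assumptions made for this sketch rather than part of IPUMS Terra.

```python
def rasterize_uniform(unit_grid, values, kind):
    """Spread area-level values over grid cells under a uniform assumption.

    unit_grid: 2D list of unit codes (None for cells outside every unit).
    values:    {code: area-level value}.
    kind:      "rate" copies the value to every cell (percentages, medians);
               "count" divides the value by the unit's number of cells.
    """
    if kind not in ("rate", "count"):
        raise ValueError("kind must be 'rate' or 'count'")

    cell_counts = {}
    for row in unit_grid:
        for code in row:
            if code is not None:
                cell_counts[code] = cell_counts.get(code, 0) + 1

    def cell_value(code):
        if code is None or code not in values:
            return None
        if kind == "count":
            return values[code] / cell_counts[code]
        return values[code]

    return [[cell_value(code) for code in row] for row in unit_grid]


unit_grid = [["A", "A", "B"],
             ["A", "A", "B"]]

population = rasterize_uniform(unit_grid, {"A": 1000, "B": 300}, "count")
unemployment = rasterize_uniform(unit_grid, {"A": 6.5, "B": 4.0}, "rate")
```

Dividing counts by the number of cells means the cell values in each unit still sum to the unit's original total (here 1,000 for "A" and 300 for "B"), while rate-type values are simply repeated in every cell.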

Supported By

National Science Foundation, University of Minnesota