Supplementary Material

Data Extract System Preview

Association of American Geographers - poster presentation

Presentations

TerraPop: Data Domains and Location-Based Integration (download .pdf)

 

Press Release

University of Minnesota receives $8M NSF grant for first-of-its-kind integration of global population and environment data
Terra Populus: A Global Population / Environment Data Network (TerraPop) will integrate the world’s largest population database with global-scale data on land use, land cover, climate change and more.

Contacts:
Catherine Fitch, Minnesota Population Center, fitch@umn.edu, (612) 626-3923
Todd Reubold, Institute on the Environment, reub0002@umn.edu, (612) 624-6140
Jeff Falk, University News Service, jfalk@umn.edu, (612) 626-1720

MINNEAPOLIS / ST. PAUL  – New research at the University of Minnesota will create new opportunities for understanding the relationship between population and the environment on a global scale.

The project, Terra Populus: A Global Population / Environment Data Network (TerraPop for short), was recently awarded a five-year, $8M grant from the National Science Foundation’s Office of Cyberinfrastructure. The Minnesota Population Center will lead the effort with support from the Institute on the Environment, the U of M Library and faculty from the College of Liberal Arts and the College of Science and Engineering. Additional partners include the Center for International Earth Science Information Network at Columbia University and the Inter-university Consortium for Political and Social Research at the University of Michigan.

TerraPop will combine two centuries of census data with global environmental data, including land cover, land use and climate records. Beyond the goal of integrating this information into a common database, the team plans to disseminate the newly available data to researchers around the world.

Although many high-quality environmental and population datasets are currently available, they are widely dispersed, have incompatible or inadequate documentation, and use incompatible geographic identifiers. Newly available population data closely integrated with data on the environment will more clearly describe the unfolding transformation of human and ecological systems.

“By creating a framework for locating, analyzing and visualizing the world's population and environment in time and space, TerraPop will provide unprecedented opportunities for investigating the agents of change, assessing their implications for human society and the environment, and developing policies to meet future challenges,” said Steven Ruggles, director of the Minnesota Population Center.

The organizations collaborating on TerraPop are uniquely qualified to undertake a project of this scale. The Minnesota Population Center is home to the largest collection of census data in the world, while the Institute on the Environment has one of the most extensive databases of global land use in the country. The University Libraries are leaders in digital preservation and data management. The Inter-university Consortium for Political and Social Research (ICPSR) is the world’s largest social science data archive, and the Center for International Earth Science Information Network (CIESIN) is a leading research and data center focused on human-environment interactions.

“This project represents a quantum leap in our ability to see and map the relationships between people and the environment at the global scale,” said Jon Foley, director of the Institute on the Environment. “And it represents an exciting new frontier of scientific collaboration -- between computer scientists, demographers and environmental scientists -- to break down our old disciplinary barriers and focus our collective energy on some of the world's biggest problems.”

TerraPop aims to accomplish four specific tasks over the coming years:

  1. Collecting, preserving, integrating and describing datasets that measure changes in the world’s population and environment over the past two centuries
  2. Developing tools and procedures to manage and disseminate the data collections
  3. Carrying out education and outreach to engage the scientific community and the public and reach the broadest possible audience
  4. Establishing an organizational structure to ensure the long-term sustainability of the project

The lead investigators from the University of Minnesota are Steven Ruggles (Minnesota Population Center), Jonathan Foley (Institute on the Environment), Victoria Interrante (Computer Science and Engineering), Wendy Pradt Lougee (University of Minnesota Libraries), Steven Manson (Geography), Jaideep Srivastava (Computer Science and Engineering) and Shashi Shekhar (Computer Science and Engineering).

With this award, TerraPop will be an NSF Sustainable Digital Data Preservation and Access Network (DataNet) Partner. The DataNet initiative aims to provide reliable digital preservation, access, integration and analysis capabilities for science and/or engineering data over a decades-long timeline.

About the Minnesota Population Center

The Minnesota Population Center (MPC) is a University-wide interdisciplinary cooperative for demographic research. The MPC serves more than 80 faculty members and research scientists from eight colleges and institutes at the University of Minnesota. As a leading developer and disseminator of demographic data, the MPC also serves a broader audience of some 50,000 demographic researchers worldwide. Learn more at www.pop.umn.edu.

 

TerraPop Research Examples

The following examples illustrate the types of research that will be facilitated by the TerraPop data access system. TerraPop will significantly reduce the amount of time scientists working on human-environment issues need to spend collecting, processing, and integrating data from a variety of sources.

Scenario 1: Explaining mortality and health outcomes at the district level in Ghana, Malawi, and Tanzania

Hypothesis/research question:
Health outcomes depend on local variations in the physical environment.

Research objective:
Demonstrate enhanced understanding of health outcomes related to environmental conditions at sub-national levels. Both physical and social environmental effects are crucial to the study as inequalities in health outcomes can be derived from disparities in access to natural resources as well as gaps in socioeconomic conditions.

Data required:

Variables                  | Source     | Data Structure
Temperature                | WorldClim  | Raster
Rainfall                   | WorldClim  | Raster
Elevation                  | ASTER-GDEM | Raster
Infant and child mortality | IPUMS-I    | Microdata
Age structure              | IPUMS-I    | Microdata
Literacy rate              | IPUMS-I    | Microdata
Household access to toilet | IPUMS-I    | Microdata

Pre-TerraPop data processing steps:

  • Obtain district boundaries from SALB, GADM, Malawi National Statistical Office
  • Obtain temperature and rainfall data from WorldClim
  • Obtain elevation data from ASTER-GDEM
  • Obtain population data from IPUMS
  • Match district boundaries to IPUMS geography variables
  • Use ArcGIS to process environmental datasets
    • Extract relevant portions
    • Remove no-data points and other artifacts
    • Calculate mean rainfall, temperature, and elevation by district
  • Use a statistical package to summarize IPUMS population data by district
  • Use ArcGIS to join all variables to boundary shapefiles
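
The manual steps above can be sketched in a few lines of code. This is only an illustrative sketch under stated assumptions, not the TerraPop implementation or an ArcGIS workflow: the raster is a small grid of numbers, district boundaries have already been rasterized into a matching grid of district labels, missing cells use an assumed sentinel value of -9999, and the person records, field names and weights are invented toy data.

```python
from collections import defaultdict

NODATA = -9999.0  # assumed sentinel for missing raster cells


def zonal_mean(values, zones, nodata=NODATA):
    """Mean of raster cells per district, skipping no-data cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value == nodata or zone is None:
                continue  # drop no-data cells and cells outside any district
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def weighted_rate(records, flag, weight="weight", key="district"):
    """Weighted share of person records with a true flag, per district."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for record in records:
        denominator[record[key]] += record[weight]
        if record[flag]:
            numerator[record[key]] += record[weight]
    return {d: numerator[d] / denominator[d] for d in denominator}


def join_by_district(**tables):
    """Combine per-district summaries into one row per district."""
    districts = set().union(*(table.keys() for table in tables.values()))
    return {
        d: {name: table.get(d) for name, table in tables.items()}
        for d in sorted(districts)
    }


# Toy inputs (hypothetical): a 2x3 rainfall grid and matching district labels.
rainfall = [[100.0, 120.0, NODATA],
            [80.0, 60.0, 40.0]]
districts = [["A", "A", "B"],
             ["A", "B", "B"]]

# Toy person-level microdata with sample weights (hypothetical fields).
persons = [
    {"district": "A", "weight": 10.0, "literate": True},
    {"district": "A", "weight": 30.0, "literate": False},
    {"district": "B", "weight": 20.0, "literate": True},
    {"district": "B", "weight": 20.0, "literate": True},
]

table = join_by_district(
    mean_rainfall=zonal_mean(rainfall, districts),
    literacy_rate=weighted_rate(persons, "literate"),
)
```

Each entry of `table` holds one district's environmental and population summaries side by side, which is roughly the shape of the area-level output described in this scenario.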

Obtaining data through TerraPop:

  • Select desired environmental and population variables
  • Choose area-level output
  • Select operations to summarize data to district level
  • Obtain extract of variables summarized by district

Approximate time saved by using TerraPop:
25 hours
 

 

Scenario 2: Modeling deforestation and agriculture in the Yucatan Peninsula of Mexico

Hypothesis/research question:
Where and why do farmers cut down tropical rainforest to plant agricultural crops?

Research objective:
Establish the social and environmental factors in farm-level decision making. The land change science community has identified a suite of these factors, and one of its research goals is to find out how they play out in specific places, times, and contexts.

Data required:

Variables                 | Source                   | Data Structure
Temperature               | Interpolated stations    | Raster
Rainfall                  | Interpolated stations    | Raster
Elevation                 | Interpolated topographic | Vector and Raster
Socioeconomic/Demographic | IPUMS                    | Microdata
Land use/land cover       | Interpreted RS or GLC    | Raster

 

Pre-TerraPop data processing steps:

  • Obtain district boundaries from SALB, GADM, INEGI (Mexican census)
  • Obtain temperature and rainfall data from a variety of sources, interpolate, and validate
  • Obtain elevation data from a variety of sources, interpolate, and validate
  • Obtain raw remotely sensed data, classify, and validate
  • Obtain population data from IPUMS
  • Match district boundaries to IPUMS geography variables
  • Use ArcGIS and Idrisi to process environmental datasets
    • Extract relevant portions
    • Remove no-data points and other artifacts
    • Calculate mean rainfall, temperature, and elevation by district
    • Further validate and verify interpolations via cross-comparisons among data
  • Use a statistical package to summarize IPUMS population data by district
  • Use ArcGIS to join all variables to boundary shapefiles
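
To give a sense of what the "interpolate, and validate" steps above involve, here is a minimal sketch. The source does not say which interpolation method is used; inverse distance weighting (IDW) with leave-one-out cross-validation is assumed here purely for illustration, and the station coordinates and temperature values are invented.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations is a list of (sx, sy, value) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a station: use its observation
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def leave_one_out_rmse(stations, power=2.0):
    """Validate by predicting each station from all the others."""
    squared_error = 0.0
    for i, (sx, sy, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        squared_error += (idw(others, sx, sy, power) - value) ** 2
    return math.sqrt(squared_error / len(stations))


# Hypothetical weather stations: (x, y, mean temperature in degrees C).
stations = [(0.0, 0.0, 24.0), (2.0, 0.0, 26.0), (0.0, 2.0, 28.0)]

# Interpolate onto a coarse 3x3 grid covering the stations.
grid = [[idw(stations, float(col), float(row)) for col in range(3)]
        for row in range(3)]

# Cross-validation error, in the same units as the observations.
error = leave_one_out_rmse(stations)
```

A large leave-one-out error relative to the spread of the observations would suggest the station network is too sparse for the chosen method, which is one reason these steps consume so much researcher time.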

Obtaining data through TerraPop:

  • Select desired environmental and population variables
  • Choose area-level output
  • Select operations to summarize data to district level
  • Obtain extract of variables summarized by district

Approximate time saved by using TerraPop:
100 hours in the base-case scenario; 200 additional hours if GLC data were substituted for remotely sensed data