University of Minnesota researchers collaborate with Ancestry.com to create the most comprehensive database of the 1940 Census
MINNEAPOLIS / ST. PAUL (04/02/2012) - A collaboration between the University of Minnesota and Ancestry.com will create the largest database of detailed information about people and their households ever made available for scientific research. The National Archives and Records Administration today released images of the enumeration manuscripts from the 1940 Census of Population. The Minnesota Population Center at the University of Minnesota will leverage a substantial investment by Ancestry.com in digitizing information on the entire population of the United States.
The database will include all of the information collected on the 132 million Americans recorded in the Census of 1940. The project will involve transcription of 7.8 billion keystrokes of data describing the demographic and economic characteristics of all individuals, families, households, and group quarters present in the United States in 1940. This database will be an extraordinary new resource for economists, demographers, geographers, epidemiologists, other social science and health researchers, and the general public.
Ancestry.com has extensive experience in converting historical census records into a searchable format. The company will oversee the keying of the 1940 census records and expects that the indexed data will help answer important questions related to population and health.
“This joint project represents the largest single collaboration ever conducted between the genealogy and academic research communities,” said Dan Jones, vice president of Global Content for Ancestry.com. “We are proud of our relationship with the University of Minnesota and the many federal agencies who are contributing to this effort. It is a privilege to make what will be the most complete index of the 1940 Census freely available to researchers throughout the country.”
The 1940 census was far richer and more detailed than any previous census. Many of the core concepts of today’s American Community Survey (such as educational attainment, migration status, labor force status, wage and salary income, hours worked per week, weeks worked last year, and veteran status) made their first appearance on the 1940 census. The critical timing of the 1940 Census, at the end of the Great Depression and the beginning of World War II, will make this database an important baseline for studies of social and economic change in the twentieth century.
Capturing 100 percent of the U.S. population recorded in the census, the 1940 database will be significantly larger than any other census dataset created for social science and health research. Such datasets normally include only a 1 to 10 percent sample of the population, and many studies are hindered by these small samples. The new database will allow much richer studies of small populations in 1940, such as Dust Bowl migrants to California, Native Americans, and working mothers with young children.
Researchers will also be able to link recent economic and health surveys and mortality records to the 1940 database. These linkages will allow researchers to study the impact of early life conditions—including socioeconomic status, parental education, and family structure—on later health and mortality. In addition to individual and family information, the database will provide contextual information on childhood neighborhood characteristics, labor-market conditions, and environmental conditions.
“Existing research has shown a powerful relationship between family financial well-being in childhood and health in later life,” said Steven Ruggles, director of the Minnesota Population Center. “With the 1940 data linked to recent surveys, researchers will be better able to test and understand this relationship.”
The data will be used intensively by thousands of scholars and will form a permanent and substantial element of the nation’s statistical infrastructure. The impact of the microdata will be especially profound in the areas of aging, health, and population. According to Ruggles, “The 1940 data have the potential to transform our understanding of the effects of early life conditions on health and well-being, multigenerational mobility, the spatial organization of human activity across multiple scales, and the dramatic shifts in American demographic and economic behavior since the mid-twentieth century.”
All numerically coded fields in the database will be made freely available to the scientific community and the public. Data and documentation will be distributed through the Integrated Public Use Microdata Series (IPUMS) data access system (www.ipums.org). The IPUMS data access system pioneered web-based distribution of large-scale datasets, and the Minnesota Population Center continues to innovate at the cutting edge of information technology. The system offers capabilities for navigating database documentation, defining datasets, constructing customized variables that capitalize on the individual and household information in the census, and adding neighborhood information.
The project will be supported by grants from the National Science Foundation, the National Institute on Aging, and the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The project also benefits from investments and support by the National Archives and Records Administration and the U.S. Census Bureau.