Principal Investigator(s): Lee, James Z., Hong Kong University of Science and Technology. School of Humanities and Social Science; Campbell, Cameron D., University of California-Los Angeles. Department of Sociology, and California Center for Population Research
The China Multi-Generational Panel Dataset - Liaoning (CMGPD-LN) is drawn from the population registers compiled by the Imperial Household Agency (neiwufu) in Shengjing, currently the northeast Chinese province of Liaoning, between 1749 and 1909. It provides 1.5 million triennial observations of more than 260,000 residents from 698 communities. The population mainly consists of immigrants from North China who settled in rural Liaoning during the early eighteenth century, and their descendants. The data provide socioeconomic, demographic, and other characteristics for individuals, households, and communities, and record demographic outcomes such as marriage, fertility, and mortality. The data also record specific disabilities for a subset of adult males. Additionally, the collection includes monthly and annual grain price data, custom records for the city of Yingkou, as well as information regarding natural disasters, such as floods, droughts, and earthquakes. This dataset is unique among publicly available population databases because of its time span, volume, detail, and completeness of recording, and because it provides longitudinal data not just on individuals, but on their households, descent groups, and communities.
One or more files in this study are not available for download due to special restrictions; consult the restrictions note to learn more. You can apply online for access to the data. A login is required to apply for access.
The Liaoning Basic File is available for download without restrictions. The Liaoning Restricted File requires a signed agreement before access. See ICPSR restricted data contract portal for information and instructions.
WARNING: Because this study has many datasets, the download all files option has been suppressed, and you will need to download one dataset at a time.
Lee, James Z., and Cameron D. Campbell. China Multi-Generational Panel Dataset, Liaoning (CMGPD-LN), 1749-1909. ICPSR27063-v10. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2014-07-10. http://doi.org/10.3886/ICPSR27063.v10
Persistent URL: http://doi.org/10.3886/ICPSR27063.v10
This study was funded by:
- United States Department of Health and Human Services. National Institutes of Health. Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01 HD057175-01A1)
Scope of Study
Subject Terms: agricultural production, agriculture, demographic characteristics, disabilities, disasters, eighteenth century, exports, family history, family structure, generations, historical data, households, immigrants, imports, municipalities, nineteenth century, occupational status, rural population
Smallest Geographic Unit: Chinese banners (8)
Date of Collection:
Data Types: administrative records data, census/enumeration data
Data Collection Notes:
There was no R file produced for the Kinship dataset due to a memory issue that could not be resolved. R users should download the data in a different format and read it in with the foreign package.
The documentation for parts of this study is currently being updated. The data are expected to be available again in June 2014.
The data are drawn from population registers.
In the Basic dataset, the DATASET variable was reformatted.
In the Basic dataset, the variable RELATIONSHIP contains unknown codes such as "ILLEGIBLE" where Chinese characters existed in the original population register data but were not legible. The variable RELATIONSHIP also contained blank values that were recoded to "MISSING".
In the Basic dataset, the variables WIFE_1_ID and WIFE_2_ID do not have value labels for the missing values -98 and -99, due to limitations with string variables in Stata.
All values of -98 ("Not applicable") and -99/-999 ("Missing") are recognized as missing values in the ICPSR codebook.
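As an illustration of the missing-value convention above, the following minimal Python sketch converts the sentinel codes to None before computing a summary. The function name `to_missing` and the sample values are hypothetical; only the codes -98, -99, and -999 come from the note above. String codes are handled as well, on the assumption that string ID variables store the codes as text.

```python
# Sentinel codes that the ICPSR codebook treats as missing.
MISSING_CODES = {-98: "Not applicable", -99: "Missing", -999: "Missing"}
MISSING_STRINGS = {str(code) for code in MISSING_CODES}


def to_missing(value):
    """Return None for a sentinel missing code, otherwise the value unchanged."""
    if isinstance(value, str):
        # Assumption: string variables store the codes as text such as "-98".
        return None if value.strip() in MISSING_STRINGS else value
    return None if value in MISSING_CODES else value


# Hypothetical values for a numeric variable, not taken from the dataset.
ages = [34, -99, 51, -98, -999, 7]
cleaned = [to_missing(v) for v in ages]

# Summaries should skip the missing entries rather than average the codes.
valid = [v for v in cleaned if v is not None]
mean_age = sum(valid) / len(valid)
```

Leaving the codes in place would pull any mean or sum sharply downward, so the recode should happen before any numeric analysis.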
Preparation of the CMGPD-LN dataset and documentation for public release via ICPSR DSDR was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), grant R01 HD057175-01A1 "Multi-Generation Family and Life History Panel Dataset", with funds from the American Recovery and Reinvestment Act.
In the Restricted dataset, the variable NAME contains numbers and non-alphanumeric characters such as asterisks (*) and bars (|). The asterisks represent Chinese characters that were illegible and could not be translated. The number of asterisks equals the number of illegible characters (two asterisks = two illegible characters). All other non-alphanumeric characters and numbers result from errors in transcription or from transformations between input systems. NAME may include invalid or incorrect romanizations as a result of such errors. NAME also contains unknown codes such as "Illegible" where the name was completely illegible and no determination could be made as to the number of Chinese characters. SURNAME_YIHU may contain errors propagated from the surname portion of NAME.
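The asterisk convention for NAME can be checked mechanically. The sketch below is a hypothetical helper, not part of the ICPSR distribution: it counts illegible characters in a NAME value and returns None when the whole name is coded "Illegible" or "Missing". The sample names are invented for illustration.

```python
# Whole-value codes described in the notes; compared case-insensitively.
WHOLE_NAME_CODES = {"illegible", "missing"}


def count_illegible(name):
    """Count illegible characters (asterisks) in a NAME value.

    Returns None when the entire name is coded as illegible or missing,
    because the number of characters is then unknown.
    """
    if name.strip().lower() in WHOLE_NAME_CODES:
        return None
    return name.count("*")


# Hypothetical NAME values, not taken from the dataset.
samples = ["wang de fu", "li * cheng", "zhao **", "Illegible"]
counts = [count_illegible(s) for s in samples]
```

Other stray characters, such as bars or digits, are transcription artifacts according to the note above, so this sketch deliberately does not count them.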
In the Restricted dataset, the variables NAME and SURNAME_YIHU contained blank values recoded to "Missing".
In the Analytic dataset, the variable UNIQUE_HH_ID does not have a value label for the missing value -98 in Stata, due to limitations with string variables in Stata.
For the Kinship dataset, automated counts of older and younger kin of a specific type are currently vulnerable to exceptional situations occurring in the linkage of individuals across the register. Under some circumstances, they may be coded as -1 when they should be 0. In this release, -1 may be safely recoded to 0. In the future, the code will be modified to handle the exceptional cases that lead to the erroneous counts.
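The recode described above might look like the following minimal sketch. The column names are hypothetical placeholders; the actual Kinship count variables are listed in the codebook. Only -1 is changed, so other codes such as -99 remain available for missing-value handling.

```python
# Hypothetical column names standing in for the Kinship count variables.
KIN_COUNT_COLUMNS = ["older_brothers", "younger_brothers"]


def fix_kin_counts(row, columns=KIN_COUNT_COLUMNS):
    """Return a copy of the row with erroneous -1 kin counts recoded to 0."""
    fixed = dict(row)  # copy so the original record is left untouched
    for col in columns:
        if fixed.get(col) == -1:
            fixed[col] = 0
    return fixed


# Hypothetical records, not taken from the dataset.
records = [
    {"person": "a", "older_brothers": 2, "younger_brothers": -1},
    {"person": "b", "older_brothers": -1, "younger_brothers": -99},
]
fixed_records = [fix_kin_counts(r) for r in records]
```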
For the Kinship dataset, the variables F_ID_1 through M_ID_4 do not have value labels for the missing value -99 in Stata, due to limitations with string variables in Stata.
The Disability and Position files are intended to be merged with the Analytic dataset using the variables DATASET, POSITION_CODE, and DISABILITY_CODE.
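A merge of this kind can be sketched in plain Python as a left join. The pairing of keys is an assumption made for illustration: the Position file is joined on DATASET and POSITION_CODE, and the Disability file on DATASET and DISABILITY_CODE. The helper `left_join`, the PERSON column, and all sample values are hypothetical.

```python
def left_join(rows, lookup_rows, keys):
    """Attach lookup columns to each row, matching on the given key columns.

    Rows without a match are kept unchanged, as in a left join.
    """
    index = {tuple(r[k] for k in keys): r for r in lookup_rows}
    merged = []
    for row in rows:
        match = index.get(tuple(row[k] for k in keys), {})
        merged.append({**match, **row})  # values from the main row win
    return merged


# Hypothetical extracts; the values are placeholders, not real codes.
analytic = [
    {"PERSON": "a", "DATASET": 1, "POSITION_CODE": 10, "DISABILITY_CODE": 3},
    {"PERSON": "b", "DATASET": 2, "POSITION_CODE": 99, "DISABILITY_CODE": 3},
]
position = [{"DATASET": 1, "POSITION_CODE": 10, "POSITION_PINYIN": "pinyin_a"}]
disability = [{"DATASET": 1, "DISABILITY_CODE": 3, "CONDITION_PINYIN": "pinyin_b"}]

merged = left_join(analytic, position, ["DATASET", "POSITION_CODE"])
merged = left_join(merged, disability, ["DATASET", "DISABILITY_CODE"])
```

A left join keeps every Analytic observation, including those whose codes have no entry in the lookup files, which makes unmatched codes easy to spot afterwards.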
For the Disability and Position files, the variables CONDITION_PINYIN, POSITION_PINYIN, and POSITION_CORE have values that contain characters such as parentheses, question marks, and box symbols. These reflect cases where the original information was illegible and the coder made their best guess, or where computer programs converted illegible characters to system default characters. Please see the tables in the Analytic Appendix for complete listings of Chinese disability and position codes.
Study Purpose: Possible applications of the dataset include the study of relationships between demographic behavior, family organization, and socioeconomic status across the life course and across generations, the influence of region and community on demographic outcomes, and development and assessment of quantitative methods for the analysis of complex longitudinal datasets.
Sample: The data are from 725 surviving triennial registers from 29 distinct populations. Each of the 29 register series corresponded to a specific rural population concentrated in a small number of neighboring villages. These populations were affiliated with the Eight Banner civil and military administration that the Qing state used to govern northeast China as well as some other parts of the country. Sixteen of the 29 populations are regular bannermen. In these populations adult males received generous allocations of land from the state. In return they paid an annual fixed tax to the Imperial Household Agency and supplied it with home products such as homespun fabric and preserved meat, and/or forest products such as mushrooms. In addition, as regular bannermen they were liable for military service as artisans and soldiers which, while in theory an obligation, was actually an important source of personal revenue and therefore a political privilege. Eight of the 29 populations are special duty banner populations. As in the regular banner populations, adult males in the special duty banner populations enjoyed state-allocated land free of rent. These adult males were also assigned to provide special services, including collecting honey, raising bees, fishing, picking cotton, and tanning and dyeing. The remaining populations were a diverse mixture of estate banner and servile populations. The populations covered by the registers, like much of the population of rural Liaoning in the eighteenth and nineteenth centuries, were mostly descendants of Han Chinese settlers who came from Shandong and other nearby provinces in the late seventeenth and early eighteenth centuries in response to an effort by the Chinese state to repopulate the region.
Extent of Processing: ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection:
- Performed consistency checks.
- Created variable labels and/or value labels.
- Standardized missing values.
- Created online analysis version with question text.
- Performed recodes and/or calculated derived variables.
- Checked for undocumented or out-of-range codes.
Original ICPSR Release: 2010-06-22
- 2014-07-10 Releasing new study level documentation that contains the tables found in the appendix of the Analytic dataset codebook.
- 2014-06-10 The data and documentation have been updated following re-evaluation.
- 2014-01-29 Fixing variable format issues. Some variables that were supposed to be string were numeric in Parts 1 and 3.
- 2013-08-21 Question text was added to Parts 1, 3, 7, 8, 9, 10, and 11 in order to include additional information about the data's historical context, Chinese language terminology, and collection methods.
- 2012-11-27 The User Guide and Training Guide were updated.
- 2012-11-21 Parts 1 and 3 were updated, and Parts 7 through 11 were added. Specifically, Part 1 was updated to correct a problem with the "Zu-zhang" variable, and other variables associated with positions and statuses may have experienced minor changes. Part 3 was updated to add the "position_2_code" variable. The new parts added include the Position 2 Supplement data (Part 7), Monthly Grain Prices data (Part 8), Annual Grain Prices data (Part 9), Yingkou Custom Records (Part 10), and Natural Disasters data (Part 11). Finally, the User Guide was updated to reflect the aforementioned changes, and an additional documentation file ("Training Guide") was added.
- 2011-09-02 Two parts are being added, Disability and Position. This release will also include updates to the Analytic and Kinship files.
- 2011-06-27 The Analytic and Kinship datasets were added to the study, along with 2 tables regarding disabilities and positions.
- 2011-03-16 The User Guide has been updated.
- 2011-02-07 The User Guide has been updated, and the Restrictions field was updated.
- 2011-01-12 The Restricted dataset was added, along with a location translation table document.
- 2010-12-03 The user guide has been updated.
- 2010-10-01 An updated version of the user guide has been added, as well as an updated version of the ICPSR codebook.
- 2010-08-25 A User Guide has been added to this study.
- 2010-08-17 The study title, principal investigator information, summary, and sampling fields have been updated.