Gateway to Think Tanks
Source type | Working Paper |
Standardized type | Report |
DOI | 10.3386/w24324 |
Source ID | Working Paper 24324 |
Linking Individuals Across Historical Sources: a Fully Automated Approach | |
Ran Abramitzky; Roy Mill; Santiago Pérez | |
Publication date | 2018-02-19 |
Publication year | 2018 |
Language | English |
Abstract | Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. In the first part of the paper, we suggest a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I (false positives) and type II (false negatives) errors. The first step guides researchers in the choice of which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each two records correspond to the same individual. The third step suggests how to use these estimated probabilities to choose which records to use in the analysis. In the second part of the paper, we apply the method to link historical population censuses in the US and Norway, and use these samples to estimate measures of intergenerational occupational mobility. The estimates using our method are remarkably similar to the ones using IPUMS’, which relies on hand linking to create a training sample. We created an R code and a Stata command that implement this method. |
Subject | Econometrics ; Estimation Methods ; Labor Economics ; Demography and Aging ; History |
URL | https://www.nber.org/papers/w24324 |
Source think tank | National Bureau of Economic Research (United States) |
Resource type | Think tank publication |
Item identifier | http://119.78.100.153/handle/2XGU8XDN/581997 |
Recommended citation (GB/T 7714) | Ran Abramitzky, Roy Mill, Santiago Pérez. Linking Individuals Across Historical Sources: a Fully Automated Approach. 2018. |
Files in this item |
File name/size | Resource type | Version type | Access type | License |
w24324.pdf (671KB) | Think tank publication | | Restricted access | CC BY-NC-SA |