Gateway to Think Tanks
Source type | Working Paper |
Document type | Report |
DOI | 10.3386/w25825 |
Source ID | Working Paper 25825 |
Automated Linking of Historical Data | |
Ran Abramitzky; Leah Platt Boustan; Katherine Eriksson; James J. Feigenbaum; Santiago Pérez | |
Publication date | 2019-05-13 |
Publication year | 2019 |
Language | English |
Abstract | The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods. |
Subjects | Econometrics; Data Collection; History |
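The abstract reports results in terms of the (true) match rate and the false positive rate measured against hand links. The Python sketch below is a hypothetical illustration, not the authors' code or their Stata commands: it links records on an exact standardized name plus a birth-year band, keeps only links that are unique in both directions as a stand-in for a conservative automated rule, and scores the result against a hand-linked reference. The record layout, the function names `link_unique` and `evaluate`, the toy records, and the definition of the false positive rate (the share of records linked by both the method and the hand linker whose targets disagree) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (record_id, standardized_name, birth_year).
Record = Tuple[str, str, int]


def link_unique(source: List[Record], target: List[Record],
                year_band: int = 0) -> Dict[str, str]:
    """Link source records to target records sharing the exact standardized
    name and a birth year within +/- year_band. Only links that are unique in
    both directions are kept (an illustrative conservative rule)."""
    by_name: Dict[str, List[Record]] = defaultdict(list)
    for rec in target:
        by_name[rec[1]].append(rec)

    # Keep source records that have exactly one candidate in the target.
    links: Dict[str, str] = {}
    for sid, name, year in source:
        cands = [tid for tid, _, tyear in by_name.get(name, [])
                 if abs(tyear - year) <= year_band]
        if len(cands) == 1:
            links[sid] = cands[0]

    # Drop any target record claimed by more than one source record.
    claims: Dict[str, int] = defaultdict(int)
    for tid in links.values():
        claims[tid] += 1
    return {sid: tid for sid, tid in links.items() if claims[tid] == 1}


def evaluate(links: Dict[str, str], hand: Dict[str, Optional[str]],
             n_source: int) -> Tuple[float, float]:
    """Return (match_rate, false_positive_rate).

    match_rate: share of all source records that received a link.
    false_positive_rate (assumed definition): among records linked by both the
    method and the hand linker, the share whose linked targets differ."""
    match_rate = len(links) / n_source if n_source else 0.0
    both = [sid for sid in links if hand.get(sid) is not None]
    wrong = sum(1 for sid in both if links[sid] != hand[sid])
    fpr = wrong / len(both) if both else 0.0
    return match_rate, fpr


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    source = [("s1", "JOHN SMITH", 1880), ("s2", "MARY JONES", 1875),
              ("s3", "ANNA BELL", 1890), ("s4", "PAUL KLEIN", 1870)]
    target = [("t1", "JOHN SMITH", 1880), ("t2", "MARY JONES", 1877),
              ("t3", "MARY JONS", 1875), ("t4", "ANNA BELL", 1891)]
    # Hypothetical hand links; None means the hand linker found no match.
    hand = {"s1": "t1", "s2": "t3", "s3": "t4", "s4": None}
    for band in (0, 2):
        links = link_unique(source, target, year_band=band)
        print(band, evaluate(links, hand, len(source)))
```

In this toy example, widening the birth-year band from 0 to 2 raises the match rate from 0.25 to 0.75 but introduces one disagreement among the three records that both the rule and the hand linker matched, a small-scale version of the tradeoff between match rate and false positive rate that the abstract describes.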
URL | https://www.nber.org/papers/w25825 |
Source think tank | National Bureau of Economic Research (United States) |
Resource type | Think tank publication |
Item identifier | http://119.78.100.153/handle/2XGU8XDN/583497 |
Recommended citation (GB/T 7714) | Ran Abramitzky, Leah Platt Boustan, Katherine Eriksson, et al. Automated Linking of Historical Data. 2019.
Files in this item |
File name/size | Resource type | Version type | Access type | License |
w25825.pdf (1711 KB) | Think tank publication | | Restricted access | CC BY-NC-SA |