Dr. Mauricio Sadinle, an assistant professor of biostatistics at the University of Washington School of Public Health, received a two-year, $150,000 grant from the National Science Foundation (NSF) to develop tools to identify and link information on individuals who appear in different datasets. The methodologies will allow researchers to confidently combine pre-existing files to conduct more powerful, larger-scale analyses.
“It is increasingly common to find complementary information on individuals scattered across multiple data sources,” Dr. Sadinle said. For example, a study participant may have health outcome data in one file and biological measures in another. In other cases, researchers might seek to combine datasets and need to identify individuals who appear in both to avoid skewing their analysis.
“To take full advantage of such data sources, researchers need to be able to link information on the same individuals,” he said. “In many applications, however, there are no unique identifiers of the individuals in the datafiles,” and alternative methods of matching individuals, such as by birthday and name, are prone to uncertainty. The research will help to reduce the uncertainty in the process of linking records while also developing methodologies to account for remaining uncertainty in statistical analyses that utilize the linked data.
Friday Letter Submission, Publish on September 06