Methods for Integrating Data from Multiple Sources
In the era of data science, data collected from multiple sources often provide useful information for answering the same scientific question. An analysis based on a single data source may yield biased estimates or insufficiently accurate results. Integrating data from multiple sources is therefore essential for pulling together different pieces of information, drawing more accurate conclusions, and making better-informed decisions. Data integration raises many issues and challenges. Different sources usually generate data in different forms: some studies release individual-level data, whereas others release only aggregate data after analysis. Study designs, data types, model assumptions, and analysis results often vary across studies, even for the same scientific problem. The sets of variables collected and the ways these variables are measured may also differ considerably from study to study. The overarching goal of this research project is to develop methods that address some of these issues and challenges, providing rigorous and powerful tools for integrating data from multiple sources.