Visualization of Longitudinal Phenotypes in the Norwegian Mother and Child Cohort Study
Abstract
The Norwegian Mother and Child Cohort Study (MoBa) is a pregnancy cohort study with over 100,000 children enrolled. Data was gathered through questionnaires mailed to the mothers and through biological samples, from which more than 15,000 trios (mother, father, and child) have been genotyped so far. Data collected by MoBa is sensitive, and access to it is therefore restricted to protect the privacy of the study participants. This can make it difficult (or even impossible) to access the data, not only for parents and the general public, but also for scientists and medical professionals. To solve this issue, it is necessary to provide access to the data at high resolution without compromising participant privacy. The MoBa data is multidimensional and contains longitudinal information on several phenotypes (such as height and weight) for the children, as well as data on certain variables for the parents. Based on the recorded variables, the MoBa cohort can be divided into various subgroups that can be studied separately or compared with each other. Furthermore, the genotyping data can be viewed at different scales: (i) genetic variants can be considered individually, (ii) variants can be considered in the context of their genomic location, or (iii) the entire genome can be considered as a whole. A good presentation of the data therefore has to account for, and take advantage of, this complexity. Hundreds of gigabytes of summary statistics can be generated from the MoBa genotyping data. Depending on the use case, only a small subset of this data is relevant to present to a user at any given time. To present these subsets to the user quickly upon request, a bioinformatics system that can find and dispatch data in a short amount of time must be implemented.
This thesis demonstrates how the issues related to large-scale sensitive data access and dissemination can be solved through a publicly available web application able to handle the associated data volumes efficiently.