Evaluation of incomplete maternal smoking data using machine learning algorithms: a study from the Medical Birth Registry of Norway
Journal article, Peer reviewed
Original version: BMC Pregnancy and Childbirth. 2020, 20, 710. doi:10.1186/s12884-020-03384-y
Background: The Medical Birth Registry of Norway (MBRN) provides national coverage of all births. While collection of most of the information in the birth records is mandatory, mothers may decline to provide information on their smoking status. The proportion of women with unknown smoking status varied greatly over time, between hospitals, and across demographic groups. We investigated whether incomplete data on smoking in the MBRN may have biased the estimated smoking prevalence.

Methods: In a study population of all 904,982 viable singleton births during 1999–2014, we used linear multivariable regression to identify the main predictor variables for unknown maternal smoking status. We then applied machine learning to predict the annual smoking prevalence (95% CI) in the group with unknown smoking status, assuming the data were missing not at random.

Results: Overall, 14.4% of women had unknown smoking status. Compared with women of Nordic origin, women from Europe outside the Nordic region had a 15% (95% CI 12–17%) increased adjusted risk of unknown smoking status. The corresponding increased risks were 17% (95% CI 15–19%) for women from Asia and 26% (95% CI 23–29%) for women from Africa. The most important machine learning predictors of maternal smoking were education, ethnic background, marital status and birth weight. When combining observed and predicted smoking prevalence, we estimated annual changes of −5.5 to 1.1% from the observed smoking prevalence among women with known smoking status.

Conclusion: The predicted total smoking prevalence differed only marginally from the observed prevalence in the group with known smoking status. This implies that MBRN data may be trusted for health surveillance and research.
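The step of combining observed and predicted prevalence can be illustrated with a short sketch. It assumes that some classifier has already produced a smoking probability for each mother with unknown status; the function name `combined_prevalence` and all numbers below are hypothetical and are not taken from the study, whose actual model, features, handling of the missing-not-at-random assumption and 95% CI method are not shown here. The combined prevalence is simply the weighted average (1 − u)·p_obs + u·p_pred, where u is the share of mothers with unknown status.

```python
def combined_prevalence(n_known, n_known_smokers, predicted_probs):
    """Combine observed and predicted smoking prevalence (hypothetical sketch).

    n_known:          number of mothers with known smoking status
    n_known_smokers:  number of those reported as smokers
    predicted_probs:  assumed per-mother smoking probabilities from some
                      classifier, for mothers with unknown status
    Returns (observed prevalence, combined prevalence, change).
    """
    if n_known <= 0:
        raise ValueError("need at least one mother with known status")
    n_unknown = len(predicted_probs)
    observed = n_known_smokers / n_known
    # Expected number of smokers among unknown-status mothers.
    predicted_smokers = sum(predicted_probs)
    # Pool observed smokers and expected smokers over all mothers; this
    # equals (1 - u) * observed + u * mean(predicted_probs).
    combined = (n_known_smokers + predicted_smokers) / (n_known + n_unknown)
    return observed, combined, combined - observed


# Hypothetical year: 800 known (160 smokers), 200 unknown with p = 0.1 each.
obs, comb, change = combined_prevalence(800, 160, [0.1] * 200)
print(f"observed={obs:.3f} combined={comb:.3f} change={change:+.3f}")
```

In this made-up example the observed prevalence is 20%, the unknown group is predicted at 10%, and with u = 0.2 the combined prevalence is 18%, a change of −2 percentage points. A study estimate would also need an uncertainty interval for the predicted part, which this sketch omits.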