Features impacting the mesopelagic layer in the ocean: a machine learning-based approach
Master thesis
Date
2022-09-01
Abstract
Context: The United Nations recently proclaimed a Decade of Ocean Science for Sustainable Development (2021–2030) in response to threats that human impact poses to the productivity and health of the ocean. The One Ocean Expedition (OOE), a circumnavigation of the world by the Norwegian tall ship Statsraad Lehmkuhl, is part of the Ocean Decade and is intended to draw attention to, and share knowledge about, the crucial role of the ocean in sustainable development. Little is known about the deep sea, yet the amount and type of organisms there could be an essential factor in predicting global carbon dynamics and the effects of climate change, as well as food safety in the coming years. In marine science, the process of turning data into knowledge has long been manual or semi-manual, and automated processes are necessary for scaling up monitoring programs and making use of the extensive amount of data collected.

Research goal: This thesis investigates the use of machine learning to predict marine biomass and to discover possible correlations between biogeochemical or physical factors and the biomass of the mesopelagic zone (200–1,000 m depth), using data collected during the OOE. This should eventually lead to more knowledge about the workings of the organisms in the mesopelagic layer and about the use of machine learning in marine science.

Methodology: The methods used in this thesis follow the paradigm of Design Science, in which questions relevant to human problems are answered through the creation of artifacts, thereby contributing new knowledge to both marine science and data science.

Results: The findings show that it is possible to predict mesopelagic biomass with reasonable accuracy using tree-based algorithms such as random forest, and that the predictions may be further improved using historical data. Feature importances calculated from the random forests reveal that the correlations between biomass and the various biogeochemical or physical factors differ by geographical area.
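To illustrate the kind of analysis the results describe, the following is a minimal sketch of fitting a random forest regressor and ranking features by importance. It is not the thesis's implementation: the feature names (`temperature`, `oxygen`, `chlorophyll`, `salinity`), the synthetic data, and the helper `fit_and_rank` are all assumptions made for illustration, and scikit-learn is assumed as the library. The abstract does not list the actual variables or tooling.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the abstract does not list the real variables.
FEATURES = ["temperature", "oxygen", "chlorophyll", "salinity"]


def fit_and_rank(X, y, seed=0):
    """Fit a random forest and return (held-out R^2, features ranked by importance)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    # feature_importances_ holds impurity-based importances that sum to 1.
    ranking = sorted(
        zip(FEATURES, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return score, ranking


# Synthetic stand-in for a biomass proxy: driven mostly by "oxygen",
# weakly by "temperature", and not at all by the other two columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(FEATURES)))
y = 3.0 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=500)

score, ranking = fit_and_rank(X, y)
print(f"held-out R^2: {score:.2f}")
for name, importance in ranking:
    print(f"{name:12s} {importance:.3f}")
```

Running the same procedure separately on subsets of the data grouped by region would be one way to compare importances across geographical areas. Impurity-based importances can be biased toward high-cardinality features, so permutation importance on held-out data is a common cross-check.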