Show simple item record

dc.contributor.author: Tveita, Sebastian Rojas
dc.date.accessioned: 2019-09-18T06:35:33Z
dc.date.available: 2019-09-18T06:35:33Z
dc.date.issued: 2019-06-29
dc.date.submitted: 2019-06-28T22:00:08Z
dc.identifier.uri: https://hdl.handle.net/1956/20860
dc.description.abstract: With machine learning rising to prominence over the last decades, many companies are researching how it can be applied in their products or production, and some have done so with considerable success. This thesis proposes a solution for integrating speaker recognition into a video-editing system. The proposed solution is a proof-of-concept pipeline hooked into a web-based video-editing system made by a company called Vizrt [vrt]. The pipeline takes a video and performs speaker diarization and classification on its audio track. To achieve this, two types of models are applied: Gaussian Mixture Models to create a Universal Background Model, and models applying the i-vector approach for use in the clustering. From the results of the machine learning algorithms, the pipeline produces timecodes that are sent to the video-editing system; these timecodes show where the different speakers are talking. This information is presented to the user in the system UI, where the user has the option of correcting the results of the diarization and classification. The pipeline also uses the results of the algorithms to provide further functionality: with the generated timecodes, it can extract training data partitioned by speaker. This training data is saved and can later be used to generate new models for the different speakers. These models can be used in later runs through the pipeline to recognize known speakers, and can be improved by gathering more training data for those speakers. The thesis shows how machine learning can be applied in a pipeline to partition an audio track without any prior trained model. This information could save time in a video-editing process, or in the process of creating training data. The pipeline also has the potential to be expanded with further functionality, which would require it to be further integrated into the video-editing system. (en_US)
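The abstract's UBM-plus-clustering approach can be illustrated with a minimal, self-contained sketch. This is not the thesis's pipeline: it uses synthetic feature vectors in place of real audio features (such as MFCCs), a scikit-learn `GaussianMixture` as the UBM, and the UBM's average posterior occupancies as a crude per-segment embedding standing in for i-vectors; the frame rate and segment length are arbitrary assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Hypothetical stand-in for acoustic features: one 20-dim row per frame.
# Two synthetic "speakers" with clearly different feature distributions.
spk_a = rng.normal(0.0, 1.0, size=(200, 20))
spk_b = rng.normal(3.0, 1.0, size=(200, 20))
frames = np.vstack([spk_a, spk_b])

# Universal Background Model: a GMM trained on pooled speech data.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(frames)

# Split the "audio" into fixed-length segments and summarize each one
# by the UBM's average posterior occupancy (a crude i-vector stand-in).
seg_len = 50
segments = [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]
embeddings = np.array([ubm.predict_proba(seg).mean(axis=0) for seg in segments])

# Cluster segment embeddings into speakers, then emit timecodes,
# assuming (arbitrarily) 100 feature frames per second.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)
fps = 100
for i, lab in enumerate(labels):
    start, end = i * seg_len / fps, (i + 1) * seg_len / fps
    print(f"speaker_{lab}: {start:.2f}s - {end:.2f}s")
```

In the thesis's setting, the emitted timecodes would be sent to the editing system's UI for user correction, and the per-speaker segments would be saved as training data for later speaker models.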
dc.language.iso: eng
dc.publisher: The University of Bergen (en_US)
dc.rights: Copyright the Author. All rights reserved
dc.title: Speaker recognition implemented as part of a video-editing system
dc.type: Master thesis
dc.date.updated: 2019-06-28T22:00:08Z
dc.rights.holder: Copyright the Author. All rights reserved (en_US)
dc.description.degree: Master's thesis in informatics (en_US)
dc.description.localcode: INF399
dc.description.localcode: MAMN-PROG
dc.description.localcode: MAMN-INF
dc.subject.nus: 754199
fs.subjectcode: INF399
fs.unitcode: 12-12-0

