Efforts towards accessible and reliable bioinformatics
Original version
https://doi.org/10.5281/zenodo.33715
Abstract
The aim of the presented work was to help make scientific computing more accessible, reliable, and thus more efficient for researchers, primarily computational biologists and molecular biologists. Many approaches are possible and necessary towards these goals, and many layers need to be tackled, in collaborative community efforts with well-defined scope. As accessible and reliable bioinformatics requires diverse components, our work focussed in particular on the following:
In the BioXSD project, we aimed at developing an XML-Schema-based data format, compatible with Web services and programmatic libraries, that is expressive enough to be usable as a common, canonical data model that serves tools, libraries, and users with convenient data interoperability.
The EDAM ontology aimed at enumerating and organising concepts within bioinformatics, including operations and types of data. EDAM can be helpful in documenting and categorising bioinformatics resources using a standard “vocabulary”, enabling users to find relevant resources and choose the right tools.
The eSysbio project explored ways of developing a workbench for collaborative data analysis, accessible in various ways for users with various tasks and expertise. We aimed at utilising the World Wide Web and industrial standards, in order to increase compatibility and maintainability, and foster shared effort.
In addition to these three main contributions that I have been involved in, I present comprehensive but non-exhaustive research into the various previous and contemporary efforts and approaches to the broad topic of integrative bioinformatics, in particular with respect to bioinformatics software and services. I also mention some closely related efforts that I have been involved in.
The thesis is organised as follows: In the Background chapter, the field is presented, with various approaches and existing efforts. Summary of results summarises the contributions of my enclosed projects (the BioXSD data format, the EDAM ontology, and the eSysbio workbench prototype) to the broad topics of the thesis. The Discussion chapter presents further considerations and current work, and concludes the discussed contributions with alternative and future perspectives.
In the printed version, the three articles that are part of this thesis are attached after the Discussion and References. In the electronic version (in PDF), the main thesis is optimised for reading on a screen, with clickable cross-references (e.g. from citations in the text to the list of References) and hyperlinks (e.g. for URLs and most References). A PDF viewer with a “back” function is recommended.
Has parts
Paper I: Matúš Kalaš, Pål Puntervoll, Alexandre Joseph, Edita Bartaševičiūtė (now Karosiene), Armin Töpfer, Prabakar Venkataraman, Steve Pettifer, Jan Christian Bryne, Jon Ison, Christophe Blanchet, Kristoffer Rapacki, and Inge Jonassen (2010). BioXSD: the common data-exchange format for everyday bioinformatics web services. Bioinformatics, 26(18): i540–i546. 10.1093/bioinformatics/btq391. The article is available at: http://hdl.handle.net/1956/10660.
Paper II: Jon Ison, Matúš Kalaš, Inge Jonassen, Dan Bolser, Mahmut Uludag, Hamish McWilliam, James Malone, Rodrigo Lopez, Steve Pettifer, and Peter Rice (2013). EDAM: An ontology of bioinformatics operations, types of data and identifiers, topics, and formats. Bioinformatics, 29(10): 1325–1332. 10.1093/bioinformatics/btt113. The article is available at: http://hdl.handle.net/1956/10659.
Paper III: Kidane Tekle, Håkon Sagehaug, Prabakar Venkataraman, Armin Töpfer, Matúš Kalaš, Paweł Sztromwasser, Anne-Kristin Stavrum, Siv Midtun Hollup, Michael Dondrup, Sattanathan Subramanian, Francisco Roque, Inge Jonassen, Kjell Petersen, and Pål Puntervoll (Unpublished). eSysbio: a workbench proposal for collaborative computational biology. Full text not available in BORA.