Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows

Sztromwasser, Paweł; Puntervoll, Pål; Petersen, Kjell

Peer reviewed, Journal article

Published version

URI

https://hdl.handle.net/1956/7904

Date

2011

Abstract

Biological databases and computational biology tools are provided by research groups around the world, and made accessible on the Web. Combining these resources is a common practice in bioinformatics, but integration of heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has been commonly addressed in a pragmatic way, by tedious and error-prone scripting. Recently, however, a more reliable technique has been identified and proposed as the platform that would tie together bioinformatics resources, namely Web Services. In the last decade Web Services have spread widely in bioinformatics and earned the title of recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data partitioning lowers the resource demands of services and increases their throughput, which in turn makes it possible to execute in-silico experiments on a genome scale using standard SOAP Web Services and workflows. As a proof of principle, we annotated an RNA-seq dataset using a plain BPEL workflow engine.
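
The general idea of data partitioning described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it involves no SOAP or BPEL: the names `partition`, `annotate_chunk`, and `run_partitioned` are hypothetical, and `annotate_chunk` is a local stand-in for a single Web Service request. The sketch only shows how a large input can be cut into bounded chunks so that the service never has to hold the whole dataset in one message.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def partition(records: Iterable, chunk_size: int) -> Iterator[list]:
    """Yield successive lists of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def annotate_chunk(sequences: List[str]) -> List[str]:
    """Hypothetical stand-in for one service request.

    A real service would annotate the sequences; here we only count
    G and C bases so that the example runs without a network.
    """
    return [f"{seq}:GC={sum(base in 'GC' for base in seq)}" for seq in sequences]


def run_partitioned(
    records: Iterable[str],
    service: Callable[[List[str]], List[str]],
    chunk_size: int,
) -> List[str]:
    """Send the input chunk by chunk and collect the results in order."""
    results: List[str] = []
    for chunk in partition(records, chunk_size):
        # Each call handles only chunk_size records at a time.
        results.extend(service(chunk))
    return results


sequences = ["ACGT", "GGCC", "ATAT", "CGCG", "TTTT"]
annotated = run_partitioned(sequences, annotate_chunk, chunk_size=2)
print(annotated)
```

With `chunk_size=2` the five sequences are sent as three requests of two, two, and one records. In this sketch the chunk size is the only tuning parameter; how to choose it for a real service is outside the scope of the example.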

Publisher

IMBio e.V.

Journal

Journal of Integrated Bioinformatics

Copyright

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs CC BY-NC-ND