Analysis of sequencing data in environmental genomics. Exploring the diversity of the microbial biosphere
Most life on this planet is microbial, and over the last two decades environmental genomics has helped reveal an impressive biodiversity of this microbial life. This approach applies DNA sequencing to environmental samples. Its significant advantage is that it does not rely on cell cultures, since only a minority of microorganisms are easily cultured in the laboratory. This thesis deals primarily with the analysis of microbial diversity based on community profiling, a variant of environmental genomics that targets defined marker genes to study the structure of microbial communities. The use of the small subunit ribosomal RNA as a phylogenetic marker is discussed and evaluated, with emphasis on taxonomic classification, estimation of diversity and comparison of community structure between samples. Thanks to improved sequencing technologies, community profiling is an increasingly powerful and cost-efficient technique. Like all methodologies, however, it has limitations and sources of random and systematic errors, many of which remain poorly understood. To address these, a number of recommendations and novel analysis methods are developed and provided. These are subsequently applied to study environmental communities, addressing issues such as the “rare biosphere” concept and the variation of community structure across space and environmental gradients.
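Estimating diversity within a sample and comparing community structure between samples are commonly done with indices computed from a table of counts per operational taxonomic unit (OTU). The minimal sketch below uses two widely used measures, the Shannon index and Bray-Curtis dissimilarity, on a hypothetical OTU count table. It is an illustration under these assumptions, not the specific set of indices used in the thesis.

```python
import math

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two OTU count vectors in the same OTU order.

    0 means identical composition; 1 means no OTUs shared.
    """
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

# Hypothetical OTU counts for two samples (same OTU order in both lists)
sample_a = [50, 30, 15, 5, 0]
sample_b = [10, 10, 40, 20, 20]

print("Shannon A:", round(shannon(sample_a), 3))
print("Shannon B:", round(shannon(sample_b), 3))
print("Bray-Curtis A vs B:", round(bray_curtis(sample_a, sample_b), 3))
```

Sample B spreads its reads more evenly over more OTUs, so its Shannon index is higher than that of sample A. Both indices depend on how reads are grouped into OTUs, which is one reason sequencing noise and classification choices matter for the conclusions drawn from them.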
Taxonomic classification is the process of placing environmental sequences in the context of previously studied organisms, which makes it possible to infer ecologically meaningful information such as putative metabolic functions. In Paper I, a set of resources for taxonomic classification is provided and evaluated. The performance of the resulting framework, CREST (Classification Resources for Environmental Sequence Tags), is shown to compare favourably with existing methods. CREST also provides a manually curated taxonomy and functionality for comparing composition across datasets. In Paper II, a hydrothermal vent-associated microbial mat community is studied using a set of different environmental genomics methods. Based on this study, several important sources of bias in community profiling, as well as its reproducibility, are evaluated and discussed. The results highlight the importance of applying complementary methods. They also illustrate the influence of primer choice, PCR bias and whether RNA or DNA is targeted. Random variation, or noise, is another important factor to consider in community profiling studies. Papers III and IV examine the effect of such noise arising from PCR amplification and pyrosequencing, which is currently the most common sequencing method applied to environmental samples. The results of Paper III demonstrate that early community profiling studies using pyrosequencing significantly overestimated the extent of biodiversity because of noise. To compensate for such noise in amplicon sequence datasets, the program AmpliconNoise was developed. Using “mock communities”, mixes of clones with known sequences, the performance of AmpliconNoise is demonstrated and compared with alternative methods. The diversity analyses of the microbial mat community studied in Paper II use AmpliconNoise, and the resulting estimates are compared with previous findings from similar environments.
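Why noise inflates diversity estimates can be seen in a small simulation. The sketch below is not AmpliconNoise and does not model pyrosequencing-specific errors such as homopolymer length errors. It assumes a hypothetical mock community of 10 random 200-bp sequences, a uniform per-base substitution rate of 0.5% (an assumed value), and grouping of reads into exact-sequence variants. It then compares observed richness and the classic Chao1 estimator with and without errors.

```python
import random
from collections import Counter

def chao1(counts):
    """Classic Chao1 richness: S_obs + F1^2 / (2 * F2).

    Falls back to the bias-corrected form S_obs + F1 * (F1 - 1) / 2 when F2 == 0.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2

def add_noise(seq, error_rate, rng):
    """Replace each base by a different base with probability error_rate (assumed error model)."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(1)

# Hypothetical mock community: 10 distinct 200-bp sequences with uneven abundances
true_seqs = ["".join(rng.choice("ACGT") for _ in range(200)) for _ in range(10)]
weights = [2 ** i for i in range(10)]
reads = rng.choices(true_seqs, weights=weights, k=1000)

# Exact-sequence variants with and without simulated errors
clean = Counter(reads)
noisy = Counter(add_noise(r, 0.005, rng) for r in reads)

print("true richness:", len(true_seqs))
print("clean reads: observed", len(clean), "Chao1", round(chao1(list(clean.values())), 1))
print("noisy reads: observed", len(noisy), "Chao1", round(chao1(list(noisy.values())), 1))
```

With these assumed settings, roughly a third of the reads are error-free. Almost every erroneous read becomes a unique singleton, so both observed richness and Chao1 end up far above the true 10 sequences. Real pipelines usually cluster reads into OTUs at some identity threshold, which absorbs part of this noise; removing the remainder is the purpose of denoising tools such as AmpliconNoise.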
In addition to biodiversity per se, the underlying diversity structures of communities and the mechanisms shaping them remain important but poorly understood issues in microbial ecology. Because of their many useful characteristics, alkaline soda lakes are used in Paper V as a model ecosystem to study several such issues. The results reveal that these extreme environments harbour surprisingly high microbial diversity. Interestingly, the most alkaline and saline lakes studied also appear to be the most diverse. Further, it is shown that pH, oxygen level, and sodium and potassium concentrations can explain 30% of the compositional variance between the lakes studied. The results also indicate the existence of organisms endemic to individual lakes. Although soda lakes are relatively uncommon environments, this study provides an example of how fundamental biogeographical questions can be addressed through a careful choice of experimental design and analysis methodology. The results call into question several established notions, such as that extreme environments are generally less diverse and that few prokaryotic organisms are endemic. Hopefully, the findings will inspire future studies that explore these relationships further.
In summary, the work presented here illustrates the importance of evaluating and optimising the methodology used in environmental genomics, particularly for amplicon sequencing, taxonomic classification, and estimation of phylogenetic diversity. Methodological limitations have likely biased and slowed down the analysis and interpretation of data on important ecological issues such as the rare biosphere and microbial biogeography.