Biodiversity Exploratories Information System

CC-BY-NC Dataset Public since: 2018-11-05 Prototype 18S rDNA data for Protistan microbiome of grassland soils, NanoFaun, 2011

DOI: No DOI issued.
Citation: Arndt, Hartmut (2018): 18S rDNA data for Protistan microbiome of grassland soils, NanoFaun, 2011. v1.1.10. Biodiversity Exploratories Information System. Dataset.


id 22610
versionID 1.1.10
title 18S rDNA data for Protistan microbiome of grassland soils, NanoFaun, 2011
owner1 Hartmut Arndt
projectName NanoFaun
datasetManagerName Hartmut Arndt
institute University of Cologne


noOfGP 150
noOfEP 150
noOfMIP 75
noOfVIP 27
grassland yes
forest no
experimentalManipulation no
aboveGround no
belowGround yes
numberOfRepetitions 0
numberOfSubPlots 0
taxon1 Microbes
processOrService1 None
environmentalDescriptor1 Soil
bioticDataType1 Genetic


introduction The Protistan Microbiome of Grassland Soil: Diversity in the Mesoscale

DNA isolation, PCR amplification and NGS: Whole genomic DNA was extracted from 1 g of each composite soil sample using the PowerSoil® DNA isolation kit (Mo Bio Laboratories, Inc., Carlsbad, CA) and quantified using a Nanodrop 100 spectrophotometer (Thermo Fisher Scientific, Germany). DNA quality was assessed by 2% agarose gel electrophoresis. The DNA concentration was adjusted to 100 ng/µl using DEPC deionized water (ddH2O). The highly variable V4 region of the 18S rRNA gene was amplified directly from the samples using the eukaryote-specific primers 590F (5’-3’:CGGTAATTCCAGCTCCAATAGC) and 1300R (5’-3’:CACCAACTAAGAACGGCCATGC). To separate the sequences by sample, the Titanium primer design and the recommended multiplex identifier (MID) adaptor complex design (Roche, Germany) were used. The pre-454 sequencing PCR reaction contained 2 µl (100 ng/µl) DNA, 2 µl 10x DNA polymerase buffer with 20 mM MgSO4, 2 µl (1 µM) 590F primer [590F + Key + MID + Adaptor A], 2 µl (1 µM) 1300R primer, 2 µl dNTPs (2 mM each) and 0.4 µl (2.5 U/µl) Pfu (Pyrococcus furiosus) DNA polymerase (Fermentas, Germany), and was filled up to a total volume of 25 µl with ddH2O. Pfu polymerase was used because of its high fidelity (error rate 2.6 x 10^-6), which results from its 3' to 5' exonuclease (proofreading) activity. Cycling conditions were: initial denaturation at 95°C for 3 min; 30 cycles of 95°C for 30 s, 55°C for 30 s and 72°C for 2 min; and a final extension at 72°C for 10 min. PCR product quality was evaluated by agarose gel electrophoresis (2%) and its quantity was determined by spectrophotometry on a Nanodrop 100 (Thermo Fisher Scientific, Germany). Each pre-454 sequencing PCR reaction was run in triplicate and the products were pooled to a final concentration of 20 ng/µl to reduce possible PCR bias. NGS on the GS-FLX sequencer with the Titanium sequencing kit XLR70 (Roche, Germany) was performed by GATC Biotech AG, Germany. Sequencing was performed from the forward primer (adaptor A).
theory Data collection and soil sampling: Soil samples for the mesoscale were collected in May 2011 as part of the German Biodiversity Exploratories initiative (Fischer et al. 2010). They comprise 150 grassland soil samples from three temporally and spatially scaled, geo-referenced study regions: the UNESCO biosphere reserve Schorfheide-Chorin (SEG) in north-eastern Germany, Hainich-Duen national park (HEG) in central Germany and the Schwaebische Alb UNESCO biosphere reserve (AEG) in south-western Germany. The co-ordinates and parameters at the time of sampling are given in Table S1 (also see map in Fig. 5). Standardized field sampling (Fischer et al. 2010; Brabender et al. 2012) was performed. In summary, fourteen soil cores (diameter 8.3 cm) were taken from subareas of 20 x 20 m, selected to represent a range of land-use intensities (LUI, Bluethgen et al. 2012). Soil samples were cored from the upper 10 cm of the A horizon; the topmost 5 cm root layer was removed, as were any deadwood and roots larger than 2 cm in diameter. Samples were homogenized and stored at 4°C while still at field moisture content. The LUI index for the year 2011 was calculated from fertilization intensity (organic and mineral fertilization, excluding livestock dunging), mowing frequency and grazing intensity (livestock). These data were obtained from land owners by questionnaire and were used in this comparative study to test the effects of land management on species richness. The three regions differ in climate, geology and topography and are representative of large parts of Central Europe.
type 18S rDNA Data
instruments GS-FLX sequencer and Titanium sequencing kit XLR70 (Roche, Germany)
calibration Removal of biased amplicons: Unidirectionally sequenced DNA traces returned by GATC Biotech AG were demultiplexed by means of the barcode in the pyrotags (adaptor A) and received in FASTA format. Using standard command-line tools, raw sequences were filtered for (1) a 100% forward primer match, to remove false-positive PCR amplifications of non-rRNA genes; (2) a minimum sequence length of 200 bp, to remove possible artifacts; (3) a maximum sequence length of 710 bp; and (4) ambiguities (Ns), to exclude sequences containing uncertain bases. Raw sequences are available from the authors upon request. Sequences reported in this paper have been deposited in the GenBank database under accession numbers XXX to XXX.
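The four filter criteria above can be sketched in a few lines of Python. This is only an illustration, not the command-line procedure actually used: it assumes that the 590F primer sits at the very start of each demultiplexed read and that the length limits apply to the read including the primer. The function names `parse_fasta`, `passes_filters` and `filter_reads` are hypothetical.

```python
# Minimal sketch of the read filters described above (assumptions: the
# forward primer is at position 0 of each demultiplexed read, and the
# length limits are applied to the read including the primer).
FORWARD_PRIMER = "CGGTAATTCCAGCTCCAATAGC"  # 590F, as given in the methods text
MIN_LEN, MAX_LEN = 200, 710               # bp limits from the methods text


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def passes_filters(seq):
    """Return True if a read satisfies all four criteria."""
    return (
        seq.startswith(FORWARD_PRIMER)       # (1) exact forward primer match
        and MIN_LEN <= len(seq) <= MAX_LEN   # (2) and (3) length window
        and "N" not in seq                   # (4) no ambiguous bases
    )


def filter_reads(lines):
    """Return the (header, sequence) pairs that pass all filters."""
    return [(h, s) for h, s in parse_fasta(lines) if passes_filters(s)]
```

With a real file, `filter_reads(open(path))` would return the retained reads; `path` is a placeholder for a local FASTA file.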
procedures Bioinformatical identification of OTUs: Raw sequences were scanned for chimeric sequences against the curated Protist Ribosomal Reference (PR2 v203) database using the uchime_ref algorithm in the USEARCH v. 7.0.1090 package. All sequences were trimmed to a maximum length of 530 bp to avoid terminal read errors and to focus downstream analyses on the V4 region of interest. Dereplication into unique individual reads (UIRs) was performed using the VSEARCH script, which clusters 100% identical amplicons and thereby identifies singletons (read abundance = 1; see Results and Discussion). Singletons are likely artifacts of pyrosequencing and were therefore removed. UIRs were aligned to the PR2 (v203) database (Guillou et al. 2013) using the nucleotide basic local alignment search tool (BLASTn v. 2.2.31+) algorithm. Default BLASTn parameters (gap opening penalty 5, gap extension penalty 2, match reward 2, mismatch penalty -3 and word size 11) were applied. A single hit for a UIR was retained if its E-value was ≤ 1e-100. UIRs whose hit carried the same accession number in the PR2 database were grouped under that accession number. Each accession number was counted as a single operational taxonomic unit (OTU), and only UIRs with 100% query coverage (the full length of the query sequence matched part of a full-length reference sequence) were considered suitable for further analysis. This included most of the reads (78.5%). Even though UIRs may be 100% identical to a reference sequence, ambiguous identification is still possible. Some OTUs (~530 bp) containing UIRs with 100% sequence identity may represent more than one species because the sequence similarity is limited to the barcoding region only. All hits were inspected for accuracy using the metagenome analyzer (MEGAN v. 5) program for conserved sequences.
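The trimming, dereplication and singleton-removal steps can be illustrated with a short Python sketch. It does not reproduce VSEARCH; it only shows the idea of collapsing 100% identical reads into UIRs. Keying each UIR by the MD5 hash of its trimmed, upper-cased sequence is an assumption made here for illustration, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def dereplicate(seqs, max_len=530):
    """Trim reads to max_len and collapse identical ones.

    Returns {md5_hex: (sequence, abundance)}. Hashing the trimmed,
    upper-cased sequence is an assumption of this sketch.
    """
    counts = Counter(s.upper()[:max_len] for s in seqs)
    return {
        hashlib.md5(seq.encode("ascii")).hexdigest(): (seq, n)
        for seq, n in counts.items()
    }


def drop_singletons(uirs):
    """Remove UIRs that were observed only once (read abundance = 1)."""
    return {key: value for key, value in uirs.items() if value[1] > 1}
```

Calling `drop_singletons(dereplicate(reads))` on a list of read strings keeps only the unique sequences that occur at least twice.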
Using 50 BLASTn hits per UIR sequence, conserved sequences were correctly assigned to higher-order taxa in the database by the lowest common ancestor (LCA) algorithm. On the other hand, although 454 sequencing has a low error rate, different query sequences might represent the same morphospecies where genetic delineation is not yet known or where genetic diversity is unclear because of the accumulation of “neutral” mutations within a single species. To manually inspect and weigh the pairwise distances between UIRs and centroid reference sequences within each OTU or taxonomic group, the Kimura 2-parameter (K2P) model in MEGA6 was used. Distance values can be multiplied by 100 to obtain the percent pairwise distance between two sequences (e.g. 0.03 = 3% pairwise distance). Furthermore, OTUs with high-similarity annotations to (1) taxa for which the V4 region of the 18S rRNA gene is not a suitable barcoding gene and (2) non-protistan taxa were removed. Specifically, OTUs with tags for Metazoa and Fungi within the supergroup Opisthokonta, and for Streptophyta within the supergroup Archaeplastida, were removed.
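For readers unfamiliar with the K2P distance, the following Python sketch computes it for two pre-aligned sequences of equal length using the standard formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences. It is not the MEGA6 implementation; skipping sites with gaps or ambiguous bases is an assumption of this sketch, and `k2p_distance` is a hypothetical name.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch). math.log raises ValueError when the
    sequences are too divergent for the formula to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives the percent pairwise distance mentioned above.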


acronym1 bp
meaning1 Base pairs
keyword1 Soil, Protist, Biodiversity


format MM.yyyy
startDate 05.2011
endDate 05.2011
dateEntry 2018-02-15
dateLastModified 2018-03-07


fileType structuredData
qualityLevel processed
dataStatus complete
name typeOfVariable units description block
1 EP_PlotID string EP Plot ID 0
2 QuerySequence string MD5 hash value after dereplication 0
3 SequenceLength integerNumber Query sequence length 0
4 AccessionNumber string Closest reference – Often used to identify OTUs (closed reference) 0
5 Domain string Taxonomic lineage 0
6 Supergroup string Taxonomic lineage 0
7 Division string Taxonomic lineage 0
8 Clas string Taxonomic lineage 0
9 Order string Taxonomic lineage 0
10 Family string Taxonomic lineage 0
11 Genus string Taxonomic lineage 0
12 Species string Taxonomic lineage 0
13 PairwiseIdentity realNumber % Sequence similarity to the reference sequence 0
14 Exploratory string Biodiversity exploratory: AEG – Schwäbische Alb; HEG – Hainich Dün; SEG – Schorfheide Chorin 0
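As a usage illustration for the variables listed above, the sketch below counts the number of distinct OTUs (distinct `AccessionNumber` values) per plot (`EP_PlotID`). It assumes the dataset has been exported as a comma-separated file with a header row using these column names; the sample rows, plot IDs and accession strings are made-up placeholders, not values from the dataset.

```python
import csv
import io
from collections import defaultdict


def otu_richness_per_plot(rows):
    """Count distinct AccessionNumber values per EP_PlotID."""
    otus = defaultdict(set)
    for row in rows:
        otus[row["EP_PlotID"]].add(row["AccessionNumber"])
    return {plot: len(accessions) for plot, accessions in otus.items()}


# Hypothetical sample rows for illustration only (not real data).
SAMPLE = """EP_PlotID,AccessionNumber,Exploratory
AEG1,ACC_A,AEG
AEG1,ACC_A,AEG
AEG1,ACC_B,AEG
HEG2,ACC_A,HEG
"""

richness = otu_richness_per_plot(csv.DictReader(io.StringIO(SAMPLE)))
```

For a real export, `csv.DictReader(open(path, newline=""))` could be passed instead, with the delimiter adjusted to match the file.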


database Raw sequence data are available from the Sequence Read Archive (accession number SRP101780).
paper1 Venter PC, Nitsche F, Domonell A, Heger P, Arndt H (2017) The protistan microbiome of grassland soil: Diversity in the mesoscale. Protist 168:546-564
dataset1 Sequence Read Archive (SRP101780)
citation Arndt, Hartmut (2018): 18S rDNA data for Protistan microbiome of grassland soils, NanoFaun, 2011. v1.1.10. Biodiversity Exploratories Information System. Dataset.