To obtain a comprehensive view of the microbial communities in different environments, we combined four major marine metagenomic datasets, which cover all the ocean regions at various depths, with the Human Microbiome Project (HMP) dataset. The Global Ocean Sampling Expedition (GOS), the Tara Oceans expedition (TARA), Malaspina [Council SNRC (CSIC). Malaspina expedition. Available at: http://www.expedicionmalaspina.es/, 2010] and Ocean Sampling Day (OSD) together form one of the most extensive public marine data sets. The GOS data originate from 80 samples at 70 different sampling sites; the Malaspina data set comprises 116 samples taken at 30 different stations; the TARA data cover 141 different locations for a total of 242 samples; and the OSD data comprise 146 metagenomic samples taken at 139 different stations. To these marine data we added 1,249 HMP metagenomes from 5 main body sites (“gastrointestinal tract”, “oral”, “airways”, “urogenital tract” and “skin”) and 18 specific sites. The numbers are shown in Table 1.
Metagenomic data sets
Ocean distribution of the metagenomic samples
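As a quick sanity check on the figures quoted above, the per-project counts can be tallied in a few lines. This is only a sketch: the numbers are copied from the text, the variable names are illustrative, and summing the site counts assumes that no station is shared between projects, which the text does not state.

```python
# Sample and site counts as quoted in the text (not re-derived from the data).
DATASETS = {
    "GOS": {"samples": 80, "sites": 70},
    "Malaspina": {"samples": 116, "sites": 30},
    "TARA": {"samples": 242, "sites": 141},
    "OSD": {"samples": 146, "sites": 139},
}
HMP_METAGENOMES = 1249


def total(field):
    """Sum one field ("samples" or "sites") over the four marine projects."""
    return sum(counts[field] for counts in DATASETS.values())


marine_samples = total("samples")
# Assumption: stations are distinct across projects, so the counts can be added.
marine_sites = total("sites")
all_metagenomes = marine_samples + HMP_METAGENOMES

print(f"marine samples:  {marine_samples}")
print(f"marine sites:    {marine_sites}")
print(f"all metagenomes: {all_metagenomes}")
```

With the quoted figures this gives 584 marine samples and, once the HMP metagenomes are included, 1,833 metagenomes in total.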
The data were collected as single reads from GOS and as metagenomic assemblies from the other four projects. Specifically, the GOS single reads came from shotgun sequencing performed with the Sanger method, which yields comparatively long reads (GOS Sanger data have an average read length of ~800 nucleotides). The TARA, OSD, Malaspina and HMP data are instead metagenomic assemblies of Illumina paired-end reads. TARA reads were assembled using MOCAT, Malaspina with RAY-Meta, OSD with SPAdes and HMP with SOAPdenovo (V 1.04 28).
The Genome Taxonomy Database (GTDB), Release 03-RS86 (19th August 2018): 127,318 genomes (Bacteria: 125,243; Archaea: 2,075)
We downloaded the protein sequences for the bacterial and archaeal genomes from the AnnoTree website: https://data.ace.uq.edu.au/public/misc_downloads/annotree/r86/
We collected 90,621,864 proteins from 27,372 bacterial genomes and 3,101,326 proteins from 1,569 archaeal genomes.
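The protein and genome counts above can be combined into totals and a rough proteins-per-genome figure. The sketch below only does arithmetic on the numbers quoted in the text; the dictionary names are illustrative and nothing here is read from the downloaded files.

```python
# Counts as quoted in the text for the downloaded protein sequences.
PROTEINS = {"bacteria": 90_621_864, "archaea": 3_101_326}
GENOMES = {"bacteria": 27_372, "archaea": 1_569}

total_proteins = sum(PROTEINS.values())
total_genomes = sum(GENOMES.values())

# Mean number of proteins per genome in each domain.
mean_per_genome = {domain: PROTEINS[domain] / GENOMES[domain] for domain in PROTEINS}

print(f"total proteins: {total_proteins:,}")
print(f"total genomes:  {total_genomes:,}")
for domain, mean in mean_per_genome.items():
    print(f"{domain}: {mean:.1f} proteins per genome on average")
```

Taken together, the two domains contribute 93,723,190 proteins from 28,941 genomes.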
OM-RGC.v2 reference paper: “Gene Expression Changes and Community Turnover Differentially Shape the Global Ocean Metatranscriptome” https://www.sciencedirect.com/science/article/pii/S009286741931164X
OM-RGC.v2 contains 46,775,154 non-redundant genes.
It can be downloaded from the portal at https://www.ocean-microbiome.org/
F. Sanger, S. Nicklen, and A. R. Coulson, “DNA sequencing with chain-terminating inhibitors,” Proceedings of the National Academy of Sciences of the United States of America, vol. 74, no. 12, pp. 5463–5467, Dec. 1977.
A. Bankevich et al., “SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing,” Journal of Computational Biology, vol. 19, no. 5, pp. 455–477, May 2012.