A workflow to unify the Known and Unknown

We implemented a computational workflow (Agnostos) to structure and explore the large pool of genes with unknown functions found in microbial genomes and metagenomes. We used a protein domain-based approach to partition more than 400 million predicted genes from 1,628 metagenomes and 28,941 genomes into the different categories of known and unknown.

Figure: Brief schematic of the workflow.

The workflow is built on Snakemake so that large datasets can be processed easily and reproducibly. It provides three strategies for analyzing the data. The DB-creation module creates the gene cluster database, then validates the gene clusters (GCs) and partitions them into the main functional categories. The DB-update module integrates new sequences (either contigs or predicted genes) into the existing gene cluster database. In addition, the workflow has a profile-search function to quickly screen sequences against the gene cluster PSSM profiles in the database.
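To make the idea of domain-based partitioning concrete, here is a minimal Python sketch. It is a toy illustration only and not the Agnostos implementation: the function name `partition_clusters`, the input dictionaries, the two labels, and the `min_annotated_fraction` threshold are all assumptions introduced for this example. The actual classification criteria are described in the linked step pages.

```python
from collections import defaultdict


def partition_clusters(gene_to_cluster, gene_to_pfam, min_annotated_fraction=0.5):
    """Label each gene cluster 'known' or 'unknown'.

    Toy rule (an assumption for illustration, not the Agnostos criteria):
    a cluster is 'known' when at least `min_annotated_fraction` of its
    member genes carry one or more Pfam domain annotations.
    """
    # Group genes by the cluster they belong to.
    members = defaultdict(list)
    for gene, cluster in gene_to_cluster.items():
        members[cluster].append(gene)

    labels = {}
    for cluster, genes in members.items():
        # Count members with a non-empty list of Pfam hits.
        annotated = sum(1 for g in genes if gene_to_pfam.get(g))
        fraction = annotated / len(genes)
        labels[cluster] = "known" if fraction >= min_annotated_fraction else "unknown"
    return labels


if __name__ == "__main__":
    # Hypothetical example data: five genes in two gene clusters.
    gene_to_cluster = {"g1": "GC1", "g2": "GC1", "g3": "GC2", "g4": "GC2", "g5": "GC2"}
    gene_to_pfam = {"g1": ["PF00005"], "g2": ["PF00005"], "g3": []}
    print(partition_clusters(gene_to_cluster, gene_to_pfam))
```

In this toy data, every member of `GC1` has a Pfam hit while no member of `GC2` does, so the two clusters land in different categories. A single threshold on annotation coverage is the simplest possible rule; it is used here only to show the shape of the computation.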

Follow the links for a detailed description of the methods and results for each of the steps in the workflow:

  1. Gene prediction
  2. Pfam annotations
  3. Deep clustering
  4. Gene cluster validation
  5. Gene cluster refinement
  6. Gene cluster classification
  7. Gene cluster category refinement
  8. Gene cluster communities inference
You can try the workflow here.
A description of the data used for the manuscript can be found here.
