Functional validation scripts usage

Validation of clusters annotated to Pfam domains, in terms of intra-cluster functional homogeneity.

Required R packages:

tidyverse
data.table
proxy
stringr
textreuse
parallel

Additional data required (found in this folder):

“files/pfam_shared_all”: a list of Pfam domains that correspond to terminal or middle regions of the same proteins

Usage

Rscript eval_shingl_jacc.r "data/annot_and_clust/marine_hmp_db_03112017_clu_ge10_annot.tsv" "data/cluster_validation/functional/shingl_jacc_val_annot.tsv"
  • output: a tab-separated table with 7 fields:
    • the cluster's old (MMseqs2) representative
    • average Jaccard similarity, not scaled by the number of annotated members/ORFs in the cluster
    • average Jaccard similarity, scaled by the number of annotated members/ORFs in the cluster
    • type of annotation (completely homogeneous; not homogeneous, only mono-domain; not homogeneous, multi-domain and single-domain in the same cluster)
    • proportion of that type of annotation in the cluster
    • proportion of partial/complete ORFs in the cluster
    • category based on the annotation type: HA, MoDA or MuDA
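
A quick sanity check of the output table can be sketched in shell. The helper names `check_fields` and `count_categories` are hypothetical and not part of the scripts. The sketch assumes the table has no header line and that the category (HA, MoDA or MuDA) is the 7th field; adjust it if your output differs.

```shell
# Hypothetical helper: list rows that do not have exactly 7 tab-separated fields.
# Returns a non-zero status if any such row is found.
check_fields() {
  awk -F'\t' 'NF != 7 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
              END { exit bad }' "$1"
}

# Hypothetical helper: count clusters per category.
# Assumes the category is the 7th field and that there is no header line.
count_categories() {
  awk -F'\t' '{ n[$7]++ }
              END { for (c in n) printf "%s\t%d\n", c, n[c] }' "$1" | sort
}

# Example call (path taken from the usage line above):
# out="data/cluster_validation/functional/shingl_jacc_val_annot.tsv"
# check_fields "$out" && count_categories "$out"
```

If the table does have a header line, adding `NR > 1` in front of the `{ n[$7]++ }` action skips it.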
