Usage of the compositional validation scripts:

Compiling the scripts used for SSN (Sequence Similarity Network) filtering:

cd scripts/Cluster_validation/compositional/
gcc is_connected.c -I${HOME}/.linuxbrew/Cellar/igraph/0.7.1_6/include -L${HOME}/.linuxbrew/Cellar/igraph/0.7.1_6/lib/ -ligraph -o is_connected
gcc -O3 filter_graph.c -I${HOME}/.linuxbrew/Cellar/igraph/0.7.1_6/include -L${HOME}/.linuxbrew/Cellar/igraph/0.7.1_6/lib/ -ligraph -o filter_graph
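Both commands above hard-code the Linuxbrew igraph 0.7.1_6 prefix, and a small typo (for example a missing "/" after ${HOME}) points -I or -L at a directory that does not exist. A minimal sketch, assuming a POSIX shell and an igraph install laid out as <prefix>/include and <prefix>/lib; the function igraph_flags and the variable IGRAPH_PREFIX are hypothetical helpers, not part of the repository. It also passes -Wl,-rpath so the binaries can locate libigraph at run time without setting LD_LIBRARY_PATH.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): print the compiler and
# linker flags for an igraph installed under the prefix given as $1.
# Assumes the layout $1/include and $1/lib.
igraph_flags() {
    prefix=${1%/}   # drop one trailing slash so paths are not doubled
    # -Wl,-rpath embeds the library directory in the binary, so
    # libigraph is found at run time without LD_LIBRARY_PATH.
    printf '%s\n' "-I${prefix}/include -L${prefix}/lib -Wl,-rpath,${prefix}/lib -ligraph"
}

# Example usage (assumed prefix; adjust to your igraph install).
# The substitution is left unquoted on purpose so it splits into
# separate flags; this breaks if the prefix contains spaces.
# IGRAPH_PREFIX="${HOME}/.linuxbrew/Cellar/igraph/0.7.1_6"
# gcc is_connected.c $(igraph_flags "$IGRAPH_PREFIX") -o is_connected
# gcc -O3 filter_graph.c $(igraph_flags "$IGRAPH_PREFIX") -o filter_graph
```

Keeping -ligraph after the source file matters: the linker resolves libraries in command-line order, so placing the library before the object that needs it can cause undefined-symbol errors.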

Running the evaluation script with ffindex in MPI mode (ffindex_apply_mpi):

/bioinf/software/openmpi/openmpi-1.8.1/bin/mpirun -np 32 ~/opt/ffindex_mg_updt/bin/ffindex_apply_mpi \
 data/mmseqs_clustering/marine_hmp_db_03112017_clu_fa \
 data/mmseqs_clustering/marine_hmp_db_03112017_clu_fa.index \
 -- scripts/Cluster_validation/compositional/compos_val.sh
  • Output: a tab-separated file with 24 fields:
    • six fields with information about the Sequence Similarity Network (SSN)
    • min intra-cluster identity (trimmed)
    • mean intra-cluster identity (trimmed)
    • median intra-cluster identity (trimmed)
    • max intra-cluster identity (trimmed)
    • min intra-cluster identity (original graph)
    • mean intra-cluster identity (original graph)
    • median intra-cluster identity (original graph)
    • max intra-cluster identity (original graph)
    • min ORF length in the cluster
    • mean ORF length in the cluster
    • median ORF length in the cluster
    • max ORF length in the cluster
    • number of badly aligned sequences
    • number of good sequences
    • proportion of badly aligned sequences in the cluster
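Once the output has been collected into a single file, clusters with many badly aligned sequences can be pulled out with awk. A minimal sketch, assuming one tab-separated line per cluster and that the proportion of badly aligned sequences is the last field, as in the list above; reading the last field with $NF avoids depending on exact column numbers. The function names flag_bad_clusters and count_bad_clusters are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the lines (clusters) whose proportion of
# badly aligned sequences, assumed to be the LAST tab-separated field,
# is strictly greater than the threshold $1. Reads the file $2, or
# standard input when $2 is omitted ("-" means stdin for awk).
flag_bad_clusters() {
    awk -F '\t' -v t="$1" '$NF + 0 > t + 0' "${2:--}"
}

# Hypothetical helper: print "<total> <flagged>" for the same threshold,
# i.e. how many clusters were read and how many exceed it.
count_bad_clusters() {
    awk -F '\t' -v t="$1" '
        { total++; if ($NF + 0 > t + 0) bad++ }
        END { printf "%d %d\n", total, bad + 0 }
    ' "${2:--}"
}
```

For example, `flag_bad_clusters 0.2 compos_val_results.tsv > suspicious_clusters.tsv`, where both file names are hypothetical placeholders for wherever the mpirun output was redirected.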