-
Notifications
You must be signed in to change notification settings - Fork 17
Inferring resolved species tree using fastoma
Consider a scenario that NCBI tree was used with FastOMA. Since such tree is not resolved (binary), we can use gene markers (Orthologous Groups) found with FastOMA to infer a high quality species tree.
This is based on the F1000 paper here and mostly codes from here. Updated codes are available in here.
After running FasTOMA, you should see a folder OrthologousGroupsFasta in the output folder.
Usually there many small OrthologousGroups (OGs), using the code filter_groups_fastoma_v2.py from here we can filter and select big ones (with more species).
You need to change min_species=50 in the following bash script, the goal is to have >100 or >1000 gene markers in the output folder OrthologousGroupFasta_gt*. One suggestion is to use 0.8*proteomes.
cp -r ../out/OrthologousGroupsFasta .
cd OrthologousGroupsFasta
gunzip -k *
cd ..
min_species=64
python filter_groups_fastoma_v2.py --min-nr-species ${min_species} --input OrthologousGroupsFasta --output OrthologousGroupFasta_gt${min_species}
cd OrthologousGroupFasta_gt${min_species}
mkdir ../aligned/
thread=20
for i in $(ls -1 *.fa); do
mafft --maxiterate 1000 --localpair --thread ${thread} ${i} > ../aligned/${i}.aln
done
Note that in code here I changed the function get_species_from_header to get fastoma species names in ||speciesNames||
cd ..
python concat_alignments_v2.py aligned/*aln --format-output fasta > concat.fasta
trimal -in concat.fasta -out concat_gappyout.fasta -gappyout
thread=48
iqtree -T ${thread} -s concat_gappyout.fasta -bb 1000 # -m LG+I+G