ART Illumina is required to create the simulated datasets
Create simulated datasets using the simulate_reads.sh script in each subfolder of the /simulated_exp folder. Please set the art_bin configuration in each script to the path of your ART Illumina binary.
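The step above can be automated with a small loop. This is a minimal sketch, not part of the repository: the function name run_all_simulations is hypothetical, and it assumes each simulate_reads.sh is meant to be launched from its own subfolder and that art_bin has already been edited in each script.

```shell
# Run simulate_reads.sh in every immediate subfolder that contains one.
# Assumption: each script expects to be started from its own subfolder.
run_all_simulations() {
    local root="$1"
    local count=0
    local script
    for script in "$root"/*/simulate_reads.sh; do
        [ -f "$script" ] || continue      # skip when the glob matched nothing
        echo "Running $script"
        ( cd "$(dirname "$script")" && bash ./simulate_reads.sh ) || return 1
        count=$((count + 1))
    done
    echo "Ran $count script(s)"
}

# Example (the path is an assumption about where the repository was cloned):
# run_all_simulations ./simulated_exp
```

The function stops at the first script that fails, so a wrong art_bin path shows up immediately rather than after every subfolder has been tried.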
Note: the simulated datasets used in the manuscript will be available on Zenodo soon.
Make sure the /Strainify_paper directory and the /Strainify directory are located in the same parent directory. Run Strainify from the /Strainify directory using the following commands:
../Strainify_paper/simulated_exp/strainify/run_strainify.sh
../Strainify_paper/4_strain_mock_community/strainify/run_strainify.sh
Note: these scripts assume that you installed Strainify via git. For runtime and memory benchmarking, please uncomment the lines related to the benchmark feature in the Snakefile (lines 47, 78, 88, 104, 117, 133, 143, 154, 166, 176, 190, 209, 224, 243, 262, 283). The runtime and memory benchmarking results analysis scripts are in the strainify_timings directory.
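The directory layout and launch step described above can be checked before running anything. The sketch below is illustrative only: check_sibling_layout and run_from_strainify are hypothetical helper names, and the parent directory in the usage comment is an assumed location.

```shell
# Succeeds only if Strainify and Strainify_paper share the given parent directory.
check_sibling_layout() {
    local parent="$1"
    [ -d "$parent/Strainify" ] && [ -d "$parent/Strainify_paper" ]
}

# Run a wrapper script (given relative to the Strainify directory) from inside Strainify.
run_from_strainify() {
    local parent="$1" rel_script="$2"
    if ! check_sibling_layout "$parent"; then
        echo "Strainify and Strainify_paper must be in the same parent directory" >&2
        return 1
    fi
    ( cd "$parent/Strainify" && bash "$rel_script" )
}

# Example (the parent directory is an assumption):
# run_from_strainify "$HOME/projects" ../Strainify_paper/simulated_exp/strainify/run_strainify.sh
```

Running the wrapper in a subshell keeps the caller's working directory unchanged, so both run_strainify.sh scripts can be launched one after the other.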
Please find each tool's own folder in this repository and run the scripts inside. Please change the paths in all scripts to match your local setup.
All scripts required for this dataset are in the B_ovatus directory.
- Download the fastq files by running the retrieve_srr.sh script.
- The reference genomes for this dataset are in the all_fastas folder (the script to download the reference genomes is retrieve_fastas.sh, and it will download the assemblies listed in PRJNA544527_AssemblyDetails.txt).
- Run Strainify via run_strainify.sh. Make sure to run it from the Strainify directory. Please modify the paths in the config.yaml file to match your local setup.
- To analyze the data, run the following scripts: reorder_srr.py, extract_genome_names.py and stack_plot.py. Please change the paths in these scripts to match your local setup. You will need the srr_list.txt file for the reorder_srr.py script and the assembly_names.tree files for the extract_genome_names.py script.
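Before running the analysis scripts, it can help to confirm that the input files they need are present. This is a minimal sketch under assumptions: missing_inputs is a hypothetical helper, and it assumes srr_list.txt and assembly_names.tree sit together in one directory, which may not match your setup.

```shell
# Print the name of each required analysis input that is missing from a directory.
# Assumption: both files are expected in the same directory.
missing_inputs() {
    local dir="$1"
    local f
    for f in srr_list.txt assembly_names.tree; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Example (the directory is an assumption about your local checkout):
# missing_inputs ./B_ovatus
```

Empty output means both files were found and the analysis scripts can be run.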