This repository was archived by the owner on Nov 9, 2023. It is now read-only.

adds convergent mode to pick_open_reference_otus #1958

Closed
gregcaporaso wants to merge 5 commits into biocore:master from gregcaporaso:new-open-ref-workflow


@gregcaporaso
Contributor

This replaces #1951.

We still need to do some more testing before this is merged, though. @josenavas, how is the EMP run going with this? As an additional test, can you confirm that all sequences are accounted for after each iteration? The count of input sequences should equal the count of sequences in that iteration's OTU map before singleton filtering.
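A minimal sketch of that check, assuming a FASTA input and a QIIME-style OTU map (tab-separated lines: the OTU ID followed by the IDs of the sequences assigned to it). The function names are hypothetical and not part of QIIME, and the sketch only compares counts, not the identity of the sequence IDs.

```python
import io
from typing import Iterable


def count_fasta_seqs(lines: Iterable[str]) -> int:
    """Count FASTA records: one record per header line starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_otu_map_seqs(lines: Iterable[str]) -> int:
    """Count sequence IDs in a QIIME-style OTU map.

    Each non-empty line is tab-separated: the OTU ID followed by the IDs
    of the sequences assigned to that OTU.
    """
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0]:  # skip blank lines
            total += len(fields) - 1
    return total


def all_sequences_accounted_for(fasta_lines, otu_map_lines) -> bool:
    # Compare against the OTU map *before* singleton filtering; filtering
    # intentionally drops sequences, so a post-filter map would not match.
    return count_fasta_seqs(fasta_lines) == count_otu_map_seqs(otu_map_lines)


if __name__ == "__main__":
    fasta = io.StringIO(">s1_1\nACGT\n>s1_2\nACGA\n>s2_1\nTTGA\n")
    otu_map = io.StringIO("0\ts1_1\ts2_1\nNew.0\ts1_2\n")
    print(all_sequences_accounted_for(fasta, otu_map))  # True
```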

@josenavas
Member

Thanks for adding more documentation, @gregcaporaso.
Agreed, we still need more testing. I'm also thinking of making some further modifications to the code to improve performance.

The problem I found is that when the input files differ greatly in size (e.g. in the EMP some input files are 80 GB while others are under 1 GB), the number of sequences analyzed per iteration drops once the smaller files have been processed. The change I'm planning is to adjust the number of sequences taken from each input file dynamically, so that each iteration analyzes approximately the same number of sequences. Does this sound reasonable to you, @gregcaporaso?
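One possible policy for choosing the per-file counts, sketched as a hypothetical helper (not taken from #1959 or from QIIME): give every input file that still has sequences an equal share of a target iteration size, and redistribute whatever a nearly exhausted file cannot supply to the larger files. The names `allocate_iteration`, `remaining`, and `target` are illustrative assumptions.

```python
from typing import Dict


def allocate_iteration(remaining: Dict[str, int], target: int) -> Dict[str, int]:
    """Decide how many sequences to take from each input file this iteration.

    `remaining` maps each (hypothetical) input file name to the number of
    sequences not yet processed. Files that cannot fill their equal share
    contribute everything they have, and the shortfall is spread over the
    others, so the total stays at `target` until the inputs run out.
    """
    alloc = {name: 0 for name in remaining}
    active = [name for name, left in remaining.items() if left > 0]
    budget = min(target, sum(remaining.values()))
    while budget > 0 and active:
        share, extra = divmod(budget, len(active))
        still_active = []
        for i, name in enumerate(active):
            want = share + (1 if i < extra else 0)
            take = min(want, remaining[name] - alloc[name])
            alloc[name] += take
            budget -= take
            if remaining[name] - alloc[name] > 0:
                still_active.append(name)
        active = still_active
    return alloc


if __name__ == "__main__":
    remaining = {"big": 1000, "small": 30, "tiny": 5}
    print(allocate_iteration(remaining, 300))  # {'big': 265, 'small': 30, 'tiny': 5}
```

After each iteration the caller would subtract the allocation from `remaining`; the large file then supplies most of each later iteration instead of the iteration size shrinking.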

Another change would be to allow convergent mode on a single input file as well, so we can analyze extremely large datasets in a convergent manner. Do you also agree with this change, @gregcaporaso?

@ghost

ghost commented Mar 16, 2015

Build results will soon be (or already are) available at: http://ci.qiime.org/job/qiime-github-pr/1603/

@gregcaporaso
Contributor Author

Both of those sound like good additions, but I think you should focus on the first one since we have an immediate application (EMP). Does the process still seem to be working for that analysis?

@josenavas
Member

Yeah, I will focus on the first one. The process seems to be working correctly on that data.
Another addition that I think will be very useful, and to some extent necessary, is checkpointing: if X iterations have already been executed and the process fails, it should resume from that point rather than reanalyze everything. I'm going to work on both of these today, since we are moving the compute to our local cluster at UCSD, and being able to resume work that has already been done will be extremely useful.
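A minimal sketch of iteration-level checkpointing, assuming (hypothetically) that each iteration writes to its own `iteration_<n>` directory and creates an empty `DONE` marker only after it finishes. The helper names and directory layout are illustrative, not QIIME's. Because each iteration builds on the previous ones, a restart resumes at the first iteration without a marker.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def completion_marker(output_dir: Path, iteration: int) -> Path:
    # Hypothetical layout: one subdirectory per iteration, with an empty
    # "DONE" file written only after that iteration finished successfully.
    return output_dir / f"iteration_{iteration}" / "DONE"


def first_incomplete_iteration(output_dir: Path, n_iterations: int) -> int:
    """Return the index of the first iteration that still has to run."""
    for i in range(n_iterations):
        if not completion_marker(output_dir, i).exists():
            return i
    return n_iterations  # every iteration already completed


def mark_complete(output_dir: Path, iteration: int) -> None:
    marker = completion_marker(output_dir, iteration)
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()


def run_with_checkpoints(output_dir: Path, n_iterations: int,
                         run_iteration: Callable[[int], None]) -> List[int]:
    """Run iterations in order, skipping those finished by an earlier run."""
    ran = []
    start = first_incomplete_iteration(output_dir, n_iterations)
    for i in range(start, n_iterations):
        run_iteration(i)  # if this raises, no marker is written for i
        mark_complete(output_dir, i)
        ran.append(i)
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        mark_complete(out, 0)  # pretend iteration 0 finished in an earlier run
        print(run_with_checkpoints(out, 3, lambda i: print("running iteration", i)))
```

Writing the marker last is the key choice: an iteration that crashed midway has no marker, so its partial output is treated as missing and the iteration is rerun.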

@josenavas
Member

closing in favor of #1959

@josenavas josenavas closed this Mar 17, 2015
@gregcaporaso gregcaporaso deleted the new-open-ref-workflow branch December 19, 2022 16:25

2 participants