fix: regeneration of voice on web page #275
```diff
@@ -272,10 +272,12 @@ def optimize(self, disable: bool = False):
             self.residual_lm.forward_step = torch.compile(
                 self.residual_lm.forward_step, mode="reduce-overhead", fullgraph=True
             )
+            # Use default mode for feat_encoder to avoid CUDA graphs issues with dynamic shapes
+            # reduce-overhead mode requires fixed input shapes (CUDA graphs)
             self._feat_encoder_raw = self.feat_encoder
-            self.feat_encoder = torch.compile(self.feat_encoder, mode="reduce-overhead", fullgraph=True)
+            self.feat_encoder = torch.compile(self.feat_encoder, mode="default")
```
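For background on the distinction the new comments draw: `reduce-overhead` captures CUDA graphs, which replay kernels against fixed input shapes, while `default` skips graph capture and simply recompiles when shapes change. A minimal sketch of that behavior with a toy stand-in for the feature encoder (nothing below is code from this repo):

```python
# Toy illustration only; feat_encoder here is a stand-in, not the real one.
import torch

def feat_encoder(x: torch.Tensor) -> torch.Tensor:
    # shape (batch, frames, dim) -> (batch, frames, frames)
    return torch.relu(x @ x.transpose(-1, -2))

# mode="default" tolerates shape changes by recompiling for the new shape.
encoder = torch.compile(feat_encoder, mode="default")

# A second generation with a different reference wav changes the frame count,
# which is exactly the dynamic-shape situation CUDA graphs cannot replay.
for frames in (200, 365):
    print(frames, encoder(torch.randn(1, frames, 64)).shape)
```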
Contributor

Same question here: was this change driven by a specific runtime issue with dynamic shapes or CUDA graphs? A short note on the original problem and any perf/stability validation would make this easier to evaluate.
Author

When I try to regenerate the voice based on a voice I uploaded earlier for another text, I get an error like this: "Traceback (most recent call last): ..."
Contributor

Thanks for the detailed context and stack trace; that helps a lot. I have two follow-up questions before I feel fully comfortable with this change:

If the answer is that
Author

There seems to be an issue with my local computer. During the voice generation process, there are extremely long wait times, whether I modify the mode or not. I might not be able to proceed with testing.
Contributor

I ran a quick local comparison on an RTX 4090 using the current package environment, with `reference_wav_path` only (no `prompt_wav_path` or `prompt_text`) and two sequential generations after warmup. With a longer input of about 365 characters, both main and this PR completed successfully without reproducing the failure, but I did see a small slowdown with this change.

So this does look directionally safer, but it may also carry some runtime cost. Given that, I wonder if it would be better to keep the library default as-is for now and instead expose an option in `app.py` to disable optimization when needed, for example a `--no-optimize` flag or a similar demo-only switch. That would make it easier to validate whether the web demo issue can be mitigated by turning optimization off explicitly, without changing the default compile behavior for all users. Would that be a reasonable compromise?
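For concreteness, here is a minimal sketch of what that demo-only switch could look like. The flag name and the surrounding wiring are assumptions for illustration; only the `optimize(disable: bool = False)` signature comes from the diff above.

```python
# Hypothetical demo-only flag; the name and wiring are illustrative.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-optimize",
    action="store_true",
    help="skip torch.compile so varying input shapes never hit compiled graphs",
)
args = parser.parse_args()

# In app.py this would gate the existing call, whose signature appears in
# the diff above (def optimize(self, disable: bool = False)):
# model.optimize(disable=args.no_optimize)
print("optimization disabled:", args.no_optimize)
```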
Author

"Warm up VoxCPMModel...
Contributor

Is this related to #201?
```diff
             self.feat_decoder.estimator = torch.compile(
-                self.feat_decoder.estimator, mode="reduce-overhead", fullgraph=True
+                self.feat_decoder.estimator, mode="default"
             )
         except Exception as e:
             print(f"Warning: torch.compile disabled - {e}", file=sys.stderr)
```
Contributor

Could you share a bit more context on this `torch.compile` change? The switch from `reduce-overhead`/`fullgraph=True` to `mode="default"` looks reasonable if this was running into dynamic-shape or CUDA graph issues, but it would be helpful to know what concrete failure or regression prompted it, and whether the new setting was tested for any noticeable perf impact.
Author

We could just ignore the change to app.py.
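As an aside, there is a middle ground that neither comment above proposes (purely an assumption on my part, not part of this PR): keep compilation but declare the shapes dynamic up front, so the compiler uses symbolic shapes instead of specializing on each new input length. A minimal sketch:

```python
# Sketch only; dynamic=True asks the compiler to treat shapes symbolically
# instead of recompiling per input length.
import torch

def estimator(x: torch.Tensor) -> torch.Tensor:
    return x.softmax(dim=-1)

compiled = torch.compile(estimator, mode="default", dynamic=True)
for frames in (128, 365):
    compiled(torch.randn(1, frames, 32))
```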