Some questions about CausalConvTranspose1d in conv_layer.py #11

Closed
Jacksonroad opened this issue Nov 9, 2023 · 3 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@Jacksonroad

Hello, thanks for the useful code. I don't understand the CausalConvTranspose1d class. Why does it use nn.ReplicationPad1d for the streaming pad, unlike CausalConv1d, which pads with a constant 0? Also, in CausalConvTranspose1d, self.pad_length is always 1 regardless of kernel_size, whereas in CausalConv1d self.pad_length depends on kernel_size.
Is self.pad_length unrelated to kernel_size in CausalConvTranspose1d? Should self.pad_length stay unchanged when we change the layer's other parameters?

bigpon added the enhancement and question labels Nov 9, 2023
@bigpon
Contributor

bigpon commented Nov 9, 2023

Hi,
Thanks for the question.
You are right. The pad_length should be related to both the kernel_size and stride.
Since we fix the ratio of kernel_size/stride = 2, we can fix the pad_length to 1.
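
One way to see why a fixed ratio of 2 gives a pad length of 1 is to count how many earlier input frames still write into the output block of the current frame. The sketch below is only an illustration, not the repository's code: it assumes a plain transposed convolution with no padding and no dilation, where input frame i writes to output samples i*stride through i*stride + kernel_size - 1, and the helper name `past_frames_needed` is hypothetical.

```python
def past_frames_needed(kernel_size, stride):
    """Count earlier input frames whose output span reaches the current block.

    Assumption: no padding and no dilation, so input frame i covers output
    samples i*stride ... i*stride + kernel_size - 1.
    """
    t = kernel_size  # any frame index large enough to have a full history
    block_start = t * stride  # first output sample owned by frame t
    needed = 0
    for i in range(t):  # strictly earlier frames only
        last_sample = i * stride + kernel_size - 1
        if last_sample >= block_start:
            needed += 1
    return needed


# With kernel_size / stride fixed to 2, exactly one past frame is needed.
for stride in (1, 2, 4, 8):
    print(stride, 2 * stride, past_frames_needed(2 * stride, stride))
```

Under these assumptions every line printed ends in 1, which matches the hard-coded pad_length. Once the ratio changes, the count changes too, which is why a fixed value only works for this setting.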

We hard-coded it because we followed the ParallelWaveGAN repo.
However, it would be better to make it flexible for arbitrary kernel_size and stride settings.
I may rewrite it later.

More details can be found in the following discussion.
kan-bayashi/ParallelWaveGAN#326
kan-bayashi/ParallelWaveGAN@25c4b9a

bigpon added a commit that referenced this issue Jan 3, 2024
1.	According to issue #9, we implement the codec version (activate_audiodec) with more activations like HiFiGAN and release the pre-trained model “symAAD_vctk_48000_hop300”.
2.	We fix the MSTFT 2D conv padding issues mentioned in issue #9 and release the updated “symADuniv_vctk_48000_hop300” and “AudioDec_v3_symADuniv_vctk_48000_hop300_clean”.
3.	We implement the more flexible CausalConvTranspose1d padding for arbitrary kernel_size and stride according to issue #11.
4.	We release a 24kbps model, “symAD_c16_vctk_48000_hop320”, which achieves better speech quality and robustness to unseen data.
@bigpon
Contributor

bigpon commented Jan 3, 2024

The self.pad_length of CausalConvTranspose1d has been updated to "(math.ceil(kernel_size/stride) - 1)" for arbitrary kernel_size and stride settings.
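
As a quick sanity check of the expression quoted above, the following sketch evaluates it for a few settings. The function names are hypothetical and this is not the repository's code; the integer-only variant is an equivalent form added here for illustration, since it avoids floating-point division.

```python
import math


def causal_transpose_pad_length(kernel_size, stride):
    # Expression quoted in the comment above.
    return math.ceil(kernel_size / stride) - 1


def causal_transpose_pad_length_int(kernel_size, stride):
    # Same value using integer arithmetic only (ceiling division).
    return (kernel_size + stride - 1) // stride - 1


for kernel_size, stride in [(4, 2), (16, 8), (7, 2), (5, 2), (3, 3)]:
    pad = causal_transpose_pad_length(kernel_size, stride)
    assert pad == causal_transpose_pad_length_int(kernel_size, stride)
    print(f"kernel_size={kernel_size}, stride={stride} -> pad_length={pad}")
```

The first two settings keep kernel_size/stride = 2 and give a pad_length of 1, the value that was previously fixed. The others give 3, 2, and 0 respectively.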
