
ConvTasNet pretrained huggingface model inference setup  #697

@Rodolfo-S


I'm trying to run inference with this pretrained ConvTasNet single-source enhancement model on Hugging Face, and I'm getting noticeably poor output.

I passed in an ~18.5 s, 16 kHz clean speech clip mixed with white Gaussian noise at -40 dB. The output seemed to have about the same SNR as the input, and the scaling ballooned well past +/-1 (max sample value around 1500). Additionally, the speech itself sounds slightly distorted.
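For reference, here's roughly how I created the mixture (a sketch; clean_speech.wav is a placeholder path, and I'm interpreting the -40 dB as noise RMS 40 dB below the speech RMS):

import numpy as np
import soundfile as sf

# Load the clean 16 kHz speech clip (placeholder filename).
clean, sr = sf.read("clean_speech.wav")

# White Gaussian noise 40 dB below the speech level (assumed interpretation).
speech_rms = np.sqrt(np.mean(clean ** 2))
noise = np.random.randn(len(clean)) * speech_rms * 10 ** (-40 / 20)

noisy_audio = (clean + noise).astype(np.float32)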

I should note that I also tried passing just the clean speech through the model and got similar results, as far as the added distortion goes.
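That clean-speech pass looked roughly like this (a sketch, assuming a direct call to the model without the overlap-add wrapper; clean is the array from the mixing snippet above):

import torch

model = torch.hub.load('mpariente/asteroid', 'conv_tasnet',
                       'JorisCos/ConvTasNet_Libri1Mix_enhsingle_16k')
model.eval()

# Asteroid models accept (batch, time) and return (batch, n_src, time).
with torch.no_grad():
    est = model(torch.from_numpy(clean[None, :]).float())

print(est.shape, est.abs().max())  # the distortion is audible even on this pass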

I'm trying to figure out whether I've configured everything correctly to run inference with LambdaOverlapAdd. I mostly used the "Process large audio files" notebook as a reference. Here's my code:

import torch
from asteroid.dsp.overlap_add import LambdaOverlapAdd

# Encoder parameters from the model config on the Hugging Face page.
kernel_size = 32
stride = 16

model = torch.hub.load('mpariente/asteroid', 'conv_tasnet',
                       'JorisCos/ConvTasNet_Libri1Mix_enhsingle_16k')
model.eval()

# Wrap the model for chunked inference over the long input.
continuous_nnet = LambdaOverlapAdd(
    nnet=model,
    n_src=1,
    window_size=kernel_size,
    hop_size=stride,
    window=None,
    reorder_chunks=False,
)

# LambdaOverlapAdd expects a (batch, channels, time) tensor.
in_tensor = torch.from_numpy(noisy_audio[None, None, :]).float()
with torch.no_grad():
    out_tensor = continuous_nnet(in_tensor)

out_wav = out_tensor.numpy().squeeze()

Here noisy_audio is the 1-D noisy speech signal as a float32 NumPy array, and window_size and hop_size were inferred from the config provided on the model's Hugging Face page.
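For completeness, this is roughly how I'm inspecting and writing the result (a sketch; the peak normalization is only there so the file is playable, not a fix):

import soundfile as sf

peak = abs(out_wav).max()
print("output peak:", peak)  # around 1500 here, far outside [-1, 1]

# Peak-normalize only to make the file playable.
sf.write("enhanced.wav", out_wav / max(peak, 1.0), 16000)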

Is there something I'm missing or doing wrong here?
