Refactoring of the 'fit_batch' function by Adel-Moumen · Pull Request #2010 · speechbrain/speechbrain

Adel-Moumen · 2023-05-31T13:31:29Z

This PR aims to fix the fp16 NaN issue with the CTC CommonVoice wav2vec recipe (#1803).

The problem came from the loss. Basically, we were not skipping the accumulation of the gradients when the loss was NaN, which led to NaN gradients. This PR is based on PR #2018.

I also added bfloat16 support.

This PR renames check_gradients to check_loss_isfinite (#1496).

WIP results:
FP16 A100 80Gb - CommonVoice FR - Wav2vec2
100%|██████████████████████████████████████████████████████████████████████████| 17829/17829 [1:35:12<00:00, 3.12it/s, train_loss=0.367]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 3978/3978 [05:22<00:00, 12.32it/s]
speechbrain.utils.train_logger - epoch: 1, lr_model: 1.00e+00, lr_wav2vec: 1.00e-04 - train loss: 3.67e-01 - valid loss: 1.50e-01, valid CER: 3.89, valid WER: 12.58

Related issues:
#1517 #1803 #1496 #2085

Related code:
ESPnet's way of handling NaNs: https://github.com/espnet/espnet/blob/5d0758e2a7063b82d1f10a8ac2de98eb6cf8a352/espnet/asr/pytorch_backend/asr.py#L301 -> if a backward pass leads to NaN gradients, they still continue to accumulate. There is no such thing as a "valid_step". This is a simpler approach, but it can lead to "wrong" accumulation steps: if grad_accu is set to 3 and one backward pass during training leads to NaN, the optimizer is stepped with the gradients of only 2 backward passes, not 3 as expected.

Furthermore, they are not removing the loss from the autocast region. They are also using the "old" way of handling autocast, not the new one with GradScaler.

NeMo's way of handling NaNs/Infs: https://github.com/NVIDIA/NeMo/blob/1c10b3677040cde4785ea73f7d9ec887578885c7/nemo/collections/asr/models/asr_model.py#L133 -> basically, they set the gradients to zero.
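A minimal sketch of the "set the gradients to zero" idea in plain PyTorch. It is not a copy of the NeMo code: the helper name `zero_grads_if_nonfinite` is hypothetical, and zeroing every gradient as soon as one of them is non-finite is an assumption made for illustration.

```python
import torch


def zero_grads_if_nonfinite(module: torch.nn.Module) -> bool:
    """Zero all gradients if any of them holds a NaN/Inf (hypothetical helper).

    Returns True when the gradients had to be zeroed.
    """
    grads = [p.grad for p in module.parameters() if p.grad is not None]
    found_bad = any(not torch.isfinite(g).all() for g in grads)
    if found_bad:
        for g in grads:
            g.zero_()
    return found_bad


layer = torch.nn.Linear(2, 1)
layer(torch.ones(1, 2)).sum().backward()
layer.weight.grad[0, 0] = float("nan")  # simulate a corrupted gradient
was_zeroed = zero_grads_if_nonfinite(layer)
```

With zeroed gradients a plain SGD step becomes a no-op, but optimizers with momentum or weight decay may still move the parameters, so this is not equivalent to skipping the step.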

PyTorch Discussion which indicates that the loss should be excluded from the fp16 region.
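As a rough illustration of keeping the loss out of the reduced-precision region, the sketch below runs the forward pass under `torch.autocast` and computes the loss afterwards in float32. It runs on CPU with bfloat16 so that it needs no GPU; the function name `forward_then_loss`, the toy model, and the cross-entropy loss are assumptions, not the code of this PR.

```python
import torch


def forward_then_loss(model, inputs, targets, device_type="cpu", dtype=torch.bfloat16):
    # Forward pass inside the autocast region (reduced precision where safe).
    with torch.autocast(device_type=device_type, dtype=dtype):
        logits = model(inputs)
    # The loss is computed outside the autocast region, on float32 values.
    return torch.nn.functional.cross_entropy(logits.float(), targets)


model = torch.nn.Linear(8, 4)
inputs = torch.randn(2, 8)
targets = torch.tensor([1, 3])
loss = forward_then_loss(model, inputs, targets)
loss.backward()
```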

Adel-Moumen · 2023-06-20T10:01:34Z

    def check_loss_isfinite(self, loss):
        """Check if loss is finite and that gradients are not too large.
        Automatically clips large gradients.

        Arguments
        ---------
        loss : tensor
            The loss tensor after ``backward()`` has been called but
            before the optimizers ``step()``.

        Returns
        -------
        bool
            Whether or not the optimizer step should be carried out.
        """
        if not torch.isfinite(loss):
            self.nonfinite_count += 1

            # Print helpful debug info
            logger.warning(f"Loss is {loss}.")
            for p in self.modules.parameters():
                if not torch.isfinite(p).all():
                    logger.warning("Parameter is not finite: " + str(p))

            # Check if patience is exhausted
            if self.nonfinite_count > self.nonfinite_patience:
                raise ValueError(
                    "Loss is not finite and patience is exhausted. "
                    "To debug, wrap `fit()` with "
                    "autograd's `detect_anomaly()`, e.g.\n\nwith "
                    "torch.autograd.detect_anomaly():\n\tbrain.fit(...)"
                )
            else:
                logger.warning(
                    "Patience not yet exhausted, ignoring this batch."
                )
                return False

        if self.max_grad_norm > 0.0:
            torch.nn.utils.clip_grad_norm_(
                (p for p in self.modules.parameters()), self.max_grad_norm
            )

        return True

There's something that I don't understand. This function is doing too much. First, it checks that the loss is not NaN, then it checks the gradients, and finally it performs gradient clipping. However, since PR #2018, we use this function to check whether the loss is finite and, if so, we backpropagate. Doing so skips the clipping / gradient-checking part, because we are adding gradients AFTER having clipped/checked them... Furthermore, I saw many recipes that use check_loss_isfinite just to check/clip gradients (and not to check the loss).

I propose one solution: 1. rename check_loss_isfinite back to check_gradients, and 2. move the "if not torch.isfinite(loss)" part into a new function called check_loss_isfinite, such as:

def check_loss_isfinite(self, loss):
    """Check if loss is finite and that gradients are not too large.
    Automatically clips large gradients.

    Arguments
    ---------
    loss : tensor
        The loss tensor after ``backward()`` has been called but
        before the optimizers ``step()``.

    Returns
    -------
    bool
        Whether or not the optimizer step should be carried out.
    """
    if not torch.isfinite(loss):
        self.nonfinite_count += 1
        # Check if patience is exhausted
        if self.nonfinite_count > self.nonfinite_patience:
            raise ValueError(
                "Loss is not finite and patience is exhausted. "
                "To debug, wrap `fit()` with "
                "autograd's `detect_anomaly()`, e.g.\n\nwith "
                "torch.autograd.detect_anomaly():\n\tbrain.fit(...)"
            )
        else:
            logger.warning(
                "Patience not yet exhausted, ignoring this batch."
            )
        return False
    else:
        return True


def check_gradients(self):
    """Check that gradients are finite and not too large.
    Automatically clips large gradients.
    """
    for p in self.modules.parameters():
        if not torch.isfinite(p).all():
            logger.warning("Parameter is not finite: " + str(p))

    if self.max_grad_norm > 0.0:
        torch.nn.utils.clip_grad_norm_(
            (p for p in self.modules.parameters()), self.max_grad_norm
        )

The idea is to be more versatile.

There's also some work to be done on improving the docstring, which can be confusing (e.g., "Check if loss is finite and that gradients are not too large." is not specific enough).

TL;DR: The idea would be to have the following steps:
step 1. check that the loss is finite
step 2. if the loss is finite, run backward
step 3. check the gradients (because they can still be NaN) + clip
step 4. step the optimisers

Instead of:
step 1. check that the loss is finite + check the gradients + clip
step 2. if the loss is finite, run backward (WARNING: we need to re-check/clip the gradients, but most recipes forget about that because of the functions we currently have...)
step 3. step the optimisers
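The proposed four-step ordering could be sketched as follows for a single batch without gradient accumulation. This is an illustration only: `train_step` and its arguments are hypothetical names, not SpeechBrain functions.

```python
import torch


def train_step(model, optimizer, loss_fn, inputs, targets, max_grad_norm=5.0):
    """Return True if the optimizer was stepped, False if the batch was skipped."""
    loss = loss_fn(model(inputs), targets)

    # Step 1: check the loss before any gradient is produced.
    if not torch.isfinite(loss):
        return False

    # Step 2: backward only for a finite loss.
    loss.backward()

    # Step 3: gradients can still be non-finite, so check them, then clip.
    grads_ok = all(
        bool(torch.isfinite(p.grad).all())
        for p in model.parameters()
        if p.grad is not None
    )
    if not grads_ok:
        optimizer.zero_grad()
        return False
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    # Step 4: step the optimizer and reset the gradients.
    optimizer.step()
    optimizer.zero_grad()
    return True
```

The point of the ordering is that clipping and gradient checks act on the gradients that will actually be used by the step, which is not the case when they run before `backward()`.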

What's your view on that please @TParcollet @asumagic (@pplantinga @RuABraun as you were in #1496) ?

asumagic · 2023-06-21T14:18:27Z

So, we discussed this a little at the lab, and the tl;dr of the steps above seems correct. There are still a bunch of implementation details to figure out, and we determined a few things. I'm mostly dumping what we talked about and what I think about it.

For context, here, when referring to "stepping", we mean running the optimizer step function after we accumulated as many gradients as we desired (if using that).

So basically, to reiterate, the steps while backwarding/accumulating gradients could become (more details later):

1. If using DDP, disable gradient sync for all but the last time we are accumulating gradients before a step
2. Run the forward pass
3. Check for non-finite loss: if bad, skip the batch
4. Run the backward pass
5. Check for non-finite gradients: if bad, flag the step as broken
6. Clip gradients

Then, at stepping time:

1. Check if the step is flagged as broken; if so, zero the gradient and quit early
2. Step the optimizer
3. Update the AMP scaler
4. Zero the gradient
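A minimal fp32 sketch of the two phases listed above (accumulating, then stepping), leaving out DDP and the AMP scaler. The class `AccumulationSketch` and all of its method names are hypothetical; they only mirror the description and are not part of SpeechBrain.

```python
import torch


class AccumulationSketch:
    """Accumulate gradients over several batches, then step (illustration only)."""

    def __init__(self, model, optimizer, grad_accumulation_factor=2, max_grad_norm=5.0):
        self.model = model
        self.optimizer = optimizer
        self.factor = grad_accumulation_factor
        self.max_grad_norm = max_grad_norm
        self.batches_seen = 0
        self.step_is_broken = False

    def accumulate(self, loss):
        """Backward one batch. Returns False if the batch was skipped."""
        self.batches_seen += 1
        if not torch.isfinite(loss):
            return False  # skip the batch, but not the whole step
        (loss / self.factor).backward()
        for p in self.model.parameters():
            if p.grad is not None and not torch.isfinite(p.grad).all():
                self.step_is_broken = True  # cannot be undone, flag the step
        if not self.step_is_broken:
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.max_grad_norm)
        return True

    def should_step(self):
        return self.batches_seen % self.factor == 0

    def step(self):
        """Step unless the step was flagged as broken. Returns True if stepped."""
        stepped = not self.step_is_broken
        if stepped:
            self.optimizer.step()
        self.optimizer.zero_grad()
        self.step_is_broken = False
        return stepped
```

In this sketch a skipped batch still counts towards `should_step()`, so a step may use fewer gradients than the accumulation factor; counting only successful accumulations would be an alternative design.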

Handling non-finite losses and gradients

Our current code to detect non-finite gradients is incorrect. It checks for the model parameters, not for their gradients. At this point, the parameters have already been "corrupt" (driven to NaN/+-infinity), so we can't recover that.
- It might still be useful to track non-finite parameters, to be fair?
- The check also only occurs if the loss was detected to be non-finite so it does not really do what it's advertised for.
We probably should use a counter for successful gradient accumulations (mostly because of the next point).
If we detect a non-finite loss, then we should skip the batch (but not the step). At this point, we have not run the backward pass yet, so the gradient is not corrupt. We can carry on with our step and not backpropagate the corrupt loss.
The patience mechanism would only be applied to non-finite loss detection, unless we believe it makes sense to apply it to non-finite gradients too.
If we detect non-finite gradients, then our gradient is "corrupt", which we cannot undo. Thus, we mark the step as corrupt, and only .zero_grad() will be called at stepping time.
- Yes, it is suboptimal as we could technically discard the step immediately. It is however not a big deal and keeps things simpler when we think about DDP, etc.
There is some overlap with what AMP scalers do. Presumably, non-finite gradients would already be skipping the step. This is not the case for fp32, where we use no scaler, so it still makes sense to handle in SB.
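To make the parameters-versus-gradients distinction concrete, here is a small sketch in which the parameters stay finite while a gradient does not. The helper name `nonfinite_gradient_names` is hypothetical, and the loss is an artificial one chosen only because its gradient is NaN while its value is finite.

```python
import torch


def nonfinite_gradient_names(module):
    """Names of parameters whose gradient (not value) contains NaN/Inf."""
    return [
        name
        for name, p in module.named_parameters()
        if p.grad is not None and not torch.isfinite(p.grad).all()
    ]


layer = torch.nn.Linear(2, 1)
# sqrt(0) is finite, but its derivative is infinite, and inf * 0 gives NaN.
loss = torch.sqrt((layer.weight * 0).sum())
loss.backward()

params_finite = all(bool(torch.isfinite(p).all()) for p in layer.parameters())
bad_grads = nonfinite_gradient_names(layer)
```

A check on `p` alone reports nothing wrong here; only a check on `p.grad` catches the problem before the optimizer step writes it into the parameters.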

Gradient clipping

We can apply it either after every batch, or we can apply it when stepping. It can make sense to be done after every batch since it reduces the odds that gradients explode to infinity, in particular with fp16.
@Adel-Moumen : There could be a good reason for doing it at step time: gradient clipping requires us to unscale the gradient when using AMP. I don't know how much of a problem it would be in practice; we would need to check.
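For reference, the usual PyTorch pattern for clipping under AMP unscales the gradients once, right before clipping, at step time. The sketch below follows that pattern; the toy model, the loss, and `max_grad_norm = 1.0` are assumptions, and the scaler is simply disabled when no GPU is available so that the same code also runs on CPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"
amp_dtype = torch.float16 if use_amp else torch.bfloat16

model = torch.nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
max_grad_norm = 1.0

inputs = torch.randn(8, 4, device=device)
with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_amp):
    outputs = model(inputs)
loss = outputs.float().pow(2).sum() * 100.0

scaler.scale(loss).backward()  # gradients are scaled at this point
scaler.unscale_(optimizer)     # bring them back to their true magnitude
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
clipped_norm = torch.stack([p.grad.norm() for p in model.parameters()]).norm()
scaler.step(optimizer)         # skips the step if gradients are non-finite
scaler.update()
optimizer.zero_grad()
```

`unscale_` may only be called once per optimizer between two `update()` calls, which is one reason clipping after every accumulated batch is harder to combine with a scaler than clipping at step time.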

API design

Lots of recipes have a lot of duplicated code, so we should be careful about API design so that it is modular enough and easy to override when really needed (which should be rare). So we should study the recipes to make sure we come up with a good design.
- This is so easy to get wrong that we should ensure that it is difficult to use things incorrectly. For the most part, that is solved by making the API flexible and natural enough.
- Example: With wav2vec models we might have split optimizers. In our new stock stepping functions, it would be trivial and clean to handle this use case by taking in a list of optimizers.
We make a clear cut between accumulating gradients, and stepping the optimizer. This would split most of the backwards logic to two relatively self-contained functions.
Most of the new functionality would be included as Brain methods.
Be careful about naming, IMO:
- A function named check_something should not include patience logic inside, unless explicitly stated in the function name (merely "checking" feels like it should not carry any state)
As stated in [Bug]: DDP is going blblblblblblblbl #1802, our no_sync DDP logic is broken, resulting in extraneous synchronization. This may influence API design here if we want to end up with something clean.
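One way to keep the no_sync handling in a single place is a small helper that returns `no_sync()` only for DDP-wrapped modules and only when the current batch is not the last one before a step. This is a sketch under assumptions: `accumulation_context` is a hypothetical name, and the demo uses a plain module because wrapping in DDP requires an initialized process group.

```python
import contextlib

import torch
from torch.nn.parallel import DistributedDataParallel


def accumulation_context(module, is_last_accumulation):
    """Skip gradient all-reduce except on the last accumulation before a step."""
    if isinstance(module, DistributedDataParallel) and not is_last_accumulation:
        return module.no_sync()
    return contextlib.nullcontext()


model = torch.nn.Linear(2, 1)  # plain module here, so the context is a no-op
grad_accumulation_factor = 3
for i in range(grad_accumulation_factor):
    is_last = i == grad_accumulation_factor - 1
    with accumulation_context(model, is_last):
        loss = model(torch.ones(1, 2)).sum() / grad_accumulation_factor
        loss.backward()
```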

For example the recipe code for the dual-optimizer wav2vec code could roughly look like:

active_optimizers = [self.adam_optimizer]
if not self.hparams.wav2vec2_frozen:
    active_optimizers.append(self.wav2vec_optimizer)

#...

if self.safe_backward(loss):
    if self.should_step():
        self.step_optimizer(active_optimizers)

Also OFC it might make sense to check what other toolkits do.

pplantinga · 2023-06-23T01:50:57Z

This seems well thought out, and I agree with most of the points. I wonder if we could add a basic test with a ridiculously high learning rate to see if it works as expected.

Adel-Moumen · 2023-06-23T13:09:35Z

Yes, good point. Currently, we do have the recipe testing, but it only tests the data flow. We also have some tests, such as the quaternion one, that test whether the model is able to overfit or not (related to what you said). IMHO, we should not start thinking about more tests at this point (or at least tests similar to the recipe data-flow testing), as we already have some. We can think about this after the next release.

Adel-Moumen · 2023-06-23T13:13:19Z

Gentle ping @TParcollet, what do you think about this #2010 (comment) ?

Btw, @asumagic, I'd like to have something even more generic than what you are proposing. I would prefer to override a function, for example step_optimizer(self), so that we don't have to rewrite fit_batch(self, batch) each time. So we would have the generic code in core.py, and then we would only override specific functions such as step_optimizer.
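A rough sketch of that idea: the generic `fit_batch` lives in a base class and a recipe overrides only the optimizer-stepping hook. `BrainSketch` and `TwoOptimizerBrain` are hypothetical stand-ins and do not reproduce the actual Brain class in core.py.

```python
import torch


class BrainSketch:
    """Hypothetical stand-in for a Brain-like base class."""

    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer

    def compute_loss(self, batch):
        inputs, targets = batch
        return torch.nn.functional.mse_loss(self.model(inputs), targets)

    def step_optimizer(self):
        # Default behaviour: a single optimizer. Recipes override only this.
        self.optimizer.step()
        self.optimizer.zero_grad()

    def fit_batch(self, batch):
        # Generic logic shared by all recipes; no need to rewrite it.
        loss = self.compute_loss(batch)
        loss.backward()
        self.step_optimizer()
        return loss.detach()


class TwoOptimizerBrain(BrainSketch):
    """Recipe with two optimizers: only the stepping hook changes."""

    def __init__(self, model, optimizer, extra_optimizer):
        super().__init__(model, optimizer)
        self.extra_optimizer = extra_optimizer

    def step_optimizer(self):
        for opt in (self.optimizer, self.extra_optimizer):
            opt.step()
            opt.zero_grad()
```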

asumagic · 2023-06-23T13:20:59Z

Could make sense, I'm not sure how many recipes would not need a custom fit_batch though. Seems like a bunch of them could, and that possibly would fix a few bugs/automatically add things like AMP support...

pplantinga · 2023-06-23T13:25:40Z

Hard agree that we should add the support needed to avoid overriding fit_batch where possible.

Adel-Moumen · 2023-06-23T13:59:55Z

As you can see in the PR, I started to change a bunch of recipes, and basically what I saw is that A LOT of recipes are actually doing the exact same things over and over. I really think that it's possible to have something generic. The only recipes that will have to change something are the ones with multiple optimisers. (But we will have to look more closely.)

asumagic · 2023-07-04T14:39:06Z

This is the mockup/prototype document I wrote that we discussed in the core meeting: https://gist.github.com/AsuMagic/8faaf3c15b24e845ab1767d06f1eb2d2

TParcollet

I like the changes. @Adel-Moumen please go ahead with this PR, merge it once tested and then open a new one with your idea.

pplantinga · 2023-08-18T19:44:32Z

Any progress on this?

Adel-Moumen · 2023-08-18T21:27:09Z

Hello, no progress so far. I will work on this after finishing the CTC and Rescoring PRs.

mravanelli · 2023-11-14T00:45:51Z

Hi @Adel-Moumen, I ran the recipe test and pushed some fixes. However, there are still some recipe tests failing (mainly because the new recipes should be converted to the new fit_batch). In particular, I detected the following issues:

recipes/CVSS/S2ST/hparams/train_fr-en.yaml => convert to new fit_batch
recipes/IWSLT22_lowresource/AST/transformer/hparams/* => convert to new fit_batch
recipes/LJSpeech/TTS/vocoder/hifi_gan_unit/hparams/train.yaml => convert to new fit_batch
recipes/Tedlium2/ASR/transformer/hparams/branchformer_large.yaml => convert to new fit_batch
recipes/LibriTTS/TTS/mstacotron2/hparams/train.yaml. Here the issue is the following:

Traceback (most recent call last):
  File "/workspace/speechbrain/recipes/LibriTTS/TTS/mstacotron2/train.py", line 644, in <module>
    tacotron2_brain.fit(
  File "/workspace/speechbrain/speechbrain/core.py", line 1421, in fit
    self._fit_valid(valid_set=valid_set, epoch=epoch, enable=enable)
  File "/workspace/speechbrain/speechbrain/core.py", line 1335, in _fit_valid
    self.on_stage_end(Stage.VALID, avg_valid_loss, epoch)
  File "/workspace/speechbrain/recipes/LibriTTS/TTS/mstacotron2/train.py", line 363, in on_stage_end
    train_stats=self.last_loss_stats[sb.Stage.TRAIN],
                ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: <Stage.TRAIN: 1>

Adel-Moumen · 2023-11-15T21:14:31Z

Hello @mravanelli, thanks for running the tests!

For CVSS: the issue is not related to this PR but comes from the fact that the HF hub is not yet in sync with the new beam search:

ImportError: There is no such class as speechbrain.decoders.seq2seq.S2STransformerBeamSearch

For IWSLT22_lowresource, one issue was related to this PR, and the others were related to the keyword --skip_prep that was set twice in tests/recipes :

(1/6) Running test for IWSLT22_lowresource_row_02...
... 27.61s
(2/6) Running test for IWSLT22_lowresource_row_03...
... 91.50s
(3/6) Running test for IWSLT22_lowresource_row_04...
... 245.26s
(4/6) Running test for IWSLT22_lowresource_row_05...
... 96.67s
(5/6) Running test for IWSLT22_lowresource_row_06...
... 146.97s
(6/6) Running test for IWSLT22_lowresource_row_07...
... 386.77s
TEST PASSED

For LJSpeech: done. It was related to a missing dict.

For Tedlium : done. We were using the wrong Beam Search class / passing an outdated parameter in the yaml file / using device in the load_collected() method / using device in the avg checkpoint method / and we were not handling the return output of the new beam search correctly.

(1/2) Running test for Tedlium2_row_02...
... 4.56s
(2/2) Running test for Tedlium2_row_03...
... 28.34s
TEST PASSED

For mstacotron2: it does not seem to be linked to my PR :/

mravanelli · 2023-11-16T17:12:10Z

I did the last tests and everything looks now fine to me. Thank you @Adel-Moumen for working on this PR. This is another remarkable step toward speechbrain 1.0.

fix typo in yaml file * fix doctests (those that we do not run in the CI) --------- Co-authored-by: Mirco Ravanelli <[email protected]> * change emdedding type from long to float to vaoid getting al zeros embedding (#2292) * Update CVSS (#2285) * Update CVSS * Update train_fr-en.yaml * Update train_fr-en.yaml * Update HF interface (#2293) * RNN Tranducer Numba Loss: Add FP16 and BF16 support (code from Samsung AI Cambridge) (#2296) * Make lobes use fp32 when AMP is active (#2295) * Added utils.autocast with a fwd_default_precision function * Decorate all lobes to require float32 precision in AMP * Fix trailing space in docstring * Less confusing doc for fwd_default_precision * Be explicit that only fp inputs are affected by fwd_default_precision * Typo in docstring * Remove dtype annotation that is broken for some reason * Precommit checks will be the end of me * Fix tests * Add docstring to precision wrapper function * Fix style check again.. * adding support for fp16 transducer loss numba * adding support for fp16 transducer loss numba * fix fp16 transducer recipe * add note on half precision --------- Co-authored-by: asu <[email protected]> Co-authored-by: Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics <[email protected]> Co-authored-by: Mirco Ravanelli <[email protected]> * Fix recipe tests for TransformerASR (#2282) * fix position embedding (#2283) * fix position embedding * use speechbrain internal postional encoding and generate mask from sequence lengths * call mask function from core for tacotron * minor fix * fix device * reduce training epochs * update links --------- Co-authored-by: Mirco Ravanelli <[email protected]> * Gradscaler flags (#2281) * add flags for gradscaler * add check_loss_isfinite * update dict * typo * remove default * better message * fix pre-commit * remove checks * remove new arguments --------- Co-authored-by: Mirco Ravanelli <[email protected]> * add llama2 recipies (#2299) * add llama2 recipies * fix symbolic 
links * fix bug * remove unneccary input in docstring * fix typo * cleaning llama2 recepies * update readme * update interface and add licence to readme * fic doc string * fix precommit * fix extra-dependency * remove commented lines * inter epoch checkpoint * minor fixes * add extra req info in llama.py * fix linters --------- Co-authored-by: Mirco Ravanelli <[email protected]> * small fixes * make all recipes cpu-compliant + make recipe tests passing on both cpu and gpu * fix some broken links * remove link to private HF repo * remove link to private HF repo * fix libritts recipe test * fix ljspeech recipe test * Streamable Conformer-Transducer ASR model for LibriSpeech (#2140) * Introduce DCT+DCConv logic * DDP fix? * Batch of changes and things brought back * Streaming fixes (successfully trains) * WIP streaming code * WIP functional streaming code * Fix left context * Fix formatting * Cleanups and docs in streaming utils * Better comment hparams, change seed back to orig, improve naming * uncomment averaging stuff; it was some ipython issue * Remove pin_memory as it was not beneficial * More cleanups, comments on context stuff * More comments and TODOs * encode_streaming docstring * Dirty TransducerBeamSearcher change for streaming GS * Fix precommit * Fix encoders that do not support chunk_size * Pre-commit again * Make chunk_size type consistent * Fix formatting of doctest in split_wav_lens * Remove outdated TODO * Add hasattr streaming to retain model backcompat * Cleanup doc and naming for transducer_greedy_decode * Cite paper for chunked attention * Remove lost comment * Update comment in self-attention * Don't apply masked fill fix in the non-bool mask case * Added TODO README update * Revert change to custom_tgt_module; patching model instead * Remove added entry in README * Fix streaming conformer conv mismatch * More conformer conv adjustments * Adjust context size * Remove outdated comment * Fixed causal conformer decoder * Fix linting * Gate 
`custom_tgt_module` creation behind the presence of decoder layers * Re-enable checkpoint averaging * Change averaged ckpt count to 10 * Add new model results to README * WIP refactor: Introduce DCTConfig dataclass * Improved notice in README * Formatting and linting fixes * Attempt at fixing circular import? * utils can't depend on core it seems; move dct * Whoops, missed file * Add DCT test, fix issues * Remove now obsolete yaml variables for streaming * Formatting * Add dummy dct_config parameter to keep unsupported encoders working * Linting fix * Fix typo * Add note on runtime autocast accuracy * Fix very bad typo from refactor in YAML * Fix hasattr streaming check * Remove legacy comment * Fix left context size calculation in new mask code * Fix causal models in TransformerASR * Remove comment on high-level inference code * YAML formatting + commenting dynchunktrain stuff * Remove outdated comment about DCConv left contexts * Remove commented out debug prints from TransformerASR * Move DCT into utils again * Rename all(?) 
mentions of DCT to explicit dynamic chunk training * Clarify padding logic * Remove now-useless _do_conv, fix horrible formatting * Slightly fix formatting further * Add docstrings to forward_streaming methods * Add a reference on Dynamic Chunk Training * Rework conformer docstring docs * Update conformer author list, fix doc formatting for authors * Fix trailing whitespace in conformer * Improved comments in Conformer.forward * Added random dynchunktrain sampler example * More explicit names for mask functions in TransformerASR * Added docstring example on encode_streaming * Pre-commit fix * Fix typo in conformer * Initial streaming integration test * Precommit fix * Fix indent in YAML * More consistent spelling in streaming integration test * Update CommonVoice.csv * Add KenLM n-gram training recepie (#2304) * add kenlm training * fix precommit * update readmefile with new result * fix pre-commit * fix typo * fix commit reviews * fix bug in testing * add docstring and fix indentation * fix bug in ASR interface * change encoderasr interface to support ctc beam * add suppourt fro kenlm in enoderasr interface * fix typo * little changes in REAMDE files to improve clarity) * use binaries sources in bashrc * fix trailing-whitespace --------- Co-authored-by: Mirco Ravanelli <[email protected]> * Create Performance file (automatically) (#2314) * add performance readme builder * update recipe csv files * update README files * add not in prerelease test * added performance.md * fix linters * update info in README * Llama2 interface bug (#2318) * fix llama2 interface bug * fix minor bug * update multiwox.csv with correct db and HF link * New README file (#2315) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Optimize masked Dynamic Chunk Convolution (#2308) * Reorganized some conformer convolution module to be 
faster * Completely get rid of the list of slices in the conformer conv module * Fix linter check * Remove unused variable * More unused variables.. * Remove unused import * Add conformer streaming code path test * Fix test formatting * small fixes in tests * Update RNNLM.yaml * BayesSpeech (#2326) * Create train_bayesspeech.py * Create bayesspeech.yaml * Update README.md * Update LibriSpeech.csv * add extra-req --------- Co-authored-by: Mirco Ravanelli <[email protected]> * adding new controllable exp scheduler * adding new controllable exp scheduler * update performance file * Update PERFORMANCE.md * Update README.md --------- Co-authored-by: mhn226 <[email protected]> Co-authored-by: Adel Moumen <[email protected]> Co-authored-by: Adel Moumen <[email protected]> Co-authored-by: Ha Nguyen <[email protected]> Co-authored-by: flexthink <[email protected]> Co-authored-by: flexthink <[email protected]> Co-authored-by: Pooneh Mousavi <[email protected]> Co-authored-by: shubham-gupta-30 <[email protected]> Co-authored-by: Shubham Gupta <[email protected]> Co-authored-by: Parcollet Titouan <[email protected]> Co-authored-by: asu <[email protected]> Co-authored-by: Titouan Parcollet/Embedded AI /SRUK/Engineer/Samsung Electronics <[email protected]> Co-authored-by: Luca Della Libera <[email protected]> Co-authored-by: Yingzhi WANG <[email protected]> Co-authored-by: BenoitWang <[email protected]>

This was referenced Jun 12, 2023

Change gradient accumulation logic #2018

Merged

[Bug]: CTC and w2v2 with fp16 is going NaN #1803

Closed

mravanelli requested a review from TParcollet June 17, 2023 14:11

Adel-Moumen marked this pull request as ready for review June 19, 2023 14:20

Adel-Moumen requested a review from asumagic June 19, 2023 14:21

asumagic self-assigned this Jun 20, 2023

Adel-Moumen marked this pull request as draft June 20, 2023 10:03

Adel-Moumen assigned TParcollet Jun 20, 2023

asumagic mentioned this pull request Jun 21, 2023

[Bug]: DDP is going blblblblblblblbl #1802

Closed

Adel-Moumen mentioned this pull request Jun 27, 2023

Can NOT speed up with DDP #1517

Closed

Adel-Moumen changed the title ~~1803 bug ctc and w2v2 with fp16 is going nan + add bf16~~ Fix CTC loss NaN + refactoring of the 'fit_batch' function + fix DDP Jun 29, 2023

TParcollet marked this pull request as ready for review July 7, 2023 15:29

TParcollet approved these changes Jul 7, 2023

Adel-Moumen changed the title ~~Fix CTC loss NaN + refactoring of the 'fit_batch' function + fix DDP~~ [WIP] Fix CTC loss NaN + refactoring of the 'fit_batch' function + fix DDP Aug 18, 2023

Adel-Moumen assigned Adel-Moumen and unassigned TParcollet Aug 31, 2023

asumagic removed their assignment Sep 1, 2023

Adel-Moumen mentioned this pull request Sep 2, 2023

ValueError: Loss is not finite and patience is exhausted. #2085

Closed

Adel-Moumen closed this Sep 2, 2023

Adel-Moumen and others added 20 commits October 30, 2023 10:53

update yaml

db3babd

update batch smpler tedlium

8e16ee0

remove fit batch

43b5a40

Merge remote-tracking branch 'upstream/unstable-v0.6' into 1803-bug-ctc-and-w2v2-with-fp16-is-going-nan

969e893

precision flag

b13520d

update sampler

3590ec3

add precision inside of the yamls

1aec0eb

run_opt outside brain + test

934f58f

Merge remote-tracking branch 'origin/unstable-v0.6' into 1803-bug-ctc-and-w2v2-with-fp16-is-going-nan

c0693b7

fix auto_mix_prec flag

e3acdb6

docstring

07079eb

grad acc

01d5720

failing test

bf34e13

update unittests

dd3e39b

update jarod's pr

5d60820

fix removed avg_checkpoint param

d834a68

update path

3e283e5

Merge branch '1803-bug-ctc-and-w2v2-with-fp16-is-going-nan' of https://github.com/Adel-Moumen/speechbrain into 1803-bug-ctc-and-w2v2-with-fp16-is-going-nan

91ef4aa

Merge remote-tracking branch 'upstream/unstable-v0.6' into 1803-bug-ctc-and-w2v2-with-fp16-is-going-nan

63d91f3

fix some recipe tests

0e90e38

Adel-Moumen added 3 commits November 15, 2023 21:08

update samu recipe

1f2f2f0

fix hifigan/IWSLT

0f77f03

tedlium

0ea08b8

Merge remote-tracking branch 'upstream/unstable-v0.6' into 1803-bug-ctc-and-w2v2-with-fp16-is-going-nan

8a8ab8e

mravanelli merged commit 44dcf3d into speechbrain:unstable-v0.6 Nov 16, 2023

Adel-Moumen mentioned this pull request Jan 7, 2024

Brain's check_gradients seems to not do what name implies? #1496

Closed

Conversation

Adel-Moumen commented May 31, 2023 • edited

Adel-Moumen commented Jun 20, 2023 • edited

asumagic commented Jun 21, 2023

Handling non-finite losses and gradients

Gradient clipping

API design

pplantinga commented Jun 23, 2023

Adel-Moumen commented Jun 23, 2023

Adel-Moumen commented Jun 23, 2023

asumagic commented Jun 23, 2023

pplantinga commented Jun 23, 2023

Adel-Moumen commented Jun 23, 2023

asumagic commented Jul 4, 2023

TParcollet left a comment • edited

pplantinga commented Aug 18, 2023

Adel-Moumen commented Aug 18, 2023

mravanelli commented Nov 14, 2023

Adel-Moumen commented Nov 15, 2023

mravanelli commented Nov 16, 2023

7 participants
