Merge 2.3.x in main (RasaHQ#8064)
* restrict cosine and inner for ted, add warnings to remove cosine if used elsewhere

* add changelog and fix migration guide

* review

* reverted back auto-config

* completely remove cosine as model confidence

* reverted back auto-config

* use linear normalization for all model confidences

* remove ranking length

* normalized version of inner confidences

* add docs

* Apply suggestions from code review

* add dimension to reduce sum

* fix tests

* copy edits

* change inner to linear_norm_inner and add some tests

* rename to linear_norm, add docs

* edit docs in components

* formatting

* Update changelog/8014.bugfix.md

Co-authored-by: Vladimir Vlasov <[email protected]>

* bump version in docs

* remove redundant test

* Update GPU cluster

* Update zone

* Update resource request

* Add a missing secret

* Apply suggestions from code review

Docs review

Co-authored-by: Akela Drissner-Schmid <[email protected]>

* Update docs/docs/migration-guide.mdx

* more docs comments

* final edits

* format lists

* remove 'also' and fix typo in CI

* prepared release of version 2.3.4

* Update links to Sanic docs in our documentation

* Changelog entry

* Added 1.10.23 updates to 2.3.X Changelog (RasaHQ#8077)

* Updated 2.X changelog to include 1.10.23 update

* Update 2.3.X changelog.mdx

* Updated link formatting

* Update CHANGELOG.mdx

Co-authored-by: Tobias Wochinger <[email protected]>

Co-authored-by: Tobias Wochinger <[email protected]>

Co-authored-by: Vladimir Vlasov <[email protected]>
Co-authored-by: tczekajlo <[email protected]>
Co-authored-by: Akela Drissner-Schmid <[email protected]>
Co-authored-by: alwx <[email protected]>
Co-authored-by: Ben Quachtran <[email protected]>
Co-authored-by: Tobias Wochinger <[email protected]>
Co-authored-by: m-vdb <[email protected]>
8 people authored Mar 3, 2021
1 parent 18db587 commit 8eb2ea4
Showing 23 changed files with 208 additions and 181 deletions.
30 changes: 30 additions & 0 deletions CHANGELOG.mdx
@@ -16,6 +16,27 @@ https://github.com/RasaHQ/rasa/tree/main/changelog/ . -->

<!-- TOWNCRIER -->

## [2.3.4] - 2021-02-26


### Bugfixes
- [#8014](https://github.com/rasahq/rasa/issues/8014): Setting `model_confidence=cosine` in `DIETClassifier`, `ResponseSelector` and `TEDPolicy` is deprecated and will no longer be available. This option was introduced in Rasa Open Source version `2.3.0`, but post-release experiments showed that using cosine similarity as the model's confidence can change the ranking of predicted labels, which is incorrect.

`model_confidence=inner` is deprecated and is replaced by `model_confidence=linear_norm` as the former produced an unbounded range of confidences which broke the logic of assistants in various other places.

We encourage you to try `model_confidence=linear_norm` which will produce a linearly normalized version of dot product similarities with each value in the range `[0,1]`. This can be done with the following config:
```yaml
- name: DIETClassifier
model_confidence: linear_norm
constrain_similarities: True
```
This should make [tuning fallback thresholds](./fallback-handoff.mdx#fallbacks) easier, as confidences for wrong predictions are better distributed across the range `[0, 1]`.

If you trained a model with the `model_confidence=cosine` or `model_confidence=inner` setting in a previous version of Rasa Open Source, please re-train by either removing the `model_confidence` option from the configuration or setting it to `linear_norm`.

`model_confidence=cosine` is removed from the configuration generated by [auto-configuration](model-configuration.mdx#suggested-config).
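
For intuition, here is a minimal NumPy sketch (an illustration only, not Rasa's implementation) contrasting the two remaining options. It assumes `linear_norm` clips negative dot-product similarities to zero and rescales the rest linearly, while `softmax` exponentiates and normalizes them:

```python
import numpy as np

def softmax_confidences(similarities: np.ndarray) -> np.ndarray:
    """Current default: softmax over raw dot-product similarities."""
    exp = np.exp(similarities - similarities.max())  # shift for numerical stability
    return exp / exp.sum()

def linear_norm_confidences(similarities: np.ndarray) -> np.ndarray:
    """Assumed sketch of `linear_norm`: clip negatives to zero, then rescale linearly into [0, 1]."""
    clipped = np.maximum(similarities, 0.0)
    total = clipped.sum()
    return clipped / total if total > 0 else clipped

sims = np.array([4.2, 1.3, -0.7])     # raw dot-product similarities for three candidate intents
print(softmax_confidences(sims))      # approx. [0.94 0.05 0.01]: mass squashed onto the top label
print(linear_norm_confidences(sims))  # approx. [0.76 0.24 0.  ]: spread more evenly across [0, 1]
```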


## [2.3.3] - 2021-02-25


@@ -118,6 +139,9 @@ https://github.com/RasaHQ/rasa/tree/main/changelog/ . -->
Configuration option `loss_type=softmax` is now deprecated and will be removed in Rasa Open Source 3.0.0. Use `loss_type=cross_entropy` instead.

The default [auto-configuration](model-configuration.mdx#suggested-config) is changed to use `constrain_similarities=True` and `model_confidence=cosine` in ML components so that new users start with the recommended configuration.

**EDIT**: Some post-release experiments revealed that using `model_confidence=cosine` is wrong as it can change the order of predicted labels. That's why this option was removed in Rasa Open Source version `2.3.4`. `model_confidence=inner` is deprecated as it produces an unbounded range of confidences, which can break the logic of assistants in various other places. Please use `model_confidence=linear_norm`, which produces a linearly normalized version of dot product similarities with each value in the range `[0,1]`. Please read more about this change under the notes for release `2.3.4`.

- [#7817](https://github.com/rasahq/rasa/issues/7817): Use simple random uniform distribution of integers in negative sampling, because
negative sampling with `tf.while_loop` and random shuffle inside creates a memory leak.
- [#7848](https://github.com/rasahq/rasa/issues/7848): Added support to configure `exchange_name` for [pika event broker](event-brokers.mdx#pika-event-broker).
@@ -1316,6 +1340,12 @@ https://github.com/RasaHQ/rasa/tree/main/changelog/ . -->
- [#5784](https://github.com/rasahq/rasa/issues/5784), [#5788](https://github.com/rasahq/rasa/issues/5788), [#6199](https://github.com/rasahq/rasa/issues/6199), [#6403](https://github.com/rasahq/rasa/issues/6403), [#6735](https://github.com/rasahq/rasa/issues/6735)


## [1.10.23] - 2021-02-22

### Bugfixes
- [#7895](https://github.com/rasahq/rasa/issues/7895): Fixed a bug where the conversation was not locked before handling a reminder event.


## [1.10.22] - 2021-02-05

### Bugfixes
1 change: 1 addition & 0 deletions changelog/8080.docs.md
@@ -0,0 +1 @@
Update links to Sanic docs in the documentation.
3 changes: 0 additions & 3 deletions data/test_config/config_empty_en_after_dumping.yml
@@ -14,12 +14,10 @@ pipeline:
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -32,5 +30,4 @@ policies:
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
1 change: 0 additions & 1 deletion data/test_config/config_empty_en_after_dumping_core.yml
@@ -9,5 +9,4 @@ policies:
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
2 changes: 0 additions & 2 deletions data/test_config/config_empty_en_after_dumping_nlu.yml
@@ -14,12 +14,10 @@ pipeline:
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
3 changes: 0 additions & 3 deletions data/test_config/config_empty_fr_after_dumping.yml
@@ -14,12 +14,10 @@ pipeline:
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -32,5 +30,4 @@ policies:
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
1 change: 0 additions & 1 deletion data/test_config/config_with_comments_after_dumping.yml
@@ -28,7 +28,6 @@ policies: # even here
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy

# comments everywhere
47 changes: 33 additions & 14 deletions docs/docs/components.mdx
@@ -1474,6 +1474,19 @@ Intent classifiers assign one of the intents defined in the domain file to incom
set `weight_sparsity` to 1 as this would result in all kernel weights being 0, i.e. the model is not able
to learn.

* `constrain_similarities`:
When set to `True`, this parameter applies a sigmoid cross-entropy loss over all similarity terms.
This helps keep similarities between the input and negative labels at smaller values.
This should help the model generalize better to real-world test sets (a toy sketch of this loss follows after this parameter list).

* `model_confidence`:
This parameter allows the user to configure how confidences are computed during inference. It can take two values:
* `softmax`: Confidences are in the range `[0, 1]` (old behavior and current default). Computed similarities are normalized with the `softmax` activation function.
* `linear_norm`: Confidences are in the range `[0, 1]`. Computed dot product similarities are normalized with a linear function.

Please try using `linear_norm` as the value for `model_confidence`. This should make it easier to tune fallback thresholds for the [FallbackClassifier](./components.mdx#fallbackclassifier).
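
As a rough illustration of the `constrain_similarities` loss described above (a sketch under assumed details, not Rasa's actual implementation), the snippet below applies an element-wise sigmoid cross-entropy to dot-product similarities, with a target of 1 for the correct label and 0 for sampled negatives. A negative label that is still very similar dominates the loss and gets pushed down:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_cross_entropy(similarities: np.ndarray, targets: np.ndarray) -> float:
    """Binary cross-entropy applied independently to every similarity term."""
    probs = sigmoid(similarities)
    eps = 1e-9  # avoid log(0)
    return float(-np.mean(targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps)))

# similarities of one input to [correct label, negative label 1, negative label 2]
sims = np.array([3.0, 2.5, -4.0])
targets = np.array([1.0, 0.0, 0.0])
print(sigmoid_cross_entropy(sims, targets))  # approx. 0.88, dominated by the negative label that is still too similar
```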


The above configuration parameters are the ones you should configure to fit your model to your data.
However, additional parameters exist that can be adapted.

@@ -1623,16 +1636,13 @@ However, additional parameters exist that can be adapted.
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each intent |
| | | is computed. It can take three values |
| | | is computed. It can take two values: |
| | | 1. `softmax` - Similarities between input and intent |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all intents sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and intent |
| | | embeddings. Confidence for each intent is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and intent |
| | | embeddings. Confidence for each intent is in an unbounded |
| | | range. |
| | | 2. `linear_norm` - Linearly normalized dot product similarity|
| | | between input and intent embeddings. Confidence for each |
| | | intent will be in the range `[0,1]` |
| | | This parameter does not affect the confidence for entity |
| | | prediction. |
+---------------------------------+------------------+--------------------------------------------------------------+
@@ -2688,6 +2698,18 @@ Selectors predict a bot response from a set of candidate responses.
set `weight_sparsity` to 1 as this would result in all kernel weights being 0, i.e. the model is not able
to learn.

* `constrain_similarities`:
When set to `True`, this parameter applies a sigmoid cross-entropy loss over all similarity terms.
This helps keep similarities between the input and negative labels at smaller values.
This should help the model generalize better to real-world test sets.

* `model_confidence`:
This parameter allows the user to configure how confidences are computed during inference. It can take two values:
* `softmax`: Confidences are in the range `[0, 1]` (old behavior and current default). Computed similarities are normalized with the `softmax` activation function.
* `linear_norm`: Confidences are in the range `[0, 1]`. Computed dot product similarities are normalized with a linear function.

Please try using `linear_norm` as the value for `model_confidence`. This should make it easier to tune fallback thresholds for the [FallbackClassifier](./components.mdx#fallbackclassifier).
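
To see why bounded, well-spread confidences make threshold tuning easier, here is a hypothetical helper that loosely mirrors the commented-out `threshold: 0.3` and `ambiguity_threshold: 0.1` defaults in the dumped configs above; the function name and exact rules are illustrative, not the `FallbackClassifier` API:

```python
from typing import List, Tuple

def should_fall_back(
    ranking: List[Tuple[str, float]],
    threshold: float = 0.3,
    ambiguity_threshold: float = 0.1,
) -> bool:
    """ranking: (intent, confidence) pairs sorted by confidence, highest first."""
    top = ranking[0][1]
    runner_up = ranking[1][1] if len(ranking) > 1 else 0.0
    too_uncertain = top < threshold                          # nothing is confident enough
    too_ambiguous = (top - runner_up) < ambiguity_threshold  # top two are too close
    return too_uncertain or too_ambiguous

print(should_fall_back([("greet", 0.72), ("goodbye", 0.21)]))  # False: confident prediction
print(should_fall_back([("greet", 0.34), ("goodbye", 0.31)]))  # True: top two are too close
```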

The component can also be configured to train a response selector for a particular retrieval intent.
The parameter `retrieval_intent` sets the name of the retrieval intent for which this response selector model is trained.
Default is `None`, i.e. the model is trained for all retrieval intents.
@@ -2841,16 +2863,13 @@ However, additional parameters exist that can be adapted.
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+-------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each response label |
| | | is computed. It can take three values |
| | | is computed. It can take two values: |
| | | 1. `softmax` - Similarities between input and response label |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all labels sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and response |
| | | label embeddings. Confidence for each label is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and |
| | | response label embeddings. Confidence for each label is in an|
| | | unbounded range. |
| | | 2. `linear_norm` - Linearly normalized dot product similarity|
| | | between input and response label embeddings. Confidence for |
| | | each label is in the range `[0, 1]`. |
+---------------------------------+-------------------+--------------------------------------------------------------+
```

44 changes: 36 additions & 8 deletions docs/docs/migration-guide.mdx
@@ -10,6 +10,32 @@ description: |
This page contains information about changes between major versions and
how you can migrate from one version to another.

## Rasa 2.3.3 to Rasa 2.3.4

:::caution
This is a release **breaking backwards compatibility of machine learning models**.
It is not possible to load previously trained models if they were trained with the `model_confidence=cosine` or
`model_confidence=inner` setting. Please make sure to re-train the assistant before trying to use it with this improved version.

:::

### Machine Learning Components

Rasa Open Source `2.3.0` introduced the option of using cosine similarities for model confidences by setting `model_confidence=cosine`. Some post-release experiments revealed that using `model_confidence=cosine` is wrong as it can change the order of predicted labels. That's why this option was removed in Rasa Open Source version `2.3.4`.
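
A toy example (not taken from the Rasa codebase) shows how re-ranking by cosine similarity can disagree with the dot-product ranking the model was trained with, which is the reordering problem described above:

```python
import numpy as np

input_vec = np.array([1.0, 0.0])
label_a = np.array([3.0, 3.0])   # large norm, weaker alignment with the input
label_b = np.array([1.0, 0.2])   # small norm, almost perfectly aligned

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(np.dot(input_vec, label_a), np.dot(input_vec, label_b))  # 3.0 1.0 -> dot product ranks label_a first
print(cosine(input_vec, label_a), cosine(input_vec, label_b))  # ~0.71 ~0.98 -> cosine ranks label_b first
```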

`model_confidence=inner` is deprecated as it produces an unbounded range of confidences which can break
the logic of assistants in various other places.

We encourage you to try `model_confidence=linear_norm` which will produce a linearly normalized version of dot product similarities with each value in the range `[0,1]`. This can be done with the following config:
```yaml
- name: DIETClassifier
model_confidence: linear_norm
constrain_similarities: True
```

If you trained a model with the `model_confidence=cosine` or `model_confidence=inner` setting in a previous version of Rasa Open Source, please re-train by either removing the `model_confidence` option from the configuration or setting it to `linear_norm`.


## Rasa 2.2 to Rasa 2.3

### Machine Learning Components
@@ -20,22 +46,24 @@ components `DIETClassifier`, `ResponseSelector` and `TEDPolicy`. These include:
2. The default loss function (`loss_type=cross_entropy`) can add an optional sigmoid cross-entropy loss of all similarity values to constrain
them to an approximate range. You can turn on this option by setting `constrain_similarities=True`. This should help the models to perform better on real world test sets.

Also, a new option `model_confidence` has been added to each ML component. It affects how a model's confidence for each label is computed during inference. It can take one of three values:
1. `softmax` - Similarities between input and label embeddings are post-processed with a softmax function, as a result of which confidence for all labels sum up to 1.
A new option `model_confidence` has been added to each ML component. It affects how the model's confidence for each label is computed during inference. It can take one of three values:
1. `softmax` - Dot product similarities between input and label embeddings are post-processed with a softmax function, as a result of which confidence for all labels sum up to 1.
2. `cosine` - Cosine similarity between input and label embeddings. Confidence for each label will be in the range `[-1,1]`.
3. `inner` - Dot product similarity between input and label embeddings. Confidence for each label will be in an unbounded range.
The default value is `softmax`, but we recommend using `cosine` as that will be the new default value from Rasa Open Source 3.0.0 onwards.
The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier` and `TEDPolicy`.
3. `linear_norm` - Dot product similarities between input and label embeddings are post-processed with a linear normalization function. Confidence for each label will be in the range `[0,1]`.

The default value is `softmax`, but we recommend trying `linear_norm`. This should make it easier to [tune thresholds for triggering fallback](./fallback-handoff.mdx#fallbacks).
The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier`.

With both the above recommendations, users should configure their ML component, e.g. `DIETClassifier`, as:
We encourage you to try both the above recommendations. This can be done with the following config:
```yaml
- name: DIETClassifier
model_confidence: cosine
model_confidence: linear_norm
constrain_similarities: True
...
```
Once the assistant is re-trained with the above configuration, users should also tune fallback confidence thresholds.
Once the assistant is re-trained with the above configuration, users should also [tune fallback confidence thresholds](./fallback-handoff.mdx#fallbacks).

**EDIT**: Some post-release experiments revealed that using `model_confidence=cosine` is wrong as it can change the order of predicted labels. That's why this option was removed in Rasa Open Source version `2.3.4`.

## Rasa 2.1 to Rasa 2.2
