Support disabling loading of quadrigram and fivegram models #136
Relates to #101
Adds the function `LanguageDetectorBuilder.withoutQuadrigramAndFivegramModels()`, which disables loading of quadrigram and fivegram models. Quadrigram and fivegram models take up the majority of memory at runtime; if my measurements are correct, preloading all language models requires ~1783 MB, whereas the unigram, bigram and trigram models alone require ~110 MB. However, for longer texts `LanguageDetector` does not actually use them. Therefore, for use cases where it is known beforehand that most or all texts will be longer than ~120 chars, it should be relatively safe to disable loading of quadrigram and fivegram models.
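To make the intended behaviour concrete, here is a minimal, self-contained sketch of the idea. It is not the code from this PR: `DetectorSketch`, `ngramLengthsFor` and `LONG_TEXT_THRESHOLD` are hypothetical names, the cutoff of 120 simply mirrors the "~120 chars" figure above, and the assumption that long texts only consult n-grams up to length 3 is a simplification of "does not actually use them".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the builder flag; not the library's actual implementation. */
final class DetectorSketch {
    // Assumed cutoff, taken from the "~120 chars" figure in the description.
    static final int LONG_TEXT_THRESHOLD = 120;

    private final boolean higherOrderModelsEnabled;

    private DetectorSketch(boolean higherOrderModelsEnabled) {
        this.higherOrderModelsEnabled = higherOrderModelsEnabled;
    }

    /** Returns the n-gram lengths that would be consulted for a text of the given length. */
    List<Integer> ngramLengthsFor(int textLength) {
        // Quadrigrams and fivegrams are only consulted for short texts,
        // and only if they have not been disabled in the builder.
        boolean useHigherOrders = higherOrderModelsEnabled && textLength < LONG_TEXT_THRESHOLD;
        int maxLength = useHigherOrders ? 5 : 3;
        List<Integer> lengths = new ArrayList<>();
        for (int n = 1; n <= maxLength; n++) {
            lengths.add(n);
        }
        return lengths;
    }

    static final class Builder {
        private boolean higherOrderModelsEnabled = true;

        /** Mirrors the name proposed in this PR; disables quadrigram and fivegram models. */
        Builder withoutQuadrigramAndFivegramModels() {
            this.higherOrderModelsEnabled = false;
            return this;
        }

        DetectorSketch build() {
            return new DetectorSketch(higherOrderModelsEnabled);
        }
    }
}
```

In this sketch the flag only changes the result for texts below the threshold, which is why disabling the larger models should mainly affect short inputs.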
Any feedback, especially regarding the builder function name and documentation, is appreciated.