preface: if you're reading this now, you're an alpha tester of this code - your feedback is super valuable! thanks for trying it out
rasa_nlu is a tool for intent classification and entity extraction. You can think of rasa_nlu as a set of high level APIs for building your own language parser using existing NLP and ML libraries. The intended audience is mainly people developing bots. It can be used as a drop-in replacement for wit or LUIS, but works as a local service rather than a web API.
The setup process is designed to be as simple as possible. If you're currently using wit or LUIS, you just:
- download your app data from wit or LUIS and feed it into rasa_nlu
- run rasa_nlu on your machine and switch the URL of your wit/LUIS api calls to
localhost:5000/parse
.
Reasons you might use this over one of the aforementioned services:
- you don't have to hand over your data to FB/MSFT/GOOG
- you don't have to make a
https
call every time. - you can tune models to work well on your particular use case.
These points are laid out in more detail in a blog post.
rasa_nlu is written in Python, but it you can use it from any language through a HTTP API. If your project is written in Python you can simply import the relevant classes.
rasa is a set of tools for building more advanced bots, developed by LASTMILE. This is the natural language understanding module, and the first component to be open sourced.
python setup.py install
python -m rasa_nlu.server -e wit &
curl 'http://localhost:5000/parse?q=hello'
# returns e.g. '{"intent":"greet","entities":[]}'
There you go! you just parsed some text. Important command line options for rasa_nlu.server
are as follows:
- emulate: which service to emulate, can be 'wit' or 'luis', or just leave blank for default mode. This only affects the format of the json response.
- server_model_dir: dir where your trained models are saved. If you leave this blank rasa_nlu will just use a naive keyword matcher.
run python -m rasa_nlu.server -h
to see more details.
Click the button to deploy this to heroku. For now this only runs the server, you can't yet train models through the HTTP API.
rasa_nlu itself doesn't have any external requirements, but to do something useful with it you need to install & configure a backend.
The MITIE backend is all-inclusive, in the sense that it provides both the NLP and the ML parts.
pip install git+https://github.com/mit-nlp/MITIE.git
and then download the MITIE models. The file you need is total_word_feature_extractor.dat
You can also run using these two in combination. spaCy is an excellent library for NLP tasks. scikit-learn is a popular ML library.
pip install -U spacy
python -m spacy.en.download all
pip install -U scikit-learn
OR if you prefer (especially if you don't already have numpy/scipy
installed), you can install scikit-learn by:
- installing anaconda
conda install scikit-learn
As of now, rasa_nlu doesn't provide a tool to help you create & annotate training data.
If you don't have an existing wit or LUIS app, you can try this example using the data/demo-restaurants.json
file, or create your own json file in the same format.
Download your data from wit or LUIS. When you export your model from wit you will get a zipped directory. The file you need is expressions.json
.
If you're exporting from LUIS you get a single json file, and that's the one you need. Create a config file (json format) like this one:
{
"path" : "/path/to/models/",
"data" : "expressions.json",
"backend" : "mitie",
"backends" : {
"mitie": {
"fe_file":"/path/to/total_word_feature_extractor.dat"
}
}
}
and then pass this file to the training script
python -m rasa_nlu.train -c config.json
you can also override any of the params in config.json with command line arguments. Run python -m rasa_nlu.train -h
for details.
After training you will have a new dir containing your models, e.g. /path/to/models/model_XXXXXX
.
Just pass this path to the rasa_nlu.server
script:
python -m rasa_nlu.server -e wit -d '/path/to/models/model_XXXXXX'
When the rasa_nlu server is running, it keeps track of all the predictions it's made and saves these to a log file. By default this is called rasa_nlu_log.json
You can fix any incorrect predictions and add them to your training set to improve your parser.
- training models through the HTTP API
- entity normalisation: as is, the named entity extractor will happily extract
cheap
&inexpensive
as entities of theexpense
class, but will not tell you that these are realisations of the same underlying concept. You can easily handle that with a list of aliases in your code, but we want to offer a more elegant & generalisable solution. Word Forms looks promising. - parsing structured data, e.g. dates. We might use parsedatetime or parserator or wit.ai's very own duckling.
- python 3 support
- support for more (human) languages
- not tested with python 3, so probably won't work
- any other issues, reach out to [email protected]
Copyright 2016 LastMile Technologies Ltd
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.