Your first Google Assistant skill
How to build a conversational app for Google Home or Google Assistant
Smart home speakers, assistant platforms, and cross-device solutions that let you talk to your smartwatch and see the result on your TV or your car's dashboard. Personal assistants and VUIs are slowly appearing all around us, and it's pretty likely that they will make our lives much easier.
Because of my strong belief that natural language will be the next human-machine interface, I decided to start a new series of blog posts, backed by open source code, showing how to create a new kind of app: conversation-oriented, device-independent assistant skills that give us freedom in the platform and hardware we use.
And that bring the most natural interface for humans: voice.
This post is part of a series about building a personal assistant app designed for voice as the primary user interface. More posts in the series:
- This post
- Personalize Google Assistant skill with user data
- Surface capabilities in Google Assistant skills
WaterLog assistant skill
In this post we’ll start with the simplest implementation of an assistant skill. WaterLog is an app that lets us track our daily water intake by talking or writing in natural language directly to Google Assistant. The first version of the app will be able to log how many liters or milliliters of water we have drunk during the day.
For the sake of simplicity we’ll skip the theory behind VUI design and focus only on the technical aspects of building a fully working implementation.
Here are scenarios of possible conversations (happy paths):
New user
User: Ok Google, Talk to WaterLog
WaterLog: Hey! Welcome to Water Log. Do you know that you should drink about 3 liters of water each day to stay healthy? How much did you drink so far?
User: I drunk 500ml of water
WaterLog: Ok, I’ve added 500ml of water to your daily log. In sum you have drunk 500ml today. Let me know when you drink more! See you later.
Returning user
User: Ok Google, Talk to WaterLog
WaterLog: Hey! You have drunk 500ml today. How much water should I add now?
User: 100ml
WaterLog: Ok, I’ve added 100ml of water to your daily log. In sum you have drunk 600ml today. Let me know when you drink more! See you later.
Returning user asking for logged water
User: Ok Google, Ask WaterLog how much water have I drunk today?
WaterLog: In sum you have drunk 600ml today. Let me know when you drink more! See you later.
In case you would like to test this skill on your device, it’s available live in the Google Assistant directory, or on the website:
Getting started
The app is extremely simple, but even a project like this requires tying a few pieces together to make it work. While we have a lot of freedom when it comes to platform selection (we could build our app in many different languages and host it on any cloud solution like Google Cloud Platform or Amazon Web Services), for a start we’ll pick the most commonly recommended tech stack:
- Firebase Cloud Functions and the Realtime Database for the app backend,
- Dialogflow for conversation definitions and natural language understanding,
- JavaScript/Node.js for the app code (at the moment this is the only language supported by Firebase Cloud Functions),
- Actions on Google SDK for the Google Assistant integration (in the future we may give other platforms like Amazon Alexa or the Facebook Messenger Platform a try).
I won’t write a detailed explanation of how to connect all of those together; the Actions on Google website has a really good step-by-step guide for that:
In short:
- Start with a new project in the Actions on Google console.
- When it’s done, you will be asked to pick a tool or platform to build your assistant skill. As I said, it’ll be Dialogflow. If you do it right, your apps (Actions and Dialogflow) should be connected. You can check this in the Dialogflow agent settings (see the Google Project property):
Dialogflow agent
The first big piece of our assistant app is the conversational agent, which in our case is built on the Dialogflow platform. Its most important role is to understand what the user says to our app and to convert natural language sentences into actions and parameters that can be handled by our code. And this is exactly what Dialogflow Intents do.
According to the documentation:
An intent represents a mapping between what a user says and what action should be taken by your software.
Let’s start defining our intents. Here is the list of sentences we would like to handle:
Default Fallback Intent
This is the only one we leave untouched for now. As the name says, this intent is triggered when a user’s input is not matched by any of the regular intents or enabled domains (see the documentation). It’s worth mentioning that this intent isn’t even passed into our application code; it’s handled entirely by the Dialogflow platform.
welcome_user
Intent used to greet our user. It’s triggered whenever the user asks for our app (e.g. Ok Google, talk to WaterLog) without any additional intention.
— Config —
Action name: input.welcome
Events: WELCOME, GOOGLE_ASSISTANT_WELCOME — events are additional mappings that allow an intent to be invoked by an event name instead of a user query.
Fulfillment: ✅ Use webhook
— Intent welcome_user will be passed to our backend.
log_water
Intent used to save how much water the user would like to log during the conversation. There are a couple of cases we would like to handle in the same way. Let’s list some of them:
- Ok Google, Talk to WaterLog to log 1 liter of water — the intent is triggered immediately when the user invokes our action. In this case the welcome intent is skipped. More about assistant invocation can be found in the Actions on Google documentation.
- Log 500ml of water — said in the middle of a conversation, when the app is waiting for the user’s input.
- 500ml — usually as an answer to the assistant’s question:
WaterLog: …how much water did you drink today?
User: 500ml
To handle cases like these we need to provide example utterances that users might say. The examples are then used by Dialogflow’s machine learning to teach our agent to understand user input. The more examples we provide, the smarter our agent becomes.
Additionally, we need to annotate the fragments of our examples that need to be handled in a special way, so that e.g. our app knows that the utterance:
I have drunk 500ml of water
contains the amount and unit of volume of the water that has been drunk. All we have to do is select the fragment and pick the correct entity (there are plenty of built-in entities, see the documentation).
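For illustration, once the volume fragment is annotated with a built-in unit-volume entity and stored under a parameter (named, say, water_volume; that name is mine, not necessarily the agent's), the webhook receives a structured value instead of raw text, roughly like this:

```javascript
// Hypothetical parameter delivered to the webhook for "I have drunk 500ml of water",
// assuming the annotated fragment is stored under a parameter called "water_volume":
const waterVolume = { amount: 500, unit: 'ml' };
```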
— Config —
Action name: log_water
User says (there should be many more examples, especially in more complex apps):
Fulfillment: ✅ Use webhook
Google assistant: ✅ End conversation
— pick this to let Google Assistant know that the conversation should be finished here.
get_logged_water
Intent used to tell the user how much water he or she has drunk on the current day. Similarly to log_water, there are different ways to invoke this intent:
- Ok Google, ask WaterLog how much water did I drink today? — triggered instead of the welcome intent when the app is invoked together with the question,
- How much did I drink? — asked in the middle of a conversation with our app.
— Config —
Action name: get_logged_water
User says:
Fulfillment: ✅ Use webhook
Google assistant: ✅ End conversation
And that’s it for the Dialogflow configuration for now. If you would like to see the full config, you can download it from the repository (the WaterLog.zip file) and import it into your agent.
The code
If you followed the Actions on Google guide (Build fulfillment), you should already have the basic code structure, a fulfillment deployed to Firebase Cloud Functions, and a connection to your Dialogflow agent through the fulfillment config.
Now let’s build the code for the WaterLog app. The repository with the final implementation is available on Github:
Basically what we have to do is handle all the Intents defined in the Dialogflow app. We’ll define their names in the functions/assistant-actions.js file:
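A minimal sketch of what such a module might look like (the constant names are illustrative; only the action name strings come from the Dialogflow configuration above):

```javascript
// functions/assistant-actions.js (sketch)
// Central place for the Dialogflow action names used across the app.
module.exports = {
  ACTION_WELCOME: 'input.welcome',
  ACTION_LOG_WATER: 'log_water',
  ACTION_GET_LOGGED_WATER: 'get_logged_water'
};
```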
The core of our app is the index.js file, which is also the implementation of the HTTP trigger for our Firebase Cloud Function (an endpoint, in short 🙂):
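For orientation, here is a simplified sketch of how such a function could be wired up with the actions-on-google v1 DialogflowApp client. The exported function name, module paths, and the handler names other than actionLogWater are my assumptions, not the exact names from the repository:

```javascript
// functions/index.js (sketch)
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { DialogflowApp } = require('actions-on-google');
const actions = require('./assistant-actions');
const Conversation = require('./conversation'); // hypothetical module layout
const WaterLog = require('./water-log');        // hypothetical module layout

admin.initializeApp();

exports.waterLog = functions.https.onRequest((request, response) => {
  const dialogflowApp = new DialogflowApp({ request, response });
  const conversation = new Conversation(dialogflowApp, new WaterLog(admin.database()));

  // Map every Dialogflow action name onto its fulfillment function.
  const actionMap = new Map();
  actionMap.set(actions.ACTION_WELCOME, () => conversation.actionWelcomeUser());
  actionMap.set(actions.ACTION_LOG_WATER, () => conversation.actionLogWater());
  actionMap.set(actions.ACTION_GET_LOGGED_WATER, () => conversation.actionGetLoggedWater());

  dialogflowApp.handleRequest(actionMap);
});
```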
In our Cloud Function we define a mapping of Intents to the functions that should be called as the fulfillment for the conversation.
As an example, let’s look at conversation.actionLogWater(), the fulfillment for the log_water Intent (a sketch of it follows the list below):
Here is what happens:
- The app gets the argument extracted from the utterance by Dialogflow. For the input Log 500ml of water we’ll get an object {"amount":500,"unit":"ml"}.
- Through the waterLog implementation, the app saves this data into the Firebase Realtime Database.
- At the end the app gets the sum of the logged water and passes it as a response to the dialogflowApp object. The tell() function responds to the user and closes the conversation (docs).
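A minimal sketch of how such a handler could look, assuming the parameter is named water_volume and the WaterLog helper exposes saveLoggedWater() and getLoggedWaterInMl() (these names are assumptions; the real conversation.js may differ in details):

```javascript
// functions/conversation.js (sketch of the log_water fulfillment)
class Conversation {
  constructor(dialogflowApp, waterLog) {
    this.dialogflowApp = dialogflowApp;
    this.waterLog = waterLog;
  }

  actionLogWater() {
    // 1. Read the argument extracted by Dialogflow, e.g. { amount: 500, unit: 'ml' }.
    const loggedWater = this.dialogflowApp.getArgument('water_volume');
    // The real app resolves an anonymous user id through the UserManager helper.
    const userId = this.dialogflowApp.getUser().userId;

    // 2. Save the value in the Firebase Realtime Database through WaterLog...
    return this.waterLog.saveLoggedWater(userId, loggedWater)
      // 3. ...then read today's sum and respond; tell() closes the conversation.
      .then(() => this.waterLog.getLoggedWaterInMl(userId))
      .then(totalMl => this.dialogflowApp.tell(
        `Ok, I've added ${loggedWater.amount}${loggedWater.unit} of water to your daily log. ` +
        `In sum you have drunk ${totalMl}ml today. Let me know when you drink more! See you later.`
      ));
  }
}

module.exports = Conversation;
```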
See the full code of the Conversation class: conversation.js.
The rest of the code isn’t much more interesting. The Conversation class is responsible for handling user input, WaterLog saves and retrieves the logged-water data from the Firebase Realtime Database, and UserManager adds some helpers for handling (anonymous) users.
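To give an idea of the storage part, here is a rough sketch of what a Realtime Database-backed WaterLog could look like. The database path, date-based schema, and method names are my assumptions, not the exact structure used in the repository:

```javascript
// functions/water-log.js (sketch)
class WaterLog {
  constructor(database) {
    this.database = database; // an admin.database() instance
  }

  // Adds the logged amount (normalised to millilitres) to today's node for the user.
  saveLoggedWater(userId, loggedWater) {
    const amountMl = loggedWater.unit === 'l' ? loggedWater.amount * 1000 : loggedWater.amount;
    const today = new Date().toISOString().slice(0, 10); // e.g. "2018-01-15"
    const ref = this.database.ref(`loggedWater/${userId}/${today}`);
    // transaction() increments the stored sum safely, even with concurrent writes.
    return ref.transaction(current => (current || 0) + amountMl);
  }

  // Reads today's total in millilitres (0 when nothing has been logged yet).
  getLoggedWaterInMl(userId) {
    const today = new Date().toISOString().slice(0, 10);
    return this.database.ref(`loggedWater/${userId}/${today}`)
      .once('value')
      .then(snapshot => snapshot.val() || 0);
  }
}

module.exports = WaterLog;
```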
Unit testing
While this paragraph isn’t directly connected with assistant apps or voice interfaces, I believe unit testing is extremely important in every kind of software we build. Just imagine that every time you change something in the code, you need to deploy the function and start a conversation with your app. In the WaterLog app this was relatively simple (but it still took at least tens of deployments). In bigger apps it becomes critical to have unit tests; they can speed up development by an order of magnitude.
All unit tests for our classes can be found under the functions/test/ directory. While the tests in this project aren’t extremely sophisticated (they use the sinon.js and chai libraries without any additional extensions), they still helped a lot with getting to production in a relatively short time.
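As an illustration of the approach (assuming a Mocha runner and the sketched Conversation class from above; this is not a copy of the repository’s tests), a test for actionLogWater can stub both collaborators and assert on the spoken response:

```javascript
// functions/test/conversation.test.js (sketch)
const { expect } = require('chai');
const sinon = require('sinon');
const Conversation = require('../conversation');

describe('Conversation', () => {
  it('logs water and tells the daily sum', () => {
    // Stub the Assistant client and the WaterLog storage helper.
    const dialogflowApp = {
      getArgument: sinon.stub().returns({ amount: 500, unit: 'ml' }),
      getUser: sinon.stub().returns({ userId: 'user-1' }),
      tell: sinon.spy()
    };
    const waterLog = {
      saveLoggedWater: sinon.stub().resolves(),
      getLoggedWaterInMl: sinon.stub().resolves(500)
    };

    const conversation = new Conversation(dialogflowApp, waterLog);

    return conversation.actionLogWater().then(() => {
      expect(waterLog.saveLoggedWater.calledOnce).to.be.true;
      expect(dialogflowApp.tell.firstCall.args[0]).to.contain('500ml');
    });
  });
});
```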
Here is the output from $ npm test:
Source code
The full source code of the WaterLog app with:
- Firebase Cloud Functions
- Dialogflow agent configuration
- Assets required for app distribution
can be found on Github:
Thanks for reading! 😊