Hello Android developers,
2023 was the year that machine learning and artificial intelligence really became mainstream, and we covered both topics with a focus on Android implementations. We published series on using the ONNX machine learning runtime, building Android apps with Microsoft Graph, and tutorials for Jetpack Compose developers! Take a look back at all the best posts from 2023…
The blog focused heavily on working with OpenAI on Android using Kotlin, starting with some basic API access and then building out the JetchatAI demo using a variety of techniques including embeddings, “RAG” (retrieval augmented generation), images, functions, and more. Here are some of our favorite posts:
- Assistants API
- DALL·E image generation
- Embeddings API
- Chat API, including “Infinite chat” with history summarization
- Other topics
We also talked about responsible AI and speech-to-speech conversations with AI. All these posts used the OpenAI API but the same features are available via the Azure OpenAI Service.
ONNX Runtime is a cross-platform machine learning inference engine that can be added to Android apps. There was a series of posts to get started with ONNX on Android:
You can also use ONNX Runtime with Flutter!
Microsoft Graph provides a unified programming model for accessing a rich data set across Microsoft applications and services, and it can also be integrated into custom applications. This series of posts showed how to authenticate with MSAL and then integrate data with a custom Android app:
The Jetpack Compose highlight of the year was this animation series inspired by a droidcon talk:
Some of these animation ideas were applied to the Jetchat AI Compose sample.
There was also a series on Relay for Figma and creating styles to import designs from Figma to your Jetpack Compose application UI.
And finally, some tips for foldable Android developers – a new FoldAwareColumn in Accompanist and Improved navigation support in TwoPaneLayout.
If you have any questions about OpenAI, ONNX, Microsoft Graph, or Jetpack Compose, use the feedback forum or message us @surfaceduodev.
The post 2023 year in review appeared first on Surface Duo Blog.
Hello Flutter developers!
After recently reading about how Pieces.app uses ONNX runtime inside a Flutter app, I was determined to try it myself. This article shows a summary of the journey I took and provides a few tips for you if you want to do the same.
Since we have FFI in Dart for calling C code and ONNX Runtime offers a C library, this is the best way to integrate across most platforms. Before I walk down that path, I decide to have a look at pub.dev to see if anyone did this before me. My thinking here is that anything running ONNX Runtime is a good starting point, even if I must contribute to the project to make it do what I need. In the past, if a plugin lacked functionality, I would fork it, write what was missing and then use the fork as a git dependency. If it was appropriate, I would also open a PR to upstream the changes.
Figure 1: Searching for ONNX on pub.dev
Searching for ONNX turns up four packages. As sometimes happens on pub.dev, some packages are started and published but never finished. After looking at the code, I concluded that only onnxruntime has enough work put into it to be worth giving a shot. At first glance, it seems to only run on Android and iOS, but the code shows it is based on the ONNX Runtime C library and uses Dart FFI, which means I can make it run on other platforms down the line. Off I go with a brand new Flutter project (`flutter create onnxflutterplay`) and then `flutter pub add onnxruntime`.
The library comes with an example. It seems to be an audio processing sample, which is far too complicated for where I am right now. I want to understand the basics and run the simplest ONNX model I can think of, which will also prove to me that the plugin works. I end up with the model from the ONNX Runtime basic usage example: it takes two float numbers as input and outputs their sum. I follow the instructions and generate my first ever ORT model. This is how the model looks in Netron.
Figure 2: Netron app showing a simple model
To figure out how to use the model, I have a few resources at my disposal. First, I have the sample code from the model repo, which is Swift code and might be intimidating, but is well documented and quite similar to Kotlin and Dart. I need to be comfortable looking at other languages anyway, since most AI researchers use Python. I see the names “A”, “B” and “C” and the float type being used explicitly. The other resource I have is a test from the flutter plugin. It uses simple data types for input and output, which shows me how to pack “A” and “B” inputs properly. You can see the complete code on GitHub. This is what I end up with:
Figure 3: Code for inferring the simple model
I run into some exceptions with the `session.release()` call. From my investigations, this library might expect to be called from an isolate, and I am not doing that yet. To move past the errors, I simply commented out that line – but if I were doing this for a production app, I would give the isolate a try and investigate further. For now, this will do.
The next step in my journey is to try a larger model. My end goal here is to work with images, and I feel prepared to start using the simplest model I can find. The perfect model to continue with would be one that takes an image input and only applies a color filter or some other easy-to-debug operation. I look for such a model but can’t find one, so I land on a style transfer model from the ONNX Model Zoo archive. I pick the mosaic pretrained model and immediately open it in Netron.
Figure 4: Netron showing a complex model
You can clearly see the input and output there: float32[1,3,224,224]. The numbers in brackets represent the shape of the tensor. The shape is important because we process our input and output to match that shape. When that shape was not respected, I usually got a runtime error telling me it expected something else. You can feed some models raw PNG or JPEG files, but not this model. This model requires a bit of processing.
I did not know about tensor shapes before this work, so maybe it’s worth pausing a bit to discuss what it means. If you have a simple matrix with 10 rows of 100 elements each, the shape is [10, 100]. The shape is the number of elements on each of the axes of the tensor. For an experienced computer vision machine learning developer, I expect that something like [1, 3, 224, 224] immediately screams “one image with 3 channels per pixel (Red, Green, Blue) of size 224 by 224 pixels”.
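To make the shape arithmetic concrete (shown here in Kotlin rather than Dart, purely as an illustration):

```kotlin
// [1, 3, 224, 224] = 1 image × 3 channels × 224 rows × 224 columns
val shape = longArrayOf(1, 3, 224, 224)
val elementCount = shape.reduce(Long::times)  // 150,528 float values in the tensor
println(elementCount)                         // 150528
```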
I first convert the ONNX file into ORT format and then add it to the app. I also prepare an image. I do not want to fiddle with resizing and transforming the input or output yet, so I fire up mspaint and make a 224 by 224 pixel, completely red image. During debugging, I also make a half red, half green image.
Figure 5: Red square
Figure 6: Half red, half green square
A red image of the exact size I need provides me with an easy-to-debug input. Working with ONNX Runtime (or machine learning in general) turns out to involve a lot of pre- and post-processing.
For example, the colors for each pixel are represented differently in Flutter or Android compared to these ONNX models. To drive this point home, let’s consider an unusual 1×10 image. We have 10 pixels in total, and each has 4 color components. Let’s number the pixels 1 to 10 and label the color components R (Red), G (Green), B (Blue) and A (Alpha). In the sample below, Flutter stores the image as:
R1 G1 B1 A1 R2 G2 B2 A2 R3 G3 B3 A3 […] R10 G10 B10 A10
From what I see, due to how tensor reshaping works, to get the right ONNX Runtime Tensor, the image data must look like this:
R1 R2 R3 […] R10 G1 G2 G3 […] G10 B1 B2 B3 […] B10
Reordering the colors and dropping the Alpha component to fit this format is our pre-processing and the code looks like this:
Figure 7: Code for converting image to tensor
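Figure 7 itself isn’t reproduced here, and the actual sample is Dart. As a rough illustration of the same idea (sketched in Kotlin for consistency with the rest of this roundup, with hypothetical names), the index arithmetic looks like this:

```kotlin
// Conceptual sketch of the pre-processing: interleaved RGBA bytes -> planar RGB floats.
// Names and types are illustrative only; the real conversion in the sample is written in Dart.
fun rgbaToPlanarRgb(rgba: ByteArray, width: Int, height: Int): FloatArray {
    val pixels = width * height
    val planar = FloatArray(3 * pixels)                     // [R..., G..., B...], alpha dropped
    for (i in 0 until pixels) {
        planar[i]              = (rgba[4 * i].toInt() and 0xFF).toFloat()      // R plane
        planar[pixels + i]     = (rgba[4 * i + 1].toInt() and 0xFF).toFloat()  // G plane
        planar[2 * pixels + i] = (rgba[4 * i + 2].toInt() and 0xFF).toFloat()  // B plane
        // rgba[4 * i + 3] is alpha – not used by this model
    }
    return planar
}
```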
Working with a red image here helps me debug the actual numbers I see in the tensor data. I expect to see 50176 (224×224) occurrences of the value 255 (maximum for red), followed by all zeros (green and blue). The result I get back from the model output also needs to be processed back to a Flutter image. This does the exact opposite of the input processing. Notice that I added the alpha back and set it to 255:
Figure 8: Code for converting tensor to image
When working with images, input and output are usually formatted the same way and post-processing mirrors what you do in pre-processing. You can feed the pre-processing into post-processing directly, without running the model and then render the results, to validate they are symmetrical. This does not mean that the model will work well with the data, but it can surface issues with your processing.
And here is the result, using a photo of a Eurasian blue tit by Francis Franklin:
Figure 9: Bird image before and after stylizing as mosaic
Throughout this journey, I learned that making small steps is the way to go. Working with ORT can feel like using a black box and taking baby steps is essential for understanding the input and output at every stage.
The post Use ONNX Runtime in Flutter appeared first on Surface Duo Blog.
Hello prompt engineers,
This week, we are taking one last look at the new Assistants API. Previous blog posts have covered the Retrieval tool with uploaded files and the Code interpreter tool. In today’s post, we’ll add the `askWikipedia` function that we’d previously built to the fictitious Contoso employee handbook document chat.
We’ll start by configuring the assistant in the OpenAI playground. This isn’t required – assistants can be created and configured completely in code – however it’s convenient to be able to test interactively before doing the work to incorporate the function into the JetchatAI Kotlin sample app.
Adding a function declaration is relatively simple – click the Add button and then provide a JSON description of the function’s capabilities. Figure 1 shows the Tools pane where functions are configured:
Figure 1: Add a function definition in the playground (shows `askWikipedia` already defined)
The function configuration is just an empty text box, into which you enter the JSON that describes your function – its name, what it does, and the parameters it needs. This information is then used by the model to determine when the function could be used to resolve a user query.
The JSON that describes the `askWikipedia` function from our earlier post is shown in Figure 2. It has a single parameter – `query` – which the model will extract from the user’s input.
```json
{
  "name": "askWikipedia",
  "description": "Answer user questions by querying the Wikipedia website. Don't call this function if you can answer from your training data.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search term to query on the wikipedia website. Extract the subject from the sentence or phrase to use as the search term."
      }
    },
    "required": [ "query" ]
  }
}
```
Figure 2: JSON description of the `askWikipedia` function
In theory, this should be enough for us to test whether the model will call the function; however, my test question “what is measles?” would always be answered by the model without calling `askWikipedia`. Notice that my function `description` included the instruction “Don’t call this function if you can answer from your training data.” – this was because in the earlier example, the function was being called too often!
Since this assistant’s purpose is to discuss health plans, I updated the system prompt (called Instructions in the Assistant playground) to the following (the new text is the final instruction about askWikipedia):
answer questions about employee benefits and health plans from uploaded files. ALWAYS include the file reference with any annotations. use the askWikipedia function to answer questions about medical conditions
After this change, the test question now triggers a function call!
Figure 3 shows what a function call looks like in the playground – because function calls are delegated to the implementing application, the playground needs you to interactively supply a simulated function return value. In this case, I pasted in the first paragraph of the Wikipedia page for “measles”, and the model summarized that into its chat response.
Figure 3: Testing a function call in the playground
With the function configured and confirmation that it will get called for the test question, we can now add the function call handler to our Android assistant in Kotlin.
In the original Assistant API implementation, there is a loop after the message is added to a run (shown in Figure 4) where the chat app waits for the model to respond with its answer:
```kotlin
do {
    delay(1_000)
    val runTest = openAI.getRun(thread!!.id, run.id)
} while (runTest.status != Status.Completed)
```
Figure 4: Checking message status until the run is completed and the model response is available
Now that we’ve added a function definition to the assistant, the `runTest.status` can change to `RequiresAction` (instead of `Completed`), which indicates that the model is waiting for one (or more) function return values to be supplied.
The `AskWikipediaFunction` class already contains code to search Wikipedia, so we just need to add some code to detect when a function call has occurred and provide the return value. The code shown in Figure 5 does that in these steps:

1. Detect when the run status changes to `RequiresAction`
2. Retrieve the run `steps`
3. Examine the latest step’s `stepDetails` for information about what action is required
4. Check whether the tool call is a `FunctionTool` and call the matching local function
5. Package a `toolOutput` with the function return value and submit it back to the assistant
Once the assistant receives the function return value, it can construct its answer back to the user. This will set the `status` to `Completed` (exiting the `do` loop) and the code will display the model’s response to the user.
```kotlin
do {
    delay(1_000)
    val runTest = openAI.getRun(thread!!.id, run.id)
    if (runTest.status == Status.RequiresAction) { // run is waiting for action from the caller
        val steps = openAI.runSteps(thread!!.id, run.id) // get the run steps
        val stepDetails = steps[0].stepDetails // latest step
        if (stepDetails is ToolCallStepDetails) { // contains a tool call
            val toolCallStep = stepDetails.toolCalls!![0] // get the latest tool call
            if (toolCallStep is ToolCallStep.FunctionTool) { // which is a function call
                var function = toolCallStep.function
                var functionResponse = "Error: function was not found or did not return any information."
                when (function.name) {
                    "askWikipedia" -> {
                        var functionArgs = argumentsAsJson(function.arguments) ?: error("arguments field is missing")
                        val query = functionArgs.getValue("query").jsonPrimitive.content
                        // CALL THE FUNCTION!!
                        functionResponse = AskWikipediaFunction.function(query)
                    }
                }
                // Package the function return value
                val to = toolOutput {
                    toolCallId = ToolId(toolCallStep.id.id)
                    output = functionResponse
                }
                // Send back to assistant
                openAI.submitToolOutput(thread!!.id, run.id, listOf(to))
                delay(1_000) // wait before polling again, to see if status is complete
            }
        }
    }
} while (runTest.status != Status.Completed)
```
Figure 5: Code to detect the `RequiresAction` state and call the function
Note that there are a number of shortcuts in the code shown – when multiple functions are declared, the assistant might orchestrate multiple function calls in a run, and there are multiple places where better error/exception handling is required. You can read more about how multiple function calls might behave in the OpenAI Assistant documentation.
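As a rough sketch (not the JetchatAI implementation), handling multiple tool calls in one step could iterate over all of the `toolCalls` and collect one `toolOutput` per call before submitting, reusing only the types and helpers that already appear in Figure 5; error handling is still omitted:

```kotlin
// Sketch only: collect a toolOutput for every function call in the step, then submit them together.
if (stepDetails is ToolCallStepDetails) {
    val outputs = stepDetails.toolCalls.orEmpty()
        .filterIsInstance<ToolCallStep.FunctionTool>()
        .map { call ->
            val response = when (call.function.name) {
                "askWikipedia" -> {
                    val args = argumentsAsJson(call.function.arguments) ?: error("arguments field is missing")
                    AskWikipediaFunction.function(args.getValue("query").jsonPrimitive.content)
                }
                else -> "Error: function was not found or did not return any information."
            }
            toolOutput {
                toolCallId = ToolId(call.id.id)
                output = response
            }
        }
    openAI.submitToolOutput(thread!!.id, run.id, outputs)
}
```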
Here is a screenshot of JetchatAI for Android showing an assistant conversation using the `askWikipedia` function:
Figure 6: JetchatAI #assistant-chat showing a response generated by a function call
You can view the complete code for the assistant function call in this pull request for JetchatAI.
Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features like the Assistant API.
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
The post OpenAI Assistant functions on Android appeared first on Surface Duo Blog.
Hello prompt engineers,
Over the last few weeks, we’ve looked at different aspects of the new OpenAI Assistant API, both prototyping in the playground and using Kotlin in the JetchatAI sample. In this post we’re going to add the Code Interpreter feature which allows the Assistants API to write and run Python code in a sandboxed execution environment. By using the code interpreter, chat interactions can solve complex math problems, code problems, read and parse data files, and output formatted data files and charts.
To keep with the theme of the last few examples, we are going to test the code interpreter with a simple math problem related to the fictitious Contoso health plans used in earlier posts.
The code interpreter is just a setting to be enabled, either in code or via the playground (depending on where you have set up your assistant). Figure 1 shows the Kotlin for creating an assistant using the code interpreter – including setting a very basic system prompt/meta prompt/instructions:
```kotlin
val assistant = openAI.assistant(
    request = AssistantRequest(
        name = "doc chat",
        instructions = "answer questions about health plans",
        tools = listOf(AssistantTool.CodeInterpreter), // enables the code interpreter
        model = ModelId("gpt-4-1106-preview")
    )
)
```
Figure 1: enabling the code interpreter in Kotlin when creating an assistant
As discussed in the first Kotlin assistant post, JetchatAI loads an assistant definition that was configured in the OpenAI playground, so it’s even easier to enable the Code interpreter by flipping this switch:
Figure 2: enabling the code interpreter in the OpenAI playground
With the code interpreter enabled, we can test it both interactively in the playground and in the JetchatAI app. The test question is “if the health plan costs $1000 a month, what is deducted weekly from my paycheck?”.
The playground output includes the code that was generated and executed in the interpreter – in Figure 3 you can see it creates variables from the values mentioned in the query and then a calculation that returns a value to the model, which is incorporated into the response to the user.
Figure 3: Testing the code interpreter in the playground with a simple math question (using GPT-4)
Notice that the code interpreter just returns a numeric answer, and the model decides to ‘round up’ to “$230.77” and format as a dollar amount.
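For reference, the arithmetic the interpreter settles on is straightforward (a minimal Kotlin sketch of the same calculation, not code from the sample app):

```kotlin
// $1000/month, paid weekly: annualize first, then divide by 52 weeks
val monthlyCost = 1000.0
val weekly = monthlyCost * 12 / 52   // 230.76923076923077
println("%.2f".format(weekly))       // 230.77 – matches the code interpreter's answer
```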
When implementing the Assistant API in an app, the code interpreter step (or steps) would not be rendered (in the same way that you don’t render function call responses), although they are available in the run’s `step_details` data structure, so you could still retrieve and display them to the user, or use them for logging/telemetry or some other purpose in your app.
Figure 4: Testing the same query on Android (also using GPT-4)
No code changes were made to the JetchatAI Android sample for this testing, it just needs the Assistant’s configuration updated as shown.
One of the reasons that the code interpreter option exists is because LLMs “on their own” can be terribly bad at math. Here are some examples using the same prompt on different models without a code interpreter to help:
The full response for each of these is shown below – I included the ChatGPT response separately because although it’s likely using similar models to what I have access to in their playground, it has its own system prompt/meta prompt which can affect how it approaches these types of problems. All three of these examples appear to be following some sort of chain of thought prompting, although they take different approaches. While some of these answers are close to the code interpreter result, that’s probably not ideal when money’s involved! How chat applications respond to queries with real-world implications (like financial advice, for example) is something that should be evaluated through the lens of responsible AI – remember this is just a fictitious example to test the LLM’s math skills.
GPT 3.5 Turbo attempts to ‘walk through’ the calculation, but doesn’t seem to understand all months will not necessarily contain exactly two pay days. At least the response includes a disclaimer to consult HR or payroll to verify!
Figure 5: output from gpt-3.5-turbo for the same query
It’s worth noting that when the code interpreter is enabled with gpt-3.5-turbo in the playground, it returns $230.77 (just like gpt-4 with the interpreter does in Figures 3 and 4).
Using the GPT-4 model results in a different set of steps to solve, but unlike the code interpreter, which multiplies out the total cost first, this solution calculates the average number of weeks in a month. The first mistake is that the value is 4.3 repeating, and by rounding it to 4.33 the model will get a slightly different answer. The second mistake is that $1000/4.33 = 230.9468822170901; however, it returns a result of $231.17, which is about 22 cents different from what the rounded answer should be. It also includes a disclaimer and advice to confirm with HR or payroll.
Figure 6: output from gpt-4 for the same query
The public ChatGPT follows similar logic to the GPT-4 playground, although it has better math rendering skills. It makes the same two mistakes, first rounding an intermediate value in the calculation, and then still failing to divide 1000/4.33 accurately.
Figure 7: output from ChatGPT for the same query
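A quick check of the numbers quoted above (a standalone Kotlin sketch, just to make the rounding error visible):

```kotlin
val exactWeeks = 52.0 / 12      // 4.333... weeks per month on average
val roundedWeeks = 4.33         // the intermediate rounding the models apply
println(1000 / exactWeeks)      // ≈ 230.77 – the code interpreter's result
println(1000 / roundedWeeks)    // ≈ 230.95 – still not the $231.17 that GPT-4 reported
```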
OpenAI models without the code interpreter feature seem to have some trouble with mathematical questions, returning different answers for the same question depending on the model and possibly the system prompt and context. In the simple testing above, the code interpreter feature does a better job of calculating a reasonable answer, and can do so more consistently on both gpt-4 and gpt-3.5-turbo models.
Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features like the Assistant API.
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
The post OpenAI Assistant code interpreter on Android appeared first on Surface Duo Blog.
Hello prompt engineers,
This week we’re continuing to discuss the new Assistant API announced at OpenAI Dev Day. There is documentation available that explains how the API works and shows Python/JavaScript/curl examples, but in this post we’ll implement it in Kotlin for Android and Jetpack Compose. You can review the code in this JetchatAI pull request.
A few weeks ago, we demonstrated building a simple Assistant in the OpenAI Playground – uploading files, setting a system prompt, and performing RAG-assisted queries – mimicking this Azure demo. To refresh your memory, Figure 1 shows the Assistant configuration:
Figure 1: OpenAI Assistant Playground (key explained in this post)
We are going to reference this specific assistant, which has been configured in the playground, from an Android app. After the assistant is created, a unique identifier is displayed under the assistant name (see Figure 2), which can be referenced via the API.
Figure 2: Assistant unique identifier
The openai-kotlin GitHub repo contains some basic instructions for accessing the Assistant API in Kotlin. This guidance has been adapted to work in JetchatAI, as shown in Figure 3. Comparing this file (AssistantWrapper.kt) to earlier examples (e.g. OpenAIWrapper.kt) you’ll notice it is significantly simpler! Using the Assistant API means many lines of code over multiple files in older examples can be reduced to the `chat` function shown in Figure 3:
```kotlin
suspend fun chat(message: String): String {
    if (assistant == null) {
        // open assistant and create a new thread every app-start
        assistant = openAI.assistant(id = AssistantId(Constants.OPENAI_ASSISTANT_ID)) // from the Playground
        thread = openAI.thread()
    }
    val userMessage = openAI.message(
        threadId = thread!!.id,
        request = MessageRequest(
            role = Role.User,
            content = message
        )
    )
    val run = openAI.createRun(
        threadId = thread!!.id,
        request = RunRequest(assistantId = assistant!!.id)
    )
    do {
        delay(1_000)
        val runTest = openAI.getRun(thread!!.id, run.id)
    } while (runTest.status != Status.Completed)
    val messages = openAI.messages(thread!!.id)
    val message = messages.first() // bit of a hack, get the last one generated
    val messageContent = message.content[0]
    if (messageContent is MessageContent.Text) {
        return messageContent.text.value
    }
    return "<Assistant Error>" // TODO: error handling
}
```
Figure 3: the new `chat` function using a configured Assistant (referenced by ID) via the Kotlin API
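Calling it from the rest of the app is then a one-liner inside a coroutine. This is a hypothetical call site for illustration only; the actual JetchatAI wiring differs:

```kotlin
// chat() is a suspend function, so call it from a coroutine scope
viewModelScope.launch {
    val reply = assistantWrapper.chat("What health plans are available?")
    // ...add `reply` to the conversation UI state
}
```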
Assistants can also support function calling, but we haven’t included any function demonstration in this sample.
To access the new Assistant channel in JetchatAI, first choose #assistant-chat from the chat list:
Figure 4: Access the #assistant-chat channel in the Chats menu of JetchatAI
You can now ask questions relating to the content of the five source PDF documents that were uploaded in the playground. The documents relate to employee policies and health plans of the fictitious Contoso/Northwind organizations. There are two example queries and responses in the screenshot in Figure 5:
Figure 5: user queries against the Assistant loaded with fictitious ‘employee policy’ PDFs as the data source
The Assistant API is still in preview, so it could change before the official release. The current JetchatAI implementation needs some more work in error handling, resolving citations, checking for multiple assistant responses, implementing functions, and more. The takeaway from this week is how much simpler the Assistant API is to use versus implementing the Chat API directly.
Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features.
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
The post OpenAI Assistant on Android appeared first on Surface Duo Blog.
Hello prompt engineers,
Last week we looked at one of the new OpenAI features – Assistants – in the web playground, but good news: the OpenAI Kotlin library is already being updated with the new APIs and you can start to try them out right now in your Android codebase with snapshot package builds. With a few minor configuration changes you can start testing the latest AI features and get ready for a supported package release.
While new features are being added to the Kotlin library, you can track progress from this GitHub issue and the related PRs, including support for the updated Images API as well as the Assistants API. Not all features were complete at the time of writing.
The owner (aallam) of the openai-kotlin repo is publishing pre-release snapshot builds of these features for developers to play with prior to an official release. You can see the changes required to test the snapshot in this JetchatAI pull request:
- Add the snapshots package source:

```kotlin
maven { url = uri("https://oss.sonatype.org/content/repositories/snapshots/") }
```

- Update the `openai` package version reference to `3.6.0-SNAPSHOT` (in gradle/libs.versions.toml):

```toml
openai = "com.aallam.openai:openai-client:3.6.0-SNAPSHOT"
```
Once the completed 3.6.0 release is available, don’t forget to remove the snapshots package source and update the version reference to a stable number.
One of the new features announced at OpenAI Dev Day is DALL·E 3 support in the Image generation API. You can read more in the FAQs and documentation.
To use the updated API, first reference the snapshot builds as described above, and then add the new model declaration to the `ImageCreation` constructor:
val imageRequest = ImageCreation(prompt = prompt, model = ModelId("dall-e-3"))
Note that the `model` parameter did not exist in previous builds of the OpenAI Kotlin library, so if you see any compile errors then re-check your gradle changes to ensure you’re referencing the latest snapshot.
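The request is then passed to the client as before; something like the following, based on the openai-kotlin README (treat the exact call name as an assumption if your version differs):

```kotlin
// Generate the image and read back the hosted URL
val images = openAI.imageURL(creation = imageRequest)
val imageUrl = images.firstOrNull()?.url
```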
The images in Figure 1 compare the same prompt being run on DALL·E 2 and DALL·E 3 in the JetchatAI demo:
Figure 1: image response from DALL·E 2 and DALL·E 3 for the prompt “a bucolic image”
Have fun playing with DALL·E 3 and the other new APIs!
Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features.
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
The post Test the latest AI features in Kotlin appeared first on Surface Duo Blog.
Hello prompt engineers,
OpenAI held their first Dev Day on November 6th, which included a number of new product announcements: GPT-4 Turbo with 128K context, function calling updates, JSON mode, improvements to GPT-3.5 Turbo, the Assistant API, DALL·E 3, text-to-speech, and more. This post will focus just on the Assistant API because it greatly simplifies a lot of the challenges we’ve been addressing in the JetchatAI Android sample app.
The Assistants overview explains the key features of the new API and how to implement an example in Python. In today’s blog post we’ll compare the new API to the Chat API we’ve been using in our Android JetchatAI sample, and specifically to the #document-chat demo that we built over the past few weeks.
Big differences between the Assistants API and the Chat API:

- Conversation state is stored in server-side threads, so the client no longer has to resend the entire chat history or manage the context window itself.
- Files can be uploaded and queried with the built-in Retrieval tool, replacing custom chunking, embedding, and similarity-matching code.
- Additional tools (the Code interpreter and function calling) are configured on the assistant itself.
The last two posts have been about building a basic document chat similar to this Azure OpenAI sample, using a lot of custom Kotlin code. OpenAI Assistants now make this trivial to build, as shown in Figure 1 below. This example shows how to prototype an Assistant in the Playground that can answer questions based on documentation sources.
You can see the result is very similar to the responses from the previous blog post:
Figure 1: OpenAI Assistant playground, with the sample PDFs uploaded and supporting typical user queries
Each element in the playground in Figure 1 is explained below:
Comparing this example to the Chat Playground prototype last week you can see the Assistants API is much simpler.
The Assistant API is still in beta, so expect changes before it’s ready for production use; however, it’s certain that the Kotlin code to implement the Assistant will be much simpler than the Chat API. Rather than keeping track of the entire conversation, managing the context window, chunking the source data, and prompt-tweaking for RAG, in future the code will only keep track of the thread identifier and exchange new queries and responses with the server. The citations can also be programmatically resolved when you implement the API (even though they appear as illegible model-generated strings in the playground).
The OpenAI Kotlin open-source client library is currently implementing the Assistant API – see the progress in this pull request.
Refer to the OpenAI blog for more details on the Dev Day announcements, and the Assistants docs for more technical detail.
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
The post OpenAI Assistants appeared first on Surface Duo Blog.
Hello prompt engineers,
Last week’s blog introduced a simple “chat over documents” Android implementation, using some example content from this Azure demo. However, if you take a look at the Azure sample, the output is not only summarized from the input PDFs, but it’s also able to cite which document the answer is drawn from (shown in Figure 1). In this blog, we’ll investigate how to add citations to the responses in JetchatAI.
Figure 1: Azure OpenAI demo result shows citations for the information presented in the response
In order to provide similar information in the JetchatAI document chat on Android, we’ll need to update the document parsing (chunking) so that we have enough context to answer questions and identify the source.
Before spending a lot of time on the parsing algorithm, it makes sense to confirm that we can get the model to understand what we want to achieve. To quickly iterate prototypes for this feature, I simulated a request/response in the OpenAI Playground, using the existing prompts from the app and some test embeddings from testing the document chat feature:
Figure 2: OpenAI playground for testing prompt ideas, with a prototype for answering a document chat with cited sources
Figure 2 shows an example chat interaction based on the documents we added to the app in the previous blog. The facts listed in the USER prompt (#4) are examples of the embeddings from testing the existing feature. Each element of the “prompt prototype” is explained below:
- Citations are requested using `[1]`-style numbered square brackets, and the grounding reinforces that citations should be used and added to the end of the response.
- The `#` markdown-style emphasis on the filename helps the model to group the data that follows. This test data is actual embeddings from testing the document chat feature previously.
Slightly changing the user prompt to “does my plan cover contact lenses” (without mentioning immunizations), we can confirm that the answer and cited documents change:
Figure 3: OpenAI playground example where only one source document is cited
Note that in Figure 3 the citation numbering seems to reflect the position of the “document” in the grounding prompt. Although this should be numbered from one, I’m going to ignore it for now (another exercise for the reader). The updated prompt and grounding format works well enough to be added to the app for further testing.
Now that we’ve established a prompt that works in the OpenAI playground, we need to update the app to parse the documents differently so that we can re-create the grounding format in code.
Currently, the sentence embeddings are all added without keeping track of the source document. When they’re added to the grounding data, they are ordered by similarity score (highest first).
To implement the prompt and grounding prototyped above, we need to:

1. Keep track of the source document filename for each embedding.
2. Group the grounding data by filename and update the prompts to request numbered citations.
The code for these two changes is shown below (and is in this pull request), followed by final app testing.
Because the code from last week was already keeping track of ‘document id’ as it parsed the resource files, minimal changes were needed to keep track of the actual filenames.
Firstly, a new array `rawFilenames` contains the user-friendly filename representation for each resource:

```kotlin
val rawResources = listOf(R.raw.benefit_options, R.raw.northwind_standard_benefits_details)
val rawFilenames = listOf<String>("Benefit-options.pdf", "Northwind-Standard-benefits-details.pdf")
```
Figure 4: adding the user-friendly filename strings (must match the resources order)
Then as the code is looping through the resources, we add the user-friendly filename to a cache, keyed by the ‘document id’ we already have stored as part of the embeddings key:
```kotlin
for (resId in rawResources) {
    documentId++
    documentNameCache["$documentId"] = rawFilenames[documentId] // filename will be shown to user
```
Figure 5: storing the filename to match the `documentId` for later retrieval
It’s now possible to determine which document a given sentence was found in.
When the document filename is stored for each embedding, the code building the grounding prompt can group the embeddings under document “headings” so that the model can better understand the context for the embedding strings.
For the document filenames to be useful, the system prompt must be updated to match the prototype in Figure 2. Figure 6 below shows the updated system prompt from the DocumentChatWrapper.kt `init` function:
```kotlin
grounding = """
    You are a personal assistant for Contoso employees.
    You will answer questions about Contoso employee benefits from various employee manuals.
    Your answers will be short and concise.
    Only use the functions you have been provided with.
    The user has Northwind Standard health plan.
    For each piece of information you provide, cite the source in brackets like so: [1].
    At the end of the answer, always list each source with its corresponding number
    and provide the document name, like so [1] Filename.doc""".trimMargin()
```
Figure 6: updated system prompt (including a personalization statement about the user’s current plan)
The code in Figure 7 shows the `grounding` function changes to support citations, producing output similar to the prototype grounding in Figure 2. After ranking the embeddings by similarity (and ignoring results with less than a 0.8 similarity score), it loops through and groups sentences by document filename:

```kotlin
var matches = sortedVectors.tailMap(0.8)
// re-sort based on key, to group by filename
var sortedMatches: SortedMap<String, String> = sortedMapOf()
for (dpKey in sortedVectors.tailMap(0.8)) {
    val fileId = dpKey.value.split('-')[0] // the document id is the first part of the embedding key
    val filename = documentNameCache[fileId]!!
    val content = documentCache[dpKey.value]!!
    if (sortedMatches.contains(filename)) {
        // add to current ‘file’ matching sentences
        sortedMatches[filename] += "\n\n$content"
    } else {
        // first match for this filename
        sortedMatches[filename] = content
    }
}
// loop through filenames and output the matching sentences for each file
messagePreamble = "The following information is extracted from Contoso employee handbooks and health plan documents:"
for (file in sortedMatches) {
    // use the # pound markdown-like heading syntax for the filename
    messagePreamble += "\n\n# ${file.key}\n\n${file.value}\n\n#####\n\n"
}
messagePreamble += "\n\nUse the above information to answer the following question, providing numbered citations for document sources used (mention the cited documents at the end by number). Synthesize the information into a summary paragraph:\n\n"
```
Figure 7: the updated `grounding` function
Now that the code has been updated to track source filenames and group the grounding data by document, the responses in the JetchatAI document chat should now include numbered citations.
With these relatively small changes in the code, the #document-chat conversation in JetchatAI will now add citations when asked questions about the fictitious Contoso employee benefits documents that are referenced via RAG principles:
Figure 8: JetchatAI showing citations when referencing source documents
This post is closely related to the document chat implementation post.
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
The post Chunking for citations in a document chat appeared first on Surface Duo Blog.
Hello prompt engineers,
In last week’s discussion on improving embedding efficiency, we mentioned the concept of “chunking”. Chunking is the process of breaking up a longer document (ie. too big to fit under a model’s token limit) into smaller pieces of text, which will be used to generate embeddings for vector similarity comparisons with user queries (just like the droidcon conference session data).
Inspired by this Azure Search OpenAI demo, and also the fact that ChatGPT itself released a PDF-ingestion feature this week, we’ve added a “document chat” feature to the JetchatAI Android sample app. To access the document chat demo, open JetchatAI and use the navigation panel to change to the #document-chat conversation:
Figure 1: access the #document-chat
To build the #document-chat we re-used a lot of code and added some PDF document content from an Azure chat sample.
In the pull-request for this feature, you’ll see a number of new files that were cloned from existing code to create the #document-chat channel:

- `DocumentChatWrapper` – sets the system prompt to guide the model to only answer “Contoso employee” questions
- `DocumentDatabase` – functions to store the text chunks and embeddings in Sqlite so they are persisted across app restarts
- `AskDocumentFunction` – SQL generating function that can attempt searches on the text chunks in the database. Ideally, we would provide a semantic full-text search backend, but in this example only basic SQL text matching is supported.
The bulk of this code is identical to the droidcon conference chat demo, except instead of a hardcoded database of session details, we needed to write new code to parse and store the content from PDF documents. This new code exists mainly in the `loadVectorCache` and `initVectorCache` functions (as well as a new column in the embeddings Sqlite database to hold the corresponding content).
To create the data store, we used the test data associated with the Azure Search demo on GitHub: six documents that describe the fictitious Contoso company’s employee handbook and benefits. These are provided as PDFs, but to keep our demo simple I manually copied the text into .txt files which are added to the JetchatAI `raw` resources folder. This means we don’t have to worry about PDF file format parsing, but can still play around with different ways of chunking the content.
The code to load these documents from the resources folder is shown in Figure 2:
```kotlin
var documentId = -1
val rawResources = listOf(R.raw.benefit_options) // R.raw.employee_handbook, R.raw.perks_plus, R.raw.role_library, R.raw.northwind_standard_benefits_details, R.raw.northwind_health_plus_benefits_details
for (resId in rawResources) {
    documentId++
    val inputStream = context.resources.openRawResource(resId)
    val documentText = inputStream.bufferedReader().use { it.readText() }
```
Figure 2: loading the source document contents
Once we’ve loaded the contents of each document, we need to break it up before creating embeddings that can be used to match against user queries (and ultimately answer their questions with retrieval augmented generation).
This explanation of chunking strategies outlines some of the considerations and methods for breaking up text to use for RAG-style LLM interactions. For our initial implementation we are going to take a very simplistic approach, which is to create an embedding for each sentence:
```kotlin
val documentSentences = documentText.split(Regex("[.!?]\\s*"))
var sentenceId = -1
for (sentence in documentSentences) {
    if (sentence.isNotEmpty()) {
        sentenceId++
        val embeddingRequest = EmbeddingRequest(
            model = ModelId(Constants.OPENAI_EMBED_MODEL),
            input = listOf(sentence)
        )
        val embedding = openAI.embeddings(embeddingRequest)
        val vector = embedding.embeddings[0].embedding.toDoubleArray()
        // add to in-memory cache
        vectorCache["$documentId-$sentenceId"] = vector
        documentCache["$documentId-$sentenceId"] = sentence
```
Figure 3: uses regex to break into sentences and creates/stores an embedding vector for each sentence
Although this is the simplest chunking method, there are some drawbacks:

- A single sentence often lacks the surrounding context needed to answer a question, so facts that span multiple sentences can lose their association (for example, which plan a benefit belongs to).
- It produces a very large number of chunks, which means many embedding API requests (and more time and cost) to index the documents.
Even so, short embeddings like this can be functional, as shown in the next section.
NOTE: The app needs to parse and generate embeddings for ALL the documents before it can answer any user queries. Generating the embeddings can take a few minutes because of the large number of embedding API requests required. Be prepared to wait the first time you use the demo if parsing all six source files. Alternatively, changing the `rawResources` array to only load a single document (like `R.raw.benefit_options`) will start faster and still be able to answer basic questions (as shown in the examples below). The app saves the embeddings to Sqlite so subsequent executions will be faster (unless the Sqlite schema is changed or the app is deleted and re-installed).
With just this relatively minor change to our existing chat code (and adding the embedded files), we can ask fictitious employee questions (similar to those shown in the Azure Search OpenAI demo):
Figure 4: Ask questions about documents in JetchatAI
These two example queries are discussed below, showing the text chunks that are used for grounding.
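For context, the similarity scores discussed below come from comparing the user query’s embedding vector against each stored sentence vector. Conceptually it looks something like this (a simplified sketch; the sample app’s actual ranking code differs in the details):

```kotlin
// Simplified sketch of ranking cached sentence embeddings against a query embedding.
// OpenAI embedding vectors are normalized, so a dot product behaves like cosine similarity.
fun dot(a: DoubleArray, b: DoubleArray): Double = a.indices.sumOf { a[it] * b[it] }

fun rankBySimilarity(
    queryVector: DoubleArray,
    vectorCache: Map<String, DoubleArray>,   // keyed "documentId-sentenceId" as in Figure 3
    threshold: Double = 0.8                  // the arbitrary cut-off used in this post
): List<Pair<String, Double>> =
    vectorCache.mapValues { (_, v) -> dot(queryVector, v) }
        .filterValues { it >= threshold }
        .toList()
        .sortedByDescending { it.second }    // highest similarity first
```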
The first test user query returns ten chunks where the vector similarity score was above the arbitrary 0.8 threshold. Figure 5 shows a selection of the matches (some removed for space), but you can also see that the grounding prompt has the introduction “The following information is extract from Contoso employee handbooks and health plans:” and the instruction “Use the above information to answer the following question:” to guide the model when this is included in the prompt:
```
The following information is extract from Contoso employee handbooks and health plans:

Comparison of Plans Both plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive care services

This plan also offers coverage for preventive care services, as well as prescription drug coverage

Northwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings

Northwind Standard only offers coverage for vision exams and glasses

Both plans offer coverage for vision and dental services, as well as medical services

Use the above information to answer the following question:
```
Figure 5: the grounding information for the user query “does my plan have annual eye exams”
Because we have also registered the `AskDocumentFunction`, an SQL query (Figure 6) is also generated for the query; however, the exact phrase “annual eye exams” does not have any matches, and no additional grounding is provided by the function call.
SELECT DISTINCT content FROM embedding WHERE content LIKE '%annual eye exams%'
Figure 6: text search is too specific and returns zero results
The grounding in Figure 5 is enough for the model to answer the question with “Yes your plan covers annual eye exams”.
Note that the user query mentioned “my plan”, and the model’s response asserts that “your plan covers…”, probably because in the grounding data the statements include “Both plans offer coverage…”. We have not provided any grounding on what plan the user is signed up for, but that could be another improvement (perhaps in the system prompt) that would help answer more accurately.
The second test query only returns three chunks with a vector similarity score above 0.8 (shown in Figure 7):
```
The following information is extract from Contoso employee handbooks and health plans:

Northwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings

Both plans offer coverage for vision and dental services, as well as medical services

Both plans offer coverage for vision and dental services

Use the above information to answer the following question:
```
Figure 7: the grounding information for the user query “what about dental”
The model once again triggers the dynamic SQL function to perform a text search for “%dental%”, which returns the matches shown in Figure 8.
```
SELECT DISTINCT content FROM embedding WHERE content LIKE '%dental%'
-------
[('Northwind Health Plus Northwind Health Plus is a comprehensive plan that provides comprehensive coverage for medical, vision, and dental services')
,('Northwind Standard Northwind Standard is a basic plan that provides coverage for medical, vision, and dental services')
,('Both plans offer coverage for vision and dental services')
,('Northwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings')
,('Both plans offer coverage for vision and dental services, as well as medical services')]
```
Figure 8: SQL function results for the user query “what about dental?”
The chunks returned from the SQL query mostly overlap with the embeddings matches. The model uses this information to generate the response “Both plans offer coverage for dental services, including dental exams, cleanings, and fillings.”
If you look closely at the grounding data, there’s only evidence that the “Health Plus” plan covers fillings (there is no explicit mention that the “Standard” plan offers anything beyond “dental services”). This means that the answer could give misleading information about fillings being covered by both plans – it may be a reasonable assumption given the grounding, or it could fall into the ‘hallucination’ category. If the chunks were larger, the model might have more context to understand which features are associated with which plan.
This example uses the simplest possible chunking strategy, and while some questions can be answered it’s likely that a more sophisticated chunking strategy will support more accurate responses. In addition, including more information about the user could result in more personalized responses.
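One direction a more sophisticated strategy could take (a sketch only, not part of the sample app) is to group consecutive sentences into overlapping chunks, so each embedding carries more surrounding context:

```kotlin
// Hypothetical sentence-window chunking: group sentences into overlapping chunks
// so each embedding has more context than a single sentence.
fun chunkSentences(sentences: List<String>, windowSize: Int = 5, overlap: Int = 2): List<String> {
    require(overlap < windowSize) { "overlap must be smaller than windowSize" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < sentences.size) {
        val end = minOf(start + windowSize, sentences.size)
        chunks += sentences.subList(start, end).joinToString(" ")
        if (end == sentences.size) break
        start = end - overlap   // step forward, keeping `overlap` sentences of shared context
    }
    return chunks
}
```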
Some additional samples that demonstrate building document chat services with more sophisticated search support:
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
The post Document chat with OpenAI on Android appeared first on Surface Duo Blog.
Hello prompt engineers,
I’ve been reading about how to improve the process of reasoning over long documents by optimizing the chunking process (how to break up the text into pieces) and then summarizing before creating embeddings to achieve better responses. In this blog post we’ll try to apply that philosophy to the Jetchat demo’s conference chat, hopefully achieving better chat responses and maybe saving a few cents as well.
When we first wrote about building a Retrieval Augmented Generation (RAG) chat feature, we created a ‘chunk’ of information for each conference session. This text contains all the information we have about the session, and it was used to:

1. Generate the embedding vector that is compared against the user query for similarity matching.
2. Provide the grounding text that is added to the chat prompt when a session is a good match.
Figure 1 shows an example of how the text was formatted (with key:value pairs) and the types of information provided:
```
Speaker: Craig Dunn
Role: Software Engineer at Microsoft
Location: Robertson 1
Date: 2023-06-09
Time: 16:30
Subject: AI for Android on- and off-device
Description: AI and ML bring powerful new features to app developers, for processing text, images, audio, video, and more. In this session we’ll compare and contrast the opportunities available with on-device models using ONNX and the ChatGPT model running in the cloud.
```
Figure 1: an example of the session description data format used in the original Jetchat sample app
Using all the information fields for embeddings and grounding worked fine for our use case, and we’ve continued to build additional features like sliding window and history caching based on this similarity matching logic. However, that doesn’t mean it couldn’t be further improved!
When you consider how the embedding vector is used – to compare against the user query to match on general topics and subject similarity – it seems like we could simplify the information we use to create the embedding, such as removing the speaker name, date, time, and location keys and values. This information is not well suited to matching embeddings anyway (chat functions and dynamic SQL querying work much better for questions about those attributes), so we can reduce the text chunk used for generating embeddings to the speaker role, subject, and description as shown in Figure 2:
```
Speaker role: Software Engineer at Microsoft
Subject: AI for Android on- and off-device
Description: AI and ML bring powerful new features to app developers, for processing text, images, audio, video, and more. In this session we’ll compare and contrast the opportunities available with on-device models using ONNX and the ChatGPT model running in the cloud.
```
Figure 2: a more focused text chunk for embedding. The role was included since for this dataset and expected query usage it often contains relevant context.
There is an immediate improvement in cost efficiency, as the new text chunk is only 73 tokens, versus 104 tokens for the complete text in Figure 1 (roughly a 30% saving in the cost of calculating the embedding for this session; note that some sessions have longer descriptions than others, so the amount of cost savings will vary). While embedding API calls are much cheaper (at $0.0004 per 1000 tokens) than the chat API ($0.002 to $0.06 per 1000 tokens), it’s still a cost that can add up over time, so it makes sense to reduce the number of tokens used to create embeddings if possible.
Note that the shorter text chunk is ONLY used for creating the embedding vector. When the vector similarity with the user query is high enough, the ORIGINAL text with all the fields is what is added to the chat prompt grounding. This ensures that the chat model can still respond with speaker, date, time, and location information in the chat.
Figure 3: Screenshot showing the generated response still includes speaker, date, time, and location information
Testing with some common user queries from other blog posts in this series, the vector similarity scores are very close when comparing the query embedding vector against the larger text chunk (old score) and the smaller text chunk (new score). About a quarter of the sample were slightly lower scoring (highlighted in red), but the rest resulted in higher similarity scores.
| User query | Matching session | Old similarity score | New similarity score |
| --- | --- | --- | --- |
| Are there any sessions on AI | AI for Android on- and off-device | 0.807 | 0.810 |
| Are there any sessions on gradle | Improving Developer Experience with Gradle Build Scans (Rooz Mohazzabi) | 0.802 | 0.801 |
| | Improving Developer Experience with Gradle Build Scans | 0.810 | 0.806 |
| | Improving Developer Experience with Gradle Build Scans (Iury Souza) | 0.816 | 0.823 |
| | Crash Course in building your First Gradle Plugin | 0.821 | 0.827 |
| Are there any sessions on Jetpack Compose | Material You Review | 0.802 | 0.801 |
| | Building a component library in Compose for a large-scale banking application | 0.814 | 0.824 |
| | Developing Apps optimized for Wear OS with Jetpack Compose | 0.815 | 0.814 |
| | Animating content changes with Jetpack Compose | 0.819 | 0.827 |
| | Practical Compose Navigation with a Red Siren | 0.823 | 0.825 |
| | Compose-View Interop in Practice | 0.824 | 0.838 |
| | Panel Discussion: Adopting Jetpack Compose @ Scale (Christina Lee) | 0.829 | 0.842 |
| | Panel Discussion: Adopting Jetpack Compose @ Scale (Alejandro Sanchez) | 0.831 | 0.849 |
| | Creative Coding with Compose: The Next Chapter | 0.832 | 0.840 |
| | Panel Discussion: Adopting Jetpack Compose @ Scale (Vinay Gaba) | 0.834 | 0.850 |
Figure 4: comparing vector similarity scores with the full text chunk embedding (old) versus the shorter version (new). Scores truncated to three decimal places for clarity.
The results seem to show that the arbitrary cut-off of “0.8” for measuring whether a session was a good match still applies, and the actual results displayed to the user were unchanged.
Since (for these test cases at least) the chat responses in the app are identical, this improvement hasn’t affected the user experience positively or negatively (but it has reduced our embeddings API costs). Further testing on other conference queries might reveal different effects, and certainly for other use cases (such as reasoning over long documents using embedding chunks), “summarizing” the text used for embedding to better capture context that will match expected user queries could lead to better chat completions.
I’ve left the code changes to the end of the post, since very few lines of code were changed! You can see in the pull request that the key updates were:
- Two new functions on the `SessionInfo` class, the first to emit the shorter text chunk for embedding, and the second to emit the full text for grounding:

```kotlin
fun forEmbedding () : String {
    return "Speaker role: $role\nSubject: $subject\nDescription:$description"
}

fun toRagString () : String {
    return """
        Speaker: $speaker
        Role: $role
        Location: $location
        Date: $date
        Time: $time
        Subject: $subject
        Description: $description""".trimIndent()
}
```

- `DroidconEmbeddingsWrapper.initVectorCache` uses the `DroidconSessionObjects` collection and the `forEmbedding` function to create embedding vectors with the summarized session info:

```kotlin
for (session in DroidconSessionObjects.droidconSessions) {
    val embeddingRequest = EmbeddingRequest(
        model = ModelId(Constants.OPENAI_EMBED_MODEL),
        input = listOf(session.value.forEmbedding())
    )
```

- The `DroidconEmbeddingsWrapper.grounding()` function uses `toRagString()` to include the full text in the chat prompt:

```kotlin
messagePreamble += DroidconSessionObjects.droidconSessions[dpKey.value]?.toRagString()
```
These changes are very specific to our conference session data source. When the source data is less structured, you might consider generating an LLM completion to summarize each text chunk before generating the embedding.
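As a rough sketch of that idea, using the same openai-kotlin client types that appear elsewhere in these posts (treat the exact model choice and signatures as assumptions):

```kotlin
// Hypothetical helper: summarize a chunk of unstructured text before embedding it
suspend fun summarizeForEmbedding(openAI: OpenAI, chunk: String): String {
    val completion = openAI.chatCompletion(
        ChatCompletionRequest(
            model = ModelId("gpt-3.5-turbo"),
            messages = listOf(
                ChatMessage(
                    role = ChatRole.System,
                    content = "Summarize the following text in two sentences, keeping the key topics and terms."
                ),
                ChatMessage(role = ChatRole.User, content = chunk)
            )
        )
    )
    // fall back to the original chunk if no summary is returned
    return completion.choices.firstOrNull()?.message?.content ?: chunk
}
```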
We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.
If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.
There will be no livestream this week, but you can check out the archives on YouTube.
The post More efficient embeddings appeared first on Surface Duo Blog.