Surface Duo Blog – https://devblogs.microsoft.com/surface-duo/
Build great Android experiences, from AI to foldable and large-screens.

2023 year in review
Craig Dunn – Mon, 01 Jan 2024 – https://devblogs.microsoft.com/surface-duo/2023-year-in-review/


Hello Android developers,

2023 was the year that machine learning and artificial intelligence really became mainstream, and we covered both topics with a focus on Android implementations. We published series on using the ONNX machine learning runtime, building Android apps with Microsoft Graph, and tutorials for Jetpack Compose developers! Take a look back at all the best posts from 2023…

OpenAI on Android

The blog focused heavily on working with OpenAI on Android using Kotlin, starting with some basic API access and then building out the JetchatAI demo using a variety of techniques including embeddings, “RAG” (retrieval augmented generation), images, functions, and more. Here are some of our favorite posts:

Assistants API

  • Introduction to Assistants
  • Assistant API on Android
  • Code interpreter
  • Functions

DALL·E image generation

  • DALL·E 3 image generation on Android

Embeddings API

  • Add embeddings to chat
  • Vector caching (redux)
  • More efficient embeddings

Chat API

  • Add AI to the Jetchat sample
  • Chat memory with functions
  • Sliding chat window
  • “Infinite chat” with history summarization
  • “Infinite chat” with history embeddings

Document chat

  • Chunking for citations

Other topics

  • Chat functions weather demo
  • Dynamic SQL queries
  • Prompt engineering tips
  • Tokens and limits

We also talked about responsible AI and speech-to-speech conversations with AI. All these posts used the OpenAI API but the same features are available via the Azure OpenAI Service.

ONNX on Android

ONNX Runtime is a cross-platform machine learning foundation that can be added to Android apps. There was a series of posts to help you get started with ONNX on Android.

You can also use ONNX Runtime with Flutter!

Microsoft Graph on Android

Microsoft Graph provides a unified programming model for a rich data set across Microsoft applications and services, which can also be integrated into custom applications. This series of posts showed how to authenticate with MSAL and then integrate data into a custom Android app.

Jetpack Compose

The Jetpack Compose highlight of the year was this animation series inspired by a droidcon talk.

Some of these animation ideas were applied to the Jetchat AI Compose sample.

There was also a series on Relay for Figma and creating styles to import designs from Figma to your Jetpack Compose application UI.

And finally, some tips for foldable Android developers – a new FoldAwareColumn in Accompanist and improved navigation support in TwoPaneLayout.

Feedback

If you have any questions about OpenAI, ONNX, Microsoft Graph, or Jetpack Compose, use the feedback forum or message us @surfaceduodev.

Use ONNX Runtime in Flutter
Andrei Diaconu – Thu, 21 Dec 2023 – https://devblogs.microsoft.com/surface-duo/flutter-onnx-runtime/


Hello Flutter developers!

After recently reading about how Pieces.app uses ONNX runtime inside a Flutter app, I was determined to try it myself. This article shows a summary of the journey I took and provides a few tips for you if you want to do the same.

Since we have FFI in Dart for calling C code and ONNX Runtime offers a C library, this is the best way to integrate across most platforms. Before I walk down that path, I decide to have a look at pub.dev to see if anyone did this before me. My thinking here is that anything running ONNX Runtime is a good starting point, even if I must contribute to the project to make it do what I need. In the past, if a plugin lacked functionality, I would fork it, write what was missing and then use the fork as a git dependency. If it was appropriate, I would also open a PR to upstream the changes.

Tip: The easiest way to contribute to OSS is to solve your own issues and upstream the changes.

A screenshot of a pub.dev search for the term ONNX. Four results show up. Each has a small number of likes and low popularity.
Figure 1: Searching for ONNX on pub.dev

Searching for ONNX, four packages show up. As sometimes happens on pub.dev, some packages are started and published but not finished. After looking at the code, I concluded that only onnxruntime has had enough work put into it to be worth giving a shot. At first glance, it seems to only run on Android and iOS, but after looking at the code, I see it is based on the ONNX Runtime C library and it uses Dart FFI, which means I can make it run on other platforms down the line. Off I go with a brand new Flutter project (flutter create onnxflutterplay) and then flutter pub add onnxruntime.

Tip: Whenever you decide which library to use, have a look at the code and the issues raised on GitHub. This gives you a better picture of overall quality and completeness.

The library comes with an example. It seems to be an audio processing sample, which is far too complicated for where I am right now. I want to understand the basics and run the simplest ONNX model I can think of. This will also prove to me that the plugin works. I start searching for such a model and end up with the one from the ONNX Runtime basic usage example. It takes two float numbers as input and outputs their sum. I follow the instructions and generate my first ever ORT model. This is what the model looks like in Netron.

Screenshot of the Netron app, with a simple model open that requires two float inputs. The model outputs a single float value.
Figure 2: Netron app showing a simple model

To figure out how to use the model, I have a few resources at my disposal. First, I have the sample code from the model repo, which is Swift code and might be intimidating, but is well documented and quite similar to Kotlin and Dart. I need to be comfortable looking at other languages anyway, since most AI researchers use Python. I see the names “A”, “B” and “C” and the float type being used explicitly. The other resource I have is a test from the flutter plugin. It uses simple data types for input and output, which shows me how to pack “A” and “B” inputs properly. You can see the complete code on GitHub. This is what I end up with:

  void _inferSingleAdd() async {
    OrtEnv.instance.init();
    final sessionOptions = OrtSessionOptions();
    final rawAssetFile = await rootBundle.load("assets/models/single_add.ort");
    final bytes = rawAssetFile.buffer.asUint8List();
    final session = OrtSession.fromBuffer(bytes, sessionOptions);
    final runOptions = OrtRunOptions();
    final inputOrt = OrtValueTensor.createTensorWithDataList(
        Float32List.fromList([5.9]),
    );
    final inputs = {'A':inputOrt, 'B': inputOrt};
    final outputs = session.run(runOptions, inputs);
    inputOrt.release();
    runOptions.release();
    sessionOptions.release();
    // session.release();
    OrtEnv.instance.release();
    List c = outputs[0]?.value as List;
    print(c[0] ?? "none");
  }
Figure 3: Code for inferring the simple model

I run into some exceptions with the session.release() call. From my investigations, this library might expect to be called from an isolate and I am not doing that yet. To move past the errors, I simply commented out that line – but if I were doing this for a production app, I would give the isolate a try and investigate further. For now, this will do.

Tip: When setting up ONNX Runtime, use a simple model. It eliminates issues that stem from processing, model complexity, supported operators, and so on.

The next step in my journey is to try a larger model. My end goal here is to work with images, and I feel prepared to start using the simplest model I can find. The perfect model to continue with would be one that takes an image input and only applies a color filter or some other easy-to-debug operation. I start looking for such a model but can’t find one, so I land on a style transfer model from the ONNX Model Zoo archive. I pick the mosaic pretrained model and immediately open it in Netron.

Screenshot of the Netron app, with a model open that requires large float matrices as input and output. The model is complex and large.
Figure 4: Netron showing a complex model

You can clearly see the input and output there: float32[1,3,224,224]. The numbers in brackets represent the shape of the tensor. The shape is important because we process our input and output to match that shape. When that shape was not respected, I usually got a runtime error telling me it expected something else. You can feed some models raw PNG or JPEG files, but not this model. This model requires a bit of processing.

Tip: Install Netron so you can simply double-click a model file to view it on your PC. Always check the inputs and outputs this way to avoid confusion.

I did not know about tensor shapes before this work, so maybe it’s worth pausing a bit to discuss what they mean. If you have a simple matrix with 10 rows of 100 elements each, the shape is [10, 100]. The shape is the number of elements along each axis of the tensor. For an experienced computer vision machine learning developer, I expect something like [1, 3, 224, 224] immediately screams “one image with 3 channels per pixel (Red, Green, Blue) of size 224 by 224 pixels”.

I first convert the ONNX file into ORT format and then add it to the app. I also prepare an image. I do not want to fiddle with resizing and transforming the input or output yet, so I fire up mspaint and make a 224 by 224 pixel, completely red image. During debugging, I also make a half red, half green image.

A red square. The red used here is the purest red available to a computer, meaning a 255 value for red and 0 for green and blue. This is typically represented as #FF0000
Figure 5: Red square

A square split down the middle vertically. The left side is red and the right side is green.
Figure 6: Half red, half green square

A red image of the exact size I need provides me with a simple-to-debug input. Working with ONNX Runtime, or machine learning in general, proves to involve a lot of pre- and post-processing.

Tip: Working with images means very large arrays, which are hard to follow. Whenever some input or output is hard to debug, ask yourself what image you can manufacture to help find the problem.

For example, colors for each pixel are represented differently in Flutter or Android compared to these ONNX models. To drive this point home, let’s consider an unusual 1×10 image. We have 10 pixels in total. Each has 4 color components. Let’s number each pixel 1 to 10 and each color component R (Red), G (Green), B (Blue) and A (Alpha). In the sample below, Flutter stores the image as:

  R1 G1 B1 A1 R2 G2 B2 A2 R3 G3 B3 A3 […] R10 G10 B10 A10

From what I see, due to how tensor reshaping works, to get the right ONNX Runtime Tensor, the image data must look like this:

  R1 R2 R3 […] R10 G1 G2 G3 […] G10 B1 B2 B3 […] B10

Reordering the colors and dropping the Alpha component to fit this format is our pre-processing and the code looks like this:

  Future<List<double>> imageToFloatTensor(ui.Image image) async {
    final imageAsFloatBytes = (await image.toByteData(format: ui.ImageByteFormat.rawRgba))!;
    final rgbaUints = Uint8List.view(imageAsFloatBytes.buffer);

    final indexed = rgbaUints.indexed;
    return [
    ...indexed.where((e) => e.$1 % 4 == 0).map((e) => e.$2.toDouble()),
    ...indexed.where((e) => e.$1 % 4 == 1).map((e) => e.$2.toDouble()),
    ...indexed.where((e) => e.$1 % 4 == 2).map((e) => e.$2.toDouble()),
    ];
  }
Figure 7: Code for converting image to tensor

Working with a red image here helps me debug the actual numbers I see in the tensor data. I expect to see 50176 (224×224) occurrences of the value 255 (maximum for red), followed by all zeros (green and blue). The result I get back from the model output also needs to be processed back to a Flutter image. This does the exact opposite of the input processing. Notice that I added the alpha back and set it to 255:

  Future<ui.Image> floatTensorToImage(List tensorData) {
    final outRgbaFloats = Uint8List(4 * 224 * 224);
    for (int x = 0; x < 224; x++) {
      for (int y = 0; y < 224; y++) {
        final index = x * 224 * 4 + y * 4;
        outRgbaFloats[index + 0] = tensorData[0][0][x][y].clamp(0, 255).toInt(); // r
        outRgbaFloats[index + 1] = tensorData[0][1][x][y].clamp(0, 255).toInt(); // g
        outRgbaFloats[index + 2] = tensorData[0][2][x][y].clamp(0, 255).toInt(); // b
        outRgbaFloats[index + 3] = 255; // a
      }
    }
    final completer = Completer<ui.Image>();
    ui.decodeImageFromPixels(outRgbaFloats, 224, 224, ui.PixelFormat.rgba8888, (ui.Image image) {
      completer.complete(image);
    });

    return completer.future;
  }
Figure 8: Code for converting tensor to image

When working with images, input and output are usually formatted the same way, and post-processing mirrors what you do in pre-processing. You can feed the pre-processing output into the post-processing directly, without running the model, and then render the results to validate that they are symmetrical. This does not mean that the model will work well with the data, but it can surface issues with your processing.

Tip: Working with images? Feed your pre-processing to your post-processing and display it on the screen. This makes many issues easy to spot.

And here is the result, using a photo of a Eurasian blue tit by Francis Franklin:

Two images of birds: left is a photo of an Eurasian blue tit, the right is a stylized interpretation that has been generated by the sample code
Figure 9: Bird image before and after stylizing as mosaic

Throughout this journey, I learned that taking small steps is the way to go. Working with ORT can feel like using a black box, and taking baby steps is essential for understanding the input and output at every stage.

Tip: Take the smallest step you can think of. There is a lot that can go wrong when processing large tensors such as those for images. Creating bespoke images to use as input is also a skill you need to learn.

Call to action

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.

OpenAI Assistant functions on Android
Craig Dunn – Fri, 15 Dec 2023 – https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-29/


Hello prompt engineers,

This week, we are taking one last look at the new Assistants API. Previous blog posts have covered the Retrieval tool with uploaded files and the Code interpreter tool. In today’s post, we’ll add the askWikipedia function that we’d previously built to the fictitious Contoso employee handbook document chat.

Configure functions in the playground

We’ll start by configuring the assistant in the OpenAI playground. This isn’t required – assistants can be created and configured completely in code – however, it’s convenient to be able to test interactively before doing the work to incorporate the function into the JetchatAI Kotlin sample app.

Adding a function declaration is relatively simple – click the Add button and then provide a JSON description of the function’s capabilities. Figure 1 shows the Tools pane where functions are configured:

Screenshot of the Assistant playground tools pane, showing the add functions button and the already-defined askWikipedia function
Figure 1: Add a function definition in the playground (shows askWikipedia already defined)

The function configuration is just an empty text box, into which you enter the JSON that describes your function – its name, what it does, and the parameters it needs. This information is then used by the model to determine when the function could be used to resolve a user query.

The JSON that describes the askWikipedia function from our earlier post is shown in Figure 2. It has a single parameter – query – which the model will extract from the user’s input.

  {
    "name": "askWikipedia",
    "description": "Answer user questions by querying the Wikipedia website. Don't call this function if you can answer from your training data.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {
          "type": "string",
          "description": "The search term to query on the wikipedia website. Extract the subject from the sentence or phrase to use as the search term."
        }
      },
      "required": [
        "query"
      ]
    }
  }

Figure 2: JSON description of the askWikipedia function

In theory, this should be enough for us to test whether the model will call the function; however, my test question “what is measles?” would always be answered by the model without calling askWikipedia. Notice that my function description included the instruction “Don’t call this function if you can answer from your training data.” – this was because in the earlier example, the function was being called too often!

Since this assistant’s purpose is to discuss health plans, I updated the system prompt (called Instructions in the Assistant playground) to the following (new text in bold):

answer questions about employee benefits and health plans from uploaded files. ALWAYS include the file reference with any annotations. use the askWikipedia function to answer questions about medical conditions

After this change, the test question now triggers a function call!

Figure 3 shows what a function call looks like in the playground – because function calls are delegated to the implementing application, the playground needs you to interactively supply a simulated function return value. In this case, I pasted in the first paragraph of the Wikipedia page for “measles”, and the model summarized that into its chat response.

Zoomed in view of the chat when a function is triggered and is waiting for the result to be supplied
Figure 3: Testing a function call in the playground

With the function configured and confirmation that it will get called for the test question, we can now add the function call handler to our Android assistant in Kotlin.

Assistant function calling in Kotlin

In the original Assistant API implementation, there is a loop after the message is added to a run (shown in Figure 4) where the chat app waits for the model to respond with its answer:

  do {
     delay(1_000)
     val runTest = openAI.getRun(thread!!.id, run.id)
  } while (runTest.status != Status.Completed)

Figure 4: Checking message status until the run is completed and the model response is available

Now that we’ve added a function definition to the assistant, the runTest.status can change to RequiresAction (instead of Completed), which indicates that the model is waiting for one (or more) function return values to be supplied.

The AskWikipediaFunction class already contains code to search Wikipedia, so we just need to add some code to detect when a function call has occurred and provide the return value. The code shown in Figure 5 does that in these steps:

  1. Detect when the status becomes RequiresAction
  2. Get the run’s steps
  3. Examine the latest stepDetails for information about what action is required
  4. If a tool call is indicated, check if it’s for the FunctionTool
  5. Get the function name and parameters (which are in json key:value format)
  6. Execute the function locally with the parameters provided
  7. Create a toolOutput with the function return value
  8. Send the output back to the assistant

Once the assistant receives the function return value, it can construct its answer back to the user. This will set the status to Completed (exiting the do loop) and the code will display the model’s response to the user.

  do
  {
      delay(1_000)
      val runTest = openAI.getRun(thread!!.id, run.id)
      if (runTest.status == Status.RequiresAction) { // run is waiting for action from the caller
          val steps = openAI.runSteps(thread!!.id, run.id) // get the run steps
          val stepDetails = steps[0].stepDetails // latest step
          if (stepDetails is ToolCallStepDetails) { // contains a tool call
              val toolCallStep = stepDetails.toolCalls!![0] // get the latest tool call
              if (toolCallStep is ToolCallStep.FunctionTool) { // which is a function call
                  var function = toolCallStep.function
                  var functionResponse = "Error: function was not found or did not return any information."
                  when (function.name) {
                      "askWikipedia" -> {
                          var functionArgs = argumentsAsJson(function.arguments) ?: error("arguments field is missing")
                          val query = functionArgs.getValue("query").jsonPrimitive.content
                          // CALL THE FUNCTION!!
                          functionResponse = AskWikipediaFunction.function(query)
                          }
                      }
                  // Package the function return value
                  val to = toolOutput {
                      toolCallId = ToolId(toolCallStep.id.id)
                      output = functionResponse
                  }
                  // Send back to assistant
                  openAI.submitToolOutput(thread!!.id, run.id, listOf(to))
                  delay(1_000) // wait before polling again, to see if status is complete
              }
          }
      }
  } while (runTest.status != Status.Completed)

Figure 5: Code to detect the RequiresAction state and call the function

Note that there are a number of shortcuts in the code shown – when multiple functions are declared, the assistant might orchestrate multiple function calls in a run, and there are multiple places where better error/exception handling is required. You can read more about how multiple function calls might behave in the OpenAI Assistant documentation.
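
As an illustration of how multiple calls might be handled (a sketch only, not part of the pull request – it reuses the same openai-kotlin types and helper functions shown in Figure 5), the step details could be scanned for every function call and all of the outputs submitted together:

  // Sketch only: handle every function call in the step, not just the first one
  if (stepDetails is ToolCallStepDetails) {
      val outputs = stepDetails.toolCalls.orEmpty()
          .filterIsInstance<ToolCallStep.FunctionTool>()
          .map { call ->
              val response = when (call.function.name) {
                  "askWikipedia" -> {
                      val args = argumentsAsJson(call.function.arguments) ?: error("arguments field is missing")
                      AskWikipediaFunction.function(args.getValue("query").jsonPrimitive.content)
                  }
                  else -> "Error: function ${call.function.name} was not found."
              }
              toolOutput { // package each function return value
                  toolCallId = ToolId(call.id.id)
                  output = response
              }
          }
      if (outputs.isNotEmpty()) {
          openAI.submitToolOutput(thread!!.id, run.id, outputs) // send all results back in one call
      }
  }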

Here is a screenshot of JetchatAI for Android showing an assistant conversation using the askWikipedia function:

JetchatAI Android chat app screenshot showing a query about measles being answered from the assistant function
Figure 6: JetchatAI #assistant-chat showing a response generated by a function call

Feedback and resources

You can view the complete code for the assistant function call in this pull request for JetchatAI.

Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features like the Assistant API. 

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts. 

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev

OpenAI Assistant code interpreter on Android
Craig Dunn – Mon, 11 Dec 2023 – https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-28/


Hello prompt engineers,

Over the last few weeks, we’ve looked at different aspects of the new OpenAI Assistant API, both prototyping in the playground and using Kotlin in the JetchatAI sample. In this post we’re going to add the Code Interpreter feature which allows the Assistants API to write and run Python code in a sandboxed execution environment. By using the code interpreter, chat interactions can solve complex math problems, code problems, read and parse data files, and output formatted data files and charts.

To keep with the theme of the last few examples, we are going to test the code interpreter with a simple math problem related to the fictitious Contoso health plans used in earlier posts.

Enabling the code interpreter

The code interpreter is just a setting to be enabled, either in code or via the playground (depending on where you have set up your assistant). Figure 1 shows the Kotlin for creating an assistant using the code interpreter – including setting a very basic system prompt/meta prompt/instructions:

val assistant = openAI.assistant(
    request = AssistantRequest(
        name = "doc chat",
        instructions = "answer questions about health plans",
        tools = listOf(AssistantTool.CodeInterpreter), // enables the code interpreter
        model = ModelId("gpt-4-1106-preview")
    )
)

Figure 1: enabling the code interpreter in Kotlin when creating an assistant

As discussed in the first Kotlin assistant post, JetchatAI loads an assistant definition that was configured in the OpenAI playground, so it’s even easier to enable the Code interpreter by flipping this switch:

Figure 2: enabling the code interpreter in the OpenAI playground

Extending JetchatAI #assistant-chat

With the code interpreter enabled, we can test it both interactively in the playground and in the JetchatAI app. The test question is “if the health plan costs $1000 a month, what is deducted weekly from my paycheck?”.

The playground output includes the code that was generated and executed in the interpreter – in Figure 3 you can see it creates variables from the values mentioned in the query and then a calculation that returns a value to the model, which is incorporated into the response to the user.

Screenshot of the OpenAI Assistant playground showing a user query about how much $1000 a month is in weekly payments, along with the output from the code interpreter and the model's answer of approx $230 a week.
Figure 3: Testing the code interpreter in the playground with a simple math question (using GPT-4)

Notice that the code interpreter just returns a numeric answer, and the model decides to ‘round up’ to “$230.77” and format as a dollar amount.
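
For reference, the arithmetic works out like this – a rough Kotlin equivalent of the interpreter’s approach of multiplying out the total cost first (illustrative only, the actual generated code is shown in Figure 3):

  val monthlyCost = 1000.0
  val annualCost = monthlyCost * 12        // 12000.0
  val weeklyDeduction = annualCost / 52    // 230.76923..., which the model rounds to $230.77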

When implementing the Assistant API in an app, the code interpreter step (or steps) would not be rendered (in the same way that you don’t render function call responses), although they are available in the run’s step_details data structure so you could still retrieve and display them to the user, or use for logging/telemetry or some other purpose in your app.


Figure 4: Testing the same query on Android (also using GPT-4)

No code changes were made to the JetchatAI Android sample for this testing, it just needs the Assistant’s configuration updated as shown.

Why add the code interpreter?

One of the reasons the code interpreter option exists is that LLMs “on their own” can be terribly bad at math. Here are some examples using the same prompt on different models without a code interpreter to help:

  • GPT 3.5 Turbo playground – answers $250
  • GPT 4 playground – answers $231.17
  • ChatGPT – answers $231.18

The full response for each of these is shown below – I included the ChatGPT response separately because although it’s likely using similar models to what I have access to in their playground, it has its own system prompt/meta prompt which can affect how it approaches these types of problems. All three of these examples appear to be following some sort of chain of thought prompting, although they take different approaches. While some of these answers are close to the code interpreter result, that’s probably not ideal when money’s involved! How chat applications respond to queries with real-world implications (like financial advice, for example) is something that should be evaluated through the lens of responsible AI – remember this is just a fictitious example to test the LLM’s math skills.

GPT 3.5 Turbo playground (without code interpreter)

GPT 3.5 Turbo attempts to ‘walk through’ the calculation, but doesn’t seem to understand all months will not necessarily contain exactly two pay days. At least the response includes a disclaimer to consult HR or payroll to verify!


Figure 5: output from gpt-3.5-turbo for the same query

It’s worth noting that when the code interpreter is enabled with gpt-3.5-turbo in the playground, it returns $230.77 (just like gpt-4 with the interpreter does in Figures 3 and 4).

GPT 4 playground (without code interpreter)

Using the GPT-4 model results in a different set of steps to solve: unlike the code interpreter, which multiplies out the total cost first, this solution calculates the average number of weeks in a month. The first mistake is that this value is 4.3 (repeating), so rounding it to 4.33 will produce a slightly different answer. The second mistake is that $1000/4.33 = 230.9468822170901; however, it returns a result of $231.17, which is about 22 cents different from what the rounded answer should be. It also includes a disclaimer and advice to confirm with HR or payroll.


Figure 6: output from gpt-4 for the same query

ChatGPT (public chat)

The public ChatGPT follows similar logic to the GPT-4 playground, although it has better math rendering skills. It makes the same two mistakes, first rounding an intermediate value in the calculation, and then still failing to divide 1000/4.33 accurately.


Figure 7: output from ChatGPT for the same query

Summary

OpenAI models without the code interpreter feature seem to have some trouble with mathematical questions, returning different answers for the same question depending on the model and possibly the system prompt and context. In the simple testing above, the code interpreter feature does a better job of calculating a reasonable answer, and can do so more consistently on both gpt-4 and gpt-3.5-turbo models.

Resources and feedback

Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features like the Assistant API. 

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts. 

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev

OpenAI Assistant on Android
Craig Dunn – Fri, 01 Dec 2023 – https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-27/


Hello prompt engineers,

This week we’re continuing to discuss the new Assistant API announced at OpenAI Dev Day. There is documentation available that explains how the API works and shows Python/JavaScript/curl examples, but in this post we’ll implement it in Kotlin for Android and Jetpack Compose. You can review the code in this JetchatAI pull request.

OpenAI Assistants

A few weeks ago, we demonstrated building a simple Assistant in the OpenAI Playground – uploading files, setting a system prompt, and performing RAG-assisted queries – mimicking this Azure demo. To refresh your memory, Figure 1 shows the Assistant configuration:

Screenshot of the OpenAI Assistant Playground showing the test configuration
Figure 1: OpenAI Assistant Playground (key explained in this post)

We are going to reference this specific assistant, which has been configured in the playground, from an Android app. After the assistant is created, a unique identifier is displayed under the assistant name (see Figure 2), which can be referenced via the API.


Figure 2: Assistant unique identifier

Building in Kotlin

The openai-kotlin GitHub repo contains some basic instructions for accessing the Assistant API in Kotlin. This guidance has been adapted to work in JetchatAI, as shown in Figure 3. Comparing this file (AssistantWrapper.kt) to earlier examples (e.g. OpenAIWrapper.kt), you’ll notice it is significantly simpler! Using the Assistant API means:

  • No need for tracking token usage and message history for the sliding window – the Assistant API will automatically manage the model input size limit.
  • No need to keep sending past messages with each request, since they are stored on the server. Each user query is added to a thread which is then run against a configured assistant, entirely on the server.
  • No need to manually load or chunk document contents – we can upload documents and they will be chunked and stored on the server. Embeddings will automatically be generated.
  • No need to generate an embedding for the user’s query or do the vector similarity comparisons in Kotlin. The RAG will be done by the Assistant API on the server.

Many lines of code over multiple files in older examples can be reduced to the chat function shown in Figure 3:

suspend fun chat(message: String): String {
    if (assistant == null) { // open assistant and create a new thread every app-start
       assistant = openAI.assistant(id = AssistantId(Constants.OPENAI_ASSISTANT_ID)) // from the Playground
       thread = openAI.thread()
    }
    val userMessage = openAI.message (
       threadId = thread!!.id,
       request = MessageRequest(
          role = Role.User,
          content = message
       )
    )
    val run = openAI.createRun(
       threadId = thread!!.id,
       request = RunRequest(assistantId = assistant!!.id)
    )
    do
    {
       delay(1_000)
       val runTest = openAI.getRun(thread!!.id, run.id)
    } while (runTest.status != Status.Completed)
    val messages = openAI.messages(thread!!.id)
    val message = messages.first() // bit of a hack, get the last one generated
    val messageContent = message.content[0]
    if (messageContent is MessageContent.Text) {
       return messageContent.text.value
    }
    return "<Assistant Error>" // TODO: error handling
}

Figure 3: new chat function using a configured Assistant (by Id) using Kotlin API

Assistants can also support function calling, but we haven’t included any function demonstration in this sample.

Try it out

To access the new Assistant channel in JetchatAI, first choose #assistant-chat from the chat list:

Screenshot of the JetchatAI app showing the channel menu and how to choose the assistant chat
Figure 4: Access the #assistant-chat channel in the Chats menu of JetchatAI

You can now ask questions relating to the content of the five source PDF documents that were uploaded in the playground. The documents relate to employee policies and health plans of the fictitious Contoso/Northwind organizations. There are two example queries and responses in the screenshot in Figure 5:

Screenshot of the JetchatAI app showing two user queries answered by the assistant
Figure 5: user queries against the Assistant loaded with fictitious ‘employee policy’ PDFs as the data source

Next steps

The Assistant API is still in preview, so it could change before the official release. The current JetchatAI implementation needs some more work in error handling, resolving citations, checking for multiple assistant responses, implementing functions, and more. The takeaway from this week is how much simpler the Assistant API is to use versus implementing the Chat API directly.

Resources and feedback

Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features. 

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts. 

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev

Test the latest AI features in Kotlin
Craig Dunn – Thu, 23 Nov 2023 – https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-26/


Hello prompt engineers,

Last week we looked at one of the new OpenAI features – Assistants – in the web playground, but good news: the OpenAI Kotlin library is already being updated with the new APIs and you can start to try them out right now in your Android codebase with snapshot package builds. With a few minor configuration changes you can start testing the latest AI features and get ready for a supported package release.

Use OpenAI Kotlin library snapshots

While new features are being added to the Kotlin library, you can track progress via this GitHub issue and the related PRs, including support for the updated Images API as well as the Assistants API. Not all features are complete at the time of writing.

The owner (aallam) of the openai-kotlin repo is publishing pre-release snapshot builds of these features for developers to play with prior to an official release. You can see the changes required to test the snapshot in this JetchatAI pull request:

  1. Add the snapshot package source to the settings.gradle.kts file:
    maven { url = uri("https://oss.sonatype.org/content/repositories/snapshots/") }
  2. Update the openai package version reference to 3.6.0-SNAPSHOT (in gradle/libs.versions.toml):
    openai = "com.aallam.openai:openai-client:3.6.0-SNAPSHOT"
  3. You can now start working with the “beta” API implementations (such as the DALL·E 3 image support)

Once the completed 3.6.0 release is available, don’t forget to remove the snapshots package source and update the version reference to a stable number.

Test with DALL·E 3

One of the new features announced at OpenAI Dev Day is DALL·E 3 support in the Image generation API. You can read more in the FAQs and documentation.

To use the updated API, first reference the snapshot builds as described above, and then add the new model declaration to the ImageCreation constructor:

val imageRequest = ImageCreation(prompt = prompt, model = ModelId("dall-e-3"))

Note that the model parameter did not exist in previous builds of the OpenAI Kotlin library, so if you see any compile errors then re-check your gradle changes to ensure you’re referencing the latest snapshot.
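
For context, a complete image request with the snapshot build might look something like the following sketch (assuming the library’s existing imageURL function and ImageURL result type are unchanged – check the snapshot source if the signatures have moved):

  import com.aallam.openai.api.image.ImageCreation
  import com.aallam.openai.api.model.ModelId
  import com.aallam.openai.client.OpenAI

  // Sketch only: request a single DALL·E 3 image and return the generated image URL
  suspend fun generateImage(openAI: OpenAI, prompt: String): String {
      val images = openAI.imageURL(
          creation = ImageCreation(
              prompt = prompt,
              model = ModelId("dall-e-3") // the model parameter is new in the 3.6.0 snapshot
          )
      )
      return images.first().url
  }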

The images in Figure 1 compare the same prompt being run on DALL·E 2 and DALL·E 3 in the JetchatAI demo:

Screenshots of two JetchatAI chats, the left showing an image from DALL·E 2 and the right showing an image from DALL·E 3. The DALL·E 3 image is hyper-real and looks more impressive.
Figure 1: image response from DALL·E 2 and DALL·E 3 for the prompt “a bucolic image”

Have fun playing with DALL·E 3 and the other new APIs!

Resources and feedback

Refer to the OpenAI blog for more details on the Dev Day announcements, and the openai-kotlin repo for updates on support for the new features. 

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts. 

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev

OpenAI Assistants
Craig Dunn – Sat, 18 Nov 2023 – https://devblogs.microsoft.com/surface-duo/openai-assistants/


Hello prompt engineers,

OpenAI held their first Dev Day on November 6th, with a number of new product announcements, including GPT-4 Turbo with 128K context, function calling updates, JSON mode, improvements to GPT-3.5 Turbo, the Assistant API, DALL·E 3, text-to-speech, and more. This post will focus just on the Assistant API because it greatly simplifies a lot of the challenges we’ve been addressing in the JetchatAI Android sample app.

Assistants

The Assistants overview explains the key features of the new API and how to implement an example in Python. In today’s blog post we’ll compare the new API to the Chat API we’ve been using in our Android JetchatAI sample, and specifically to the #document-chat demo that we built over the past few weeks.

Big differences between the Assistants API and the Chat API:

  • Stateful – the history is stored on the server, and the client app just needs to track the thread identifier and send new messages.
  • Sliding window – automatic management of the model’s context window.
  • Doc upload and embedding – can be easily seeded with a knowledge base, which will be automatically chunked and have embeddings generated.
  • Code interpreter – Python code can be generated and run to answer questions.
  • Functions – supported just like the Chat API.

Revisiting the #document-chat…

The last two posts have been about building a basic document chat similar to this Azure OpenAI sample, using a lot of custom Kotlin code. OpenAI Assistants now make this trivial to build, as shown in Figure 1 below. This example shows how to prototype an Assistant in the Playground that can answer questions based on documentation sources.

You can see the result is very similar to the responses from the previous blog post:

Screenshot of the OpenAI Assistants playground with an example query with some screen elements highlighted and numbered.
Figure 1: OpenAI Assistant playground, with the sample PDFs uploaded and supporting typical user queries

Each element in the playground in Figure 1 is explained below:

  1. The Assistant Instructions are like the system prompt.
  2. Enable Retrieval on the assistant and upload the source documents (in our case, the fictitious Contoso PDFs).
  3. Add personalization information to augment the query, e.g. “my plan is the Northwind Health Standard plan”. A real application implementation would add this automatically, rather than requiring the user to enter it every time.
  4. The User query “does my plan cover contact lenses and immunizations”
  5. The doc chat response shows that the result from the model is grounded in the uploaded documents, and includes citations to the relevant information source.

Comparing this example to the Chat Playground prototype last week you can see the Assistants API is much simpler.

Coming to Kotlin

The Assistant API is still in beta, so expect changes before it’s ready for production use; however, it’s certain that the Kotlin code to implement the Assistant will be much simpler than the Chat API. Rather than keeping track of the entire conversation, managing the context window, chunking the source data, and prompt-tweaking for RAG, in future the code will only need to keep track of the thread identifier and exchange new queries and responses with the server. The citations can also be programmatically resolved when you implement the API (even though they appear as illegible model-generated strings in the playground).

The OpenAI Kotlin open-source client library is currently implementing the Assistant API – see the progress in this pull request.

Resources and feedback

Refer to the OpenAI blog for more details on the Dev Day announcements, and the Assistants docs for more technical detail.

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.

Chunking for citations in a document chat
Craig Dunn – Mon, 13 Nov 2023 – https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-25/


Hello prompt engineers,

Last week’s blog introduced a simple “chat over documents” Android implementation, using some example content from this Azure demo. However, if you take a look at the Azure sample, the output is not only summarized from the input PDFs, but it’s also able to cite which document the answer is drawn from (shown in Figure 1). In this blog, we’ll investigate how to add citations to the responses in JetchatAI.


Figure 1: Azure OpenAI demo result shows citations for the information presented in the response

In order to provide similar information in the JetchatAI document chat on Android, we’ll need to update the document parsing (chunking) so that we have enough context to answer questions and identify the source.

Prompt engineering playground

Before spending a lot of time on the parsing algorithm, it makes sense to confirm that we can get the model to understand what we want to achieve. To quickly iterate prototypes for this feature, I simulated a request/response in the OpenAI Playground, using the existing prompts from the app and some test embeddings from testing the document chat feature:


Figure 2: OpenAI playground for testing prompt ideas, with a prototype for answering a document chat with cited sources

Figure 2 shows an example chat interaction based on the documents we added to the app in the previous blog. The facts listed in the USER prompt (#4) are examples of the embeddings from testing the existing feature. Each element of the “prompt prototype” is explained below:

  1. Existing system prompt and grounding introduction (unchanged).
  2. Specify which plan the user has, to help answer questions more specifically.
  3. Updates to the system prompt and the grounding prompt to teach the model how to cite sources. The system prompt explains what citations should “look like”, with [1] numbered square brackets, and the grounding reinforces that citations should be used and added to the end of the response.
  4. The similar embeddings are now grouped by the document they were extracted from, and the # markdown-style heading on the filename helps the model to group the data that follows. This test data consists of actual embeddings from previous testing of the document chat feature.
  5. The user’s query, which is added to the end of the grounding data (from embeddings) and prompt.
  6. The model’s response attempts to refer to “Your plan” and hopefully distinguishes the plan mentioned in the system prompt (#2) from other plan features.
  7. Two citations are provided in the response, because the vision and immunization chunks are from different source documents.
  8. The model correctly adds the cited documents at the end of the response.

Slightly changing the user-prompt to “does my plan cover contact lenses” (without mentioning immunizations), we can confirm that the answer and cited documents changes:


Figure 3: OpenAI playground example where only one source document is cited

Note that in Figure 3 the citation numbering seems to reflect the position of the “document” in the grounding prompt. Although this should be numbered from one, I’m going to ignore it for now (another exercise for the reader). The updated prompt and grounding format works well enough to be added to the app for further testing.
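
One way to tidy that up later (a sketch only, not implemented in the sample) would be to renumber the citation markers in the model’s response in order of first appearance; because the source list at the end reuses the same numbers, the same mapping fixes it too:

  // Sketch only: remap citation markers like [3], [7] so they start from [1]
  fun renumberCitations(response: String): String {
      val mapping = linkedMapOf<String, Int>() // original number -> new number
      return Regex("""\[(\d+)]""").replace(response) { match ->
          val original = match.groupValues[1]
          val renumbered = mapping.getOrPut(original) { mapping.size + 1 }
          "[$renumbered]"
      }
  }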

Updated chunking and embeddings

Now that we’ve established a prompt that works in the OpenAI playground, we need to update the app to parse the documents differently so that we can re-create the grounding format in code.

Currently, the sentence embeddings are all added without keeping track of the source document. When they’re added to the grounding data, they are ordered by similarity score (highest first).

To implement the prompt and grounding prototyped above, we need to:

  1. Alter the document parsing so that we keep track of which document each embedding comes from,
  2. After we’ve identified the most similar embeddings, group them by document name, and
  3. Update the system and grounding prompts to train the model to create citations.

The code for these changes is shown below (and is in this pull request), followed by final app testing.

Chunking changes

Because the code from last week was already keeping track of ‘document id’ as it parsed the resource files, minimal changes were needed to keep track of the actual filenames.

Firstly, a new array rawFilenames contains the user-friendly filename representation for each resource:

val rawResources = listOf(R.raw.benefit_options, R.raw.northwind_standard_benefits_details)
val rawFilenames = listOf<String>("Benefit-options.pdf", "Northwind-Standard-benefits-details.pdf")

Figure 4: adding the user-friendly filename strings (must match the resources order)

Then as the code is looping through the resources, we add the user-friendly filename to a cache, keyed by the ‘document id’ we already have stored as part of the embeddings key:

for (resId in rawResources) {
    documentId++
    documentCache["$documentId"] = rawFilenames[documentId]  // filename will be shown to user

Figure 5: storing the filename to match the documentId for later retrieval

It’s now possible to determine which document a given sentence was found in.
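
To make the relationship concrete, here is a tiny sketch of the lookup (using the same documentId-sentenceIndex key format as the grounding code below):

  // Illustration only: recover the user-friendly filename from an embedding key such as "1-42"
  val embeddingKey = "1-42"
  val fileId = embeddingKey.split('-')[0]   // "1" – the document id prefix
  val filename = documentNameCache[fileId]  // e.g. "Northwind-Standard-benefits-details.pdf"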

Grounding tweaks

When the document filename is stored for each embedding, the code building the grounding prompt can group the embeddings under document “headings” so that the model can better understand the context for the embedding strings.

For the document filenames to be useful, the system prompt must be updated to match the prototype in Figure 2. Figure 6 below shows the updated system prompt from the DocumentChatWrapper.kt init function:

grounding = """
   You are a personal assistant for Contoso employees. 
   You will answer questions about Contoso employee benefits from various employee manuals.
   Your answers will be short and concise. 
   Only use the functions you have been provided with.
   The user has Northwind Standard health plan.
   For each piece of information you provide, cite the source in brackets like so: [1].
   At the end of the answer, always list each source with its corresponding number and provide the document name, like so [1] Filename.doc""".trimMargin()

Figure 6: updated system prompt (including a personalization statement about the user’s current plan)

The code in Figure 7 shows the grounding function changes to support citations, producing output similar to the prototype grounding in Figure 2. After ranking the embeddings by similarity (and ignoring results with less than 0.8 similarity score), it loops through and groups sentences by document filename:

var matches = sortedVectors.tailMap(0.8) // only keep results with similarity >= 0.8
// re-sort based on key, to group by filename
var sortedMatches: SortedMap<String, String> = sortedMapOf()
for (dpKey in matches) {
    val fileId = dpKey.value.split('-')[0] // the document id is the first part of the embedding key
    val filename = documentNameCache[fileId]!!
    val content = documentCache[dpKey.value]!!
    if (sortedMatches.contains(filename))
    { // add to current ‘file’ matching sentences
        sortedMatches[filename] += "\n\n$content"
    } else { // first match for this filename
        sortedMatches[filename] = content
    }
}
// loop through filenames and output the matching sentences for each file
messagePreamble = "The following information is extracted from Contoso employee handbooks and health plan documents:"
for (file in sortedMatches) {
    messagePreamble += "\n\n# ${file.key}\n\n${file.value}\n\n#####\n\n" // use the # pound markdown-like heading syntax for the filename,
}
messagePreamble += "\n\nUse the above information to answer the following question, providing numbered citations for document sources used (mention the cited documents at the end by number). Synthesize the information into a summary paragraph:\n\n"

Figure 7: updated grounding function

Now that the code has been updated to:

  1. Keep track of which document each embedding sentence was found in,
  2. Group high-similarity embedding results by document filename, and
  3. Add instructions in the system and grounding prompts to cite the source of facts in the model’s response.

The responses in the JetchatAI document chat should now include numbered citations.

Citations in the chat

With these relatively small changes in the code, the #document-chat conversation in JetchatAI will now add citations when asked questions about the fictitious Contoso employee benefits documents that are referenced via RAG principles:

Two screenshots of the JetchatAI app running on Android, with user questions and model answers containing numbered citations.

Figure 8: JetchatAI showing citations when referencing source documents

Feedback and resources

This post is closely related to the document chat implementation post.

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.

The post Chunking for citations in a document chat appeared first on Surface Duo Blog.

]]>
Document chat with OpenAI on Android https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-24/ <![CDATA[Craig Dunn]]> Sat, 04 Nov 2023 03:28:10 +0000 <![CDATA[AI]]> <![CDATA[chatgpt]]> <![CDATA[openai]]> https://devblogs.microsoft.com/surface-duo/?p=3582 <![CDATA[

Hello prompt engineers, In last week’s discussion on improving embedding efficiency, we mentioned the concept of “chunking”. Chunking is the process of breaking up a longer document (i.e. too big to fit under a model’s token limit) into smaller pieces of text, which will be used to generate embeddings for vector similarity comparisons with user […]

The post Document chat with OpenAI on Android appeared first on Surface Duo Blog.

]]>
<![CDATA[

Hello prompt engineers,

In last week’s discussion on improving embedding efficiency, we mentioned the concept of “chunking”. Chunking is the process of breaking up a longer document (i.e. too big to fit under a model’s token limit) into smaller pieces of text, which will be used to generate embeddings for vector similarity comparisons with user queries (just like the droidcon conference session data).

Inspired by this Azure Search OpenAI demo, and also the fact that ChatGPT itself released a PDF-ingestion feature this week, we’ve added a “document chat” feature to the JetchatAI Android sample app. To access the document chat demo, open JetchatAI and use the navigation panel to change to the #document-chat conversation:

Screenshot of JetchatAI on Android, showing the slide-out navigation panel
Figure 1: access the #document-chat

To build the #document-chat we re-used a lot of code and added some PDF document content from an Azure chat sample.

Code foundations

In the pull-request for this feature, you’ll see a number of new files that were cloned from existing code to create the #document-chat channel:

  • DocumentChatWrapper – sets the system prompt to guide the model to only answer “Contoso employee” questions
  • DocumentDatabase – functions to store the text chunks and embeddings in Sqlite so they are persisted across app restarts
  • AskDocumentFunction – SQL generating function that can attempt searches on the text chunks in the database. Ideally, we would provide a semantic full-text search backend, but in this example only basic SQL text matching is supported.

The bulk of this code is identical to the droidcon conference chat demo, except instead of a hardcoded database of session details, we needed to write new code to parse and store the content from PDF documents. This new code exists mainly in the loadVectorCache and initVectorCache functions (as well as a new column in the embeddings Sqlite database to hold the corresponding content).
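As a rough sketch of what that storage might look like: the embedding table name and the content column appear in the SQL queries later in this post, while the other column names and the vector serialization below are assumptions for illustration:

    import android.content.ContentValues
    import android.database.sqlite.SQLiteDatabase

    // Hypothetical schema: one row per chunk, keyed by "documentId-sentenceId", with the
    // vector serialized as text and the new content column holding the chunk text itself.
    // Created once with db.execSQL(CREATE_EMBEDDING_TABLE).
    const val CREATE_EMBEDDING_TABLE = """
        CREATE TABLE IF NOT EXISTS embedding (
            chunk_key TEXT PRIMARY KEY,
            vector    TEXT NOT NULL,
            content   TEXT NOT NULL
        )"""

    fun saveEmbedding(db: SQLiteDatabase, key: String, vector: DoubleArray, content: String) {
        val values = ContentValues().apply {
            put("chunk_key", key)
            put("vector", vector.joinToString(",")) // simple comma-separated serialization
            put("content", content)
        }
        db.insertWithOnConflict("embedding", null, values, SQLiteDatabase.CONFLICT_REPLACE)
    }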

Reading the source documents

To create the data store, we used the test data associated with the Azure Search demo on GitHub: six documents that describe the fictitious Contoso company’s employee handbook and benefits. These are provided as PDFs, but to keep our demo simple I manually copied the text into .txt files which are added to the JetchatAI raw resources folder. This means we don’t have to worry about PDF file format parsing, but can still play around with different ways of chunking the content.

The code to load these documents from the resources folder is shown in Figure 2:

  var documentId = -1
  val rawResources = listOf(R.raw.benefit_options) // R.raw.employee_handbook, R.raw.perks_plus, R.raw.role_library, R.raw.northwind_standard_benefits_details, R.raw.northwind_health_plus_benefits_details
  for (resId in rawResources) {
      documentId++
      val inputStream = context.resources.openRawResource(resId)
      val documentText = inputStream.bufferedReader().use { it.readText() }

Figure 2: loading the source document contents

Once we’ve loaded the contents of each document, we need to break it up before creating embeddings that can be used to match against user queries (and ultimately answer their questions with retrieval augmented generation).

Chunking the documents

This explanation of chunking strategies outlines some of the considerations and methods for breaking up text to use for RAG-style LLM interactions. For our initial implementation we are going to take a very simplistic approach, which is to create an embedding for each sentence:

   val documentSentences = documentText.split(Regex("[.!?]\\s*"))
   var sentenceId = -1
   for (sentence in documentSentences){
       if (sentence.isNotEmpty()){
           sentenceId++
           val embeddingRequest = EmbeddingRequest(
               model = ModelId(Constants.OPENAI_EMBED_MODEL),
               input = listOf(sentence)
           )
           val embedding = openAI.embeddings(embeddingRequest)
           val vector = embedding.embeddings[0].embedding.toDoubleArray()
           // add to in-memory cache
           vectorCache["$documentId-$sentenceId"] = vector
           documentCache["$documentId-$sentenceId"] = sentence

Figure 3: uses regex to break into sentences and creates/stores an embedding vector for each sentence

Although this is the simplest chunking method, there are some drawbacks:

  • Headings and short sentences probably don’t have enough information to make useful prompt grounding.
  • Longer sentences might still lack context that would help the model answer questions accurately.

Even so, short embeddings like this can be functional, as shown in the next section.
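One way to mitigate both drawbacks would be to group a few sentences per chunk with a small overlap, so each embedding carries some surrounding context. This is a hypothetical alternative (not what the sample currently does):

    // Hypothetical: chunk by groups of sentences with a one-sentence overlap
    fun chunkWithOverlap(sentences: List<String>, chunkSize: Int = 3, overlap: Int = 1): List<String> {
        require(overlap < chunkSize)
        val chunks = mutableListOf<String>()
        var start = 0
        while (start < sentences.size) {
            val end = minOf(start + chunkSize, sentences.size)
            chunks += sentences.subList(start, end).joinToString(". ") // punctuation was stripped by the split regex
            if (end == sentences.size) break
            start = end - overlap
        }
        return chunks
    }

Each returned chunk would then be embedded (and stored) exactly like the single sentences in Figure 3.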

NOTE: The app needs to parse and generate embeddings for ALL the documents before it can answer any user queries. Generating the embeddings can take a few minutes because of the large number of embedding API requests required. Be prepared to wait the first time you use the demo if parsing all six source files. Alternatively, changing the rawResources array to only load a single document (like R.raw.benefit_options) will start faster and still be able to answer basic questions (as shown in the examples below). The app saves the embeddings to Sqlite so subsequent executions will be faster (unless the Sqlite schema is changed or the app is deleted and re-installed).
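The caching behavior described in the note boils down to a load-or-generate check at startup. A minimal sketch, reusing the function names mentioned above but with simplified signatures (the sample’s actual code may differ):

    // Sketch: load persisted embeddings if available, otherwise generate and save them
    suspend fun ensureEmbeddings() {
        loadVectorCache() // populate the in-memory caches from Sqlite, if previously saved
        if (vectorCache.isEmpty()) {
            initVectorCache() // first run: parse documents, call the embeddings API, persist results
        }
    }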

Document answers from embeddings and SQL search

With just this relatively minor change to our existing chat code (and adding the embedded files), we can ask fictitious employee questions (similar to those shown in the Azure Search OpenAI demo):

Screenshot of JetchatAI with questions and answers about loaded documents
Figure 4: Ask questions about documents in JetchatAI

These two example queries are discussed below, showing the text chunks that are used for grounding.

“does my plan cover annual eye exams”

The first test user query returns ten chunks where the vector similarity score was above the arbitrary 0.8 threshold. Figure 5 shows a selection of the matches (some removed for space); you can also see that the grounding prompt adds the introduction “The following information is extracted from Contoso employee handbooks and health plans:” and the instruction “Use the above information to answer the following question:” to guide the model when this grounding is included in the prompt:

The following information is extracted from Contoso employee handbooks and health plans:
                                                                                                      
Comparison of Plans
Both plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive care services

This plan also offers coverage for preventive care services, as well as prescription drug coverage

Northwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings

Northwind Standard only offers coverage for vision exams and glasses

Both plans offer coverage for vision and dental services, as well as medical services

Use the above information to answer the following question:

Figure 5: the grounding information for the user query “does my plan cover annual eye exams”

Because we have also registered the AskDocumentFunction, an SQL query (Figure 6) is also generated for this query; however, the exact phrase “annual eye exams” does not match any of the stored chunks, so no additional grounding is provided by the function call.

SELECT DISTINCT content FROM embedding WHERE content LIKE '%annual eye exams%'

Figure 6: text search is too specific and returns zero results
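If broader matches were desirable, the search phrase could be post-processed into OR-ed keywords before the query is run against Sqlite. This is a hypothetical tweak (not how the sample behaves today, and it ignores SQL-injection hardening for brevity):

    // Hypothetical: loosen the text search by OR-ing individual keywords instead of the exact phrase
    fun buildLikeQuery(phrase: String): String {
        val keywords = phrase.split(" ").filter { it.length > 3 } // drop short words like "my" or "eye"
        if (keywords.isEmpty()) return "SELECT DISTINCT content FROM embedding WHERE content LIKE '%$phrase%'"
        val clauses = keywords.joinToString(" OR ") { "content LIKE '%$it%'" }
        return "SELECT DISTINCT content FROM embedding WHERE $clauses"
    }
    // buildLikeQuery("annual eye exams") would produce:
    // SELECT DISTINCT content FROM embedding WHERE content LIKE '%annual%' OR content LIKE '%exams%'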

The grounding in Figure 5 is enough for the model to answer the question with “Yes your plan covers annual eye exams”.

Note that the user query mentioned “my plan”, and the model’s response asserts that “your plan covers…”, probably because in the grounding data the statements include “Both plans offer coverage…”. We have not provided any grounding on what plan the user is signed up for, but that could be another improvement (perhaps in the system prompt) that would help answer more accurately.
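One lightweight fix is a single personalization line in the system prompt so the model can resolve “my plan”; a sketch, assuming the wrapper’s system prompt string (the plan name comes from the fictitious Contoso sample data):

    // Sketch: tell the model which plan the user is on, so "my plan" questions can be answered precisely
    grounding += "\nThe user has Northwind Standard health plan."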

“what about dental”

The second test query returns only three chunks with a vector similarity score above 0.8 (shown in Figure 7).

The following information is extracted from Contoso employee handbooks and health plans:
                                                                                                      
Northwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings

Both plans offer coverage for vision and dental services, as well as medical services

Both plans offer coverage for vision and dental services

Use the above information to answer the following question:

Figure 7: the grounding information for the user query “what about dental”

The model once again triggers the dynamic SQL function to perform a text search for "%dental%", which returns the five matches shown in Figure 8.

SELECT DISTINCT content FROM embedding WHERE content LIKE '%dental%'
 -------

[('Northwind Health Plus
Northwind Health Plus is a comprehensive plan that provides comprehensive coverage for medical, vision, and dental services')
,('Northwind Standard Northwind Standard is a basic plan that provides coverage for medical, vision, and dental services')
,('Both plans offer coverage for vision and dental services')
,('Northwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings')
,('Both plans offer coverage for vision and dental services, as well as medical services')]

Figure 8: SQL function results for the user query “what about dental?”

The chunks returned from the SQL query mostly overlap with the embeddings matches. The model uses this information to generate the response “Both plans offer coverage for dental services, including dental exams, cleanings, and fillings.”

If you look closely at the grounding data, there’s only evidence that the "Health Plus" plan covers fillings (there is no explicit mention that the "Standard" plan offers anything beyond "dental services"). This means the answer could give misleading information about fillings being covered by both plans: it may be a reasonable inference given the grounding, or it could fall into the ‘hallucination’ category. If the chunks were larger, the model might have more context to understand which features are associated with which plan.

This example uses the simplest possible chunking strategy, and while some questions can be answered it’s likely that a more sophisticated chunking strategy will support more accurate responses. In addition, including more information about the user could result in more personalized responses.

Resources and feedback

Some additional samples that demonstrate building document chat services with more sophisticated search support:

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.

The post Document chat with OpenAI on Android appeared first on Surface Duo Blog.

]]>
More efficient embeddings https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-23/ <![CDATA[Craig Dunn]]> Mon, 30 Oct 2023 01:33:36 +0000 <![CDATA[AI]]> <![CDATA[chatgpt]]> <![CDATA[openai]]> https://devblogs.microsoft.com/surface-duo/?p=3563 <![CDATA[

Hello prompt engineers, I’ve been reading about how to improve the process of reasoning over long documents by optimizing the chunking process (how to break up the text into pieces) and then summarizing before creating embeddings to achieve better responses. In this blog post we’ll try to apply that philosophy to the Jetchat demo’s conference […]

The post More efficient embeddings appeared first on Surface Duo Blog.

]]>
<![CDATA[

Hello prompt engineers,

I’ve been reading about how to improve the process of reasoning over long documents by optimizing the chunking process (how to break up the text into pieces) and then summarizing before creating embeddings to achieve better responses. In this blog post we’ll try to apply that philosophy to the Jetchat demo’s conference chat, hopefully achieving better chat responses and maybe saving a few cents as well.

Basic RAG embedding

When we first wrote about building a Retrieval Augmented Generation (RAG) chat feature, we created a ‘chunk’ of information for each conference session. This text contains all the information we have about the session, and it was used to:

  • Create an embedding vector that we compare against user queries, AND
  • Add to the chat prompt as grounding context when there is a high vector similarity between the embeddings for the chunk and the user query.

Figure 1 shows an example of how the text was formatted (with key:value pairs) and the types of information provided:

Speaker: Craig Dunn
Role: Software Engineer at Microsoft
Location: Robertson 1
Date: 2023-06-09
Time: 16:30
Subject: AI for Android on- and off-device
Description: AI and ML bring powerful new features to app developers, for processing text, images, audio, video, and more. In this session we’ll compare and contrast the opportunities available with on-device models using ONNX and the ChatGPT model running in the cloud.

Figure 1: an example of the session description data format used in the original Jetchat sample app

Using all the information fields for embeddings and grounding worked fine for our use case, and we’ve continued to build additional features like sliding window and history caching based on this similarity matching logic. However, that doesn’t mean it couldn’t be further improved!

More efficient embeddings

When you consider how the embedding vector is used – to compare against the user query to match on general topics and subject similarity – it seems like we could simplify the information we use to create the embedding, such as removing the speaker name, date, time, and location keys and values. This information is not well suited to matching embeddings anyway (chat functions and dynamic SQL querying work much better for questions about those attributes), so we can reduce the text chunk used for generating embeddings to the speaker role, subject, and description as shown in Figure 2:

Speaker role: Software Engineer at Microsoft
Subject: AI for Android on- and off-device
Description: AI and ML bring powerful new features to app developers, for processing text, images, audio, video, and more. In this session we’ll compare and contrast the opportunities available with on-device models using ONNX and the ChatGPT model running in the cloud.

Figure 2: a more focused text chunk for embedding. The role was included since for this dataset and expected query usage it often contains relevant context.

There is an immediate improvement in cost efficiency, as the new text chunk is only 73 tokens, versus 104 tokens for the complete text in Figure 1 (roughly a 25% saving in the cost of calculating all the embeddings, although some sessions have longer descriptions than others so the exact saving will vary). While embedding API calls are much cheaper (at $0.0004 per 1000 tokens) than the chat API ($0.002 to $0.06 per 1000 tokens), it’s still a cost that can add up over time, so it makes sense to reduce the number of tokens used to create embeddings where possible.
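As a rough worked example (assuming, say, 100 sessions of similar length): embedding the full text would use about 100 × 104 ≈ 10,400 tokens, or roughly $0.004 at $0.0004 per 1,000 tokens, while the shorter chunks would use about 100 × 73 ≈ 7,300 tokens, or roughly $0.003. The absolute numbers are tiny for a single conference, but the same ratio applies to larger datasets and to every re-indexing run.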

Note that the shorter text chunk is ONLY used for creating the embedding vector. When the vector similarity with the user query is high enough, the ORIGINAL text with all the fields is what is added to the chat prompt grounding. This ensures that the chat model can still respond with speaker, date, time, and location information in the chat.

Screenshot of Jetchat AI answering a question about Jetpack Compose sessions
Figure 3: Screenshot showing the generated response still includes speaker, date, time, and location information

Better results?

Testing with some common user queries from other blog posts in this series, the vector similarity scores are very close when comparing the query embedding vector against the larger text chunk (old score) and the smaller text chunk (new score). About a quarter of the samples scored slightly lower with the shorter chunks, but the rest resulted in higher similarity scores.

| User query | Matching session | Old similarity score | New similarity score |
|---|---|---|---|
| Are there any sessions on AI | AI for Android on- and off-device | 0.807 | 0.810 |
| Are there any sessions on gradle | Improving Developer Experience with Gradle Build Scans (Rooz Mohazzabi) | 0.802 | 0.801 |
|  | Improving Developer Experience with Gradle Build Scans | 0.810 | 0.806 |
|  | Improving Developer Experience with Gradle Build Scans (Iury Souza) | 0.816 | 0.823 |
|  | Crash Course in building your First Gradle Plugin | 0.821 | 0.827 |
| Are there any sessions on Jetpack Compose | Material You Review | 0.802 | 0.801 |
|  | Building a component library in Compose for a large-scale banking application | 0.814 | 0.824 |
|  | Developing Apps optimized for Wear OS with Jetpack Compose | 0.815 | 0.814 |
|  | Animating content changes with Jetpack Compose | 0.819 | 0.827 |
|  | Practical Compose Navigation with a Red Siren | 0.823 | 0.825 |
|  | Compose-View Interop in Practice | 0.824 | 0.838 |
|  | Panel Discussion: Adopting Jetpack Compose @ Scale (Christina Lee) | 0.829 | 0.842 |
|  | Panel Discussion: Adopting Jetpack Compose @ Scale (Alejandro Sanchez) | 0.831 | 0.849 |
|  | Creative Coding with Compose: The Next Chapter | 0.832 | 0.840 |
|  | Panel Discussion: Adopting Jetpack Compose @ Scale (Vinay Gaba) | 0.834 | 0.850 |

Figure 4: comparing vector similarity scores for the full text chunk embedding (old) versus the shorter version (new). Scores truncated to three decimal places for clarity.

The results suggest that the arbitrary cut-off of 0.8 for deciding whether a session was a good match still applies, and the actual results displayed to the user were unchanged.

Since (for these test cases at least) the chat responses in the app are identical, this improvement hasn’t affected the user experience positively or negatively (but it has reduced our embeddings API costs). Further testing on other conference queries might reveal different effects, and certainly for other use cases (such as reasoning over long documents using embedding chunks), “summarizing” the text used for embedding to better capture context that will match expected user queries could lead to better chat completions.

Code

I’ve left the code changes to the end of the post, since very few lines of code were changed! You can see in the pull request that the key updates were:

  1. Two new methods on the SessionInfo class: the first emits the shorter text chunk for embedding, and the second emits the full text for grounding:
    fun forEmbedding () : String {
        return "Speaker role: $role\nSubject: $subject\nDescription:$description"
    }
    fun toRagString () : String {
        return """
            Speaker: $speaker
            Role: $role
            Location: $location
            Date: $date
            Time: $time
            Subject: $subject
            Description: $description""".trimIndent()
    }
    
  2. The DroidconEmbeddingsWrapper.initVectorCache function uses the DroidconSessionObjects collection and the forEmbedding function to create embedding vectors from the summarized session info:
    for (session in DroidconSessionObjects.droidconSessions) {
        val embeddingRequest = EmbeddingRequest(
            model = ModelId(Constants.OPENAI_EMBED_MODEL),
            input = listOf(session.value.forEmbedding())
        )
  3. The DroidconEmbeddingsWrapper.grounding() function calls toRagString() so that the full session text is added to the chat prompt:
    messagePreamble += DroidconSessionObjects.droidconSessions[dpKey.value]?.toRagString()

These changes are very specific to our conference session data source. When the source data is less structured, you might consider generating an LLM completion to summarize each text chunk before generating the embedding.
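As a rough sketch of that idea, the same openai-kotlin client could be asked for a short summary of each chunk before calling the embeddings API. The model constant and prompt wording below are illustrative, and depending on the client version the response message content may be nullable:

    // Hypothetical: summarize a chunk before embedding it; embed the summary,
    // but keep the original chunk text for prompt grounding (as with forEmbedding/toRagString above)
    suspend fun summarizeForEmbedding(openAI: OpenAI, chunk: String): String {
        val request = ChatCompletionRequest(
            model = ModelId(Constants.OPENAI_CHAT_MODEL), // illustrative constant name
            messages = listOf(
                ChatMessage(
                    role = ChatRole.System,
                    content = "Summarize the following text in one or two sentences, keeping key topics and terminology."
                ),
                ChatMessage(role = ChatRole.User, content = chunk)
            )
        )
        val response = openAI.chatCompletion(request)
        return response.choices.first().message.content ?: chunk // fall back to the original text
    }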

Resources and feedback

We’d love your feedback on this post, including any tips or tricks you’ve learned from playing around with ChatGPT prompts.

If you have any thoughts or questions, use the feedback forum or message us on Twitter @surfaceduodev.

There will be no livestream this week, but you can check out the archives on YouTube.

The post More efficient embeddings appeared first on Surface Duo Blog.

]]>