Get Started with AI Text Recognition (OCR) in the Windows App SDK

Article
02/07/2025

Important

Available in the latest experimental channel release of the Windows App SDK.

The Windows App SDK experimental channel includes APIs and features in early stages of development. All APIs in the experimental channel are subject to extensive revisions and breaking changes and may be removed from subsequent releases at any time. Experimental features are not supported for use in production environments and apps that use them cannot be published to the Microsoft Store.

Self-contained apps are not supported.

Text recognition, also known as optical character recognition (OCR), is supported by the Windows App SDK through a set of artificial intelligence (AI)-backed APIs that can detect and extract text within images and convert it into machine readable character streams.

These APIs can identify characters, words, lines, polygonal text boundaries, and provide confidence levels for each match. They are also exclusively supported by hardware acceleration in in devices with a neural processing unit (NPU), making them faster and more accurate than the legacy Windows.Media.Ocr.OcrEngine APIs in the Windows platform SDK.

For API details, see API ref for Text Recognition (OCR) in the Windows App SDK.

Tip

Provide feedback on these APIs and their functionality by creating a new Issue in the Windows App SDK GitHub repo (include OCR in the title) or by responding to an existing issue.

Prerequisites

A Copilot+ PC from Qualcomm, Intel, or AMD.
- Arm64EC (Emulation Compatible) is not currently supported.
Windows 11 Insider Preview Build 26120.3073 (Dev and Beta Channels) or later must be installed on your device.

What can I do with the Windows App SDK and AI Text Recognition?

Use the new AI Text Recognition features in the Windows App SDK to identify and recognize text in an image. You can also get the text boundaries and confidence scores for the recognized text.

Create an ImageBuffer from a file

In this example we call a LoadImageBufferFromFileAsync function to get an ImageBuffer from an image file.

In the LoadImageBufferFromFileAsync function, we complete the following steps:

Create a StorageFile object from the specified file path.
Open a stream on the StorageFile using OpenAsync.
Create a BitmapDecoder for the stream.
Call GetSoftwareBitmapAsync on the bitmap decoder to get a SoftwareBitmap object.
Return an image buffer from CreateBufferAttachedToBitmap.

using Microsoft.Windows.Vision;
using Microsoft.Graphics.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

public async Task<ImageBuffer> LoadImageBufferFromFileAsync(string filePath)
{
    StorageFile file = await StorageFile.GetFileFromPathAsync(filePath);
    IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read);
    BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
    SoftwareBitmap bitmap = await decoder.GetSoftwareBitmapAsync();

    if (bitmap == null)
    {
        return null;
    }

    return ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
}

namespace winrt
{
    using namespace Microsoft::Windows::Vision;
    using namespace Microsoft::Windows::Imaging;
    using namespace Windows::Graphics::Imaging;
    using namespace Windows::Storage;
    using namespace Windows::Storage::Streams;
}

winrt::IAsyncOperation<winrt::ImageBuffer> LoadImageBufferFromFileAsync(
    const std::wstring& filePath)
{
    auto file = co_await winrt::StorageFile::GetFileFromPathAsync(filePath);
    auto stream = co_await file.OpenAsync(winrt::FileAccessMode::Read);
    auto decoder = co_await winrt::BitmapDecoder::CreateAsync(stream);
    auto bitmap = co_await decoder.GetSoftwareBitmapAsync();
    if (bitmap == nullptr) {
        co_return nullptr;
    }
    co_return winrt::ImageBuffer::CreateBufferAttachedToBitmap(bitmap);
}

Recognize text in a bitmap image

The following example shows how to recognize some text in a SoftwareBitmap object as a single string value:

Create a TextRecognizer object through a call to the EnsureModelIsReady function, which also confirms there is a language model present on the system.
Using the bitmap obtained in the previous snippet, we call the RecognizeTextFromSoftwareBitmap function.
Call CreateBufferAttachedToBitmap on the image file to get an ImageBuffer object.
Call RecognizeTextFromImage to get the recognized text from the ImageBuffer.
Create a wstringstream object and load it with the recognized text.
Return the string.

Note

The EnsureModelIsReady function is used to check the readiness state of the text recognition model (and install it if necessary).

using Microsoft.Windows.Vision;
using Microsoft.Graphics.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

public async Task<string> RecognizeTextFromSoftwareBitmap(SoftwareBitmap bitmap)
{
    TextRecognizer textRecognizer = await EnsureModelIsReady();
    ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
    RecognizedText recognizedText = textRecognizer.RecognizeTextFromImage(imageBuffer);
    StringBuilder stringBuilder = new StringBuilder();

    foreach (var line in recognizedText.Lines)
    {
        stringBuilder.AppendLine(line.Text);
    }

    return stringBuilder.ToString();
}

public async Task<TextRecognizer> EnsureModelIsReady()
{
    if (!TextRecognizer.IsAvailable())
    {
        var loadResult = await TextRecognizer.MakeAvailableAsync();
        if (loadResult.Status != PackageDeploymentStatus.CompletedSuccess)
        {
            throw new Exception(loadResult.ExtendedError().Message);
        }
    }

    return await TextRecognizer.CreateAsync();
}

namespace winrt
{
    using namespace Microsoft::Windows::Vision;
    using namespace Microsoft::Windows::Imaging;
    using namespace Windows::Graphics::Imaging;
}

winrt::IAsyncOperation<winrt::TextRecognizer> EnsureModelIsReady();

winrt::IAsyncOperation<winrt::hstring> RecognizeTextFromSoftwareBitmap(winrt::SoftwareBitmap const& bitmap)
{
    winrt::TextRecognizer textRecognizer = co_await EnsureModelIsReady();
    winrt::ImageBuffer imageBuffer = winrt::ImageBuffer::CreateBufferAttachedToBitmap(bitmap);
    winrt::RecognizedText recognizedText = textRecognizer.RecognizeTextFromImage(imageBuffer);
    std::wstringstream stringStream;
    for (const auto& line : recognizedText.Lines())
    {
        stringStream << line.Text().c_str() << std::endl;
    }
    co_return winrt::hstring{stringStream.view()};
}

winrt::IAsyncOperation<winrt::TextRecognizer> EnsureModelIsReady()
{
  if (!winrt::TextRecognizer::IsAvailable())
  {
    auto loadResult = co_await winrt::TextRecognizer::MakeAvailableAsync();
    if (loadResult.Status() != winrt::PackageDeploymentStatus::CompletedSuccess)
    {
        throw winrt::hresult_error(loadResult.ExtendedError());
    }
  }

  co_return winrt::TextRecognizer::CreateAsync();
}

Get word bounds and confidence

Here we show how to visualize the BoundingBox of each word in a SoftwareBitmap object as a collection of color-coded polygons on a Grid element.

Note

For this example we assume a TextRecognizer object has already been created and passed in to the function.

using Microsoft.Windows.Vision;
using Microsoft.Graphics.Imaging;
using Windows.Graphics.Imaging;
using Windows.Storage;
using Windows.Storage.Streams;

public void VisualizeWordBoundariesOnGrid(
    SoftwareBitmap bitmap,
    Grid grid,
    TextRecognizer textRecognizer)
{
    ImageBuffer imageBuffer = ImageBuffer.CreateBufferAttachedToBitmap(bitmap);
    RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);

    SolidColorBrush greenBrush = new SolidColorBrush(Microsoft.UI.Colors.Green);
    SolidColorBrush yellowBrush = new SolidColorBrush(Microsoft.UI.Colors.Yellow);
    SolidColorBrush redBrush = new SolidColorBrush(Microsoft.UI.Colors.Red);

    foreach (var line in result.Lines)
    {
        foreach (var word in line.Words)
        {
            PointCollection points = new PointCollection();
            var bounds = word.BoundingBox;
            points.Add(bounds.TopLeft);
            points.Add(bounds.TopRight);
            points.Add(bounds.BottomRight);
            points.Add(bounds.BottomLeft);

            Polygon polygon = new Polygon();
            polygon.Points = points;
            polygon.StrokeThickness = 2;

            if (word.Confidence < 0.33)
            {
                polygon.Stroke = redBrush;
            }
            else if (word.Confidence < 0.67)
            {
                polygon.Stroke = yellowBrush;
            }
            else
            {
                polygon.Stroke = greenBrush;
            }

            grid.Children.Add(polygon);
        }
    }
}

namespace winrt
{
    using namespace Microsoft::Windows::Vision;
    using namespace Microsoft::Windows::Imaging;
    using namespace Micrsooft::Windows::UI::Xaml::Controls;
    using namespace Micrsooft::Windows::UI::Xaml::Media;
    using namespace Micrsooft::Windows::UI::Xaml::Shapes;
}

void VisualizeWordBoundariesOnGrid(
    winrt::SoftwareBitmap const& bitmap,
    winrt::Grid const& grid,
    winrt::TextRecognizer const& textRecognizer)
{
    winrt::ImageBuffer imageBuffer = winrt::ImageBuffer::CreateBufferAttachedToBitmap(bitmap);
    
    winrt::RecognizedText result = textRecognizer.RecognizeTextFromImage(imageBuffer);

    auto greenBrush = winrt::SolidColorBrush(winrt::Microsoft::UI::Colors::Green);
    auto yellowBrush = winrt::SolidColorBrush(winrt::Microsoft::UI::Colors::Yellow);
    auto redBrush = winrt::SolidColorBrush(winrt::Microsoft::UI::Colors::Red);
    
    for (const auto& line : recognizedText.Lines())
    {
        for (const auto& word : line.Words())
        {
            winrt::PointCollection points;
            const auto& bounds = word.BoundingBox();
            points.Append(bounds.TopLeft);
            points.Append(bounds.TopRight);
            points.Append(bounds.BottomRight);
            points.Append(bounds.BottomLeft);

            winrt::Polygon polygon;
            polygon.Points(points);
            polygon.StrokeThickness(2);

            if (word.Confidence() < 0.33)
            {
                polygon.Stroke(redBrush);
            }
            else if (word.Confidence() < 0.67)
            {
                polygon.Stroke(yellowBrush);
            }
            else
            {
                polygon.Stroke(greenBrush);
            }

            grid.Children().Add(polygon);
        }
    }
}

Additional resources

Access files and folders with Windows App SDK and WinRT APIs

Share via

Get Started with AI Text Recognition (OCR) in the Windows App SDK

Prerequisites

What can I do with the Windows App SDK and AI Text Recognition?

Create an ImageBuffer from a file

Recognize text in a bitmap image

Get word bounds and confidence

Additional resources

Feedback

Additional resources

Share via

Get Started with AI Text Recognition (OCR) in the Windows App SDK

Prerequisites

What can I do with the Windows App SDK and AI Text Recognition?

Create an ImageBuffer from a file

Recognize text in a bitmap image

Get word bounds and confidence

Additional resources

Related content

Feedback

Additional resources