TRIAL＆RetailAI Advent Calendar 2024

Taking on receipt scanning with Google Vision AI! I built an app with Kotlin × Quarkus

Posted at 2024-12-18

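Introduction

This is the Day 19 article of the TRIAL＆RetailAI Advent Calendar 2024.
Yesterday's article was "How to manage bugs and countermeasures using Looker" by @t-hiroyuki.
It looked like a very efficient way to make a large backlog of Jira bug tickets easier to manage. I would like to try it myself!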

・・・

This article shows how to use Google Cloud Vision AI to read the store name, total amount, and purchase date from a receipt and save them to Firestore.

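Why I built this app

I am self-taught and joined this company as a mid-career hire in April of this year. Looking back, I have been busy with day-to-day work since joining, and recently I have mainly been in charge of testing.
In the middle of all that, I wanted to:
• Remember the fun of building something and making it run
• Work out how much I wasted over the year at a store I visit often (especially on snacks, desserts, alcohol, and so on... lol)

So I decided to build an app whose purpose is to read receipts from one specific store.

Some parts are probably technically immature, but I would be happy if you read this with a forgiving eye.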

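Tech stack

This time I went with the following setup.

Google Cloud Vision AI: provides the OCR feature that extracts text from images (client library version: 3.35.0)
Firestore: NoSQL database that stores the receipt data (client library version: 3.20.2)
Quarkus: lightweight, fast Java framework (version: 3.17.3)
Kotlin: simple, modern programming language (version: 2.0.21)
Runs in a local environment (I plan to deploy it to Cloud Run later...)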

Steps

1. Setting up Google Cloud Vision AI

Create a GCP project
Enable the Vision AI API
Create a service account and download its authentication key

This key is used later when calling the Google Cloud APIs from the local environment.
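
The Google Cloud client libraries usually find a downloaded key through Application Default Credentials, most commonly via the GOOGLE_APPLICATION_CREDENTIALS environment variable. The article does not say how the key was wired in, so the following is only a minimal sketch under that assumption. The helper `credentialsProblem` is hypothetical (not part of the original project) and merely checks that the variable points at a readable file before the app starts calling the APIs.

```kotlin
import java.io.File

// Hypothetical helper (not in the original project): returns a description of
// what is wrong with the credentials path, or null when it looks usable.
fun credentialsProblem(path: String?): String? {
    if (path.isNullOrBlank()) return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    val file = File(path)
    return when {
        !file.isFile -> "No key file found at: $path"
        !file.canRead() -> "Key file is not readable: $path"
        else -> null
    }
}

fun main() {
    // Assumption: the key is supplied through this environment variable.
    val problem = credentialsProblem(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"))
    println(problem ?: "Credentials file found.")
}
```

Failing early with a message like this tends to be easier to diagnose than an authentication error raised on the first API call.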

2. Creating the Quarkus project

Preparing to use Vision AI
Add google-cloud-vision to the Gradle project.

build.gradle.kts

dependencies {
    implementation("com.google.cloud:google-cloud-vision:3.35.0")
}

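ReceiptDataExtractor code
The following code is a service that runs OCR on an uploaded image file with the Google Cloud Vision API and extracts the text printed on the receipt. It detects the text in the image and then extracts the receipt-related data (store name, total amount, items, and date) from that text.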

package org.acme.domain.service

import com.google.cloud.vision.v1.AnnotateImageRequest
import com.google.cloud.vision.v1.Feature
import com.google.cloud.vision.v1.Image
import com.google.cloud.vision.v1.ImageAnnotatorClient
import com.google.protobuf.ByteString
import io.quarkus.logging.Log
import jakarta.enterprise.context.ApplicationScoped
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream
import javax.imageio.ImageIO

@ApplicationScoped
class ReceiptDataExtractor {

    private val client = ImageAnnotatorClient.create()

    // Extract text from the image via OCR
    fun extractTextFromImage(imageStream: InputStream): String {
        try {
            val originalImageBytes = ByteString.readFrom(imageStream)
            if (originalImageBytes.isEmpty) {
                throw RuntimeException("Failed to read the image data.")
            }
            Log.info("Original image size: ${originalImageBytes.size()} bytes")

            val resizedImageBytes = resizeImage(originalImageBytes)
            Log.info("Resized image size: ${resizedImageBytes.size()} bytes")

            val image = Image.newBuilder().setContent(resizedImageBytes).build()
            val feature = Feature.newBuilder().setType(Feature.Type.DOCUMENT_TEXT_DETECTION).build()

            val request = AnnotateImageRequest.newBuilder()
                .addFeatures(feature)
                .setImage(image)
                .build()

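            val response = client.batchAnnotateImages(listOf(request))
            val result = response.responsesList[0]
            // Per-image failures are reported inside the response, so check for them explicitly.
            if (result.hasError()) {
                throw RuntimeException("Vision API error: ${result.error.message}")
            }
            return result.fullTextAnnotation.text
        } catch (e: Exception) {
            Log.error("Error during text extraction: ${e.message}", e)
            // Keep the original exception as the cause so the stack trace is not lost.
            throw RuntimeException("Failed to extract text from image: ${e.message}", e)
        }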
    }

    // Resize the image so that its longer side is at most maxDimension pixels
    private fun resizeImage(originalImageBytes: ByteString): ByteString {
        try {
            val image = ImageIO.read(ByteArrayInputStream(originalImageBytes.toByteArray()))
                ?: throw RuntimeException("Failed to decode image.")

            val maxDimension = 1024
            val scale = maxOf(image.width, image.height) / maxDimension.toFloat()
            // Never upscale: an image already within the limit is sent as-is.
            if (scale <= 1f) return originalImageBytes
            val newWidth = (image.width / scale).toInt().coerceAtLeast(1)
            val newHeight = (image.height / scale).toInt().coerceAtLeast(1)

            val resizedImage = BufferedImage(newWidth, newHeight, BufferedImage.TYPE_INT_ARGB)
            val graphics = resizedImage.createGraphics()
            graphics.drawImage(image, 0, 0, newWidth, newHeight, null)
            graphics.dispose()

            val byteArrayOutputStream = ByteArrayOutputStream()
            ImageIO.write(resizedImage, "PNG", byteArrayOutputStream)
            return ByteString.copyFrom(byteArrayOutputStream.toByteArray())
        } catch (e: Exception) {
            Log.error("Error resizing image: ${e.message}", e)
            throw RuntimeException("Failed to resize image: ${e.message}", e)
        }
    }

    // Extract the store name, total amount, items, and date from the receipt text
    fun extractReceiptData(text: String): Map<String, Any> {
        val storeName = Regex("""いせやフーズクラブ""").find(text)?.value ?: "Unknown Store"
        val totalPrice = extractTotalPrice(text) ?: 0
        val items = extractItems(text)
        val date = extractDate(text)?.let { convertToStandardDateFormat(it) } ?: "Unknown Date"

        return mapOf(
            "StoreName" to storeName,
            "TotalPrice" to totalPrice,
            "Items" to items,
            "Date" to date
        )
    }

    // Extract the total amount (the text following "合計", i.e. "total")
    private fun extractTotalPrice(text: String): Int? {
        val priceRegex = Regex("合計\\s*¥?\\s*(\\d{1,3}(?:[ ,]\\d{3})*)")
        val matchResult = priceRegex.find(text)

        if (matchResult != null) {
            Log.info("Total price found: ${matchResult.groups[1]?.value}")
        } else {
            Log.info("No total price found.")
        }

        val rawPrice = matchResult?.groups?.get(1)?.value?.replace("[ ,]".toRegex(), "")
        Log.info("Extracted raw price string: $rawPrice")

        return rawPrice?.toIntOrNull()
    }

    // Extract the date (format: yyyy年MM月dd日)
    private fun extractDate(text: String): String? {
        val dateRegex = Regex("\\d{4}年\\d{2}月\\d{2}日")
        return dateRegex.find(text)?.value?.trim()
    }

    private fun convertToStandardDateFormat(date: String): String {
        val regex = Regex("(\\d{4})年(\\d{2})月(\\d{2})日")
        return regex.replace(date) { matchResult ->
            "${matchResult.groupValues[1]}-${matchResult.groupValues[2]}-${matchResult.groupValues[3]}"
        }
    }

    // Extract item information (6-digit item code, name, price, optional quantity)
    private fun extractItems(text: String): List<Map<String, Any>> {
        val itemRegex = Regex("""(\d{6})\s+([^\d¥]+(?:\s+[^\d¥]+)*?)\s*(¥[\d,]+(?:\.\d{1,2})?)\s*(X\d+点|3P|)?""")
        val items = mutableListOf<Map<String, Any>>()

        val matches = itemRegex.findAll(text)

        for (match in matches) {
            val itemCode = match.groupValues[1].trim()
            val itemName = match.groupValues[2].trim()
            val price = match.groupValues[3].trim()
            val quantity = match.groupValues[4].trim()

            val itemInfo = mutableMapOf<String, Any>(
                "ItemCode" to itemCode,
                "ItemName" to itemName,
                "Price" to price
            )

            if (quantity.isNotEmpty()) {
                itemInfo["Quantity"] = quantity
            }

            items.add(itemInfo)
        }

        if (items.isEmpty()) {
            // Use the same logger as the rest of the class instead of println.
            Log.info("No items found in the text.")
        }

        return items
    }

    // Normalize the text (collapse whitespace, tidy up "¥" and "X", remove commas)
    fun normalizeText(text: String): String {
        return text
            .replace(Regex("\\s+"), " ")
            .replace(Regex("¥\\s*"), "¥")
            .replace(Regex("X\\s*"), "X")
            .replace(",", "")
            .replace("点 ", "点:")
            .trim()
    }
}
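
To make the total and date extraction easier to follow, here is a small standalone sketch that applies the same two patterns to a receipt-like string. The patterns are copied from the class above. The sample text and the helper names `parseTotal` and `parseDate` are invented for illustration.

```kotlin
// The two patterns are copied from ReceiptDataExtractor; the sample text is invented.
val totalRegex = Regex("合計\\s*¥?\\s*(\\d{1,3}(?:[ ,]\\d{3})*)")
val receiptDateRegex = Regex("(\\d{4})年(\\d{2})月(\\d{2})日")

// Returns the amount after "合計" as an Int, or null when no total is found.
fun parseTotal(text: String): Int? =
    totalRegex.find(text)?.groupValues?.get(1)?.replace(Regex("[ ,]"), "")?.toIntOrNull()

// Returns the first yyyy年MM月dd日 date as yyyy-MM-dd, or null when none is found.
fun parseDate(text: String): String? =
    receiptDateRegex.find(text)?.let { m ->
        "${m.groupValues[1]}-${m.groupValues[2]}-${m.groupValues[3]}"
    }

fun main() {
    val sample = "2024年12月16日 10:15\n合計 ¥1,407"
    println(parseTotal(sample)) // 1407
    println(parseDate(sample))  // 2024-12-16
}
```

One thing worth noting: the total pattern relies on thousands separators. If the commas have already been stripped (as normalizeText does), a string such as "合計 ¥1407" matches only its first three digits, so `parseTotal` returns 140. Whether that matters depends on the order in which the service calls these functions, which the article does not show.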

3. Saving the data to Firestore

The data extracted with the Vision API is saved to Firestore.

To use Firestore, add the following dependency.

build.gradle.kts

dependencies {
    implementation("com.google.cloud:google-cloud-firestore:3.20.2")
}

Create a repository for Firestore.

ReceiptRepository code
The following is an example of a repository that saves receipt information to Firestore.

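import com.google.api.core.ApiFuture
import com.google.cloud.firestore.DocumentReference
import com.google.cloud.firestore.Firestore
import com.google.cloud.firestore.FirestoreOptions
import jakarta.enterprise.context.ApplicationScoped

// Receipt and ReceiptRepository are the project's own types and are not shown in this article.
@ApplicationScoped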
class FirestoreReceiptRepository : ReceiptRepository {

    private val firestore: Firestore = FirestoreOptions.getDefaultInstance().service
    private val receiptCollection = firestore.collection("receipts")

    override fun save(receipt: Receipt) {
        val receiptData = mapOf(
            "storeName" to receipt.storeName,
            "totalPrice" to receipt.totalPrice,
            "date" to receipt.date,
            "items" to receipt.items
        )

        val apiFuture: ApiFuture<DocumentReference> = receiptCollection.add(receiptData)
        apiFuture.get()
    }
}

Sample data (the format saved to Firestore)

{
  "storeName": "Sample Store",
  "totalPrice": 1234,
  "date": "2024-12-16",
  "items": [
    {"id": "000123", "name": "Item A", "price": 123},
    {"id": "000456", "name": "Item B", "price": 456}
  ]
}
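
The map returned by extractReceiptData uses keys such as "StoreName" and "ItemCode" and keeps prices as strings like "¥148", whereas the sample document above uses storeName, id, and an integer price. The article does not show how the two are bridged, so the following is only a sketch of one possible conversion. `Receipt`, `ReceiptItem`, `parsePrice`, and `toReceipt` are all hypothetical and may differ from the project's real classes.

```kotlin
// Hypothetical domain types; the article does not show the project's actual Receipt class.
data class ReceiptItem(val id: String, val name: String, val price: Int)
data class Receipt(
    val storeName: String,
    val totalPrice: Int,
    val date: String,
    val items: List<ReceiptItem>
)

// Converts a price string such as "¥1,234" to 1234 by keeping digits only.
// Returns 0 when there are no digits; decimal prices are not handled.
fun parsePrice(raw: String): Int = raw.filter { it.isDigit() }.toIntOrNull() ?: 0

// Maps the extractor's output (keys "StoreName", "TotalPrice", "Date", "Items") to a Receipt.
fun toReceipt(data: Map<String, Any>): Receipt {
    val rawItems = data["Items"] as? List<*> ?: emptyList<Any>()
    val items = rawItems.filterIsInstance<Map<*, *>>().map { item ->
        ReceiptItem(
            id = item["ItemCode"]?.toString().orEmpty(),
            name = item["ItemName"]?.toString().orEmpty(),
            price = parsePrice(item["Price"]?.toString().orEmpty())
        )
    }
    return Receipt(
        storeName = data["StoreName"]?.toString() ?: "Unknown Store",
        totalPrice = data["TotalPrice"] as? Int ?: 0,
        date = data["Date"]?.toString() ?: "Unknown Date",
        items = items
    )
}

fun main() {
    val extracted: Map<String, Any> = mapOf(
        "StoreName" to "Sample Store",
        "TotalPrice" to 1234,
        "Date" to "2024-12-16",
        "Items" to listOf(mapOf("ItemCode" to "000123", "ItemName" to "Item A", "Price" to "¥123"))
    )
    println(toReceipt(extracted))
}
```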

4. Running in the local environment

Create a simple endpoint that receives the image in a POST request and processes it.

ReceiptScanResource.kt

@Path("/scan")
class ReceiptScanResource {

    @Inject
    lateinit var receiptService: ReceiptService

    @POST
    @Consumes("multipart/form-data")
    @Produces(MediaType.APPLICATION_JSON)
    fun scanReceipt(@MultipartForm form: ReceiptUploadForm): Response {
        val file: InputStream = form.image
        return try {
            val imageBytes = file.readBytes()
            Log.info("Image size: ${imageBytes.size} bytes")

            receiptService.processReceipt(ByteArrayInputStream(imageBytes))

            Response.ok("Receipt processed successfully").build()
        } catch (e: IllegalArgumentException) {
            Log.error("Invalid data: ${e.message}")
            createBadRequestResponse("Invalid data: ${e.message}")
        } catch (e: Exception) {
            Log.error("Error during receipt scanning: ${e.message}")
            Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                .entity("Failed to process image: ${e.message}")
                .build()
        }
    }

    private fun createBadRequestResponse(message: String): Response {
        return Response.status(Response.Status.BAD_REQUEST)
            .entity(message)
            .build()
    }
}

Receipt used:

Time to run it!

curl -X POST \
  -F "image=@file/to/path/demoreceipt.png" \
  http://localhost:8080/scan

Notes
• -X POST: sends an HTTP POST request
• -F "image=...": sends the image as multipart/form-data

Result:

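Where I got stuck

I ran into several problems. Some I worked around and some are still unresolved, so I will go through them here.

1. Accuracy of item list extraction

Extracting item names and prices accurately from the receipt was difficult. In particular, the OCR output sometimes merged several items into a single line: for example, 「みたらし&塩あん団子」 and 「もめんとうふ」 came out on the same line.
(Example: 000453 みたらし&塩あん団子 000301 もめんとうふ 3P ¥85)

I tried to handle this with string splitting and regular expressions, but I felt it would be hard to solve completely.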

⚫︎Text I wanted to extract

000453 みたらし&塩あん団子 ¥200
000301 もめんとうふ 3P ¥85
000202 ミズナ ¥148
000205 ダイズサラダ ¥128
000351 ニコニコたまご ¥198
000119 精肉 2P 500円 ¥500
000702 ポリラップ 50m ¥148

⚫︎Text actually extracted (only 3 items)

Extracted Items = [000202 ミズナ ¥148, 000205 ダイズサラダ ¥128, 000351 ニコニコたまご ¥198]
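
As an illustration of the "string splitting plus regular expressions" idea, the sketch below first cuts the OCR text at every 6-digit item code and only then looks for a price inside each piece. This is not the article's implementation. `ParsedItem`, `splitItems`, and both patterns are hypothetical, and the sketch assumes that every item starts with a 6-digit code, as on this particular receipt.

```kotlin
// Hypothetical type and helper, not taken from the original project.
data class ParsedItem(val code: String, val name: String, val price: Int?)

// A 6-digit item code that is not part of a longer number or a price.
val itemCodeRegex = Regex("(?<![\\d¥,])\\d{6}(?!\\d)")
val itemPriceRegex = Regex("¥\\s*([\\d,]+)")

fun splitItems(text: String): List<ParsedItem> {
    val flat = text.replace(Regex("\\s+"), " ").trim()
    // Each item code marks the start of a new segment.
    val starts = itemCodeRegex.findAll(flat).map { it.range.first }.toList()
    return starts.mapIndexed { i, start ->
        val end = starts.getOrNull(i + 1) ?: flat.length
        val segment = flat.substring(start, end).trim()
        val code = segment.take(6)
        val rest = segment.drop(6).trim()
        // Treat the last "¥..." in the segment as the price; keep the item even if none is found.
        val priceMatch = itemPriceRegex.findAll(rest).lastOrNull()
        val price = priceMatch?.groupValues?.get(1)?.replace(",", "")?.toIntOrNull()
        val name = (if (priceMatch != null) rest.removeRange(priceMatch.range) else rest).trim()
        ParsedItem(code, name, price)
    }
}

fun main() {
    val merged = "000453 みたらし&塩あん団子 000301 もめんとうふ 3P ¥85\n000202 ミズナ ¥148"
    splitItems(merged).forEach(::println)
}
```

With this approach the merged line yields two items instead of none, but the first one comes back with a null price: if the OCR output placed that price somewhere else, splitting alone cannot recover it.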

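2. Role settings for the GCP service account

The service account needed to be given the appropriate roles (roles/visionai.user and roles/datastore.user).
I hit a permission error at first, which reminded me how important role configuration is.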

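3. Receipt images taken with an iPhone were too large

Uploading a high-resolution image taken with an iPhone as-is sometimes failed because the image was too large. To deal with this, I added server-side processing that automatically shrinks the image.

However, even with that processing in place, uploads of some images still do not complete successfully. In practice, I sometimes had to take a screenshot of the photo and upload that smaller image instead.

So the current image handling still needs further improvement on the server side. In particular, I need to revisit resource management and format support so that high-resolution images can be processed reliably.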
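
As a starting point for that rework, the sketch below downsizes an image and re-encodes it as JPEG using only the JDK's ImageIO. It is an illustration under assumptions, not the project's code. `shrinkToJpeg` is a hypothetical helper, and the 1024-pixel limit simply mirrors the value used in resizeImage. Note that ImageIO.read returns null for formats it has no reader for, which may be worth checking if the phone saves photos in a format other than JPEG or PNG. The article does not confirm this as the cause.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import javax.imageio.ImageIO

// Hypothetical helper: scales the image down so its longer side is at most
// maxDimension pixels (never up), then encodes the result as JPEG.
fun shrinkToJpeg(input: ByteArray, maxDimension: Int = 1024): ByteArray {
    val source = ImageIO.read(ByteArrayInputStream(input))
        ?: throw IllegalArgumentException("Unsupported or corrupt image data")
    val longest = maxOf(source.width, source.height)
    val shrink = longest > maxDimension
    // Integer arithmetic keeps the aspect ratio without floating-point rounding surprises.
    val width = (if (shrink) source.width * maxDimension / longest else source.width).coerceAtLeast(1)
    val height = (if (shrink) source.height * maxDimension / longest else source.height).coerceAtLeast(1)

    // JPEG has no alpha channel, so draw onto an opaque RGB canvas.
    val target = BufferedImage(width, height, BufferedImage.TYPE_INT_RGB)
    val graphics = target.createGraphics()
    graphics.drawImage(source, 0, 0, width, height, null)
    graphics.dispose()

    val out = ByteArrayOutputStream()
    check(ImageIO.write(target, "jpg", out)) { "No JPEG writer available" }
    return out.toByteArray()
}

fun main() {
    // Build a 2000x1000 test image in memory instead of reading a real photo.
    val big = BufferedImage(2000, 1000, BufferedImage.TYPE_INT_RGB)
    val png = ByteArrayOutputStream().also { ImageIO.write(big, "png", it) }.toByteArray()
    val small = ImageIO.read(ByteArrayInputStream(shrinkToJpeg(png)))
    println("${small.width}x${small.height}")
}
```

Throwing a distinct error when decoding fails makes it possible to tell "the image could not be read at all" apart from "the image was too large", which the current catch-all error message does not distinguish.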

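4. Dependence on a specific receipt format

Because receipt layouts differ from store to store, I found it difficult to extract information accurately from every receipt with Vision AI. So far I have only verified the app against receipts from one particular supermarket.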
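
One small step toward handling other layouts would be to loosen the individual patterns. As an example, the sketch below accepts a few common date notations instead of only yyyy年MM月dd日. The list of formats is an assumption (no other receipts are shown in the article), and `flexibleDateRegex` and `findDate` are hypothetical names.

```kotlin
import java.time.DateTimeException
import java.time.LocalDate

// Assumed formats: "2024年12月16日", "2024/12/16", "2024-12-16", "2024.12.16".
// One- or two-digit months and days are accepted; other layouts are not covered.
val flexibleDateRegex = Regex("(\\d{4})\\s*[年/.-]\\s*(\\d{1,2})\\s*[月/.-]\\s*(\\d{1,2})日?")

// Returns the first match that is a real calendar date, or null when there is none.
fun findDate(text: String): LocalDate? =
    flexibleDateRegex.findAll(text).firstNotNullOfOrNull { m ->
        val (year, month, day) = m.destructured
        try {
            LocalDate.of(year.toInt(), month.toInt(), day.toInt())
        } catch (e: DateTimeException) {
            null // e.g. month 45: not a date, so try the next match
        }
    }

fun main() {
    println(findDate("2024年12月16日 10:15"))
    println(findDate("日付 2024/1/5"))
    println(findDate("TEL 0123-45-6789"))
}
```

Validating through LocalDate filters out look-alikes such as phone numbers, and LocalDate's default string form is yyyy-MM-dd, the same shape convertToStandardDateFormat produces.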

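Summary

Through building this app, I learned how to use Google Cloud Vision AI to turn the information on a receipt into data and manage it with little effort.
Going forward, I want to improve the accuracy of item list extraction and support a wider variety of receipt formats.

* The GitHub repository for this project is available here.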

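Closing

That was Day 19 of the TRIAL＆RetailAI Advent Calendar 2024: "I tried it with Quarkus and Kotlin! How to read receipt totals with Google Cloud Vision AI".
Tomorrow's article is "I 'completely understood' Helm" by @Ryu_Ito. Being able to say you completely understand something is pretty cool! Look forward to it!

RetailAI and TRIAL are hiring engineers.
If you are interested, please get in touch!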
