Among the Cosmos DB updates announced at Build 2024, Vector Search (Preview) is one of the higher-profile ones, so I tried it out and dug into it. Until now, running Vector Search on Azure meant using AI Search, which is expensive for the low SLA it offers, but with Cosmos DB for NoSQL supporting Vector Search that picture is about to change significantly.
Because Cosmos DB, the primary data store, now supports Vector Search, AI Search is no longer needed as an additional index, and a Vector Search query can return all of the data stored in Cosmos DB, which is a big advantage. All Cosmos DB features remain available, so scalability and performance are guaranteed, and on top of that it can be done at a very small cost.
A DiskANN-based Vector Index is supported specifically for large-scale scenarios, so scenarios that used to be very expensive on AI Search can be handled without difficulty. DiskANN is covered on the official blog and by Microsoft Research, so see those if you are interested.
The basics of using Vector Search are all covered in the official documentation, and reading it is enough to set up the same thing. High-quality Japanese documentation was available from day one, which is the usual reliable Cosmos DB team.
What I care about with Vector Search in Cosmos DB for NoSQL is RU consumption, so that is mainly what I checked. I only tested with C#, but other languages should behave almost the same.
The sample data loaded into Cosmos DB is the full archive of this blog.
Create a new Cosmos DB account
The first step is an account with Vector Search enabled, and at the moment the safe choice is to create a new Cosmos DB account for it. Vector Search cannot be enabled while other Features are enabled, so building from a clean state ends up being faster.
Once it is enabled, you can create a Container with the settings Vector Search needs.
Incidentally, this feature flag exists to allow configuring the Container Vector Policy and the Vector Index; it appears to be unrelated to running Vector Search itself through SQL.
Create a Container with a Vector Policy
After enabling Vector Search, create a new Container from the Azure Portal. A current limitation is that Vector Search cannot use database shared throughput, so the Container Vector Policy settings appear once you uncheck "Share throughput across container" during creation.
Setting the dimensions, data type, and so on for a specific property in this Container Vector Policy makes it possible to create the Vector Index described later.
Defining a Container Vector Policy makes it possible to create a Vector Index, but creating a Vector Index is actually not required for Vector Search in Cosmos DB. It is less efficient, of course, but it works without a Vector Index. It even works without a Container Vector Policy, so it is fair to think of the Container Vector Policy as the information needed to create a Vector Index.
A Vector Index is configured by specifying the property defined in the Container Vector Policy together with a type such as the DiskANN mentioned earlier. See the official documentation for the available Vector Index types.
DiskANN currently requires applying through a form and is not enabled on my account, so I have not been able to try it. That said, quantizedFlat is a flat Vector Index that uses the same quantization algorithm as DiskANN, so its performance is not drastically worse than DiskANN. DiskANN apparently becomes overwhelmingly advantageous at large scale.
I kept the Container Vector Policy and Vector Index simple this time. Vector-related settings can only be specified when the Container is created, so be aware that if you forget them you have to recreate the Container.
The settings are the ones for the widely used OpenAI text-embedding-ada-002, so adjust the dimensions and data type when using another embedding model.
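For reference, a Container Vector Policy for this setup would look roughly like the JSON below. The path matches the contentVector property used later in this post and 1536 is the dimension count of text-embedding-ada-002; the float32 data type and the cosine distance function are assumptions, not values confirmed in this post.

```json
{
  "vectorEmbeddings": [
    {
      "path": "/contentVector",
      "dataType": "float32",
      "distanceFunction": "cosine",
      "dimensions": 1536
    }
  ]
}
```

The path here has to line up with the path given in the Vector Index definition, since the index is built on the property the policy describes.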
That gives a Container with a Vector Index, so all that remains is loading data and running queries.
Run the embedding and save it to Cosmos DB
Once a Container with a Container Vector Policy and a Vector Index exists, the rest is just inserting the embedding Vector the same way Cosmos DB has always been used. First, here is the initialization code needed to use Azure OpenAI Service and Cosmos DB.
以ä¸ã®éã Cosmos DB ã«é¢ä¿ããé¨åã¯ããã¾ã§ã¨å ¨ãå¤ãããªããã¨ãåããã¯ãã§ãã
    using Azure.AI.OpenAI;
    using Microsoft.Azure.Cosmos;

    var openAIClient = new OpenAIClient(new Uri("https://***.openai.azure.com/"), new Azure.AzureKeyCredential("<API_KEY>"));

    var connectionString = "<CONNECTION_STRING>";

    var cosmosClient = new CosmosClient(connectionString, new CosmosClientOptions
    {
        SerializerOptions = new CosmosSerializationOptions
        {
            PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
        }
    });

    var container = cosmosClient.GetContainer("my-database", "vector-sample");
The sample data loaded into Cosmos DB is this blog's archive, so the data type is the simple class below. As the name suggests, ContentVector is the float array that stores the embedding Vector.
    public class BlogEntry
    {
        public string Id { get; set; }
        public string Title { get; set; }
        public string Content { get; set; }
        public float[] ContentVector { get; set; }
    }
I exported the blog data and repacked it into the type above, so all that remains is calling GetEmbeddingsAsync to generate the Vector and adding it to Cosmos DB, as in the code below.
A small annoyance is that the Embedding property is defined as ReadOnlyMemory<float>, which makes ToArray necessary, but the Cosmos DB SDK does not handle ReadOnlyMemory<T> well, so there is no way around it. *1
    // Generate the embedding for the post body.
    Embeddings embeddings = await openAIClient.GetEmbeddingsAsync(new EmbeddingsOptions
    {
        DeploymentName = "text-embedding-ada-002",
        Input = { blogEntry.Content }
    });

    // Embedding is ReadOnlyMemory<float>, so copy it into a float[] before saving.
    blogEntry.ContentVector = embeddings.Data[0].Embedding.ToArray();

    var response = await container.UpsertItemAsync(blogEntry, new PartitionKey(blogEntry.Id));

    // Print the RU charged for this write.
    Console.WriteLine($"{response.RequestCharge} RUs");
Running this code writes a 1536-dimensional float array to Cosmos DB, but the write cost a surprisingly high 279.62 RU. The JSON is about 2 KB even by a generous estimate, so in theory the write should cost around 10 RU.
At close to 300 RU per write, low-cost operation becomes very hard even though Vector Search is available in Cosmos DB, so this has to come down considerably.
From experience operating Cosmos DB, I know that when write RU is high relative to a small JSON document, the cause is the default Index being created on every property, so I changed the Index Policy as shown below to exclude everything under contentVector.
Excluding contentVector from the Index Policy does not affect the Vector Index, so the regular Index no longer has to be built for a property that does not need it, and the RU should drop.
    {
      "indexingMode": "consistent",
      "automatic": true,
      "includedPaths": [
        { "path": "/*" }
      ],
      "excludedPaths": [
        { "path": "/contentVector/*" },
        { "path": "/\"_etag\"/?" }
      ],
      "vectorIndexes": [
        { "path": "/contentVector", "type": "quantizedFlat" }
      ]
    }
After changing the Container's Index Policy, I ran the same code again and confirmed that the write cost dropped dramatically, to 11.05 RU.
This is roughly the write cost I expected, so I consider it well within an acceptable range.
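The documentation says nothing about tuning the Index Policy, so I am currently checking with the Cosmos DB team that this causes no problems, just in case. In theory it should be fine, and cutting the write cost by a factor of about 25 (279.62 RU down to 11.05 RU) is significant, so this is something I would apply from the start.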
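To put the two measurements side by side, the arithmetic can be sketched as follows. The per-write RU figures are the ones measured above; the document count of 1,000 is a made-up value used only to show how the difference adds up during a bulk load.

```csharp
using System;

// Measured write costs from this post (RU per upsert).
const double ruWithDefaultIndex = 279.62;
const double ruWithVectorPathExcluded = 11.05;

// Hypothetical number of documents to ingest; not a figure from this post.
const int documentCount = 1000;

// How many times cheaper a single write became.
static double ReductionFactor(double before, double after) => before / after;

// Total RU needed to write `count` documents at a given per-write cost.
static double TotalRu(double ruPerWrite, int count) => ruPerWrite * count;

double factor = ReductionFactor(ruWithDefaultIndex, ruWithVectorPathExcluded);
Console.WriteLine($"Reduction: {factor:F1}x");
Console.WriteLine($"Before: {TotalRu(ruWithDefaultIndex, documentCount):N0} RU");
Console.WriteLine($"After:  {TotalRu(ruWithVectorPathExcluded, documentCount):N0} RU");
```

On a Container capped at 1000 RU/s, the difference decides whether a bulk load of that size takes minutes or seconds.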
Run Vector Search with SQL
With AI Search, running Vector Search meant using a dedicated field, whereas Cosmos DB uses SQL as is, so it works seamlessly, including filters with WHERE.
The core of the Cosmos DB Vector Search implementation is the VectorDistance function. Using this function in the SQL ORDER BY clause is what performs the Vector Search.
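Conceptually, ordering by a vector distance means scoring every candidate against the query vector and keeping the best ones. The sketch below shows that idea in plain C# with brute-force cosine similarity over toy 3-dimensional vectors. It is an illustration only, not how Cosmos DB implements VectorDistance, and it assumes cosine is the configured distance function, where a higher score means more similar.

```csharp
using System;
using System.Linq;

// Cosine similarity between two vectors of equal length.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy 3-dimensional "embeddings"; real ones here would have 1536 dimensions.
var documents = new (string Title, float[] Vector)[]
{
    ("A", new float[] { 1f, 0f, 0f }),
    ("B", new float[] { 0.8f, 0.6f, 0f }),
    ("C", new float[] { 0f, 0f, 1f }),
};
float[] queryVector = { 1f, 0.1f, 0f };

// Score every document against the query and keep the two most similar.
var top = documents
    .Select(d => (d.Title, Score: CosineSimilarity(d.Vector, queryVector)))
    .OrderByDescending(x => x.Score)
    .Take(2)
    .ToArray();

foreach (var (title, score) in top)
{
    Console.WriteLine($"{score:F3}: {title}");
}
```

A Vector Index exists to avoid scanning every document the way this loop does, which is why the index matters for query cost.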
Compared with AI Search, what is provided is more primitive, so there is a little extra work; for example, you have to project the Vector Search score yourself in the SELECT clause. As the official documentation notes, LINQ has no VectorDistance-style method, so having to write SQL is another bit of extra work.
Still, there are not that many usage patterns for Vector Search, so I do not expect writing SQL to be much of a burden. It is at the level of "it would be nice if LINQ supported this someday."
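Since the query is plain SQL, a filter can sit next to the vector ordering. The sketch below only builds the query text; the CONTAINS filter on c.title and the @keyword parameter name are illustrative choices, and the RU charge of this filtered form was not measured in this post.

```csharp
using System;

// Query text only; it would be handed to QueryDefinition like an unfiltered query.
// The CONTAINS filter and the @keyword parameter name are illustrative choices.
const string filteredQuery =
    "SELECT TOP 10 c.title, VectorDistance(c.contentVector, @embedding) AS score " +
    "FROM c " +
    "WHERE CONTAINS(c.title, @keyword) " +
    "ORDER BY VectorDistance(c.contentVector, @embedding)";

Console.WriteLine(filteredQuery);
```

Both @embedding and @keyword would be supplied with WithParameter, the same way as any other parameterized Cosmos DB query.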
Now to run a similarity search over this blog's posts with Vector Search. The result of the VectorDistance function in the SQL is returned as the similarity score, so I prepared a class that adds a Score property, shown below.
    public class BlogEntryWithScore : BlogEntry
    {
        public float Score { get; set; }
    }
After that, embed the input string with text-embedding-ada-002, pass the result to the Cosmos DB query as a parameter, and run the SQL as usual; that is all it takes for Vector Search to work. The sample below retrieves only 10 items.
    // Embed the search text ("How to optimize App Service performance and cost").
    Embeddings result = await openAIClient.GetEmbeddingsAsync(new EmbeddingsOptions
    {
        DeploymentName = "text-embedding-ada-002",
        Input = { "App Service のパフォーマンスとコストの最適化方法" }
    });

    // VectorDistance appears twice: once to project the score, once to order the results.
    var query = new QueryDefinition("SELECT TOP 10 c.title, VectorDistance(c.contentVector, @embedding) AS score FROM c ORDER BY VectorDistance(c.contentVector, @embedding)")
        .WithParameter("@embedding", result.Data[0].Embedding.ToArray());

    var iterator = container.GetItemQueryIterator<BlogEntryWithScore>(query);

    while (iterator.HasMoreResults)
    {
        var blogEntries = await iterator.ReadNextAsync();

        if (blogEntries.Count == 0)
        {
            break;
        }

        foreach (var blogEntry in blogEntries)
        {
            Console.WriteLine($"{blogEntry.Score}: {blogEntry.Title}");
        }

        // Print the RU charged for this page of results.
        Console.WriteLine($"{blogEntries.RequestCharge} RUs");
    }
Running this code returns plausible-looking results. The performance aspect seems to be weighted most heavily, but content related to App Service is being retrieved.
The query cost was also excellent at a very low 10.3 RU. At this cost it can be run casually without any concern.
To check the effect of the Vector Index, I also created a separate Container with no Vector Index defined and ran the same query, and the cost rose about tenfold to 103.63 RU. Vector Search in Cosmos DB works without a Vector Index, but be aware that the cost becomes much higher.
The data I had was not enough to create multiple physical partitions, so Vector Search on large data is not sufficiently verified here. From what I confirmed directly with the development team at Build, though, specifying the partition key matters for Vector Search too and helps lower the query cost.
All of this testing used autoscale with a 1000 RU maximum, and after optimizing the Index Policy the consumed throughput stayed at 100 RU the whole time, so Vector Search on a highly reliable data store can be started for around 1,500 yen per month. Using Vector Search in Cosmos DB for NoSQL looks set to deliver a dramatic cost reduction compared with AI Search.
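The monthly figure can be sanity-checked with a rough calculation. The sketch below assumes an autoscale rate of about $0.012 per 100 RU/s per hour, 730 hours in a month, and an exchange rate of 155 yen per dollar; all three are assumptions rather than numbers from this post, so check the current pricing page before relying on them. It also simplifies autoscale billing to "at least 10% of the configured maximum".

```csharp
using System;

// Assumptions (not from this post): autoscale list price, month length, exchange rate.
const double usdPer100RuPerHour = 0.012; // assumed single-region autoscale rate
const double hoursPerMonth = 730;        // average hours in a month
const double jpyPerUsd = 155;            // assumed exchange rate

// Simplified autoscale billing: never below 10% of the maximum, never above the maximum.
static double BilledRuPerSecond(double maxRu, double usedRu) =>
    Math.Max(maxRu * 0.1, Math.Min(usedRu, maxRu));

// Monthly cost in USD for a constant billed throughput.
static double MonthlyCostUsd(double billedRu, double rate, double hours) =>
    billedRu / 100 * rate * hours;

double billed = BilledRuPerSecond(1000, 100);
double usd = MonthlyCostUsd(billed, usdPer100RuPerHour, hoursPerMonth);
Console.WriteLine($"{billed} RU/s -> ${usd:F2}/month (~{usd * jpyPerUsd:F0} JPY)");
```

Under these assumptions the result lands in the same ballpark as the roughly 1,500 yen quoted above; storage and any hours spent above the 100 RU/s floor would add to it.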
*1: Switching the JSON serializer to System.Text.Json might just make this work.