docs: make document indices self-contained #1678
Signed-off-by: jupyterjazz <[email protected]>
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -58,49 +58,52 @@ This doesn't require a database server - rather, it saves your data locally. | |
| For a deeper understanding, please look into its [documentation](index_in_memory.md). | ||
|
|
||
| ### Define document schema and create data | ||
| The following code snippet defines a document schema using the `BaseDoc` class. Each document consists of a title (a string), | ||
| a price (an integer), and an embedding (a 128-dimensional array). It also creates a list of ten documents with dummy titles, | ||
| prices ranging from 0 to 9, and randomly generated embeddings. | ||
| ```python | ||
| from docarray import BaseDoc, DocList | ||
| from docarray.index import InMemoryExactNNIndex | ||
| from docarray.typing import NdArray | ||
| import numpy as np | ||
|
|
||
| # Define the document schema. | ||
| class MyDoc(BaseDoc): | ||
|     title: str | ||
|     price: int | ||
|     embedding: NdArray[128] | ||
|
|
||
| # Create documents (using dummy/random vectors) | ||
| docs = DocList[MyDoc]( | ||
|     MyDoc(title=f"title #{i}", price=i, embedding=np.random.rand(128)) | ||
|     for i in range(10) | ||
| ) | ||
| ``` | ||
|
|
||
| ### Initialize the Document Index and add data | ||
| Here we initialize an `InMemoryExactNNIndex` instance with the document schema defined previously, and add the created documents to this index. | ||
| ```python | ||
|
Member
Can we add 1-2 sentences of explanation to these code snippets? I think they are very self-explanatory, but personally, as a user, I don't like code snippets without words around them :)
||
| # Initialize a new InMemoryExactNNIndex instance and add the documents to the index. | ||
| doc_index = InMemoryExactNNIndex[MyDoc]() | ||
| doc_index.index(docs) | ||
| ``` | ||
|
|
||
| ### Perform a vector similarity search | ||
| Now, let's perform a similarity search on the document embeddings using a query vector of ones. | ||
|
||
| As a result, we'll retrieve the top 10 most similar documents and their corresponding similarity scores. | ||
| ```python | ||
| # Perform a vector search. | ||
| query = np.ones(128) | ||
| retrieved_docs, scores = doc_index.find(query, search_field='embedding', limit=10) | ||
| ``` | ||
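To make clear what an exact (brute-force) nearest-neighbour search computes, here is a minimal pure-Python sketch. It is not DocArray's implementation: the helper names `cosine_similarity` and `exact_nn_search` are hypothetical, and cosine similarity is assumed as the metric for illustration only.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_nn_search(query, vectors, limit):
    """Score every vector against the query and return the best `limit`
    as (score, position) pairs, highest score first."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy data mirroring the snippet above: ten random 128-dimensional vectors
# and a query vector of ones.
random.seed(0)
vectors = [[random.random() for _ in range(128)] for _ in range(10)]
query = [1.0] * 128

top = exact_nn_search(query, vectors, limit=3)
for score, position in top:
    print(f"vector #{position}: similarity {score:.4f}")
```

Because every stored vector is compared with the query, the result is exact, at the cost of work that grows linearly with the number of documents.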
|
|
||
| ### Filter documents | ||
| In this segment, we filter the indexed documents based on their price field, specifically retrieving documents with a price less than 5. | ||
|
||
| ```python | ||
| # Perform filtering (price < 5) | ||
| query = {'price': {'$lt': 5}} | ||
| filtered_docs = doc_index.filter(query, limit=10) | ||
| ``` | ||
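The filter above uses a MongoDB-style query dictionary. As a rough illustration of how such a dictionary can be read, the following sketch evaluates it against plain Python dicts. The `OPERATORS` table, `matches` and `filter_docs` are hypothetical names, and only a handful of comparison operators are covered; the operators a given backend actually supports are described in its own documentation.

```python
import operator

# Assumed subset of comparison operators, for illustration only.
OPERATORS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$eq": operator.eq,
}


def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for field, conditions in query.items():
        for op_name, expected in conditions.items():
            if not OPERATORS[op_name](doc[field], expected):
                return False
    return True


def filter_docs(docs, query, limit):
    """Keep the first `limit` documents that match the query."""
    return [doc for doc in docs if matches(doc, query)][:limit]


# Toy documents with the same titles and prices as in the example above.
docs = [{"title": f"title #{i}", "price": i} for i in range(10)]

cheap = filter_docs(docs, {"price": {"$lt": 5}}, limit=10)
print([doc["price"] for doc in cheap])
```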
|
|
||
| ### Combine different search methods | ||
| The final snippet combines the vector similarity search and filtering operations into a single query. | ||
| We first perform a similarity search on the document embeddings and then apply a filter to return only those documents with a price greater than or equal to 2. | ||
|
||
| ```python | ||
| # Perform a hybrid search - combining vector search with filtering | ||
| query = ( | ||
|     doc_index.build_query()  # get empty query object | ||
|     .find(query=np.ones(128), search_field='embedding')  # add vector similarity search | ||
|
|
@@ -109,3 +112,15 @@ query = ( | |
| ) | ||
| retrieved_docs, scores = doc_index.execute_query(query) | ||
| ``` | ||
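Conceptually, the combined query described above ranks documents by similarity and then keeps only those with a price of at least 2. The sketch below shows that idea in plain Python under stated assumptions: `hybrid_search` is a hypothetical helper, the three-dimensional embeddings are toy data, cosine similarity is assumed as the metric, and `limit=5` is chosen arbitrarily. It does not reproduce how any backend executes such a query.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, docs, min_price, limit):
    """Rank all documents by similarity, then keep those whose price is
    at least `min_price`, and return the first `limit` survivors."""
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query_vec, doc["embedding"]),
        reverse=True,
    )
    kept = [doc for doc in ranked if doc["price"] >= min_price]
    return kept[:limit]


# Toy documents: small deterministic embeddings instead of random 128-d ones.
docs = [
    {"title": f"title #{i}", "price": i, "embedding": [float(i + 1), 1.0, 0.5]}
    for i in range(10)
]

results = hybrid_search([1.0, 1.0, 1.0], docs, min_price=2, limit=5)
for doc in results:
    print(doc["title"], doc["price"])
```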
|
Member
I think here we should again add a big fat link to all the backend documentation pages and tell people that they can get more detailed information there.
||
|
|
||
| ## Learn more | ||
| The code snippets presented above just scratch the surface of what a Document Index can do. | ||
|
||
| To learn more and get the most out of `DocArray`, take a look at the detailed guides for the vector database backends you're interested in: | ||
|
|
||
| - [Weaviate](https://weaviate.io/) | [Docs](index_weaviate.md) | ||
| - [Qdrant](https://qdrant.tech/) | [Docs](index_qdrant.md) | ||
| - [Elasticsearch](https://www.elastic.co/elasticsearch/) v7 and v8 | [Docs](index_elastic.md) | ||
| - [Redis](https://redis.com/) | [Docs](index_redis.md) | ||
| - [Milvus](https://milvus.io/) | [Docs](index_milvus.md) | ||
| - [HNSWlib](https://github.com/nmslib/hnswlib) | [Docs](index_hnswlib.md) | ||
| - InMemoryExactNNIndex | [Docs](index_in_memory.md) | ||