Merged
Changes from 1 commit
Commits
35 commits
caa00fd
chore: first pr
jupyterjazz Jun 28, 2023
b45e3a6
docs: modify hnsw
jupyterjazz Jul 6, 2023
cad4e60
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 6, 2023
11bda62
docs: rough versions of inmemory and hnsw
jupyterjazz Jul 6, 2023
96319ca
chore: update branch
jupyterjazz Jul 6, 2023
f5825f8
docs: weaviate v1
jupyterjazz Jul 6, 2023
8aaedbe
docs: elastic v1
jupyterjazz Jul 17, 2023
4a3e25c
docs: introduction page
jupyterjazz Jul 17, 2023
db77beb
docs: redis v1
jupyterjazz Jul 17, 2023
82afb99
docs: qdrant v1
jupyterjazz Jul 17, 2023
befc786
docs: validate intro inmemory and hnsw examples
jupyterjazz Jul 17, 2023
9bdb0dc
docs: validate elastic and qdrant examples
jupyterjazz Jul 17, 2023
64f83bf
docs: validate code examples for redis and weaviate
jupyterjazz Jul 18, 2023
759900c
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 19, 2023
60cd4d4
chore: merge recent updates
jupyterjazz Jul 19, 2023
ca25feb
docs: milvus v1
jupyterjazz Jul 19, 2023
7fef5d8
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 24, 2023
fe572da
docs: validate milvus code
jupyterjazz Jul 24, 2023
10bc14b
docs: make redis and milvus visible
jupyterjazz Jul 24, 2023
6199a2a
docs: refine vol1
jupyterjazz Jul 26, 2023
fa8f919
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 26, 2023
c257a4e
docs: refine vol2
jupyterjazz Jul 26, 2023
ccf17e1
chore: pull recent updates
jupyterjazz Jul 26, 2023
f3ca77c
docs: update api reference
jupyterjazz Jul 27, 2023
21e3ad2
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 27, 2023
e6ef9c4
docs: apply suggestions
jupyterjazz Jul 31, 2023
19045ec
docs: separate nested data section
jupyterjazz Jul 31, 2023
5736334
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 31, 2023
41c7307
docs: apply suggestions vol2
jupyterjazz Jul 31, 2023
a32a1e5
fix: nested data imports
jupyterjazz Jul 31, 2023
8a8aa33
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Aug 1, 2023
ef0b7ef
docs: apply johannes suggestions
jupyterjazz Aug 1, 2023
6818688
chore: merge conflicts
jupyterjazz Aug 1, 2023
9268161
docs: apply suggestions
jupyterjazz Aug 1, 2023
b402802
docs: app sgg
jupyterjazz Aug 1, 2023
docs: apply suggestions
Signed-off-by: jupyterjazz <[email protected]>
jupyterjazz committed Aug 1, 2023
commit 926816110db43fc177ffe15bfe0b4859fadec66f
12 changes: 6 additions & 6 deletions docs/user_guide/storing/docindex.md
@@ -79,30 +79,30 @@ docs = DocList[MyDoc](
```

### Initialize the Document Index and add data
Here we initialize an `InMemoryExactNNIndex` instance with the document schema defined previously, and add the created documents to this index.
Here we initialize an `InMemoryExactNNIndex` instance with the document schema we defined previously, and add the created documents to this index.
```python
Member

can we add 1-2 sentences of explanation to these code snippets? i think they are very self-explanatory, but personally as a user i don't like code snippets without words around them :)

doc_index = InMemoryExactNNIndex[MyDoc]()
doc_index.index(docs)
```

### Perform a vector similarity search
Now, let's perform a similarity search on the document embeddings using a query vector of ones.
As a result, we'll retrieve the top 10 most similar documents and their corresponding similarity scores.
Now, let's perform a similarity search on the document embeddings.
As a result, we'll retrieve the ten most similar documents and their corresponding similarity scores.
```python
query = np.ones(128)
retrieved_docs, scores = doc_index.find(query, search_field='embedding', limit=10)
```
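
To make the scoring step concrete, here is a minimal, library-free sketch of what an exact nearest-neighbor search does conceptually: it scores every stored vector against the query and keeps the best `limit`. It assumes cosine similarity as the metric, and the helper names `cosine_similarity` and `exact_nn` are hypothetical, not part of DocArray. The real index works on `DocList` objects and its metric may be configured differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_nn(query, embeddings, limit=10):
    """Score every stored vector against the query and keep the best `limit`."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(embeddings)]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:limit]
    return [idx for _, idx in top], [score for score, _ in top]


# Toy data: four 2-dimensional "embeddings" (illustrative values only).
embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, -1.0]]
indices, scores = exact_nn([1.0, 1.0], embeddings, limit=2)
print(indices, scores)
```

Because every stored vector is compared with the query, the result is exact, at the cost of work that grows linearly with the number of documents.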

### Filter documents
In this segment, we filter the indexed documents based on their price field, specifically retrieving documents with a price less than 5.
In this snippet, we filter the indexed documents based on their price field, specifically retrieving documents with a price less than 5:
```python
query = {'price': {'$lt': 5}}
filtered_docs = doc_index.filter(query, limit=10)
```
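
As a rough illustration of how a filter such as `{'price': {'$lt': 5}}` can be read, the sketch below evaluates the same dictionary shape against plain Python dicts. The helpers `matches` and `toy_filter` are hypothetical, and the operator names other than `$lt` are an assumption made by analogy with MongoDB-style query syntax. DocArray's own filtering is implemented differently.

```python
import operator

# Comparison operators keyed by their query name.
# Only `$lt` appears in the snippet above; the others are assumed by analogy.
OPERATORS = {
    '$lt': operator.lt,
    '$lte': operator.le,
    '$gt': operator.gt,
    '$gte': operator.ge,
}


def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for field, conditions in query.items():
        for op_name, threshold in conditions.items():
            if not OPERATORS[op_name](doc[field], threshold):
                return False
    return True


def toy_filter(docs, query, limit=10):
    """Keep at most `limit` documents that satisfy `query`, in original order."""
    return [doc for doc in docs if matches(doc, query)][:limit]


docs = [{'title': f'doc {i}', 'price': i} for i in range(10)]
cheap = toy_filter(docs, {'price': {'$lt': 5}})
print([doc['price'] for doc in cheap])
```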

### Combine different search methods
The final snippet combines the vector similarity search and filtering operations into a single query.
We first perform a similarity search on the document embeddings and then apply a filter to return only those documents with a price greater than or equal to 2.
We first perform a similarity search on the document embeddings and then apply a filter to return only those documents with a price greater than or equal to 2:
```python
query = (
    doc_index.build_query()  # get empty query object
@@ -114,7 +114,7 @@ retrieved_docs, scores = doc_index.execute_query(query)
```
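
The builder pattern used here can be pictured as recording a list of steps that is executed later in order. The following is a toy sketch under that assumption. `ToyQueryBuilder` and `execute_query` are hypothetical stand-ins, the score is a plain dot product, and real backends may plan and execute hybrid queries quite differently.

```python
class ToyQueryBuilder:
    """Hypothetical builder that only records steps; not DocArray's implementation."""

    def __init__(self):
        self._steps = []

    def find(self, query, limit=10):
        self._steps.append(('find', query, limit))
        return self  # returning self is what allows chaining

    def filter(self, predicate):
        self._steps.append(('filter', predicate))
        return self

    def build(self):
        return list(self._steps)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def execute_query(docs, steps):
    """Apply the recorded steps in order to a list of plain dict documents."""
    results = [(doc, None) for doc in docs]
    for step in steps:
        if step[0] == 'find':
            _, query, limit = step
            rescored = [(doc, dot(query, doc['embedding'])) for doc, _ in results]
            rescored.sort(key=lambda pair: pair[1], reverse=True)
            results = rescored[:limit]
        else:
            _, predicate = step
            results = [(doc, score) for doc, score in results if predicate(doc)]
    return [doc for doc, _ in results], [score for _, score in results]


docs = [{'price': p, 'embedding': [float(p), 1.0]} for p in range(5)]
query = (
    ToyQueryBuilder()
    .find([1.0, 0.0], limit=4)
    .filter(lambda doc: doc['price'] >= 2)
    .build()
)
retrieved, scores = execute_query(docs, query)
print([doc['price'] for doc in retrieved], scores)
```

Note that in this sketch the filter runs after the similarity search, so the result can contain fewer documents than the `limit` passed to `find`.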
Member

I think we should here again add a big fat link to all the backend documentation pages and tell people that they can get more detailed information there


## Learn more
The code snippets presented above just scratch the surface of what a Document Index can do.
The code snippets above just scratch the surface of what a Document Index can do.
To learn more and get the most out of `DocArray`, take a look at the detailed guides for the vector database backends you're interested in:

- [Weaviate](https://weaviate.io/) | [Docs](index_weaviate.md)
20 changes: 10 additions & 10 deletions docs/user_guide/storing/index_elastic.md
@@ -37,7 +37,7 @@ but will also work for [ElasticV7DocIndex][docarray.index.backends.elasticv7.Ela
## Basic usage
This snippet demonstrates the basic usage of [ElasticDocIndex][docarray.index.backends.elastic.ElasticDocIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [ElasticDocIndex][docarray.index.backends.elastic.ElasticDocIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.
and performs a vector similarity search to retrieve the ten most similar documents to a given query vector.

```python
from docarray import BaseDoc, DocList
@@ -238,7 +238,7 @@ You can use the `limit` argument to configure how many documents to return.
=== "Search by Document"

```python
# create a query Document
# create a query document
query = SimpleDoc(tensor=np.ones(128))

# find similar documents
@@ -266,7 +266,7 @@ You can use the `limit` argument to configure how many documents to return.
To perform a vector search, you need to specify a `search_field`. This is the field that serves as the
basis of comparison between your query and the documents in the Document Index.

In this particular example you only have one field (`tensor`) that is a vector, so you can trivially choose that one.
In this example you only have one field (`tensor`) that is a vector, so you can trivially choose that one.
In general, you could have multiple fields of type `NdArray` or `TorchTensor` or `TensorFlowTensor`, and you can choose
which one to use for the search.
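
To illustrate why the choice of `search_field` matters when a schema has more than one vector field, here is a small library-free sketch. The field names `image_vec` and `text_vec` and the helper `toy_find` are hypothetical, and a plain dot product stands in for whatever metric the backend uses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def toy_find(docs, query, search_field, limit=5):
    """Rank documents by the dot product between `query` and the chosen field."""
    ranked = sorted(docs, key=lambda doc: dot(query, doc[search_field]), reverse=True)
    return ranked[:limit]


# Two documents, each carrying two different vector fields (illustrative values).
docs = [
    {'id': 'a', 'image_vec': [1.0, 0.0], 'text_vec': [0.0, 1.0]},
    {'id': 'b', 'image_vec': [0.0, 1.0], 'text_vec': [1.0, 0.0]},
]

query = [1.0, 0.0]
by_image = toy_find(docs, query, search_field='image_vec', limit=1)
by_text = toy_find(docs, query, search_field='text_vec', limit=1)
print(by_image[0]['id'], by_text[0]['id'])
```

The same query vector returns a different best match depending on which field it is compared against.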

@@ -280,7 +280,7 @@ How these scores are calculated depends on the backend, and can usually be [conf

### Batched search

You can also search for multiple documents at once, in a batch, using the [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method.
You can also search for multiple documents at once, in a batch, using the [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method.

=== "Search by Documents"

@@ -290,7 +290,7 @@ You can also search for multiple documents at once, in a batch, using the [find_
SimpleDoc(tensor=np.random.rand(128)) for i in range(3)
)

# find similar Documents
# find similar documents
matches, scores = doc_index.find_batched(queries, search_field='tensor', limit=5)

print(f'{matches=}')
@@ -304,15 +304,15 @@ You can also search for multiple documents at once, in a batch, using the [find_
# create some query vectors
query = np.random.rand(3, 128)

# find similar Documents
# find similar documents
matches, scores = doc_index.find_batched(query, search_field='tensor', limit=5)

print(f'{matches=}')
print(f'{matches[0].text=}')
print(f'{scores=}')
```

The [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
The [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
a list of `DocList`s, one for each query, containing the closest matching documents and their similarity scores.
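
The shape of a batched result (one list of matches and one list of scores per query, wrapped in a named tuple) can be sketched without any library as follows. `ToyBatchedResult`, `find_one`, and `toy_find_batched` are hypothetical names, document indices stand in for `DocList`s, and a dot product stands in for the backend's metric.

```python
from collections import namedtuple

# Hypothetical result type; a named tuple can be unpacked like a plain tuple.
ToyBatchedResult = namedtuple('ToyBatchedResult', ['documents', 'scores'])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def find_one(query, embeddings, limit):
    """Return the indices and scores of the `limit` best matches for one query."""
    scored = sorted(
        ((dot(query, emb), idx) for idx, emb in enumerate(embeddings)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:limit]
    return [idx for _, idx in scored], [score for score, _ in scored]


def toy_find_batched(queries, embeddings, limit=5):
    """Run one search per query and collect the results in query order."""
    per_query = [find_one(query, embeddings, limit) for query in queries]
    return ToyBatchedResult(
        documents=[docs for docs, _ in per_query],
        scores=[scores for _, scores in per_query],
    )


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]

matches, scores = toy_find_batched(queries, embeddings, limit=2)
print(matches)  # one list of document indices per query
print(scores)   # one list of scores per query
```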


@@ -424,8 +424,8 @@ docs = doc_index.filter(query)
## Text search

In addition to vector similarity search, the Document Index interface offers methods for text search:
[text_search()][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [text_search_batched()][docarray.index.abstract.BaseDocIndex.text_search_batched].
[`text_search()`][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [`text_search_batched()`][docarray.index.abstract.BaseDocIndex.text_search_batched].

As in "pure" Elasticsearch, you can use text search directly on the field of type `str`:

@@ -453,7 +453,7 @@ docs, scores = doc_index.text_search(query, search_field='text')
Document Index supports atomic operations for vector similarity search, text search and filter search.

To combine these operations into a single, hybrid search query, you can use the query builder that is accessible
through [build_query()][docarray.index.abstract.BaseDocIndex.build_query]:
through [`build_query()`][docarray.index.abstract.BaseDocIndex.build_query]:

For example, you can build a hybrid search query that performs range filtering, vector search and text search:

22 changes: 11 additions & 11 deletions docs/user_guide/storing/index_hnswlib.md
@@ -26,7 +26,7 @@ It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib), and s
## Basic usage
This snippet demonstrates the basic usage of [HnswDocumentIndex][docarray.index.backends.hnswlib.HnswDocumentIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [HnswDocumentIndex][docarray.index.backends.hnswlib.HnswDocumentIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.
and performs a vector similarity search to retrieve the ten most similar documents to a given query vector.

```python
from docarray import BaseDoc, DocList
@@ -194,7 +194,7 @@ to find similar documents within the Document Index:
=== "Search by Document"

```python
# create a query Document
# create a query document
query = MyDoc(embedding=np.random.rand(128), text='query')

# find similar documents
@@ -222,7 +222,7 @@ to find similar documents within the Document Index:
To perform a vector search, you need to specify a `search_field`. This is the field that serves as the
basis of comparison between your query and the documents in the Document Index.

In this particular example you only have one field (`embedding`) that is a vector, so you can trivially choose that one.
In this example you only have one field (`embedding`) that is a vector, so you can trivially choose that one.
In general, you could have multiple fields of type `NdArray` or `TorchTensor` or `TensorFlowTensor`, and you can choose
which one to use for the search.

@@ -233,9 +233,9 @@ When searching on the subindex level, you can use the [`find_subindex()`][docarr

How these scores are calculated depends on the backend, and can usually be [configured](#configuration).

### Batched Search
### Batched search

You can also search for multiple documents at once, in a batch, using the [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method.
You can also search for multiple documents at once, in a batch, using the [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method.

=== "Search by Documents"

@@ -245,7 +245,7 @@ You can also search for multiple documents at once, in a batch, using the [find_
MyDoc(embedding=np.random.rand(128), text=f'query {i}') for i in range(3)
)

# find similar Documents
# find similar documents
matches, scores = db.find_batched(queries, search_field='embedding', limit=5)

print(f'{matches=}')
@@ -259,15 +259,15 @@ You can also search for multiple documents at once, in a batch, using the [find_
# create some query vectors
query = np.random.rand(3, 128)

# find similar Documents
# find similar documents
matches, scores = db.find_batched(query, search_field='embedding', limit=5)

print(f'{matches=}')
print(f'{matches[0].text=}')
print(f'{scores=}')
```

The [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
The [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
a list of `DocList`s, one for each query, containing the closest matching documents and their similarity scores.


@@ -309,16 +309,16 @@ for doc in cheap_books:
To see how to perform text search, you can check out other backends that offer support.

In addition to vector similarity search, the Document Index interface offers methods for text search:
[text_search()][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [text_search_batched()][docarray.index.abstract.BaseDocIndex.text_search_batched].
[`text_search()`][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [`text_search_batched()`][docarray.index.abstract.BaseDocIndex.text_search_batched].


## Hybrid search

Document Index supports atomic operations for vector similarity search, text search and filter search.

To combine these operations into a single, hybrid search query, you can use the query builder that is accessible
through [build_query()][docarray.index.abstract.BaseDocIndex.build_query]:
through [`build_query()`][docarray.index.abstract.BaseDocIndex.build_query]:

```python
# Define the document schema.
18 changes: 9 additions & 9 deletions docs/user_guide/storing/index_in_memory.md
@@ -23,7 +23,7 @@ utilizes DocArray's [`find()`][docarray.utils.find.find] and [`filter_docs()`][d
## Basic usage
This snippet demonstrates the basic usage of [InMemoryExactNNIndex][docarray.index.backends.in_memory.InMemoryExactNNIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [InMemoryExactNNIndex][docarray.index.backends.in_memory.InMemoryExactNNIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.
and performs a vector similarity search to retrieve the ten most similar documents to a given query vector.

```python
from docarray import BaseDoc, DocList
@@ -191,7 +191,7 @@ to find similar documents within the Document Index:
=== "Search by Document"

```python
# create a query Document
# create a query document
query = MyDoc(embedding=np.random.rand(128), text='query')

# find similar documents
@@ -219,7 +219,7 @@ to find similar documents within the Document Index:
To perform a vector search, you need to specify a `search_field`. This is the field that serves as the
basis of comparison between your query and the documents in the Document Index.

In this particular example you only have one field (`embedding`) that is a vector, so you can trivially choose that one.
In this example you only have one field (`embedding`) that is a vector, so you can trivially choose that one.
In general, you could have multiple fields of type `NdArray` or `TorchTensor` or `TensorFlowTensor`, and you can choose
which one to use for the search.

@@ -230,9 +230,9 @@ When searching on the subindex level, you can use the [`find_subindex()`][docarr

How these scores are calculated depends on the backend, and can usually be [configured](#configuration).

### Batched Search
### Batched search

You can also search for multiple documents at once, in a batch, using the [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method.
You can also search for multiple documents at once, in a batch, using the [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method.

=== "Search by documents"

@@ -264,7 +264,7 @@ You can also search for multiple documents at once, in a batch, using the [find_
print(f'{scores=}')
```

The [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
The [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
a list of `DocList`s, one for each query, containing the closest matching documents and their similarity scores.


@@ -306,8 +306,8 @@ for doc in cheap_books:
To see how to perform text search, you can check out other backends that offer support.

In addition to vector similarity search, the Document Index interface offers methods for text search:
[text_search()][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [text_search_batched()][docarray.index.abstract.BaseDocIndex.text_search_batched].
[`text_search()`][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [`text_search_batched()`][docarray.index.abstract.BaseDocIndex.text_search_batched].



@@ -316,7 +316,7 @@ as well as the batched version [text_search_batched()][docarray.index.abstract.B
Document Index supports atomic operations for vector similarity search, text search and filter search.

To combine these operations into a single, hybrid search query, you can use the query builder that is accessible
through [build_query()][docarray.index.abstract.BaseDocIndex.build_query]:
through [`build_query()`][docarray.index.abstract.BaseDocIndex.build_query]:

```python
# Define the document schema.
22 changes: 11 additions & 11 deletions docs/user_guide/storing/index_milvus.md
@@ -14,7 +14,7 @@ focusing on special features and configurations of Milvus.
## Basic usage
This snippet demonstrates the basic usage of [MilvusDocumentIndex][docarray.index.backends.milvus.MilvusDocumentIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [MilvusDocumentIndex][docarray.index.backends.milvus.MilvusDocumentIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.
and performs a vector similarity search to retrieve the ten most similar documents to a given query vector.

!!! note "Single Search Field Requirement"
In order to utilize vector search, it's necessary to define 'is_embedding' for one field only.
@@ -187,10 +187,10 @@ the [`find()`][docarray.index.abstract.BaseDocIndex.find] method:
=== "Search by Document"

```python
# create a query Document
# create a query document
query = MyDoc(embedding=np.random.rand(128), title='query')

# find similar Documents
# find similar documents
matches, scores = doc_index.find(query, limit=5)

print(f'{matches=}')
@@ -204,7 +204,7 @@ the [`find()`][docarray.index.abstract.BaseDocIndex.find] method:
# create a query vector
query = np.random.rand(128)

# find similar Documents
# find similar documents
matches, scores = doc_index.find(query, limit=5)

print(f'{matches=}')
@@ -215,13 +215,13 @@ the [`find()`][docarray.index.abstract.BaseDocIndex.find] method:
The [`find()`][docarray.index.abstract.BaseDocIndex.find] method returns a named tuple containing the closest
matching documents and their associated similarity scores.

When searching on the subindex level, you can use the [`find_subindex()]`[docarray.index.abstract.BaseDocIndex.find_subindex] method, which returns a named tuple containing the subindex documents, similarity scores and their associated root documents.
When searching on the subindex level, you can use the [`find_subindex()`][docarray.index.abstract.BaseDocIndex.find_subindex] method, which returns a named tuple containing the subindex documents, similarity scores and their associated root documents.

How these scores are calculated depends on the backend, and can usually be [configured](#configuration).

### Batched Search
### Batched search

You can also search for multiple documents at once, in a batch, using the [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method.
You can also search for multiple documents at once, in a batch, using the [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method.

=== "Search by documents"

@@ -253,7 +253,7 @@ You can also search for multiple documents at once, in a batch, using the [find_
print(f'{scores=}')
```

The [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
The [`find_batched()`][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
a list of `DocList`s, one for each query, containing the closest matching documents and their similarity scores.


@@ -294,8 +294,8 @@ for doc in cheap_books:
To see how to perform text search, you can check out other backends that offer support.

In addition to vector similarity search, the Document Index interface offers methods for text search:
[text_search()][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [text_search_batched()][docarray.index.abstract.BaseDocIndex.text_search_batched].
[`text_search()`][docarray.index.abstract.BaseDocIndex.text_search],
as well as the batched version [`text_search_batched()`][docarray.index.abstract.BaseDocIndex.text_search_batched].



@@ -304,7 +304,7 @@ as well as the batched version [text_search_batched()][docarray.index.abstract.B
Document Index supports atomic operations for vector similarity search, text search and filter search.

To combine these operations into a single, hybrid search query, you can use the query builder that is accessible
through [build_query()][docarray.index.abstract.BaseDocIndex.build_query]:
through [`build_query()`][docarray.index.abstract.BaseDocIndex.build_query]:

```python
# Define the document schema.