Merged
Changes from 1 commit
35 commits
caa00fd
chore: first pr
jupyterjazz Jun 28, 2023
b45e3a6
docs: modify hnsw
jupyterjazz Jul 6, 2023
cad4e60
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 6, 2023
11bda62
docs: rough versions of inmemory and hnsw
jupyterjazz Jul 6, 2023
96319ca
chore: update branch
jupyterjazz Jul 6, 2023
f5825f8
docs: weaviate v1
jupyterjazz Jul 6, 2023
8aaedbe
docs: elastic v1
jupyterjazz Jul 17, 2023
4a3e25c
docs: introduction page
jupyterjazz Jul 17, 2023
db77beb
docs: redis v1
jupyterjazz Jul 17, 2023
82afb99
docs: qdrant v1
jupyterjazz Jul 17, 2023
befc786
docs: validate intro inmemory and hnsw examples
jupyterjazz Jul 17, 2023
9bdb0dc
docs: validate elastic and qdrant examples
jupyterjazz Jul 17, 2023
64f83bf
docs: validate code examples for redis and weaviate
jupyterjazz Jul 18, 2023
759900c
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 19, 2023
60cd4d4
chore: merge recent updates
jupyterjazz Jul 19, 2023
ca25feb
docs: milvus v1
jupyterjazz Jul 19, 2023
7fef5d8
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 24, 2023
fe572da
docs: validate milvus code
jupyterjazz Jul 24, 2023
10bc14b
docs: make redis and milvus visible
jupyterjazz Jul 24, 2023
6199a2a
docs: refine vol1
jupyterjazz Jul 26, 2023
fa8f919
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 26, 2023
c257a4e
docs: refine vol2
jupyterjazz Jul 26, 2023
ccf17e1
chore: pull recent updates
jupyterjazz Jul 26, 2023
f3ca77c
docs: update api reference
jupyterjazz Jul 27, 2023
21e3ad2
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 27, 2023
e6ef9c4
docs: apply suggestions
jupyterjazz Jul 31, 2023
19045ec
docs: separate nested data section
jupyterjazz Jul 31, 2023
5736334
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Jul 31, 2023
41c7307
docs: apply suggestions vol2
jupyterjazz Jul 31, 2023
a32a1e5
fix: nested data imports
jupyterjazz Jul 31, 2023
8a8aa33
Merge branch 'main' into docs-self-contained-indices
jupyterjazz Aug 1, 2023
ef0b7ef
docs: apply johannes suggestions
jupyterjazz Aug 1, 2023
6818688
chore: merge conflicts
jupyterjazz Aug 1, 2023
9268161
docs: apply suggestions
jupyterjazz Aug 1, 2023
b402802
docs: app sgg
jupyterjazz Aug 1, 2023
docs: apply johannes suggestions
Signed-off-by: jupyterjazz <[email protected]>
jupyterjazz committed Aug 1, 2023
commit ef0b7ef869cf332b22f27f18f5da51e837ebedfb
27 changes: 21 additions & 6 deletions docs/user_guide/storing/docindex.md
@@ -58,49 +58,52 @@ This doesn't require a database server - rather, it saves your data locally.
For a deeper understanding, please look into its [documentation](index_in_memory.md).

### Define document schema and create data
The following code snippet defines a document schema using the `BaseDoc` class. Each document consists of a title (a string),
a price (an integer), and an embedding (a 128-dimensional array). It also creates a list of ten documents with dummy titles,
prices ranging from 0 to 9, and randomly generated embeddings.
```python
from docarray import BaseDoc, DocList
from docarray.index import InMemoryExactNNIndex
from docarray.typing import NdArray
import numpy as np


# Define the document schema.
class MyDoc(BaseDoc):
    title: str
    price: int
    embedding: NdArray[128]


# Create documents (using dummy titles and random vectors)
docs = DocList[MyDoc](
    MyDoc(title=f'title #{i}', price=i, embedding=np.random.rand(128))
    for i in range(10)
)
```

### Initialize the Document Index and add data
Here we initialize an `InMemoryExactNNIndex` instance with the document schema defined previously, and add the created documents to this index.
Member comment: can we add 1-2 sentences of explanation to these code snippets? i think they are very self-explanatory, but personally as a user i don't like code snippets without words around them :)

```python
# Initialize a new InMemoryExactNNIndex instance and add the documents to the index.
doc_index = InMemoryExactNNIndex[MyDoc]()
doc_index.index(docs)
```

### Perform a vector similarity search
Now, let's perform a similarity search on the document embeddings using a query vector of ones.
As a result, we'll retrieve the top 10 most similar documents and their corresponding similarity scores.
```python
# Perform a vector search.
query = np.ones(128)
retrieved_docs, scores = doc_index.find(query, search_field='embedding', limit=10)
```

### Filter documents
In this segment, we filter the indexed documents based on their price field, specifically retrieving documents with a price less than 5.
```python
# Perform filtering (price < 5)
query = {'price': {'$lt': 5}}
filtered_docs = doc_index.filter(query, limit=10)
```

### Combine different search methods
The final snippet combines the vector similarity search and filtering operations into a single query.
We first perform a similarity search on the document embeddings and then apply a filter to return only those documents with a price greater than or equal to 2.
```python
# Perform a hybrid search - combining vector search with filtering
query = (
    doc_index.build_query()  # get empty query object
    .find(query=np.ones(128), search_field='embedding')  # add vector similarity search
    .filter(filter_query={'price': {'$gte': 2}})  # add filter for price >= 2
    .build()  # build the query
)
retrieved_docs, scores = doc_index.execute_query(query)
```
Member comment: I think we should here again add a big fat link to all the backend documentation pages and tell people that they can get more detailed information there


## Learn more
The code snippets presented above just scratch the surface of what a Document Index can do.
To learn more and get the most out of `DocArray`, take a look at the detailed guides for the vector database backends you're interested in:

- [Weaviate](https://weaviate.io/) | [Docs](index_weaviate.md)
- [Qdrant](https://qdrant.tech/) | [Docs](index_qdrant.md)
- [Elasticsearch](https://www.elastic.co/elasticsearch/) v7 and v8 | [Docs](index_elastic.md)
- [Redis](https://redis.com/) | [Docs](index_redis.md)
- [Milvus](https://milvus.io/) | [Docs](index_milvus.md)
- [HNSWlib](https://github.com/nmslib/hnswlib) | [Docs](index_hnswlib.md)
- InMemoryExactNNIndex | [Docs](index_in_memory.md)
117 changes: 107 additions & 10 deletions docs/user_guide/storing/index_elastic.md
@@ -35,6 +35,9 @@ but will also work for [ElasticV7DocIndex][docarray.index.backends.elasticv7.ElasticV7DocIndex]


## Basic usage
This snippet demonstrates the basic usage of [ElasticDocIndex][docarray.index.backends.elastic.ElasticDocIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [ElasticDocIndex][docarray.index.backends.elastic.ElasticDocIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.

```python
from docarray import BaseDoc, DocList
@@ -186,23 +189,44 @@ db.index(data)

## Index

Now that you have a Document Index, you can add data to it using the [`index()`][docarray.index.abstract.BaseDocIndex.index] method.
The `num_docs()` method returns the total number of documents in the index.

```python
from docarray import DocList

# create some random data
docs = DocList[SimpleDoc]([SimpleDoc(tensor=np.ones(128)) for _ in range(64)])

doc_index.index(docs)

print(f'number of docs in the index: {doc_index.num_docs()}')
```

As you can see, `DocList[SimpleDoc]` and `ElasticDocIndex[SimpleDoc]` both have `SimpleDoc` as a parameter.
This means that they share the same schema, and in general, both the Document Index and the data that you want to store need to have compatible schemas.

!!! question "When are two schemas compatible?"
    The schemas of your Document Index and data need to be compatible with each other.

    Let's say A is the schema of your Document Index and B is the schema of your data.
    There are a few rules that determine if schema A is compatible with schema B.
    If _any_ of the following are true, then A and B are compatible:

    - A and B are the same class
    - A and B have the same field names and field types
    - A and B have the same field names, and, for every field, the type of B is a subclass of the type of A

    In particular, this means that you can easily [index predefined documents](#using-a-predefined-document-as-schema) into a Document Index.



## Vector search

Now that you have indexed your data, you can perform vector similarity search using the [`find()`][docarray.index.abstract.BaseDocIndex.find] method.

You can use the `limit` argument to configure how many documents to return.

@@ -211,14 +235,87 @@
This can lead to poor performance when the search involves many vectors.
[ElasticDocIndex][docarray.index.backends.elastic.ElasticDocIndex] does not have this limitation.

=== "Search by Document"

    ```python
    # create a query document
    query = SimpleDoc(tensor=np.ones(128))

    # find similar documents
    matches, scores = doc_index.find(query, search_field='tensor', limit=5)

    print(f'{matches=}')
    print(f'{scores=}')
    ```

=== "Search by raw vector"

    ```python
    # create a query vector
    query = np.random.rand(128)

    # find similar documents
    matches, scores = doc_index.find(query, search_field='tensor', limit=5)

    print(f'{matches=}')
    print(f'{scores=}')
    ```

To perform a vector search, you need to specify a `search_field`. This is the field that serves as the
basis of comparison between your query and the documents in the Document Index.

In this particular example you only have one vector field (`tensor`), so the choice is trivial.
In general, you could have multiple fields of type `NdArray`, `TorchTensor`, or `TensorFlowTensor`, and you can
choose which one to use for the search.

The [`find()`][docarray.index.abstract.BaseDocIndex.find] method returns a named tuple containing the closest
matching documents and their associated similarity scores.

When searching on the subindex level, you can use the [`find_subindex()`][docarray.index.abstract.BaseDocIndex.find_subindex] method, which returns a named tuple containing the subindex documents, similarity scores and their associated root documents.

How these scores are calculated depends on the backend, and can usually be [configured](#configuration).


### Batched search

You can also search for multiple documents at once, in a batch, using the [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method.

=== "Search by Documents"

    ```python
    # create some query documents
    queries = DocList[SimpleDoc](
        SimpleDoc(tensor=np.random.rand(128)) for i in range(3)
    )

    # find similar documents
    matches, scores = doc_index.find_batched(queries, search_field='tensor', limit=5)

    print(f'{matches=}')
    print(f'{scores=}')
    ```

=== "Search by raw vectors"

    ```python
    # create some query vectors
    query = np.random.rand(3, 128)

    # find similar documents
    matches, scores = doc_index.find_batched(query, search_field='tensor', limit=5)

    print(f'{matches=}')
    print(f'{scores=}')
    ```

The [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
a list of `DocList`s, one for each query, containing the closest matching documents and their similarity scores.



## Filter

3 changes: 3 additions & 0 deletions docs/user_guide/storing/index_hnswlib.md
@@ -24,6 +24,9 @@ It stores vectors on disk in [hnswlib](https://github.com/nmslib/hnswlib), and s
- [MilvusDocumentIndex][docarray.index.backends.milvus.MilvusDocumentIndex]

## Basic usage
This snippet demonstrates the basic usage of [HnswDocumentIndex][docarray.index.backends.hnswlib.HnswDocumentIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [HnswDocumentIndex][docarray.index.backends.hnswlib.HnswDocumentIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.

```python
from docarray import BaseDoc, DocList
3 changes: 3 additions & 0 deletions docs/user_guide/storing/index_in_memory.md
@@ -21,6 +21,9 @@ utilizes DocArray's [`find()`][docarray.utils.find.find] and [`filter_docs()`][d


## Basic usage
This snippet demonstrates the basic usage of [InMemoryExactNNIndex][docarray.index.backends.in_memory.InMemoryExactNNIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [InMemoryExactNNIndex][docarray.index.backends.in_memory.InMemoryExactNNIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.

```python
from docarray import BaseDoc, DocList
39 changes: 39 additions & 0 deletions docs/user_guide/storing/index_milvus.md
@@ -12,6 +12,10 @@ focusing on special features and configurations of Milvus.


## Basic usage
This snippet demonstrates the basic usage of [MilvusDocumentIndex][docarray.index.backends.milvus.MilvusDocumentIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [MilvusDocumentIndex][docarray.index.backends.milvus.MilvusDocumentIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.

!!! note "Single search field requirement"
    To use vector search, you must set `is_embedding=True` on exactly one field.
    This is a constraint of Milvus, which permits a single vector per data object.
@@ -215,8 +219,43 @@ When searching on the subindex level, you can use the [`find_subindex()`][docarray.index.abstract.BaseDocIndex.find_subindex] method

How these scores are calculated depends on the backend, and can usually be [configured](#configuration).

### Batched Search

You can also search for multiple documents at once, in a batch, using the [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method.

=== "Search by documents"

    ```python
    # create some query documents
    queries = DocList[MyDoc](
        MyDoc(embedding=np.random.rand(128), text=f'query {i}') for i in range(3)
    )

    # find similar documents
    matches, scores = doc_index.find_batched(queries, limit=5)

    print(f'{matches=}')
    print(f'{matches[0].text=}')
    print(f'{scores=}')
    ```

=== "Search by raw vectors"

    ```python
    # create some query vectors
    query = np.random.rand(3, 128)

    # find similar documents
    matches, scores = doc_index.find_batched(query, limit=5)

    print(f'{matches=}')
    print(f'{matches[0].text=}')
    print(f'{scores=}')
    ```

The [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
a list of `DocList`s, one for each query, containing the closest matching documents and their similarity scores.


## Filter

40 changes: 40 additions & 0 deletions docs/user_guide/storing/index_qdrant.md
@@ -12,6 +12,10 @@ based on the [Qdrant](https://qdrant.tech/) vector search engine.


## Basic usage
This snippet demonstrates the basic usage of [QdrantDocumentIndex][docarray.index.backends.qdrant.QdrantDocumentIndex]. It defines a document schema with a title and an embedding,
creates ten dummy documents with random embeddings, initializes an instance of [QdrantDocumentIndex][docarray.index.backends.qdrant.QdrantDocumentIndex] to index these documents,
and performs a vector similarity search to retrieve the top 10 most similar documents to a given query vector.

```python
from docarray import BaseDoc, DocList
from docarray.index import QdrantDocumentIndex
@@ -253,8 +257,44 @@ When searching on the subindex level, you can use the [`find_subindex()`][docarray.index.abstract.BaseDocIndex.find_subindex] method

How these scores are calculated depends on the backend, and can usually be [configured](#configuration).

### Batched Search

You can also search for multiple documents at once, in a batch, using the [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method.

=== "Search by documents"

    ```python
    # create some query documents
    queries = DocList[MyDoc](
        MyDoc(embedding=np.random.rand(128), text=f'query {i}') for i in range(3)
    )

    # find similar documents
    matches, scores = doc_index.find_batched(queries, search_field='embedding', limit=5)

    print(f'{matches=}')
    print(f'{matches[0].text=}')
    print(f'{scores=}')
    ```

=== "Search by raw vectors"

    ```python
    # create some query vectors
    query = np.random.rand(3, 128)

    # find similar documents
    matches, scores = doc_index.find_batched(query, search_field='embedding', limit=5)

    print(f'{matches=}')
    print(f'{matches[0].text=}')
    print(f'{scores=}')
    ```

The [find_batched()][docarray.index.abstract.BaseDocIndex.find_batched] method returns a named tuple containing
a list of `DocList`s, one for each query, containing the closest matching documents and their similarity scores.


## Filter

You can filter your documents by using the `filter()` or `filter_batched()` method with a corresponding filter query.