| 1 | +(find-documentarray)= |
| 2 | +# Query by Conditions |
| 3 | + |
| 4 | +We can use {meth}`~docarray.array.mixins.find.FindMixin.find` to select Documents from a DocumentArray based on the conditions specified in a `query` object. One can use `da.find(query)` either to filter Documents or to get nearest neighbours from `da`: |
| 5 | + |
| 6 | +- To filter Documents, the `query` object is a Python dictionary object that defines the filtering conditions using a [MongoDB](https://docs.mongodb.com/manual/reference/operator/query/)-like query language. |
| 7 | +- To find nearest neighbours, the `query` object needs to be an NdArray-like object, a Document, or a DocumentArray that defines embeddings. One can also use the `.match()` function for this purpose; there is a minor interface difference between the two functions, which is described {ref}`in the next chapter<match-documentarray>`. |
| 8 | + |
| 9 | +Let's see some examples in action. First, let's prepare a DocumentArray we will use. |
| 10 | + |
| 11 | +```python |
| 12 | +from jina import Document, DocumentArray |
| 13 | + |
| 14 | +da = DocumentArray([Document(text='journal', weight=25, tags={'h': 14, 'w': 21, 'uom': 'cm'}, modality='A'), |
| 15 | + Document(text='notebook', weight=50, tags={'h': 8.5, 'w': 11, 'uom': 'in'}, modality='A'), |
| 16 | + Document(text='paper', weight=100, tags={'h': 8.5, 'w': 11, 'uom': 'in'}, modality='D'), |
| 17 | + Document(text='planner', weight=75, tags={'h': 22.85, 'w': 30, 'uom': 'cm'}, modality='D'), |
| 18 | + Document(text='postcard', weight=45, tags={'h': 10, 'w': 15.25, 'uom': 'cm'}, modality='A')]) |
| 19 | + |
| 20 | +da.summary() |
| 21 | +``` |
| 22 | + |
| 23 | +```text |
| 24 | + Documents Summary |
| 25 | + |
| 26 | + Length 5 |
| 27 | + Homogenous Documents True |
| 28 | + Common Attributes ('id', 'text', 'tags', 'weight', 'modality') |
| 29 | + |
| 30 | + Attributes Summary |
| 31 | + |
| 32 | + Attribute Data type #Unique values Has empty value |
| 33 | + ────────────────────────────────────────────────────────── |
| 34 | + id ('str',) 5 False |
| 35 | + weight ('int',) 5 False |
| 36 | + modality ('str',) 2 False |
| 37 | + tags ('dict',) 5 False |
| 38 | + text ('str',) 5 False |
| 39 | +``` |
| 40 | + |
| 41 | +## Filter with query operators |
| 42 | + |
| 43 | +A query filter document can use the query operators to specify conditions in the following form: |
| 44 | + |
| 45 | +```text |
| 46 | +{ <field1>: { <operator1>: <value1> }, ... } |
| 47 | +``` |
| 48 | + |
| 49 | +Here `field1` is {ref}`any field name<doc-fields>` of a Document object. To access nested fields, use the dunder (double-underscore) expression. For example, `tags__timestamp` accesses `doc.tags['timestamp']`. |
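
As a rough illustration of how a dunder path maps onto nested access, here is a minimal sketch that works on a plain dictionary rather than a real Document. The helper name `resolve_field` is hypothetical and is not part of the library; it only mirrors the idea that each `__` descends one level.

```python
from functools import reduce


def resolve_field(doc: dict, path: str):
    """Follow a dunder-separated path (e.g. 'tags__h') through nested dicts."""
    return reduce(lambda obj, key: obj[key], path.split('__'), doc)


# A plain dict standing in for a Document (illustration only).
doc = {'text': 'journal', 'tags': {'h': 14, 'w': 21, 'uom': 'cm'}}

print(resolve_field(doc, 'text'))     # journal
print(resolve_field(doc, 'tags__h'))  # 14
```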
| 50 | + |
| 51 | +`value1` can be either a user-given Python object or a substitution field written in curly brackets, such as `{field}`. |
| 52 | + |
| 53 | +Finally, `operator1` can be one of the following: |
| 54 | + |
| 55 | +| Query Operator | Description | |
| 56 | +|----------------|------------------------------------------------------------------------------------------------------------| |
| 57 | +| `$eq` | Equal to (number, string) | |
| 58 | +| `$ne` | Not equal to (number, string) | |
| 59 | +| `$gt` | Greater than (number) | |
| 60 | +| `$gte` | Greater than or equal to (number) | |
| 61 | +| `$lt` | Less than (number) | |
| 62 | +| `$lte` | Less than or equal to (number) | |
| 63 | +| `$in` | Is in an array | |
| 64 | +| `$nin` | Not in an array | |
| 65 | +| `$regex` | Match the specified regular expression | |
| 66 | +| `$size`        | Match array/dict fields that have the specified size. `$size` does not accept ranges of values.            | |
| 67 | +| `$exists`      | Match Documents that have the specified field. A field holding an empty string is treated as not existing.  | |
| 68 | + |
| 69 | + |
| 70 | +For example, to select all Documents with `modality='D'`: |
| 71 | + |
| 72 | +```python |
| 73 | +from pprint import pprint  # only needed for pretty-printing |
| 74 | +r = da.find({'modality': {'$eq': 'D'}}) |
| 75 | +pprint(r.to_dict(exclude_none=True)) |
| 76 | +``` |
| 77 | + |
| 78 | +```text |
| 79 | +[{'id': '92aee5d665d0c4dd34db10d83642aded', |
| 80 | + 'modality': 'D', |
| 81 | + 'tags': {'h': 8.5, 'uom': 'in', 'w': 11.0}, |
| 82 | + 'text': 'paper', |
| 83 | + 'weight': 100.0}, |
| 84 | + {'id': '1a9d2139b02bc1c7842ecda94b347889', |
| 85 | + 'modality': 'D', |
| 86 | + 'tags': {'h': 22.85, 'uom': 'cm', 'w': 30.0}, |
| 87 | + 'text': 'planner', |
| 88 | + 'weight': 75.0}] |
| 89 | +``` |
| 90 | + |
| 91 | +To select all Documents whose `.tags['h'] > 10`: |
| 92 | + |
| 93 | +```python |
| 94 | +r = da.find({'tags__h': {'$gt': 10}}) |
| 95 | +``` |
| 96 | + |
| 97 | +```text |
| 98 | +[{'id': '4045a9659875fd1299e482d710753de3', |
| 99 | + 'modality': 'A', |
| 100 | + 'tags': {'h': 14.0, 'uom': 'cm', 'w': 21.0}, |
| 101 | + 'text': 'journal', |
| 102 | + 'weight': 25.0}, |
| 103 | + {'id': 'cf7691c445220b94b88ff116911bad24', |
| 104 | + 'modality': 'D', |
| 105 | + 'tags': {'h': 22.85, 'uom': 'cm', 'w': 30.0}, |
| 106 | + 'text': 'planner', |
| 107 | + 'weight': 75.0}] |
| 108 | +``` |
| 109 | + |
| 110 | +Besides using a predefined value, one can also use a substitution with `{field}`; note the curly brackets. For example, to compare each Document's `tags['h']` against its own `tags['w']`: |
| 111 | + |
| 112 | +```python |
| 113 | +r = da.find({'tags__h': {'$gt': '{tags__w}'}}) |
| 114 | +``` |
| 115 | + |
| 116 | +No Document in the sample data above has `tags['h']` greater than `tags['w']`, so the result is empty: |
| 117 | + |
| 118 | +```text |
| 119 | +[] |
| 122 | +``` |
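
To make the substitution semantics concrete, the following sketch evaluates a comparison query over plain dictionaries. It is an illustration under stated assumptions, not the library's implementation: `OPS`, `get_path`, and `matches` are hypothetical names, only the comparison operators are covered, and the `banner` record is invented so that at least one record satisfies the condition.

```python
import operator

# Comparison operators from the table above, mapped to Python functions.
OPS = {
    '$eq': operator.eq, '$ne': operator.ne,
    '$gt': operator.gt, '$gte': operator.ge,
    '$lt': operator.lt, '$lte': operator.le,
}


def get_path(doc, path):
    """Resolve a dunder path such as 'tags__w' on a nested dict."""
    obj = doc
    for key in path.split('__'):
        obj = obj[key]
    return obj


def matches(doc, query):
    """Return True if every {field: {operator: value}} condition holds."""
    for field, condition in query.items():
        for op, value in condition.items():
            # A string like '{tags__w}' is replaced by that field's value
            # taken from the same record.
            if isinstance(value, str) and value.startswith('{') and value.endswith('}'):
                value = get_path(doc, value[1:-1])
            if not OPS[op](get_path(doc, field), value):
                return False
    return True


docs = [
    {'text': 'journal', 'tags': {'h': 14, 'w': 21}},
    {'text': 'banner', 'tags': {'h': 30, 'w': 12}},  # hypothetical record
]
query = {'tags__h': {'$gt': '{tags__w}'}}
tall = [d['text'] for d in docs if matches(d, query)]
print(tall)  # ['banner']
```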
| 123 | + |
| 124 | + |
| 125 | + |
| 126 | +## Combine multiple conditions |
| 127 | + |
| 128 | + |
| 129 | +You can combine multiple conditions using the following operators: |
| 130 | + |
| 131 | +| Boolean Operator | Description | |
| 132 | +|------------------|----------------------------------------------------| |
| 133 | +| `$and` | Join query clauses with a logical AND | |
| 134 | +| `$or` | Join query clauses with a logical OR | |
| 135 | +| `$not` | Inverts the effect of a query expression | |
| 136 | + |
| 137 | +For example, to select Documents whose `weight` equals 45 or whose `modality` is `'D'`: |
| 138 | + |
| 139 | +```python |
| 140 | +r = da.find({'$or': [{'weight': {'$eq': 45}}, {'modality': {'$eq': 'D'}}]}) |
| 141 | +``` |
| 142 | + |
| 143 | +```text |
| 144 | +[{'id': '22985b71b6d483c31cbe507ed4d02bd1', |
| 145 | + 'modality': 'D', |
| 146 | + 'tags': {'h': 8.5, 'uom': 'in', 'w': 11.0}, |
| 147 | + 'text': 'paper', |
| 148 | + 'weight': 100.0}, |
| 149 | + {'id': 'a071faf19feac5809642e3afcd3a5878', |
| 150 | + 'modality': 'D', |
| 151 | + 'tags': {'h': 22.85, 'uom': 'cm', 'w': 30.0}, |
| 152 | + 'text': 'planner', |
| 153 | + 'weight': 75.0}, |
| 154 | + {'id': '411ecc70a71a3f00fc3259bf08c239d1', |
| 155 | + 'modality': 'A', |
| 156 | + 'tags': {'h': 10.0, 'uom': 'cm', 'w': 15.25}, |
| 157 | + 'text': 'postcard', |
| 158 | + 'weight': 45.0}] |
| 159 | +``` |
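
For readers who prefer to think in plain Python, the boolean operators correspond to `or`, `and`, and `not` applied per record. The sketch below restates the sample data as ordinary dictionaries (an assumption made so the snippet runs without the library) and shows the equivalent list comprehensions. It does not show the query-dict syntax for `$and` or `$not`; it only illustrates the logic they express.

```python
# The five sample records, restated as plain dicts for illustration.
records = [
    {'text': 'journal', 'weight': 25, 'modality': 'A', 'tags': {'h': 14}},
    {'text': 'notebook', 'weight': 50, 'modality': 'A', 'tags': {'h': 8.5}},
    {'text': 'paper', 'weight': 100, 'modality': 'D', 'tags': {'h': 8.5}},
    {'text': 'planner', 'weight': 75, 'modality': 'D', 'tags': {'h': 22.85}},
    {'text': 'postcard', 'weight': 45, 'modality': 'A', 'tags': {'h': 10}},
]

# Logical OR: weight == 45 or modality == 'D' (same condition as the query above).
either = [r['text'] for r in records if r['weight'] == 45 or r['modality'] == 'D']

# Logical AND: modality == 'A' and tags['h'] > 10.
both = [r['text'] for r in records if r['modality'] == 'A' and r['tags']['h'] > 10]

# Logical NOT: everything that is not modality 'D'.
negated = [r['text'] for r in records if not r['modality'] == 'D']

print(either)   # ['paper', 'planner', 'postcard']
print(both)     # ['journal']
print(negated)  # ['journal', 'notebook', 'postcard']
```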