Commit 397d349

chore: improve docs on arbitrary json
1 parent a57d977 commit 397d349

3 files changed, +20 -3 lines changed

docs/fundamentals/document/construct.md

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ str(uuid.UUID(d.id))
 
 Though possible, it is not recommended to modify the `.id` of a Document frequently, as this will lead to unexpected behavior.
 
-
+(construct-from-dict)=
 ## Construct with attributes
 
 This is the most common usage of the constructor: initializing a Document object with given attributes.
@@ -135,7 +135,7 @@ When using in Jupyter notebook/Google Colab, Document is automatically prettifie
 ```{figure} images/doc-in-jupyter.png
 ```
 
-
+(unk-attribute)=
 ### Unknown attributes handling
 
 If you give an unknown attribute (i.e. not one of the built-in Document attributes), it will be automatically "caught" into the `.tags` attribute. For example,

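The unknown-attribute rule in `construct.md` (anything that is not a built-in attribute ends up in `.tags`) can be illustrated with a toy model in plain Python. This is a sketch under assumptions, not DocArray's implementation: `ToyDocument` and `KNOWN_FIELDS` are hypothetical names, and the real Document has many more built-in attributes than the four assumed here.

```python
from typing import Any, Dict, Optional

# Hypothetical subset of built-in attribute names; the real Document has more.
KNOWN_FIELDS = {"id", "text", "uri", "tags"}


class ToyDocument:
    """Toy model: known keyword arguments become attributes,
    unknown ones are "caught" into the .tags dict."""

    def __init__(self, **kwargs: Any) -> None:
        self.id: Optional[str] = None
        self.text: Optional[str] = None
        self.uri: Optional[str] = None
        # Start from any explicitly passed tags, then add unknown keys to it.
        self.tags: Dict[str, Any] = dict(kwargs.pop("tags", None) or {})
        for key, value in kwargs.items():
            if key in KNOWN_FIELDS:
                setattr(self, key, value)  # built-in attribute
            else:
                self.tags[key] = value  # unknown attribute goes to .tags


d = ToyDocument(text="hello", hello="world", score=0.9)
print(d.text)  # hello
print(d.tags)  # {'hello': 'world', 'score': 0.9}
```

Merging into explicitly passed `tags` rather than overwriting them is a design choice of this sketch; it keeps user-supplied tags and caught attributes side by side.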
docs/fundamentals/document/serialization.md

Lines changed: 16 additions & 1 deletion
@@ -45,7 +45,6 @@ print(d_as_json, d)
 
 By default, it uses {ref}`JSON Schema and pydantic model<schema-gen>` for serialization, i.e. `protocol='jsonschema'`. You can switch the method to `protocol='protobuf'`, which leverages Protobuf as the JSON serialization backend.
 
-To load an arbitrary JSON file, please set `protocol=None`. But as it is "arbitrary", you should not expect it can be successfully loaded. DocArray tries its best reasonable effort by first loading this JSON into `dict` and then loading it via `Document(dict)`.
 
 ```python
 from docarray import Document
@@ -95,6 +94,22 @@ To find out what extra parameters you can pass to `to_json()`/`to_dict()`, pleas
 - [`protocol='protobuf', **kwargs`](https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.MessageToJson)
 ```
 
+
+(arbitrary-json)=
+
+### From/to arbitrary JSON
+
+Arbitrary JSON is schemaless JSON. It often comes from a handcrafted JSON file or from an export file produced by another library. Its schema is unknown to DocArray, so in principle DocArray cannot load it.
+
+Nevertheless, DocArray will try. To load an arbitrary JSON file, set `protocol=None`.
+
+Because the JSON is _arbitrary_, do not expect loading to always work smoothly. DocArray makes a best-effort attempt to parse its fields: it first loads the JSON into a `dict` object and then builds a Document via `Document(dict)`. Unknown attributes are handled as {ref}`described here<unk-attribute>`.
+
+As a rule of thumb, if you only work inside DocArray's ecosystem, always prefer schema-based JSON (`.to_json(protocol='jsonschema')` or `.to_json(protocol='protobuf')`) over schemaless JSON. If you are exporting DocArray's JSON to other ecosystems, prefer schema-based JSON as well: your engineer friends will appreciate it, as it is easier to integrate. In fact, DocArray does **not** support schemaless JSON export, so your engineer friends will never be upset.
+
+Read more about {ref}`schema-gen` support in DocArray.
+
+
 (doc-in-bytes)=
 ## From/to bytes
 

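The two-step loading path described in the new "From/to arbitrary JSON" section (parse the JSON into a `dict`, then build a Document from it, catching unknown keys into `.tags`) can be sketched with the standard library alone. In DocArray the corresponding call would presumably be `Document.from_json(raw, protocol=None)`; the `load_arbitrary_json` function below is hypothetical, returns a plain `dict` standing in for a Document, and uses an assumed subset of built-in field names.

```python
import json
from typing import Any, Dict

# Hypothetical subset of built-in Document attribute names.
KNOWN_FIELDS = {"id", "text", "uri", "tags"}


def load_arbitrary_json(raw: str) -> Dict[str, Any]:
    """Best-effort load of schemaless JSON into a document-like dict.

    Step 1: parse the JSON text into a Python object.
    Step 2: build the "document", catching unknown keys into "tags".
    """
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        # A top-level list, string or number has no field names to map.
        raise TypeError(f"expected a JSON object, got {type(obj).__name__}")
    doc: Dict[str, Any] = {"tags": dict(obj.pop("tags", None) or {})}
    for key, value in obj.items():
        if key in KNOWN_FIELDS:
            doc[key] = value  # known field kept as-is
        else:
            doc["tags"][key] = value  # unknown field caught into tags
    return doc


# A handcrafted export from some other tool: only "text" is a known field.
raw = '{"text": "hello world", "author": "alice", "likes": 3}'
print(load_arbitrary_json(raw))
# {'tags': {'author': 'alice', 'likes': 3}, 'text': 'hello world'}
```

Feeding it `"[1, 2]"` raises `TypeError`, a small illustration of why arbitrary JSON cannot be expected to load in every case.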
docs/fundamentals/documentarray/serialization.md

Lines changed: 2 additions & 0 deletions
@@ -68,6 +68,8 @@ da_r.summary()
 
 
 ```{seealso}
+To load an arbitrary JSON file, set `protocol=None` {ref}`as described here<arbitrary-json>`.
+
 More parameters and usages can be found in the Document-level {ref}`doc-json`.
 ```
 

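At the DocumentArray level, one plausible reading of the `seealso` note is that each element of a top-level JSON array goes through the same per-Document rule. The sketch below assumes exactly that; `to_doc`, `load_arbitrary_json_array` and `KNOWN_FIELDS` are hypothetical names, and DocArray's actual handling may differ.

```python
import json
from typing import Any, Dict, List

# Hypothetical subset of built-in Document attribute names.
KNOWN_FIELDS = {"id", "text", "uri", "tags"}


def to_doc(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Split one JSON object into known fields plus a "tags" catch-all."""
    doc = {k: v for k, v in obj.items() if k in KNOWN_FIELDS and k != "tags"}
    unknown = {k: v for k, v in obj.items() if k not in KNOWN_FIELDS}
    doc["tags"] = {**(obj.get("tags") or {}), **unknown}
    return doc


def load_arbitrary_json_array(raw: str) -> List[Dict[str, Any]]:
    """Assumes the file holds a top-level JSON list of objects."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise TypeError("expected a top-level JSON array")
    return [to_doc(item) for item in items]


raw = '[{"text": "a", "lang": "en"}, {"uri": "img.png", "tags": {"x": 1}}]'
docs = load_arbitrary_json_array(raw)
print(len(docs))  # 2
print(docs[0])    # {'text': 'a', 'tags': {'lang': 'en'}}
print(docs[1])    # {'uri': 'img.png', 'tags': {'x': 1}}
```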