|
4 | 4 | DocArray is designed to be "ready-to-wire" at any time, so serialization is important. DocumentArray provides multiple serialization methods that allow one to transfer a DocumentArray object over the network and across different microservices. |
5 | 5 |
|
6 | 6 | - JSON string: `.from_json()`/`.to_json()` |
7 | | -- Bytes (compressed): `.from_bytes()`/`.to_bytes()` |
| 7 | +- Bytes (compressed): `.from_bytes()`/`.to_bytes()` |
| 8 | +- Base64 (compressed): `.from_base64()`/`.to_base64()` |
8 | 9 | - Protobuf Message: `.from_protobuf()`/`.to_protobuf()` |
9 | 10 | - Python List: `.from_list()`/`.to_list()` |
10 | 11 | - Pandas Dataframe: `.from_dataframe()`/`.to_dataframe()` |
@@ -141,6 +142,47 @@ When set `protocol=pickle` or `protobuf`, the result binary string looks like th |
141 | 142 |
|
142 | 143 | Here `Delimiter` is a 16-byte separator such as `b'g\x81\xcc\x1c\x0f\x93L\xed\xa2\xb0s)\x9c\xf9\xf6\xf2'` that marks the boundary of each Document's serialization. Given a `to_bytes(protocol='pickle/protobuf')` binary string, once we know the first 16 bytes, the boundary is clear. Consequently, one can leverage this format to stream Documents and to drop, skip, or early-stop while reading. |
143 | 144 |
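The boundary rule described above can be sketched with plain bytes. This is a minimal illustration, not DocArray's implementation: it assumes an uncompressed stream laid out as delimiter, payload, delimiter, payload, and so on. `iter_payloads` and the sample payloads are hypothetical names introduced only for this example.

```python
from typing import Iterator

DELIMITER_SIZE = 16  # separator length stated in the text above


def iter_payloads(stream: bytes) -> Iterator[bytes]:
    """Yield each payload from a delimiter-prefixed byte stream.

    Assumed layout (hypothetical, for illustration only):
    delimiter + payload_1 + delimiter + payload_2 + ...
    """
    delimiter = stream[:DELIMITER_SIZE]  # the first 16 bytes reveal the boundary
    start = DELIMITER_SIZE
    while start < len(stream):
        end = stream.find(delimiter, start)
        if end == -1:  # the last payload runs to the end of the stream
            end = len(stream)
        yield stream[start:end]
        start = end + DELIMITER_SIZE


# Build a toy stream from the example delimiter and three fake payloads.
delimiter = b'g\x81\xcc\x1c\x0f\x93L\xed\xa2\xb0s)\x9c\xf9\xf6\xf2'
payloads = [b'doc-0', b'doc-1', b'doc-2']
stream = b''.join(delimiter + p for p in payloads)

# Early stop: collect only the first two payloads and ignore the rest.
first_two = []
for payload in iter_payloads(stream):
    first_two.append(payload)
    if len(first_two) == 2:
        break
```

Because the generator is lazy, stopping after two payloads means the third one is never sliced out of the stream, which is the property that makes skipping and early-stopping cheap.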
|
| 145 | +## From/to base64 |
| 146 | + |
| 147 | +```{important} |
| 148 | +Depending on the values of the `protocol` and `compress` arguments, this feature may require the `protobuf` and `lz4` dependencies. You can run `pip install "docarray[full]"` to install them. |
| 149 | +``` |
| 150 | + |
| 151 | +Serializing into base64 can be useful when binary strings are not allowed, e.g. in a REST API. This can be done via {meth}`~docarray.array.mixins.io.binary.BinaryIOMixin.to_base64` and {meth}`~docarray.array.mixins.io.binary.BinaryIOMixin.from_base64`. As in binary serialization, one can specify `protocol` and `compress`: |
| 152 | + |
| 153 | +```python |
| 154 | +from docarray import DocumentArray |
| 155 | +da = DocumentArray.empty(10) |
| 156 | + |
| 157 | +d_str = da.to_base64(protocol='protobuf', compress='lz4') |
| 158 | +print(len(d_str), d_str) |
| 159 | +``` |
| 160 | + |
| 161 | +```text |
| 162 | +176 BCJNGEBAwHUAAAD/Iw+uQdpL9UDNsfvomZb8m7sKIGRkNTIyOTQyNzMwMzExZWNiM2I1MWUwMDhhMzY2ZDQ5MgAEP2FiNDIAHD9iMTgyAB0vNWUyAB0fYTIAHh9myAAdP2MzYZYAHD9jODAyAB0fZDIAHT9kMTZkAABQNjZkNDkAAAAA |
| 163 | +``` |
| 164 | + |
| 165 | +To deserialize, remember to set the correct `protocol` and `compress`: |
| 166 | + |
| 167 | +```python |
| 168 | +from docarray import DocumentArray |
| 169 | + |
| 170 | +da = DocumentArray.from_base64(d_str, protocol='protobuf', compress='lz4') |
| 171 | +da.summary() |
| 172 | +``` |
| 173 | + |
| 174 | +```text |
| 175 | + Length 10 |
| 176 | + Homogenous Documents True |
| 177 | + Common Attributes ('id',) |
| 178 | + |
| 179 | + Attributes Summary |
| 180 | + |
| 181 | + Attribute Data type #Unique values Has empty value |
| 182 | + ────────────────────────────────────────────────────────── |
| 183 | + id ('str',) 10 False |
| 184 | +``` |
| 185 | + |
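To see why a base64 string helps in a REST setting, the sketch below carries compressed bytes inside a JSON body using only the standard library. It does not call DocArray: `raw` is a stand-in for a serialized DocumentArray, `zlib` stands in for `lz4` (which is not in the standard library), and the `docs` field name is hypothetical.

```python
import base64
import json
import zlib

raw = bytes(range(256)) * 4  # stand-in for a serialized DocumentArray

# Compress, then base64-encode so the result is plain ASCII text.
encoded = base64.b64encode(zlib.compress(raw)).decode('ascii')

# Arbitrary bytes cannot be placed in JSON directly, but a base64 string can.
body = json.dumps({'docs': encoded})

# Receiver side: undo the steps in the opposite order.
received = json.loads(body)['docs']
restored = zlib.decompress(base64.b64decode(received))
```

The receiver must apply the same compression scheme the sender used, which mirrors the need to pass matching `protocol` and `compress` values to `from_base64`.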
144 | 186 | ## From/to Protobuf |
145 | 187 |
|
146 | 188 | Serializing to a Protobuf Message is less common unless you are using the Python Protobuf API. Nonetheless, you can use {meth}`~docarray.array.mixins.io.binary.BinaryIOMixin.from_protobuf` and {meth}`~docarray.array.mixins.io.binary.BinaryIOMixin.to_protobuf` to get a Protobuf Message object in Python. |
|