Release Note
This release contains 3 new features, 2 bug fixes and 1 documentation improvement.
🆕 Features
More serialization options for DocVec (#1562)
DocVec now has the same serialization interface as DocList. This means that the following methods are available for it:
- to_protobuf() / from_protobuf()
- to_base64() / from_base64()
- save_binary() / load_binary()
- to_bytes() / from_bytes()
- to_dataframe() / from_dataframe()
For example, you can now perform Base64 (de)serialization like this:
from docarray import BaseDoc, DocVec

class SimpleDoc(BaseDoc):
    text: str

dv = DocVec[SimpleDoc]([SimpleDoc(text=f'doc {i}') for i in range(2)])
# Serialize the whole DocVec to a Base64 string
base64_repr_dv = dv.to_base64(compress=None, protocol='pickle')
# Rebuild a DocVec from the Base64 string, using the same settings
dv_from_base64 = DocVec[SimpleDoc].from_base64(
    base64_repr_dv, compress=None, protocol='pickle'
)

For further guidance, check out the documentation section on serialization.
Validate file formats in URL (#1606) (#1669)
URL types such as AudioURL, TextURL, and ImageURL now validate the file format of the URL they are given, checking that it corresponds to the expected MIME type.
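To illustrate the idea of checking a URL against an expected MIME type, here is a minimal sketch that uses only the Python standard library. It is not docarray's implementation: the helper names `guess_mime_type` and `validate_url_format` are hypothetical, and rejecting URLs whose type cannot be guessed is an assumption of this sketch rather than documented docarray behavior.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlparse


def guess_mime_type(url: str) -> Optional[str]:
    """Guess the MIME type from the file extension in the URL path (no network access)."""
    path = urlparse(url).path
    mime_type, _ = mimetypes.guess_type(path)
    return mime_type


def validate_url_format(url: str, expected_prefix: str) -> str:
    """Return the URL unchanged if its guessed MIME type starts with expected_prefix.

    Hypothetical helper: raises ValueError when the type is unknown or does not match.
    """
    mime_type = guess_mime_type(url)
    if mime_type is None or not mime_type.startswith(expected_prefix):
        raise ValueError(
            f'Expected a file of type {expected_prefix}*, '
            f'but {url!r} looks like {mime_type!r}'
        )
    return url


# An image extension passes an image check
validate_url_format('https://example.com/photo.png', 'image/')

# An audio extension fails an image check
try:
    validate_url_format('https://example.com/song.mp3', 'image/')
except ValueError as err:
    print(err)
```

Guessing from the extension keeps the check cheap because nothing is downloaded, at the cost of trusting the file name.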
Add methods to create BaseDoc from schema (#1667)
Sometimes it is useful to dynamically create a BaseDoc class from the schema of an existing BaseDoc. The functions create_pure_python_type_model and create_base_doc_from_schema let you reconstruct the BaseDoc from its schema:
from docarray.utils.create_dynamic_doc_class import (
    create_base_doc_from_schema,
    create_pure_python_type_model,
)
from typing import Optional
from docarray import BaseDoc, DocList
from docarray.typing import AnyTensor
from docarray.documents import TextDoc

class MyDoc(BaseDoc):
    tensor: Optional[AnyTensor]
    texts: DocList[TextDoc]

# Due to a limitation of DocList as a Pydantic List, the `DocList` field of
# MyDoc needs to be converted to a plain `List` first.
MyDocPurePython = create_pure_python_type_model(MyDoc)
NewMyDoc = create_base_doc_from_schema(
    MyDocPurePython.schema(), 'MyDoc', {}
)
new_doc = NewMyDoc(tensor=None, texts=[TextDoc(text='text')])

🐞 Bug Fixes
Better error message when DocVec is unusable (#1675)
After calling doc_list = doc_vec.to_doc_list(), doc_vec ends up in an unusable state since its data has been transferred to doc_list. This fix gives users a more informative error message when they try to interact with doc_vec after it has been made unusable.
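The general pattern of handing data over and then failing with an informative message can be sketched in plain Python. This is an illustration only, not docarray's code: the class `ColumnStore`, the method `to_row_list`, and the exception `UnusableObjectError` are all hypothetical names invented for this sketch.

```python
class UnusableObjectError(RuntimeError):
    """Hypothetical error raised when an object is used after giving up its data."""


class ColumnStore:
    """Toy container that hands its data over on conversion (stand-in for a DocVec-like object)."""

    def __init__(self, texts):
        self._texts = list(texts)
        self._is_unusable = False

    def _check_usable(self):
        # Fail early with an explanation instead of an obscure error further down
        if self._is_unusable:
            raise UnusableObjectError(
                'This ColumnStore is unusable because its data was transferred '
                'by to_row_list(). Use the returned list instead.'
            )

    def __len__(self):
        self._check_usable()
        return len(self._texts)

    def to_row_list(self):
        self._check_usable()
        rows = self._texts
        # Transfer ownership of the data and mark this object as unusable
        self._texts = None
        self._is_unusable = True
        return rows


store = ColumnStore(['doc 0', 'doc 1'])
rows = store.to_row_list()

try:
    len(store)
except UnusableObjectError as err:
    print(err)
```

Routing every public method through one guard such as `_check_usable` keeps the error message consistent across the whole interface.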
Cap Pydantic version (#1682)
Due to the breaking changes in Pydantic v2, we have capped the Pydantic version to avoid problems when installing docarray.
📗 Documentation Improvements
- Fix a reference in README (#1674)
🤟 Contributors
We would like to thank all contributors to this release:
- Saba Sturua (@jupyterjazz)
- Joan Fontanals (@JoanFM)
- Han Xiao (@hanxiao)
- Johannes Messner (@JohannesMessner)