Commit da3b7f0

Charlotte Gerhaher authored
feat(v2): add audio url and predefined document (#940)
* feat: add audio url class
* fix: typos
* test: add tests for audio and audio url
* feat: add audio url and audio predefined class
* chore: add types-request
* feat: add audio tensors torch and ndarray
* fix: mypy type hints
* test: empty test file
* test: add more unit and integration tests
* fix: update audio tensors and audio url
* fix: remove print statements
* docs: add documentation
* refactor: rename test audio py to test audio tensor py
* fix: typo in torch tensor py
* feat: add proto stuff to audio tensors
* test: add tests for proto and set tensors
* fix: set tensor to tensor int, since no inplace change
* refactor: rename to save to wav file
* docs: fix typo
* docs: fix docs for save tensor to wav file
* fix: apply suggestions from code review
* fix: apply suggestions from code review
* test: fix assertions
* fix: move max int multiplication to abstract class
* feat: add ndim method to abstract tensor class and concrete classes
* fix: ndim
* fix: revert ndim in abstract tensor and torch tensor and ndarray
* fix: mypy checks
* docs: add docstring to n dim
* refactor: move n dim to abstract tensor and subclasses
* refactor: make to protobuf abstract, change node to protobuf signature
* fix: remove not needed methods
* fix: change remote audio file to file from github
* fix: raw content from remote file
* fix: path to github remote file
* refactor: tensor field name to proto field name
* test: remove redundant test in test audio tensor
* fix: load audio url to audio ndarray instead of np ndarray
* refactor: move n dim to computational backend
* docs: update docstrings for audio tensors
* feat: make dtype in audiourl load optional
* test: fix document refactor and ndarray import
* fix: fix mypy check

Signed-off-by: anna-charlotte <[email protected]>
1 parent f39e202 commit da3b7f0

25 files changed, +747 -27 lines changed

docarray/__init__.py

Lines changed: 10 additions & 2 deletions
@@ -2,6 +2,14 @@
 
 from docarray.array.array import DocumentArray
 from docarray.document.document import BaseDocument
-from docarray.predefined_document import Image, Mesh3D, PointCloud3D, Text
+from docarray.predefined_document import Audio, Image, Mesh3D, PointCloud3D, Text
 
-__all__ = ['BaseDocument', 'DocumentArray', 'Image', 'Text', 'Mesh3D', 'PointCloud3D']
+__all__ = [
+    'BaseDocument',
+    'DocumentArray',
+    'Image',
+    'Audio',
+    'Text',
+    'Mesh3D',
+    'PointCloud3D',
+]

docarray/computation/abstract_comp_backend.py

Lines changed: 5 additions & 0 deletions
@@ -26,6 +26,11 @@ def stack(
         """
         ...
 
+    @staticmethod
+    @abstractmethod
+    def n_dim(array: 'TTensor') -> int:
+        ...
+
     class Retrieval(ABC, typing.Generic[TTensorRetrieval]):
         """
         Abstract class for retrieval and ranking functionalities

docarray/computation/numpy_backend.py

Lines changed: 4 additions & 0 deletions
@@ -40,6 +40,10 @@ def stack(
     ) -> 'np.ndarray':
         return np.stack(tensors, axis=dim)
 
+    @staticmethod
+    def n_dim(array: 'np.ndarray') -> int:
+        return array.ndim
+
     class Retrieval(AbstractComputationalBackend.Retrieval[np.ndarray]):
         """
         Abstract class for retrieval and ranking functionalities

docarray/computation/torch_backend.py

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,10 @@ def stack(
     ) -> 'torch.Tensor':
         return torch.stack(tensors, dim=dim)
 
+    @staticmethod
+    def n_dim(array: 'torch.Tensor') -> int:
+        return array.ndim
+
     class Retrieval(AbstractComputationalBackend.Retrieval[torch.Tensor]):
         """
         Abstract class for retrieval and ranking functionalities
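
The three backend hunks above declare `n_dim` as an abstract static method and give each concrete backend a one-line implementation. A minimal sketch of that pattern follows, using only the standard library. `AbstractBackend` and `ListBackend` are hypothetical names for illustration, and nested Python lists stand in for NumPy arrays and Torch tensors; none of this is docarray's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any


class AbstractBackend(ABC):
    """Hypothetical stand-in for the abstract computational backend."""

    @staticmethod
    @abstractmethod
    def n_dim(array: Any) -> int:
        """Return the number of dimensions of `array`."""
        ...


class ListBackend(AbstractBackend):
    """Illustrative backend that treats nested lists as arrays."""

    @staticmethod
    def n_dim(array: Any) -> int:
        # Count the nesting depth by following the first element of each level.
        depth = 0
        while isinstance(array, list):
            depth += 1
            if not array:
                break
            array = array[0]
        return depth


mono = [0.0, 0.1, 0.2]               # one channel: 1 dimension
stereo = [[0.0, 0.0], [0.1, 0.1]]    # samples x channels: 2 dimensions
print(ListBackend.n_dim(mono), ListBackend.n_dim(stereo))
```

Because `n_dim` is a static method, callers can query dimensionality through the backend class without instantiating it, which is how the diff's `array.ndim` one-liners would be reached.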
Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
+from docarray.predefined_document.audio import Audio
 from docarray.predefined_document.image import Image
 from docarray.predefined_document.mesh import Mesh3D
 from docarray.predefined_document.point_cloud import PointCloud3D
 from docarray.predefined_document.text import Text
 
-__all__ = ['Text', 'Image', 'Mesh3D', 'PointCloud3D']
+__all__ = ['Text', 'Image', 'Audio', 'Mesh3D', 'PointCloud3D']
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+from typing import Optional, TypeVar
+
+from docarray.document import BaseDocument
+from docarray.typing import AudioUrl, Embedding
+from docarray.typing.tensor.audio.audio_tensor import AudioTensor
+
+T = TypeVar('T', bound='Audio')
+
+
+class Audio(BaseDocument):
+    """
+    Document for handling audios.
+
+    The Audio Document can contain an AudioUrl (`Audio.url`), an AudioTensor
+    (`Audio.tensor`), and an Embedding (`Audio.embedding`).
+
+    EXAMPLE USAGE:
+
+    You can use this Document directly:
+
+    .. code-block:: python
+
+        from docarray import Audio
+
+        # use it directly
+        audio = Audio(
+            url='https://github.com/docarray/docarray/tree/feat-add-audio-v2/tests/toydata/hello.wav?raw=true'
+        )
+        audio.tensor = audio.url.load()
+        model = MyEmbeddingModel()
+        audio.embedding = model(audio.tensor)
+
+    You can extend this Document:
+
+    .. code-block:: python
+
+        from docarray import Audio, Text
+        from typing import Optional
+
+        # extend it
+        class MyAudio(Audio):
+            name: Optional[Text]
+
+
+        audio = MyAudio(
+            url='https://github.com/docarray/docarray/tree/feat-add-audio-v2/tests/toydata/hello.wav?raw=true'
+        )
+        audio.tensor = audio.url.load()
+        model = MyEmbeddingModel()
+        audio.embedding = model(audio.tensor)
+        audio.name = 'my first audio'
+
+
+    You can use this Document for composition:
+
+    .. code-block:: python
+
+        from docarray import Document, Audio, Text
+
+        # compose it
+        class MultiModalDoc(Document):
+            audio: Audio
+            text: Text
+
+
+        mmdoc = MultiModalDoc(
+            audio=Audio(
+                url='https://github.com/docarray/docarray/tree/feat-add-audio-v2/tests/toydata/hello.wav?raw=true'
+            ),
+            text=Text(text='hello world, how are you doing?'),
+        )
+        mmdoc.audio.tensor = mmdoc.audio.url.load()
+    """
+
+    url: Optional[AudioUrl]
+    tensor: Optional[AudioTensor]
+    embedding: Optional[Embedding]
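
The docstring above fills `Audio.tensor` from `audio.url.load()`, and the commit message mentions saving a tensor to a WAV file and a "max int multiplication". The sketch below shows one plausible reading of that round trip using only the standard-library `wave` and `struct` modules. It assumes 16-bit mono PCM and float samples in [-1.0, 1.0]; `save_to_wav`, `load_wav`, and `MAX_INT_16` are hypothetical names chosen for illustration and are not taken from docarray.

```python
import io
import math
import struct
import wave

MAX_INT_16 = 2**15  # assumed scale factor between floats in [-1, 1] and int16 PCM


def save_to_wav(samples, target, sample_rate=44100):
    """Write float samples in [-1.0, 1.0] to `target` as 16-bit mono PCM."""
    # Multiply by the max int value, round, and clamp to the int16 range.
    ints = [
        max(-MAX_INT_16, min(MAX_INT_16 - 1, round(s * MAX_INT_16)))
        for s in samples
    ]
    with wave.open(target, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16 bit
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f'<{len(ints)}h', *ints))


def load_wav(source):
    """Read 16-bit mono PCM from `source` and return floats in [-1.0, 1.0)."""
    with wave.open(source, 'rb') as f:
        n_frames = f.getnframes()
        raw = f.readframes(n_frames)
    ints = struct.unpack(f'<{n_frames}h', raw)
    # Divide by the same max int value to get back to the float range.
    return [i / MAX_INT_16 for i in ints]


# Round trip a short 440 Hz sine wave through an in-memory WAV file.
original = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(100)]
buffer = io.BytesIO()
save_to_wav(original, buffer)
buffer.seek(0)
restored = load_wav(buffer)
print(len(restored))
```

The restored samples differ from the originals only by the 16-bit quantization step, so a model consuming `audio.tensor` would see values in the same float range regardless of how the file was encoded.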

docarray/proto/docarray.proto

Lines changed: 5 additions & 0 deletions
@@ -63,6 +63,11 @@ message NodeProto {
 
     string point_cloud_url = 13;
 
+    string audio_url = 14;
+
+    NdArrayProto audio_ndarray = 15;
+
+    NdArrayProto audio_torch_tensor = 16;
 
   }
 

docarray/proto/pb2/docarray_pb2.py

Lines changed: 14 additions & 14 deletions
Some generated files are not rendered by default.

docarray/typing/__init__.py

Lines changed: 13 additions & 2 deletions
@@ -1,13 +1,23 @@
 from docarray.typing.id import ID
+from docarray.typing.tensor.audio import AudioNdArray
 from docarray.typing.tensor.embedding.embedding import Embedding
 from docarray.typing.tensor.ndarray import NdArray
 from docarray.typing.tensor.tensor import AnyTensor
-from docarray.typing.url import AnyUrl, ImageUrl, Mesh3DUrl, PointCloud3DUrl, TextUrl
+from docarray.typing.url import (
+    AnyUrl,
+    AudioUrl,
+    ImageUrl,
+    Mesh3DUrl,
+    PointCloud3DUrl,
+    TextUrl,
+)
 
 __all__ = [
+    'AudioNdArray',
     'NdArray',
     'Embedding',
     'ImageUrl',
+    'AudioUrl',
     'TextUrl',
     'Mesh3DUrl',
     'PointCloud3DUrl',
@@ -22,5 +32,6 @@
     pass
 else:
     from docarray.typing.tensor import TorchEmbedding, TorchTensor  # noqa: F401
+    from docarray.typing.tensor.audio.audio_torch_tensor import AudioTorchTensor  # noqa
 
-    __all__.extend(['TorchEmbedding', 'TorchTensor'])
+    __all__.extend(['AudioTorchTensor', 'TorchEmbedding', 'TorchTensor'])
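
The second hunk above extends `__all__` with the Torch-backed types only in the `else` branch of an import guard, so they are exported only when the optional dependency imports cleanly. A minimal sketch of that guard is below. The helper `optional_names` and the list `exported` are hypothetical; the demo uses the standard-library `json` module and a deliberately missing module name in place of `torch` so that the result does not depend on what is installed.

```python
import importlib
from typing import List

# Names that are always exported (illustrative subset).
exported: List[str] = ['AudioNdArray', 'NdArray', 'AudioUrl']


def optional_names(module_name: str, names: List[str]) -> List[str]:
    """Return `names` only if `module_name` can be imported, else nothing."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Optional dependency is missing: export nothing extra.
        return []
    else:
        # Import succeeded: the dependent names are safe to export.
        return list(names)


# 'json' always imports, so its dependent name is added.
exported.extend(optional_names('json', ['AvailableExtra']))
# This module name is assumed not to exist, so nothing is added.
exported.extend(optional_names('not_installed_backend_xyz', ['MissingExtra']))
print(exported)
```

Keeping the extension inside the `else` branch means a missing optional package never produces a name in the public list that would fail on access.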

docarray/typing/tensor/abstract_tensor.py

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@
 class AbstractTensor(AbstractType, Generic[ShapeT], ABC):
 
     __parametrized_meta__ = type
+    _PROTO_FIELD_NAME: str
 
     @classmethod
     @abc.abstractmethod
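
The `_PROTO_FIELD_NAME: str` attribute added above, together with the new `audio_ndarray` and `audio_torch_tensor` fields in `NodeProto`, suggests that each tensor class names the proto field it serializes into. The sketch below illustrates that idea under stated assumptions: a plain dict stands in for `NodeProto`, the `*Sketch` classes and `from_node` are hypothetical, and the exact field values docarray assigns to each class are assumed rather than confirmed by the diff.

```python
from abc import ABC
from typing import Any, Dict, List


class AbstractTensorSketch(ABC):
    """Hypothetical base class; subclasses declare their proto field name."""

    _PROTO_FIELD_NAME: str

    def __init__(self, data: List[float]) -> None:
        self.data = data

    def to_node(self) -> Dict[str, Any]:
        # A dict stands in for NodeProto; the key plays the role of the field set.
        return {self._PROTO_FIELD_NAME: self.data}


class AudioNdArraySketch(AbstractTensorSketch):
    _PROTO_FIELD_NAME = 'audio_ndarray'  # assumed mapping to proto field 15


class AudioTorchTensorSketch(AbstractTensorSketch):
    _PROTO_FIELD_NAME = 'audio_torch_tensor'  # assumed mapping to proto field 16


# Reverse lookup used when deserializing: field name -> tensor class.
FIELD_TO_CLASS = {
    cls._PROTO_FIELD_NAME: cls
    for cls in (AudioNdArraySketch, AudioTorchTensorSketch)
}


def from_node(node: Dict[str, Any]) -> AbstractTensorSketch:
    """Rebuild the right tensor type from the single field set on the node."""
    (field, data), = node.items()
    return FIELD_TO_CLASS[field](data)


node = AudioTorchTensorSketch([0.1, 0.2]).to_node()
restored = from_node(node)
print(type(restored).__name__, node)
```

Storing the field name on the class keeps serialization generic: the base class writes to whichever field the subclass names, and deserialization can pick the tensor type from the field that is populated.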
