feat(v2): add audio url and predefined document #940
Merged
46 commits
bebc9d4
feat: add audio url class
6025c2f
fix: typos
9a599e5
test: add tests for audio and audio url
04abdae
feat: add audio url and audio predefined class
f8d700d
Merge remote-tracking branch 'origin/feat-rewrite-v2' into feat-add-a…
d58f804
chore: add types-request
bdf8e88
feat: add audio tensors torch and ndarray
6572df8
fix: mypy type hints
9cd4baa
test: empty test file
b3c1948
test: add more unit and integration tests
7774181
fix: update audio tensors and audio url
af840d4
fix: remove print statements
797f488
docs: add documentation
8b48a77
refactor: rename test audio py to test audio tensor py
e135438
fix: typo in torch tensor py
14fcf6b
feat: add proto stuff to audio tensors
c623a13
test: add tests for proto and set tensors
1be8e3f
fix: set tensor to tensor int, since no inplace change
17786eb
refactor: rename to save to wav file
97355f7
docs: fix typo
20e2344
docs: fix docs for save tensor to wav file
7fc06e1
Merge branch 'feat-rewrite-v2' into feat-add-audio-v2
b34d783
fix: apply suggestions from code review
130d8ab
fix: apply suggestions from code review
2954351
test: fix assertions
61cb103
fix: move max int multiplication to abstract class
5943c0f
feat: add ndim method to abstract tensor class and concrete classes
131c5ff
fix: ndim
83ef649
fix: revert ndim in abstract tensor and torch tensor and ndarray
eecca41
fix: mypy checks
4762c3c
docs: add docstring to n dim
6948122
refactor: move n dim to abstract tensor and subclasses
d174087
refactor: make to protobuf abstract, change node to protobuf signature
3a52303
fix: remove not needed methods
a0be12e
fix: change remote audio file to file from github
9623d29
fix: raw content from remote file
6efdcf2
fix: path to github remote file
5026543
refactor: tensor field name to proto field name
703de43
test: remove redundant test in test audio tensor
83ece31
fix: load audio url to audio ndarray instead of np ndarray
de079e2
refactor: move n dim to computational backend
2ef1350
docs: update docstrings for audio tensors
d51d38e
feat: make dtype in audiourl load optional
3901cfa
Merge branch 'feat-rewrite-v2' into feat-add-audio-v2
a571898
test: fix document refactor and ndarray import
71af630
fix: fix mypy check
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,7 @@ | ||
| + from docarray.predefined_document.audio import Audio | ||
| from docarray.predefined_document.image import Image | ||
| from docarray.predefined_document.mesh import Mesh3D | ||
| from docarray.predefined_document.point_cloud import PointCloud3D | ||
| from docarray.predefined_document.text import Text | ||
|
|
||
| - __all__ = ['Text', 'Image', 'Mesh3D', 'PointCloud3D'] | ||
| + __all__ = ['Text', 'Image', 'Audio', 'Mesh3D', 'PointCloud3D'] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| from typing import Optional, TypeVar | ||
|
|
||
| from docarray.document import BaseDocument | ||
| from docarray.typing import AudioUrl, Embedding | ||
| from docarray.typing.tensor.audio.audio_tensor import AudioTensor | ||
|
|
||
| T = TypeVar('T', bound='Audio') | ||
|
|
||
|
|
||
| class Audio(BaseDocument): | ||
| """ | ||
| Document for handling audio. | ||
|
|
||
| The Audio Document can contain an AudioUrl (`Audio.url`), an AudioTensor | ||
| (`Audio.tensor`), and an Embedding (`Audio.embedding`). | ||
|
|
||
| EXAMPLE USAGE: | ||
|
|
||
| You can use this Document directly: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from docarray import Audio | ||
|
|
||
| # use it directly | ||
| audio = Audio( | ||
| url='https://github.com/docarray/docarray/tree/feat-add-audio-v2/tests/toydata/hello.wav?raw=true' | ||
| ) | ||
| audio.tensor = audio.url.load() | ||
| model = MyEmbeddingModel() | ||
| audio.embedding = model(audio.tensor) | ||
|
|
||
| You can extend this Document: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from docarray import Audio, Text | ||
| from typing import Optional | ||
|
|
||
| # extend it | ||
| class MyAudio(Audio): | ||
| name: Optional[Text] | ||
|
|
||
|
|
||
| audio = MyAudio( | ||
| url='https://github.com/docarray/docarray/tree/feat-add-audio-v2/tests/toydata/hello.wav?raw=true' | ||
| ) | ||
| audio.tensor = audio.url.load() | ||
| model = MyEmbeddingModel() | ||
| audio.embedding = model(audio.tensor) | ||
| audio.name = 'my first audio' | ||
|
|
||
|
|
||
| You can use this Document for composition: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from docarray import Document, Audio, Text | ||
|
|
||
| # compose it | ||
| class MultiModalDoc(Document): | ||
| audio: Audio | ||
| text: Text | ||
|
|
||
|
|
||
| mmdoc = MultiModalDoc( | ||
| audio=Audio( | ||
| url='https://github.com/docarray/docarray/tree/feat-add-audio-v2/tests/toydata/hello.wav?raw=true' | ||
| ), | ||
| text=Text(text='hello world, how are you doing?'), | ||
| ) | ||
| mmdoc.audio.tensor = mmdoc.audio.url.load() | ||
| """ | ||
|
|
||
| url: Optional[AudioUrl] | ||
| tensor: Optional[AudioTensor] | ||
| embedding: Optional[Embedding] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| from docarray.typing.tensor.audio.audio_ndarray import AudioNdArray | ||
|
|
||
| __all__ = ['AudioNdArray'] | ||
|
|
||
| try: | ||
| import torch # noqa: F401 | ||
| except ImportError: | ||
| pass | ||
| else: | ||
| from docarray.typing.tensor.audio.audio_torch_tensor import AudioTorchTensor # noqa | ||
|
|
||
| __all__.extend(['AudioTorchTensor']) |
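
The module above exports `AudioTorchTensor` only when `torch` can be imported. A minimal, stdlib-only sketch of that optional-dependency pattern follows. The helper `try_import` and the list name `exported_names` are hypothetical and not part of the PR; the diff itself uses a plain `try`/`except ImportError`/`else` block.

```python
import importlib

# Names that are always available (stand-in for the NumPy-backed class).
exported_names = ['AudioNdArray']


def try_import(module_name: str):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Only advertise the torch-backed class when torch is importable.
if try_import('torch') is not None:
    exported_names.append('AudioTorchTensor')
```

Keeping the import attempt in one place means the rest of the package can check a single result instead of repeating the `try`/`except` block.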
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| import wave | ||
| from abc import ABC, abstractmethod | ||
| from typing import BinaryIO, TypeVar, Union | ||
|
|
||
| from docarray.typing.tensor.abstract_tensor import AbstractTensor | ||
|
|
||
| T = TypeVar('T', bound='AbstractAudioTensor') | ||
|
|
||
|
|
||
| class AbstractAudioTensor(AbstractTensor, ABC): | ||
| @abstractmethod | ||
| def to_audio_bytes(self): | ||
| """ | ||
| Convert audio tensor to bytes. | ||
| """ | ||
| ... | ||
|
|
||
| def save_to_wav_file( | ||
| self: 'T', | ||
| file_path: Union[str, BinaryIO], | ||
| sample_rate: int = 44100, | ||
| sample_width: int = 2, | ||
| ) -> None: | ||
| """ | ||
| Save audio tensor to a .wav file. Mono/stereo is preserved. | ||
|
|
||
| :param file_path: path to a .wav file. If file_path is a string, open the file by | ||
| that name, otherwise treat it as a file-like object. | ||
| :param sample_rate: sampling frequency | ||
| :param sample_width: sample width in bytes | ||
| """ | ||
| comp_backend = self.get_comp_backend() | ||
| n_channels = 2 if comp_backend.n_dim(array=self) > 1 else 1 # type: ignore | ||
|
|
||
| with wave.open(file_path, 'w') as f: | ||
| f.setnchannels(n_channels) | ||
| f.setsampwidth(sample_width) | ||
| f.setframerate(sample_rate) | ||
| f.writeframes(self.to_audio_bytes()) | ||
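
`save_to_wav_file` delegates to the standard-library `wave` module, which accepts either a file name or a binary file object. The sketch below exercises the same sequence of `wave` calls against an in-memory buffer and reads the header back. The function name `write_wav` and the sample values are illustrative assumptions, not code from the PR.

```python
import io
import struct
import wave


def write_wav(target, frames: bytes, n_channels: int = 1,
              sample_rate: int = 44100, sample_width: int = 2) -> None:
    """Write raw PCM frames to `target` (a path or a binary file object)."""
    with wave.open(target, 'wb') as f:
        f.setnchannels(n_channels)
        f.setsampwidth(sample_width)
        f.setframerate(sample_rate)
        f.writeframes(frames)


# Four mono samples packed as little-endian signed 16-bit integers.
pcm = struct.pack('<4h', 0, 1000, -1000, 32767)

buffer = io.BytesIO()
write_wav(buffer, pcm)

# Re-open the buffer for reading to inspect what was written.
buffer.seek(0)
with wave.open(buffer, 'rb') as f:
    params = (f.getnchannels(), f.getsampwidth(), f.getframerate(), f.getnframes())
    data = f.readframes(f.getnframes())
```

With the defaults used in the diff (44100 Hz, 2 bytes per sample), `params` here describes a mono file containing four frames.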
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| from typing import TypeVar | ||
|
|
||
| from docarray.typing.tensor.audio.abstract_audio_tensor import AbstractAudioTensor | ||
| from docarray.typing.tensor.ndarray import NdArray | ||
|
|
||
| MAX_INT_16 = 2**15 | ||
|
|
||
| T = TypeVar('T', bound='AudioNdArray') | ||
|
|
||
|
|
||
| class AudioNdArray(AbstractAudioTensor, NdArray): | ||
| """ | ||
| Subclass of NdArray, to represent an audio tensor. | ||
| Adds audio-specific features to the tensor. | ||
|
|
||
|
|
||
| EXAMPLE USAGE | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from typing import Optional | ||
|
|
||
| from pydantic import parse_obj_as | ||
|
|
||
| from docarray import Document | ||
| from docarray.typing import AudioNdArray, AudioUrl | ||
| import numpy as np | ||
|
|
||
|
|
||
| class MyAudioDoc(Document): | ||
| title: str | ||
| audio_tensor: Optional[AudioNdArray] | ||
| url: Optional[AudioUrl] | ||
|
|
||
|
|
||
| # from tensor | ||
| doc_1 = MyAudioDoc( | ||
| title='my_first_audio_doc', | ||
| audio_tensor=np.random.rand(1000, 2), | ||
| ) | ||
|
|
||
| doc_1.audio_tensor.save_to_wav_file(file_path='path/to/file_1.wav') | ||
|
|
||
| # from url | ||
| doc_2 = MyAudioDoc( | ||
| title='my_second_audio_doc', | ||
| url='https://www.kozco.com/tech/piano2.wav', | ||
| ) | ||
|
|
||
| doc_2.audio_tensor = parse_obj_as(AudioNdArray, doc_2.url.load()) | ||
| doc_2.audio_tensor.save_to_wav_file(file_path='path/to/file_2.wav') | ||
|
|
||
| """ | ||
|
|
||
| _PROTO_FIELD_NAME = 'audio_ndarray' | ||
|
|
||
| def to_audio_bytes(self): | ||
| tensor = (self * MAX_INT_16).astype('<h') | ||
| return tensor.tobytes() |
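
`to_audio_bytes` scales the samples by `MAX_INT_16` and casts them to little-endian 16-bit integers (`'<h'`). The stdlib-only sketch below shows the same scaling on a plain list of floats, assuming the input lies in [-1.0, 1.0]. The clamping step is an addition for illustration: the diff does not clamp, and a sample of exactly 1.0 would scale to 32768, one above the largest int16 value. The function name `floats_to_pcm16` is hypothetical.

```python
import struct

MAX_INT_16 = 2**15  # same scaling constant as in the diff


def floats_to_pcm16(samples) -> bytes:
    """Scale floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    ints = []
    for s in samples:
        scaled = int(s * MAX_INT_16)  # truncates toward zero
        # Clamp to the int16 range (assumption: not done in the PR code).
        ints.append(max(-MAX_INT_16, min(MAX_INT_16 - 1, scaled)))
    return struct.pack(f'<{len(ints)}h', *ints)


pcm_bytes = floats_to_pcm16([0.0, 0.5, -1.0, 1.0])
```

Each sample occupies two bytes, which matches the `sample_width=2` default of `save_to_wav_file`.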
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| from typing import Union | ||
|
|
||
| from docarray.typing.tensor.audio.audio_ndarray import AudioNdArray | ||
|
|
||
| try: | ||
| import torch # noqa: F401 | ||
| except ImportError: | ||
| AudioTensor = AudioNdArray | ||
|
|
||
| else: | ||
| from docarray.typing.tensor.audio.audio_torch_tensor import AudioTorchTensor | ||
|
|
||
| AudioTensor = Union[AudioNdArray, AudioTorchTensor] # type: ignore | ||
samsja marked this conversation as resolved.
|
||