feat: add map function #1187
docarray/utils/apply.py
func: Callable[[BaseDocument], BaseDocument],
num_worker: Optional[int] = None,
pool: Optional['Pool'] = None,
show_progress: bool = False,
Should we have a backend option to switch between multiprocessing and multithreading?
As far as I understood we only wanted to keep multiprocessing, not multithreading @samsja
@JoanFM may know more. But to me, only multiprocessing makes sense. Can multithreading really improve performance here?
Just to keep the PR discussion up to date with what we discussed on Discord: there will be multithreading, since it makes sense for IO-bound ops and tf/np/torch stuff.
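For readers following the thread, here is a minimal sketch of what a backend switch could look like. It uses only the standard library; the name `map_items` and the `backend` parameter are hypothetical illustrations, not the API added in this PR.

```python
from multiprocessing.pool import Pool, ThreadPool
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar('T')
R = TypeVar('R')


def map_items(
    items: Iterable[T],
    func: Callable[[T], R],
    backend: str = 'thread',  # hypothetical parameter: 'thread' or 'process'
    num_worker: Optional[int] = None,
) -> Iterator[R]:
    """Lazily apply `func` to every item using a thread or process pool."""
    if backend == 'thread':
        pool_cls = ThreadPool  # suits IO-bound work and code that releases the GIL
    elif backend == 'process':
        pool_cls = Pool  # suits CPU-bound pure-Python work; `func` must be picklable
    else:
        raise ValueError(f"backend must be 'thread' or 'process', got {backend!r}")

    # imap yields results in input order as they become available.
    with pool_cls(processes=num_worker) as pool:
        yield from pool.imap(func, items)
```

Because this is a generator, nothing runs (and an invalid `backend` is not reported) until the result is iterated. With the process backend on platforms that spawn workers, the call site should also sit under an `if __name__ == '__main__':` guard.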
Map is already included, but private right now. Not sure if we want to expose it again, or only one of the two.
Looks good! We will make the map functions public, right?
Why should we keep only one, @samsja? I think both make sense: the user might already have an in-place or pure function that they want to use without rewriting it.
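To make the in-place versus pure distinction concrete, here is a small sketch. `Doc`, `apply_sketch`, and `map_sketch` are hypothetical stand-ins written for illustration; they are not the docarray classes or the functions from this PR.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document type."""
    text: str


def upper_in_place(doc: Doc) -> None:
    # In-place style: mutates its argument and returns nothing.
    doc.text = doc.text.upper()


def upper_pure(doc: Doc) -> Doc:
    # Pure style: leaves its argument untouched and returns a new object.
    return replace(doc, text=doc.text.upper())


def apply_sketch(docs: List[Doc], func: Callable[[Doc], Optional[Doc]]) -> None:
    """Modify `docs` itself; accepts both in-place and pure functions."""
    for i, doc in enumerate(docs):
        result = func(doc)
        if result is not None:  # a pure function returned a replacement
            docs[i] = result


def map_sketch(docs: Iterable[Doc], func: Callable[[Doc], Doc]) -> Iterator[Doc]:
    """Yield results one by one without touching the input collection."""
    for doc in docs:
        yield func(doc)
```

Under these assumptions, either kind of user function can be passed to the apply-style helper unchanged, while the map-style helper is the natural fit for pure functions.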
Signed-off-by: anna-charlotte <[email protected]>
📝 Docs are deployed on https://ft-feat-map-apply--jina-docs.netlify.app 🎉
Goals:
Add map_docs function and map_batch: This will be different from doing
Leverage multiprocessing; verify by benchmarking in a test that using 2 CPUs is faster than using 1
map_docs()
map_docs_batch()
benchmarking tests
check and update documentation, if required. See guide
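The benchmarking goal above could be approached roughly as follows. This is a sketch using the standard library only; `time_map`, `cpu_bound`, and `check_two_workers_faster` are hypothetical helper names and not the tests shipped in this PR.

```python
import time
from multiprocessing.pool import Pool
from typing import Callable, Sequence, Type


def time_map(pool_cls: Type[Pool], num_worker: int, func: Callable, items: Sequence) -> float:
    """Return the wall-clock seconds needed to map `func` over `items`."""
    start = time.perf_counter()
    with pool_cls(processes=num_worker) as pool:
        pool.map(func, items)  # blocks until every item is processed
    return time.perf_counter() - start


def cpu_bound(n: int) -> int:
    """A deliberately slow pure-Python workload (sum of squares below n)."""
    return sum(i * i for i in range(n))


def check_two_workers_faster(pool_cls: Type[Pool], func: Callable, items: Sequence) -> bool:
    """Compare one worker against two on the same workload."""
    one_worker = time_map(pool_cls, 1, func, items)
    two_workers = time_map(pool_cls, 2, func, items)
    return two_workers < one_worker
```

A test could then assert `check_two_workers_faster(Pool, cpu_bound, [2_000_000] * 8)`. Timing comparisons like this are sensitive to the machine: they need at least two available cores and a workload large enough to outweigh process start-up cost, so such a test may have to be skipped on constrained CI runners.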