Use pyarrow-backed DataFrame with read_csv #283

Open
wants to merge 6 commits into
base: main

Conversation

gsheni
Contributor

@gsheni gsheni commented Oct 10, 2023

  • Use PyArrow-backed data types when reading data, since PyArrow can be significantly faster and more memory-efficient than NumPy.
  • Starting with pandas 3.0, the default for strings will change to PyArrow strings (such as when calling read_csv).

@WillKoehrsen
Collaborator

@kmax12 Is this still something we want to do? Should be relatively easy to make the changes.

@kmax12
Collaborator

kmax12 commented Mar 11, 2024

I think, all else equal, it's worth doing. Not sure it'll have a huge impact on us right now, so if it causes downstream problems, we can punt for now.

3 participants