Persistent file based catalog #122
Regarding the point on storing data: I wonder if the CacheManager could be extended / used to serialize the files / file_stats caches.
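As a rough illustration of what "serializing the file_stats cache" could mean, here is a std-only Rust sketch. It does not use DataFusion's actual `CacheManager` API; `FileStats`, `FileStatsCache`, `save`, and `load` are hypothetical names, and the tab-separated on-disk format is an assumption chosen for simplicity.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical, simplified per-file statistics (not DataFusion's `Statistics` type).
#[derive(Debug, Clone, PartialEq)]
pub struct FileStats {
    pub num_rows: u64,
    pub total_byte_size: u64,
}

/// Hypothetical cache keyed by object path, standing in for a file_stats cache.
#[derive(Debug, Default, PartialEq)]
pub struct FileStatsCache {
    entries: HashMap<String, FileStats>,
}

impl FileStatsCache {
    pub fn put(&mut self, path: &str, stats: FileStats) {
        self.entries.insert(path.to_string(), stats);
    }

    pub fn get(&self, path: &str) -> Option<&FileStats> {
        self.entries.get(path)
    }

    /// Writes one line per entry: `path<TAB>num_rows<TAB>total_byte_size`.
    /// Assumes paths contain no tabs or newlines.
    pub fn save(&self, file: &Path) -> io::Result<()> {
        let mut out = String::new();
        for (path, s) in &self.entries {
            out.push_str(&format!("{}\t{}\t{}\n", path, s.num_rows, s.total_byte_size));
        }
        fs::write(file, out)
    }

    /// Rebuilds the cache from a file written by `save`, skipping malformed lines.
    pub fn load(file: &Path) -> io::Result<Self> {
        let mut cache = Self::default();
        for line in fs::read_to_string(file)?.lines() {
            let parts: Vec<&str> = line.split('\t').collect();
            if let [path, rows, bytes] = parts.as_slice() {
                if let (Ok(num_rows), Ok(total_byte_size)) =
                    (rows.parse::<u64>(), bytes.parse::<u64>())
                {
                    cache.put(path, FileStats { num_rows, total_byte_size });
                }
            }
        }
        Ok(cache)
    }
}
```

A real version would more likely hook into whatever trait the cache already implements and use a proper serialization format, but the round trip (save on exit, load on startup) is the core idea.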
That is an excellent idea. It would be really sweet to have an excuse to work on that API (I bet it is not used anywhere near as much as it could be).
I have subsequently learned about the config file feature. However, one difference is that the config file basically applies to all sessions, whereas this catalog would be very explicitly selected by a user when running dft. Thus it seems like the config file would be a great place to put credentials, for example, that you wanted to apply to all catalogs / sessions.
There are two different setups that I think are relevant to this feature:
For example, this is mine:
I think the setup for this could be improved (it's still based on the old setup from a couple of years ago). I think moving the file to …

One thing to note: I have been having trouble getting the custom catalog / schemas to work (I created an issue for it). Maybe it's just user error, but I haven't had time to look too deeply into it.
I agree that there is an important distinction between "configuration" (with e.g. credentials) that potentially applies to all sessions and the DDL/catalog setup part. I think some systems permit creating credentials that are stored as part of the catalog (…).

I'll have to mess with it to see what is happening.
These are my own personal aspirations / goals for a "file based catalog".
Use cases
Use case 1: pre-configured EXTERNAL TABLES
I would like to be able to set up some table definitions in dft and then reuse them from session to session.
For example:
CREATE EXTERNAL TABLE ... STORED AS DELTA TABLE WITH CREDENTIALS ....
And then have this configuration available to any dft session.
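One minimal way to get "define once, available in every session" is to keep the DDL statements (like the CREATE EXTERNAL TABLE above) in a file and replay them at startup. The sketch below assumes a hypothetical DDL file and a caller-supplied `execute` callback; in dft that callback would presumably hand each statement to DataFusion's `SessionContext::sql`, but that wiring, the function names, and the file layout are all assumptions for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Splits a DDL script into individual statements on `;`.
/// Naive: does not handle semicolons inside string literals or comments.
pub fn split_statements(script: &str) -> Vec<String> {
    script
        .split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect()
}

/// Reads a (hypothetical) DDL file and feeds each statement to `execute`,
/// stopping at the first failure. Returns how many statements ran.
pub fn replay_ddl_file<F>(path: &Path, mut execute: F) -> io::Result<usize>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let script = fs::read_to_string(path)?;
    let statements = split_statements(&script);
    for stmt in &statements {
        // Surface the executor's error as an io::Error so callers see one error type.
        execute(stmt).map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    }
    Ok(statements.len())
}
```

Passing the executor in as a closure keeps the replay logic independent of how statements are actually run, which also makes it easy to test without a query engine.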
I believe this use case is already partly handled by the config file feature. However, there are some other things I would like:
Use case 2: ephemeral data
Today, when you run queries like this in dft and then start another session of dft, foo is gone. The issue is that the default catalog in DataFusion is an ephemeral, in-memory one, so there is no place to store data such as foo.
Desired Behavior
What I would like is for dft to operate similarly to SQLite or DuckDB.
By default, an ephemeral in-memory catalog is used and nothing is saved after the session quits.
However, a database file can be "opened", and if so, all changes made to the catalog are stored in that file. If the file is reopened on a subsequent invocation of the program, all the DDL / catalog information is still present.
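The SQLite/DuckDB-style behavior described above can be sketched as a catalog with an optional backing file: with no file, changes live only in memory; with a file, each change is appended as it happens and replayed on the next open. `Catalog`, `in_memory`, `open`, and `create_table` are hypothetical names, and the one-entry-per-line format is an assumption; this is not how DataFusion's `CatalogProvider` works today.

```rust
use std::collections::BTreeMap;
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical catalog: table name -> DDL that defined it.
pub struct Catalog {
    tables: BTreeMap<String, String>,
    /// `None` means ephemeral (in-memory only), like running `sqlite3` with no file.
    backing_file: Option<PathBuf>,
}

impl Catalog {
    /// Default mode: nothing survives the session.
    pub fn in_memory() -> Self {
        Catalog { tables: BTreeMap::new(), backing_file: None }
    }

    /// "Opens" a database file, replaying any definitions already stored in it.
    /// Assumes one `name<TAB>ddl` entry per line with no tabs/newlines inside;
    /// a later entry for the same name overwrites an earlier one.
    pub fn open(path: &Path) -> io::Result<Self> {
        let mut tables = BTreeMap::new();
        if path.exists() {
            for line in fs::read_to_string(path)?.lines() {
                if let Some((name, ddl)) = line.split_once('\t') {
                    tables.insert(name.to_string(), ddl.to_string());
                }
            }
        }
        Ok(Catalog { tables, backing_file: Some(path.to_path_buf()) })
    }

    /// Registers a table; if a file is open, the change is appended to it immediately.
    pub fn create_table(&mut self, name: &str, ddl: &str) -> io::Result<()> {
        if let Some(path) = &self.backing_file {
            let mut f = OpenOptions::new().create(true).append(true).open(path)?;
            writeln!(f, "{}\t{}", name, ddl)?;
        }
        self.tables.insert(name.to_string(), ddl.to_string());
        Ok(())
    }

    /// Table names in sorted order.
    pub fn table_names(&self) -> Vec<&str> {
        self.tables.keys().map(String::as_str).collect()
    }
}
```

Appending on every change (rather than writing the whole catalog on exit) means a crash or Ctrl-C does not lose definitions made earlier in the session, at the cost of a log that can grow and may eventually need compaction.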
Something like