Support for ODBC driver connection type #116
jtcohen6 left a comment:
This is working for me locally, which is very exciting. Tiny comment for now around the nomenclature of the new connection endpoint.
dbt/adapters/spark/connections.py
Outdated
```python
class SparkClusterType(StrEnum):
    ALL_PURPOSE = "all-purpose"
    VIRTUAL = "virtual"
```
Databricks isn't using the name "virtual clusters" anymore; I believe they're just calling them "endpoints."
Rather than a combination of `cluster` and `cluster_type`, I think we should make the distinction between `cluster` (old style, all-purpose/interactive) and `endpoint` (new style). Users should specify either a `cluster` or an `endpoint` when connecting via `odbc`.
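For illustration, here is a hedged sketch of what the two profile shapes could look like under that proposal. Field names and values are assumptions drawn from this thread, not the merged spec:

```yaml
# Hypothetical dbt profile targets illustrating the cluster/endpoint split.
# Hostnames and IDs below are placeholders.
databricks_cluster:
  type: spark
  method: odbc
  host: my-workspace.cloud.databricks.com
  cluster: 1234-567890-abcde123   # old style: all-purpose/interactive cluster

databricks_endpoint:
  type: spark
  method: odbc
  host: my-workspace.cloud.databricks.com
  endpoint: 0123456789abcdef      # new style: SQL endpoint
```

The two keys being mutually exclusive means profile validation can reject a target that sets both.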
| f"{self.method} connection method requires " | ||
| "additional dependencies. \n" | ||
| "Install the additional required dependencies with " | ||
| "`pip install dbt-spark[ODBC]`" |
I landed on `pip install dbt-spark[ODBC]` because pyodbc is the only Python dependency that I imagine will require OS-level dependencies. Also, I think this should line up with connection methods (thrift, http, odbc) rather than connection locations (Databricks, etc.)?
I buy it!
Eventually, I think we might want to try moving `PyHive[hive]` to an extra instead of a principal requirement, since it's only necessary for the http connection method.
Justification: we see some installation errors (e.g. #114) resulting from less-maintained dependencies exclusive to PyHive.
Not something we need to do right now!
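For reference, a hedged sketch of how the extras could be arranged under that plan. The extra names and version pins here are assumptions for illustration, not what the repo actually ships:

```python
# Hypothetical extras_require layout: pyodbc stays behind the "ODBC" extra,
# and PyHive[hive] could eventually move behind an "http" extra too.
EXTRAS_REQUIRE = {
    "ODBC": ["pyodbc>=4.0.30"],
    "http": ["PyHive[hive]>=0.6.0,<0.7.0"],  # assumed pin, not the repo's actual one
}

# This dict would be passed to setuptools.setup(..., extras_require=EXTRAS_REQUIRE),
# so `pip install dbt-spark[ODBC]` pulls pyodbc and `dbt-spark[http]` pulls PyHive.
```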
jtcohen6
left a comment
This looks great! Thanks for the excellent, wide-reaching work to get this running with automated tests.
| f"{self.method} connection method requires " | ||
| "additional dependencies. \n" | ||
| "Install the additional required dependencies with " | ||
| "`pip install dbt-spark[ODBC]`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I buy it!
| f"{self.method} connection method requires " | ||
| "additional dependencies. \n" | ||
| "Install the additional required dependencies with " | ||
| "`pip install dbt-spark[ODBC]`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, I think we might want to try moving PyHive[hive] as an extra instead of principal requirement, since it's only necessary for the http connection method.
Justification: we see some installation errors (e.g. #114) resulting from less-maintained dependencies exclusive to PyHive
Not something we need to do right now!
jtcohen6
left a comment
A few tiny notes after rechecking the readme.
FYI removing myself from review, but this looks
Co-authored-by: Jeremy Cohen <[email protected]>
Resolves #104
Description
Add support for ODBC driver connections to Databricks, via either a cluster-specific path or a SQL endpoint path.
- `pyodbc` for using the ODBC driver for the connection
- `driver` and `endpoint` added to the profile config (`cluster` and `endpoint` are mutually exclusive; they determine how to connect to Databricks)

Note
At the time of writing, the new SQL Endpoint does not support `create temporary view`. Also, `extended` is prohibited by the ODBC driver for some operations (dbt-labs/dbt-adapter-tests#8).