Update harlequin-databricks docs with latest (#97)

connection methods and recommendations
tconbeer · Sep 3, 2024 · 66af2fd
1 parent 9210923
commit 66af2fd
Showing 1 changed file with 64 additions and 23 deletions.
diff --git a/src/docs/databricks/index.md b/src/docs/databricks/index.md
@@ -53,51 +53,92 @@ pipx install harlequin[databricks]
 
 ## Usage and Configuration
 
-For a minimum connection you are going to need:
+To connect to Databricks, provide the following as CLI arguments:
 
 - server-hostname
 - http-path
-- access-token
+- credentials for one of the following authentication methods:
+  - a personal access token (PAT)
+  - a username and password
+  - an OAuth U2M auth type
+  - a service principal client ID and secret for OAuth M2M
+
+
+### Personal Access Token (PAT) authentication:
 
 ```bash
-harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --access-token dabpi***
+harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***
 ```
 
-Authentication is also possible using a username and password (known as basic authentication):
+### Username and password (basic) authentication:
 
 ```bash
-harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --username my_user --password my_pass
+harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --username *** --password ***
 ```
 
-Or by using [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/en/dev-tools/python-sql-connector.html#auth-u2m):
+### OAuth U2M authentication:
+
+For [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/en/dev-tools/python-sql-connector.html#auth-u2m)
+supply either `databricks-oauth` or `azure-oauth` to the `--auth-type` CLI argument:
 
 ```bash
-harlequin -a databricks --server-hostname my_databricks.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --auth-type databricks-oauth
+harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --auth-type databricks-oauth
 ```
 
-For more details on command line options, run:
+### OAuth M2M authentication:
+
+For [OAuth machine-to-machine (M2M) authentication](https://docs.databricks.com/en/dev-tools/python-sql-connector.html#oauth-machine-to-machine-m2m-authentication)
+you need to `pip install databricks-sdk` as an additional dependency
+([databricks-sdk](https://github.com/databricks/databricks-sdk-py) is an optional dependency of
+`harlequin-databricks`) and supply `--client-id` and `--client-secret` CLI arguments:
 
 ```bash
-harlequin --help
+harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --client-id *** --client-secret ***
 ```
 
-## Using Unity Catalog and experiencing slow legacy `hive_metastore` indexing?
+## Store an alias for your connection string
+
+We recommend you include an alias for your connection string in your `.bash_profile`/`.zprofile` so
+you can launch harlequin-databricks with a short command like `hdb` each time.
+
+Run this command (once) to create the alias:
+
+```bash
+echo 'alias hdb="harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/1234567890abcdef --access-token dapi***"' >> ~/.bash_profile
+```
 
-Indexing legacy metastores is slow on Databricks because it requires a SQL call for every table in
-the legacy metastore to extract column metadata. This means refreshing Harlequin's Data Catalog
-pane takes a long time for Databricks instances with lots of tables in legacy metastores like
-`hive_metastore`.
+## Using Unity Catalog and want fast Data Catalog indexing?
 
-If your Databricks instance runs Unity Catalog, and you only want the Unity Catalog assets
-listed in the Data Catalog pane, supply the `--skip-legacy-indexing` CLI flag when loading
-Harlequin.
+Supply the `--skip-legacy-indexing` command line flag if you do not care about legacy metastores
+(e.g. `hive_metastore`) being indexed in Harlequin's Data Catalog pane.
 
-This flag means only Unity Catalogs will be indexed - legacy metastores will not appear.
+This flag will skip indexing of old non-Unity Catalog metastores (i.e. they won't appear in the
+Data Catalog pane with this flag).
+
+Because of the way legacy Databricks metastores work, a separate SQL query is required to fetch
+the metadata of each table in a legacy metastore. This means indexing them for Harlequin's Data Catalog pane is slow.
+
+Databricks's Unity Catalog upgrade brought
+[Information Schema](https://docs.databricks.com/en/sql/language-manual/sql-ref-information-schema.html),
+which allows harlequin-databricks to fetch metadata for all Unity Catalog assets with only two SQL queries.
+
+So if your Databricks instance is running Unity Catalog, and you no longer care about the legacy
+metastores, setting the `--skip-legacy-indexing` CLI flag is recommended as it will mean
+much faster indexing & refreshing of the assets in the Data Catalog pane.
+
+## Other CLI options:
+
+For more details on command line options, run:
+
+```bash
+harlequin --help
+```
 
-Indexing Unity Catalogs is a super-fast operation requiring Harlequin to send only two SQL queries
-to Databricks because of
-[Information Schema](https://docs.databricks.com/en/sql/language-manual/sql-ref-information-schema.html).
+## Issues, Contributions and Feature Requests
 
-## Issues and Contributing
+Please report bugs/issues with the harlequin-databricks adapter via its GitHub
+[issues](https://github.com/alexmalins/harlequin-databricks/issues) page. You are welcome to
+attempt fixes yourself by forking that repo then opening a [PR](https://github.com/alexmalins/harlequin-databricks/pulls).
 
-Head over to the [alexmalins/harlequin-databricks](https://github.com/alexmalins/harlequin-databricks/) repo on GitHub.
+For feature suggestions, please post in the harlequin-databricks repo
+[discussions](https://github.com/alexmalins/harlequin-databricks/discussions).