
[CT-202] Workaround for some limitations due to list_relations_without_caching method #513

Open
@danfran

Description

Describe the feature

I am currently facing an issue using DBT with Spark on an AWS Glue/EMR environment, as already discussed in dbt-labs/dbt-spark#215 (and originally raised in dbt-labs/dbt-spark#93).

The current issue concerns the adapter's list_relations_without_caching method:

https://github.com/dbt-labs/dbt/blob/HEAD/core/dbt/include/global_project/macros/adapters/common.sql#L240

which in the Spark Adapter implementation is:

https://github.com/dbt-labs/dbt-spark/blob/a8a85c54d10920af1c5efcbb4d2a51eb7cfcad11/dbt/include/spark/macros/adapters.sql#L133-L139

In this case you can see that the command show table extended in {{ relation }} like '*' is executed. It forces Spark to go through the info of all the tables in the schema (as Spark has no Information Schema layer) in a sort of "discovery mode", and this approach produces two main issues:

  1. Bad performance: some environments can have hundreds or even thousands of tables in the same schema, generated not only by DBT but also by other processes. In that case this operation can be very costly, especially when different DBT processes run updates on a few tables at different times.

  2. Instability: as I verified in the AWS Glue/EMR environment, you can have views without an "S3 Location" defined (such as an Athena/Presto view), which will crash a DBT process running SparkSQL on EMR with errors like:

show table extended in my_schema like '*'
  ' with 62bb6394-b13a-4b79-92dd-2e0918831cf3
21/09/18 13:00:22 INFO SparkExecuteStatementOperation: Running query with 62bb6394-b13a-4b79-92dd-2e0918831cf3
21/09/18 13:01:03 INFO DAGScheduler: Asked to cancel job group 62bb6394-b13a-4b79-92dd-2e0918831cf3
21/09/18 13:01:03 ERROR SparkExecuteStatementOperation: Error executing query with 62bb6394-b13a-4b79-92dd-2e0918831cf3, currentState RUNNING,
org.apache.spark.sql.AnalysisException: java.lang.IllegalArgumentException: Can not create a Path from an empty string

Describe alternatives you've considered

I do not see why a DBT process should care about the "rest of the world": the Athena views mentioned above, or tables created by other processes in the same schema.

So, ideally, I would like to replace the command:

show table extended in <schema> like '*'

with something like:

show table extended in <schema> like ('<table1>|<table2>|…')

where <table1>, <table2>, etc. are determined automatically when I run a command like

dbt run --models my_folder

where my_folder contains the files table1.sql, table2.sql, etc.

However, with the current method interface, only the schema parameter can be passed.
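To make the proposal concrete, here is a minimal sketch, in Python for illustration only (the real change would live in the adapter's Jinja macro), of how such a restricted pattern could be assembled from the selected model names. build_table_pattern is a hypothetical helper; dbt does not currently expose the selected models to this macro:

```python
import re

def build_table_pattern(model_names):
    """Build a restricted pattern for `show table extended in <schema> like (...)`.

    Spark's SHOW TABLES LIKE syntax accepts multiple patterns separated
    by '|', so listing only the models dbt is about to build avoids
    scanning the whole schema.
    """
    # dbt model names are simple identifiers, but guard against
    # characters that are special in the LIKE glob syntax anyway.
    safe = [re.sub(r"[^A-Za-z0-9_]", "_", name) for name in model_names]
    return "('{}')".format("|".join(safe))

# For the models selected by `dbt run --models my_folder`, where
# my_folder contains table1.sql and table2.sql:
pattern = build_table_pattern(["table1", "table2"])
sql = "show table extended in my_schema like {}".format(pattern)
# → show table extended in my_schema like ('table1|table2')
```

With a pattern like this, only the tables dbt actually manages are inspected, which would sidestep both the performance cost and the crashes caused by unrelated views in the schema.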

Two questions here: how can I automatically infer the names of the tables involved when a command like dbt run --models my_folder runs, and how can I then pass them to list_relations_without_caching?

Additional context

I found this relevant for Spark on AWS environments, but it can potentially be a similar issue for other implementations.

Who will this benefit?

On DBT's Slack channel I talked to another user "affected" by a similar issue, but probably anyone using Spark in a distributed environment (AWS or otherwise) can be affected by this.

Are you interested in contributing this feature?

Sure, both coding and testing.
