You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am using dbt_utils.union_relations to union two tables that have the same column names, but not same data types. So the macro compiles the code by doing a SELECT cast(col1 as {data_type}) as col1.... for each table it has in the UNION on the compiled code. However, I am trying to union two models A and B, both have already materialized as tables. And Both have a column called keyword_id as numeric but of different size. Table A has it defined as numeric(1,0) while table B as numeric(18,0) and it's fine because at table A the values are single digits while at B they are very large.
The issue is, when I do dbt_utils.union_relations(relations=[ref('A'),ref('B')] it fails, while if I just invert the order and do dbt_utils.union_relations(relations=[ref('B'),ref('A')] it works
I checked the compiled code to see why. And its because when A comes before B, it guesses the data type of col keyword_id as number(1,0) (in other words, it took its decision based on table A apparently). While in the scenario where we have table B and then A it guesses the data type as number(18,0).
In the compiled code of scenario 1 (A and then B) it does
SELECT ...., cast(keyword_id as number(1,0)) as keyword_id,...<more_columns> FROM A
UNION
SELECT ...., cast(keyword_id as number(1,0)) as keyword_id,...<more_columns> FROM B
While in scenario 2 (B then A):
SELECT ...., cast(keyword_id as number(18,0)) as keyword_id,...<more_columns> FROM B
UNION
SELECT ...., cast(keyword_id as number(18,0)) as keyword_id,...<more_columns> FROM A
So the order in which we add the relations in the macro has a an impact on how it guesses the data types. And in a certain combination of order of tables and their data types it might choose data types that will error out.
Steps to reproduce
Have two models which have the same column names, they can be identical, just in one model add a column keyword_id as always a single digit number; and in the other model make it a lot of digits (18 for example). Both models need to materialize as tables (in my case it was in snowflake) and have to be built and populated. Then create a third model and use the union macro to see the compile code in the scenario where you put the order of relations as model1 and then model2, vs the reverse scenario. Under the compile code check the way it casts the keyword_id column and how it changes from one scenario to another
Expected results
Ideally, the order on how we pass the references to the macro should not impact in the final result on how it guesses the data type. Or in another possible solution, the union macro should accept some data types parameter or some parameter to define a strategy on how to guess data types
Describe the bug
I am using
dbt_utils.union_relations
to union two tables that have the same column names, but not same data types. So the macro compiles the code by doing aSELECT cast(col1 as {data_type}) as col1....
for each table it has in the UNION on the compiled code. However, I am trying to union two modelsA
andB
, both have already materialized as tables. And Both have a column calledkeyword_id
as numeric but of different size. TableA
has it defined as numeric(1,0) while tableB
as numeric(18,0) and it's fine because at table A the values are single digits while at B they are very large.The issue is, when I do
dbt_utils.union_relations(relations=[ref('A'),ref('B')]
it fails, while if I just invert the order and dodbt_utils.union_relations(relations=[ref('B'),ref('A')]
it worksI checked the compiled code to see why. And its because when A comes before B, it guesses the data type of col
keyword_id
as number(1,0) (in other words, it took its decision based on table A apparently). While in the scenario where we have table B and then A it guesses the data type as number(18,0).In the compiled code of scenario 1 (A and then B) it does
While in scenario 2 (B then A):
So the order in which we add the relations in the macro has a an impact on how it guesses the data types. And in a certain combination of order of tables and their data types it might choose data types that will error out.
Steps to reproduce
Have two models which have the same column names, they can be identical, just in one model add a column
keyword_id
as always a single digit number; and in the other model make it a lot of digits (18 for example). Both models need to materialize as tables (in my case it was in snowflake) and have to be built and populated. Then create a third model and use the union macro to see the compile code in the scenario where you put the order of relations as model1 and then model2, vs the reverse scenario. Under the compile code check the way it casts thekeyword_id
column and how it changes from one scenario to anotherExpected results
Ideally, the order on how we pass the references to the macro should not impact in the final result on how it guesses the data type. Or in another possible solution, the union macro should accept some data types parameter or some parameter to define a strategy on how to guess data types
System information
The contents of your
packages.yml
file:packages:
version: 1.3.0
version: [">=0.10.0", "<0.11.0"]
Which database are you using dbt with?
The output of
dbt --version
:Are you interested in contributing the fix?
I would like to contribute if I had some guidance, but I have never navigated the repo before and I am not an expert on the source code
The text was updated successfully, but these errors were encountered: