Description
Describe the bug
quick_are_queries_identical
macros is not working as expected when there is a empty query result/table is involved.
Steps to reproduce
Scenario 1:
when both the query_a and query_b returns no result(empty output) then the quick_are_queries_identical returns false but ideally it should return true since both the query returns empty result.
Expected results True
Actual results False
{% set old_query %}
select
*
from `bigquery-public-data.faa.us_airports` LIMIT 0
{% endset %}
{% set new_query %}
select
*
from `bigquery-public-data.faa.us_airports` LIMIT 0
{% endset %}
{{
audit_helper.quick_are_queries_identical(
old_query,
new_query,
columns=['object_id','global_id','faa_identifier','name','latitude','longitude','airport_geom','elevation','icao_id','airport_type','service_city','state_abbreviation','country','oper_status','airport_use','iap_exists','dod_hiflip','far_91','far_93','mil_code','airspace_analysis','us_high','us_low','ak_high','ak_low','us_area','pacific']
)
}}
Screenshots and log output
Scenario 2:
when one of the query_a and query_b returns no result(empty output) then the quick_are_queries_identical returns true but ideally it should return false since one of the query returns value and other return empty result.
Expected results False
Actual results True
{% set old_query %}
select
*
from `bigquery-public-data.faa.us_airports`
{% endset %}
{% set new_query %}
select
*
from `bigquery-public-data.faa.us_airports` LIMIT 0
{% endset %}
{{
audit_helper.quick_are_queries_identical(
old_query,
new_query,
columns=['object_id','global_id','faa_identifier','name','latitude','longitude','airport_geom','elevation','icao_id','airport_type','service_city','state_abbreviation','country','oper_status','airport_use','iap_exists','dod_hiflip','far_91','far_93','mil_code','airspace_analysis','us_high','us_low','ak_high','ak_low','us_area','pacific']
)
}}
Screenshots and log output
System information
The contents of your packages.yml
file:
packages:
- package: dbt-labs/dbt_external_tables
version: [">=0.8.0", "<0.9.0"]
- package: dbt-labs/dbt_utils
version: [">=0.9.0", "<1.0.0"]
- package: dbt-labs/codegen
version: [">=0.8.0", "<=0.9.0"]
- package: dbt-labs/audit_helper
version: [">=0.6.0", "<0.12.1"]
Which database are you using dbt with?
- postgres
- redshift
- bigquery
- snowflake
- other (specify: ____________)
The output of dbt --version
:
latest
Additional context
I believe the reason this is happening is because bit_xor of no result will be NULL and when we do count(distinct hash_result) the NULL value won't be taken into account.
select count(distinct hash_result) = 1 as are_tables_identical
from (
select bit_xor(farm_fingerprint(to_json_string(query_a))) as hash_result
from query_a
union all
select bit_xor(farm_fingerprint(to_json_string(query_b))) as hash_result
from query_b
) as hashes
to handle this we can add IFNULL() to the hash_result
Are you interested in contributing the fix?
Yes!
Activity