-
Notifications
You must be signed in to change notification settings - Fork 504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add unit argument and apply conversion for kms #340
feat: add unit argument and apply conversion for kms #340
Conversation
ee1807a
to
250eb00
Compare
🙌 Thanks @bastienboutonnet! This repo contains another project which imports dbt-utils and runs integration tests against seed files. Check out https://github.com/fishtown-analytics/dbt-utils/pull/334/files for an example of configuring tests |
Thanks for the pointers @joellabes I'll see about writing a test for this function at some point this week |
eaa7549
to
cf3c822
Compare
@joellabes I added some tests and they seem to pass locally and on circle CI except for snowflake and BQ. I'm assuming because I'm a 3rd party these tests are bound to fail? Either way, let me know if that's the issue of if there is anything I should do (or not do) in the code. Regarding the changelog todo on the PR template my question still stands let me know what I should do when we get closer to merge I guess? |
I don't think that's true - I've had all the adapters succeed on past PRs. My best guess is that BQ and SF do something differently - perhaps their column type inference isn't playing nicely with the seed, or the In 0.7.0 is the next significant (intentionally avoiding semver words) version of dbt-utils, so I expect that this sort of net-new functionality would go there as opposed to a patch version. Again a good q for Claire |
@joellabes with the union part of your message I'm not sure I understand. I think I understand you'd like a test where we call the macro without the unit argument which makes tons of sense. But I don't know why union. Did you have something in mind? Btw yeah you're right the Snowflake issue seems to be with the fact that it doesn't like |
Broadly, I was thinking something like
(written on my phone so only very broadly accurate) By doing that, you’d get 0, 1 or 2 failing rows depending on which part broke. It’s probably bad practice to overload the test, but I was trying to avoid having to make a third separate query 😬 not sure if there’s a SF-friendly version of the macro sorry! Weird that it got this far without breaking. Maybe it’s never worked but no one uses this on BQ or SF |
@joellabes Ok I see what you meant. I can get that done in that way I think it's not that bad actually coming to think about it. So yeah I checked into my team's macro for haversine and we have used the But YES! very strange people didn't complain so far, or maybe they have been like us and have used their own macros for haversine. Anyway, would be a great idea to fix that for everyone right? UPDATE: I checked the postgres documentation and it looks like they also have a UPDATE 2 actually BQ fails on the |
@joellabes I just implemented the unioned test and it seems to work fine on pg and redshift. What do you think should happen qua the other dbs? Do we want to make adapters part of this PR? I think if so, maybe we shouldn't as it's gonna be a pretty haphazard PR but I'd be happy to contribute to a snowflake adapter if I can figure out how the adapters macros are done here. It's been a while since my materialisation macros that I contributed in core. |
I think I may just be good with creating a new macro and name it For BQ I was looking and they don't seem to have Let me know what makes sense. |
Sorry for the delay getting back to you!
Yes I think that's right! Re BigQuery, if it was already broken and you don't use BQ, I don't think you need to feel obliged to bend over backwards to make it work. You could either add an implementation that no-ops and returns null, or throw the equivalent of a NotImplementedException, or do nothing at all and it will throw a more confusing error. When a BQ user comes along who wants this, they can always open another PR. I think there are other macros that do this but I don't remember which offhand. |
no worries at all @joellabes Ok I'll have a go at a snowflake implementation for sure because I can actually test it out. For BQ I think what you propose makes sense, however not sure why you're pointing me to the Microsoft website here:
I'd actually like to be able to throw a proper error if BQ users were to use this macro rather than a null. How best do you think we can do that? |
In a past life I was a C# developer, so that was the concept I knew it as - I'm not sure what the dbtonic way of doing this is. The closest equivalent I could find was https://github.com/fishtown-analytics/dbt-utils/blob/c64b52068070023bc3524525e06a14ad72d485a4/macros/cross_db_utils/datediff.sql#L55. I think you'd basically have to make a stub of a bigquery implementation that threw the exception. Then there would have to be some way of excluding that one from CI, because an exception that's intentionally thrown is still an exception... Hopefully @clrcrl has some way of excluding them somehow 🤔 |
05b2eb7
to
a4db0f3
Compare
@joellabes so I looked into the postgres and redshift documentations and they also have the I'll work on making the BQ exception raising happen soon. |
fa915ca
to
c1ae430
Compare
@joellabes @clrcrl in the end I went ahead and also worked an adaptor macro for big query, it was a bit of a pain but I managed. I created so many failing CI runs I hope you folks don't get notified on each failures. Anyway, let me know if you have any additional feedback |
Incredible, thanks so much @bastienboutonnet! Sorry that this turned out to be so much more work than first thought 😭 |
Ah no worries, if I didn't want to do it I would have said it. I was stuck on my other open source projects today so it was something fun and useful to do. Is there a changelog entry I should be writing btw? And or some documentation? Also, not sure if this should be merged into the main branch or develop. I guess by this point this PR is more of a feature than a bugfix so I rebase on |
Not sure on those logistics - Claire's got "clear the packages backlog" on her sprint for this fortnight so I imagine she'll be checking this out pretty soon though! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love these changes! Yay km!!!
I wrote some suggestions for how this code might be made a little cleaner :)
macros/geo/haversine_distance.sql
Outdated
{% endmacro %} | ||
|
||
{% macro default__haversine_distance(lat1,lon1,lat2,lon2) -%} | ||
{% macro default__haversine_distance(lat1,lon1,lat2,lon2,unit) -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's set this as an optional argument with a default!
{% macro default__haversine_distance(lat1,lon1,lat2,lon2,unit) -%} | |
{% macro default__haversine_distance(lat1, lon1, lat2, lon2, unit='mi') -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, wasn't sure if we wanted to implement it on the specific macros but that makes sense.
macros/geo/haversine_distance.sql
Outdated
{% macro default__haversine_distance(lat1,lon1,lat2,lon2) -%} | ||
{% macro default__haversine_distance(lat1,lon1,lat2,lon2,unit) -%} | ||
{# vanilla macro is in miles #} | ||
{% set conversion_rate = '' %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little cleaner, and has some error handling. I like setting variables as numbers instead of strings where possible, so I've updated the code below as well
{% set conversion_rate = '' %} | |
{%- if unit == 'mi' %} | |
{% set conversion_rate = 1 %} | |
{% elif unit == 'km' %} | |
{% set conversion_rate = 1.60934 %} | |
{% else %} | |
{{ exceptions.raise_compiler_error("unit input must be one of 'mi' or 'km'. Got ' ~ unit) }} | |
{% endif -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks thats a lot better indeed! Done :)
ffe57ea
to
ba5d2c6
Compare
multiply instead of divide and move logic in core macro rename things a bit update readme
too many curly brackets make params be strings pass strings remove round to see if this was the problem let's try casting round the output field instead add commas cast the macro result directly rename expected to output is it a rounding error swap the units what is going on... throw a log to see what we get remove curlies try computing in a CTE first' add comma I'm sleeping... fixup! I'm sleeping... rename in CTE dont quote? fix macro and separate unit in two tests fix yaml indent fix indentation in schema yaml alias the field clean up fixup! clean up make the macro log stuff remove logging in test macro remove the select star
ba5d2c6
to
0e151dd
Compare
Hey @clrcrl thanks a lot for the feedback all done. Hope this works 🥳 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
@clrcrl Thanks for merging the PR. Btw, I had not added any changelog entries yet because I wasn't sure whether this was bug or feature. Would you like me to add something to the changelog or you will do it when you release? |
This is a:
bug fix PR with no breaking changes — please ensure the base branch is
master
new functionality — please ensure the base branch is the latest
dev/
branchTo be honest I'm not too sure what this counts as. Haversine distance macro doesn't support kms #338 is labelled as a bug, but it's also kind of a feature request --let me know what you want to consider this one as and I'll adjust the diff target
a breaking change — please ensure the base branch is the latest
dev/
branchDescription & motivation
the
haversine_distance()
calculation outputs results in miles only at the minute. Interest in being able to output haversine in kilometers was registered in #338 .Changes
calling with
unit='km'
(distance between Amsterdam and Paris)returns ~ 430
calling witout a unit param or (
unit='mi'
)returns ~267
Closes #338
Checklist
I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
I have updated the README.md (if applicable)
I have added tests & descriptions to my models (and macros if applicable)
I have added an entry to CHANGELOG.md
0.7.0
but not sure if you follow that here, so let me know and then I can update the changelog accordingly.