Add handling for android-app:// in get_url_host() macro #426

Merged
merged 15 commits into from
Nov 9, 2021

@foundinblank foundinblank commented Oct 1, 2021

This is a:

  • bug fix PR with no breaking changes — please ensure the base branch is master
  • new functionality — please ensure the base branch is the latest dev/ branch
  • a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

In my Segment page view data, we see referrer values such as android-app://m.facebook.com/. These seem to be specific for Android apps and take the place of http://. The current get_url_host() macro will return an android-app:// value instead of the hostname for those referrer URLs. Be great if the get_url_host() macro handled those too!
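To make the reported behavior concrete, here is a simplified Python model of the host extraction. It is only a sketch under assumptions: it supposes the macro strips a fixed list of scheme prefixes and then keeps the text before the first `/` and `?`. The Python function `get_url_host` and its `schemes` parameter are illustrative stand-ins, not the macro's real signature.

```python
def get_url_host(url, schemes=("http://", "https://")):
    """Simplified model: strip known scheme prefixes, then cut at '/' and '?'."""
    for scheme in schemes:
        url = url.replace(scheme, "")
    # Keep only the text before the first path separator, then before any query string.
    return url.split("/", 1)[0].split("?", 1)[0]


referrer = "android-app://m.facebook.com/"

# With only http:// and https:// handled, the Android scheme is not stripped,
# so the result is a fragment of the scheme rather than the hostname.
before = get_url_host(referrer)

# Adding android-app:// to the handled prefixes yields the hostname.
after = get_url_host(referrer, schemes=("http://", "https://", "android-app://"))

print(before)
print(after)
```

In this model the only difference between the two calls is the list of prefixes, which is the change this PR proposes for the macro.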

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I have "dispatched" any new macro(s) so non-core adapters can also use them (e.g. the star() source)
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@foundinblank

  1. I'd add an entry to CHANGELOG but it seems it's on 0.7.1 while the latest release is 0.7.3? Not sure what to do here.
  2. Is this a bug fix or new functionality? I first tried it as new functionality, but GitHub suggested I couldn't merge to the latest dev branch, so I'm trying this as a bug fix and merging to master.

@foundinblank foundinblank marked this pull request as ready for review October 1, 2021 11:31

@joellabes joellabes left a comment

Hey @foundinblank, thanks for the PR! It looks reasonable to me.

Could I also ask you to flesh out the integration test (source seed here) to reflect your newly-supported URI scheme? This will mean that we get tests across all of the adapters.

The only thing that strikes me is that we're now at 3 replacements deep (http://, https://, android-app://). If it went any further, I'd be thinking that a regexp_replace for something along the lines of ^[-A-Za-z]+:\/\/ would be more appropriate (in English: from the start of the string, 1+ alpha chars or hyphens, followed by ://). What do you think? Are there likely to be other URI schemes that we need to handle?
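As a quick illustration of what that pattern matches, here is a minimal Python sketch using the standard `re` module. The helper name `strip_scheme` is made up for this example, and warehouse regex functions may differ in escaping rules from Python's.

```python
import re

# Pattern from the comment above: from the start of the string, one or more
# letters or hyphens, followed by "://". The slashes need no escaping in Python.
SCHEME_PREFIX = re.compile(r"^[-A-Za-z]+://")


def strip_scheme(url):
    """Remove a leading URI scheme such as http://, https:// or android-app://."""
    return SCHEME_PREFIX.sub("", url)


for example in ("http://example.com/a", "android-app://m.facebook.com/", "example.com"):
    print(strip_scheme(example))
```

Note that this character class covers only letters and hyphens. URI schemes may also contain digits, `+` and `.`, so a scheme like `coap+tcp://` would pass through unchanged.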

Regarding your target branch question, it's on the borderline but I'd probably call it new functionality. Let me look into the branch issue you mentioned!

With dbt 1.0 coming later this year, we're thinking about exactly how we handle the migration, but I'm expecting another minor release of dbt-utils with new functionality like this PR before then!

@foundinblank

Hey @joellabes, that's a great idea to use regexp_replace, so I've swapped that in and updated the integration test seed file. I keep running into this error, though:

Compilation Error in model test_url_host (models/web/test_url_host.sql)
  'regexp_replace' is undefined

I'm pretty confused by it and have checked that the regexp_replace() function exists on all 4 databases. Googling it isn't much help either.

@joellabes

Ah! I think your problem is that you'll need to wrap the whole command in quotes, because the entire command is being passed into dbt_utils.split_part(). Right now, the dbt compiler is looking for a Jinja macro called regexp_replace.

"regexp_replace(field, '^[-A-Za-z]+:\/\/', "''""

Unfortunately, there's already a lot of quotes in the mix so I doubt that will work as written. Does Jinja let you escape with backslashes etc? I don't know off the top of my head.

If it doesn't, then maybe we want to go down the regexp_substr route instead of using split_part.

Or, we could set the replace command outside of the parsed block. In handwavy pseudocode, something like

{% set replaced = regexp_replace() %}

{% set parsed = split_part(split_part(replaced)) %}

{{ dbt_utils.safe_cast() }} --as it currently is

would probably work.
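The idea of building the replace expression first and then handing it on as text can be modeled in Python with plain string-building helpers. This is a sketch under assumptions: each helper stands in for a macro that returns a SQL fragment as a string, the names `regexp_replace_sql` and `split_part_sql` are hypothetical (not the dbt_utils API), and `referrer` is an assumed column name. The rendered SQL is illustrative only, since regex functions vary by warehouse.

```python
def regexp_replace_sql(expression, pattern, replacement):
    """Hypothetical helper: render a regexp_replace(...) call as SQL text."""
    return f"regexp_replace({expression}, '{pattern}', '{replacement}')"


def split_part_sql(expression, delimiter, part_number):
    """Hypothetical helper: render a split_part(...) call as SQL text."""
    return f"split_part({expression}, '{delimiter}', {part_number})"


# Step 1: build the scheme-stripping expression once, as a string.
replaced = regexp_replace_sql("referrer", "^[-A-Za-z]+://", "")

# Step 2: pass that string into the nested split_part calls.
parsed = split_part_sql(split_part_sql(replaced, "/", 1), "?", 1)

print(parsed)
```

Because every step only manipulates strings, nothing here is looked up as a callable named `regexp_replace`, which is the distinction behind the compilation error above.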

Sorry that this is blowing up a bit!

@foundinblank

@joellabes I couldn't make it work, so I reverted to the original approach with 3 nested replacements, which does work. I don't think we'll have more URI schemes other than http, https, and android-app in the future, but if we do, maybe whoever updates this macro next will be better at Jinja parsing than I am!

@joellabes

I am 100% on board with this as an approach. Godspeed the next person.

I feel guilty that I didn't have a stable dbt installation until last week (😬 😳 😱 ) so couldn't properly dive into the Jinja-weeds with you.

I'll do a quick check of this in the next couple of days and get it merged!

@foundinblank

Haha - no problem! I didn't have a great dbt installation on my machine either, was relying mostly on CI/CD to do the lifting. We shall be better at this next time! 😄 Thanks for your help in making this happen!

@joellabes joellabes changed the base branch from main to next/minor November 9, 2021 04:40
@joellabes

@foundinblank Could I get you to add yourself to the changelog please?

@foundinblank

@joellabes done!

@joellabes joellabes left a comment

Thank you for the contribution! 🚀

@joellabes joellabes merged commit 296de1f into dbt-labs:next/minor Nov 9, 2021
joellabes added a commit that referenced this pull request Dec 2, 2021
@joellabes joellabes mentioned this pull request Dec 2, 2021
15 tasks
joellabes added a commit that referenced this pull request Dec 2, 2021
joellabes added a commit that referenced this pull request Dec 2, 2021
* dbt 0.7.4 release (#441)

* Update require-dbt-version to be 1.0

* Fix SQL 42000 on Exasol (#420)

" SQL-Error [42000]: syntax error, unexpected '*' "
If you qualify the * in the unioned relations with their respective names (<name>.*), you do not get the SQL error above. This should not cause any further problems, since the qualification is redundant for most DBs.

* Minor readme link fixes (#431)

* minor readme link fixes

* changelog addition

Co-authored-by: Joel Labes <[email protected]>

* 0.7.4 changelog (#432)

* Update CHANGELOG.md

* Note branch name change

* use `limit_zero` macro instead of `limit 0` (#437)

* Utils 0.7.4b1  (#433)

* Update require-dbt-version to be 1.0

* Fix SQL 42000 on Exasol (#420)

" SQL-Error [42000]: syntax error, unexpected '*' "
If you qualify the * in the unioned relations with their respective names (<name>.*), you do not get the SQL error above. This should not cause any further problems, since the qualification is redundant for most DBs.

* Minor readme link fixes (#431)

* minor readme link fixes

* changelog addition

Co-authored-by: Joel Labes <[email protected]>

* 0.7.4 changelog (#432)

* Update CHANGELOG.md

* Note branch name change

Co-authored-by: Timo Kruth <[email protected]>
Co-authored-by: Joe Markiewicz <[email protected]>

* standard convention

* Update integration_tests/tests/jinja_helpers/test_slugify.sql

Taking the liberty of committing on your behalf so that the CI job starts again

* Change limit_zero to be a macro

Co-authored-by: Joel Labes <[email protected]>
Co-authored-by: Timo Kruth <[email protected]>
Co-authored-by: Joe Markiewicz <[email protected]>

* Add col_name alias to else state too (#437)

* Remove extra semicolon in `insert_by_period` materialization (#439)

* Remove extra semicolon in `insert_by_period` materialization.

`create_table_as()` generates a SQL statement that already ends with a semicolon, so the extra semicolon after a `create_table_as()` call in the `insert_by_period` materialization ends up being an empty SQL statement, and at least when using Snowflake this causes the dbt run to fail with a "cannot unpack non-iterable NoneType object" error.

* Update changelog for PR 439.

* Use the relation object passed into get_column_values, instead of making our own (#440)

* Use the relation object passed into get_column_values, instead of making our own

* Rename variables in get column value test to be clearer

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Timo Kruth <[email protected]>
Co-authored-by: Joe Markiewicz <[email protected]>
Co-authored-by: Anders <[email protected]>
Co-authored-by: Sean Rose <[email protected]>

* Regression: Correctly handle missing relations in get_column_values (#448)

* Create integration test for a dropped relation

* Update get_column_values.sql

* Swap out adapter call for a good old fashioned drop table

* Add missing curlies

* what person wrote this code :/ (it was me)

* wrap values in quotes

* GOOD

* bigquery compat (they don't like except)

* Backport android url changes from #426 (#452)

* Update CHANGELOG.md

* Change require-dbt-version, update dbt_project.yml for integration tests proj

* Upgrade python version in CI, improve drop relation integration test

* Clarify version pinning

* Drop support for release candidates of 1.0.0

Co-authored-by: Timo Kruth <[email protected]>
Co-authored-by: Joe Markiewicz <[email protected]>
Co-authored-by: Anders <[email protected]>
Co-authored-by: Sean Rose <[email protected]>