-
Notifications
You must be signed in to change notification settings - Fork 7
MySQL 5.7 compatibility #33
MySQL 5.7 compatibility #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- keeps working with MySQL 8.0.11
https://airflow.apache.org/docs/stable/installation.html - Airflow itself starts MySQL support from 5.6.4, discreETLy says we need 8.0+
We can make changes to unofficially broaden support (keep Airflow parity?) for now, what do you think @Wikia/data-engineering?
Probably no promises that future changes won't break it again though
SELECT | ||
dag_run.dag_id, | ||
dag_run.state | ||
FROM dag_run | ||
INNER JOIN (SELECT | ||
dag_id, | ||
MAX(execution_date) AS date | ||
FROM dag_run | ||
GROUP BY dag_id) mx | ||
ON | ||
dag_run.dag_id = mx.dag_id | ||
AND dag_run.execution_date = mx.date | ||
JOIN dag ON dag.dag_id = dag_run.dag_id AND is_active = 1 AND is_paused = 0'''.replace("\n", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks OK to me
SELECT | ||
dr.dag_id, | ||
dr.execution_date, | ||
dr_latest.state as dag_state, | ||
ti.task_id, | ||
ti.state as task_state, | ||
ti.duration, | ||
ti.start_date, | ||
ti.end_date | ||
FROM ( | ||
SELECT dag_id, | ||
MAX(execution_date) as execution_date | ||
FROM dag_run | ||
JOIN dag ON dag.dag_id = dag_run.dag_id AND is_active = 1 AND is_paused = 0) dr | ||
JOIN task_instance ti ON ti.dag_id = dr.dag_id AND ti.execution_date = dr.execution_date | ||
WHERE dr.age = 1'''.replace("\n", "") | ||
GROUP BY dag_id) dr | ||
JOIN dag_run dr_latest ON dr.dag_id = dr_latest.dag_id AND dr.execution_date = dr_latest.execution_date | ||
JOIN task_instance ti ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date | ||
JOIN dag ON dag.dag_id = dr.dag_id AND is_active = 1 AND is_paused = 0'''.replace("\n", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That also looks OK
SELECT | ||
@row_number:=CASE | ||
WHEN @dag_id_current = clean_dag_id | ||
THEN | ||
@row_number + 1 | ||
ELSE | ||
1 | ||
END AS rownum, | ||
@dag_id_current:=clean_dag_id clean_dag_id, | ||
dag_id, | ||
execution_date as date, | ||
state | ||
FROM ( | ||
SELECT *, | ||
CASE | ||
WHEN substring_index(reverse(dag_id), "v_", 1) rlike '[0-9]+\.[0-9]' | ||
THEN substring(dag_id, 1, length(dag_id) -1 - locate("v_", reverse(dag_id))) | ||
ELSE dag_id | ||
END as clean_dag_id from dag_run) dag_run, (SELECT @dag_id_current:='',@row_number:=0) as t | ||
ORDER BY clean_dag_id, execution_date DESC | ||
) dr WHERE rownum <= {days}''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is much less readible to me than the previous version.
We'd have to comment the method like it was so it's clear it's done to ignore the DAG's name versioning part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is much less readable. But well, it's the only way to do it in one query for all DAGs on the DB that doesn't support windowing functions. It passes all the tests in the project.
@MikolajBalcerek can we define the API for retrieving data from Airflow and provide two classes with separate implementations? It complicates the implementation but we can continue using all features of the latest MySQL versions and at the same time provide compatibility with older versions. If we decide to have two implementations or use only the 5.7 version of SQL, we should have automated tests for that version too because in the future nobody will bother to test it manually after changes. |
I like your idea about ability to add possibility to switch underlying implementation of Airflow's data provider. It can be useful for integrating discreETLy with other DBs like Postgres. Current automated tests cover many cases about striping the DAGs versions, they were actually very useful when implementing this change. The both implementations (current for mysql >= 8 and the from the PR for mysql < 8) passes the same set of unit tests from the project, so there is no need to duplicate. |
Hey @szczeles , In the future we are going to continue developing with MySQL 8+ in mind, but if there are any incompatibilities feel free to prepare PRs or raise an issue and we'll see what can be done. Thanks for contributions and good luck :)! |
Many thanks @szczeles for sharing your work! |
Thanks @MikolajBalcerek, @mikulskibartosz 🙇♂️ |
|
SQLs modification to make DiscreETLy run on MySQL 5.7 (which has no
row_number
and poor regular expressions support)