Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for serverless job in Databricks operators #45188

Merged
merged 289 commits into from
Jan 28, 2025

Conversation

HariGS-DB
Copy link
Contributor

closes: #45138

This PR enables running Databricks jobs as a serverless mode. It relaxes the check for the presence of both job_cluster_key and existing_cluster_id (as for serverless, both settings should not be present). It also added a named parameter for databricks environment
Also added relevant unit test cases and updated the documentation.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link

boring-cyborg bot commented Dec 23, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@HariGS-DB
Copy link
Contributor Author

@alexott Please could you give your review comments when possible.

Copy link
Member

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great addition @HariGS-DB . I have few minor suggestions. Happy to take another pass once you add your thoughts or have addressed the comments.

Thanks @eladkal for requesting review here

@pankajkoti pankajkoti changed the title Added support for serverless job in Databricks operators Add support for serverless job in Databricks operators Jan 2, 2025
kaxil and others added 10 commits January 16, 2025 13:13
Fixed the test to include typing.Dict and removed the old skipif check.
Fixing this so main works with most of the XCom example. But proper fix will be part of apache#45231
* Add HTTP retry handling into task SDK api.client

* Add logging of call failures

* Prevent task sdk tests with LocalExecutor fail with retries

* Review feedback

* Review feedback, Adjust wording

* Correct time parameters to float

* Review Feedback
* Add directory put and get functions for sftp provider

* Add test code

* Add directory exists check

* Fix merge conflict

* Add path exists check
* delete a variable

* success toaster

* key

* look

* delete modal

* rename

* rename
* add the api

* tests

* ImportVariablesBody

* action_if_exists

* rename

* add desc

* fix

* remove unwanted teardown
* initial structure setup

* add editVariable

* initial edit setup

* decouple add variable with form

* edit modal

* fix checks

* rename folder

* bug fix

* reset error

* rename

* rename add

* not edit key
@HariGS-DB HariGS-DB reopened this Jan 16, 2025
@HariGS-DB
Copy link
Contributor Author

Thanks for the great addition @HariGS-DB . I have few minor suggestions. Happy to take another pass once you add your thoughts or have addressed the comments.

Thanks @eladkal for requesting review here

Hi @pankajkoti I have incorporated your review comments. I had to close and reopen the PR due to some merge conflicts which I have resolved. Please could you check and give me your new feedback

Copy link
Member

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for addressing the suggestions @HariGS-DB

@pankajkoti
Copy link
Member

@HariGS-DB some checks are failing in the CI. Could you please take care of that? And then we're good to merge I guess

@HariGS-DB
Copy link
Contributor Author

@HariGS-DB some checks are failing in the CI. Could you please take care of that? And then we're good to merge I guess

I have addressed some of the issues. Could you please re-check? I think it needs your approval before the complete tests are run.

@HariGS-DB
Copy link
Contributor Author

@pankajkoti I have resolved the issues and the error check in the test and docs. But I am still getting a mypy error in the CI for code which I have not even touched. I might need some advice on how should I resolve them

@HariGS-DB
Copy link
Contributor Author

@pankajkoti I see you have pushed a new change which seem to have fixed the CI pipelines. Is there anything else need to be done for merging this. Please let me know

@pankajkoti pankajkoti merged commit 0631ba7 into apache:main Jan 28, 2025
61 checks passed
Copy link

boring-cyborg bot commented Jan 28, 2025

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@pankajkoti
Copy link
Member

Thanks, @HariGS-DB, for the fantastic contribution! I’ve merged the PR.

Since this was your first contribution, the CI pipeline required maintainer approval to run. Going forward, for your subsequent PRs, the CI pipelines will run automatically without needing approval. 🎉

got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Jan 30, 2025
@HariGS-DB
Copy link
Contributor Author

@pankajkoti Do you know when this PR will be released in the next airflow version? I will need to add another PR to introduce a validation for serverless (check for presence of environment_key in the json payload when submitting it for serverless ). I will submit this PR this week, is it possible to hold this PR in the next release until then?

@pankajkoti
Copy link
Member

pankajkoti commented Feb 5, 2025

Hi @HariGS-DB,

Providers are typically released on a bi-weekly cycle, as outlined here: https://github.com/apache/airflow/blob/main/PROVIDERS.rst#community-providers-release-process. While it’s not possible to delay a release for a single PR or skip a PR from the release candidate, you can cast a negative vote if the provider release includes this PR and it introduces a breaking change. Release managers initiate a voting thread once the provider release candidates are created. If you’re not already subscribed to the dev list (https://airflow.apache.org/community/#mailing-list), I recommend doing so to stay informed about upcoming releases, allowing you to test and vote accordingly.

In this case, it looks like the change is an addition rather than a regression, so proceeding with the release seems reasonable. Additional validation can be incorporated in the next release if needed, without delaying the current one. Does that sound good to you?

niklasr22 pushed a commit to niklasr22/airflow that referenced this pull request Feb 8, 2025
MikhailKulikov-db added a commit to MikhailKulikov-db/airflow that referenced this pull request Feb 13, 2025
pankajkoti pushed a commit that referenced this pull request Feb 13, 2025
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for serverless cluster in Databricks operator in databricks provider