feat(Kubernetes) Add metadata-ingestion as a Helm component #2236

Merged
merged 3 commits into from
Mar 16, 2021

Conversation

pedro93
Collaborator

@pedro93 pedro93 commented Mar 15, 2021

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Adds metadata ingestion as a disabled-by-default Helm component of DataHub's Kubernetes deployment.
This component renders one Kubernetes CronJob resource for each entry specified under crons.

Metadata-ingestion configuration recipes must be supplied by the consumer of this chart as a mounted volume and then explicitly referenced by the crawlerConfigPath.

An example is as follows:

datahub-crawler:
  enabled: true
  crons:
    - name: "hive"
      schedule: "0 0 * * *" 
      crawlerConfigPath: "/var/crawler-config.yml"  
      extraVolumes:
        - name: config-volume
          configMap:
            name: hive-crawler-config  
      extraVolumeMounts:  
        - name: config-volume        
          mountPath: "/var/crawler-config.yml"
          subPath: config.yml
    - name: "postgres"
      schedule: "0 1 * * *" 
      crawlerConfigPath: "/var/crawler-config.yml"  
      extraVolumes:
        - name: config-volume
          configMap:
            name: postgres-crawler-config  
      extraVolumeMounts:  
        - name: config-volume        
          mountPath: "/var/crawler-config.yml"
          subPath: config.yml

With the appropriate ConfigMaps holding each concrete recipe deployed to Kubernetes, this example will generate 2 separate CronJob resources in Kubernetes, allowing this component to be reused for however many metadata ingestion cron jobs are required.
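
For illustration, a ConfigMap backing the "hive" entry above might look like the following. This is a minimal sketch, not part of the PR: the ConfigMap name and the config.yml key are taken from the values example (configMap.name and subPath), while the recipe body, the host names, and the ports are placeholder assumptions that should be checked against the metadata-ingestion documentation for the source in use.

```yaml
# Hypothetical ConfigMap for the "hive" cron entry (illustrative only).
apiVersion: v1
kind: ConfigMap
metadata:
  # Must match extraVolumes[].configMap.name in the values example.
  name: hive-crawler-config
data:
  # The key must match the subPath used in extraVolumeMounts,
  # so that it is mounted as the file at crawlerConfigPath.
  config.yml: |
    # Assumed recipe layout: one source and one sink.
    source:
      type: hive
      config:
        host_port: "hive.example.internal:10000"  # placeholder host and port
    sink:
      type: datahub-rest
      config:
        server: "http://datahub-gms.example.internal:8080"  # placeholder GMS URL
```

The "postgres" entry would need an equivalent ConfigMap named postgres-crawler-config, again exposing its recipe under the config.yml key.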

Test method:

  • Successfully rendered valid Kubernetes resources that work as intended when deployed to a cluster.

Contributor

@dexter-mh-lee dexter-mh-lee left a comment

Overall, looks great. Has all the configurations we need.

Minor comment, related to the comment below: can we rename this component to indicate that it is CronJob-based ingestion?

Contributor

@dexter-mh-lee dexter-mh-lee left a comment

LGTM!

@pedro93
Collaborator Author

pedro93 commented Mar 15, 2021

Thank you @dexter-mh-lee!
That said, I would appreciate it if someone other than myself could test this new component, especially after the last change. I tried to be careful and change only what needed to change, but it was a find-and-replace sort of operation.

@dexter-mh-lee
Contributor

I will try it out right now!

@dexter-mh-lee
Contributor

@pedro93 Works!! Thanks for working on this!

Contributor

@shirshanka shirshanka left a comment

LGTM! Thanks for this contribution @pedro93

@shirshanka shirshanka merged commit 45d622b into datahub-project:master Mar 16, 2021