feat(k8s): Add metadata-ingestion as a Helm component (#2236)
Co-authored-by: Pedro Silva <[email protected]>
pedro93 and pedro93 authored Mar 16, 2021
1 parent f8b88c5 commit 45d622b
Showing 9 changed files with 229 additions and 5 deletions.
9 changes: 5 additions & 4 deletions contrib/kubernetes/README.md
@@ -36,10 +36,11 @@ The following table lists the configuration parameters and its default values

| Repository | Name | Version |
|------------|------|---------|
-| file://./charts/datahub-frontend | datahub-frontend | 0.2.0 |
-| file://./charts/datahub-gms | datahub-gms | 0.2.0 |
-| file://./charts/datahub-mae-consumer | datahub-mae-consumer | 0.2.0 |
-| file://./charts/datahub-mce-consumer | datahub-mce-consumer | 0.2.0 |
+| file://./charts/datahub-frontend | datahub-frontend | 0.2.1 |
+| file://./charts/datahub-gms | datahub-gms | 0.2.1 |
+| file://./charts/datahub-mae-consumer | datahub-mae-consumer | 0.2.1 |
+| file://./charts/datahub-mce-consumer | datahub-mce-consumer | 0.2.1 |
+| file://./charts/datahub-ingestion-cron | datahub-ingestion-cron | 0.2.1 |

## Install DataHub
Navigate to the current directory and run the below command. Update the `datahub/values.yaml` file with valid hostname/IP address configuration for elasticsearch, neo4j, schema-registry, broker & mysql.
6 changes: 5 additions & 1 deletion contrib/kubernetes/datahub/Chart.yaml
@@ -24,4 +24,8 @@ dependencies:
   - name: datahub-mce-consumer
     version: 0.2.1
     repository: file://./charts/datahub-mce-consumer
-    condition: datahub-mce-consumer.enabled
+    condition: datahub-mce-consumer.enabled
+  - name: datahub-ingestion-cron
+    version: 0.2.1
+    repository: file://./charts/datahub-ingestion-cron
+    condition: datahub-ingestion-cron.enabled
2 changes: 2 additions & 0 deletions contrib/kubernetes/datahub/README.md
@@ -12,6 +12,7 @@ Current chart version is `0.1.1`
| file://./charts/datahub-gms | datahub-gms | 0.2.1 |
| file://./charts/datahub-mae-consumer | datahub-mae-consumer | 0.2.1 |
| file://./charts/datahub-mce-consumer | datahub-mce-consumer | 0.2.1 |
| file://./charts/datahub-ingestion-cron | datahub-ingestion-cron | 0.2.1 |

#### Chart Values

@@ -29,6 +30,7 @@ Current chart version is `0.1.1`
| datahub-mce-consumer.enabled | bool | `true` | |
| datahub-mce-consumer.image.repository | string | `"linkedin/datahub-mce-consumer"` | |
| datahub-mce-consumer.image.tag | string | `"latest"` | |
| datahub-ingestion-cron.enabled | bool | `false` | |
| global.datahub.appVersion | string | `"1.0"` | |
| global.datahub.gms.port | string | `"8080"` | |
| global.elasticsearch.host | string | `"elasticsearch"` | |
@@ -0,0 +1,21 @@
apiVersion: v2
name: datahub-ingestion-cron
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
version: 0.2.1

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application.
appVersion: 0.3.1
25 changes: 25 additions & 0 deletions contrib/kubernetes/datahub/charts/datahub-ingestion-cron/README.md
@@ -0,0 +1,25 @@
datahub-ingestion-cron
================
A Helm chart for DataHub's metadata-ingestion framework with Kerberos authentication.

## Chart Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| image.pullPolicy | string | `"Always"` | Image pull policy |
| image.repository | string | `"linkedin/datahub-ingestion"` | DataHub Ingestion image repository |
| image.tag | string | `"latest"` | DataHub Ingestion image tag |
| imagePullSecrets | array | `[]` (does not add image pull secrets to deployed pods) | Docker registry secret names as an array |
| labels | object | `{}` | Metadata labels to be added to each crawling cron job |
| crons | array | `[]` | A list of crawling parameters, one entry per technology being crawled |
| crons.name | string | `"crawler"` | Name of the crawler container |
| crons.schedule | string | `"0 0 * * *"` | Cron expression (daily at midnight) for crawler jobs |
| crons.crawlerConfigPath | string | N/A | Path to the metadata configuration file. This must be explicitly defined as a mount and is **required**. |
| crons.hostAliases | array | `[]` | Host aliases |
| crons.env | object | `{}` | Environment variables to add to the cronjob container |
| crons.envFromSecrets | object | `{}` | Environment variables from secrets to the cronjob container |
| crons.envFromSecrets*.secret | string | | secretKeyRef.name used for environment variable |
| crons.envFromSecrets*.key | string | | secretKeyRef.key used for environment variable |
| crons.extraVolumes | array | `[]` | Additional volumes to add to the pods |
| crons.extraVolumeMounts | array | `[]` | Additional volume mounts to add to the pods |
| crons.extraInitContainers | object | `{}` | Init containers to add to the cron job pod |
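
As an illustration of how these keys combine, the values below sketch one crawler entry at this chart's own values level. The names `mysql-crawler`, `ingestion-recipes`, `mysql-credentials`, the `MYSQL_PASSWORD` variable and the recipe path are hypothetical. The ConfigMap and Secret are assumed to exist in the namespace already; this chart does not create them.

```yaml
crons:
  - name: mysql-crawler                        # hypothetical crawler name
    schedule: "0 2 * * *"                      # offset from the midnight default
    crawlerConfigPath: /etc/recipes/mysql.yml  # must match a path under the mount below
    envFromSecrets:
      MYSQL_PASSWORD:                          # env var name inside the container (assumed)
        secret: mysql-credentials              # rendered as secretKeyRef.name
        key: password                          # rendered as secretKeyRef.key
    extraVolumes:
      - name: recipes
        configMap:
          name: ingestion-recipes              # hypothetical ConfigMap holding the recipe file
    extraVolumeMounts:
      - name: recipes
        mountPath: /etc/recipes
        readOnly: true
```

Since `crawlerConfigPath` is only a string handed to `datahub ingest -c`, the file has to be supplied through `extraVolumes` and `extraVolumeMounts` as shown, or baked into a custom image.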
@@ -0,0 +1,63 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "datahub-ingestion-cron.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "datahub-ingestion-cron.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "datahub-ingestion-cron.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{/*
Common labels
*/}}
{{- define "datahub-ingestion-cron.labels" -}}
helm.sh/chart: {{ include "datahub-ingestion-cron.chart" . }}
{{ include "datahub-ingestion-cron.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}

{{/*
Selector labels
*/}}
{{- define "datahub-ingestion-cron.selectorLabels" -}}
app.kubernetes.io/name: {{ include "datahub-ingestion-cron.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}

{{/*
Create the name of the service account to use
*/}}
{{- define "datahub-ingestion-cron.serviceAccountName" -}}
{{- if .Values.serviceAccount.create -}}
{{ default (include "datahub-ingestion-cron.fullname" .) .Values.serviceAccount.name }}
{{- else -}}
{{ default "default" .Values.serviceAccount.name }}
{{- end -}}
{{- end -}}
@@ -0,0 +1,60 @@
{{- $baseName := include "datahub-ingestion-cron.fullname" .}}
{{- $labels := include "datahub-ingestion-cron.labels" .}}
{{- range $job, $val := .Values.crons }}
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: "{{ $baseName }}-{{ .name }}"
  labels: {{- $labels | nindent 4 }}
spec:
  schedule: {{ default "0 0 * * *" .schedule | quote}}
  jobTemplate:
    spec:
      template:
        spec:
        {{- with $.Values.imagePullSecrets }}
          imagePullSecrets:
            {{- toYaml . | nindent 12 }}
        {{- end }}
        {{- if .extraInitContainers }}
          initContainers:
            {{- toYaml .extraInitContainers | nindent 12 }}
        {{- end }}
        {{- if .hostAliases }}
          hostAliases: {{- include "common.tplvalues.render" (dict "value" .hostAliases "context" $) | nindent 10 }}
        {{- end }}
          containers:
            - name: {{ default "crawler" .name }}
              image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag }}"
              imagePullPolicy: {{ $.Values.image.pullPolicy }}
              {{- if .extraVolumeMounts }}
              volumeMounts:
                {{- toYaml .extraVolumeMounts | nindent 14 }}
              {{- end }}
              command:
                - /bin/sh
                - -c
                - datahub ingest -c {{ required "Path to configuration file is required" .crawlerConfigPath }}
              env:
              {{- if .env }}
              {{- range $key,$value := .env }}
                - name: {{ $key | quote}}
                  value: {{ $value | quote}}
              {{- end }}
              {{- end }}
              {{- if .envFromSecrets }}
              {{- range $key,$value := .envFromSecrets }}
                - name: {{ $key | quote}}
                  valueFrom:
                    secretKeyRef:
                      name: {{ $value.secret | quote}}
                      key: {{ $value.key | quote}}
              {{- end }}
              {{- end }}
          restartPolicy: OnFailure
        {{- if .extraVolumes }}
          volumes:
            {{- toYaml .extraVolumes | nindent 12 }}
        {{- end }}
---
{{- end }}
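
For orientation, here is roughly what the template above might render for a single `crons` entry. This is a hand-traced sketch, not captured `helm template` output. It assumes a release named `demo`, no `nameOverride` or `fullnameOverride`, the default image values, and a hypothetical entry that sets only `name: mysql-crawler`, `schedule: "0 2 * * *"` and `crawlerConfigPath: /etc/recipes/mysql.yml`.

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  # "<release>-<chart name>-<crons[].name>", built from the fullname helper
  name: "demo-datahub-ingestion-cron-mysql-crawler"
  labels:
    helm.sh/chart: datahub-ingestion-cron-0.2.1
    app.kubernetes.io/name: datahub-ingestion-cron
    app.kubernetes.io/instance: demo
    app.kubernetes.io/version: "0.3.1"
    app.kubernetes.io/managed-by: Helm
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: mysql-crawler
              image: "linkedin/datahub-ingestion:latest"
              imagePullPolicy: Always
              command:
                - /bin/sh
                - -c
                - datahub ingest -c /etc/recipes/mysql.yml
              env:   # left empty because the entry sets neither env nor envFromSecrets
          restartPolicy: OnFailure
```

Each additional item in `crons` would produce another CronJob document of this shape, separated by `---`.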
@@ -0,0 +1,42 @@
# Default values for datahub-ingestion-cron.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

image:
  repository: linkedin/datahub-ingestion
  tag: latest
  pullPolicy: Always

imagePullSecrets: []

crons: []
#### Example data
## Metadata ingestion name
##
#name: "crawler"

## Daily at midnight (we may want to offset this to not conflict with other processes)
#schedule: "0 0 * * *"

## Deployment pod host aliases
## https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/
##
#hostAliases: []

## Environment variables.
#env: {}

## Environment variables from Secret resources.
#envFromSecrets: {}

## Additional primary volume mounts
##
#extraVolumeMounts: []

## Additional primary volumes
##
#extraVolumes: []

## Add your own init container or uncomment and modify the given example.
##
#extraInitContainers: {}
6 changes: 6 additions & 0 deletions contrib/kubernetes/datahub/values.yaml
@@ -26,6 +26,12 @@ datahub-mce-consumer:
    repository: linkedin/datahub-mce-consumer
    tag: "latest"

datahub-ingestion-cron:
  enabled: false
  image:
    repository: linkedin/datahub-ingestion
    tag: "latest"

global:
  elasticsearch:
    host: "elasticsearch"
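
Because the subchart is gated by the `datahub-ingestion-cron.enabled` condition and ships with `crons: []`, enabling it without any entries would create no CronJobs. A minimal override for the parent chart might look like the sketch below. The file name, crawler name and recipe path are hypothetical, and the recipe file still has to be mounted through `extraVolumes` and `extraVolumeMounts`.

```yaml
# my-values.yaml (hypothetical override file for the parent datahub chart)
datahub-ingestion-cron:
  enabled: true
  crons:
    - name: nightly-crawler                       # hypothetical
      crawlerConfigPath: /etc/recipes/recipe.yml  # required by the template
      # schedule omitted: the template falls back to "0 0 * * *"
```

Subchart values go under a key named after the subchart, which is why the `crons` list sits beneath `datahub-ingestion-cron:` here.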