
Helm Charts for Apache Drill

Note: This project was originally forked from Agirish/drill-helm-charts, but was moved to this repo for automated builds and discoverability.

Overview

This repository contains a collection of files that can be used to deploy Apache Drill on Kubernetes using Helm Charts. It supports both single-node and cluster modes.

What are Helm and Charts?

Helm is a package manager for Kubernetes. Charts are a packaging format in Helm that can simplify deploying Kubernetes applications such as Drill Clusters.

Pre-requisites

  • A Kubernetes Cluster (this project is tested on a K3s cluster)
  • Helm version 3 or greater
  • Kubectl version 1.16.0 or greater

Usage

Helm must be installed to use the charts. Please refer to Helm's documentation to get started.

Once Helm has been set up correctly, add the repo as follows:

helm repo add wearefrank https://wearefrank.github.io/charts

If you have already added this repo, run helm repo update to retrieve the latest versions of the packages. You can then run helm search repo wearefrank to see the charts.

To install the Drill chart:

helm install drill wearefrank/drill

To uninstall the chart:

helm delete drill

Values

Helm Charts use values.yaml to provide default values for the 'variables' used in the chart templates. These values, such as the namespace or the number of drillbits, may be overridden either by editing the values.yaml file or by passing overrides during helm install.

Please refer to the values.yaml file for details on default values for Drill Helm Charts.
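For example, overrides can be passed on the command line or collected in a file. replicaCount is a documented chart value (see Parameters below); the values file name is just an example:

```shell
# Override a single value at install time
helm install drill wearefrank/drill --set replicaCount=2

# Or keep all overrides in a file (my-values.yaml is an example name)
helm install drill wearefrank/drill -f my-values.yaml
```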

Access Drill Web UI

There is a Service that can be used, but it balances requests across pods, so consecutive requests may land on different drillbits, which isn't very friendly for the Web UI. Depending on your ingress class you can make sessions sticky with annotations. You could also change the service type to reach a single pod directly.
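For local testing you can also port-forward the web Service. The Service name below is an assumption based on a release installed as drill; check kubectl get svc for the actual name in your cluster:

```shell
# Forward local port 8047 to the Drill web Service (port 80 by default).
kubectl port-forward svc/drill 8047:80
# The Web UI is then reachable at http://localhost:8047
```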

Chart Structure

Drill Helm charts are organized as a collection of files inside the drill directory. As Drill depends on ZooKeeper for cluster coordination, a ZooKeeper chart is added as a dependency in the chart definition. The ZooKeeper chart is maintained by Bitnami.

drill/   
  Chart.yaml    # A YAML file with information about the chart
  Chart.lock    # A YAML file containing information about the fetched dependencies
  values.yaml   # The default configuration values for this chart
  charts/       # A directory containing the ZK charts
  templates/    # A directory of templates that, when combined with values, generate valid Kubernetes manifest files
  docs/         # A directory containing files for the documentation

Templates

Helm Charts contain templates which are used to generate Kubernetes manifest files. These are YAML-formatted resource descriptions that Kubernetes can understand. These templates contain 'variables', values for which are picked up from the values.yaml file.

Drill Helm Charts contain the following templates:

Autoscaling Drill Clusters

The size of the Drill cluster (the number of Drill Pod replicas, i.e. drillbits) can be scaled up or down manually, but it can also be autoscaled to simplify cluster management. When autoscaling is enabled, more drillbits are added automatically as CPU utilization rises, and as the cluster load goes down, so does the number of drillbits in the Drill cluster. Drillbits deemed excessive shut down gracefully by entering quiescent mode, which permits running queries to complete.

IMPORTANT: For the graceful shutdown to succeed, a sigfile is created in the $DRILL_HOME folder. This requires running as root (uid 0). If the application runs as drilluser, the stop command is used instead.

Enable autoscaling by editing the autoscale section in drill/values.yaml file.
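A sketch of what such an autoscale section might look like. The key names below are an illustrative guess, not the authoritative schema; check the shipped drill/values.yaml for the actual names:

```yaml
# Hypothetical autoscale section in drill/values.yaml
autoscale:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 75
```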

Parameters

Common parameters

| Name | Description | Value |
| ---- | ----------- | ----- |
| `nameOverride` | String to partially override `common.names.fullname` template (will maintain the release name) | `""` |
| `fullnameOverride` | String to fully override `common.names.fullname` template | `""` |

Drill image parameters

| Name | Description | Value |
| ---- | ----------- | ----- |
| `image.registry` | Drill image registry | `""` |
| `image.repository` | Drill image repository | `apache/drill` |
| `image.tag` | Drill image tag (immutable tags are recommended) | `""` |
| `image.pullPolicy` | Drill image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Drill image pull secrets | `[]` |

Drill deployment parameters

| Name | Description | Value |
| ---- | ----------- | ----- |
| `replicaCount` | Number of Drill replicas to deploy | `1` |
| `startupProbe.initialDelaySeconds` | Initial delay seconds for startupProbe | `10` |
| `startupProbe.periodSeconds` | Period seconds for startupProbe | `10` |
| `startupProbe.timeoutSeconds` | Timeout seconds for startupProbe | `1` |
| `startupProbe.failureThreshold` | Failure threshold for startupProbe | `6` |
| `startupProbe.successThreshold` | Success threshold for startupProbe | `1` |
| `readinessProbe.initialDelaySeconds` | Initial delay seconds for readinessProbe | `0` |
| `readinessProbe.periodSeconds` | Period seconds for readinessProbe | `5` |
| `readinessProbe.timeoutSeconds` | Timeout seconds for readinessProbe | `1` |
| `readinessProbe.failureThreshold` | Failure threshold for readinessProbe | `3` |
| `readinessProbe.successThreshold` | Success threshold for readinessProbe | `1` |
| `livenessProbe.initialDelaySeconds` | Initial delay seconds for livenessProbe | `0` |
| `livenessProbe.periodSeconds` | Period seconds for livenessProbe | `10` |
| `livenessProbe.timeoutSeconds` | Timeout seconds for livenessProbe | `1` |
| `livenessProbe.failureThreshold` | Failure threshold for livenessProbe | `6` |
| `livenessProbe.successThreshold` | Success threshold for livenessProbe | `1` |
| `resources` | Set the resources for the Drill containers | `{}` |
| `resources.limits` | The resources limits for the Drill containers | `""` |
| `resources.requests.memory` | The requested memory for the Drill containers | `""` |
| `resources.requests.cpu` | The requested CPU for the Drill containers | `""` |
| `terminationGracePeriodSeconds` | Number of seconds after which pods are forcefully killed. Note: lower values may cause running queries to fail | `25` |
| `nodeSelector` | Node labels for pod assignment | `{}` |
| `tolerations` | Set tolerations for pod assignment | `[]` |
| `affinity` | Set affinity for pod assignment | `{}` |
| `timeZone` | Used for database connections and log timestamps | `Etc/UTC` |
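As an illustration, a values fragment using the parameters above. The memory and CPU figures are examples, not sizing recommendations:

```yaml
resources:
  requests:
    memory: 4Gi
    cpu: "1"
  limits:
    memory: 4Gi
timeZone: Europe/Amsterdam
```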

Traffic Exposure Parameters

| Name | Description | Value |
| ---- | ----------- | ----- |
| `service.web.type` | Drill Web service type | `ClusterIP` |
| `service.web.port` | Drill Web service port | `80` |
| `service.user.type` | Drill User API service type | `ClusterIP` |
| `service.user.port` | Drill User API service port | `31010` |
| `ingress.enabled` | Enable ingress record generation for Drill | `false` |
| `ingress.className` | IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) | `""` |
| `ingress.annotations` | Additional annotations for the Ingress resource. To enable certificate auto-generation, place your cert-manager annotations here. | `{}` |
| `ingress.hosts` | Set hosts for ingress | `[]` |
| `ingress.hosts.host` | Set hostname | `""` |
| `ingress.hosts.paths` | Set multiple paths | `[]` |
| `ingress.hosts.paths.path` | Set path (context URL) | `""` |
| `ingress.hosts.paths.pathType` | Set type of path | `""` |
| `ingress.tls` | Define TLS secrets for hosts (implementation not done yet) | `[]` |
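A sketch of an ingress configuration using the keys above. The hostname and class name are placeholders; the annotation shown enables cookie-based sticky sessions on ingress-nginx, which helps keep Web UI sessions on one drillbit:

```yaml
ingress:
  enabled: true
  className: nginx                 # example IngressClass
  annotations:
    nginx.ingress.kubernetes.io/affinity: cookie
  hosts:
    - host: drill.example.com      # placeholder hostname
      paths:
        - path: /
          pathType: Prefix
```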

Other Parameters

| Name | Description | Value |
| ---- | ----------- | ----- |
| `serviceAccount.create` | Enable creation of a ServiceAccount for the Drill pod | `true` |
| `serviceAccount.annotations` | Additional custom annotations for the ServiceAccount | `{}` |
| `serviceAccount.name` | The name of the ServiceAccount to use | `""` |
| `podAnnotations` | Annotations for Drill pods | `{}` |
| `podLabels` | Extra labels for Drill pods | `{}` |
| `podSecurityContext` | Set Drill pod's Security Context | `{}` |
| `securityContext` | Set Drill container's Security Context | `{}` |

Drill configuration

Configuring Drill can be done with override files or in the Web UI, although some properties can only be set in an override file. When using the Web UI, ZooKeeper is used to store the values. Make sure that ZooKeeper's storage is persistent if you intend to configure Drill this way.

This is an example where the Web UI and authentication for local (PLAIN) users are enabled.

drill.exec: {
  http.enabled: true,
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  },
  security: {
    auth.mechanisms: ["PLAIN"]
  },
  security.user.auth: {
    enabled: true,
    packages += "org.apache.drill.exec.rpc.user.security",
    impl: "pam4j",
    pam_profiles: [ "sudo", "login" ]
  }
}

For more options refer to the Apache Drill documentation.

| Name | Description | Value |
| ---- | ----------- | ----- |
| `drill.drivers` | JDBC drivers to download. Useful if the Docker image doesn't contain the correct drivers | `[]` |
| `drill.drivers.name` | The name of the driver; used as the filename (with `.jar` appended) and as the name of the initContainer | `""` |
| `drill.drivers.url` | The URL to download the driver from | `""` |
| `drill.drivers.noCheckCertificate` | Skip certificate check | `""` |
| `drill.overrideConfiguration.existingConfigMap` | The name of a ConfigMap containing configuration files to override | `""` |
| `drill.overrideConfiguration.drill` | Multiline value for `drill-override.conf` | `""` |
| `drill.overrideConfiguration.drillMetastore` | Multiline value for `drill-metastore-override.conf` | `""` |
| `drill.overrideConfiguration.drillOnYarn` | Multiline value for `drill-on-yarn-override.conf` | `""` |
| `drill.overrideConfiguration.drillSqlLine` | Multiline value for `drill-sqlline-override.conf` | `""` |
| `drill.overrideConfiguration.storagePlugins` | Multiline value for `storage-plugins-override.conf`. Can also be configured in the Web UI and saved by persistent ZooKeeper | `""` |
| `drill.authentication.existingSecret` | Name of the secret containing a passwd file | `""` |
| `drill.authentication.users` | Users to create on the system | `[]` |
| `drill.authentication.users.name` | Username for the user | `""` |
| `drill.authentication.users.password` | Password for the user | `""` |
| `drill.authentication.users.admin` | Configures whether the user should be an admin | `""` |
| `extraVolumes` | Optionally specify an extra list of additional volumes for Drill pods | `[]` |
| `extraVolumeMounts` | Optionally specify an extra list of additional volumeMounts for Drill container(s) | `[]` |
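An illustrative values fragment combining the driver and user parameters above. The driver URL and the credentials are placeholders, not defaults shipped with the chart:

```yaml
drill:
  drivers:
    - name: postgresql     # becomes postgresql.jar and names the initContainer
      url: https://jdbc.postgresql.org/download/postgresql-42.7.3.jar  # example URL
  authentication:
    users:
      - name: admin        # placeholder credentials
        password: changeme
        admin: true
```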

Persistence

Persistence is used for logging and for JDBC drivers. These can be configured separately.

Configuration for Drill will be saved in ZooKeeper. Make sure that ZooKeeper is persistent if you want to keep changes in the Web UI.

| Name | Description | Value |
| ---- | ----------- | ----- |
| `persistence.enabled` | Enable persistence using Persistent Volume Claims | `false` |
| `persistence.storageClass` | Persistent Volume storage class | `""` |
| `persistence.accessModes` | Persistent Volume access modes | `[]` |
| `persistence.size` | Persistent Volume size | `2Gi` |
| `persistence.dataSource` | Custom PVC data source | `{}` |
| `persistence.existingClaim` | The name of an existing PVC to use for persistence | `""` |
| `persistence.selector` | Selector to match an existing Persistent Volume for Drill's data PVC | `{}` |
| `persistence.annotations` | Persistent Volume Claim annotations | `{}` |
| `persistence.dataLogDir.size` | PVC Storage Request for Drill's dedicated data log directory | `2Gi` |
| `persistence.dataLogDir.existingClaim` | The name of an existing PVC to use for persistence | `""` |
| `persistence.dataLogDir.selector` | Selector to match an existing Persistent Volume for Drill's data log PVC | `{}` |
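A sketch of enabling persistence with the keys above. The storage class is an example (local-path is the default class on K3s, which this project is tested on); adjust it for your cluster:

```yaml
persistence:
  enabled: true
  storageClass: local-path   # example: the default K3s storage class
  accessModes:
    - ReadWriteOnce
  size: 2Gi
  dataLogDir:
    size: 2Gi
```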

Notable changes

1.2.7

The notation for .Values.service has been changed. This makes it possible to configure web and user services separately.

1.2.6

.Values.replicaCount has been changed from 3 to 1, to default to a less complex install. Having three replicas introduces some complexity regarding authentication, logging and how queries are executed. Until features have been added to simplify this, the user needs to take these things into account.

The notation for .Values.persistence has changed so storage for logs and data can be configured separately. The values for persistent logging are now located at .Values.persistence.dataLogDir.

Application Version

1.21.1

Chart Versions

1.3.5 - 15/11/2024
1.3.4 - 02/08/2024
1.3.3 - 01/08/2024
1.3.2 - 29/07/2024
1.3.1 - 26/07/2024
1.3.0 - 23/01/2024
1.2.13 - 04/01/2024
1.2.12 - 30/11/2023
1.2.11 - 20/11/2023
1.2.10 - 02/11/2023