In Episode 3 we wrote five YAML files to deploy one application. The kube-prometheus-stack chart, which gives us Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics, would require 20+ manifests written by hand. We're not doing that. This is where Helm enters the picture.
This is the companion article to Episode 4 of the Kubernetes on Raspberry Pi series. We deploy full cluster monitoring using Prometheus and Grafana, and work through the Talos-specific issues that come up along the way.
All configs are in the kubernetes-series GitHub repo under video-04-helm-prometheus-grafana/.
What Is Helm?
Helm is Kubernetes' package manager, like apt or brew but for cluster apps. A chart is a package of Kubernetes manifests, templated and versioned. A repository is a collection of charts. A release is a deployed instance of a chart, and values are configuration overrides you provide at install time.
Install Helm and add the Prometheus community repository:
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
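Before writing any overrides, it's worth confirming the chart is visible and skimming its defaults. These are standard Helm subcommands; the chart name comes from the repo we just added:

```shell
# Confirm the chart is available after the repo update
helm search repo prometheus-community/kube-prometheus-stack

# Dump the chart's full default values -- this is the reference
# for anything you might want to override in values.yaml
helm show values prometheus-community/kube-prometheus-stack | less
```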
Configuring the Install
Rather than accepting all defaults, we customize a few things in a values.yaml file:
# values.yaml
grafana:
  adminPassword: "your-secure-password"
  service:
    type: ClusterIP
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec: {}
prometheus-node-exporter:
  hostRootFsMount:
    enabled: false
The hostRootFsMount: false setting is critical for Talos. More on that shortly.
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--values values.yaml
Watch the resources get created:
kubectl get all -n monitoring
One command, 20+ resources. That's the value of Helm.
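Helm also tracks the release itself, which is handy for checking what you deployed and with which overrides. A couple of standard commands:

```shell
# Release status: chart version, revision number, deployed state
helm status kube-prometheus-stack -n monitoring

# Show only the values we supplied (add --all for the merged result)
helm get values kube-prometheus-stack -n monitoring
```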
Talos PSS Troubleshooting
This is where Talos starts pushing back. Pod Security Standards are enforced at the API server level, and several monitoring components need capabilities that Talos blocks by default.
node-exporter
After install, you may notice node-exporter pods aren't running:
kubectl get daemonset kube-prometheus-stack-prometheus-node-exporter -n monitoring
# DESIRED: 6, CURRENT: 0
Describing the DaemonSet reveals the problem:
violates PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true)...
node-exporter needs deep host access (hostPID, hostNetwork) to collect node metrics. We address this with two fixes. The hostRootFsMount: false value we already added to values.yaml handles Talos's read-only root filesystem conflict. For the PSS restriction, add monitoring to the namespace exemptions in the Talos machine config:
export EDITOR=nano
talosctl edit machineconfig --nodes <control-plane-ip>
Find the exemptions section and add monitoring:
exemptions:
  namespaces:
    - kube-system
    - monitoring
Reboot the control plane:
talosctl reboot --nodes <control-plane-ip>
All 6 node-exporter pods should appear after reboot.
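To confirm, list the node-exporter pods directly. The label selector below is the one the chart applies by default (assumed from the node-exporter subchart's conventions):

```shell
# Expect one Running pod per node, spread across all 6 nodes
kubectl get pods -n monitoring \
  -l app.kubernetes.io/name=prometheus-node-exporter -o wide
```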
kube-scheduler and kube-controller-manager
Prometheus targets for kube-scheduler and kube-controller-manager will show connection refused. Talos binds these components to 127.0.0.1 by default, making them unreachable from other pods. Patch them to bind on all interfaces:
cluster:
  scheduler:
    extraArgs:
      bind-address: 0.0.0.0
  controllerManager:
    extraArgs:
      bind-address: 0.0.0.0
talosctl patch machineconfig --nodes <control-plane-ip> --patch @scheduler-patch.yaml
talosctl reboot --nodes <control-plane-ip>
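A quick way to verify the new bind address, without waiting for Prometheus to re-scrape, is a one-off curl pod inside the cluster. The ports below are the upstream defaults (10259 for kube-scheduler, 10257 for kube-controller-manager; both serve metrics over HTTPS):

```shell
# Should return metrics text (or at least an HTTP response) instead
# of "connection refused" once the patch and reboot have applied
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -sk https://<control-plane-ip>:10259/metrics | head
```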
kube-proxy
kube-proxy has the same bind address issue, but it runs on all 6 nodes. Patch all nodes at once, then delete the DaemonSet to force regeneration:
cluster:
  proxy:
    extraArgs:
      metrics-bind-address: 0.0.0.0:10249
talosctl patch machineconfig \
--nodes <node1>,<node2>,<node3>,<node4>,<node5>,<node6> \
--patch @proxy-patch.yaml
kubectl delete daemonset kube-proxy -n kube-system
Note: kube-proxy is a DaemonSet managed by Talos. Unlike control plane components, it doesn't pick up changes on reboot. Talos regenerates it when it detects the DaemonSet is missing.
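After deleting the DaemonSet, confirm Talos recreated it and that the new flag is present in the pod spec (the exact field holding the flags may differ; checking the full YAML is the safe fallback):

```shell
# Talos should regenerate the DaemonSet within a few seconds
kubectl get daemonset kube-proxy -n kube-system

# Look for metrics-bind-address=0.0.0.0:10249 in the container spec
kubectl get daemonset kube-proxy -n kube-system -o yaml | grep metrics-bind-address
```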
Accessing Grafana
Port-forward to Grafana:
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
Open http://localhost:3000 and log in with admin and the password from your values file. Explore the pre-built dashboards: cluster overview, per-node CPU and memory, per-pod metrics. The cluster stops being a black box.
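If the browser shows an error, check the port-forward first. Grafana exposes a health endpoint that works without authentication:

```shell
# With the port-forward running in another terminal:
# the response should include "database": "ok"
curl -s http://localhost:3000/api/health
```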
Two Helm commands worth memorizing for ongoing maintenance:
helm list -n monitoring # see what's installed
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--values values.yaml # apply config changes
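A third command worth knowing: if an upgrade breaks something, Helm keeps each revision and can roll back to the previous one in a single step:

```shell
# See every revision of the release with its chart version and status
helm history kube-prometheus-stack -n monitoring

# Roll back to the previous revision (append a revision number to target one)
helm rollback kube-prometheus-stack -n monitoring
```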
What's Next
Grafana is accessible via port-forward, which is a temporary shortcut and not a real solution. In Episode 5 we add MetalLB and Traefik so every service gets a real URL with no port numbers.