Delete a pod, then try to read its logs:
```bash
kubectl delete pod -n monitoring <some-pod>
kubectl logs <that-same-pod>
# Error: pod not found
```
The pod is gone, and so are its logs. Whatever caused it to fail is unknown. This is the limitation of kubectl logs: it only reads from pods that still exist. If a container crashes and restarts, `kubectl logs --previous` can recover the prior container's output, but once the pod object itself is deleted, nothing is recoverable. And if you're debugging a problem that spans multiple pods, you're grepping through multiple terminal windows.
Centralized logging fixes this. This is the companion article to Episode 8 of the Kubernetes on Raspberry Pi series.
All configs are in the `kubernetes-series` GitHub repo under `video-08-logging-loki-promtail/`.
## Observability: Metrics, Logs, and Traces
With Prometheus from Episode 4, we have metrics: numerical data about what's happening in the cluster right now. Logs are the second pillar, what actually happened in detail when something went wrong. Traces, the third pillar, capture how a request flows through multiple services. That's a future episode.
| Pillar | Tool | What it answers |
|---|---|---|
| Metrics | Prometheus | What is happening, numerically |
| Logs | Loki | What actually happened, in detail |
| Traces | Jaeger/Tempo | How a request flowed through services |
## What Is Loki?
Loki is to logs what Prometheus is to metrics, made by the same company (Grafana Labs) with the same design philosophy. The important difference from Elasticsearch or Splunk is how it handles indexing. Traditional log systems index the full content of every log line, which means huge storage requirements and slow writes. Loki only indexes labels (pod name, namespace, container) and stores log content compressed. Much smaller footprint, much faster ingestion.
Two components work together: Loki is the log aggregation backend, and Promtail is the log collector that runs on every node.
## DaemonSets: One Pod Per Node
Before deploying Promtail, it's worth introducing a workload type we haven't used yet. A Deployment runs N replicas, with the scheduler deciding where they go. A DaemonSet runs exactly one pod on every node in the cluster. When a new node joins, Kubernetes automatically starts the DaemonSet pod on it.
Promtail must be a DaemonSet because pod logs live on the node running the pod (at `/var/log/pods/` on disk). To collect logs from all pods, you need a collector on every node. Our cluster has 1 control plane and 5 workers, so Promtail will run on all 6.
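To make the workload type concrete, here is a minimal DaemonSet manifest as a sketch. The names (`log-collector`) are hypothetical, and the real Promtail manifest is generated by its Helm chart; the point is the shape: no `replicas` field, because the node count determines the pod count.

```yaml
# Sketch only -- the Promtail Helm chart generates the real manifest.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector      # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: grafana/promtail:latest
          volumeMounts:
            - name: pod-logs
              mountPath: /var/log/pods
              readOnly: true
      volumes:
        - name: pod-logs
          hostPath:
            path: /var/log/pods   # where the kubelet writes pod logs
```

Note the `hostPath` volume: this is why the collector has to run on every node, and why log collectors typically need privileged-style access.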
## Prerequisites
Loki uses local-path storage for its index and chunk files. NFS is unreliable for these because of file locking behavior. Install the local-path provisioner first:
```bash
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
```
Then label the namespace to allow privileged pods (required for hostPath volumes):
```bash
kubectl label namespace local-path-storage \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/warn=privileged \
  pod-security.kubernetes.io/audit=privileged \
  --overwrite
```
## Deploying Loki
Add the Grafana Helm repository:
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```
Create `loki-values.yaml`. Loki can run as a set of distributed microservices, but for a homelab the SingleBinary deployment mode handles everything in one process. The `local-path` storage class keeps Loki's index and chunks off NFS, and `replication_factor: 1` means a single copy of each log stream, which is acceptable here.
```yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  persistence:
    enabled: true
    storageClass: local-path
    size: 10Gi
chunksCache:
  enabled: false
resultsCache:
  enabled: false
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
grafana:
  enabled: false
```
```bash
helm install loki grafana/loki \
  --namespace monitoring \
  --values loki-values.yaml
```

```bash
kubectl get pods -n monitoring | grep loki
# loki-0 should show 2/2 Running (may take a minute)
```
## Deploying Promtail
Promtail's config points it at Loki and adds a toleration so it can run on the control plane node. By default, Kubernetes taints control plane nodes to keep regular pods off them, but we want logs from control plane components too. Create `promtail-values.yaml`:
```yaml
config:
  clients:
    - url: http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```
```bash
helm install promtail grafana/promtail \
  --namespace monitoring \
  --values promtail-values.yaml
```
Verify it's running on all 6 nodes:
```bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=promtail -o wide
# Should show one pod per node
```
## Connecting Loki to Grafana
In Grafana at `https://grafana.spatacoli.xyz`, go to Connections > Data Sources > Add data source > Loki. Set the URL to `http://loki-gateway.monitoring.svc.cluster.local` and click Save & Test. You should see "Data source connected and labels found."
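If you prefer configuration files over clicking through the UI, Grafana can also provision the data source declaratively. A sketch of such a provisioning file follows; the filename and how it gets mounted into Grafana (e.g. via a ConfigMap, if you installed Grafana with Helm) are assumptions, but the format is Grafana's standard datasource provisioning schema.

```yaml
# e.g. /etc/grafana/provisioning/datasources/loki.yaml (mount path is an assumption)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.monitoring.svc.cluster.local
    isDefault: false
```

This keeps the data source in version control alongside the rest of your cluster config.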
## Querying Logs with LogQL
Open Grafana > Explore and select the Loki data source. LogQL syntax is familiar if you've used PromQL.
A basic label filter returns all logs from a namespace:

```logql
{namespace="monitoring"}
```

Filter by pod name with regex:

```logql
{namespace="monitoring", pod=~"prometheus-.*"}
```

Search within log content (`|=` means "contains"):

```logql
{namespace="monitoring"} |= "level=error"
```

Turn log counts into a metric, graphing error rate over time:

```logql
rate({namespace="monitoring"} |= "level=error" [5m])
```
That last query is particularly useful. You can add it to a Grafana dashboard alongside your Prometheus metrics, giving you error rate from logs as a panel.
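When several pods share a namespace, it can help to break that error rate down per pod. LogQL supports the same aggregation operators as PromQL, so a sketch of a grouped version looks like this:

```logql
sum by (pod) (rate({namespace="monitoring"} |= "level=error" [5m]))
```

In a dashboard panel this produces one series per pod, which makes it obvious which workload is generating the errors.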
## The Real Debugging Workflow
Delete a pod so it restarts, then try the old approach:
```bash
kubectl delete pod -n monitoring <grafana-pod>
kubectl logs <that-same-pod>
# Error: pod not found
```
Now go to Grafana > Explore > Loki, query `{namespace="monitoring", pod=~".*grafana.*"}`, and set the time range to the last 30 minutes. The full shutdown sequence is there. Loki caught it even though the pod is gone.
Metrics tell you when something went wrong. Logs tell you why.
By default, Loki retains logs for 744 hours (31 days). To change this, add `limits_config.retention_period` to your `loki-values.yaml` and upgrade the Helm release.
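As a sketch, cutting retention to 7 days might look like the fragment below. One caveat worth flagging: retention deletion is enforced by Loki's compactor, so it typically needs to be enabled alongside the limit; the exact compactor keys depend on your Loki and chart version, so treat these as a starting point to check against the chart's documentation.

```yaml
loki:
  limits_config:
    retention_period: 168h   # 7 days
  compactor:
    retention_enabled: true  # compactor actually deletes expired chunks
```

Then apply it with `helm upgrade loki grafana/loki --namespace monitoring --values loki-values.yaml`.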
## What's Next
In Episode 9 we migrate from Promtail to Grafana Alloy, Grafana's next-generation observability collector and the officially supported replacement for Promtail. The Loki backend, your LogQL queries, and your Grafana dashboards stay exactly the same. Only the collection layer changes.