Delete a pod, then try to read its logs:
```bash
kubectl delete pod -n monitoring <some-pod>
kubectl logs <that-same-pod>
# Error: pod not found
```
The pod is gone, and so are its logs. Whatever caused it to fail is unknown. This is the limitation of kubectl logs: it only reads from pods that still exist. If a container crashes and restarts, `kubectl logs --previous` can recover the prior container's output, but once the pod object itself is deleted, nothing is recoverable. And if you're debugging a problem that spans multiple pods, you're grepping through multiple terminal windows.
Centralized logging fixes this. This is the companion article to Episode 8 of the Kubernetes on Raspberry Pi series.
All configs are in the `kubernetes-series` GitHub repo under `video-08-logging-loki-promtail/`.
## Observability: Metrics, Logs, and Traces
With Prometheus from Episode 4, we have metrics: numerical data about what's happening in the cluster right now. Logs are the second pillar, what actually happened in detail when something went wrong. Traces, the third pillar, capture how a request flows through multiple services. That's a future episode.
| Pillar | Tool | What it answers |
|---|---|---|
| Metrics | Prometheus | What is happening, numerically |
| Logs | Loki | What actually happened, in detail |
| Traces | Jaeger/Tempo | How a request flowed through services |
## What Is Loki?
Loki is to logs what Prometheus is to metrics, made by the same company (Grafana Labs) with the same design philosophy. The important difference from Elasticsearch or Splunk is how it handles indexing. Traditional log systems index the full content of every log line, which means huge storage requirements and slow writes. Loki only indexes labels (pod name, namespace, container) and stores log content compressed. Much smaller footprint, much faster ingestion.
Two components work together: Loki is the log aggregation backend, and Promtail is the log collector that runs on every node.
## DaemonSets: One Pod Per Node
Before deploying Promtail, it's worth introducing a workload type we haven't used yet. A Deployment runs N replicas, with the scheduler deciding where they go. A DaemonSet runs exactly one pod on every node in the cluster. When a new node joins, Kubernetes automatically starts the DaemonSet pod on it.
Promtail must be a DaemonSet because pod logs live on the node running the pod (at `/var/log/pods/` on disk). To collect logs from all pods, you need a collector on every node. Our cluster has 1 control plane and 5 workers, so Promtail will run on all 6.
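To make the workload type concrete, here is a minimal DaemonSet manifest as a sketch. The names (`log-collector`) are hypothetical, and the real Promtail manifest is generated by its Helm chart; the point is the shape: no `replicas` field, because the node count determines the pod count.

```yaml
# Sketch only -- the Promtail Helm chart generates the real manifest.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector      # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: grafana/promtail:latest
          volumeMounts:
            - name: pod-logs
              mountPath: /var/log/pods
              readOnly: true
      volumes:
        - name: pod-logs
          hostPath:
            path: /var/log/pods   # where the kubelet writes pod logs
```

Note the `hostPath` volume: this is why the collector has to run on every node, and why log collectors typically need privileged-style access.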
## Prerequisites
Loki uses local-path storage for its index and chunk files. NFS is unreliable for these because of file locking behavior. Install the local-path provisioner first:
```bash
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
```
Then label the namespace to allow privileged pods (required for hostPath volumes):
```bash
kubectl label namespace local-path-storage \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/warn=privileged \
  pod-security.kubernetes.io/audit=privileged \
  --overwrite
```
## Deploying Loki
Add the Grafana Helm repository:
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```
Create `loki-values.yaml`. Loki can run as a set of distributed microservices, but for a homelab the SingleBinary deployment mode handles everything in one process. The `local-path` storage class keeps Loki's index and chunks off NFS, and `replication_factor: 1` means a single copy of each log stream, which is acceptable here.
```yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  persistence:
    enabled: true
    storageClass: local-path
    size: 10Gi
chunksCache:
  enabled: false
resultsCache:
  enabled: false
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
grafana:
  enabled: false
```
```bash
helm install loki grafana/loki \
  --namespace monitoring \
  --values loki-values.yaml
```

```bash
kubectl get pods -n monitoring | grep loki
# loki-0 should show 2/2 Running (may take a minute)
```
## Deploying Promtail
Promtail's config points it at Loki and adds a toleration so it can run on the control plane node. By default, Kubernetes taints control plane nodes to keep regular pods off them, but we want logs from control plane components too. Create `promtail-values.yaml`:
```yaml
config:
  clients:
    - url: http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```
```bash
helm install promtail grafana/promtail \
  --namespace monitoring \
  --values promtail-values.yaml
```
Verify it's running on all 6 nodes:
```bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=promtail -o wide
# Should show one pod per node
```
## Connecting Loki to Grafana
In Grafana at `https://grafana.spatacoli.xyz`, go to Connections > Data Sources > Add data source > Loki. Set the URL to `http://loki-gateway.monitoring.svc.cluster.local` and click Save & Test. You should see "Data source connected and labels found."
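If you prefer configuration files over clicking through the UI, Grafana can also provision the data source declaratively. A sketch of such a provisioning file follows; the filename and how it gets mounted into Grafana (e.g. via a ConfigMap, if you installed Grafana with Helm) are assumptions, but the format is Grafana's standard datasource provisioning schema.

```yaml
# e.g. /etc/grafana/provisioning/datasources/loki.yaml (mount path is an assumption)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.monitoring.svc.cluster.local
    isDefault: false
```

This keeps the data source in version control alongside the rest of your cluster config.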
## Querying Logs with LogQL
Open Grafana > Explore and select the Loki data source. LogQL syntax is familiar if you've used PromQL.
A basic label filter returns all logs from a namespace:

```logql
{namespace="monitoring"}
```

Filter by pod name with regex:

```logql
{namespace="monitoring", pod=~"prometheus-.*"}
```

Search within log content (`|=` means "contains"):

```logql
{namespace="monitoring"} |= "level=error"
```

Turn log counts into a metric, graphing error rate over time:

```logql
rate({namespace="monitoring"} |= "level=error" [5m])
```
That last query is particularly useful. You can add it to a Grafana dashboard alongside your Prometheus metrics, giving you error rate from logs as a panel.
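When several pods share a namespace, it can help to break that error rate down per pod. LogQL supports the same aggregation operators as PromQL, so a sketch of a grouped version looks like this:

```logql
sum by (pod) (rate({namespace="monitoring"} |= "level=error" [5m]))
```

In a dashboard panel this produces one series per pod, which makes it obvious which workload is generating the errors.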
## The Real Debugging Workflow
Delete a pod so it restarts, then try the old approach:
```bash
kubectl delete pod -n monitoring <grafana-pod>
kubectl logs <that-same-pod>
# Error: pod not found
```
Now go to Grafana > Explore > Loki, query `{namespace="monitoring", pod=~".*grafana.*"}`, and set the time range to the last 30 minutes. The full shutdown sequence is there. Loki caught it even though the pod is gone.
Metrics tell you when something went wrong. Logs tell you why.
By default, Loki retains logs for 744 hours (31 days). To change this, add `limits_config.retention_period` to your `loki-values.yaml` and upgrade the Helm release.
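As a sketch, cutting retention to 7 days might look like the fragment below. One caveat worth flagging: retention deletion is enforced by Loki's compactor, so it typically needs to be enabled alongside the limit; the exact compactor keys depend on your Loki and chart version, so treat these as a starting point to check against the chart's documentation.

```yaml
loki:
  limits_config:
    retention_period: 168h   # 7 days
  compactor:
    retention_enabled: true  # compactor actually deletes expired chunks
```

Then apply it with `helm upgrade loki grafana/loki --namespace monitoring --values loki-values.yaml`.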
## What's Next
In Episode 9 we migrate from Promtail to Grafana Alloy, Grafana's next-generation observability collector and the officially supported replacement for Promtail. The Loki backend, your LogQL queries, and your Grafana dashboards stay exactly the same. Only the collection layer changes.