Monitoring a Kubernetes Cluster with OpenTelemetry Collector: Agent + Gateway Pattern That Actually Works

The Problem: One Collector Per Cluster Is a Reliability Trap

The single-collector setup is almost always how this starts: one DaemonSet or Deployment running a monolithic OpenTelemetry Collector that scrapes node metrics, receives traces from instrumented apps, and ships everything to your backend. It works fine right up until it doesn’t. One pipeline misconfiguration (say, a regex scrub rule that chokes on high-cardinality trace attributes while incoming data piles up in memory behind it) and the collector process OOMs. Every metric, every trace, every log from every node: gone simultaneously. There’s no blast-radius containment because there’s no architectural separation.

The failure mode is especially sharp on self-hosted and home-lab clusters where you’re not running multiple collector replicas behind a load balancer. A single pod with a memory limit of 512Mi hits a traffic spike during a deployment, the kernel OOM killer fires, and you lose observability exactly when you needed it most — during the rollout you were trying to watch. The monolithic collector is a single point of failure dressed up as a convenience.

The agent + gateway split fixes this structurally. Agents run as a DaemonSet (one collector pod per node), and their only jobs are local scraping (kubelet metrics, cAdvisor, node exporters) and short-term buffering. They’re intentionally lightweight. The gateway runs as a separate Deployment (typically two replicas minimum) and handles everything expensive: batching, retry logic, tail sampling, and the actual export to Prometheus remote write, Grafana Cloud OTLP, or wherever your backend lives. A crashed agent takes out observability for one node. A crashed gateway replica doesn’t take out collection: the surviving replica keeps accepting data, and even if the whole gateway tier is down, agents hold data in their sending queues until it recovers, up to the queue’s capacity.

# agent config: scrape local kubelet, forward to gateway only
exporters:
  otlp:
    endpoint: "otel-gateway.monitoring.svc.cluster.local:4317"
    tls:
      insecure: true  # plaintext in-cluster hop; enable TLS here if that traffic must be encrypted

receivers:
  kubeletstats:
    collection_interval: 15s
    auth_type: serviceAccount
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true

service:
  pipelines:
    metrics:
      receivers: [kubeletstats]
      exporters: [otlp]  # never export to backend directly from agent

The concern separation also makes debugging tractable. If your Prometheus remote write endpoint starts rejecting samples, you debug the gateway config in one place instead of hunting through every node’s collector logs. If kubelet metrics start spiking cardinality, you tune the agent pipeline without touching the gateway’s export path. The same isolation logic applies when you wire observability signals into external workflows; Workflow Automation in 2026: n8n, Zapier, and Self-Hosted Pipelines covers how those handoff points work when your cluster signals need to trigger downstream pipeline steps.

One gotcha that’s easy to miss in the OpenTelemetry Collector docs until you’ve already been burned: a persistent sending queue (the file_storage extension, referenced from the exporter’s sending_queue.storage setting) needs a PersistentVolume or at minimum a hostPath mount, otherwise a pod restart loses whatever was queued. Most example configs skip this entirely and leave you with in-memory queuing only: tolerable for traces you can afford to lose, quietly catastrophic for metrics you’re trying to preserve across a gateway outage.

Architecture Before You Touch a YAML File

The pattern that actually works at scale splits responsibility along a hard boundary: agents are dumb collectors with tight resource limits, the gateway is where all the expensive processing happens. Conflating these two roles into a single collector deployment is the most common mistake I see in OTel setups — you either end up with fat DaemonSet pods starving your workloads, or you lose the HA guarantees you need on the processing side.

The agent runs as a DaemonSet: one pod per node, no exceptions. Its job is narrow: pull node and pod metrics via the hostmetrics and kubeletstats receivers, tail container logs via the filelog receiver, and accept spans from app pods over OTLP gRPC on port 4317. That last part is important: app pods send to the agent on their own node, which avoids cross-node hops for span traffic. Note that localhost inside an app pod is the pod’s own network namespace, not the node, so the usual wiring is a hostPort 4317 on the agent plus an OTLP endpoint built from the node IP (status.hostIP via the downward API) in the app pod. Processing on the agent should be near-zero: maybe a memory_limiter and a basic resourcedetection processor for node-level labels, nothing else. A 128Mi memory limit is realistic for this role and you should enforce it. Agents that balloon past that are usually doing work they shouldn’t be doing.

# agent memory_limiter — keep it lean
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 110        # hard limit: forced GC above this; 18Mi below the 128Mi pod limit
    spike_limit_mib: 20   # soft limit = 110 - 20 = 90Mi; incoming data is refused above it

  resourcedetection:
    detectors: [env, k8snode]  # node name + any env vars you inject
    timeout: 5s
    override: false

The gateway is a regular Deployment: run at least two replicas for HA, with a HorizontalPodAutoscaler if your span volume is spiky. It receives OTLP from every agent over gRPC, and this is where the real pipeline work happens: the k8sattributes processor to enrich with pod, namespace, and deployment labels, a batch processor to reduce backend write pressure, and tail-based sampling if you’re running traces at any meaningful volume. All your exporters (Prometheus remote write, OTLP to Jaeger or Tempo, Loki) live here, not on the agent. The gateway is the single egress point, which means you can change backends without redeploying anything on your nodes.

Traffic flow is strictly unidirectional: pod → agent (4317) → gateway (4317) → backend. That unidirectionality is load-bearing. The moment you introduce a feedback loop (say, a gateway that tries to scrape something that also pushes to the gateway), you get circular pipeline failures that are genuinely hard to debug because they only manifest under backpressure. The Prometheus Operator is the reason a lot of teams end up in this trap: they already have it running, it handles metrics fine, so they bolt OTel on top without rethinking the flow. The fundamental problem with stopping at the Prometheus Operator is that it doesn’t handle traces or logs at all, and its interval-based scrape model is the wrong primitive for span data, which is pushed per request and has to be sampled per trace. The OTel Collector handles all three signals (metrics, logs, traces) in one binary with one config surface, which is why the agent/gateway split on top of it beats a hybrid Prometheus + something-else approach once you have more than one signal to care about.
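The unidirectional rule is easy to check mechanically if you keep a rough map of which component sends to which. The sketch below is a plain depth-first search over a hand-written topology dict; the component names, including the hypothetical `relay` that forwards back into the gateway, are illustrative and not something the collector exposes.

```python
from typing import Dict, List

# Hypothetical topology: each component lists the components it sends telemetry to.
Topology = Dict[str, List[str]]


def find_cycle(topology: Topology) -> List[str]:
    """Return one feedback loop as a list of component names, or [] if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in topology}
    for targets in topology.values():
        for target in targets:
            color.setdefault(target, WHITE)  # sinks like "backend" have no entry
    stack: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GREY
        stack.append(node)
        for nxt in topology.get(node, []):
            if color[nxt] == GREY:
                # nxt is already on the current path: slice out the loop
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return []

    for node in list(color):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return []


healthy = {"pod": ["agent"], "agent": ["gateway"], "gateway": ["backend"]}
looped = {"pod": ["agent"], "agent": ["gateway"],
          "gateway": ["backend", "relay"], "relay": ["gateway"]}
print(find_cycle(healthy))  # []
print(find_cycle(looped))   # ['gateway', 'relay', 'gateway']
```

Keeping the map next to your Helm values and running this in CI is one low-effort way to catch a loop before backpressure exposes it.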

Deploying the Agent DaemonSet: Config and Gotchas

The most expensive mistake you can make with the agent tier is under-provisioning memory on nodes that run GPU workloads. Set resources.limits.memory: 256Mi and resources.requests.memory: 128Mi: GPU pods on the same node spike memory pressure during model loading, and a collector agent with no headroom gets OOM-killed quietly. An occasional OOMKill won’t show up as a crash loop, because the kubelet restarts the container in place, but you’ll have gaps in your metrics that look like a scrape timing issue until you run kubectl describe pod on the agent and see Last State: Terminated with Reason: OOMKilled and a climbing restart count.

Install with the official Helm chart at version 0.97 or later — earlier versions had issues with the filelog receiver’s multiline parsing defaults that caused partial log lines to flush incorrectly:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

helm upgrade --install otel-agent open-telemetry/opentelemetry-collector \
  --namespace monitoring \
  --create-namespace \
  --version 0.97.0 \
  --values agent-values.yaml

The agent-values.yaml core receiver block looks like this in practice. The kubeletstats receiver endpoint built from the node name env var is not optional: hardcoding a node IP in a DaemonSet config means every agent scrapes that one node’s kubelet, so all the other nodes report someone else’s stats.

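mode: daemonset

# kubeletstats and filelog ship in the contrib image; recent chart
# versions also require image.repository to be set explicitly
image:
  repository: otel/opentelemetry-collector-contrib

resources:
  limits:
    memory: 256Mi
    cpu: 200m
  requests:
    memory: 128Mi
    cpu: 50m

# node name for the kubeletstats endpoint, via the downward API
extraEnvs:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

# filelog and hostmetrics need the node's filesystem mounted in
extraVolumes:
  - name: varlogpods
    hostPath:
      path: /var/log/pods
  - name: hostfs
    hostPath:
      path: /
extraVolumeMounts:
  - name: varlogpods
    mountPath: /var/log/pods
    readOnly: true
  - name: hostfs
    mountPath: /hostfs
    readOnly: true
    mountPropagation: HostToContainer

# pod log files are typically readable only by root on the node
securityContext:
  runAsUser: 0
  runAsGroup: 0

config:
  receivers:
    hostmetrics:
      collection_interval: 30s
      root_path: /hostfs   # read the node's /proc and mounts, not the container's
      scrapers:
        cpu: {}
        memory: {}
        network: {}
        filesystem:
          # tmpfs and overlay are filesystem types, not mount points
          exclude_fs_types:
            match_type: strict
            fs_types: [tmpfs, overlay]
          exclude_mount_points:
            match_type: regexp
            mount_points: ["/dev.*", "/sys.*", "/proc.*"]

    kubeletstats:
      collection_interval: 30s
      auth_type: serviceAccount
      # requires the K8S_NODE_NAME env var injected above
      endpoint: "https://${env:K8S_NODE_NAME}:10250"
      insecure_skip_verify: true
      metric_groups:
        - node
        - pod
        - container

    filelog:
      include:
        - /var/log/pods/*/*/*.log
      include_file_path: true   # the container parser reads metadata from the path
      include_file_name: false
      operators:
        # parses the CRI/Docker line format and sets k8s.namespace.name,
        # k8s.pod.name, k8s.pod.uid and k8s.container.name from the file path
        - type: container
          id: container-parser
        # optional: parse JSON application logs; other lines pass through
        - type: json_parser
          if: 'body matches "^\\s*\\{"'

  exporters:
    otlp:
      endpoint: otel-gateway.monitoring.svc.cluster.local:4317
      compression: gzip
      sending_queue:
        enabled: true
        num_consumers: 4
        queue_size: 300
      retry_on_failure:
        enabled: true
        initial_interval: 5s
        max_interval: 30s

  service:
    pipelines:
      metrics:
        receivers: [hostmetrics, kubeletstats]
        exporters: [otlp]
      logs:
        receivers: [filelog]
        exporters: [otlp]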

The kubeletstats 403 gotcha will burn you if you assume the Helm chart handles RBAC completely. It creates a ServiceAccount and basic Role, but does not create a ClusterRole for nodes/stats and nodes/proxy. The failure mode is subtle: the collector starts successfully, the DaemonSet reports Ready, but kubeletstats silently drops metrics. You’ll only see it if you’re watching collector logs directly — it surfaces as repeated failed to scrape: 403 Forbidden lines, not as a container that won’t start. Create this manually:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-agent-kubeletstats
rules:
  - apiGroups: [""]
    resources:
      - nodes          # resourcedetection k8snode detector reads the Node object
      - nodes/stats
      - nodes/proxy
      - nodes/metrics
    verbs: ["get", "list", "watch"]
  # pods/namespaces: only needed if the agent also runs the k8sattributes processor
  - apiGroups: [""]
    resources: [pods, namespaces]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-agent-kubeletstats
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-agent-kubeletstats
subjects:
  - kind: ServiceAccount
    name: otel-agent-opentelemetry-collector  # default name from Helm chart
    namespace: monitoring

The sending_queue size of 300 on the OTLP exporter is deliberate. Gateway restarts during upgrades or rollouts can take 15–30 seconds, and on a node with 40+ pods the agent keeps producing export requests the whole time. queue_size counts requests (batches), not individual data points, so 300 entries covers that window with room to spare without letting the agent balloon in memory. The queue lives in memory, so a full 300-entry queue on a busy node adds roughly 20–30MB depending on batch size and cardinality. If your nodes run dense, high-cardinality workloads, watch the agent’s own memory usage via kubectl top pod in the monitoring namespace and raise the limit before you hit the OOM wall.
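The arithmetic behind that sizing is short enough to sketch. Treat every input as an assumption to replace with your own measurements: the requests-per-interval figure and the ~100 KiB average request size below are hypothetical, and real request sizes vary with batch settings and cardinality.

```python
import math


def queue_slots_needed(outage_s: float, interval_s: float,
                       requests_per_interval: int, safety: float = 2.0) -> int:
    """Estimate how many sending_queue entries a gateway outage fills up.

    Assumes each collection interval produces `requests_per_interval` export
    requests (batches) that wait in the queue while the gateway is down.
    """
    intervals = math.ceil(outage_s / interval_s)
    return math.ceil(intervals * requests_per_interval * safety)


def queue_memory_mib(slots: int, avg_request_kib: float) -> float:
    """Worst-case in-memory footprint if every queue slot holds a request."""
    return slots * avg_request_kib / 1024


# Hypothetical node: 30s outage, 30s interval, ~10 export requests per interval
print(queue_slots_needed(30, 30, 10))  # 20
print(queue_memory_mib(300, 100))      # 29.296875
```

Twenty slots for a single 30-second blip shows why 300 is comfortable headroom, and the memory estimate only lands in the 20–30MB range if your average request is somewhere around 70–100 KiB.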

Deploying the Gateway: Batching, Retries, and Memory Limiter

The most counterintuitive thing about the gateway’s memory_limiter processor is that its position is non-negotiable: it must be first in every pipeline, before batching, before sampling, before anything else. If you put it second, the batch processor has already accepted and buffered data before the limiter gets a chance to refuse it, so the refusal never reaches the sender as backpressure. By the time the limiter fires, you’re already in an OOM spiral on a pod with a 512Mi limit. Processor ordering in OTel Collector config isn’t cosmetic; the runtime runs processors in the order they’re listed.
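Because the order is just a list, it’s cheap to lint in CI before a bad config ships. A minimal sketch, assuming you have already parsed service.pipelines into plain dicts (for example with a YAML library, which is not part of the Python standard library); the function name is hypothetical.

```python
from typing import Dict, List


def pipelines_missing_limiter_first(pipelines: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return the names of pipelines whose first processor is not a memory_limiter.

    Component IDs can carry a name suffix ("memory_limiter/gateway"), so only
    the component type before the first "/" is compared.
    """
    offenders = []
    for name, pipeline in pipelines.items():
        processors = pipeline.get("processors", [])
        if not processors or processors[0].split("/", 1)[0] != "memory_limiter":
            offenders.append(name)
    return offenders


# service.pipelines, already parsed from YAML into dicts
gateway = {
    "metrics": {"processors": ["memory_limiter", "batch"]},
    "traces": {"processors": ["memory_limiter", "tail_sampling", "batch"]},
    "logs": {"processors": ["batch", "memory_limiter"]},
}
print(pipelines_missing_limiter_first(gateway))  # ['logs']
```

A pipeline with no processors at all is flagged too, matching the rule that the limiter belongs at the head of every pipeline.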

Here’s the Helm values block I use for the gateway deployment:

# values-gateway.yaml
mode: deployment
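replicaCount: 2

# contrib image: tail_sampling and prometheusremotewrite are contrib components
image:
  repository: otel/opentelemetry-collector-contrib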

resources:
  limits:
    memory: 512Mi
    cpu: "1"
  requests:
    memory: 256Mi
    cpu: "250m"

config:
  processors:
    memory_limiter:
      limit_mib: 400
      spike_limit_mib: 80
      check_interval: 5s

    batch:
      send_batch_size: 8192
      send_batch_max_size: 10000
      timeout: 10s

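  service:
    pipelines:
      # receivers/exporters fall back to the chart defaults unless set here;
      # list your backend exporters (e.g. prometheusremotewrite) explicitly
      metrics:
        processors: [memory_limiter, batch]  # order matters
      traces:
        # tail_sampling must also be defined under config.processors
        processors: [memory_limiter, tail_sampling, batch]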

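The spike_limit_mib: 80 sets the soft limit at 320Mi (400 minus 80): above that, the limiter starts refusing incoming data, and above limit_mib (400Mi) it also forces garbage collection. The remaining 112Mi up to the 512Mi container limit covers non-heap memory and any overshoot between checks. With check_interval: 5s the limiter polls every five seconds, which is coarse, so the spike budget has to absorb whatever the process allocates between two checks. The collector docs recommend 1s (the agent config above uses it) and only warn against going below one second because of the CPU cost; 5s trades reaction time for slightly less overhead.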
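To see where each threshold actually lands, here is a small calculator following the soft/hard semantics described in the memory_limiter documentation (soft limit = limit_mib - spike_limit_mib; forced GC above limit_mib). The container-headroom field is our own addition for comparing against the pod limit, not a processor setting.

```python
from dataclasses import dataclass


@dataclass
class LimiterThresholds:
    soft_mib: int                # above this, the limiter starts refusing data
    hard_mib: int                # above this, it also forces a garbage collection
    container_headroom_mib: int  # room left for non-heap memory and overshoot


def limiter_thresholds(limit_mib: int, spike_limit_mib: int,
                       container_limit_mib: int) -> LimiterThresholds:
    if spike_limit_mib >= limit_mib:
        raise ValueError("spike_limit_mib must be less than limit_mib")
    return LimiterThresholds(
        soft_mib=limit_mib - spike_limit_mib,
        hard_mib=limit_mib,
        container_headroom_mib=container_limit_mib - limit_mib,
    )


print(limiter_thresholds(400, 80, 512))  # gateway
# LimiterThresholds(soft_mib=320, hard_mib=400, container_headroom_mib=112)
print(limiter_thresholds(110, 20, 128))  # agent
# LimiterThresholds(soft_mib=90, hard_mib=110, container_headroom_mib=18)
```

The agent’s 18Mi of headroom is thin, which is one more reason to keep agent pipelines doing as little as possible.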

On batching: the collector’s default send_batch_size is 8192 items per the current docs, but depending on the chart version you’re pulling, it may default to something far smaller — I’ve seen deployments shipping 200-item batches to a Prometheus remote write endpoint, which means hundreds of HTTP POSTs per minute for any reasonably active cluster. That hammers Mimir’s ingester with tiny requests and inflates your HTTP overhead ratio badly. The config above sets send_batch_max_size: 10000 as a ceiling so a single goroutine flush can’t send an unbounded payload if a queue backs up, and timeout: 10s forces a flush even if the batch isn’t full — useful during low-traffic windows so metrics don’t sit in the buffer stale.
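The difference between small and large batches is easy to put numbers on. A rough sketch, assuming a hypothetical gateway ingesting 2,000 data points per second and treating each batch flush as roughly one remote-write POST (it ignores splitting by send_batch_max_size and any further splitting the exporter does):

```python
import math


def flushes_per_minute(points_per_second: float, send_batch_size: int,
                       timeout_s: float) -> float:
    """Approximate batch flushes per minute for one batch processor.

    A batch goes out when it reaches send_batch_size items or when timeout_s
    elapses, whichever comes first.
    """
    if points_per_second > 0:
        time_to_fill_s = send_batch_size / points_per_second
    else:
        time_to_fill_s = math.inf  # idle: only the timeout triggers flushes
    return 60 / min(time_to_fill_s, timeout_s)


print(flushes_per_minute(2000, 200, 10))             # 600.0
print(round(flushes_per_minute(2000, 8192, 10), 1))  # 14.6
print(flushes_per_minute(0, 8192, 10))               # 6.0
```

At the same ingest rate, 200-item batches mean about forty times as many requests as 8192-item ones, while the 10s timeout puts a floor of six flushes a minute under low traffic.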

For Prometheus remote write to a self-hosted Mimir instance, the exporter config looks like this:

exporters:
  prometheusremotewrite:
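    endpoint: http://mimir.monitoring.svc:9009/api/v1/push
    headers:
      X-Scope-OrgID: homelab  # Mimir rejects writes without a tenant ID unless multi-tenancy is disabled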
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    # tls is optional if you're terminating inside the cluster mesh
    tls:
      insecure: true

The retry_on_failure block is the part most people never look at until a backend restart costs them data. Recent collector releases enable retries by default, but spelling the block out pins the budget where you can see it: with max_elapsed_time: 300s, the collector keeps retrying a batch for up to five minutes before giving up, which covers most rolling restarts of Mimir or a brief network hiccup between namespaces. Note that retried data is held in memory, so size your limit_mib with that in mind if your backend goes down for extended periods.

Tail-based sampling is where the gateway topology gets sticky. The tail_sampling processor needs to see all spans from a single trace before it can make a keep/drop decision — so if trace abc123 has spans landing on both gateway pod replicas, neither pod has enough information to evaluate the policy. The standard approach is sticky routing from the agent side using the loadbalancing exporter, which hashes on trace ID:

# agent config — routes traces to gateway by trace ID hash
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
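    resolver:
      dns:
        # must resolve to individual gateway pod IPs: use a headless Service
        # (clusterIP: None); a regular ClusterIP name returns a single VIP
        hostname: otel-gateway-headless.monitoring.svc.cluster.local  # hypothetical name
        port: 4317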

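With this, every span for a given trace ID routes to the same gateway replica regardless of which node the agent pod lives on, as long as the set of gateway replicas is stable (scaling the gateway reshuffles part of the hash ring, so traces in flight during a scale event can be split). Then on the gateway, the tail sampling policy can do the useful thing: keep 100% of traces that contain at least one error span, and sample down to 5% of clean traces: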

processors:
  tail_sampling:
    decision_wait: 10s   # wait up to 10s for all spans before deciding
    num_traces: 50000    # in-memory trace buffer; tune to your trace volume
    policies:
      - name: keep-errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: sample-healthy
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }

The decision_wait: 10s means the processor holds spans in memory for up to 10 seconds before committing a sampling decision. Combined with num_traces: 50000, you can do the math on how much memory this buffer consumes relative to your average spans-per-trace count. Deep microservice traces with 200+ spans per trace will eat that budget much faster than a simple 3-service path. If you see the gateway’s heap climbing toward the limit_mib threshold under load, reducing num_traces or tightening decision_wait is the first dial to turn.
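Here is that math as a sketch. The 1 KiB in-memory size per span is a hypothetical round number (spans with many attributes can be several times larger), and the num_traces rule of thumb (new traces per second times decision_wait, plus a safety margin) is a common heuristic, not something the processor enforces.

```python
import math


def min_num_traces(new_traces_per_second: float, decision_wait_s: float,
                   safety: float = 1.5) -> int:
    """Smallest num_traces that holds every trace still waiting for a decision."""
    return math.ceil(new_traces_per_second * decision_wait_s * safety)


def buffer_mib(num_traces: int, avg_spans_per_trace: float,
               avg_span_bytes: int) -> float:
    """Rough span memory held when the trace buffer is full."""
    return num_traces * avg_spans_per_trace * avg_span_bytes / (1024 * 1024)


print(min_num_traces(500, 10))        # 7500
print(buffer_mib(50_000, 10, 1024))   # 488.28125
print(buffer_mib(50_000, 200, 1024))  # 9765.625
```

Under these assumptions, even ten spans per trace would push a full 50,000-trace buffer past the gateway’s 400Mi limit_mib, so sizing num_traces to your actual trace rate matters more than the default suggests.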

Kubernetes-Specific Receivers Worth Enabling

The receiver placement question trips up almost every first-time OTel deployment on Kubernetes: the k8s_cluster receiver belongs in the gateway tier, never on the agents. It talks directly to the Kubernetes API server to pull pod phase transitions, node conditions, and deployment replica counts, which is cluster-wide state, not node-local telemetry. If you accidentally run it on every agent DaemonSet pod (easy to do when you’re copying config between components), you’ll end up with the same metric emitted from a dozen sources simultaneously, each with a slightly different collection timestamp. Prometheus or your OTLP backend will either deduplicate them incorrectly or surface false gaps. The same problem applies on a smaller scale to the gateway itself: with two replicas sharing one config, each replica emits its own copy. Run exactly one instance, for example in a small dedicated single-replica collector Deployment, and give that pod a service account with get, list, and watch on pods, nodes, namespaces, and the workload resources it reports on (replicasets, deployments, daemonsets, statefulsets, jobs).

receivers:
  k8s_cluster:
    collection_interval: 30s
    node_conditions_to_report:
      - Ready
      - MemoryPressure
      - DiskPressure
    allocatable_types_to_report:
      - cpu
      - memory
    # auth_type defaults to serviceAccount — works inside the cluster
    # Do NOT add this block to your agent config

The k8sevents receiver is the sleeper feature most people ignore until they’ve spent an hour grepping through logs for why a pod won’t start. It surfaces Kubernetes events (OOMKills, BackOff loops, failed scheduling decisions, image pull failures) as structured log records with resource attributes. Route these into Loki with a pipeline that tags them by k8s.namespace.name and k8s.pod.name, and suddenly you can run a single LogQL query that shows both the application’s stdout and the Kubernetes event that preceded the crash. The key config detail is auth_type: serviceAccount plus a ClusterRole that grants get, list, and watch on the events resource. Without promoting resource attributes to Loki stream labels, events from all namespaces land in the same stream and become useless noise.

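receivers:
  k8sevents:
    auth_type: serviceAccount

processors:
  # the loki exporter promotes the resource attributes named in this hint
  # to stream labels (dots become underscores, e.g. k8s_namespace_name)
  resource/loki-labels:
    attributes:
      - action: insert
        key: loki.resource.labels
        value: k8s.namespace.name, k8s.object.kind, k8s.object.name

exporters:
  # note: recent contrib releases deprecate this exporter in favor of
  # Loki's native OTLP endpoint
  loki:
    endpoint: http://loki-gateway:3100/loki/api/v1/push
    default_labels_enabled:
      exporter: false
      job: true

service:
  pipelines:
    logs/k8sevents:
      receivers: [k8sevents]
      processors: [memory_limiter, resource/loki-labels, batch]
      exporters: [loki]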

The resourcedetection processor with detectors: [k8snode, env] is the piece that makes cross-signal correlation actually work in Grafana. Without it, a metric arrives tagged with a pod name but nothing that tells you which node it ran on. The k8snode detector takes the node name from an env var populated by the downward API (K8S_NODE_NAME on the agent pod), looks the node up through the API server, and adds k8s.node.name and k8s.node.uid to every span, metric, and log. Cloud region needs a cloud-specific detector (ec2, gcp, or azure) on top. The env detector picks up whatever you’ve set in OTEL_RESOURCE_ATTRIBUTES; this is where you inject the cluster name, since there’s no general-purpose detector for it. Place this processor early in every agent pipeline, but keep k8snode off the gateway: it describes the node the collector itself runs on, so on the gateway it would stamp the gateway’s node onto any telemetry that arrives without one. Skip it on the agents and you’ll find metrics that correlate fine within a single node but fall apart when you try to build a node-comparison dashboard.

processors:
  resourcedetection:
    detectors: [k8snode, env]
    k8snode:
      # env var holding the node name (downward API, see the env block below)
      node_from_env_var: K8S_NODE_NAME
    timeout: 2s
    override: false  # don't overwrite attributes set by the app itself

# In your agent DaemonSet spec:
# env:
#   - name: K8S_NODE_NAME
#     valueFrom:
#       fieldRef:
#         fieldPath: spec.nodeName
#   - name: OTEL_RESOURCE_ATTRIBUTES
#     value: "k8s.cluster.name=prod-us-east-1"
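The OTEL_RESOURCE_ATTRIBUTES format is a comma-separated list of key=value pairs with percent-encoded values. The sketch below approximates how an env-style detector reads it, which is handy for sanity-checking the string before you bake it into a manifest; the collector’s own parser may treat malformed entries differently (this version skips them rather than raising).

```python
from typing import Dict
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parse an OTEL_RESOURCE_ATTRIBUTES value ("k1=v1,k2=v2") into a dict.

    Values are percent-decoded; entries without "=" or with an empty key
    are skipped.
    """
    attrs: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue
        attrs[key.strip()] = unquote(value.strip())
    return attrs


print(parse_resource_attributes("k8s.cluster.name=prod-us-east-1"))
# {'k8s.cluster.name': 'prod-us-east-1'}
```

A value containing spaces or commas has to be percent-encoded (for example %20 for a space), otherwise the split above, like most real parsers, cuts it apart.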

One gotcha with the k8s_cluster receiver: it emits metrics under the k8s. prefix, but the exact metric names shifted between OTel Collector versions around 0.80. If you’re referencing dashboards built against an older collector, you may find deployment replica metrics named k8s.deployment.desired vs k8s.deployment.desired_pods depending on what version generated them. Pin your collector image tag explicitly and validate metric names with a one-off pipeline to the debug exporter (verbosity: detailed) before wiring up Grafana panels. It saves a lot of “why is this panel empty” time.

Sizing, Resource Reality, and When This Pattern Breaks Down

The numbers here are smaller than most blog posts admit. On a three-node home-lab cluster (8 cores, 32GB RAM per node), the agent DaemonSet pods sit at roughly 50–80Mi RSS and 0.05–0.1 CPU cores at steady state with the hostmetrics and kubeletstats receivers and the k8sattributes processor active. The gateway Deployment pair (two replicas) lands at 150–300Mi RAM depending on trace and log volume. Those numbers sound fine until you enable the filelog receiver with a multiline recombine step and no size cap on a node running chatty Java services: memory climbs past 600Mi per agent within hours as partial records pile up in the buffer. Always set this:

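receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
    # filelog already truncates single entries at 1MiB by default; pin it explicitly
    max_log_size: 1MiB
    operators:
      - type: recombine
        combine_field: body
        is_last_entry: 'body matches "\\n$"'
        # recombine has its own size cap, off (0) by default; without it a
        # record that never matches is_last_entry keeps growing
        max_log_size: 1MiB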
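Why the cap matters is easiest to see in a toy model. This is not the operator’s code (the real recombine also joins with a configurable separator, flushes on a timer, and limits entries per batch); it is a simplified sketch of line joining with an optional byte cap.

```python
from typing import Callable, Iterable, Iterator, List


def recombine(lines: Iterable[str], is_last_entry: Callable[[str], bool],
              max_log_size: int = 0) -> Iterator[str]:
    """Join lines into one record until is_last_entry matches, or until the
    buffered bytes reach max_log_size (0 means no cap)."""
    buffer: List[str] = []
    size = 0
    for line in lines:
        buffer.append(line)
        size += len(line.encode("utf-8"))
        if is_last_entry(line) or (max_log_size and size >= max_log_size):
            yield "".join(buffer)
            buffer, size = [], 0
    if buffer:
        yield "".join(buffer)  # end of input: flush whatever is left


def ends_with_newline(line: str) -> bool:
    return line.endswith("\n")


runaway = ["x" * 600] * 3  # a stream that never matches is_last_entry
print([len(r) for r in recombine(runaway, ends_with_newline)])        # [1800]
print([len(r) for r in recombine(runaway, ends_with_newline, 1000)])  # [1200, 600]
```

Without the cap, the buffer only drains when a terminating line finally arrives; with it, memory per source stays bounded at roughly the cap plus one line.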

The gateway pattern solves the wrong problem if your observability backend can’t absorb what the gateway is forwarding. A self-hosted Prometheus with a 15-second scrape interval and no remote write configured is almost always the actual bottleneck — not the collector. Once you’re retaining more than 30 days of metrics, the local Prometheus TSDB becomes a liability: compaction stalls, WAL replay on restart takes minutes, and head block size blows up. Mimir (single-binary mode) or Thanos sidecar are the practical exits. Mimir’s single-binary mode is underrated for home-lab — one pod, object storage backend, and you get multi-tenancy and long-term retention without the Thanos component sprawl. The gateway’s prometheusremotewrite exporter points at Mimir and you’re done:

exporters:
  prometheusremotewrite:
    endpoint: "http://mimir:9009/api/v1/push"
    headers:
      X-Scope-OrgID: homelab
    tls:
      insecure: true  # internal cluster, no cert needed
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 120s

Single-node k3s or a local kind cluster doesn’t need this pattern. Two Helm releases, separate ConfigMaps, a Service for gateway ingestion — that’s real operational overhead when you’re the only person maintaining it and the cluster has four pods running. A single collector Deployment with hostmetrics, k8s_cluster, and kubeletstats receivers combined is simpler and covers every signal you’ll realistically query. The agent + gateway split pays for itself when you have three or more nodes, multiple teams writing to the cluster, or a backend that needs traffic shaping before it receives data. Before that threshold, it’s complexity theater.

The debug workflow that actually catches misconfiguration before it silently drops your data: wire a debug exporter with verbosity: detailed into a parallel pipeline that mirrors your production receivers, then check the zpages endpoint. Adding the pipeline rolls the collector pods, but nothing on the nodes, in your apps, or in the backend has to change to see what’s flowing:

exporters:
  debug:
    verbosity: detailed        # logs full telemetry to collector stdout
    sampling_initial: 5        # log the first 5 items each second...
    sampling_thereafter: 20    # ...then only every 20th item after that

service:
  extensions: [zpages]
  pipelines:
    metrics/debug:
      receivers: [otlp]        # same receiver the production pipeline uses
      processors: [memory_limiter]
      exporters: [debug]

extensions:
  zpages:
    endpoint: 0.0.0.0:55679   # port-forward this to localhost

Then from your workstation:

# same namespace as the rest of this setup; adjust the Deployment name to your release
kubectl port-forward -n monitoring deploy/otel-gateway 55679:55679
# Open: http://localhost:55679/debug/pipelinez

The /debug/pipelinez page lists every pipeline with the receivers, processors, and exporters actually wired into it, which is the fastest way to confirm the running config matches what you think you deployed; a pipeline missing from the page never loaded. For accepted, refused, and failed counts, use the collector’s internal metrics on port 8888. The distinction matters: refused data was rejected at the receiver (often because the memory limiter tripped), and the sender got an error back that it can retry; failed sends are data the gateway accepted and then couldn’t deliver to the backend before the retry budget ran out. If you’re seeing send failures and no refusals, the problem is downstream, not in your pipeline config.

Validating the Stack End-to-End Before You Trust It

The most common mistake after wiring up an agent/gateway pipeline is trusting the absence of errors as proof it works. Collector logs staying quiet and Prometheus targets showing green means nothing if your tail sampler is silently dropping everything or the k8sattributes processor can’t call the API server. Verify the full signal path before you put any workload under this stack.

Firing a Synthetic Trace Span

otel-cli is a statically compiled binary that speaks the OTLP gRPC protocol directly — no SDK, no app code, no sidecars. Install it on any jump box or run it inside the cluster as a one-shot pod, then fire a span at the agent’s ClusterIP:

# Run from inside the cluster — replace agent-svc with your actual Service name
otel-cli span \
  --name test-span \
  --service my-app \
  --endpoint http://agent-svc:4317 \
  --verbose

The --verbose flag prints the gRPC status response so you can distinguish “agent accepted” from “connection refused” immediately. If the span doesn’t appear in Jaeger or Tempo after about 10 seconds, resist the urge to blame the agent first. Check the gateway’s tail sampler policy. A tail sampler configured with policy: probabilistic, sampling_percentage: 10 will silently discard 90% of traffic, including your synthetic span if it lands in a sampled-out trace ID bucket. Temporarily set sampling_percentage: 100, refire the span, then tune back down. If it still doesn’t appear, check whether the gateway’s trace exporter is targeting the right backend address. A wrong hostname doesn’t break the pipeline; it shows up as “Exporting failed” errors in the gateway logs and a climbing otelcol_exporter_send_failed_spans counter, both easy to miss if nobody is watching.

Metric Pipeline Integrity via Internal Telemetry

Both the agent and the gateway expose their own Prometheus metrics on port 8888 by default. The two counters worth watching are otelcol_receiver_accepted_metric_points and otelcol_exporter_sent_metric_points, labeled per receiver and per exporter. In a healthy pipeline with a single metrics exporter, these track each other closely. When accepted stays above sent for more than a couple of minutes, data is being lost inside the collector: not on the wire, but after ingestion.

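# Prometheus query to catch sustained drop gaps on the gateway.
# Aggregate first: receiver and exporter series carry different labels,
# so subtracting the raw series would never match anything.
# Assumes one metrics exporter; with fan-out, sent is a multiple of accepted.
(
  sum by (instance) (rate(otelcol_receiver_accepted_metric_points{job="otelcol-gateway"}[2m]))
  -
  sum by (instance) (rate(otelcol_exporter_sent_metric_points{job="otelcol-gateway"}[2m]))
) > 0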

A non-zero result here means data is being lost between ingestion and export, and the next step is to split the exporter-side counters. otelcol_exporter_send_failed_metric_points increments when the downstream backend (Prometheus remote write, Cortex, Mimir) rejects writes or stays unreachable past the retry budget, and otelcol_exporter_enqueue_failed_metric_points increments when the sending queue is full. Memory limiter pressure looks different: the limiter rejects data before the receiver counts it as accepted, so it shows up in otelcol_receiver_refused_metric_points rather than as an accepted/sent gap, and the agents retry on their side. If refusals are climbing, note that many example pipelines ship limit_mib: 512, which is fine for small clusters, but under real scrape load with hundreds of pods the gateway’s resident set climbs fast. Either set limit_mib relative to the container’s actual memory limit (leave about 20% headroom), or lengthen the agents’ collection interval, for example from 15s to 30s, if you have flexibility there.
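Separating the counters is the key step, so here is a small triage sketch. It assumes you’ve already pulled per-second rates for one collector (for example from rate(...[2m]) queries); the inputs in the examples are hypothetical. It encodes one reading of the counters: memory limiter rejections are reported as refused at the receiver, while delivery problems show up as exporter send or enqueue failures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CollectorRates:
    """Per-second rates for one collector's metric points."""
    accepted: float        # otelcol_receiver_accepted_metric_points
    refused: float         # otelcol_receiver_refused_metric_points
    sent: float            # otelcol_exporter_sent_metric_points
    send_failed: float     # otelcol_exporter_send_failed_metric_points
    enqueue_failed: float  # otelcol_exporter_enqueue_failed_metric_points


def triage(r: CollectorRates, gap_tolerance: float = 0.01) -> List[str]:
    findings: List[str] = []
    if r.refused > 0:
        findings.append("receiver refusing data: likely memory_limiter pressure")
    if r.enqueue_failed > 0:
        findings.append("exporter queue full: backend too slow or queue too small")
    if r.send_failed > 0:
        findings.append("backend rejecting or unreachable: check exporter logs")
    gap = r.accepted - r.sent
    if (r.accepted > 0 and gap / r.accepted > gap_tolerance
            and r.send_failed == 0 and r.enqueue_failed == 0):
        findings.append("accepted/sent gap without export errors: check filter processors")
    return findings or ["no loss signals"]


print(triage(CollectorRates(1000, 0, 998, 0, 0)))    # ['no loss signals']
print(triage(CollectorRates(1000, 0, 700, 300, 0)))  # ['backend rejecting or unreachable: check exporter logs']
```

The 1% tolerance is an arbitrary allowance for batches still in flight; tune it to how noisy your own rates are.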

Log Pipeline Label Smoke Test

The log path has a failure mode that never stops the pipeline: the k8sattributes processor can’t list pods because of an RBAC gap, so it passes log records through without the Kubernetes metadata instead of dropping them. Logs arrive in Loki, they look fine, but every query that filters on pod or namespace labels returns nothing.

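# Write a known log line from a throwaway pod: its stdout goes to the container log
# (output from kubectl exec only comes back to your terminal, never to the log)
kubectl run otel-label-test -n your-ns --image=busybox:1.36 --restart=Never -- \
  sh -c 'echo "otel-label-test-$(date +%s)"'

# Then query Loki (requires logcli). Loki label names can't contain dots, so
# k8s.pod.name appears as something like k8s_pod_name, depending on your promotion rules
logcli query '{namespace="your-ns"} |= "otel-label-test"' --limit=5 --output=jsonl | \
  jq '.labels | has("k8s_pod_name")'

# Clean up afterwards
kubectl delete pod -n your-ns otel-label-test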

If the has() check returns false, the processor isn’t attaching metadata. The most common cause is a missing ClusterRole binding. The k8sattributes processor needs get, watch, and list on pods, namespaces, nodes, and replicasets; that last one catches people because it’s needed for resolving owner references back to Deployments. A quick way to confirm it’s an RBAC problem rather than a misconfigured processor is to check the collector pod logs for lines containing k8sattributes and Forbidden; they appear as soon as the processor’s informers attempt their first list and repeat on every retry. If the logs are clean but labels are still missing, check two things: whether your extract.metadata block lists k8s.pod.name (once you set that list explicitly, anything you omit is not extracted), and, on the gateway, pod association. Connections there come from the agents rather than the app pods, so the gateway can only match a record to a pod if the agent forwards an identifier such as k8s.pod.ip (running k8sattributes with passthrough: true on the agent does this).


Disclaimer: This article is for informational purposes only. The views and opinions expressed are those of the author(s) and do not necessarily reflect the official policy or position of Sonic Rocket or its affiliates. Always consult with a certified professional before making any financial or technical decisions based on this content.



Written by Eric Woo

Self-Hosted AI & Automation Engineer

Eric runs his own self-hosted stack: local LLM pipelines on Ollama with dual-model VRAM scheduling on a single 32GB workstation, n8n workflows in Docker, and a TypeScript automation engine that publishes to WordPress on cron. He writes about the systems he actually operates — configs, failure modes, and GPU bills included.
