
Objective

Set up Prometheus and Grafana for TON node metrics. The kube-prometheus-stack chart is the recommended Prometheus setup because the node Helm chart includes a ServiceMonitor template, which lets Prometheus Operator discover the scrape target automatically.

Prerequisites

  1. Enable the metrics HTTP server in node config (config.json):
    {
      "metrics": {
        "address": "0.0.0.0:9100",
        "global_labels": {
          "network": "mainnet",
          "node_id": "my-node-0"
        }
      }
    }
    
    The server exposes /metrics (Prometheus format), /healthz (liveness), and /readyz (readiness). If the metrics section is absent, the metrics server is not started.
  2. Set ports.metrics in Helm values:
    ports:
      metrics: 9100
    
    The port must match the metrics.address port in node config.

Network security

The metrics port is never exposed on the public per-replica LoadBalancer services. Instead, the chart creates a dedicated internal <release>-metrics ClusterIP service that is reachable only from inside the cluster. If external access is required, a custom LoadBalancer service that targets the metrics port works, but the recommended approach is an ingress with authentication (basic auth, OAuth2 proxy, or similar) that proxies to <release>-metrics.
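
As an illustration of the recommended approach, the manifest below puts basic auth in front of the metrics service. It is a sketch, not part of the chart: it assumes the ingress-nginx controller, a release named my-ton-node in namespace ton, a pre-created Secret named metrics-basic-auth containing an htpasswd file under the key auth, the host metrics.example.com, and a service port equal to ports.metrics (9100). All of these names are placeholders.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ton-metrics                 # hypothetical name
  namespace: ton                    # hypothetical release namespace
  annotations:
    # ingress-nginx basic auth annotations
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: metrics-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication required"
spec:
  ingressClassName: nginx
  rules:
    - host: metrics.example.com     # placeholder host
      http:
        paths:
          - path: /metrics
            pathType: Exact
            backend:
              service:
                name: my-ton-node-metrics   # <release>-metrics
                port:
                  number: 9100              # assumed to match ports.metrics
```

Exposing only the /metrics path keeps /healthz and /readyz internal. Other ingress controllers use different annotations for authentication.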

Quick start

Minimal values to enable metrics, probes, and ServiceMonitor:
ports:
  metrics: 9100

probes:
  startup:
    httpGet:
      path: /healthz
      port: metrics
    failureThreshold: 60
    periodSeconds: 10
  liveness:
    httpGet:
      path: /healthz
      port: metrics
    periodSeconds: 30
    failureThreshold: 3
  readiness:
    httpGet:
      path: /readyz
      port: metrics
    periodSeconds: 10
    failureThreshold: 3

metrics:
  serviceMonitor:
    enabled: true

ServiceMonitor configuration

Enable ServiceMonitor so kube-prometheus-stack discovers and scrapes node metrics automatically:
metrics:
  serviceMonitor:
    enabled: true

Label matching

Some Prometheus Operator installations filter ServiceMonitor resources by labels (serviceMonitorSelector in the Prometheus custom resource). If the Prometheus instance requires specific labels, add them to the ServiceMonitor:
metrics:
  serviceMonitor:
    enabled: true
    labels:
      release: kube-prometheus-stack
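
To see which labels are required, inspect the serviceMonitorSelector of the Prometheus custom resource. The excerpt below is a sketch of what such a resource might contain when the Helm release of kube-prometheus-stack is itself named kube-prometheus-stack. The resource name and namespace are assumptions and will differ per installation.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack-prometheus   # assumed name, varies per install
  namespace: monitoring                    # assumed namespace
spec:
  # Only ServiceMonitor objects carrying this label are picked up,
  # so metrics.serviceMonitor.labels must include the same key and value.
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack
```

If serviceMonitorSelector is an empty object ({}), every ServiceMonitor in the selected namespaces matches and no extra labels are needed.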

Scrape interval

By default, ServiceMonitor inherits the global Prometheus scrape interval (typically 30s). To override:
metrics:
  serviceMonitor:
    enabled: true
    interval: "15s"
    scrapeTimeout: "10s"

Cross-namespace monitoring

If Prometheus runs in a different namespace, set the ServiceMonitor namespace to the namespace where Prometheus looks:
metrics:
  serviceMonitor:
    enabled: true
    namespace: monitoring
A namespaceSelector is added automatically so Prometheus can discover services in the release namespace.
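
For orientation, a ServiceMonitor created this way might look roughly like the sketch below. This is not the chart's actual template output: the object name, the release namespace (ton), the selector labels, and the service port name are all assumptions. Rendering the chart with helm template shows the real manifest.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-ton-node                  # hypothetical release name
  namespace: monitoring              # from metrics.serviceMonitor.namespace
spec:
  # Lets Prometheus look for the target service outside the
  # ServiceMonitor's own namespace.
  namespaceSelector:
    matchNames:
      - ton                          # hypothetical release namespace
  selector:
    matchLabels:
      app.kubernetes.io/instance: my-ton-node   # assumed selector label
  endpoints:
    - port: metrics                  # assumed service port name
      path: /metrics
```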

Alternative: Prometheus annotations

If Prometheus Operator is not used and services are scraped through prometheus.io/* annotations, enable the annotations:
metrics:
  annotations:
    enabled: true
This adds prometheus.io/scrape, prometheus.io/port, and prometheus.io/path to the <release>-metrics ClusterIP service.
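
The annotations only take effect if Prometheus is configured to honor them. The scrape job below is a sketch of the widely used relabeling pattern for annotated services, assuming the annotation values are "true" for prometheus.io/scrape, the metrics port number for prometheus.io/port, and /metrics for prometheus.io/path. The job name is arbitrary.

```yaml
scrape_configs:
  - job_name: kubernetes-service-endpoints   # arbitrary job name
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints whose service has prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/path as the metrics path when it is set.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the target address to use the prometheus.io/port value.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: (.+?)(?::\d+)?;(\d+)
        replacement: $1:$2
```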

Alternative: static scrape config

For other Prometheus setups, the metrics endpoint is available through the internal ClusterIP service: <release>-metrics.<namespace>.svc.cluster.local
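
A minimal static scrape job for prometheus.yml could look like the following sketch. The release name (my-ton-node), namespace (ton), and job name are placeholders, and the port is assumed to match ports.metrics.

```yaml
scrape_configs:
  - job_name: ton-node               # arbitrary job name
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets:
          # <release>-metrics.<namespace>.svc.cluster.local:<ports.metrics>
          - my-ton-node-metrics.ton.svc.cluster.local:9100
```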

Grafana dashboard

The Grafana dashboard is authored in TypeScript with the Grafana Foundation SDK and generated to JSON. The dashboard source is available in the TON Rust Node Grafana source. The generated output file is ton-node-overview.json.

The dashboard uses two multi-select template variables:
  • network
  • node_id
These correspond to the global_labels keys in the node metrics config.

Dashboard sections:
  • Node Status
  • Build Info
  • Transactions per second
  • Sync and Block Progress
  • Validation and Collation
  • Outbound Message Queue
  • Network
  • Database and Storage

Generate dashboard JSON

Run from the TON Rust Node repository root.
cd grafana
bun install
bun run generate
bun run generate writes ton-node-overview.json.

Import into Grafana

  1. Open Dashboards > New > Import.
  2. Upload ton-node-overview.json.
  3. Select a Prometheus data source.
  4. Click Import.

Edit workflow

  1. Edit dashboard TypeScript source files.
  2. Run bun run generate.
  3. Import the generated JSON and verify panels.
  4. Commit TypeScript source files. The generated JSON file is ignored by Git.

Alert rules

PrometheusRule resources can be created to trigger alerts based on TON node metrics.
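
As a starting point, the sketch below alerts when the metrics endpoint stops responding to scrapes. It relies only on the built-in up series that Prometheus records for every target, not on any TON-specific metric. The namespace (ton), service name (my-ton-node-metrics), rule name, and the release label are assumptions; the label is needed only if the Prometheus instance filters rules through ruleSelector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ton-node-alerts              # hypothetical name
  namespace: monitoring              # assumed Prometheus namespace
  labels:
    release: kube-prometheus-stack   # only if ruleSelector requires it
spec:
  groups:
    - name: ton-node.availability
      rules:
        - alert: TonNodeMetricsDown
          # up is 0 when the last scrape of a target failed. The namespace
          # and service labels are attached to ServiceMonitor targets.
          expr: up{namespace="ton", service="my-ton-node-metrics"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "TON node {{ $labels.pod }} metrics endpoint cannot be scraped"
```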