Troubleshooting

Learn how to troubleshoot Tetragon

1 - System dump

Learn how to collect system dumps.

Before you report a problem, make sure to retrieve the necessary information from your cluster.

Tetragon’s bugtool captures potentially useful information about your environment for debugging. The tool is meant to be used for debugging a single Tetragon agent node but can be run automatically in a cluster. Note that in the context of Kubernetes, the command needs to be run from inside the Tetragon Pod’s container.

Key information collected by bugtool:

  • Tetragon configuration
  • Network configuration
  • Kernel configuration
  • eBPF maps
  • Process traces (if tracing is enabled)

Automatic Kubernetes cluster sysdump

You can collect information in a Kubernetes cluster using the Cilium CLI:

cilium-cli sysdump

More details can be found in the Cilium docs. The Cilium CLI sysdump command will automatically run tetra bugtool on each node where Tetragon is running.
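
If you want to control the archive's name, the Cilium CLI accepts an output file name; for example (assuming the --output-filename flag is available in your CLI version):

cilium-cli sysdump --output-filename tetragon-sysdump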

Manual single node sysdump

It’s also possible to run the bug collection tool manually, scoped to a single node, with tetra bugtool.

Kubernetes installation

  1. Identify the Tetragon Pod (<tetragon-namespace> is likely to be kube-system with the default install):

    kubectl get pods -n <tetragon-namespace> -l app.kubernetes.io/name=tetragon
    
  2. Execute tetra bugtool within the Pod:

    kubectl exec -n <tetragon-namespace> <tetragon-pod-name> -c tetragon -- tetra bugtool
    
  3. Retrieve the created archive from the Pod’s filesystem:

    kubectl cp -c tetragon <tetragon-namespace>/<tetragon-pod-name>:tetragon-bugtool.tar.gz tetragon-bugtool.tar.gz
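
To collect archives from every Tetragon Pod in one pass, the three steps above can be combined into a small shell loop. This is only a convenience sketch built from the same commands, assuming the default kube-system namespace:

# Run bugtool in each Tetragon Pod and copy out one archive per Pod
for pod in $(kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon -o name); do
  name=${pod#pod/}
  kubectl exec -n kube-system "$name" -c tetragon -- tetra bugtool
  kubectl cp -c tetragon "kube-system/$name:tetragon-bugtool.tar.gz" "tetragon-bugtool-$name.tar.gz"
done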
    

Container installation

  1. Execute tetra bugtool inside the container:

    docker exec -it <tetragon-container-id> tetra bugtool
    
  2. Retrieve the archive using docker cp:

    docker cp <tetragon-container-id>:/tetragon-bugtool.tar.gz tetragon-bugtool.tar.gz
    

Systemd host installation

  1. Execute tetra bugtool with elevated permissions:

    sudo tetra bugtool
    


2 - Log level

Learn how to configure log levels.

When debugging, it might be useful to change the log level. The default log level is controlled by the log-level option at startup:

  • Enable debug level with --log-level=debug
  • Enable trace level with --log-level=trace

Change log level on Kubernetes

It is possible to change the log level of Tetragon’s DaemonSet Pods by setting tetragon.debug to true.
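
For example, with a Helm-based installation (assuming a release named tetragon in the kube-system namespace):

helm upgrade tetragon cilium/tetragon -n kube-system --set tetragon.debug=true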

Change log level dynamically

It is possible to change the log level dynamically by using the tetra loglevel sub-command. tetra needs access to Tetragon’s gRPC server endpoint, which can be configured via --server-address (see the example after the list below).

  • Get the current log level:

    tetra loglevel get
    
  • Dynamically change the log level. Allowed values are [trace|debug|info|warning|error|fatal|panic]:

    tetra loglevel set debug
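
If the agent’s gRPC endpoint differs from the default, pass it explicitly; for example, assuming the default localhost:54321 address:

tetra loglevel get --server-address localhost:54321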
    

3 - BPF programs statistics

Monitor BPF programs statistics

This page shows you how to monitor BPF programs statistics.

Concept

The BPF subsystem provides performance data for each loaded program; Tetragon exports this data as metrics and can also display it in the terminal with a top-like tool.
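
Under the hood this uses the kernel’s per-program run_cnt and run_time_ns counters, which you can also inspect directly with bpftool once statistics are enabled (see the Limitations section below):

bpftool prog show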

In terminal

The tetra command can display the loaded BPF programs in the terminal with:

tetra debug progs

By default, the output shows only Tetragon's programs and looks like this:

2024-10-31 11:12:45.94715546 +0000 UTC m=+8.038098448

Ovh(%)  Id      Cnt     Time    Name                            Pin
  0.00  22201   0       0       event_execve                    /sys/fs/bpf/tetragon/__base__/event_execve/prog
  0.00  22198   0       0       event_exit_acct_process         /sys/fs/bpf/tetragon/__base__/event_exit/prog
  0.00  22200   0       0       event_wake_up_new_task          /sys/fs/bpf/tetragon/__base__/kprobe_pid_clear/prog
  0.00  22207   0       0       tg_cgroup_rmdir                 /sys/fs/bpf/tetragon/__base__/tg_cgroup_rmdir/prog
  0.00  22206   0       0       tg_kp_bprm_committing_creds     /sys/fs/bpf/tetragon/__base__/tg_kp_bprm_committing_creds/prog
  0.00  22221   0       0       generic_kprobe_event            /sys/fs/bpf/tetragon/syswritefollowfdpsswd/generic_kprobe/__x64_sys_close/prog
  0.00  22225   0       0       generic_kprobe_event            /sys/fs/bpf/tetragon/syswritefollowfdpsswd/generic_kprobe/__x64_sys_write/prog
  0.00  22211   0       0       generic_kprobe_event            /sys/fs/bpf/tetragon/syswritefollowfdpsswd/generic_kprobe/fd_install/prog

The fields have the following meanings:

  • Ovh is the system-wide overhead of the BPF program (in percent)
  • Id is the global BPF ID of the program (as shown by bpftool prog)
  • Cnt is the number of times the BPF program was executed
  • Time is the total time spent in all executions of the BPF program
  • Name is the name of the BPF program
  • Pin is the BPF program's pin path in bpffs

It’s possible to display all BPF programs with --all:

tetra debug progs --all

This produces the following output:

2024-10-31 11:19:37.720137195 +0000 UTC m=+7.165535117

Ovh(%)  Id      Cnt     Time    Name            Pin
  0.00  159     2       82620   event_execve    -
  0.00  171     68      18564   iter            -
  0.00  158     2       10170   event_wake_up_n -
  0.00  164     2       4254    tg_kp_bprm_comm -
  0.00  157     2       3868    event_exit_acct -
  0.00  97      2       1680                    -
  0.00  35      2       1442                    -
  0.00  83      0       0       sd_devices      -
  0.00  9       0       0                       -
  0.00  7       0       0                       -
  0.00  8       0       0                       -
  0.00  87      0       0       sd_devices      -
...

The bpffs mount and the iterator object path are auto-detected by default, but it’s possible to override them with the --bpf-dir and --bpf-lib options, like:

kubectl exec -ti -n kube-system tetragon-66rk4 -c tetragon -- tetra debug progs --bpf-dir /run/cilium/bpffs/tetragon/ --all --bpf-lib /var/lib/tetragon/

Note that there are other options to customize the behaviour:

tetra debug progs --help
Retrieve information about BPF programs on the host.

Examples:
- tetragon BPF programs top style
  # tetra debug progs
- all BPF programs top style
  # tetra debug progs --all
- one shot mode (displays one interval data)
  # tetra debug progs --once
- change interval to 10 seconds
  # tetra debug progs  --timeout 10
- change interval to 10 seconds in one shot mode
  # tetra debug progs --once --timeout 10

Usage:
  tetra debug progs [flags]

Aliases:
  progs, top

Flags:
      --all              Get all programs
      --bpf-dir string   Location of bpffs tetragon directory (auto detect by default)
      --bpf-lib string   Location of Tetragon libs, btf and bpf files (auto detect by default)
  -h, --help             help for progs
      --no-clear         Do not clear screen between rounds
      --once             Run in one shot mode
      --timeout int      Interval in seconds (delay in one shot mode) (default 1)

Metrics

The BPF subsystem provides performance data for each loaded program, and Tetragon exports that data as metrics.

For each loaded BPF program we get:

  • a run count, which counts how many times the BPF program was executed
  • a run time, which sums the time the BPF program spent across all its executions

Hence, for each loaded BPF program, we export two related metrics:

  • tetragon_overhead_program_seconds_total[policy_namespace,policy,sensor,attach]
  • tetragon_overhead_program_runs_total[policy_namespace,policy,sensor,attach]

Each loaded program is identified by the following labels:

  • policy_namespace is the Kubernetes namespace of the policy
  • policy is the policy name
  • sensor is the sensor name
  • attach is the program attachment name

If we have a generic_kprobe sensor attached to the __x64_sys_close kernel function under the syswritefollowfdpsswd policy, the related metrics will look like:

tetragon_overhead_program_runs_total{attach="__x64_sys_close",policy="syswritefollowfdpsswd",policy_namespace="",sensor="generic_kprobe"} 15894
tetragon_overhead_program_seconds_total{attach="__x64_sys_close",policy="syswritefollowfdpsswd",policy_namespace="",sensor="generic_kprobe"} 1.03908217e+08
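
To inspect these counters by hand, you can scrape the agent’s Prometheus endpoint; a minimal sketch, assuming the default metrics port 2112 and a Pod named tetragon-66rk4:

kubectl port-forward -n kube-system tetragon-66rk4 2112:2112 &
curl -s localhost:2112/metrics | grep tetragon_overhead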

Limitations

Note that BPF program statistics are not enabled by default, because collecting them introduces extra overhead, so it’s necessary to enable them manually:

  • Either with sysctl:

    sysctl kernel.bpf_stats_enabled=1
    

    and make sure you disable the stats when they are no longer needed:

    sysctl kernel.bpf_stats_enabled=0
    
  • Or with the following tetra command:

    tetra debug enable-stats
    ^C
    

    The stats remain enabled for as long as the command is running (it simply sleeps), and are disabled again when it exits.
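
In either case, you can verify the current state with:

sysctl -n kernel.bpf_stats_enabled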