Proto commits in castai/kvisor

These 67 commits changed the Protocol Buffers files:

Commit:250227a
Author:Ivaylo Papratilov
Committer:GitHub

feat: Add GPU metrics pipeline to kvisor (#663)
* feat: Add GPU metrics pipeline to kvisor
* Remove Chart.lock and dependency chart
* fix: Run dcgm-exporter on GKE
* Address CR feedback and add more tests
* fix: Disable GPU pipeline in kvisor if gpu-metrics-exporters is installed
* Fix bad conflict resolution
* Remove infinite loop in dcgm-exporter pod

The documentation is generated from this commit.

Commit:e36f31b
Author:krystiancastai
Committer:GitHub

feat: k8s reverse tunnel proxy for remote API access (#654)
Enables the CAST AI backend to query the Kubernetes API of connected clusters without requiring direct network access. The kvisor controller establishes an outbound gRPC connection to CAST AI and forwards incoming HTTP requests to the local k8s API using a dedicated, restricted ServiceAccount, streaming responses back through the tunnel.
Key points:
- Read-only access is enforced (GET only; the exec/attach/portforward/proxy subresources are blocked)
- Dedicated gRPC connection with keepalive, separate from the main castai client
- Short-lived ServiceAccount tokens via the TokenRequest API instead of mounted credentials
- Proxy RBAC resources are provisioned automatically via Helm when enabled
- Enabled by setting cluster-proxy-enabled: "true" in controller.extraArgs
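A minimal Go sketch of the read-only policy described above, assuming the proxy sees each forwarded request's HTTP method and URL path (without the query string). The names `isAllowed` and `subresourceOf` are hypothetical and not taken from kvisor, and the path parsing only covers the standard `/api/<version>/...` and `/apis/<group>/<version>/...` layouts.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// blockedSubresources are the subresources the commit message says are blocked.
var blockedSubresources = map[string]bool{
	"exec":        true,
	"attach":      true,
	"portforward": true,
	"proxy":       true,
}

// subresourceOf returns the subresource segment of a Kubernetes API path,
// e.g. "exec" for /api/v1/namespaces/ns/pods/web/exec, or "" for collections
// and plain objects. The path must not include a query string.
func subresourceOf(path string) string {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	var rest []string
	switch {
	case len(parts) >= 2 && parts[0] == "api":
		rest = parts[2:] // drop "api", "<version>"
	case len(parts) >= 3 && parts[0] == "apis":
		rest = parts[3:] // drop "apis", "<group>", "<version>"
	default:
		return ""
	}
	// For namespaced resources, strip "namespaces/<ns>" when a resource follows.
	if len(rest) > 2 && rest[0] == "namespaces" {
		rest = rest[2:]
	}
	// rest is now <resource>[/<name>[/<subresource>...]].
	if len(rest) >= 3 {
		return rest[2]
	}
	return ""
}

// isAllowed reports whether a forwarded request satisfies the read-only policy.
func isAllowed(method, path string) bool {
	if method != http.MethodGet {
		return false
	}
	return !blockedSubresources[subresourceOf(path)]
}

func main() {
	fmt.Println(isAllowed("GET", "/api/v1/namespaces/default/pods"))
	fmt.Println(isAllowed("GET", "/api/v1/namespaces/default/pods/web/exec"))
	fmt.Println(isAllowed("DELETE", "/api/v1/namespaces/default/pods/web"))
	fmt.Println(isAllowed("GET", "/api/v1/nodes/node-1/proxy/metrics"))
}
```

Only the segment after the object name is treated as a subresource, so a pod that happens to be named `exec` stays readable.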

Commit:ae66986
Author:Krystian Bednarczuk

1

Commit:fd657a4
Author:Roman Melnyk
Committer:GitHub

add static CIDR to zone/region mapping via ConfigMap (#646)

Commit:3821819
Author:Patrick Pichler
Committer:Patrick Pichler

feat: add CloudVolumeID to block device metrics
The block device metrics have been extended with the volume ID of the underlying hyperscaler disk they belong to. Mapping is currently supported for AWS (Xen, Nitro) and GCP. AWS Xen and GCP rely on the controller having `cloud-provider-storage-sync-enabled` set to true, as both need information from the CSP to fully resolve cloud volume IDs from local identifiers.
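On AWS Nitro instances, EBS volumes appear as NVMe devices whose serial number is the volume ID without its hyphen (for example `vol0123456789abcdef0`). A minimal sketch of that mapping, assuming the serial has already been read from the device (for instance from sysfs, where it may be space-padded). The function name is hypothetical, and this is not necessarily how kvisor implements the Nitro case.

```go
package main

import (
	"fmt"
	"strings"
)

// volumeIDFromNitroSerial converts an EBS NVMe serial such as
// "vol0123456789abcdef0" into the volume ID "vol-0123456789abcdef0".
// It returns ok=false for serials that do not look like EBS volumes
// (for example instance-store NVMe devices).
func volumeIDFromNitroSerial(serial string) (string, bool) {
	s := strings.TrimSpace(serial) // sysfs values may carry padding
	if !strings.HasPrefix(s, "vol") || len(s) <= len("vol") {
		return "", false
	}
	if strings.HasPrefix(s, "vol-") {
		return s, true // already in volume-ID form
	}
	return "vol-" + s[len("vol"):], true
}

func main() {
	for _, s := range []string{"vol0123456789abcdef0", "AWS22A3B1C7D8E9F0"} {
		id, ok := volumeIDFromNitroSerial(s)
		fmt.Println(s, "->", id, ok)
	}
}
```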

Commit:fd59ea2
Author:Ciprian Focsaneanu

Collect ephemeral storage usage per pod

Commit:8bd1dce
Author:Roman Melnyk
Committer:GitHub

Periodically refresh cluster network ranges (#622)

Commit:6c4277b
Author:Samu
Committer:GitHub

Collect cloud volume metrics from AWS (#620)

Commit:dfa4e9c
Author:Samuel Veloso
Committer:Samuel Veloso

Collect cloud volume metrics from AWS

Commit:1fbd28f
Author:Samuel Veloso
Committer:Samuel Veloso

Collect cloud volume metrics from AWS

Commit:9ca3f24
Author:Ciprian Focșăneanu
Committer:GitHub

Gather Kubernetes data about pod volumes and add K8s context to filesystem metrics (#613)
* Gather Kubernetes data about pod volumes and add K8s context to filesystem metrics
* Take into account block volumes as well
* Filter pod volume metrics to only include PVC-backed volumes and exclude kube-system
* Add RBAC permissions for PVCs and PVs in controller ClusterRole
* Deduplicate filesystem metrics
* Simplify FilesystemMetric and add in-tree volume source support
* Add k8s_pod_volume_metrics to E2E test

Commit:aed57ba
Author:Roman Melnyk
Committer:GitHub

Improve netflow zone/region discovery (#614)

Commit:28148bd
Author:Roman Melnyk
Committer:GitHub

Integrate VPC metadata enrichment for netflow IP addresses (#611)
- Added VPCIndex in-memory index for IP metadata lookups across VPCs, subnets, peered networks, and service ranges
- Periodic refresh from cloud provider with configurable interval
- gRPC endpoint GetIPsInfo enriches IPs with VPC metadata (region, zone, cloud service detection via domain)
- Improved zone/region detection with kube nodes info

Commit:224af44
Author:Aleksandr Danilov
Committer:Aleksandr Danilov

add node configs scraper

Commit:c72d9e7
Author:Aleksandr Danilov
Committer:Aleksandr Danilov

add node ID

Commit:bbdeb06
Author:Ciprian Focșăneanu
Committer:GitHub

Add support for collecting node storage metrics from nodes/proxy endpoint (#592)
* Add support for collecting Kubernetes node storage metrics
Implement collection of kubelet storage stats (ImageFs, ContainerFs) via stats/summary API.
- Add GetNodeStatsSummary gRPC method to KubeAPI
- Implement controller endpoint to proxy kubelet stats/summary
- Add agent collection pipeline for node storage metrics
- Write metrics to storage_node_metrics collection
- Add tests and mocks for node stats functionality
- Update helm chart configuration
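The kubelet summary endpoint can be reached through the API server at `/api/v1/nodes/<node>/proxy/stats/summary`. A sketch that builds that path and decodes the image and container filesystem stats, assuming only the handful of fields shown (the full response, defined by the kubelet `stats/v1alpha1` API, has many more). The struct and function names are hypothetical, and the sample JSON is illustrative, not real output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// fsStats is a subset of the kubelet FsStats fields.
type fsStats struct {
	AvailableBytes *uint64 `json:"availableBytes,omitempty"`
	CapacityBytes  *uint64 `json:"capacityBytes,omitempty"`
	UsedBytes      *uint64 `json:"usedBytes,omitempty"`
}

// runtimeStats holds the runtime image and container filesystems.
type runtimeStats struct {
	ImageFs     *fsStats `json:"imageFs,omitempty"`
	ContainerFs *fsStats `json:"containerFs,omitempty"`
}

type nodeStats struct {
	NodeName string        `json:"nodeName"`
	Runtime  *runtimeStats `json:"runtime,omitempty"`
}

type summary struct {
	Node nodeStats `json:"node"`
}

// summaryPath builds the API server path that proxies to the kubelet.
func summaryPath(node string) string {
	return "/api/v1/nodes/" + url.PathEscape(node) + "/proxy/stats/summary"
}

// parseSummary decodes the subset of the summary response used here.
func parseSummary(data []byte) (summary, error) {
	var s summary
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	sample := []byte(`{"node":{"nodeName":"node-1","runtime":{"imageFs":{"capacityBytes":100,"usedBytes":40}}}}`)
	s, err := parseSummary(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(summaryPath(s.Node.NodeName))
	if rt := s.Node.Runtime; rt != nil && rt.ImageFs != nil && rt.ImageFs.UsedBytes != nil {
		fmt.Println("imageFs used bytes:", *rt.ImageFs.UsedBytes)
	}
}
```

Pointer fields mirror the optional fields in the kubelet response, so a missing filesystem (for example no separate ContainerFs) decodes as nil rather than zero.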

Commit:330db05
Author:Andžej Maciusovič
Committer:GitHub

Keep historical ips index records (#591)

Commit:94f48a1
Author:Samu
Committer:GitHub

Send index manifest and digest in image metadata (#556)

Commit:21752bb
Author:krystiancastai
Committer:GitHub

REP-1617: Add storage metrics collection (#580)
- Introducing storage monitoring capabilities, enabling collection of disk performance and filesystem utilization metrics for nodes.
- Storage metrics collection is enabled via the `agent.extraArgs.storage-stats-enabled` chart flag.
- New GetNode RPC endpoint in KubeAPI service in controller for retrieving node information and labels.

Commit:ea7a549
Author:Andžej Maciusovič
Committer:GitHub

Add image digest to runtime events (#581)

Commit:2b75f1f
Author:tautvydasliekis

feat: Add sustainability metrics pipeline

Commit:f0d3b9f
Author:Andžej Maciusovič
Committer:GitHub

Revert "Add image digest to runtime events (#573)" (#577)
This reverts commit 508bccb67e73017b2b61a5595335af76817ae6f4.

Commit:508bccb
Author:Andžej Maciusovič
Committer:GitHub

Add image digest to runtime events (#573)

Commit:befab32
Author:Andžej Maciusovič
Committer:GitHub

Send file access (#568)

Commit:62e34e6
Author:Andžej Maciusovič
Committer:GitHub

Add netflow pid start time (#567)

Commit:667fea1
Author:Andžej Maciusovič
Committer:GitHub

Unify data ingestion (#561)

Commit:6f0d4f1
Author:Andžej Maciusovič
Committer:GitHub

Add security file open event tracing (#549)

Commit:aa518af
Author:Patrick Pichler
Committer:Patrick Pichler

Add ingress nightmare exploit detection signature
There is a new signature capable of detecting attempts to exploit the ingress nightmare vulnerability (CVE-2025-1974). It works by attaching to the `security_inode_follow_link` LSM hook and checking whether an `nginx` process is trying to resolve any `FD`s in `/proc/<pid>/fd/`. The signature is far from perfect, as it is quite specific to the ingress nightmare case, but it works reliably, since the ingress process should never resolve `FD` symlinks.
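Apart from the kernel hook, the check boils down to matching the process name and the resolved path. A minimal Go sketch, assuming the hook delivers the process name (`comm`) and the symlink path as strings; the name `isSuspiciousFdResolve` is hypothetical and the real signature's matching rules may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// procFdLink matches /proc/<pid>/fd/<n> and /proc/self/fd/<n>.
var procFdLink = regexp.MustCompile(`^/proc/(\d+|self)/fd/\d+$`)

// isSuspiciousFdResolve flags an nginx process resolving a file-descriptor
// symlink under /proc.
func isSuspiciousFdResolve(comm, path string) bool {
	return comm == "nginx" && procFdLink.MatchString(path)
}

func main() {
	fmt.Println(isSuspiciousFdResolve("nginx", "/proc/1234/fd/7"))
	fmt.Println(isSuspiciousFdResolve("nginx", "/etc/nginx/nginx.conf"))
	fmt.Println(isSuspiciousFdResolve("bash", "/proc/1234/fd/7"))
}
```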

Commit:35c1a69
Author:Patrick Pichler
Committer:Patrick Pichler

Calculate sha256 hash for MAGIC_WRITE events
To provide a more robust indicator for matching dropped files, files in the MAGIC_WRITE event are now enriched with the file hash.

Commit:7fffaad
Author:Patrick Pichler
Committer:Patrick Pichler

Implement basic git clone detection signature
Git clones are a surprisingly good indicator of some sort of mischievous behavior (outside of CI infrastructure, of course). There is now a dedicated signature for detecting `git clone` commands. It works by consuming `SchedProcessExec` events, checking whether the executed command was `git`, and parsing the passed arguments to detect a potential clone and the repository used.
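A sketch of the argument-parsing step, assuming the exec event exposes the full argv. Only a partial, illustrative set of options that take a separate value is handled, so unusual invocations may be misparsed; `detectGitClone` is a hypothetical name, not kvisor's function.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Options that consume the following argument. Deliberately partial.
var gitGlobalValueOpts = map[string]bool{"-C": true, "-c": true}
var cloneValueOpts = map[string]bool{
	"-b": true, "--branch": true, "-o": true, "--origin": true,
	"--depth": true, "-u": true, "--upload-pack": true,
	"--reference": true, "-c": true, "--config": true,
	"-j": true, "--jobs": true, "--template": true,
}

// detectGitClone returns the repository argument if argv is a `git clone`.
func detectGitClone(argv []string) (string, bool) {
	if len(argv) == 0 || filepath.Base(argv[0]) != "git" {
		return "", false
	}
	// Skip global options that precede the subcommand.
	i := 1
	for i < len(argv) && strings.HasPrefix(argv[i], "-") {
		if gitGlobalValueOpts[argv[i]] {
			i++ // skip the option's value
		}
		i++
	}
	if i >= len(argv) || argv[i] != "clone" {
		return "", false
	}
	// The first non-option argument after "clone" is the repository.
	for j := i + 1; j < len(argv); j++ {
		a := argv[j]
		if a == "--" {
			if j+1 < len(argv) {
				return argv[j+1], true
			}
			return "", false
		}
		if strings.HasPrefix(a, "-") {
			if cloneValueOpts[a] {
				j++ // skip the option's value, e.g. "--depth 1"
			}
			continue
		}
		return a, true
	}
	return "", false
}

func main() {
	argv := []string{"/usr/bin/git", "-C", "/tmp", "clone", "--depth", "1", "https://example.com/r.git", "dst"}
	fmt.Println(detectGitClone(argv))
}
```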

Commit:c72354d
Author:Andžej Maciusovič
Committer:GitHub

Events batching (#479)

Commit:71a466f
Author:anjmao

Try single goroutine for events batching

Commit:6babe0b
Author:anjmao

Debug invalid data

Commit:624d0e1
Author:anjmao
Committer:anjmao

Aggregate eBPF events

Commit:55aba86
Author:anjmao

Aggregate eBPF events

Commit:e8d2818
Author:Samu
Committer:GitHub

Add dual-stack support to netflows (#452)

Commit:84551b0
Author:Andžej Maciusovič
Committer:GitHub

Container stats collections (#448)

Commit:8a120bb
Author:Matas Kulkovas
Committer:GitHub

Send labels and annotations (#421)
* collect selected labels and annotations from CRI API and send them with runtime events
Co-authored-by: Patrick Pichler <patrick@cast.ai>

Commit:69d47b5
Author:Samu
Committer:GitHub

Add process_pid and process_start_time to ebpf events (#381)
* Add process_pid and process_start_time to ebpf events
These two fields are needed to correlate data between the events and process_tree_events tables. Events in process_tree_events can be uniquely identified by the (container_id, process_pid, process_start_time) tuple.
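The tuple is needed because PIDs are reused over time: a PID alone cannot identify a process, but a PID paired with its start time within one container can. A minimal Go sketch with a hypothetical `processKey` type; the field types are assumptions.

```go
package main

import "fmt"

// processKey is a hypothetical composite key mirroring the tuple above.
type processKey struct {
	ContainerID      string
	ProcessPID       uint32
	ProcessStartTime uint64
}

func main() {
	// Hypothetical process-tree entries keyed by the tuple.
	tree := map[processKey]string{}
	tree[processKey{"c1", 42, 1000}] = "nginx"
	// PID 42 is later reused in the same container by another process;
	// the start time keeps the two entries apart.
	tree[processKey{"c1", 42, 5000}] = "sh"

	// An event carrying the same three fields joins to exactly one entry.
	ev := processKey{ContainerID: "c1", ProcessPID: 42, ProcessStartTime: 5000}
	fmt.Println(len(tree), tree[ev])
}
```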

Commit:5b5ace8
Author:Andžej Maciusovič
Committer:GitHub

Drop netflow ports info (#355)

Commit:a2abd83
Author:Patrick Pichler
Committer:Patrick Pichler

Add network details to DNS and SSH events
Since we already get the source and destination addresses and ports from the underlying network packet, they are now also forwarded to the backend as part of the events.

Commit:88073dc
Author:Patrick Pichler
Committer:Patrick Pichler

Add basic SSH detection
The SSH protocol offers few distinguishing features to detect it by. A new kernel check has been added to detect the initial SSH hello messages.
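Per RFC 4253, each side of an SSH connection announces itself with an identification string of the form `SSH-protoversion-softwareversion`, where the protocol version is `2.0` (or `1.99` for backward compatibility). kvisor performs its check in the kernel; the Go function below only illustrates the same kind of prefix match in userspace, and whether kvisor matches exactly these prefixes is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// sshIdentPrefixes are SSH identification-string prefixes from RFC 4253.
var sshIdentPrefixes = [][]byte{[]byte("SSH-2.0-"), []byte("SSH-1.99-")}

// looksLikeSSHHello reports whether a TCP payload starts with an SSH
// identification string.
func looksLikeSSHHello(payload []byte) bool {
	for _, p := range sshIdentPrefixes {
		if bytes.HasPrefix(payload, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksLikeSSHHello([]byte("SSH-2.0-OpenSSH_9.6\r\n")))
	fmt.Println(looksLikeSSHHello([]byte("GET / HTTP/1.1\r\n")))
}
```

RFC 4253 also allows a server to send other lines before its identification string, which a prefix check on the first payload alone would miss.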

Commit:1266495
Author:Andžej Maciusovič
Committer:GitHub

Kernel stdio via sock (#329)

Commit:4b3447a
Author:Patrick Pichler
Committer:Patrick Pichler

Expose simple eBPF metrics through prometheus
There is now a set of simple metrics from eBPF, exposed as Prometheus metrics. eBPF communicates these counters through a newly added `metrics` map, which is a simple array of `u64`. We then read them and expose them through Prometheus metrics. This makes it possible to detect potential issues, such as scratch buffer misses.

Commit:b41ddc5
Author:Andžej Maciusovič
Committer:GitHub

Disable signature engine by default (#317)

Commit:5c56bb3
Author:Patrick Pichler
Committer:Patrick Pichler

Add initial process tree support
The process tree gives context about the processes running at a given time. It is used by the CAST AI backend to provide more context to the user.

Commit:ff4cf88
Author:Patrick Pichler
Committer:Patrick Pichler

Remove deprecation of ImageMetadata architecture protobuf field

Commit:756a9dd
Author:Andžej Maciusovič
Committer:GitHub

Send deltas in batches (#290)

Commit:93e8cb9
Author:Andžej Maciusovič
Committer:GitHub

Detect exec from overlayfs upper layer (#287)

Commit:a86190b
Author:Domas Tamašauskas
Committer:Domas Tamašauskas

Take image architecture from the manifest instead of kvisor config

Commit:5bcce01
Author:Andžej Maciusovič
Committer:GitHub

Fix event workload uid mapping (#282)

Commit:9292dc0
Author:Andžej Maciusovič
Committer:GitHub

Fill missing netflow source zone (#280)

Commit:e08761e
Author:Anton Sankov
Committer:GitHub

Redact sensitive values in process exec args (#272)

Commit:7bf1446
Author:Andžej Maciusovič
Committer:GitHub

Send netflows (#271)

Commit:2e64c23
Author:Mantas Norvaiša
Committer:GitHub

Report agent enabled/disabled state in controller config (#270)
Co-authored-by: Mantas Norvaisa <mantas.norvaisa@cast.ai>

Commit:94aa14f
Author:Andžej Maciusovič
Committer:GitHub

Initial netflow events export (#269)

Commit:e99e408
Author:Andžej Maciusovič
Committer:GitHub

Netflows ebpf tracing (#267)

Commit:32be554
Author:Andžej Maciusovič
Committer:GitHub

Add workload uid (#258)

Commit:a710d61
Author:Patrick Pichler
Committer:Patrick Pichler

Add SOCKS5 proxy detection
There is now a new signature that detects and reports the use of SOCKS5 proxy servers and clients.

Commit:5e18782
Author:Patrick Pichler
Committer:Patrick Pichler

Add flow direction to DNS events
To distinguish between incoming and outgoing DNS events, we now also record the flow direction. Additionally, the mock server now pretty-prints protobuf messages using the protojson package instead of Go's fmt package.

Commit:b49938e
Author:Andžej Maciusovič
Committer:GitHub

Send ctrl and agent configs (#254)

Commit:fbd07eb
Author:Andžej Maciusovič
Committer:GitHub

Fix socket state tracepoint task context (#252)

Commit:db1096f
Author:Patrick Pichler
Committer:Patrick Pichler

Extend signature event types with more data
To better understand what is happening, the signature events reported to the backend now carry more information, such as the fd of the opened TTY or the IP used for stdio-over-socket detection.

Commit:d0e50c5
Author:Patrick Pichler
Committer:Patrick Pichler

Add logic to detect TTYs in containers
Opening a TTY is, in most cases, something one might want to investigate. New logic hooks the `tty_open` function of the `tty` device drivers and reports an event to userspace, which is handled by a newly added signature called `tty_detected`. For now, we always alert on any `tty_open` event, but this might change in the future.

Commit:209c5b5
Author:Patrick Pichler
Committer:Patrick Pichler

Add sha256 sum of executed binaries
Exec events now include a sha256 hash of the executed binary. The file is accessed via the `/proc` filesystem. To also catch short-lived processes, kvisor additionally tries to access the file via other processes in the same mount namespace. For this to work in virtualised environments such as kind, kvisor now also translates PIDs from the originating PID namespace to the namespace it is running in.

Commit:28a482a
Author:Andžej Maciusovič
Committer:GitHub

Deltas fixes (#222)

Commit:b13c332
Author:Andžej Maciusovič
Committer:GitHub

Add runtime (#213)