splunk_escuAnomaly

Kubernetes Process with Resource Ratio Anomalies

The following analytic detects anomalous changes in resource utilization ratios for processes running on a Kubernetes node. It leverages process metrics collected via an OTEL collector and hostmetrics receiver, analyzed through Splunk Observability Cloud. The detection uses a lookup table containing average and standard deviation values for various resource ratios (e.g., CPU:memory, CPU:disk operations). Significant deviations from these baselines may indicate compromised processes, malicious activity, or misconfigurations. If confirmed malicious, this could signify a security breach, allowing attackers to manipulate workloads, potentially leading to data exfiltration or service disruption.

Detection Query

| mstats avg(process.*) as process.* where `kubernetes_metrics` by host.name k8s.cluster.name k8s.node.name process.executable.name span=10s | eval cpu:mem = 'process.cpu.utilization'/'process.memory.utilization' | eval cpu:disk = 'process.cpu.utilization'/'process.disk.operations' | eval mem:disk = 'process.memory.utilization'/'process.disk.operations' | eval cpu:threads = 'process.cpu.utilization'/'process.threads' | eval disk:threads = 'process.disk.operations'/'process.threads' | eval key = 'k8s.cluster.name' + ":" + 'host.name' + ":" + 'process.executable.name' | lookup k8s_process_resource_ratio_baseline key | fillnull | eval anomalies = "" | foreach stdev_* [ eval anomalies =if( '<<MATCHSTR>>' > ('avg_<<MATCHSTR>>' + 4 * 'stdev_<<MATCHSTR>>'), anomalies + "<<MATCHSTR>> ratio higher than average by " + tostring(round(('<<MATCHSTR>>' - 'avg_<<MATCHSTR>>')/'stdev_<<MATCHSTR>>' ,2)) + " Standard Deviations. <<MATCHSTR>>=" + tostring('<<MATCHSTR>>') + " avg_<<MATCHSTR>>=" + tostring('avg_<<MATCHSTR>>') + " 'stdev_<<MATCHSTR>>'=" + tostring('stdev_<<MATCHSTR>>') + ", " , anomalies) ] | eval anomalies = replace(anomalies, ",\s$", "") | where anomalies!="" | stats count values(anomalies) as anomalies by host.name k8s.cluster.name k8s.node.name process.executable.name | where count > 5 | rename host.name as host | `kubernetes_process_with_resource_ratio_anomalies_filter`

Author

Matthew Moore, Splunk

References

https://github.com/signalfx/splunk-otel-collector-chart

Raw Content

name: Kubernetes Process with Resource Ratio Anomalies
id: 0d42b295-0f1f-4183-b75e-377975f47c65
version: 9
creation_date: '2024-01-10'
modification_date: '2026-05-13'
author: Matthew Moore, Splunk
status: experimental
type: Anomaly
description: The following analytic detects anomalous changes in resource utilization ratios for processes running on a Kubernetes node. It leverages process metrics collected via an OTEL collector and hostmetrics receiver, analyzed through Splunk Observability Cloud. The detection uses a lookup table containing average and standard deviation values for various resource ratios (e.g., CPU:memory, CPU:disk operations). Significant deviations from these baselines may indicate compromised processes, malicious activity, or misconfigurations. If confirmed malicious, this could signify a security breach, allowing attackers to manipulate workloads, potentially leading to data exfiltration or service disruption.
data_source: []
search: "| mstats avg(process.*) as process.* where `kubernetes_metrics` by host.name k8s.cluster.name k8s.node.name process.executable.name span=10s | eval cpu:mem = 'process.cpu.utilization'/'process.memory.utilization' | eval cpu:disk = 'process.cpu.utilization'/'process.disk.operations' | eval mem:disk = 'process.memory.utilization'/'process.disk.operations' | eval cpu:threads = 'process.cpu.utilization'/'process.threads' | eval disk:threads = 'process.disk.operations'/'process.threads' | eval key = 'k8s.cluster.name' + \":\" + 'host.name' + \":\" + 'process.executable.name' | lookup k8s_process_resource_ratio_baseline key | fillnull | eval anomalies = \"\" | foreach stdev_* [ eval anomalies =if( '<<MATCHSTR>>' > ('avg_<<MATCHSTR>>' + 4 * 'stdev_<<MATCHSTR>>'), anomalies + \"<<MATCHSTR>> ratio higher than average by \" + tostring(round(('<<MATCHSTR>>' - 'avg_<<MATCHSTR>>')/'stdev_<<MATCHSTR>>' ,2)) + \" Standard Deviations. <<MATCHSTR>>=\" + tostring('<<MATCHSTR>>') + \" avg_<<MATCHSTR>>=\" + tostring('avg_<<MATCHSTR>>') + \" 'stdev_<<MATCHSTR>>'=\" + tostring('stdev_<<MATCHSTR>>') + \", \" , anomalies) ] | eval anomalies = replace(anomalies, \",\\s$\", \"\") | where anomalies!=\"\" | stats count values(anomalies) as anomalies by host.name k8s.cluster.name k8s.node.name process.executable.name | where count > 5 | rename host.name as host | `kubernetes_process_with_resource_ratio_anomalies_filter`"
how_to_implement: "To implement this detection, follow these steps:\n* Deploy the OpenTelemetry Collector (OTEL) to your Kubernetes cluster.\n* Enable the hostmetrics/process receiver in the OTEL configuration.\n* Ensure that the process metrics, specifically Process.cpu.utilization and process.memory.utilization, are enabled.\n* Install the Splunk Infrastructure Monitoring (SIM) add-on. (ref: https://splunkbase.splunk.com/app/5247)\n * Configure the SIM add-on with your Observability Cloud Organization ID and Access Token.\n* Set up the SIM modular input to ingest Process Metrics. Name this input \"sim_process_metrics_to_metrics_index\".\n* In the SIM configuration, set the Organization ID to your Observability Cloud Organization ID.\n* Set the Signal Flow Program to the following: data('process.threads').publish(label='A'); data('process.cpu.utilization').publish(label='B'); data('process.cpu.time').publish(label='C'); data('process.disk.io').publish(label='D'); data('process.memory.usage').publish(label='E'); data('process.memory.virtual').publish(label='F'); data('process.memory.utilization').publish(label='G'); data('process.cpu.utilization').publish(label='H'); data('process.disk.operations').publish(label='I'); data('process.handles').publish(label='J'); data('process.threads').publish(label='K')\n* Set the Metric Resolution to 10000.\n * Leave all other settings at their default values.\n* Run the Search Baseline Of Kubernetes Container Network IO Ratio"
known_false_positives: No false positives have been identified at this time.
references:
    - https://github.com/signalfx/splunk-otel-collector-chart
intermediate_findings:
    entities:
        - field: host
          type: system
          score: 20
          message: Kubernetes Process with Resource Ratio Anomalies on host $host$
analytic_story:
    - Abnormal Kubernetes Behavior using Splunk Infrastructure Monitoring
asset_type: Kubernetes
mitre_attack_id:
    - T1204
product:
    - Splunk Enterprise
    - Splunk Enterprise Security
    - Splunk Cloud
category: cloud
security_domain: network
baselines:
    - Baseline Of Kubernetes Process Resource Ratio