Copy Fail: AF_ALG + splice page-cache overwrite (CVE-2026-31431)
This page documents Copy Fail: a Linux kernel local privilege escalation where AF_ALG + splice() turns readable file page-cache pages into part of a writable AEAD destination scatterlist, and authencesn then performs a deterministic 4-byte write past the contractual output boundary.
- Affected component: crypto/algif_aead.c in-place decrypt path + crypto/authencesn.c
- Primitive: controlled 4-byte page-cache write into any file readable by the attacker
- Reachability: unprivileged local user, AF_ALG available, algif_aead loaded
- Impact: immediate system-wide corruption of the page-cache copy used by read(), mmap(), and execve()
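The reachability condition can be checked from an unprivileged process. The following is a minimal sketch, not taken from the write-up: the function name aead_template_reachable is hypothetical, and it only attempts to create an AF_ALG socket and bind it to the algorithm name mentioned on this page. Note that a bind attempt may cause the kernel to auto-load algif_aead if module loading is permitted, so a True result reflects what an attacker could reach, not only what is currently loaded.

```python
import socket

AF_ALG = 38  # Linux address family number for the kernel crypto API
ALG_NAME = "authencesn(hmac(sha256),cbc(aes))"


def aead_template_reachable(alg_name: str = ALG_NAME) -> bool:
    """Return True if this process can bind an AF_ALG AEAD socket to alg_name."""
    try:
        sock = socket.socket(AF_ALG, socket.SOCK_SEQPACKET, 0)
    except OSError:
        # AF_ALG is not available here, or socket creation is blocked.
        return False
    try:
        sock.bind(("aead", alg_name))
        return True
    except OSError:
        # The aead interface or this specific algorithm is not available.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print("AF_ALG AEAD reachable:", aead_template_reachable())
```

A False result on a host where the mitigation was applied is a useful sanity check, but it does not replace patching.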
This is closer to Dirty Pipe / Dirty COW style page-cache abuse than to a classic memory-corruption race:
- no race window
- no repeated retries
- no on-disk file modification
- same exploit flow across many distros because the primitive is structural, not offset-dependent
Core idea
splice() moves data between a file, a pipe, and another FD by reference. If a readable file is spliced into a pipe and then into an AF_ALG AEAD socket, the crypto input scatterlist can reference the same page-cache pages backing that file.
For AEAD decrypt, algif_aead historically optimized the request into an in-place layout:
- AAD and ciphertext were copied into the user RX buffer
- the final authentication tag was not copied
- instead, tag scatterlist entries were appended to the destination with sg_chain()
- req->src = req->dst, so those appended tag pages became part of a writable destination chain
If the tag pages come from spliced file data, the writable destination chain now includes page-cache pages of a read-only file.
The bug in authencesn
authencesn is an AEAD wrapper used for IPsec Extended Sequence Numbers (ESN). During decrypt it uses the destination scatterlist as scratch space and writes 4 bytes past the legitimate decrypt output:
scatterwalk_map_and_copy(tmp, dst, 0, 8, 0);
scatterwalk_map_and_copy(tmp, dst, 4, 4, 1);
scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);
The last write stores seqno_lo (attacker-controlled AAD bytes 4..7) at dst[assoclen + cryptlen], which is past the end of the decrypted output and therefore outside the contract for AEAD decrypt output.
When algif_aead has chained page-cache-backed tag pages into dst, that write crosses out of the RX buffer and lands in the victim file's page cache.
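To see why the third scratch write leaves the RX buffer, it can help to model the destination as a chain of segments and ask which segment a logical offset falls into. This is a simplified userspace model, not kernel code: the locate helper, the segment labels, and the sizes are all hypothetical, and it assumes cryptlen here means the ciphertext length without the tag.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (label, length in bytes)


def locate(segments: List[Segment], offset: int) -> Tuple[str, int]:
    """Map a logical offset in a chained buffer list to (label, offset in segment)."""
    for label, length in segments:
        if offset < length:
            return label, offset
        offset -= length
    raise IndexError("offset is past the end of the chain")


# Hypothetical sizes, for illustration only.
assoclen, cryptlen, authsize = 8, 16, 16

dst = [
    ("rx_buffer", assoclen + cryptlen),  # AAD + ciphertext copied for the user
    ("chained_tag_pages", authsize),     # tag entries appended to the chain
]

second_write = locate(dst, 4)                    # stays inside the RX buffer
third_write = locate(dst, assoclen + cryptlen)   # first byte after the RX buffer

if __name__ == "__main__":
    print(second_write)
    print(third_write)
```

In this model the write at offset 4 resolves to the RX buffer, while the write at assoclen + cryptlen resolves to offset 0 of the chained segment, which is the segment that can be backed by spliced file pages.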
Why this becomes a useful primitive
The attacker controls:
- Which file: any file readable by the attacker
- Which offset: via splice offset/length and AEAD assoclen
- Which value: the 4 bytes written come from attacker-controlled AAD bytes 4..7
Even if authentication fails and recvmsg() returns an error, the page-cache overwrite persists because the scratch write already happened.
The corrupted page is not marked dirty for writeback, so:
- the on-disk file remains unchanged
- checksum comparisons on disk miss the attack
- all later read(), mmap(), and execve() users consume the modified in-memory page
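Because the modified page is clean rather than dirty, one possible check is to hash a file through the page cache, ask the kernel to drop its cached pages, and hash it again. The sketch below is a heuristic under stated assumptions, not a method from the write-up: the helper names are hypothetical, POSIX_FADV_DONTNEED is only advisory, and pages that are currently mapped by a running process may not be evicted, so a match does not prove the cache is clean.

```python
import hashlib
import os


def sha256_fd(fd: int, chunk: int = 1 << 20) -> str:
    """Hash a file through ordinary reads, which are served from the page cache."""
    digest = hashlib.sha256()
    offset = 0
    while True:
        data = os.pread(fd, chunk, offset)
        if not data:
            break
        digest.update(data)
        offset += len(data)
    return digest.hexdigest()


def cache_matches_disk(path: str) -> bool:
    """Compare the cached view of a file with the view after a cache-drop hint."""
    fd = os.open(path, os.O_RDONLY)
    try:
        cached = sha256_fd(fd)
        # Advisory request to drop clean cached pages for the whole file
        # (length 0 means "to end of file").
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        reread = sha256_fd(fd)
        return cached == reread
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(cache_matches_disk("/usr/bin/su"))
```

A mismatch on a file that nobody is legitimately writing is worth investigating. Note that a successful drop also discards the modified page, so capture the first hash or a copy of the cached content before relying on this as evidence.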
Typical LPE path
The public write-up targets a setuid-root binary such as /usr/bin/su:
- Open AF_ALG and bind to authencesn(hmac(sha256),cbc(aes))
- Send AAD where bytes 4..7 contain the 4-byte chunk to write
- splice() target file data into the AEAD input so the final tag region references the target file's page-cache pages
- Trigger recv()/recvmsg() to force decrypt
- Repeat until the page-cache copy of the setuid binary is patched
- Execute the binary so the kernel loads the modified cached image and runs attacker code as root
Conceptual PoC skeleton:
a = socket.socket(38, 5, 0) # AF_ALG, SOCK_SEQPACKET
a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))
# set key, accept request socket
u.sendmsg([b"A"*4 + payload_chunk], [cmsg_headers], MSG_MORE)
os.splice(target_fd, pipe_wr, offset)
os.splice(pipe_rd, alg_fd, offset)
u.recv(...) # triggers decrypt -> page-cache write
How the bug became exploitable
- 2011: authencesn introduced for IPsec ESN handling (a5079d084f8b)
- 2015: authencesn converted to the new AEAD interface and kept the out-of-contract scratch write (104880a6b470)
- 2017: algif_aead switched decrypt to an in-place design and chained tag pages into the destination (72548b093ee3)
That 2017 change is what turned an internal scratch write into a page-cache write primitive reachable from unprivileged userspace.
Fix and mitigations
Mainline fixed this by reverting algif_aead back to out-of-place operation (a664bf3d603d), so page-cache pages can remain in the source scatterlist but no longer become part of the writable destination chain.
Useful mitigations:
- patch to a kernel carrying a664bf3d603d or a distro backport
- block AF_ALG socket creation with seccomp for untrusted workloads
- disable algif_aead if you need an immediate stopgap
Example emergency mitigation:
echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif-aead.conf
rmmod algif_aead 2>/dev/null || true
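After applying the stopgap, it is worth confirming that the module is really gone. The following sketch parses /proc/modules for an algif_aead entry; the function names are hypothetical. One caveat to keep in mind: if the AEAD user API is built into the kernel (CONFIG_CRYPTO_USER_API_AEAD=y) instead of built as a module, it will not appear in /proc/modules and the modprobe rule above cannot disable it.

```python
def module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` is listed as a module in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def algif_aead_loaded(proc_modules: str = "/proc/modules") -> bool:
    """Check the running kernel's module list for algif_aead."""
    try:
        with open(proc_modules, "r", encoding="ascii", errors="replace") as fh:
            return module_loaded("algif_aead", fh.read())
    except OSError:
        # No readable /proc/modules (non-Linux, or no module support).
        return False


if __name__ == "__main__":
    print("algif_aead loaded:", algif_aead_loaded())
```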
For containerized environments, AF_ALG should be treated as a kernel attack surface. Even without this specific CVE, it is a good candidate for seccomp denial in CI runners, sandboxes, and multi-tenant containers.
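As an illustration of the seccomp approach, the sketch below generates a JSON rule that makes socket() fail when its first argument is AF_ALG (38). It assumes the Docker/OCI-style seccomp profile layout (defaultAction, syscalls, args with index/value/op); verify the field names against your container runtime's documentation before use. The defaultAction of SCMP_ACT_ALLOW is chosen only to keep the example small and is far more permissive than a runtime's default profile, so in practice the syscalls entry would be merged into an existing profile. It also does not cover the socketcall() multiplexer used on some 32-bit ABIs.

```python
import json

AF_ALG = 38  # Linux address family number, as used in the PoC skeleton


def deny_af_alg_profile() -> dict:
    """Build a seccomp profile fragment that rejects socket(AF_ALG, ...)."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",  # illustration only, see note above
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                "args": [
                    # Match when argument 0 (the address family) equals AF_ALG.
                    {"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_af_alg_profile(), indent=2))
```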
Detection / review notes
- A page-cache-only patch means the suspicious effect may be visible only in memory, not on disk.
- Look for unusual AF_ALG use on systems that do not intentionally expose kernel crypto sockets to workloads.
- When auditing zero-copy kernel interfaces, treat any path that combines splice()-backed page references with scatterlists reused as destinations as high risk.
- A useful reviewer rule is: if an algorithm writes beyond its documented output length, any caller that chains foreign pages into dst may turn it into a write primitive.
References
- Xint write-up: Copy Fail: 732 Bytes to Root on Every Major Linux Distributions
- Copy Fail advisory / mitigation page
- Linux fix: crypto: algif_aead - Revert to operating out-of-place (a664bf3d603d)
- Linux commit: crypto: algif_aead - copy AAD from src to dst (72548b093ee3)
- Linux commit: crypto: authencesn - Convert to new AEAD interface (104880a6b470)
- Linux commit: crypto: authencesn - Add algorithm to handle IPsec extended sequence numbers (a5079d084f8b)