PoC Watermark Attack on Restic
Date: Message-Id: https://www.5snb.club/posts/2023/poc-watermark-attack/
Tags: #security(4)
First off, does this matter to you? No. No it doesn’t. Unless you’re backing up gigabytes of completely attacker controlled data, to an attacker controlled service, and need to ensure they don’t know you’re backing up said data, it Doesn’t Fucking Matter.
With that said, it’s a somewhat neat attack!
A watermarking attack is when an attacker who can get you to store an attacker-controlled piece of data can then detect the presence of that attacker controlled data. It’s not a huge deal, but is a concern if someone is able to inject a watermark into, say, copywritten or leaked content, and then automatically terminate the cloud storage/backup accounts of users that can be shown to have that data on their drive.
Restic uses Content-Defined-Chunking with Rabin Fingerprints with a secret, random polynomial in order to make watermarking attacks harder. (And, of course, the repo is encrypted, otherwise this attack doesn’t make sense to try to do, just read off the data directly).
This attack does not go after that. Instead, I simply repeat a given chunk of data many times in a row. This means that Restic will tend to create chunks that will have that size, or a multiple of it. Which means the attacker can use the size of the chunks they’re forcing the user to create to encode data.
Anyways, this does work… just not very well :P
A transcript of running it is below. The check.rs
tool only accesses what
metadata would be visible to an attacker without the password to an encrypted
repository, that being object sizes.
running attacker file with tags 1000 1000 2000 2000
-rw-r--r-- 1 jess jess 801M May 19 23:10 attacker.bin
open repository
repository e794b424 opened (version 2, compression level off)
created new cache in /home/jess/.cache/restic
lock repository
no parent snapshot found, will read all files
load index files
start scan on [attacker.bin]
start backup on [attacker.bin]
scan finished in 0.224s: 1 files, 800.732 MiB
Files: 1 new, 0 changed, 0 unmodified
Dirs: 0 new, 0 changed, 0 unmodified
Data Blobs: 528 new
Tree Blobs: 1 new
Added to the repository: 795.915 MiB (795.935 MiB stored)
processed 1 files, 800.732 MiB in 0:04
snapshot b3f64a9e saved
[src/bin/check.rs:22] &sz = [
(
16777390,
8,
),
(
18045195,
7,
),
(
18034195,
6,
),
]
./check.sh 15.01s user 3.68s system 137% cpu 13.543 total
And for reference, this was ran with
restic 0.15.2 compiled with go1.20.3 on linux/amd64
At the end, the first number is the file size, and the second number is how many chunks of that size appeared.
Here, the attacker generated 800MB of data, and we can see 3 interesting sizes.
The two bottom ones are the real tags, the top one is noise. The attacker
generated chunks with the tags 1000
and 2000
, which are bytes added to the
base, so we would expect to see a multiple of that many bytes difference in
size between them. And we do, they’re 18MB large, but the difference is
11×1000=11000 bytes.
Compression doesn’t really affect the attack since the data being inserted is random data, so Restic won’t try and compress it.
See watermark-attack.zip for the test files, under CC0. It requires Rust, but should work on any vaguely modern compiler. Or just rewrite the code in whatever language you prefer, it’s very trivial.
Future research might include trying to automatically recover the fingerprint polynomial so a watermark attack can be pulled off much more directly. That would require interactivity, where you backup attacker controlled data repeatedly.
What if you have webserver logs that you’re backing up, is that a viable attack
to confirm that 5snb.club
, or
iykpqm7jiradoeezzkhj7c4b33g4hbgfwelht2evxxeicbpjy44c7ead.onion
is some
specific account on a online backup service? (Not that I keep logs anyways :P)
Mitigations might be to pad chunks as they’re created, though that comes with a size cost. See Optimally Hiding Object Sizes with Constrained Padding, perhaps. That wouldn’t be a hard solution to the problem, but would make this harder to pull off.