printf format validation in rust
Date: Message-Id: https://www.5snb.club/posts/2023/printf-format-validation-in-rust/
Tags: #hack(6)
I was watching a talk about Idris 2, and it was mentioned that you can implement a type-safe printf using dependent types (around 10 minutes in).
I wondered whether you could do something like that in Rust. And you can, ish!
error[E0308]: mismatched types
   --> src/main.rs:145:13
    |
145 |     let x = printf::<"that's a %s %s, aged %u!">("cute", "dog");
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `"%s%s"`, found `"%s%s%u"`
    |
    = note: expected constant `"%s%s"`
               found constant `"%s%s%u"`
That’s done with no macros, just a lot of const code of dubious quality.
The core technique I use here is that you can assert equality of two constant values as a where bound.
For example, let’s write a function that asserts that the string you pass in has a length equal to the size of the object you pass in.
#![allow(incomplete_features)]
#![feature(generic_const_exprs)]
#![feature(associated_const_equality)]
#![feature(adt_const_params)]

trait Size {
    const SIZE: usize;
}

impl<T> Size for T {
    const SIZE: usize = std::mem::size_of::<T>();
}

pub const fn length(s: &str) -> usize {
    s.len()
}

fn correct_size<T, const S: &'static str>(item: T)
    where T: Size<SIZE = { length(S) }> {
}

fn main() {
    correct_size::<_, "hewwo">(42_u32);
}
error[E0308]: mismatched types
  --> src/main.rs:23:32
   |
23 |     correct_size::<_, "hewwo">(42_u32);
   |                                ^^^^^^ expected `4`, found `5`
   |
   = note: expected constant `4`
              found constant `5`
note: required by a bound in `correct_size`
  --> src/main.rs:19:19
   |
18 | fn correct_size<T, const S: &'static str>(item: T)
   |    ------------ required by a bound in this function
19 |     where T: Size<SIZE = { length(S) }> {
   |                   ^^^^^^^^^^^^^^^^^^^^ required by this bound in `correct_size`
Worried about the 3 features and incomplete_features in this small example? Don’t worry, it gets worse.
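For comparison, here is a rough stable-Rust approximation of the same check. It is not the technique this post uses: instead of a where bound it leans on an associated constant that panics during compile-time evaluation, and because string const parameters are unstable it takes the expected length as a plain usize. The names SizeIs and CHECK are made up for this sketch.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical helper: pairs a type `T` with an expected byte count `N`.
struct SizeIs<T, const N: usize>(PhantomData<T>);

impl<T, const N: usize> SizeIs<T, N> {
    /// Evaluated at compile time for every (T, N) pair that is actually used.
    const CHECK: () = assert!(size_of::<T>() == N, "size mismatch");
}

/// Builds only when `size_of::<T>() == N`; returns the checked size.
fn correct_size<T, const N: usize>(_item: T) -> usize {
    // Naming the constant forces it to be evaluated for this instantiation.
    let () = SizeIs::<T, N>::CHECK;
    N
}
```

With this, correct_size::<_, 4>(42_u32) builds, while correct_size::<_, 5>(42_u32) should fail the build with the "size mismatch" panic message instead of the expected/found diff that the where bound produces.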
Anyways, the core premise here is that you have two sides, one being the string as the const parameter, and one being the value entered. And you apply a “skeleton” function to each side to map it to some shared expected value, and if both map to the same value, then it’s allowed to compile.
The skeleton function should end up in a reasonably human understandable value as the key, since it is what will be printed when there’s a difference.
That’s all we really need to know in order to start with the real printf code. First, let’s look at the skeleton function for the format string. The key will be the format string specifiers, so "Hello %d world %s!" ends up with a key of "%d%s", which is reasonably human readable.
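To make the mapping concrete, here is a plain run-time sketch of it in stable Rust. The function name skeleton is an assumption for illustration. It only shows which characters end up in the key and does nothing at compile time.

```rust
/// Hypothetical run-time counterpart of the skeleton function:
/// keeps every `%x` specifier and drops literal text and `%%` escapes.
fn skeleton(format: &str) -> String {
    let mut key = String::new();
    let mut saw_percent = false;
    for ch in format.chars() {
        if saw_percent {
            // `%%` is an escaped percent sign, not a specifier.
            if ch != '%' {
                key.push('%');
                key.push(ch);
            }
            saw_percent = false;
        } else if ch == '%' {
            saw_percent = true;
        }
    }
    key
}
```

Literal text never reaches the key, so two format strings with the same specifiers in the same order share a key no matter what surrounds them.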
I’m using konst to make parsing the string a bit easier. You absolutely can do this without that crate, it’s just a bit more painful.
const fn parse_skeleton<const F: &'static str>() -> &'static str {
    let mut s = "";
    let mut chars = konst::string::chars(F);
    let mut saw_percent = false;
    while let Some((ch, chars_)) = chars.next() {
        chars = chars_;
        if saw_percent {
            if ch != '%' {
                let encoded = konst::chr::encode_utf8(ch);
                s = append_strs(append_strs(s, "%"), encoded.as_str());
            }
            saw_percent = false;
        } else {
            if ch == '%' {
                saw_percent = true;
            }
        }
    }
    s
}
Fairly simple code, if a bit Weird because it needs to be a const fn, so no for loops for you :)
Except for append_strs. What the fuck is that? Well, I need some way to dynamically build a &'static str. So I wrote a function with a rather funny signature, fn(&str, &str) -> &'static str, which does exactly what you think it does.
const fn append_strs(a: &str, b: &str) -> &'static str {
    unsafe {
        let buf = core::intrinsics::const_allocate(a.len() + b.len(), 1);
        assert!(!buf.is_null(), "append_strs can only be called at comptime");
        std::ptr::copy(a.as_ptr(), buf, a.len());
        std::ptr::copy(b.as_ptr(), buf.add(a.len()), b.len());
        std::str::from_utf8_unchecked(std::slice::from_raw_parts(buf, a.len() + b.len()))
    }
}
Turns out const does have allocation. It’s just very very magic. const_allocate returns a null pointer if you try to call it at runtime, so this is actually safe, I think. Cursed shit like this at compile time is Fine because the unused values will just get removed since they’re not referenced. Probably.
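For contrast, a sketch of how two strings can be joined in const context on stable Rust without any allocation, by copying into a fixed-size array. This is a swapped-in technique and not what append_strs does. The names concat_bytes, LEN, BYTES and JOINED are assumptions for illustration.

```rust
/// Copies `a` then `b` into a fixed-size buffer.
/// `N` must equal the combined length of both strings.
const fn concat_bytes<const N: usize>(a: &str, b: &str) -> [u8; N] {
    assert!(a.len() + b.len() == N, "N must be a.len() + b.len()");
    let a = a.as_bytes();
    let b = b.as_bytes();
    let mut out = [0u8; N];
    // No `for` loops in const fn, so index by hand.
    let mut i = 0;
    while i < a.len() {
        out[i] = a[i];
        i += 1;
    }
    let mut j = 0;
    while j < b.len() {
        out[a.len() + j] = b[j];
        j += 1;
    }
    out
}

const A: &str = "%s";
const B: &str = "%u";
// The length has to be available as a constant of its own.
const LEN: usize = A.len() + B.len();
const BYTES: [u8; LEN] = concat_bytes::<LEN>(A, B);
const JOINED: &str = match std::str::from_utf8(&BYTES) {
    Ok(s) => s,
    Err(_) => panic!("concatenation produced invalid UTF-8"),
};
```

The catch is that N has to be spelled out as a constant, so this does not directly work when the inputs depend on generic parameters. A function that hands back a &'static str of whatever length sidesteps that.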
Oh, I almost forgot, you also earn
#![feature(adt_const_params)]
#![feature(core_intrinsics)]
#![feature(const_ptr_is_null)]
#![feature(const_heap)]
Anyways, now let’s do the skeleton for the value. The code for that is downright normal, and doesn’t need any more unstable features.
First, we need to define the specifier for each type you want to use in the formatting.
use std::fmt::Display;

trait InnerFormatString: Display {
    const KIND: &'static str;
}
And then the implementations:
impl InnerFormatString for u32 {
    const KIND: &'static str = "%u";
}

impl InnerFormatString for i32 {
    const KIND: &'static str = "%d";
}

impl<'a> InnerFormatString for &'a str {
    const KIND: &'static str = "%s";
}
Fairly standard stuff.
I used tuples to pass the arguments, so I defined a trait for the tuples themselves:
trait FormatString {
    const KIND: &'static str;
    fn display(&self, x: usize) -> &dyn Display;
}
And then the implementation code. I’ll only show the one for a 3-tuple, but you get the gist: the same pattern works for any tuple size you want.
impl<A: InnerFormatString, B: InnerFormatString, C: InnerFormatString> FormatString for (A, B, C) {
    const KIND: &'static str = append_strs(A::KIND, append_strs(B::KIND, C::KIND));
    fn display(&self, x: usize) -> &dyn Display {
        match x {
            0 => &self.0,
            1 => &self.1,
            2 => &self.2,
            _ => panic!(),
        }
    }
}
The thing computing the key is KIND, the display is just there to make the printf code actually work.
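Here is a stable-Rust sketch of the value-side key, computed at run time so it needs no const concatenation. Spec and ArgsKey are hypothetical stand-ins for the two traits, and the small helper macro is only one possible way to stamp out the impl for several tuple sizes (the post itself uses no macros).

```rust
use std::fmt::Display;

/// Stand-in for the per-type trait: one specifier per argument type.
trait Spec: Display {
    const KIND: &'static str;
}
impl Spec for u32 { const KIND: &'static str = "%u"; }
impl Spec for i32 { const KIND: &'static str = "%d"; }
impl<'a> Spec for &'a str { const KIND: &'static str = "%s"; }

/// Hypothetical run-time stand-in for the tuple trait's `KIND`.
trait ArgsKey {
    fn key() -> String;
}

// Hypothetical helper macro: writes the impl for one tuple arity.
macro_rules! impl_args_key {
    ($($name:ident),+) => {
        impl<$($name: Spec),+> ArgsKey for ($($name,)+) {
            fn key() -> String {
                // Join the element specifiers in tuple order.
                [$($name::KIND),+].concat()
            }
        }
    };
}

impl_args_key!(A);
impl_args_key!(A, B);
impl_args_key!(A, B, C);
```

The key depends only on the element types, in order, which is what lets it be compared against the key derived from the format string.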
Finally, let’s write the printf code itself.
#![feature(generic_const_exprs)]
#![feature(associated_const_equality)]

use std::fmt::Write; // needed so that write! can target a String

fn printf<A, const F: &'static str>(arg: A) -> String
where
    A: FormatString<KIND = { parse_skeleton::<F>() }>,
{
    let mut saw_percent = false;
    let mut idx = 0;
    let mut ret = String::new();
    for ch in F.chars() {
        if saw_percent {
            if ch == '%' {
                ret.push('%');
            } else {
                // The where bound guarantees that `ch` corresponds to the argument at `idx`
                // (so we could unsafely assume that).
                // For now, we just go through Display and ignore the specifier itself.
                write!(ret, "{}", arg.display(idx)).unwrap();
                idx += 1;
            }
            saw_percent = false;
        } else {
            if ch == '%' {
                saw_percent = true;
            } else {
                ret.push(ch);
            }
        }
    }
    ret
}
error[E0308]: mismatched types
   --> src/main.rs:157:53
    |
157 |     let x = printf::<_, "that's a %s %s, aged %u!">(("cute", "dog"));
    |                                                     ^^^^^^^^^^^^^^^ expected `"%s%s"`, found `"%s%s%u"`
    |
    = note: expected constant `"%s%s"`
               found constant `"%s%s%u"`
note: required by a bound in `printf`
   --> src/main.rs:163:21
    |
161 | fn printf<A, const F: &'static str>(arg: A) -> String
    |    ------ required by a bound in this function
162 | where
163 |     A: FormatString<KIND = { parse_skeleton::<F>() }>,
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `printf`
And we’re done!
Right?
Okay, fine. Time for one more hack. It would be nice if we didn’t need the double parentheses, and it just acted like a normal function. Thankfully, we have a tool for that! FnOnce (and friends) take their arguments as a tuple! It’s quite unstable, but we can do it.
#![feature(generic_const_items)]
#![feature(unboxed_closures)]
#![feature(tuple_trait)]
#![feature(fn_traits)]

struct printf<const F: &'static str>;

impl<A: std::marker::Tuple, const F: &'static str> std::ops::FnOnce<A> for printf<F>
where
    A: FormatString<KIND = { parse_skeleton::<F>() }>,
{
    type Output = String;
    extern "rust-call" fn call_once(self, arg: A) -> String {
        // you've already seen this.
    }
}
Finally, we’ve reached the API shown at the top. Download the full .rs (be sure to add konst 0.3.6 as a dependency).
Is printf specifically a useful API to do this for? No, not really, we have format_args. But it sure was funny. Finding an actual productive use for this is an exercise for the reader.