Subject: Custom literals in Rust

Message-Id: https://www.5snb.club/posts/2020/custom-literals-in-rust/
Tags: #hack(6)

Ever wanted custom literals in Rust? No? Too bad!

let x = "#123456": [[Color]];
// x is a Color

This runs at compile time. If the parser panics, you get a compile failure. Neat, huh?

This isn’t limited to strings, either. Any value known at compile time can be used. As long as you can do your processing in a const fn, it’s fair game.
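To make the "not only strings" point concrete, here is a minimal sketch on stable Rust that does not use the macro at all. The Percent type and its from_u8 constructor are made up for illustration; the only real mechanism is that the initializer of a const item has to be evaluated by the compiler, so a panic inside the const fn becomes a build error.

```rust
/// Hypothetical type: a percentage guaranteed to be in 0..=100.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Percent(u8);

impl Percent {
    /// Hypothetical constructor that panics on out-of-range input.
    pub const fn from_u8(value: u8) -> Self {
        if value > 100 {
            panic!("percentage out of range");
        }
        Percent(value)
    }

    /// Returns the wrapped value.
    pub const fn get(self) -> u8 {
        self.0
    }
}

// The initializer of a `const` item is evaluated at compile time.
pub const HALF: Percent = Percent::from_u8(50);

// Uncommenting this line turns the panic into a compile failure.
// pub const TOO_MUCH: Percent = Percent::from_u8(150);
```

The macro described below should be able to produce the same effect from something like 50: [[Percent]], assuming a matching ConstPanickingFrom<u8> impl exists.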

Usage

impl const constinto::ConstPanickingFrom<&str> for Color {
    fn const_panicking_from(x: &str) -> Self {
        todo!() // Not important here.
    }
}

#[userliterals]
fn flag() -> [Color; 4] {
    // Compile time error if uncommented.
    // let invalid = "lololol": [[Color]];

    [
        "#ffff00": [[Color]],
        "#ffffff": [[Color]],
        "#800080": [[Color]],
        "#000000": [[Color]],
    ]
}

Implementation

A full code sample is at the bottom for copy pasting.

struct LiteralVisitor;

We don’t have any state or arguments that need passing to the visitor, so we can simply use a unit struct here.

impl VisitMut for LiteralVisitor {

VisitMut lets you walk through a tree of nodes and do modifications on them just by defining methods that are run whenever an element of a particular type is encountered.

  fn visit_expr_mut(&mut self, expr: &mut syn::Expr) {
    let sp = expr.span();

Here, we’re capturing the span of expr. This lets us point to the original source when there are errors.

    if let syn::Expr::Type(ty) = expr {
      if let syn::Type::Slice(outer) = ty.ty.as_ref() {
        if let syn::Type::Slice(inner) = outer.elem.as_ref() {

This looks for a Type expression, which is type ascription (as in foo(50: i32)), and then checks whether the ty of that expression (which is just the type named on the right hand side) is two nested slices ([[]]). If you will never use real type ascription, then the check for two slices can be skipped and you can simply treat "#123456": Color as a literal, at the risk of being very confusing to people who are not used to the code.

          let ex = &ty.expr;

Here we’re capturing the left hand side of the expression (In "#123456": [[Color]], this is the "#123456"), as we will be using it in the expansion.

          match inner.elem.as_ref() {
            syn::Type::Path(_) => {
              let to = &inner.elem;

Here we get at the thing inside the two square brackets on the right (In [[Color]], this is the Color), since we need it for both the type of the constant, as well as to invoke the correct trait method. We also check to see what is in the right hand side. Here, we’re looking for a Path, which is simply any name for an item. This means you can use [[crate::module::Color]] if you wanted, it will still work, since all the macro needs to be able to do is write it as a type.

              *expr = syn::Expr::Verbatim(quote::quote_spanned! {
                sp => {
                  use ::constinto::ConstPanickingFrom as _;
                  const _VAL: #to = #to::const_panicking_from(#ex);
                  _VAL
                }
              });

This is the expansion. We create a new expression (a Verbatim simply means you can put whatever tokens you want in it. This is good for our use case, since quote_spanned! creates a TokenStream).

Here, we also make use of sp, which is the span we got above. This lets the compiler know who to blame when there are errors.

We create a new block using { } to avoid leaking our use and const _VAL into the outer scope. The use itself wouldn’t be much of a problem, but you wouldn’t want to re-use names of consts. This saves us the hassle of trying to generate a random name.

The reason we use const _VAL and then write _VAL at the end (which makes the whole expression evaluate to _VAL) is to force evaluation. If we just wrote sp => #to::const_panicking_from(#ex);, then it would be an ordinary function call, and the compiler would be under no obligation to evaluate it at compile time. A panic would then only show up at runtime.
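Written out by hand, the expansion looks roughly like the sketch below. To keep it compiling on stable Rust, I’m assuming an inherent const fn on Color instead of the trait method (so there is no use line), and hex_digit is a helper name picked for this example. Note that both blocks declare a const called _VAL without clashing, because each const lives in its own block.

```rust
#[derive(Debug, PartialEq, Eq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
}

// Helper (name chosen for this sketch): one lowercase hex digit to its value.
const fn hex_digit(x: u8) -> u8 {
    match x {
        b'0'..=b'9' => x - b'0',
        b'a'..=b'f' => x - b'a' + 10,
        _ => panic!("unknown character"),
    }
}

impl Color {
    // Stand-in for the trait method: an inherent const fn works on stable.
    const fn const_panicking_from(x: &str) -> Self {
        match x.as_bytes() {
            [b'#', rh, rl, gh, gl, bh, bl] => Color {
                r: hex_digit(*rh) << 4 | hex_digit(*rl),
                g: hex_digit(*gh) << 4 | hex_digit(*gl),
                b: hex_digit(*bh) << 4 | hex_digit(*bl),
            },
            _ => panic!("invalid colour"),
        }
    }
}

fn two_colors() -> (Color, Color) {
    // Each block has its own `_VAL`, so the names never collide.
    let yellow = {
        const _VAL: Color = Color::const_panicking_from("#ffff00");
        _VAL
    };
    let purple = {
        const _VAL: Color = Color::const_panicking_from("#800080");
        _VAL
    };
    (yellow, purple)
}
```

Changing either string to something that doesn’t parse should fail the build rather than panic when two_colors is called.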

There’s an open RFC (https://github.com/rust-lang/rfcs/pull/2920) to add const { 2 + 2 } blocks to force const evaluation. This would mean this hack isn’t needed: we could just use a const block, call the function, and return the result.
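For comparison, here is a small sketch of what the const block version could look like. It assumes a toolchain where inline const blocks are available (as far as I know they were stabilized in Rust 1.79); hex_digit and hex_byte are helper names made up for this example.

```rust
// Helper (hypothetical name): one lowercase hex digit to its value.
const fn hex_digit(x: u8) -> u8 {
    match x {
        b'0'..=b'9' => x - b'0',
        b'a'..=b'f' => x - b'a' + 10,
        _ => panic!("unknown character"),
    }
}

// Helper (hypothetical name): exactly two hex digits to one byte.
const fn hex_byte(s: &str) -> u8 {
    match s.as_bytes() {
        [h, l] => hex_digit(*h) << 4 | hex_digit(*l),
        _ => panic!("expected two hex digits"),
    }
}

fn purple_red_channel() -> u8 {
    // The block is evaluated at compile time; no named const is needed.
    const { hex_byte("80") }
}
```

With this, the expansion would no longer need a throwaway _VAL constant.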

The reason I use two brackets is to ensure that the right hand side is never valid for type ascription. Two sets of brackets mean a slice within a slice, but the element type of a slice must be sized, and a slice itself is unsized (for the same reason, let x: &[dyn std::fmt::Debug] = &[] is invalid).

This ensures there is no compatibility issue: any code that the literal macro converts would not have compiled without it anyway.

            }
            _ => panic!("Bad input"),

Here we just panic if we get something other than a Path inside the square brackets. Not great error handling, but it works.

          }

          return;

Once we have transformed the input, we return, to avoid walking the inner elements unnecessarily.

        }
      }
    }

    syn::visit_mut::visit_expr_mut(self, expr)
  }
}

This final call hands anything we didn’t rewrite back to syn’s default traversal, so that we recurse into sub-expressions and walk the whole tree of expressions.
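If the control flow is hard to see through the syn types, the same shape can be sketched on a toy tree using only the standard library. Everything here (Expr, Tagged, Call, rewrite) is invented for illustration and is not part of syn: Tagged stands in for value: [[Type]], and Call stands in for the expanded expression.

```rust
// Toy expression tree, made up for this sketch.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    // Stand-in for `value: [[Type]]`.
    Tagged(Box<Expr>, String),
    // Stand-in for the expanded `Type::const_panicking_from(value)`.
    Call(String, Box<Expr>),
}

fn rewrite(expr: &mut Expr) {
    if let Expr::Tagged(inner, ty) = expr {
        // Move the inner expression out, leaving a placeholder behind.
        let inner = std::mem::replace(&mut **inner, Expr::Num(0));
        let name = format!("{}::const_panicking_from", ty);
        *expr = Expr::Call(name, Box::new(inner));
        // Already transformed: do not walk the replaced node's children.
        return;
    }

    // Default traversal: recurse into the children of everything else.
    match expr {
        Expr::Add(l, r) => {
            rewrite(l);
            rewrite(r);
        }
        Expr::Call(_, arg) => rewrite(arg),
        Expr::Num(_) | Expr::Tagged(..) => {}
    }
}
```

The early return plays the same role as the return in visit_expr_mut, and the match at the end plays the role of the call to syn::visit_mut::visit_expr_mut.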

#[proc_macro_attribute]
pub fn userliterals(
    _attr: proc_macro::TokenStream,
    item: proc_macro::TokenStream,
) -> proc_macro::TokenStream {
    let mut input = syn::parse_macro_input!(item as syn::Item);

    LiteralVisitor.visit_item_mut(&mut input);

    let expanded = quote::quote! { #input };

    proc_macro::TokenStream::from(expanded)
}

The name of this function is the attribute name you will use on your functions. The #[proc_macro_attribute] marks it as an attribute proc macro.

We don’t need attr, which holds any tokens passed as “arguments” to the attribute. We don’t allow for customisation, so we just ignore that variable.

We simply parse the input, walk the tree using our LiteralVisitor, and then create a TokenStream from the output and return it.

The Code

customlit

The code for this started life as https://github.com/Andlon/numeric_literals, and I gradually hacked away the parts that weren’t needed.

use syn::spanned::Spanned;
use syn::visit_mut::VisitMut;

struct LiteralVisitor;

impl VisitMut for LiteralVisitor {
    fn visit_expr_mut(&mut self, expr: &mut syn::Expr) {
        let sp = expr.span();

        if let syn::Expr::Type(ty) = expr {
            if let syn::Type::Slice(outer) = ty.ty.as_ref() {
                if let syn::Type::Slice(inner) = outer.elem.as_ref() {
                    let ex = &ty.expr;

                    match inner.elem.as_ref() {
                        syn::Type::Path(_) => {
                            let to = &inner.elem;

                            *expr = syn::Expr::Verbatim(quote::quote_spanned! {
                                sp => {
                                    use ::constinto::ConstPanickingFrom as _;
                                    const _VAL: #to = #to::const_panicking_from(#ex);
                                    _VAL
                                }
                            });
                        }
                        _ => panic!("Bad input"),
                    }

                    return;
                }
            }
        }

        syn::visit_mut::visit_expr_mut(self, expr)
    }
}

#[proc_macro_attribute]
pub fn userliterals(
    _attr: proc_macro::TokenStream,
    item: proc_macro::TokenStream,
) -> proc_macro::TokenStream {
    let mut input = syn::parse_macro_input!(item as syn::Item);

    LiteralVisitor.visit_item_mut(&mut input);

    let expanded = quote::quote! { #input };

    proc_macro::TokenStream::from(expanded)
}

And the relevant parts of the Cargo.toml

[lib]
proc-macro = true

[dependencies]
quote = "1.0.7"

[dependencies.syn]
version = "1.0"
default-features = false
features = ["visit-mut", "printing", "full", "parsing", "proc-macro", "extra-traits"]

constinto

There’s also a crate that defines const_panicking_from that needs to be included by the crate using the macro.

pub trait ConstPanickingFrom<T> {
    fn const_panicking_from(value: T) -> Self;
}

Yep, that’s it. 3 lines of code to define a trait. I couldn’t do this in customlit because proc-macro crates can only export macros, not traits or other items.

If I implemented this trait for any standard library types, that would go here.

user_crate

#![feature(const_trait_impl)]
#![allow(incomplete_features)]
#![feature(const_option)]
#![feature(const_fn)]
#![feature(const_panic)]

use customlit::userliterals;

#[derive(Debug, PartialEq, Eq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
}

const fn hex_to_u4(x: u8) -> u8 {
    match x {
        b'0'..=b'9' => x - b'0',
        b'a'..=b'f' => x + 10 - b'a',
        _ => panic!("unknown character"),
    }
}

const fn hl_to_u8(h: u8, l: u8) -> u8 {
    hex_to_u4(h) << 4 | hex_to_u4(l)
}

impl const constinto::ConstPanickingFrom<&str> for Color {
    fn const_panicking_from(x: &str) -> Self {
        match x.as_bytes() {
            [b'#', rh, rl, gh, gl, bh, bl] => {
                let r = hl_to_u8(*rh, *rl);
                let g = hl_to_u8(*gh, *gl);
                let b = hl_to_u8(*bh, *bl);
                Self { r, g, b }
            }
            _ => panic!("invalid colour"),
        }
    }
}

#[userliterals]
fn flag() -> [Color; 4] {
    // Compile time error if uncommented.
    // let invalid = "lololol": [[Color]];
    [
        "#ffff00": [[Color]],
        "#ffffff": [[Color]],
        "#800080": [[Color]],
        "#000000": [[Color]],
    ]
}

fn main() {
    dbg!(flag());
}

As mentioned above, you must include both crates as dependencies.

[dependencies]
customlit = { path = "../customlit" }
constinto = { path = "../constinto" }

Should I Do This?

Short answer? No.

Long answer? Not yet.

Support for const trait implementations is pretty weak at the moment, so I couldn’t actually get ConstPanickingFrom implemented for any standard library types, which would have made making it a trait actually worth it.

To be fair, it did say it was an incomplete feature.

But I believe this model, or something like it, would be a decent idea for custom literals.

Alternate Methods

This initially didn’t have a trait, and just called a magic method that was expected to be implemented on the type. This meant you wouldn’t be able to implement it for any standard library types, and is pretty magic.

This could be implemented just on types that use TryFrom, but not every type implements that. SocketAddr, for example, only implements FromStr and not TryFrom.

Alternatively, this could be implemented just on types that use FromStr, but that might be too much work for some types. NonZeroI32 does implement FromStr, but it’s easier for types to start off with an integer. That, and having an integer literal in your code leads to better syntax highlighting.
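As a rough illustration of the integer route, the sketch below builds a NonZeroI32 from an integer in a const fn on stable Rust and compares it with the runtime FromStr path. The nonzero helper and the RETRIES constant are names invented for this example.

```rust
use std::num::NonZeroI32;

// Hypothetical helper: panics (at compile time, when used in a const) on zero.
const fn nonzero(value: i32) -> NonZeroI32 {
    match NonZeroI32::new(value) {
        Some(n) => n,
        None => panic!("zero is not a valid NonZeroI32"),
    }
}

// Checked by the compiler, and the `3` is still an ordinary integer literal.
const RETRIES: NonZeroI32 = nonzero(3);

// Uncommenting this line fails the build.
// const BROKEN: NonZeroI32 = nonzero(0);

// The FromStr route works too, but only at runtime.
fn retries_from_str(text: &str) -> Option<NonZeroI32> {
    text.parse().ok()
}
```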

Ideally, Rust would evaluate code that uses values that are (partially) known at compile time, and then show a warning/error if it unconditionally panics. This would work even for non-const functions, and possibly even methods like AtomicBool::load, which panics if you pass in an Ordering of Release or AcqRel. Doing this would mean you could just write let x = "192.168.0.1:1234".parse().unwrap(); and assuming type inference means that the compiler knows what x is, you’ll get a compile error if it can’t parse as a socket address correctly.
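For contrast, this is what the same parse looks like today, where the check only happens at runtime. parse_addr is a wrapper name made up for this sketch; the parsing itself is the standard library’s FromStr impl for SocketAddr.

```rust
use std::net::{AddrParseError, SocketAddr};

// Hypothetical wrapper: the type annotation tells `parse` what to produce.
fn parse_addr(text: &str) -> Result<SocketAddr, AddrParseError> {
    text.parse()
}

fn demo_port() -> u16 {
    // A bad string here compiles fine and only panics when this line runs.
    let addr = parse_addr("192.168.0.1:1234").unwrap();
    addr.port()
}
```

A typo in the string is invisible to the compiler here, which is exactly the gap the custom literal hack is trying to close.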