Subject: terminal width

Message-Id: https://www.5snb.club/w/terminal-width-libraries/
Linked-From: wiki.

a terminal width library is a library that tells you “hey, how wide is this string when I print it on the user’s terminal?”

… which sounds great… except that it has to answer without knowing what font the user has, what terminal they're running, or anything else about how the string will actually be drawn.
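for a sense of why this is hard: neither the byte length nor the number of chars in a string is its width in columns. a tiny std-only illustration (the "usually N columns" notes are what common terminals show, not something the code computes; the helper name is made up):

```rust
/// Returns (byte length, number of `char`s) for a string.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // the notes describe typical terminal output; the code can't compute that part
    let samples = [
        ("abc", "plain ASCII, usually 3 columns"),
        ("e\u{301}", "'e' + combining acute accent, usually 1 column"),
        ("日本", "two CJK ideographs, usually 4 columns"),
    ];
    for (s, note) in samples {
        let (bytes, chars) = byte_and_char_counts(s);
        println!("{} bytes, {} chars: {}", bytes, chars, note);
    }
}
```

three strings, three different mismatches: bytes over-count the accent and the CJK text, chars over-count the accent and under-count the CJK text.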

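this script tries to enumerate what kinds of characters actually work: it prints random strings built from an allowlist of Unicode general categories and checks that textwrap's display_width agrees with where the terminal's cursor actually ends up. categories that fail get switched off. (the list is different depending on your terminal :3 )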

use std::io::Write;
use unicode_properties::UnicodeGeneralCategory;

#[derive(PartialEq)]
enum TestResult {
    Success,
    Failed { terminal: u16, library: u16 },
    Skipped,
}

fn check(s: &str) -> TestResult {
    // the width the library claims
    let width = u16::try_from(textwrap::core::display_width(s)).unwrap();

    println!();
    let pos = crossterm::cursor::position().unwrap();
    assert_eq!(pos.0, 0);
    std::io::stdout().write_all(s.as_bytes()).unwrap();
    // make sure the terminal has actually drawn s before asking where the cursor is
    std::io::stdout().flush().unwrap();
    // the width the terminal actually used
    let newpos = crossterm::cursor::position().unwrap();

    if newpos.1 != pos.1 {
        // not testing line wrapping here.
        // (caveat: on the bottom row the terminal scrolls instead, so the row
        // number stays the same and a wrap there can still show up as a failure.)
        return TestResult::Skipped;
    }

    if newpos.0 != width {
        TestResult::Failed {
            terminal: newpos.0,
            library: width,
        }
    } else {
        TestResult::Success
    }
}

fn allowed_char(ch: &char) -> bool {
    use unicode_properties::GeneralCategory as GC;
    match ch.general_category() {
        GC::ClosePunctuation | GC::ConnectorPunctuation | GC::CurrencySymbol
        | GC::DashPunctuation | GC::DecimalNumber | GC::EnclosingMark
        | GC::FinalPunctuation | GC::InitialPunctuation | GC::LetterNumber
        | GC::LowercaseLetter | GC::MathSymbol | GC::OpenPunctuation
        | GC::SpaceSeparator | GC::Surrogate | GC::TitlecaseLetter
        | GC::UppercaseLetter => true,
        GC::ModifierLetter => false, // failed on \u{ff9f} (library said 0, actual was 1)
        GC::OtherLetter => false,    // failed on \u{1171} (library said 0, actual was 1)
        GC::NonspacingMark => false, // failed on \u{1612f} (library said 1, actual was 0)
        GC::SpacingMark => false,    // failed on \u{1933} (library said 1, actual was 0)
        GC::OtherNumber => false,    // failed on \u{1d363} (library said 1, actual was 2)
        GC::OtherPunctuation => false, // failed on \u{17d8} (library said 3, actual was 1)
        GC::ModifierSymbol => false, // failed on \u{21f8}\u{1f3ff} (library said 3, actual was 2)
        GC::OtherSymbol => false,    // failed on \u{1d304} (library said 1, actual was 2)
        GC::LineSeparator => false,  // failed on \u{2922}\u{2028} (library said 2, actual was 1)
        GC::ParagraphSeparator => false, // failed on \u{2029} (library said 1, actual was 0)
        GC::Control => false,        // failed on \u{9} (library said 0, actual was 8)
        // (a tab jumps to the next tab stop at column 8; the library counts it as 0)
        GC::Format => false, // failed on \u{110bd} (library said 1, actual was 0)
        // not tested: these have no meaningful width to check. (random chars land
        // here a lot; the filter just throws them away.)
        GC::PrivateUse => false,
        GC::Unassigned => false,
    }
}

fn generate_valid_string(len: usize) -> String {
    let s: String = std::iter::repeat_with(rand::random::<char>)
        .filter(allowed_char)
        .take(len)
        .collect();

    for ch in s.chars() {
        assert!(allowed_char(&ch));
    }

    s
}

fn main() {
    for len in 8..=64 {
        for _ in 0..500000 {
            let s = generate_valid_string(len);
            let result = check(&s);
            if let TestResult::Failed { terminal, library } = result {
                let escaped = s.escape_unicode();
                panic!("\nfailed on {escaped} (library said {library}, actual was {terminal})\n")
            }
        }
    }
}
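under the hood, crossterm::cursor::position() (on Unix-like systems) asks the terminal: it writes the Device Status Report query ESC [ 6 n, and the terminal replies on stdin with a Cursor Position Report ESC [ row ; col R, counted from 1. reading that reply needs raw mode so it isn't line-buffered or echoed; crossterm handles that part. here's a minimal std-only parser for the reply, just to show the format (parse_cursor_report is a made-up name; it returns a 0-based (column, row) pair, the same order check() reads from position()):

```rust
/// Parses a Cursor Position Report, `ESC [ row ; col R` (1-based),
/// into a 0-based `(column, row)` pair. Returns `None` if malformed.
fn parse_cursor_report(reply: &[u8]) -> Option<(u16, u16)> {
    // strip the leading "ESC [" and the trailing "R"
    let body = reply.strip_prefix(b"\x1b[")?.strip_suffix(b"R")?;
    let body = std::str::from_utf8(body).ok()?;
    let (row, col) = body.split_once(';')?;
    let row: u16 = row.parse().ok()?;
    let col: u16 = col.parse().ok()?;
    // reports are 1-based, so a 0 means the reply was malformed
    Some((col.checked_sub(1)?, row.checked_sub(1)?))
}

fn main() {
    // what a terminal with its cursor on row 5, column 12 would send back
    let reply = b"\x1b[5;12R";
    println!("{:?}", parse_cursor_report(reply)); // prints Some((11, 4))
}
```

this is also why every measurement in the script costs a full round trip to the terminal.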

algorithm idea for printing wrapped text that comes out correct regardless of the terminal:

split the text into wrapping units (words), and print them one at a time, reading the cursor position before and after each print.

if that print line-wrapped (or went past a specific column, if you want to wrap to N columns), go back to where you were before printing that word, do a “clear from this cell to the end of the line” operation, then go down a line and print the word again. repeat until you’re done.
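in raw escape sequences, that redo step looks roughly like this: ESC 7 and ESC 8 (DECSC/DECRC) save and restore the cursor, and ESC [ K (EL) clears from the cursor to the end of the line. a std-only sketch that writes those bytes into any Write (redo_on_next_line is a hypothetical helper, and it assumes the cursor was saved with ESC 7 right before the word was first printed; in practice crossterm's SavePosition, RestorePosition and Clear(ClearType::UntilNewLine) commands cover the same operations):

```rust
use std::io::Write;

// standard VT100/xterm control sequences
const SAVE_CURSOR: &[u8] = b"\x1b7"; // DECSC
const RESTORE_CURSOR: &[u8] = b"\x1b8"; // DECRC
const CLEAR_TO_EOL: &[u8] = b"\x1b[K"; // EL: erase from the cursor to end of line

/// Hypothetical helper: undo a word that ran past the edge and retype it
/// at the start of the next line.
fn redo_on_next_line<W: Write>(out: &mut W, word: &str) -> std::io::Result<()> {
    out.write_all(RESTORE_CURSOR)?; // back to where the word started
    out.write_all(CLEAR_TO_EOL)?; // wipe the partial word
    out.write_all(b"\r\n")?; // down a line, back to column 0
    out.write_all(SAVE_CURSOR)?; // the word now starts here
    out.write_all(word.as_bytes())?;
    out.flush()
}

fn main() -> std::io::Result<()> {
    // write into a buffer instead of the terminal, just to look at the bytes
    let mut buf = Vec::new();
    redo_on_next_line(&mut buf, "word")?;
    println!("{:?}", String::from_utf8_lossy(&buf));
    Ok(())
}
```

note the re-save after the newline: without it, a later restore would jump back to the start of the word on the previous line.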

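you can actually disable line wrapping in terminals, so you only need to detect “did we hit the end of the screen?”: with wrapping off, the cursor stops at the last column, so a write that doesn't move the cursor means you hit it.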
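with autowrap off (DECAWM, ESC [ ? 7 l), a char written in the last column stays there and the cursor doesn't advance, so "the cursor didn't move" is the end-of-line signal. to poke at the algorithm without a real terminal, here's a std-only sketch against a fake one. FakeTerminal, its width rule (uppercase ASCII = 2 columns) and the screen widths are all made up for the demo; the wrapping code never calls the width rule, it only compares cursor columns, which is the whole point:

```rust
/// Pretend terminal with autowrap off. Its width rule is private: the
/// wrapping code below only ever looks at the cursor column.
struct FakeTerminal {
    cols: usize,
    cells: Vec<char>, // current line; '\0' = right half of a wide char
    cursor: usize,
    saved: usize,
    done: Vec<String>, // finished lines
}

impl FakeTerminal {
    fn new(cols: usize) -> Self {
        FakeTerminal { cols, cells: vec![' '; cols], cursor: 0, saved: 0, done: Vec::new() }
    }

    /// Made-up rule for the demo: ASCII uppercase letters are 2 columns wide.
    fn width(c: char) -> usize {
        if c.is_ascii_uppercase() { 2 } else { 1 }
    }

    fn put(&mut self, c: char) {
        let (x, w) = (self.cursor, Self::width(c));
        if x + w > self.cols {
            return; // doesn't fit: nothing is drawn and the cursor stays put
        }
        self.cells[x] = c;
        for cell in &mut self.cells[x + 1..x + w] {
            *cell = '\0';
        }
        // with autowrap off, the cursor never moves past the last column
        self.cursor = (x + w).min(self.cols - 1);
    }

    fn clear_to_eol(&mut self) {
        for cell in &mut self.cells[self.cursor..] {
            *cell = ' ';
        }
    }

    fn newline(&mut self) {
        let line: String = self.cells.iter().filter(|&&c| c != '\0').collect();
        self.done.push(line.trim_end().to_string());
        self.cells = vec![' '; self.cols];
        self.cursor = 0;
    }
}

/// Returns true if some char failed to move the cursor (end of line).
fn write_detecting_eol(t: &mut FakeTerminal, s: &str) -> bool {
    for c in s.chars() {
        let before = t.cursor;
        t.put(c);
        if t.cursor == before {
            return true;
        }
    }
    false
}

/// The wrapping loop from the text, run against the fake terminal.
/// Panics if one word is wider than the screen (the real code has a todo!() there).
fn wrap(t: &mut FakeTerminal, words: &[&str]) {
    for (i, word) in words.iter().enumerate() {
        t.saved = t.cursor; // save position
        if write_detecting_eol(t, word) {
            t.cursor = t.saved; // restore, clear, next line, retry
            t.clear_to_eol();
            t.newline();
            t.saved = t.cursor;
            assert!(!write_detecting_eol(t, word), "word wider than the screen");
        }
        let is_last = i + 1 == words.len();
        if !is_last && write_detecting_eol(t, " ") {
            // no room for the separating space: redraw the word, then start a new line
            t.cursor = t.saved;
            t.clear_to_eol();
            write_detecting_eol(t, word);
            t.newline();
        }
    }
    t.newline();
}

fn main() {
    // "AB cd" is 5 chars but 7 columns under the fake rule, so "e" can't join it.
    let mut t = FakeTerminal::new(8);
    wrap(&mut t, &["AB", "cd", "e"]);
    for line in &t.done {
        println!("|{line}|"); // prints |AB cd| then |e|
    }
}
```

one quirk this shows: a narrow char landing in the last column looks exactly like overflow, so a line only ever reaches that column through the right half of a wide char.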

use std::io::Write;
use crossterm::execute;
use crossterm::cursor::{SavePosition, RestorePosition, position};
use crossterm::terminal::{DisableLineWrap, EnableLineWrap, Clear, ClearType};
use itertools::Itertools as _;
use itertools::Position;

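// Returns true if end of line was hit.
// Writes one char at a time, not one byte: a partial UTF-8 sequence doesn't move
// the cursor, which would look like hitting the end of the line. Zero-width chars
// (combining marks etc.) don't move it either, so those still get misread as overflow.
fn write_detecting_eol<W: Write>(mut to: W, s: &str) -> bool {
    let mut buf = [0u8; 4];
    for ch in s.chars() {
        // makes it slow enough to watch; drop this for real use
        std::thread::sleep(std::time::Duration::from_millis(100));

        let before = position().unwrap();
        to.write_all(ch.encode_utf8(&mut buf).as_bytes()).unwrap();
        // flush so the char is on screen before asking for the cursor position
        to.flush().unwrap();
        let after = position().unwrap();
        if before == after {
            // oops that write isn't actually doing anything :)
            return true;
        }
    }

    false
}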

fn wrap_print<'a, W: Write>(mut to: W, words: impl Iterator<Item = &'a str>) {
    execute!(to, DisableLineWrap).unwrap();

    for (pos, word) in words.with_position() {
        execute!(to, SavePosition).unwrap();

        if write_detecting_eol(&mut to, word) {
            // ah frick we need to redo that
            execute!(to, RestorePosition).unwrap();
            execute!(to, Clear(ClearType::UntilNewLine)).unwrap();
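            writeln!(to).unwrap();
            // save again, so the RestorePosition below comes back to the start
            // of the word on this line instead of the previous one
            execute!(to, SavePosition).unwrap();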

            if write_detecting_eol(&mut to, word) {
                // TODO: hyphenate or just hard break it *somewhere*.
                // for now we can just panic.
                todo!();
            }
        }

        if pos == Position::First || pos == Position::Middle {
            // TODO:... do you need to do this? it *seems* to work fine without it??
            if write_detecting_eol(&mut to, " ") {
                // we ran out of room here, but that's okay actually, just write the last word
                // again
                execute!(to, RestorePosition).unwrap();
                assert!(!write_detecting_eol(&mut to, word));

                // we still need to go to the next line though
                writeln!(to).unwrap()
            }
        }
    }

    execute!(to, EnableLineWrap).unwrap();
}

const LOREM: &str = "Est aut.magnam corrupti.distinctio cupiditate.et. Quibusdam.magni.ut alias
deleniti.assumenda necessitatibus.placeat. Consectetur.saepe nihil.voluptatem quidem.inventore.
Sunt.omnis.reprehenderit maiores..Aut.ea repudiandae sed mollitia.ullam ipsa.at..Et.reprehenderit
et eligendi.";

fn main() {
    let words = LOREM.split(['\n', ' ']);

    wrap_print(std::io::stdout(), words);
}

seems to work fine?

and since it uses the terminal itself to determine the width, it’s more robust against libraries that are clueless.