URLs
https://www.5snb.club/posts/2020/urls/
My requirements for URLs on websites, and what makes a good URL.
URLs MUST be stable. A link to a specific post shouldn’t change what it points to 5 years down the line. Nor should it die if the content’s still available but under a different name.
URLs SHOULD be user editable, and the parts that aren’t should be obviously opaque.
URLs MAY have immediately relevant information in them that isn’t strictly needed to resolve the URL.
Here’s a sampling of URLs:
- https://www.reddit.com/r/rust/comments/6g3sc2/best_way_to_multithread_a_simple_function/ (reddit)
- https://github.com/rust-lang/wg-allocators/issues/17 (github)
- https://doc.rust-lang.org/src/std/io/mod.rs.html#502-964 (rustdoc)
- https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end (rustdoc)
- https://www.youtube.com/channel/UC1usFRN4LCMcfIV7UjHNuQg/videos (youtube channel)
- https://www.youtube.com/watch?v=dQw4w9WgXcQ (youtube video)
- https://discordapp.com/channels/729293063826175566/729293064267238942/738076547881893032 (discord)
- https://www.amazon.co.uk/dp/B0791RGQW3/ref=s9_acsd_al_bw_c2_x_0_t?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=merchandised-search-11&pf_rd_r=SQHJ5TXNCSRE9PRP4QAG&pf_rd_t=101&pf_rd_p=eb3feabb-ea62-4002-b39e-5ccc29f387ba&pf_rd_i=14100223031 (amazon)
Take the reddit URL as an example. Splitting it up into parts lets us see:
- https://www.reddit.com/ (we’re going to reddit)
- r/ (we’re going to a subreddit page)
- rust/ (we’re going to the rust subreddit)
- comments/ (we’re likely to see comments)
- 6g3sc2/ (some opaque identifier)
- best_way_to_multithread_a_simple_function/ (the post is about multithreading)
All in all, the URL is fairly descriptive, if a bit long.
The only actually critical part of the URL is the post id, 6g3sc2. You can just go to https://www.reddit.com/6g3sc2 and it will take you to that post. The rest of the URL is there to show information.
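Here is a minimal Python sketch of that idea. It pulls the post id out of a full reddit comments URL and rebuilds the short form. It assumes the /r/<subreddit>/comments/<id>/<slug>/ layout broken down above, and the function names are made up for illustration.

```python
from urllib.parse import urlsplit

def reddit_post_id(url: str) -> str:
    """Return the opaque post id from a /r/<sub>/comments/<id>/<slug>/ URL."""
    # Split the path into its non-empty segments.
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # Assumed layout: ["r", subreddit, "comments", post_id, slug...]
    if len(parts) < 4 or parts[0] != "r" or parts[2] != "comments":
        raise ValueError(f"not a reddit comments URL: {url}")
    return parts[3]

def reddit_short_url(url: str) -> str:
    """Drop everything except the post id, which is all reddit needs."""
    return f"https://www.reddit.com/{reddit_post_id(url)}"

full = "https://www.reddit.com/r/rust/comments/6g3sc2/best_way_to_multithread_a_simple_function/"
print(reddit_short_url(full))  # https://www.reddit.com/6g3sc2
```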
GitHub
- https://github.com/ (we’re going to github)
- rust-lang/ (it’s under the rust-lang user or organisation)
- wg-allocators/ (the repo is wg-allocators)
- issues/ (we’re looking at issues)
- 17 (issue number 17)
The only non-descriptive element here is the issue number, but even then, because it’s all hierarchical, issue numbers tend to be far lower than a reddit post id, so it’s more feasible for someone to remember issue numbers.
A github URL is also very hierarchical. wg-allocators could be a completely different repo depending on what user it’s under, and the same goes for the issue number. This helps to keep identifiers short, as they don’t need to be globally unique, just unique under the parent namespace.
All in all, compact and pretty informative as to roughly where you’re going. Maybe adding the issue title would help give more context, but it would just be some text that’s taking up space, since issue titles aren’t identifiers.
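Here is a rough Python sketch of that hierarchy. An issue is identified by the whole (owner, repo, number) path, so two issues with the same repo name and number are still different things. The IssueRef type, the helper functions and the "someone-else" owner are hypothetical and exist only for illustration.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

class IssueRef(NamedTuple):
    owner: str   # user or organisation, e.g. "rust-lang"
    repo: str    # only unique under its owner
    number: int  # only unique under its repo

def parse_issue_url(url: str) -> IssueRef:
    """Parse https://github.com/<owner>/<repo>/issues/<number>."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) != 4 or parts[2] != "issues":
        raise ValueError(f"not a GitHub issue URL: {url}")
    owner, repo, _, number = parts
    return IssueRef(owner, repo, int(number))

def issue_url(ref: IssueRef) -> str:
    """Rebuild the URL; each segment only has to be unique within its parent."""
    return f"https://github.com/{ref.owner}/{ref.repo}/issues/{ref.number}"

a = parse_issue_url("https://github.com/rust-lang/wg-allocators/issues/17")
b = IssueRef("someone-else", "wg-allocators", 17)  # hypothetical namesake repo
print(a)       # IssueRef(owner='rust-lang', repo='wg-allocators', number=17)
print(a == b)  # False: same repo name and number, different namespace
```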
Rustdoc Source
- https://doc.rust-lang.org/ (we’re going to a documentation page)
- src/ (viewing the source code of something)
- std/ (the std crate)
- io/ (the io module in std)
- mod.rs.html (the mod.rs file, rendered as HTML)
- #502-964 (lines 502 to 964 are highlighted)
There’s very little redundancy here, and all of the information is human readable and understandable.
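The fragment is regular enough to turn back into a line range. This Python sketch assumes the fragment is either a single line (#502) or a start-end pair (#502-964), as in the example above. highlighted_lines is a made-up name.

```python
from typing import Optional
from urllib.parse import urlsplit

def highlighted_lines(url: str) -> Optional[range]:
    """Turn a rustdoc source fragment like #502-964 (or #502) into a line range."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return None  # no fragment: the whole file, nothing highlighted
    start, _, end = fragment.partition("-")
    # A single line (#502) has no "-", so the end defaults to the start.
    return range(int(start), int(end or start) + 1)

lines = highlighted_lines("https://doc.rust-lang.org/src/std/io/mod.rs.html#502-964")
print(lines.start, lines.stop - 1)  # 502 964
```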
Rustdoc Main
- https://doc.rust-lang.org/ (again, we’re going to a documentation page)
- std/io/ (but not the source code, just the io module in std)
- trait.Read.html (we’re seeing the documentation for a trait named Read)
- #method.read_to_end (and going to the read_to_end method on it)
All in all, it’s readable, and you have a good chance of guessing how to link to something you haven’t seen the URL for.
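To illustrate that guessability, here is a Python sketch that builds a doc.rust-lang.org URL from an item's path. It assumes the crate/module/kind.Name.html layout seen above, where kind is rustdoc's item prefix (trait, struct, enum, fn and so on). The result is only a guess. Rustdoc uses a #tymethod. anchor rather than #method. for required trait methods, so the anchor part won't always be right.

```python
from typing import Optional

def rustdoc_url(crate: str, modules: list[str], kind: str, name: str,
                method: Optional[str] = None) -> str:
    """Guess a doc.rust-lang.org URL from an item's path.

    kind is the item-kind prefix rustdoc puts in file names,
    e.g. "trait", "struct", "enum" or "fn".
    """
    path = "/".join([crate, *modules])
    url = f"https://doc.rust-lang.org/{path}/{kind}.{name}.html"
    if method is not None:
        # Anchor for a provided method; required trait methods use "tymethod.".
        url += f"#method.{method}"
    return url

print(rustdoc_url("std", ["io"], "trait", "Read", "read_to_end"))
# https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end
```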
Youtube channel
- https://www.youtube.com/ (we’re going to youtube)
- channel/ (viewing a channel)
- UC1usFRN4LCMcfIV7UjHNuQg/ (some long opaque identifier)
- videos (but at least we know we’re seeing the videos)
This isn’t all that useful.
Notably, youtube also has user pages, which look like https://www.youtube.com/user/NurdRage. This is an informative URL with a very high signal-to-noise ratio. But channel pages get the fun base64. But wait, there’s more! There are also new-style channels, which look like https://www.youtube.com/c/Nighthawkinlight. These are like the user pages, with a human-readable name, but use /c/ instead of /channel/. Why? No idea.
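The three channel URL styles can be told apart by their first path segment. Here is a small Python sketch that assumes only the /channel/, /user/ and /c/ forms mentioned above exist. The labels and the function name are my own.

```python
from urllib.parse import urlsplit

# The three channel URL prefixes described above (labels are hypothetical).
CHANNEL_KINDS = {
    "channel": "opaque id",
    "user": "user name",
    "c": "new-style name",
}

def classify_channel_url(url: str) -> tuple[str, str]:
    """Return (kind, identifier) for a youtube channel-style URL."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] not in CHANNEL_KINDS:
        raise ValueError(f"not a channel URL: {url}")
    # Anything after the identifier (like "videos") is just the tab being shown.
    return CHANNEL_KINDS[parts[0]], parts[1]

for u in (
    "https://www.youtube.com/channel/UC1usFRN4LCMcfIV7UjHNuQg/videos",
    "https://www.youtube.com/user/NurdRage",
    "https://www.youtube.com/c/Nighthawkinlight",
):
    print(classify_channel_url(u))
```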
Youtube video
- https://www.youtube.com (we’re going to youtube)
- /watch? (watching a video)
- v=dQw4w9WgXcQ (with this opaque video id)
It is opaque, but you can’t get much shorter, and there’s no extra shit tacked on for the fun of it. (Youtube also has a short link in the form of youtu.be/<video id>.)
You can also modify the link to start at a specific timestamp using t=1337, where 1337 is the number of seconds past the start of the video. I’d prefer a colon-delimited timestamp, as that’s more readable. But even so, anyone with a simple calculator can work out the number of seconds for a given timestamp.
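The calculator step is easy to automate. This Python sketch converts an h:mm:ss, m:ss or ss timestamp into seconds and builds a watch URL with t= set. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

def timestamp_to_seconds(stamp: str) -> int:
    """Convert "h:mm:ss", "m:ss" or plain "ss" into a number of seconds."""
    seconds = 0
    for field in stamp.split(":"):
        # Each colon-separated field is worth 60 of the field after it.
        seconds = seconds * 60 + int(field)
    return seconds

def youtube_link(video_id: str, stamp: Optional[str] = None) -> str:
    """Build a watch URL, optionally starting at a colon-delimited timestamp."""
    params = {"v": video_id}
    if stamp is not None:
        params["t"] = str(timestamp_to_seconds(stamp))
    return "https://www.youtube.com/watch?" + urlencode(params)

print(timestamp_to_seconds("22:17"))         # 1337
print(youtube_link("dQw4w9WgXcQ", "22:17"))  # https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=1337
```

22:17 is 22 × 60 + 17 = 1337 seconds, which matches the t=1337 example.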
Discord
(Discord URL has been modified, but it doesn’t change the point)
- https://discordapp.com/ (going to discord)
- channels/ (seeing a channel)
- 729293063826175566/ (opaque server id)
- 729293064267238942/ (opaque channel id)
- 738076547881893032 (opaque message id)
I would not at all be surprised if only the message id is really needed here. And it’s not like the server id and channel id are providing any useful information.
You’re not really meant to be using these links in any case. They don’t even embed, and using them is a pretty bad experience.
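These ids are less opaque than they look. Discord's developer documentation describes its ids as "snowflakes". The bits above the low 22 hold a millisecond timestamp counted from the Discord epoch (the start of 2015). That is the kind of structured data the conclusion argues against. Here is a Python sketch that splits the URL into its three ids and decodes when each was created. The URL above was modified, so the dates it prints may not correspond to anything real.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Discord epoch: 2015-01-01T00:00:00Z, in milliseconds since the Unix epoch.
DISCORD_EPOCH_MS = 1420070400000

def split_message_url(url: str) -> tuple[int, int, int]:
    """Return (server_id, channel_id, message_id) from a /channels/ URL."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) != 4 or parts[0] != "channels":
        raise ValueError(f"not a discord message URL: {url}")
    server, channel, message = (int(p) for p in parts[1:])
    return server, channel, message

def snowflake_created_at(snowflake: int) -> datetime:
    """Everything above the low 22 bits is milliseconds since the Discord epoch."""
    ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

url = "https://discordapp.com/channels/729293063826175566/729293064267238942/738076547881893032"
for label, snowflake in zip(("server", "channel", "message"), split_message_url(url)):
    print(label, snowflake_created_at(snowflake).isoformat())
```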
Amazon
Okay… You ready?
- https://www.amazon.co.uk/ (going to amazon)
- dp/ (no idea)
- B0791RGQW3/ (the actual unique product id)
- ref=s9_acsd_al_bw_c2_x_0_t? (and a whole bunch of opaque identifiers that amazon probably cares about, but I don’t):
  - pf_rd_m=A3P5ROKL5A1OLE&
  - pf_rd_s=merchandised-search-11&
  - pf_rd_r=SQHJ5TXNCSRE9PRP4QAG&
  - pf_rd_t=101&
  - pf_rd_p=eb3feabb-ea62-4002-b39e-5ccc29f387ba&
  - pf_rd_i=14100223031
None of this shit is needed, by the way. Just https://amazon.co.uk/dp/B0791RGQW3 works fine, so if you’re sending an amazon link, strip the tracking shit out.
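Stripping it by hand works, but here is a Python sketch that does it automatically. It assumes the product id is always the 10-character segment right after /dp/ and that nothing else in the URL is needed. That holds for the example above but isn't guaranteed for every amazon URL.

```python
import re
from urllib.parse import urlsplit

# Assumption: the product id is the 10-character segment right after /dp/.
DP_SEGMENT = re.compile(r"/dp/([A-Z0-9]{10})(?:/|$)")

def clean_amazon_link(url: str) -> str:
    """Keep the host and /dp/<product id>; drop the ref= segment and every query parameter."""
    parts = urlsplit(url)
    match = DP_SEGMENT.search(parts.path)
    if match is None:
        raise ValueError(f"no /dp/<product id> in {url}")
    host = parts.netloc.removeprefix("www.")
    return f"https://{host}/dp/{match.group(1)}"

long_link = (
    "https://www.amazon.co.uk/dp/B0791RGQW3/ref=s9_acsd_al_bw_c2_x_0_t"
    "?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=merchandised-search-11"
    "&pf_rd_r=SQHJ5TXNCSRE9PRP4QAG&pf_rd_t=101"
    "&pf_rd_p=eb3feabb-ea62-4002-b39e-5ccc29f387ba&pf_rd_i=14100223031"
)
print(clean_amazon_link(long_link))  # https://amazon.co.uk/dp/B0791RGQW3
```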
Conclusion
Opaque URLs like youtube’s are good in that they don’t contain any information that might need to change. For instance, if you have a deadname in a URL, perhaps as a github username, that’s a problem. At best, you’re able to change it, and have hard redirects to the new name, but there will still be old links floating around with the old information.
On the other hand, this advantage of not keeping any information has the issue that… the URL provides no information.
A URL doesn’t need multiple opaque identifiers, though. If you are going to have them, either make use of the hierarchy to shorten the URL, or leave it as a direct link to the object and cut out the hierarchy.
Informational text that doesn’t help resolve the URL can be useful, but it should be obvious to a user what’s informational text, so they can strip it out.
Any URL components that are not human readable and that don’t help resolve the page can, and should, just be removed. Or at the very least, if you do feel the need to track users, use a smaller identifier than what amazon uses, and name it something obvious, like ?tracking.
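For links you share yourself, one way to do this is a per-site allowlist of query parameters: keep the ones that resolve the page and drop everything else. Here is a Python sketch using only urllib.parse. The tracking parameter in the example is hypothetical and borrows the name suggested above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_query(url: str, keep: set[str]) -> str:
    """Drop every query parameter whose name isn't in `keep`.

    `keep` is a per-site allowlist of parameters that actually help
    resolve the page, e.g. {"v", "t"} for a youtube watch URL.
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k in keep]
    # An empty query makes urlunsplit leave out the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_query("https://www.youtube.com/watch?v=dQw4w9WgXcQ&tracking=abc123&t=42", {"v", "t"}))
# https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42
```

The sketch uses an allowlist rather than a denylist because tracking parameter names differ from site to site, so it's easier to list the few that matter. Note that urlencode may re-encode the surviving values slightly differently from the original.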
If you’re going to have an identifier that’s intended to be opaque, it should be completely opaque. Don’t put a timestamp or any other structured data in it. An exception is something like an issue number, because that number is known to users and is reasonably expected to be public. But seeing a URL should not let anyone other than the website learn anything about the user who created it.
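One way to get a fully opaque identifier is to generate it from random bytes rather than from a counter or a clock. Here is a Python sketch using the standard library's secrets module. The 8-byte default is an arbitrary choice, and a real site would still need to check new ids for collisions before handing them out.

```python
import secrets

def new_opaque_id(nbytes: int = 8) -> str:
    """Return a random, URL-safe id that encodes nothing about when or by whom it was made."""
    # token_urlsafe reads nbytes from the OS's secure random source and
    # base64url-encodes them without padding, so the result only contains
    # A-Z, a-z, 0-9, "-" and "_". 8 bytes gives 11 characters.
    return secrets.token_urlsafe(nbytes)

print(new_opaque_id())  # 11 random characters, different on every run
```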
If you have anything after a hash, it had better take you to a specific part of the page, and the URL without the hash should still be valid.
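Browsers never send the fragment to the server, so the server only ever sees the part before the hash. Here is a Python sketch that checks this rule using urllib.parse.urldefrag. The PAGES set is a hypothetical stand-in for whatever pages a site can actually serve.

```python
from urllib.parse import urldefrag

# Hypothetical: the set of pages this site can actually serve.
PAGES = {
    "https://doc.rust-lang.org/std/io/trait.Read.html",
}

def fragment_is_well_behaved(url: str) -> bool:
    """True if the URL still resolves once everything after '#' is removed."""
    page, _fragment = urldefrag(url)
    return page in PAGES

print(fragment_is_well_behaved(
    "https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end"))  # True
```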