Recently I decided it would be fun to write a URL shortener: something I can run on my own domain to share short URLs that redirect to longer ones. (If you want to skip the blah-bidy-blah, here is the code.)
My requirements were fairly straightforward:
- Generate unique keys that are as short as possible.
- Simple web interface to shorten a URL.
- Redirect requests for generated keys to the corresponding URL.
- Secure against external abuse.
- Build the simplest thing that works.
While I generally prefer Python, setting it up on a web server is something of a bother. My server already supports PHP, so I created a one-letter subdomain on bagaag.com and started with an empty PHP file. I coded in Visual Studio Code connected to the server via SSH. ♫ It’s my server and I’ll code on production if I want to! ♫
The first requirement was the most interesting to me. I’ve often wondered about the algorithm that sites like tinyurl.com use. I thought about the problem, what kind of keys I’d like to see, and came up with this algorithm:
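Given an ordered set of unique characters, step through them in order at the last position of the key. When a position runs out of characters, wrap it around to the first character and carry to the position on its left, adding a new character at the front once every position has wrapped.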
I implemented this in a function that takes as input any string the algorithm might generate and returns the next string in the sequence.
// Unique set of characters for generating keys
$charset = 'acr8mqbs7di4tuh6e9fjv2gwx0nlk5poy1z3';
// Provides the next key by incrementing the previous key.
// Start with empty string and then feed the return value
// back into this function to get the next key, and repeat.
function increment_string($s) {
    global $charset;
    // convert the charset to an array of characters
    // (str_to_chars is a helper that is not shown in this post)
    $chars = str_to_chars($charset);
    // if the string is empty, return the first character
    if ($s === '') {
        return $chars[0];
    }
    // convert the input string to an array of characters
    $string = str_to_chars($s);
    // iterate over the input string characters from end to beginning
    for ($index = count($string) - 1; $index >= 0; $index--) {
        // find and validate the position of the character in the charset
        $char = $string[$index];
        $pos = array_search($char, $chars);
        if ($pos === false) {
            // fail loudly instead of returning the message as if it were a key
            throw new InvalidArgumentException("Character not in set: $char");
        }
        // if the character is not the last in the charset, increment
        // it and break
        if ($pos < count($chars) - 1) {
            $string[$index] = $chars[$pos + 1];
            break;
        }
        // character is the last in the charset; wrap around to the
        // first character
        $string[$index] = $chars[0];
        // if we are at the beginning of the string, prepend the first
        // character and break
        if ($index === 0) {
            array_unshift($string, $chars[0]);
            break;
        }
    }
    return join('', $string);
}
I like this algorithm because it can be easily customized in terms of the characters it uses in the keys. The value for $charset came from randomizing “abcdefghijklmnopqrstuvwxyz0123456789”. I did that so the keys look meaningless to the casual observer. If you change $charset to 'abc' the sequence of keys looks like this: a b c aa ab ac ba bb bc ca cb cc aaa aab aac … So by giving it a whole bunch of characters to choose from and randomizing them, you get seemingly meaningless and unique keys that are as short as possible.
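To see that sequence for yourself, here is a small sketch. It assumes a compact variant of the incrementing function that takes the character set as a parameter instead of reading the global. The names next_in_sequence and first_keys are made up for this example and are not part of the script.

```php
<?php
// Hypothetical compact variant of increment_string() that takes the
// character set as an argument, so different sets are easy to try.
function next_in_sequence(string $s, string $charset): string {
    $last = strlen($charset) - 1;
    // walk from the end of the key toward the beginning
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $pos = strpos($charset, $s[$i]);
        if ($pos === false) {
            throw new InvalidArgumentException("Character not in set: {$s[$i]}");
        }
        if ($pos < $last) {
            // room to increment at this position: bump it and stop
            $s[$i] = $charset[$pos + 1];
            return $s;
        }
        // this position wraps around; carry to the position on its left
        $s[$i] = $charset[0];
    }
    // every position wrapped (or the key was empty): grow by one character
    return $charset[0] . $s;
}

// Returns the first $count keys for a character set.
function first_keys(int $count, string $charset): array {
    $keys = [];
    $key = '';
    for ($n = 0; $n < $count; $n++) {
        $key = next_in_sequence($key, $charset);
        $keys[] = $key;
    }
    return $keys;
}

echo implode(' ', first_keys(15, 'abc')), "\n";
// prints: a b c aa ab ac ba bb bc ca cb cc aaa aab aac

// One way to produce a randomized character set (output differs on every run)
echo str_shuffle('abcdefghijklmnopqrstuvwxyz0123456789'), "\n";
```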
I could have included capital letters and greatly increased the key capacity (62 possible characters per position instead of 36), but mixed-case keys look messy to me and I don’t need the capacity. Without them, I’m only waist deep in four-character keys at the one million mark. I’m not building a SaaS product; this is just for personal use.
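The arithmetic behind that claim is easy to check. The sketch below counts how many keys exist up to a given length for a 36-character and a 62-character set. The function names are only for illustration.

```php
<?php
// Number of distinct keys of length 1..$max_len for a charset of $size characters.
function keys_up_to_length(int $size, int $max_len): int {
    $total = 0;
    for ($len = 1; $len <= $max_len; $len++) {
        $total += $size ** $len;
    }
    return $total;
}

// Length of the $n-th key in the sequence (1-based).
function key_length_at(int $n, int $size): int {
    $len = 1;
    while (keys_up_to_length($size, $len) < $n) {
        $len++;
    }
    return $len;
}

echo keys_up_to_length(36, 3), "\n";   // 47988 keys of three characters or fewer
echo keys_up_to_length(36, 4), "\n";   // 1727604 keys of four characters or fewer
echo keys_up_to_length(62, 4), "\n";   // 15018570 with capital letters added
echo key_length_at(1000000, 36), "\n"; // 4
```

So the one-millionth key lands a bit more than halfway through the four-character range.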
After writing this, I did some research on how this is generally done to see how far off I was. A base-62 encoder seems like the obvious path among the various implementations I found. Instead of math, I used fairly efficient string manipulation to achieve a similar result.
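For comparison, here is roughly what the math-based route might look like. One wrinkle: the sequence above has no "zero" character (c is followed by aa, and a never acts as a leading zero), so it corresponds to what is usually called bijective numeration rather than plain base-N encoding. The function name key_from_index is an assumption for this sketch, not something from the script.

```php
<?php
// Converts a 1-based counter into the key at that position in the sequence.
// Sketch only: uses bijective base-N, where N is the charset length.
function key_from_index(int $n, string $charset): string {
    if ($n < 1) {
        throw new InvalidArgumentException('Index must be 1 or greater');
    }
    $base = strlen($charset);
    $key = '';
    while ($n > 0) {
        $n--;                                // shift to 0-based for this position
        $key = $charset[$n % $base] . $key;  // rightmost character is produced first
        $n = intdiv($n, $base);
    }
    return $key;
}

echo key_from_index(1, 'abc'), "\n";   // a
echo key_from_index(4, 'abc'), "\n";   // aa
echo key_from_index(13, 'abc'), "\n";  // aaa
```

With this approach you would store a counter instead of the last key. The string-incrementing version skips the conversion entirely.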
With the fun part over, I whipped up storage for the last key generated and a mapping of keys to URLs. Again, simple as possible. Even SQLite would be massive overkill here.
$last_file = 'last.txt'; // file to store the last incremented key
$urls_file = 'urls.txt'; // file to store the URLs with their keys
// Returns the contents of $last_file, or empty string if the file
// does not exist
function get_last() {
    global $last_file;
    if (file_exists($last_file)) {
        return trim(file_get_contents($last_file));
    }
    return '';
}
// Writes to $last_file
function set_last($s) {
    global $last_file;
    file_put_contents($last_file, $s);
}
// Reads the last key, increments it, saves it, and returns it
function next_key() {
    $last = get_last();
    $next = increment_string($last);
    set_last($next);
    return $next;
}
// Adds a URL to urls.txt and returns the generated key
function add_url($url) {
    global $urls_file;
    $key = next_key();
    // append the key and url to urls.txt
    file_put_contents($urls_file, $key . ' ' . $url . "\n", FILE_APPEND);
    return $key;
}
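// Looks up a key in urls.txt and returns the corresponding URL,
// or null if not found
function get_url($key) {
    global $urls_file;
    // nothing has been shortened yet, so there is nothing to look up
    if (!file_exists($urls_file)) {
        return null;
    }
    $lines = file($urls_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($lines as $line) {
        // each line is "<key> <url>"; skip anything malformed
        $parts = explode(' ', $line, 2);
        if (count($parts) === 2 && $parts[0] === $key) {
            return $parts[1];
        }
    }
    return null;
}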
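One thing the simple file approach does not handle is two requests arriving at the same moment: both could read the same last key and hand out the same new one. For personal use that is unlikely to matter, but if it ever does, an exclusive lock around the read-increment-write step is one way to close the gap. This is a sketch, not part of the script. next_key_locked and its parameters are hypothetical, and the incrementing function is passed in so the example stands alone.

```php
<?php
// Hypothetical locked version of next_key(): reads the last key, increments
// it with $increment, and writes it back while holding an exclusive lock.
function next_key_locked(string $file, callable $increment): string {
    // 'c+' opens for reading and writing, creating the file if needed,
    // without truncating it
    $fh = fopen($file, 'c+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $file");
        }
        $last = trim((string) stream_get_contents($fh));
        $next = $increment($last);
        // replace the old contents with the new key
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, $next);
        fflush($fh);
        flock($fh, LOCK_UN);
        return $next;
    } finally {
        // closing the handle also releases the lock if an exception was thrown
        fclose($fh);
    }
}
```

In the script, this could stand in for next_key() with a call along the lines of next_key_locked($last_file, 'increment_string').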
For security, I started by limiting the domains that can be shortened to a defined list. Then I decided I’d rather just have that be unlimited for my personal use and added a hard-coded password. You can use either or both of these options by tweaking the configuration at the top.
// list of allowed domains for URL shortening
$allowed_domains = [];
// password to shorten a URL, leave empty to disable password protection
$password = 'open sesame';
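How those two settings get enforced is not fully shown in this post (the request handler calls a test_host function for the domain check). As a rough sketch of how such checks might look, with hypothetical function names and the assumptions that an empty list means "allow everything" and that subdomains of an allowed domain are allowed too:

```php
<?php
// Hypothetical domain check. Assumptions: an empty list allows every host,
// and "example.com" in the list also allows "www.example.com".
// str_ends_with() requires PHP 8.0 or later.
function host_is_allowed(string $host, array $allowed_domains): bool {
    if ($allowed_domains === []) {
        return true;
    }
    $host = strtolower($host);
    foreach ($allowed_domains as $domain) {
        $domain = strtolower($domain);
        if ($host === $domain || str_ends_with($host, '.' . $domain)) {
            return true;
        }
    }
    return false;
}

// Hypothetical password check. hash_equals() compares in constant time,
// which avoids leaking how much of a guess matched.
function password_is_valid(?string $input, string $password): bool {
    if ($password === '') {
        return true; // password protection disabled
    }
    return $input !== null && hash_equals($password, $input);
}
```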
With all the necessary functions out of the way, I focused on the HTTP request handling to tie them all together. get_input just looks for a variable in either $_GET or $_POST.
// read input parameters
$url = get_input('url');
$input_password = get_input('password');
$key = get_input('key');
// If a URL is provided, attempt to shorten it
if ($url) {
    // if a password is set, check it
    if ($password !== '' && $input_password !== $password) {
        http_response_code(401); // Unauthorized
        echo "Unauthorized: Incorrect password.\n";
        exit;
    }
    // return 422 if the URL is longer than max_url_length
    if (strlen($url) > $max_url_length) {
        http_response_code(422); // Unprocessable Entity
        echo "Invalid URL.\n";
        exit;
    }
    // return 403 if the URL's domain is not in the allowed list
    $parsed_url = parse_url($url);
    if ($parsed_url === false ||
        !isset($parsed_url['host']) ||
        !test_host($parsed_url['host'])) {
        http_response_code(403);
        echo "Forbidden: Domain not allowed.\n";
        exit;
    }
    // validate the URL and shorten it
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        $key = add_url($url);
        $shortened_url = "https://$_SERVER[HTTP_HOST]/$key";
        echo "$shortened_url\n";
    } else {
        http_response_code(422); // Unprocessable Entity
        echo "Invalid URL.\n";
    }
    exit;
}
// Otherwise, assume the request URI is a key to look up
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$key = trim($path, '/');
// If the key is valid, look up the URL and redirect
if ($key !== '') {
    $url = get_url($key);
    if ($url !== null) {
        header("Location: $url", true, 301);
        exit;
    } else {
        http_response_code(404);
        echo "Not Found.\n";
        exit;
    }
}
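Two details of the handler are worth a sketch. First, get_input is not shown above; a version that matches the description might look like the function below (checking $_POST before $_GET is an assumption). Second, FILTER_VALIDATE_URL does not restrict the scheme, so if you only want to redirect to web pages, an explicit http/https check is one option. Both function names here are hypothetical.

```php
<?php
// Hypothetical input helper: returns a request variable from POST or GET,
// or null if it is missing. The lookup order is an assumption.
function read_input(string $name): ?string {
    $value = $_POST[$name] ?? $_GET[$name] ?? null;
    return is_string($value) ? $value : null;
}

// Hypothetical extra check: accept only http and https URLs.
function is_http_url(string $url): bool {
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}
```

A check like is_http_url($url) could sit next to the existing filter_var call, so that schemes such as ftp are rejected before a key is generated.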
And finally, the bare minimum HTML to make it usable.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Bagaag.com Micro URL Shortener</title>
</head>
<body>
    <h1>Bagaag.com Micro URL Shortener</h1>
    <form method="post" action="">
        <label for="url">Enter URL to shorten:</label><br>
        <input type="text" id="url" name="url" size="50" required><br><br>
        <?php if ($password !== ''): ?>
        <label for="password">Enter the password:</label><br>
        <input type="password" id="password" name="password" size="20" required><br><br>
        <?php endif; ?>
        <input type="submit" value="Shorten">
    </form>
</body>
</html>
Anyway, I had fun with this. It’s up and running for me at r.bagaag.com. Here is a shortened link to see it working: https://r.bagaag.com/j
To run it yourself, just host it on a domain of your choosing, like r.yourdomain.com. Make sure PHP has write access to the files named at the top. I recommend placing the text files outside the site root or configuring your web server to block requests to them.
If you just want to test it locally, use the terminal to run it like this from the directory that contains index.php:
$ php -S localhost:3000
Then browse to http://localhost:3000. You’ll need to change “https” to “http” in the script if you’re not going to use HTTPS.