March 11, 2023

When we think of a blog, we typically think of the content – but in practice, we also have to worry about the taxonomies. Tags and Categories, oh my! These are hard enough to manage with a live system like WordPress; how do we do it with a static system like Hugo, which doesn’t come with any tools to help?

With a little clever hackery, that’s how…

The Problem

Taxonomies are a fun and extensive topic. Most often we see them as tags and categories. In reality, they could be anything, but that’s beyond the scope of this post. For now, we’re only going to worry about the basics; you can easily apply this to anything else you’d like.

The base issue is that you don’t want your taxonomy to be too sparse. There are a few examples of that already on this blog: the Networking tag, for one, has all of one post in it as of this writing. Kinda useless, right? That’s more acceptable for tags than it is for categories; to my thinking, tags are more of a search aid, whereas categories are a navigational aid and thus need to be more robust.

Your opinion and/or strategy may be different, but that’s neither here nor there; the issue still applies.

How in the name of all that is good and Holy do I keep track of what my tags and categories are, and which to apply to a given post?

The more extensive your taxonomies are, the harder this gets…

The Solution

With something like Hugo, where you’re maintaining a blob of static files, there isn’t really an easy solution. Nothing is going to pop up while you’re writing and offer you a list of suggestions. You’re reliant on your memory, and the more you write, the more you’ll remember what to tag things with and how to categorize them.

But wouldn’t it be nice if you had a reference? And what happens if you get it wrong?

Hugo is free-form. You can simply add whatever term you want to your document, and it’ll happily add it to the specified taxonomy. This is a very good way to end up with sparse taxonomies – or in some cases, misspelled taxonomies due to typos. That would be extremely frustrating, wouldn’t it?

Thankfully, it’s simple enough to teach Hugo what the valid terms are for each taxonomy, and have it fail the build if the specified terms don’t match…

The Database

First things first, we need a list of what’s valid. Fortunately, Hugo offers a way to pull data in from YAML, JSON, or TOML files; simply stick them in the data/ folder of your site, and it’ll add them to the .Site.Data object, keyed by filename (data/foo.yaml becomes .Site.Data.foo). From there you can get at all that yummy data with ease.

So for this purpose, we create a data file ("taxlimits.yaml" in the case of floating.io) and populate it with the terms we’re willing to have show up in our taxonomies:

---
tags:
  - '3.14159'
  - "C++"
  - "C++11"
  - Business
# ...

categories:
  - Apple
  - Blogging
  - Code
# ...

And so on and so forth. You can have as many taxonomies represented here as you like; it’s up to you. In my case, I’ve also added another feature, for “deprecated” terms: a separate deprecated key in the same file, with its own list for each taxonomy. This was an artifact of the WordPress migration; even the original data had a few misspellings and whatnot, and flagging them as deprecated helped greatly in finding them – and reminding myself to go fix them!

Now I have a nice reference list for all of my various terms if I want to figure out what to tag a post with, or how to categorize it. Makes life much easier for my failing memory.
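If opening the YAML file every time gets old, the same data can be rendered as a page. What follows is only a minimal sketch, assuming the data/taxlimits.yaml layout shown above. The partial name (taxonomy-reference.html) is made up for illustration and is not part of the actual site:

```go-html-template
{{- /* Sketch only: print the allowed terms from data/taxlimits.yaml.       */ -}}
{{- /* The partial name (taxonomy-reference.html) is hypothetical.          */ -}}
<h2>Tags</h2>
<ul>
{{- range site.Data.taxlimits.tags }}
  <li>{{ . }}</li>
{{- end }}
</ul>
<h2>Categories</h2>
<ul>
{{- range site.Data.taxlimits.categories }}
  <li>{{ . }}</li>
{{- end }}
</ul>
```

Call it from whatever page you like, for example an unlisted "cheat sheet" page, and the reference stays in sync with the data file automatically.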

But how do I prevent errors?

Enforcing Your Desires

Ensuring that everything falls in those lists is easy enough. Hugo offers two functions, warnf and errorf, that let you log non-fatal warnings and fatal errors, respectively, during the build process. The latter is the more important of the two: any error logged this way causes the build as a whole to fail, so we can effectively stop the build if something happens that we don’t like.

To that end, I added a partial called check-legal-taxonomy.html that looks like this:

{{- /* Validate our tags and categories to make sure we don't accidentally  */ -}}
{{- /* create a new one just by typing.  This is a guardrail to keep things */ -}}
{{- /* at least vaguely sane as far as the taxonomies go.                   */ -}}
{{- /*                                                                      */ -}}
{{- /* You can add new valid tags and categories in [data/taxlimits.yaml].  */ -}}

{{- range .GetTerms "tags" -}}
{{-   if not (in .Site.Data.taxlimits.tags .LinkTitle) -}}
{{-     errorf "ERROR: Invalid Tag [%s]. (%s)" .LinkTitle $.File.Path -}}
{{-   end -}}
{{-   if in .Site.Data.taxlimits.deprecated.tags .LinkTitle -}}
{{-     warnf "WARNING: Tag [%s] is deprecated.  (%s)" .LinkTitle $.File.Path -}}
{{-   end -}}
{{- end -}}

{{- range .GetTerms "categories" -}}
{{-   if not (in .Site.Data.taxlimits.categories .LinkTitle) -}}
{{-     errorf "ERROR: Invalid Category \"%s\". [%s]" .LinkTitle $.File.Path -}}
{{-   end -}}
{{-   if in .Site.Data.taxlimits.deprecated.categories .LinkTitle -}}
{{-     warnf "WARNING: Category [%s] is deprecated.  (%s)" .LinkTitle $.File.Path -}}
{{-   end -}}
{{- end -}}

{{- range .GetTerms "flags" -}}
{{-   if not (in .Site.Data.taxlimits.flags .LinkTitle) -}}
{{-     errorf "ERROR: Invalid Flag \"%s\". [%s]" .LinkTitle $.File.Path -}}
{{-   end -}}
{{-   if in .Site.Data.taxlimits.deprecated.flags .LinkTitle -}}
{{-     warnf "WARNING: Flag [%s] is deprecated.  (%s)" .LinkTitle $.File.Path -}}
{{-   end -}}
{{- end -}}

The purpose should be fairly obvious: check against our little database, and warn or error as appropriate. Then I can simply call this partial from _default/baseof.html, or any other base template:

<!DOCTYPE html>
<html {{ with site.LanguageCode | default site.Language.Lang }}lang="{{ . }}"{{ end }}>
  {{- partial "check-legal-taxonomy" . }}
  <head>
  ...

Why I didn’t make it the first line, I don’t recall. Doesn’t matter, in the end. As long as it gets executed, it will check the taxonomies of the page being generated, and that’s all we care about. Now, if I haven’t explicitly allowed a term into a protected taxonomy, the build will outright fail:

 GTU:LAB  sdoyle@dev-01:~/repo/floating.io/floating.io$ hugo -v
Start building sites … 
hugo v0.111.2-4164f8fef9d71f50ef3962897e319ab6219a1dad+extended linux/amd64 BuildDate=2023-03-05T12:32:20Z VendorInfo=gohugoio
INFO 2023/03/10 10:06:35 syncing static files to /
ERROR 2023/03/10 10:06:40 ERROR: Invalid Category "BadCat". [blog/2023/03-taxonomic-organization-with-hugo.md]
Error: Error building site: logged 1 error(s)
Total in 4852 ms

It will even exit with a non-zero exit code, so your build scripts can detect the failure and bail out. The warnings show up similarly, but don’t cancel the build, and don’t affect the exit code.
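The three range blocks in the partial differ only in the taxonomy name, so they could probably be folded into a single loop. This is a sketch rather than what actually runs on floating.io. It assumes the same data/taxlimits.yaml layout (including the optional deprecated key), and the $page, $limits, $taxonomy, $allowed, and $deprecated variables are names I've made up here:

```go-html-template
{{- /* Sketch: one loop over every protected taxonomy, instead of one block each. */ -}}
{{- $page := . -}}
{{- $limits := site.Data.taxlimits -}}
{{- range $taxonomy := slice "tags" "categories" "flags" -}}
{{-   $allowed := index $limits $taxonomy -}}
{{-   /* The deprecated section is optional; start with an empty list. */ -}}
{{-   $deprecated := slice -}}
{{-   with $limits.deprecated -}}
{{-     $deprecated = index . $taxonomy -}}
{{-   end -}}
{{-   range $page.GetTerms $taxonomy -}}
{{-     if not (in $allowed .LinkTitle) -}}
{{-       errorf "ERROR: Invalid term %q in taxonomy %q. [%s]" .LinkTitle $taxonomy $page.File.Path -}}
{{-     end -}}
{{-     if in $deprecated .LinkTitle -}}
{{-       warnf "WARNING: Term %q in taxonomy %q is deprecated. (%s)" .LinkTitle $taxonomy $page.File.Path -}}
{{-     end -}}
{{-   end -}}
{{- end -}}
```

Adding a new protected taxonomy then means adding one name to the slice and one list to the data file. Note that the log messages are worded differently from the originals, so the build output would not match the sample above exactly.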

Conclusion

This was one of the more insidious issues I faced when moving to Hugo; I have a hard enough time just remembering how I’ve organized things. It was inevitable that I would make a mess of it if I didn’t put some sort of guard in place. This technique makes it fairly easy to ensure that invalid (or typoed) terms don’t end up where they shouldn’t be.

The only downside?

I still have to remember which terms I can use, which means frequent references to my taxonomy database file. My memory? Yeah, it apparently sucks.