This is a brief overview of how I set up multiple blogs under one Jekyll install. It took longer than I expected, so maybe this can save someone some pain.

Goals

I like using markdown with latex in Jupyter notebooks, so I wanted to build a website based on those formats. These formats are also portable, and should make migration to other platforms straightforward in the future.

I wanted to use a static site generator for cheap and easy serving. This also makes it possible to develop a site locally (eg. on an airplane), which is awkward with something like WordPress.

In the past, I’ve used a WordPress instance per project. This resulted in multiple sites with separate themes / maintenance, in addition to per-site fees.

To avoid accumulating separate sites, or awkwardly combining everything into one feed, I wanted to be able to add new blogs and sections over time without each one requiring its own website.

Why Jekyll?

Jekyll is a well-supported static site generator, with hosting support on GitHub Pages, AWS Amplify Console, and many other vendors.

Getting LaTeX working

One of my goals was to have a place to write about machine learning, which requires a robust way to render equations. Here’s some nonsense I wrote for an ML blog post, for example:

Ian Goodfellow has a great post covering the basic mechanics for adding LaTeX support. (I also recommend his Deep Learning textbook.)

In summary, copy your theme’s _includes/head.html into your local project, and edit it to add the JavaScript includes you need.

You also need a markdown parser configured to preserve latex. Jekyll defaults to kramdown, with its math engine set to MathJax. This prevents markdown rendering and liquid template rendering from interfering with latex. The latex is preserved in MathJax <script> tags.
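For reference, those defaults correspond roughly to the _config.yml settings below. This is a sketch rather than something you must add: Jekyll should already behave this way unless your theme or an older config overrides it, and the exact markup kramdown emits may vary with its version, so it's worth checking the generated HTML.

```yaml
# _config.yml (excerpt)
# These mirror Jekyll's defaults as I understand them; set them explicitly
# only if something else in your config has changed them.
markdown: kramdown
kramdown:
  math_engine: mathjax   # hand math blocks to kramdown's MathJax engine
```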

I considered pre-rendering latex to images, but using JS rendering is likely faster, since each device only needs to load the rendering JS once. JS rendering also preserves the information needed for accessible rendering.

Switching to KaTeX

I was concerned about using JS rendering over slow connections (for downloading latex assets, etc), so I wanted to try KaTeX, which is said to be smaller and faster.

Look at the KaTeX docs to see how to load KaTeX assets from their CDN.

There’s one main difference between a Jekyll KaTeX install and a normal KaTeX install: your markdown parser has already annotated the latex and pulled it into MathJax <script> elements.

This means that you do not need to find LaTeX in text elements, but that you do have to render latex in MathJax <script> tags.

To do so, the files you need are:

  • General CSS: katex.min.css
  • General JS: katex.min.js
  • MathJax script renderer: contrib/mathtex-script-type.min.js. (See docs.)

The last file is what renders the MathJax script blocks.

Because the latex is already in MathJax <script> blocks, you do not need auto-render.min.js. Also, unlike auto-render, neither of these JS files needs you to call anything on document load.

Meaningfully isolate blogs

Jekyll has some functionality to group posts into categories, and to add tags to posts. But there are some missing pieces:

  • I wanted the blogs to share one design, but have separate names, descriptions, navigation bars, etc.
    • For example, clicking “About” from a computer vision blog should present computer vision relevant information.
  • I wanted each blog to have eg. a list of tags that are relevant to that blog’s posts, so that navigating from post -> tag -> post doesn’t bounce between blogs.
  • Listing posts within a blog requires some work. There are both plugins and framework abstractions for this, but both require customization to use appropriate blog-specific headers.

This is a big topic, and I’ll only be able to sketch the basic design here.

Note: If you plan on implementing multiple blogs, I recommend reading all the options before starting. Some approaches have significant drawbacks that I only discovered after trying them. There’s a reason I wound up abandoning most of these.

For any of these (except the multiple-install route), begin by adding all _layouts and nearly all _includes files from your theme to your repository.

Our goal here is to change the site title, description, navigation bar, etc. for different logical sites within our Jekyll build. This requires making small changes to many of these files, and it’s easier to add them all at once. Committing unchanged versions first makes it easier to track what you’ve changed.

Version one: Path-scoped page-level settings

While not well documented, Jekyll allows you to have multiple directories of posts within your repository. For example, you can add tech posts to tech/_posts/, and the directory structure before _posts will be used as default categories for those posts.

Jekyll allows you to set different (page-level / “frontmatter”) settings based on a path prefix. If we use this, then change all the settings we care about in _layouts and _includes to use page-level settings instead of site-level settings, we’d almost solve this.

Here’s an example of what that would look like, using the site title and navigation items from header.html as an example:

{%- assign default_paths = site.pages | map: "path" -%}
{%- assign page_paths = page.site_nav | default: site.header_pages | default: default_paths -%}
<a class="site-title" rel="author" href="{{ page.site_url | default:"/" | relative_url }}">
  {{ page.site_title | default: site.title | escape }}
</a>

What I changed here was adding the page-level site settings: page.site_nav, page.site_url and page.site_title. A similar change is added to footer.html to use page.site_description.

Then I can populate these settings for all pages within the tech/ prefix:

defaults:
  -
    scope:
      path: "tech"
    values:
      site_description: >-
        A long description of a tech blog.
        More text, etc.
      site_title: A tech blog
      site_url: /tech/
      site_nav:
        - title: About
          url: /tech/about/

So, are we done?

Sadly, no.

There’s a problem here: later on (hand waving a bit), we’ll want to use collections, one of the main building blocks Jekyll provides.

The problem is: we generally want to have similar functionality on each blog, which implies having collection types that span sites.

As far as I know, collection types cannot have defaults assigned based on path, even if they have a matching permalink path. (If you know of a way to do this, get in touch.)

We could add overrides in each collection record frontmatter, but that’s a lot of copies of metadata to keep in sync.

The solution I found brings us to version two.

Version two: Store site configuration in yaml data files

For each site, add a yaml configuration file in _data/sites/, eg. tech.yaml and home.yaml.

Have a single path-scoped setting in defaults: eg. site: tech, then have templates load metadata from those data files:

    {%- if page.site -%}
      {%- assign site_cfg = site.data.sites[page.site] -%}
      {%- assign page_paths = site_cfg.site_nav -%}
    {%- else -%}
      {%- assign site_cfg = nil -%}
      {%- assign default_paths = site.pages | map: "path" -%}
      {%- assign page_paths = site.header_pages | default: default_paths -%}
    {%- endif -%}
    <a class="site-title" rel="author"
     href="{{ site_cfg.site_url | default:"/" | relative_url }}">
     {{ site_cfg.site_title | default: site.title | escape }}
    </a>

Here, we load the site’s configuration from eg. _data/sites/tech.yaml, and use the site_nav, site_title and site_url values it defines.

Now, when we encounter collection elements that can’t be auto-assigned to a site based on path, we can simply set a single site: tech setting, and pick up that site’s settings.
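To make this concrete, here is a minimal sketch of the two pieces involved, reusing the tech values from version one. The key names are the ones the template above reads from site_cfg; the two YAML documents below are separate files, shown together only for brevity.

```yaml
# _data/sites/tech.yaml
# Per-site settings, looked up via site.data.sites[page.site].
site_title: A tech blog
site_url: /tech/
site_description: >-
  A long description of a tech blog.
  More text, etc.
site_nav:
  - title: About
    url: /tech/about/
---
# _config.yml (excerpt)
# A single path-scoped default that points pages under tech/ at the data file.
defaults:
  -
    scope:
      path: "tech"
    values:
      site: tech
```

A home.yaml in the same directory would follow the same shape with its own values.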

Alternative: Multiple Jekyll installs on one domain

A simpler solution might be to have two Jekyll blogs on the same domain that essentially don’t know about each other. It may incur a maintenance cost, though, since each change has to be made in two places to keep the sites’ functionality in sync.

Note that this does not work with GitHub Pages (at least, not nicely). [1] This does, however, work fine serving from AWS Amplify Console.

I’ll sketch how to do this below. I’ll use home as the top-level site, and tech as the sub-site within the /tech/ path.

  • Create two sites in a git repository. One in subdir home/, and the other in subdir tech/.
    • Keep a single Gemfile and Gemfile.lock with all dependencies in the repository root.
  • Configure tech/ to build to ../_site/tech:
    baseurl: "/tech"
    destination: "../_site/tech"
    
  • Configure home/ to build to ../_site, without killing tech/:
    destination: "../_site/"
    keep_files: ["tech/"]
    
  • Add a build.sh script that builds both (calling eg. bundle exec jekyll build in both paths)

Adding navigation items from home to the tech blog is slightly awkward. You can manually configure navigation, as I did for the joint site; this is probably what you’ll eventually want to do.

But a quick fix is to add an empty link page at /link/tech/ for the navigation link, and use the redirect_to option of the jekyll-redirect-from plugin to redirect to the target URL:

---
layout: page
title: Tech blog
permalink: /link/tech/
redirect_to: /tech/
---

This technique also works to add navigation links to remote sites.
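The redirect_to key only takes effect once the plugin is enabled. A minimal sketch, assuming the jekyll-redirect-from gem is also listed in your Gemfile and that your Jekyll version reads the plugins key (older versions used gems instead):

```yaml
# _config.yml (excerpt)
# Enables the plugin that interprets redirect_to in page front matter.
plugins:
  - jekyll-redirect-from
```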

You can use bundle exec jekyll serve in either directory to preview one of the sites. To preview the combined site, you can build the project, then run (cd ./_site && python3 -m http.server).

Serving a multiple-install Jekyll site from AWS Amplify Console

First, AWS Amplify Console works great for Jekyll sites. If you have your own domain, and are willing to serve its DNS via Route53, it’s really straightforward, and they’ll set up SSL certs and everything.

GitHub Pages is also really great, but it will be awkward with multiple Jekyll sites, and I didn’t attempt it.

I initially set up AWS Amplify Console using a single-Jekyll install repository. It detected that it was a Jekyll install and set up a default build that worked out of the box.

Once it was working, I added a (trivial) build.sh script. I then downloaded Amplify Console’s build settings, and added them to my repository as amplify.yml. (If you stay with one repo at this point, you’ll want to exclude build.sh and amplify.yml from your build, although they are not particularly sensitive.)
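Excluding those two files might look like the sketch below. One caveat I'm not certain applies to every setup: depending on your Jekyll version, a custom exclude list may replace the built-in one rather than extend it, so check that files like Gemfile don't start showing up in _site.

```yaml
# _config.yml (excerpt)
# Keep build tooling out of the generated site.
exclude:
  - build.sh
  - amplify.yml
```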

Change amplify.yml to build using the build.sh script.

    build:
      commands:
        - ./build.sh

Then, as long as build.sh still builds the project, you can change the branch to have two Jekyll installs that build the combined site to _site, and everything will work.
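For context, a complete amplify.yml along these lines might look roughly like the following. Only the ./build.sh command comes from the setup described above; the remaining keys are the general shape of an Amplify build spec as I understand it, and are assumptions here. Compare against the file you download from the console rather than copying this as-is.

```yaml
# amplify.yml (sketch; everything except the build command is an assumption)
version: 1
frontend:
  phases:
    preBuild:
      commands:
        - bundle install        # install gems from the root Gemfile
    build:
      commands:
        - ./build.sh            # builds every Jekyll install into _site
  artifacts:
    baseDirectory: _site        # the combined output directory
    files:
      - '**/*'
  cache:
    paths: []
```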

Blog landing pages, category and tag views

Next, we want to create per-blog landing pages, as well as pages for categories and tags. Each of these pages will show a filtered list of posts. Category and tag pages will be themed according to the blog that they pertain to.

Jekyll has some support for viewing posts by category. The framework allows fetching posts by category, and there are plugins that add post-list pages per category. This is similar, but integrates with our notion of separate per-blog settings.

Configure a category collection

I used a custom category collection for this, similar to the approach described in this blog post.

I created a category collection in _config.yml with a custom layout:

collections:
  category:
    output: true
    label: "category"

defaults:
  -
    scope:
      path: ""
      type: category
    values:
      layout: "category"
      tag: null
  ...

… then added empty markdown files for each category and tag in _category, with frontmatter to configure a title, categories (the tag, for tags), a URL, and which site it belongs to:

---
title: "Jekyll"
categories: [tech]
tag: jekyll
permalink: "/tech/jekyll/"
site: tech
---

Set up a filtered post-list template

Add a _layouts/category.html layout for category elements.

We’ll reuse the logic on other pages that show filtered lists of posts (eg. top-level blog landing pages), so implement the logic in a new include, _includes/post-list.html, as follows:

{% comment %} This is slow, but it's a static site, so it's fine. {% endcomment %}
{%- assign filtered = site.posts -%}
{% for category in include.categories %}
    {% capture filter_exp %}post.categories contains '{{ category }}'{% endcapture %}
    {%- assign filtered = filtered | where_exp: "post", filter_exp -%}
{% endfor %}

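{% comment %} Narrow further by tag, using the same where_exp approach. {% endcomment %}
{%- if include.tag -%}
    {% capture filter_exp %}post.tags contains '{{ include.tag }}'{% endcapture %}
    {%- assign filtered = filtered | where_exp: "post", filter_exp -%}
{%- endif -%}

{% if filtered.size > 0 %}
<h2 class="post-list-heading">{{ include.title | default: "Posts" }}</h2>
<ul class="post-list">
{%- for post in filtered -%}
    {%- include post-list-entry.html -%}
{%- endfor -%}
</ul>
{% endif %}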

The fragment above starts with all posts defined in the site, then performs a linear scan for each category or tag present, filtering the list down [2]. Each result is rendered by a new _includes/post-list-entry.html include, which is straightforward.

At the end of posts, in _layouts/post.html, add links to categories and tags from this post that have configured _category files:

  <div>
    {% assign cats = page.categories | concat: page.tags %}
    {% for cat in site.category %}
      {% assign name = cat.label | default: cat.tag %}
      {% if cats contains name %}
        <a href="{{ cat.url | relative_url }}">#{{ name }}</a>
      {% endif %}
    {% endfor %}
  </div>

Again, this is inefficient, performing a linear scan over all site categories. [2]

Add per-blog landing pages

Finally, add per-blog landing pages that invoke the filtered post list logic (for example tech/index.markdown).

Configure the landing pages to set show_posts: true, and use the home layout.
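As a sketch, the front matter for tech/index.markdown might look like this. The title is a placeholder, and I'm assuming categories is set explicitly here, since the post list is filtered by the page's categories and a regular page does not pick them up from its path the way posts do.

```yaml
---
# tech/index.markdown
layout: home
title: Tech              # hypothetical title
show_posts: true         # opt in to the filtered post list
categories: [tech]       # which posts this landing page should list
---
```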

Then, edit home.html to include post-list.html when show_posts is set:

  {%- if page.show_posts -%}
  {%- include post-list.html categories=page.categories -%}
  {%- endif -%}

The top-level index.markdown will not have show_posts, so we avoid an awkward post list that mixes the blogs together.

List categories and tags on blog landing pages

We can also show tags and categories relevant to each blog on that blog’s landing page (also in home.html):

  {%- assign filtered = site.category -%}
  {%- for cat_name in page.categories -%}
    {%- capture filter_exp -%}cat.categories contains '{{ cat_name }}'{%- endcapture -%}
    {%- assign filtered = filtered | where_exp: "cat", filter_exp -%}
  {%- endfor -%}
  {%- if filtered.size > 0 -%}
    <h3>Categories</h3>
    <ul class="category-list">
    {%- for category in filtered -%}
      <li>
          <a class="category-link" href="{{ category.url | relative_url }}">
          {{ category.title | escape }}
          </a>
      </li>
    {%- endfor -%}
    </ul>
  {%- endif -%}

Customize CSS

The default Minima theme puts site CSS in /assets/main.css. To customize it, add an assets/main.scss file to your repository that includes the theme’s CSS and adds your customizations.

In particular, my install renders h1 elements smaller than h2 elements, inverting the hierarchy I’m trying to communicate.

---
---
@import "minima";

// Make h1 larger than h2
h1 { font-size: 37px; }

Serving your site

GitHub Pages

Since this is now a single Jekyll installation, we can serve the blog from GitHub Pages.

I was happy with GitHub Pages, and only explored alternatives when I was exploring a multiple-install site.

AWS Amplify Console

I was also quite happy with AWS Amplify Console. It lets you configure auxiliary versions on other domains, which I found quite useful for testing. Most small blogs should fit comfortably within the free tier.

Amplify Console is easier if you use AWS’s DNS product, Route53, to serve the domain you’re serving from. This requires familiarizing yourself with Route53 if you’re new to AWS. However, I think learning basic AWS services is a useful skill, so I’d suggest that this isn’t wasted effort. Route53 is fairly straightforward, and should feel familiar to anyone who has used other interfaces to manage DNS records.

I set up my root domain to serve from the master branch, and a subdomain to serve a staging version. Updates to either branch trigger the continuous build which publishes those updates.

Alternative: Serve from S3 directly

Under the hood, Amplify Console builds the site, copies the built assets to S3, and invalidates the CDN level cache.

You can also do this yourself, using the s3_website tool to copy to S3 and optionally invalidate CloudFront.

My read is that this might save on Amplify’s (already reasonable) bandwidth pricing. Additionally, you would not have to pay for building the site, although this is more or less negligible.

Things that Amplify handles for you that you’ll likely have to do yourself include:

  • Provision an S3 bucket
  • Configure CloudFront
  • Provision SSL certs

The differences are pretty negligible, so I’d pick based on what UX you prefer: a continuous build based on source control, or a command to deploy a locally-built version.

Conclusion

Jekyll provides a lot of useful building blocks to create a static site with multiple blogs in separate sections, and I’m happy with the result.

The default theme looks good out of the box (although there are some curious styling decisions – eg. how different header types are sized). The ecosystem is mature, with a lot of good serving options and blog posts explaining how to use the framework.

That said, this still took more effort than I think is reasonable.

In particular, if the _posts type was handled a bit less specially, and more of its logic was shared with other collections, I think this would be simpler, and version one would be sufficient. We could split _category across directories, as we do for posts, and use path-based scopes for all types.

The documentation is also weak in places. For example: what are the possible attributes you can use for scoping within defaults:? Which paths or types can be split across directories? You have to deduce all of this from code, or just try approaches. Both of these take more time than I wanted to spend on this.

My hope is that this post might make these sorts of customizations easier, and ideally help others avoid some of the trial and error I went through.

Footnotes

  1. I think it would be possible to render the site to HTML locally, then upload the rendered _site path as a separate GitHub repository. I have not tried this.

  2. Working on static sites requires a change of mindset for those of us used to backend engineering and working at scale. Since rendering happens once in a site-build step, it’s simply not worth the time to optimize almost anything.

    At serving time, these are just files on disk, and everything serves in constant time, even though the templates are inefficient, and look very similar to serving-time template rendering (eg. django or similar).

    All of this hurts my head a bit, and goes against decades of experience, but I think the impulse to optimize here is misguided.