Programmatically writing Pelican posts using Python

When using the Pelican static site generator, blog posts are typically written by a human, so the readability of the post's source matters most. That is why Pelican articles are usually written in Markdown or reStructuredText. However, if the goal is to write (and edit) articles with a program, these formats are less convenient, because the components of an article (title, date, categories, content, and so on) are harder to parse out reliably. This article shows one way to programmatically write and modify Pelican posts using Python.

In addition to the usual human-readable formats of Markdown and reStructuredText, Pelican also allows articles to be written in HTML. The ability to write articles in HTML makes programmatic writing easier because HTML can be parsed and edited more easily than the other available formats. Pelican makes use of a custom HTML format for representing pages and articles. Here's a sample blog article in HTML:

    <html>
        <head>
            <title>Our Happy Day</title>
            <meta name="tags" content="sunshine, awesome" />
            <meta name="date" content="2018-03-30" />
            <meta name="modified" content="2018-03-31" />
            <meta name="category" content="Life" />
            <meta name="authors" content="John Doe, Jane Doe" />
            <meta name="summary" content="A happy day full of sunshine" />
        </head>
        <body>
            <p>Once upon a time...</p>
        </body>
    </html>

The <head> element holds the metadata, and the <body> element holds the content of the post. The filename must end in .html or .htm for Pelican to process the post as an HTML article.

As you may know, there is an excellent HTML parsing library for Python, Beautiful Soup, that lets you write and modify HTML and extract information from tags using selectors. To install Beautiful Soup with pip, run sudo -H pip3 install beautifulsoup4.
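To make the selector-based extraction concrete, here is a minimal sketch that reads the metadata back out of an article in the HTML format described above. The helper name read_metadata and the SAMPLE string are hypothetical and only for illustration; they are not part of Pelican or Beautiful Soup. The sketch assumes every metadata item other than the title is a <meta> tag with both a name and a content attribute.

```python
from bs4 import BeautifulSoup

# Illustrative article source (an assumption, not a real post).
SAMPLE = """<html>
<head>
<title>Our Happy Day</title>
<meta name="tags" content="sunshine, awesome" />
<meta name="category" content="Life" />
</head>
<body><p>Once upon a time...</p></body>
</html>"""


def read_metadata(html):
    """Return the title and the <meta> name/content pairs as a dict.

    Hypothetical helper for illustration; not part of Pelican.
    """
    soup = BeautifulSoup(html, 'html.parser')
    metadata = {}
    if soup.title is not None:
        metadata['title'] = soup.title.get_text(strip=True)
    # CSS selector: every <meta> inside <head> that has both attributes.
    for tag in soup.select('head meta[name][content]'):
        metadata[tag['name']] = tag['content']
    return metadata


metadata = read_metadata(SAMPLE)
print(metadata['title'])
print(metadata['tags'].split(', '))
```

Returning a plain dict keeps the sketch simple; a real tool might also want to parse the date fields or split the tags into a list.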

To write articles, I use a little utility function:

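from bs4 import BeautifulSoup

def write_article(path, title, content, date,
                  category=None, tags=None):
    # Name the parser explicitly so the result does not depend on
    # which optional parsers happen to be installed.
    soup = BeautifulSoup('<html><head></head><body></body></html>',
                         'html.parser')

    # Article metadata: each tag must be appended to <head>.
    head = soup.head

    title_tag = soup.new_tag('title')
    title_tag.string = title
    head.append(title_tag)

    date_tag = soup.new_tag('meta')
    date_tag['name'] = 'date'
    date_tag['content'] = date
    head.append(date_tag)

    if category is not None:
        category_tag = soup.new_tag('meta')
        category_tag['name'] = 'category'
        category_tag['content'] = category
        head.append(category_tag)

    if tags is not None:
        tags_tag = soup.new_tag('meta')
        tags_tag['name'] = 'tags'
        tags_tag['content'] = tags
        head.append(tags_tag)

    # Article content: parse the HTML fragment and move its nodes
    # into <body>. Iterate over a copy, since append() moves nodes.
    fragment = BeautifulSoup(content, 'html.parser')
    for node in list(fragment.contents):
        soup.body.append(node)

    # Write article to file.
    html = soup.prettify()
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)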

An example of the function's usage:

    write_article(
        path='wonderful-day.html',  # placeholder path
        title='Wonderful Day',
        content='<p>I had such a wonderful day today.</p>',
        date='2018-03-01',  # placeholder date
        tags='awesome, love')
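Modifying an existing article follows the same pattern: parse the HTML, change the tags, and serialize it again. The sketch below updates a metadata field, or adds it if it is missing. The helper name set_meta and the sample values are hypothetical, and the sketch assumes the document already contains a <head> element.

```python
from bs4 import BeautifulSoup


def set_meta(html, name, value):
    """Return html with <meta name=...> set to value, adding the tag if absent.

    Hypothetical helper for illustration; assumes the document has a <head>.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # attrs= is needed because find() already uses `name` for the tag name.
    tag = soup.find('meta', attrs={'name': name})
    if tag is None:
        tag = soup.new_tag('meta')
        tag['name'] = name
        soup.head.append(tag)
    tag['content'] = value
    return str(soup)


# Illustrative article source (an assumption, not a real post).
original = ('<html><head><title>Wonderful Day</title>'
            '<meta name="date" content="2018-03-01"/></head>'
            '<body><p>I had such a wonderful day today.</p></body></html>')

updated = set_meta(original, 'modified', '2018-03-02')
updated = set_meta(updated, 'tags', 'awesome, love, sunshine')
print(updated)
```

Working on strings rather than files keeps the helper easy to test; to edit a post on disk, read the file, pass its contents through the helper, and write the result back.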

That is how Pelican blog posts can be written automatically by a program, thanks to the ease with which HTML can be parsed and generated. To build a real automatic writer, however, you will still need to find a source for the content. Happy robotic blog post writing with Pelican!