Automating Links to POSSE Copies On a Static Website

Up until today, I’ve been adding the links to POSSE copies of my posts manually. It hurt so much that I didn’t do it at all most of the time. I’ve been toying with the idea of automating this task for quite a while, but with a statically generated website it’s not very straightforward. I made it work, though.

My site is built with Hugo. Each time I make an update, my server fetches the new copy from the git server, runs Hugo to build the site, and then runs static-webmentions to send the webmentions out. These usually include a webmention to Brid.gy so that the page gets published on Twitter (or GitHub). Like many other IndieWeb syndication tools, Brid.gy’s webmention endpoint replies with a 201 Created HTTP code, and the reply includes a Location HTTP header that holds the URL of the created page (a tweet).
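To make the 201 Created / Location exchange concrete, here is a minimal sketch of pulling the created URL out of a raw HTTP response. The `created_location` helper and the sample response are made up for illustration; static-webmentions does this work internally, and this is not its actual code.

```shell
#!/bin/bash
# Hypothetical helper (not part of static-webmentions): read raw HTTP
# response headers on stdin and print the Location value, but only when
# the status code is 201.
created_location() {
    awk '
        NR == 1 && $2 != "201" { exit }   # anything but 201 Created is ignored
        /^\r?$/ { exit }                  # a blank line ends the headers
        tolower($1) == "location:" { sub(/\r$/, "", $2); print $2; exit }
    '
}

# Made-up sample response, for illustration only.
printf 'HTTP/1.1 201 Created\r\nLocation: https://twitter.com/nekr0z/status/1475095590728081410\r\n\r\n' \
    | created_location
```

A response with any other status code (for example 202 Accepted from an asynchronous endpoint) produces no output in this sketch.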

Since version 0.7, static-webmentions looks for these Location headers and outputs a string like:

created for https://evgenykuznetsov.org/en/posts/2021/syndication/ is https://twitter.com/nekr0z/status/1475095590728081410

I can use awk to extract the page URL and the POSSE copy URL from this output, store them in a temporary file, and then feed them pair by pair to a frontmatter.sh script that does the magic. This is what it looks like in the script that is used to build my site:

static-webmentions | tee >(awk '/^created for/ {print $3, $NF}' > ~/posse_list)

    [...]

while read -r mypage newposse; do
    frontmatter.sh "$mypage" posse "$newposse"
done < ~/posse_list
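As a quick check of the awk filter, the sketch below wraps it in a function (the name `extract_pairs` is made up for this example) and runs it on the sample line shown earlier. It assumes that neither URL contains spaces, since awk splits fields on whitespace.

```shell
#!/bin/bash
# Hypothetical helper: read static-webmentions output on stdin and print
# "<page URL> <POSSE copy URL>" for every "created for ..." line.
extract_pairs() {
    # $3 is the page URL, $NF (the last field) is the created URL
    awk '/^created for/ {print $3, $NF}'
}

sample='created for https://evgenykuznetsov.org/en/posts/2021/syndication/ is https://twitter.com/nekr0z/status/1475095590728081410'
printf '%s\n' "$sample" | extract_pairs
```

Lines that do not start with "created for" are dropped, so the rest of the tool's output never reaches the temporary file.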

What is frontmatter.sh supposed to be doing? Well, every post of my site is stored in a Markdown file that has a little configuration “intro” at the beginning. These “intros” are called Front Matter in Hugo, and can be in various formats. I happen to like TOML best, so my Front Matters are all TOML. This means they start and end with +++, and also means that the links to POSSE copies that Hugo will correctly incorporate in my pages look something like this:

posse = ["https://twitter.com/nekr0z/status/1475095590728081410", "https://news.indieweb.org/en/evgenykuznetsov.org/en/posts/2021/syndication/"]
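For context, the sketch below builds a throwaway post with such a front matter and prints its posse line. The `show_posse` helper, the title, and the body text are made up for this illustration; only the +++ delimiters and the posse key come from the setup described here.

```shell
#!/bin/bash
# Hypothetical helper: print the "posse" line from the TOML front matter
# (the block between the first two +++ lines) of a Markdown file.
show_posse() {
    awk '
        /^\+\+\+$/ { fences++; next }   # count the +++ delimiters
        fences >= 2 { exit }            # stop once the front matter has ended
        fences == 1 && /^[[:space:]]*posse[[:space:]]*=/ { print }
    ' "$1"
}

# Made-up sample post, for illustration only.
post=$(mktemp)
cat > "$post" <<'EOF'
+++
title = "Example post"
posse = ["https://twitter.com/nekr0z/status/1475095590728081410"]
+++
posse = "this line is body text, not front matter"
EOF

show_posse "$post"
rm -f "$post"
```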

That’s the job for the script: first, to locate the file responsible for the page, and second, to add the URL of the POSSE copy to that file’s front matter. Figuring out how to do it took me quite a while, and the current version of my frontmatter.sh looks like this:

#!/bin/bash
set -e

findfile() {
        local __variable=$1
        local sitepath=`echo $2 | sed 's|^https://evgenykuznetsov.org/||g'`
        local lang="ru"

        [[ "$sitepath" =~ ^en ]] && local lang="en" && local sitepath=`echo "$sitepath" | sed 's|^en/||g'`

        local path=`echo content/"$sitepath"`
        if [ -d "$path" ]; then
                local path="$path""index."
        else
                # keep a trailing dot so that "md" or "<lang>.md" can be appended below
                local path="${path%/}."
        fi

        if [ "$lang" = "ru" ] && [ -f "${path}md" ]; then
                local path="${path}md"
        else
                local path="${path}${lang}.md"
        fi

        if ! [ -f "$path" ]; then
                local path=
        fi

        eval $__variable="'$path'"
}

fmadd() {
        if ! [ -f "$1" ]; then
                return 1
        fi

        sed -i '/^\+\+\+$/,/^\+\+\+$/{
                \|'"$3"'|!{
                        s|^\(\s*'"$2"'\s*=\s*\[..*\)\s*\]|\1, \"'"$3"'\"\]|g
                }
        }' "$1"

        sed -i '2,/^\+\+\+$/{
                /^\s*'"$2"'\s*=/{
                        h
                        s|\(\s*'"$2"'\s*=\s*\[\)\s*\]|\1\"'"$3"'\"\]|g
                }
                /^\+\+\+$/{
                        x
                        /^$/{
                                x
                                /^\+\+\+$/i '"$2"' = \[\"'"$3"'\"\]
                                x
                        }
                x
                }
        }' "$1"
}

url=`echo "$3" | sed 's|^https://||g'`
url=`echo "$url" | sed 's|^http://||g'`
for good in "twitter.com" "news.indieweb.org"; do
        if [[ "$url" == "$good"* ]]; then
                findfile filename "$1"
                echo adding "$2" = "$3" to $filename
                fmadd "$filename" "$2" "$3"
        fi
done

The findfile() function locates the source file in the content directory. For a page URL like /en/posts/2021/syndication/ it can be either posts/2021/syndication.en.md or posts/2021/syndication/index.en.md. For a Russian version there is no /en at the beginning of the URL, and either filename.ru.md or filename.md works. The function returns an empty path if the file can’t be located.
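The mapping from URL to candidate files can be sketched as follows. This is a simplified illustration, not the script's own function: the `candidates` name is made up, and it only prints the possible paths without checking the file system or reproducing the exact lookup order.

```shell
#!/bin/bash
# Hypothetical helper: list the content files that could back a page URL.
# The site root and the "content" directory mirror the script above.
candidates() {
    local sitepath=${1#https://evgenykuznetsov.org/}
    local lang=ru
    if [[ $sitepath == en/* ]]; then
        lang=en
        sitepath=${sitepath#en/}
    fi
    local base=content/${sitepath%/}
    # Russian pages may omit the language suffix altogether.
    if [[ $lang == ru ]]; then
        printf '%s\n' "$base/index.md" "$base.md"
    fi
    printf '%s\n' "$base/index.$lang.md" "$base.$lang.md"
}

candidates https://evgenykuznetsov.org/en/posts/2021/syndication/
```

For the English URL above this prints the two paths named in the text, each prefixed with content/.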

The fmadd() function uses some heavy sed magic to add the necessary URL to the front matter. The first sed handles the case when the array already holds some values; the second sed covers the cases when the array is initially empty or the key isn’t there at all.
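To see the first case in isolation, here is a simplified sketch of the append step. It is a hypothetical variant, not the script's own code: the `append_url` name is made up, it prints to stdout instead of editing in place, and it assumes the URL contains neither `|` nor `&`, which would clash with the sed delimiter and replacement syntax.

```shell
#!/bin/bash
# Hypothetical, simplified variant of the first sed call: append a URL to an
# existing non-empty TOML array inside the front matter and print the result.
append_url() {
    local file=$1 key=$2 url=$3
    # Inside the +++ ... +++ range, skip lines that already contain the URL;
    # otherwise rewrite `key = [ ... ]` into `key = [ ..., "url"]`.
    sed '/^+++$/,/^+++$/{
        \|'"$url"'|!s|^\([[:space:]]*'"$key"'[[:space:]]*=[[:space:]]*\[..*\)]|\1, "'"$url"'"]|
    }' "$file"
}

# Throwaway sample file, for illustration only.
post=$(mktemp)
printf '%s\n' '+++' \
    'posse = ["https://news.indieweb.org/en/evgenykuznetsov.org/en/posts/2021/syndication/"]' \
    '+++' 'Body text' > "$post"
append_url "$post" posse "https://twitter.com/nekr0z/status/1475095590728081410"
rm -f "$post"
```

Because lines that already contain the URL are skipped, running the same command twice does not add a duplicate entry. An empty array is left untouched by this sketch, which is why the script needs its second sed call.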

The last part is checking whether the “created” object’s domain is in the whitelist of the domains that can host POSSE copies, as opposed to the webmention endpoints that work asynchronously and return the “tracking” URL in the Location header.
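The domain check can also be sketched as a small standalone function. The `is_posse_copy` name is made up for this example, and unlike the prefix comparison in the script it matches the host exactly, which is a stricter variant chosen here as an assumption rather than the script's behaviour.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the URL's host is on the allowlist
# of services that return the URL of an actual POSSE copy.
is_posse_copy() {
    local url=${1#https://}
    url=${url#http://}
    local host=${url%%/*}          # everything before the first slash
    case $host in
        twitter.com|news.indieweb.org) return 0 ;;
        *) return 1 ;;
    esac
}

# The second URL is a made-up "tracking" address, for illustration only.
for u in "https://twitter.com/nekr0z/status/1475095590728081410" \
         "https://example.org/webmention/status/123"; do
    if is_posse_copy "$u"; then
        echo "POSSE copy: $u"
    else
        echo "ignored:    $u"
    fi
done
```

Matching the whole host means a look-alike domain such as twitter.com.example.org is rejected, whereas a plain prefix comparison would accept it.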

This approach is far from universal, of course, but it works in my setup. And, hopefully, this post will be of some use to those of you who look to solve a similar problem.