Dish towels and napkins

The saying tells us not to mix them. Why? Who knows…

I finally took the time to separate the regular articles, like this one, from the Friday-night metal articles. Why? To offer you two RSS/ATOM feeds. That way the choice is yours: follow one feed, both feeds, or none at all… The links are at the end of the article.

I also took the opportunity to answer a legitimate request: putting the content of the articles back into the feed, so that you are not forced to open a web browser.

RSS feeds with Emacs orgmode-publish

I use Orgmode publish to build this blog and the associated gemini capsule. I was using ox-rss.el for the RSS feed, which is why that feed did not include the article content. I tried to patch that module, but I fell back on a language that is simpler for me so that I could still make progress. There will always be time to patch it later.

Buildafeed

I figured the simplest approach would be to build the ATOM feed from the directory containing the HTML pages that publish produces.

The only prerequisite is to name the files following the pattern YYYY-MM-DD-article-title.html, where YYYY-MM-DD is the publication date.
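
To make the naming rule concrete, here is a minimal sketch of how such a file name can be split into a publication date and a title slug. The helper name split_post_filename and the sample path are hypothetical, chosen for illustration; the script itself simply slices the string.

```python
import os
from datetime import date


def split_post_filename(path):
    """Split 'YYYY-MM-DD-some-title.html' into (date, slug).

    Hypothetical helper: raises ValueError when the name does not
    follow the expected pattern.
    """
    basename = os.path.basename(path)
    stem, ext = os.path.splitext(basename)
    if ext != ".html":
        raise ValueError(f"not an HTML file: {basename}")
    # First 10 characters: the date. Then one '-' separator, then the slug.
    published = date.fromisoformat(stem[:10])
    if stem[10:11] != "-":
        raise ValueError(f"missing '-' after the date: {basename}")
    return published, stem[11:]


# Sample path made up for the example
published, slug = split_post_filename("/tmp/2024-05-17-torchons-et-serviettes.html")
print(published.isoformat(), slug)
```

A file such as index.html fails the date check, which is one way to spot pages that should stay out of the feed.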

After a bit of cleanup, here is a script that works. It is a little ugly, but it works.

Here is buildafeed, which uses the Python modules beautifulsoup4 and lxml to parse the HTML pages easily:

#!/usr/bin/env python

from bs4 import BeautifulSoup
from datetime import datetime
import glob
import os
import sys
from xml.etree.ElementTree import Element, SubElement, tostring
from xml.dom import minidom


PSEUDO = "fredg"
BASE_LINK = "https://galusik.fr/log"
FEED_LINK = BASE_LINK + "/feed.xml"

# blog posts
LOG = "/home/f/g/galusik.fr/log"
# fridayrockmetal posts
FRM = "/home/f/g/galusik.fr/frm"

# feeds
LOG_FEED = LOG + "/feed.xml"
FRM_FEED = FRM + "/feed.xml"


def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")

def create_atom_feed(blog_dir, feed_file):
    # Create the root element
    feed = Element('feed', xmlns='http://www.w3.org/2005/Atom')

    # Add feed metadata
    title = SubElement(feed, 'title')
    title.text = "@fredg's feed"

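    link = SubElement(feed, 'link', href=FEED_LINK, rel='self')
    # Atom requires a feed-level <id>; the feed URL is used here
    feed_id = SubElement(feed, 'id')
    feed_id.text = FEED_LINK
    updated = SubElement(feed, 'updated')
    # Local time with its real UTC offset (a bare "Z" would label local time as UTC)
    updated.text = datetime.now().astimezone().isoformat(timespec='seconds')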

    author = SubElement(feed, 'author')
    name = SubElement(author, 'name')
    name.text = PSEUDO

    # Get all HTML files in the blog directory, excluding index.html
    html_files = glob.glob(os.path.join(blog_dir, '*.html'))
    html_files = [f for f in html_files if os.path.basename(f) != 'index.html']

    # Sort files by date in descending order
    html_files.sort(key=lambda x: os.path.basename(x)[:10], reverse=True)

    # Add each entry to the feed
    for html_file in html_files:
        entry = SubElement(feed, 'entry')

        # Extract the title and date from the filename
        basename = os.path.basename(html_file)
        date_str, title_text = basename[:10], basename[11:-5]

        title = SubElement(entry, 'title')
        title.text = title_text

        link = SubElement(entry, 'link', href=f'{BASE_LINK}/{basename}')

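        # Use the public URL as the id: Atom expects an IRI, and the
        # local file path would leak into the published feed
        entry_id = SubElement(entry, 'id')
        entry_id.text = f'{BASE_LINK}/{basename}'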

        # Set the updated and published date
        published = SubElement(entry, 'published')
        published.text = f'{date_str}T00:00:00Z'

        updated = SubElement(entry, 'updated')
        updated.text = f'{date_str}T00:00:00Z'

        # Read and clean the content of the HTML file
        with open(html_file, 'r', encoding='utf-8') as file:
            soup = BeautifulSoup(file, 'html.parser')

            # Remove head, preamble, and topnav div tags
            if soup.head:
                soup.head.decompose()
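            # soup.preamble only matches a <preamble> tag; this assumes the
            # Org HTML export marks its preamble with id="preamble"
            preamble = soup.find(id='preamble')
            if preamble:
                preamble.decompose()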
            topnav = soup.find('div', class_='topnav')
            if topnav:
                topnav.decompose()

            content = SubElement(entry, 'content', type='html')
            content.text = str(soup)

    # echo var
    print(f"Source Dir:{blog_dir}\nGenerated feed file: {feed_file}")
    # Write the feed to the file
    with open(feed_file, 'w', encoding='utf-8') as f:
        f.write(prettify(feed))

# Usage
def main():
    try:
        arg = sys.argv[1]
    except IndexError:
        raise SystemExit(f"Usage: {sys.argv[0]} log OR frm")
    # only update classical blog articles
    if arg == "log":
        create_atom_feed(LOG, LOG_FEED)
    # only update the fridayrockmetal feed
    elif arg == "frm":
        create_atom_feed(FRM, FRM_FEED)
    else:
        # Exit with a non-zero status so callers can detect the failure
        raise SystemExit('Expected "log" or "frm" as the only argument.\nAborted!')


if __name__ == "__main__":
    main()
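
Once a feed has been generated, a quick sanity check is to read it back and list its entries. The sketch below uses only the standard library; list_entries and the hand-written SAMPLE feed are hypothetical stand-ins for illustration, not part of buildafeed.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so lookups need a prefix mapping
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def list_entries(feed_xml):
    """Return (updated, title, href) tuples for each entry of an Atom document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((updated, title, href))
    return entries


# Tiny hand-written feed standing in for a generated feed.xml
SAMPLE = """<?xml version="1.0" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>sample</title>
  <entry>
    <title>hello-world</title>
    <link href="https://example.org/2024-01-01-hello-world.html"/>
    <updated>2024-01-01T00:00:00Z</updated>
    <content type="html">&lt;p&gt;Hello&lt;/p&gt;</content>
  </entry>
</feed>"""

for updated, title, href in list_entries(SAMPLE):
    print(updated, title, href)
```

For a real file, passing the text read from feed.xml to list_entries should work the same way; an empty list usually means that no dated HTML file was found in the source directory.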

Feel free to improve it and to share your improvements (or not). To make that easier, I have put everything on SourceHut.

🙂