Managing Lots Of Pregenerated HTML And Other Files With Pelican

Sat 14 June 2014 by Fred Clift

For one of the Pelican-managed websites I maintain, I have a lot of files that I don't really want to manage with Pelican, and in some cases can't easily manage without a lot of gymnastics.

In this case, I have about 30k HTML files that are a dump of an old phpNuke web forum, and one directory that has a PHP app installed (Simple Machines Forum). I also have a lot of other miscellaneous files that I'd like to go into the DocumentRoot of my web server.

For the HTML, I could write a script to modify each file to include the meta tags that Pelican needs, then put all the images into my images directory, and then put all the CSS into my template. For my archival content, that would be acceptable, but for the Simple Machines Forum, which is maintained by 3rd parties and updated regularly, it isn't practical. I would have to redo my changes every time they released a new version...

I could use STATIC_PATHS to add all my files, with ARTICLE_EXCLUDES and PAGE_EXCLUDES to tell Pelican not to process them. That works, but for some reason Pelican still wants to look at each file, and doing this increased my build times 10x or so with my data.

Instead, I tried the following, which is what I use now. In my top-level directory, I made a directory 'output-skel' and then I edited my Makefile. In the html target, I added this line as one of the commands that is run:

    rsync -SHqav output-skel/ $(OUTPUTDIR)

so my html make block looks like this:

    rsync -SHqav output-skel/ $(OUTPUTDIR)

This way, I can put ANY file I want to include in my output at its relative path in output-skel and have it show up where I want it.

This sure beats having 30k entries in my STATIC_PATHS. I put .htaccess files, static HTML, and some 'downloads' files there.

As an aside, I also add the following lines to my html make target:

    chown -R root:www $(OUTPUTDIR)
    find $(OUTPUTDIR) -type d -exec chmod 750 {} +
    find $(OUTPUTDIR) -type f -exec chmod 640 {} +

This just keeps things neat and tidy: the files are owned by root, readable by the web server through the www group, and closed to everyone else.
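
If you want to check that the modes really ended up the way you expect, something like the following sketch may help. It works on a scratch directory with made-up file names, skips the chown step (which needs root and an existing www group), and uses find's `-exec ... +` form, which also copes with file names that contain spaces.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree standing in for $(OUTPUTDIR).
out=$(mktemp -d)
mkdir -p "$out/forum"
echo "hello" > "$out/index.html"
echo "topic" > "$out/forum/topic 1.html"   # name with a space on purpose

# Apply the same modes as the Makefile lines: 750 for directories,
# 640 for regular files.
find "$out" -type d -exec chmod 750 {} +
find "$out" -type f -exec chmod 640 {} +

# Collect anything whose mode is not what we expect; empty means success.
stray=$(find "$out" \( -type d ! -perm 750 \) -o \( -type f ! -perm 640 \))
if [ -z "$stray" ]; then
    echo "permissions ok"
else
    echo "unexpected modes:"
    echo "$stray"
fi
```

Running the same find check against the real output directory after a build is a cheap way to spot files that slipped in with the wrong mode.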