tidy_vig: Automatically reformatting generated HTML into something cleaner
Friday, February 4th, 2011As webmaster and secretary of various things I regularly need to upload minutes to websites and hence want to upload html files. While Open/LibreOffice’s export to html functionality works it doesn’t produce nice html. tidy is a useful tool for finding flaws in html and making it correct and nicer but it is not sufficient to accomplish this task on its own. Hence I have finally scriptified the various automatable parts of turning generated html into something publishable (this loses all style definitions so won’t look the same – use tidy_up if you want to avoid that).
#!/bin/bash
set -e #bail if something goes wrong
tidy_up='tidy -indent -modify -clean -bare -asxml -utf8 -wrap 80 -access 3 --logical-emphasis yes'
$tidy_up $1 #Normalise to lowercase and remove most rubbish
$tidy_up $1
$tidy_up $1 #Repeat until stabalises - this happens third time
# Get sed to select the range of lines to apply the replacement on first.
# No I don't know what is going on here.
sed -i '/