
Best Way To Programmatically Save A Webpage To A Static Html File

The more research I do, the more grim the outlook becomes. I am trying to Flat Save, or Static Save a webpage with Python. This means merging all the styles to inline properties, a

Solution 1:

After walking away for a while, I managed to install a Ruby library that flattens the CSS much better than anything else I've used. It's the library behind the very slow web interface at http://premailer.dialect.ca/

Thank goodness they released the source on GitHub; it's the best, hands down. https://github.com/alexdunae/premailer

It flattens styles, creates absolute URLs, works with a URL or a string, and can even create plain-text email templates. Very impressed with this library.
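Since the question asks about Python, here is a rough sketch of just the "creates absolute URLs" step, using only the standard library. This is not premailer's code or API; the function name `absolutize` and the example URL are made up for illustration, and the regex is deliberately crude (it only looks at quoted `href` and `src` attributes).

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." with either quote style.
# Assumption: attribute values are quoted; unquoted values are left alone.
URL_ATTR = re.compile(r'''\b(href|src)\s*=\s*(["'])(.*?)\2''', re.IGNORECASE)


def absolutize(html, base_url):
    """Return html with relative href/src values resolved against base_url."""
    def replace(match):
        attr, quote, value = match.groups()
        # urljoin leaves already-absolute URLs (http:, mailto:, ...) unchanged.
        return f"{attr}={quote}{urljoin(base_url, value)}{quote}"

    return URL_ATTR.sub(replace, html)


if __name__ == "__main__":
    page = '<a href="/about">About</a> <img src="img/a.png">'
    print(absolutize(page, "http://example.com/blog/post.html"))
```

For anything beyond a quick experiment, a real HTML parser would be a safer choice than a regex, since this sketch will also touch look-alike attributes such as `data-src`.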

Update Nov 2013

I ended up writing my own bookmarklet that works purely client side. It is compatible with WebKit and Firefox only. It recurses through each node and adds inline styles, then sends the flattened HTML to the clippy.in API to save to the user's dashboard.

Client Side Bookmarklet
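To give an idea of the "recurse through each node and add inline styles" approach, here is a minimal Python sketch. It is not the bookmarklet's code: a browser can ask for each node's computed style, whereas this sketch has no browser and instead assumes a small hand-written rule table (`RULES`, hypothetical) that only understands bare tag names and single class selectors. It also assumes the markup is well-formed XHTML, because it parses with `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: simple selector -> declarations.
# Only bare tag names ("p") and single classes (".note") are supported.
RULES = {
    "p": {"margin": "0", "color": "black"},
    ".note": {"color": "gray"},
}


def parse_style(style):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    result = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            result[name.strip()] = value.strip()
    return result


def inline_styles(element, rules):
    """Recurse through the tree, writing matched rules into style attributes."""
    merged = dict(rules.get(element.tag, {}))
    # Class rules override tag rules.
    for cls in element.get("class", "").split():
        merged.update(rules.get("." + cls, {}))
    # An existing inline style overrides rule-based declarations.
    merged.update(parse_style(element.get("style", "")))
    if merged:
        element.set("style", "; ".join(f"{k}: {v}" for k, v in merged.items()))
    for child in element:
        inline_styles(child, rules)


def flatten(markup, rules=RULES):
    """Parse well-formed markup, inline the rules, and serialize it again."""
    root = ET.fromstring(markup)
    inline_styles(root, rules)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(flatten('<div><p class="note">hi</p><span style="color: red">x</span></div>'))
```

Real pages need a proper CSS parser and selector engine (specificity, descendant selectors, media queries), which is why a dedicated library or a browser-side approach tends to do much better than a sketch like this.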

Solution 2:

It sounds like inline styles might be a deal-breaker for you, but if not, I suggest taking another look at Evernote Web Clipper. The desktop app has an Export HTML feature for web clips. The output is a bit messy as you'd expect with inline styles, but I've found the markup to be a reliable representation of the saved page.

Regarding inline vs. external styles, for something like this I don't see any way around inline if you're doing a lot of pages from different sites where class names would have conflicting style rules.

You mentioned that Web Clipper uses iframes, but I haven't found this to be the case for the HTML output. You'd likely have to embed the static page as an iframe if you're re-publishing on another site (legally, I assume), but otherwise that shouldn't be an issue.

Some automation would certainly help so you could go straight from the browser to the HTML output, and perhaps for relocating the saved images to a single repo with updated src links in the HTML. If you end up working on something like this, I'd be grateful to try it out myself.
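As a starting point for the image-relocation idea, here is a small Python sketch that rewrites every `<img src>` in an exported page to point into one local folder and returns a map of which remote URL belongs to which local path. The function name `relocate_images`, the `images` folder, and the counter-prefixed file naming are all assumptions for illustration; nothing here comes from Evernote's export format.

```python
import os
import re
from urllib.parse import urlparse

# Captures everything up to the src value of an <img> tag, then the quoted value.
IMG_SRC = re.compile(r'''(<img\b[^>]*?\bsrc\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def relocate_images(html, image_dir="images"):
    """Rewrite <img src> to local paths; return (new_html, {url: local_path})."""
    mapping = {}

    def replace(match):
        prefix, quote, url = match.groups()
        if url not in mapping:
            name = os.path.basename(urlparse(url).path) or "image"
            # A counter prefix keeps same-named files from different sites apart.
            mapping[url] = f"{image_dir}/{len(mapping):03d}-{name}"
        return f"{prefix}{quote}{mapping[url]}{quote}"

    return IMG_SRC.sub(replace, html), mapping


if __name__ == "__main__":
    page = '<p><img src="http://a.example/x/logo.png"><img src="http://b.example/logo.png"></p>'
    new_page, files = relocate_images(page)
    print(new_page)
    print(files)
```

The returned mapping could then drive the actual downloads, for example by calling `urllib.request.urlretrieve(url, local_path)` for each entry after creating the folder. That step is left out here so the sketch runs without network access.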
