Simpler style scraping

Posted by Katie McLaughlin on July 22, 2014

If, for reasons, you want to take the style of a responsive website and use it in a local file, you could load the site, download the page (using the "Web Page, Complete" setting), and then spend hours trying to remove the DOM alterations that the browser performed on your behalf.

Or, you can use a simpler method.

  • Disable JavaScript (in Chrome: Settings > Advanced > Privacy > Content Settings > Disable JavaScript)
  • Load the page
  • Save as "Web Page, Complete" in a new folder, and call the file "index.html"
  • In the saved folder, rename the "index_files" folder to "src"
  • In the index.html file, replace all "index_files" references with "src"
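
The last two steps can also be scripted. Here is a minimal Python sketch, assuming the page was saved as "index.html" next to an "index_files" folder as described above. The function name rename_assets and its parameters are made up for illustration, and it does a blunt byte-level find-and-replace, so it will also rewrite any visible page text that happens to contain "index_files".

```python
from pathlib import Path


def rename_assets(site_dir, old="index_files", new="src", page="index.html"):
    """Rename the saved asset folder and point the page at the new name.

    Returns the number of references that were rewritten.
    """
    site = Path(site_dir)

    # Rename the browser-generated asset folder (index_files -> src).
    (site / old).rename(site / new)

    # Work on raw bytes so the page's original encoding is left untouched.
    page_path = site / page
    html = page_path.read_bytes()
    old_bytes, new_bytes = old.encode("ascii"), new.encode("ascii")
    count = html.count(old_bytes)
    page_path.write_bytes(html.replace(old_bytes, new_bytes))
    return count
```

Calling rename_assets("saved-site"), where "saved-site" stands in for whatever folder you saved into, renames the folder and returns how many references were updated.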

You should now have a near-exact copy of the site stored locally.

With JavaScript disabled, nothing can alter the raw HTML pulled from the source site, so the saved DOM isn't changed or cluttered by scripts. Once the page is saved, re-enabling JavaScript lets it act on the local copy, which should then behave much like the original.

Be sure to remove any analytics snippets or other remotely loaded resources, depending on what the page included.
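
If you're not sure what the saved page still pulls in from elsewhere, a rough sketch like the following can list the candidates. It uses Python's standard html.parser module. The names RemoteRefFinder and find_remote_refs are invented for this example, and the choice of tags to check (script, link, img, iframe) is an assumption. It only reports tags that load a remote URL, so inline analytics snippets still need to be found by eye.

```python
from html.parser import HTMLParser


class RemoteRefFinder(HTMLParser):
    """Collect (tag, attribute, url) for tags that load a remote URL."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        # Only look at tags that fetch a resource; plain <a> links are ignored.
        if tag not in ("script", "link", "img", "iframe"):
            return
        for name, value in attrs:
            is_remote = value and value.startswith(("http://", "https://", "//"))
            if name in ("src", "href") and is_remote:
                self.remote.append((tag, name, value))


def find_remote_refs(html):
    """Return the remote references found in an HTML string."""
    finder = RemoteRefFinder()
    finder.feed(html)
    finder.close()
    return finder.remote
```

Feeding it the contents of the saved index.html gives you a checklist of tags to delete or point at local copies.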

Also, be sure to enable JavaScript again after you finish, or everything will look like 1995 (tip: try loading a Google search results page without JavaScript. Timewarp!)