Archive Web Pages on Multiple Archive Sites With This Handy Tool

Web archiving services, such as the Internet Archive’s Wayback Machine, are incredibly useful when you want to see older versions of websites, either for nostalgia or because you are looking for a specific piece of information that has since been rewritten or deleted (for example, a story you wrote for a former employer).

However, these services are not perfect. There are times when an archive site is unable to create a snapshot of a site, usually when you need that snapshot the most. Or perhaps someone configured their site’s robots.txt file to block automatic crawling by archiving services. Not fun.

Thanks to a new tool from Motherboard, you can now try to archive the current version of a site on three different archive services at once: the Wayback Machine, Archive.is and Perma.cc (the last one only if you have created a free account with them).

Installing Motherboard’s archiving utility takes a little effort, but it’s not that hard. First, you need the Python requests, json and archiveis modules, which the mass_archive tool requires in order to run. (Alas, this isn’t a simple executable that you can just run.) The json module ships with Python’s standard library, so only requests and archiveis need to be installed; the easiest way is to install pip first and then use it to install both modules.
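Before running the script, it can help to confirm that the required modules are actually importable. The following is a minimal sketch, not part of Motherboard's tool: the function name missing_modules and the list REQUIRED_MODULES are made up for illustration, and the check uses only Python's standard importlib machinery.

```python
import importlib.util

# Modules the article lists as requirements for mass_archive.py.
# (json is part of the standard library, so it should always be found.)
REQUIRED_MODULES = ["requests", "json", "archiveis"]


def missing_modules(names):
    """Return the names that this Python interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        # Suggest a pip command covering only what is absent.
        print("Missing modules, try: pip install " + " ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, installing it with pip and re-running the check should clear the message.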

You also need to get the mass_archive.py script from Motherboard’s GitHub project. When you’re ready, open a terminal in macOS or Linux and enter this (obviously replacing example.com with the site you want to archive):

python mass_archive.py example.com

If you are using Python from an elevated command prompt on Windows, you can omit the initial “python” in this command.
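To give a rough idea of what submitting a page to one of these services involves, here is a small sketch for the Wayback Machine only. It is not the code from mass_archive.py: the function names are hypothetical, defaulting a bare domain to http:// is an assumption, and the sketch simply requests the Wayback Machine's "Save Page Now" address (https://web.archive.org/save/ followed by the page URL) using the requests module.

```python
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def normalize_url(site):
    """Add an http:// scheme when given a bare domain such as example.com."""
    site = site.strip()
    if site.startswith(("http://", "https://")):
        return site
    # Assumption for this sketch: bare domains default to plain http.
    return "http://" + site


def wayback_save_url(site):
    """Build the Save Page Now request URL for a site."""
    return WAYBACK_SAVE_PREFIX + normalize_url(site)


def submit_to_wayback(site, timeout=60):
    """Ask the Wayback Machine to capture the site; return the HTTP status code."""
    # Imported here so the URL helpers above work even without requests installed.
    import requests

    response = requests.get(wayback_save_url(site), timeout=timeout)
    return response.status_code
```

Calling submit_to_wayback("example.com") would send a request to https://web.archive.org/save/http://example.com. A capture can still fail or be refused, so treat the returned status code as a hint rather than proof that a snapshot exists.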
