Archivers - Good Tools #16
20+ zero-cost tools to preserve websites and other online content from disappearing.
Hello and welcome to a new edition of Good Tools.
Today, almost no one believes that “if it’s on the Web it must be true,” but many people do believe that if something is on the Web, there it will remain.
Unfortunately, nothing could be further from the truth.
Content on the web changes, is removed, deleted, taken down. Very little remains of what is originally published.
So when you cite, mention or talk about something you’ve read online by linking to it, there’s always the risk that some day soon your links to that content will not work anymore.
That’s what is called “link rot”.
Roughly 70% of cited links in academic legal journals and 20% of all science, technology and medicine articles suffer from link rot. (Source: Perma)
History is being erased as it is being written.
The average life span of a webpage is between 44 and 100 days.
And even if you think that we won’t really lose much in the long run if we don’t get every website of interest preserved – the issue of “link rot” is a big deal, as half of all URLs in Supreme Court opinion citations are now dead.
Web archiving tools help individual entrepreneurs, curators, librarians as well as scholars, journals and courts prevent link rot by creating permanent, reliable, unmodifiable links to the online sources cited in their work.
I have gone out to find what free (and some enterprise-paid) tools are out there to help solve this issue, and I am back with a bunch of useful apps that come in handy when you need to preserve specific pages or websites.
P.S.: My focus is on finding zero-cost tools that are as good as or better than their paid counterparts, to save indie entrepreneurs money and time.
I hope you find them useful.
In this issue:
Web Archiving Tools
Enterprise-level Archiving Tools
Web Archiving Utilities
Robin Good
Archivers, Preservers
1) Web Archiving Tools
Wayback Machine
The Internet Archive's web archiving platform lets you capture and publicly archive any web page on the Internet. Simply submit the URL of the content you want archived and the Wayback Machine does the rest. It is also useful for accessing pages and websites that no longer exist.
Six Ways to Save Pages In the Wayback Machine
Browser extensions: Chrome extension, Firefox, Safari, Edge
Mobile apps: iOS, Android
100% free.
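For readers who prefer to script the "submit a URL" step, here is a minimal Python sketch. It assumes the two public addresses I understand the Internet Archive to expose: the Save Page Now prefix (https://web.archive.org/save/) and the availability lookup (https://archive.org/wayback/available). The function names are my own, and the code only builds addresses and reads a response you have already fetched. It makes no network request itself.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed public endpoints of the Internet Archive (check their docs before relying on them).
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_url(page_url: str) -> str:
    """Build the address that asks the Wayback Machine to capture page_url."""
    return SAVE_ENDPOINT + page_url


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup address for the snapshot closest to timestamp (YYYYMMDD...)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived copy's URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Opening the address returned by save_url in a browser should trigger a capture. Fetching the one from availability_url and passing the decoded JSON to closest_snapshot tells you whether a copy already exists.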
Perma
Open-source web app developed and maintained by the Harvard Law School Library. Perma.cc downloads the content at any specified URL and returns a new URL (a “Perma.cc link”) that can then be inserted in a paper, article, blog, etc.
Perma URLs are not indexed by search engines like Google.
Free for academic use. Individual/private use starts at $10 for 10 links/month.
HTTrack Website Copier
Open-source, downloadable, cross-platform, web crawler and offline browser. It allows users to download websites from the Internet to a local computer while maintaining their original structure. The downloaded website can be fully navigated in any browser.
Download for Windows, MacOS, Android, Linux and more.
100% free.
Conifer
Web archiving service that creates an interactive and restageable archive copy of any web page you browse, including video, audio and pages with embedded media, complex JavaScript interactions and other dynamic elements. Supports the WARC file format.
100% free. Free accounts get 5GB of storage. Get more by becoming a supporter.
WAIL (Web Archiving Integration Layer)
Cross-platform downloadable desktop app that integrates different web archiving tools to preserve and replay web pages (including Heritrix 3.2.0 for web crawling and OpenWayback 2.4.0 for replaying web archives). It allows you to create your own Wayback Machine.
Mac, Windows
Web Archiving Integration Layer (WAIL) Basic Operation [video]
100% free.
Browsertrix
Open-source, cloud-based, high-fidelity, browser-based crawling service from the Webrecorder project initiative designed to make web archiving easier and more accessible. Replay archived content directly in your browser.
$30/month (with 100GB space). The self-hosting option is 100% free.
ArchiveWebPage
Browser extension that lets users send archived items directly to Browsertrix, replay archived web pages (using the replayweb.page system) and export archived pages in the standard WARC and the newer WACZ formats (also as HAR and WBN archives).
Available as an extension for Chrome and any Chromium-based browser.
100% free.
ReplayWebPage
ReplayWeb.page provides an online web archive replay system in the form of a website which also works offline, allowing users to view web archives from anywhere, including their local computer or even their Google Drive.
100% free.
ArchiveBox
Open-source CLI, desktop and self-hosted (you need to install it on your own server) web archiving solution to collect, save, and view websites and media content offline. It imports lists of URLs, or you can schedule regular imports from your bookmarks, browser history, social media, RSS feeds, link-saving services like Pocket/Pinboard and from its Chrome extension. It then archives the content in multiple redundant common formats (HTML, PDF, PNG, WARC) and on the Internet Archive. It auto-extracts assets and media from pages and saves them in easily accessible folders, with out-of-the-box support for extracting git repositories, audio, video, subtitles, images, PDFs, and more. It can be used to save copies of bookmarks, back up photos from Facebook, Instagram or Flickr, save media from YouTube, SoundCloud and similar services, archive research papers, and more.
Get ArchiveBox on Linux, MacOS, and Windows (WSL2), or via Docker.
Demo
100% free.
Amber
Amber is an open source tool developed by the Berkman Klein Center for Internet & Society at Harvard University. It automatically preserves a snapshot of every page linked to on a website, giving visitors a fallback option if links become inaccessible.
Amber is available as a plugin for WordPress and as a module for Drupal.
100% free.
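To make the "snapshot every linked page, then fall back to it" idea concrete, here is a small Python sketch. It is not Amber's code. It only illustrates the general approach, and the names (LinkCollector, outbound_links, choose_target) and the snapshot dictionary are hypothetical. The first part collects the outbound links of a page, the second picks the stored snapshot when the original link is reported as dead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def outbound_links(html, base_url):
    """Return http(s) links that point to a different host than base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own_host = urlparse(base_url).netloc
    result = []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme in ("http", "https") and parts.netloc != own_host:
            result.append(link)
    return result


def choose_target(link, is_alive, snapshots):
    """Serve the live link when it works, otherwise the stored snapshot (if any)."""
    if is_alive:
        return link
    return snapshots.get(link, link)
```

How "is_alive" gets decided (for example with a periodic HTTP check) and where snapshots are stored are left out on purpose, since those details depend on the tool.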
Desktop and mobile app that automatically downloads a site's web pages, images, PDFs, style sheets and other files to your local hard drive, duplicating the site's directory structure. It can download entire websites, including multimedia content, effectively making a working, navigable local copy for offline access and archiving.
Downloadable Mac / iOS app.
Pro version $4.99 (14-day trial).
MySiteArchive
Web archival service. Captures screenshots, downloads source code, tracks Google Lighthouse scores and monitors DNS records.
My Site Archive Demo [video]
14-day trial - Pro starts at $20/month
2) Enterprise Web Archiving Tools
PageFreezer
Enterprise web archiving platform for legal and compliance needs. Websites, social media and enterprise collaboration messaging archival.
Contact for pricing.
Web archiving, monitoring and surveillance solution built for governments, financial institutions, and brands. Capture and review everything from websites to social media channels and posts, SMS, WhatsApp, WeChat and Telegram in real-time.
Contact for pricing.
CivicPlus WebSnapshot
Web archiving service for enterprises and public organizations. It helps ensure compliance with U.S. state regulations by providing a snapshot of how any page of a public organization's website was displayed to citizens at any point in time.
Web archiving service by the Internet Archive for collecting and accessing cultural heritage on the web. Since 2006 it has provided web archiving services to over 800 organizations in over 24 countries. If you are an organization interested in archiving your web pages, contact Archive.it here.
Paid service.
3) Web Archiving Utilities
Wget
Command-line app for technically expert individuals.
Crawls, archives and mirrors entire web or FTP sites.
For Windows - for Mac (Unix versions also available)
100% free software.
Simple tool to create WARC files from any webpage.
You need WAIL or similar software to replay archived web pages.
Chrome extension.
100% free.
Monolith
Open-source command-line tool that bundles any web page into a single HTML file. Unlike the conventional “Save page as”, Monolith not only saves the target document, it also embeds CSS, image, and JavaScript assets all at once, producing a single HTML5 document that can be shared or archived. Compared to saving websites with wget -mpk, this tool embeds all assets as data URLs and therefore lets browsers render the saved page exactly the way it was on the Internet, even when no network connection is available.
100% free.
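If "embedding assets as data URLs" sounds abstract, this short Python sketch shows the basic trick: the file's bytes are Base64-encoded and written straight into the HTML in place of the file name. It is only an illustration of the technique, not Monolith's actual code, and the function names and the assets dictionary are made up for the example.

```python
import base64
import re


def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a Base64 data URL."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_assets(html: str, assets: dict) -> str:
    """Replace src="name" with a data URL for every asset we have bytes for.

    assets maps a file name to a (bytes, mime_type) pair (hypothetical layout).
    """
    def replace(match):
        name = match.group(1)
        if name not in assets:
            return match.group(0)  # unknown asset: leave the reference untouched
        content, mime_type = assets[name]
        return f'src="{to_data_url(content, mime_type)}"'

    return re.sub(r'src="([^"]+)"', replace, html)
```

A real tool also has to fetch the assets, handle CSS and scripts, and cope with messy markup. The regular expression here is just enough to show the idea on clean input.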
SingleFile
Extension for Chrome, Firefox and Edge that will download a web page as a single HTML file. Images, etc. will be converted to Base64 and included in the single page archive.
100% free.
ArchiveReady
Verify whether a website or any other specific content published online is ready to be archived without problems. The web app checks how suitable any URL is for effective and reliable web archiving by analyzing accessibility, cohesion, metadata and standards compliance.
100% free.
WARC Files
The Web Archive (WARC) file format pulls together all of your website archive’s files (images, metadata, and practically everything your site needs to run standalone) so that they are portable and self-contained in just one file.
The format was originated by the Internet Archive to preserve web data on a long-term basis.
To play back WARC files you need a compliant reader like:
Here’s the full file specification as published by the International Internet Preservation Consortium (IIPC).
WARC is now also an international standard (ISO 28500) for digital archives. As such, it’s been adopted by governments and other official bodies.
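For the curious, here is roughly what a single record inside a WARC file looks like, written as a small Python sketch. It follows my reading of the WARC 1.0 layout (a version line, named header fields, a blank line, the content, then two line breaks). The function name is my own, and a real archive normally holds many such records, often compressed, so treat this as an illustration and not as a replacement for a proper WARC library.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri, payload, content_type, record_date=None):
    """Build one WARC 1.0 'resource' record as bytes (illustrative sketch)."""
    moment = record_date or datetime.now(timezone.utc)
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {moment.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; an empty line separates headers from content.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is closed by two CRLF pairs.
    return head + payload + b"\r\n\r\n"
```

Writing the returned bytes for several pages one after another into a file with a .warc extension gives the "portable and self-contained in just one file" package described above.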
Vote for Your Favorite Tools Category for Next Issue
I hope you have found this collection of tools useful for you.
I love to be helpful and to do so by researching, vetting and testing what I find in the little hidden corners of the internet.
If you enjoy what I do and find it useful, please consider supporting this work by leaving a like at the bottom of this newsletter or by activating a premium subscription.
Thank you!
from sunny Holbox island
Robin Good
@Robin Good Thank you for this list. You made me think of "link rot" for the first time. If I read this right, does that mean the links I'm collecting in Zotero may rot in the future and I won't have access to the full data?