posted by acasper +12 / -0

At least one centralized site archival service (archive.org) has already demonstrated that it is willing to take down resources at the request of government. Others like archive.today do not seem to have complied with takedown requests for legal content, but there are many tools that the government can use to force this action. If you want an excellent example, check out this summary of the secret government orders that Lavabit received and how its founder had to comply or risk incarceration (https://wikipedia.org/wiki/LavaBit#Ladar_Levison).

What steps can we take to create some redundancy for these resources? Well, it’s actually as simple as keeping a full copy of the page or site on your computer. This is a pretty big downgrade in accessibility, but it makes it incredibly hard for a central censor to eliminate every copy, since each copy can itself be copied and distributed without limit. Once you have the file on your computer, it’s just a question of making it accessible through censorship-resilient platforms like IPFS (https://patriots.win/p/12i4DQgiWN/centralized-storage-services-lik/). But how does one actually download a website? Let’s explore that!

When you go to any resource on the Internet there is a conversation going on between your computer, the series of networks that carry the communication, and the host of the resource. If you want to archive a PDF file hosted on a public website, it’s pretty straightforward: go to the URL of the file and click the download button. That option is not explicitly available for, say, a blog post or news article, because a page like that is typically delivered in chunks. If you look at the code behind a webpage, it’s a set of instructions that your web browser uses to retrieve a bunch of files and then reconstruct the page on your end (this is why all the images on a page don’t necessarily load at the same time). Once that reconstruction is complete, though, the page is fairly static in most cases. There might be some animations or embedded files that stream in as many smaller chunks (embedded videos, for example), but for all intents and purposes we have a static page that is equivalent to that PDF file, just with no download button.
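To make the "set of instructions" idea concrete, here is a minimal sketch in Python that reads a page's HTML and lists the separate files a browser would have to fetch to rebuild it. The sample page, the example.com address, and the names `ResourceCollector` and `list_resources` are all made up for illustration, and the tag list is only a rough assumption about what counts as a sub-resource. A real browser handles far more cases than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: these tag/attribute pairs cover the common "go fetch this file" cases.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",
    "iframe": "src",
    "video": "src",
    "source": "src",
}


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of files referenced by a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # ordinary tags such as <p> or <a> do not trigger a download
        for name, value in attrs:
            if name == wanted and value:
                # Relative paths are resolved against the page's own address.
                self.resources.append(urljoin(self.base_url, value))


def list_resources(html, base_url):
    """Return the sub-resource URLs found in html, in document order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources


# A tiny stand-in for a blog post; not a real page.
SAMPLE_PAGE = """
<html>
  <head>
    <link rel="stylesheet" href="/css/site.css">
    <script src="app.js"></script>
  </head>
  <body>
    <p>Some article text.</p>
    <img src="images/chart.png">
  </body>
</html>
"""

if __name__ == "__main__":
    for url in list_resources(SAMPLE_PAGE, "https://example.com/blog/post.html"):
        print(url)
```

Each URL it prints is one more request your browser makes after the first one, which is why saving only the initial HTML file usually gives you a broken-looking copy.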

The two solutions that occur to most people are screenshots, which are available pretty universally across all devices, and ‘Save as PDF’, which will do its best to convert the fully rendered page into a local paginated PDF. These are fine ways to go about storing small amounts of information, but what if there’s a dump of 100 links that you’d like to keep a copy of? How does one avoid the manual process of saving and organizing all of that information?

Enter programs like ArchiveBox (https://archivebox.io/), which automate the entire process of retrieving the data, converting it to a ton of different formats, sending the page to centralized online archives like archive.today, and organizing all the locally stored information in a database. Did I mention that it’s scriptable as well? That means that you can feed it 100 or 1000 websites and it will chug through them while you’re drinking coffee. You can schedule it to grab an archive of a specific page at a regular interval. You can tell it that you only want it to create PDFs locally to save space. You can tell it to only send the files to a central archive and it will do that. The tool is open-source and the documentation for it is well-written and updated regularly. I wish I could say that it’s a breeze to set up, but it can be kind of a pain. It’s worth it once it’s running. Learning more about how your computer works is never a bad thing, just make sure you have backups!
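Here is a rough sketch of what "feeding it 100 links" could look like from a Python script. It assumes ArchiveBox is already installed, that the `archivebox` command is on your PATH, and that you have already run `archivebox init` in the folder you pass in. The file name `links.txt`, the folder name `archive`, and the helper names are placeholders I made up. Check the ArchiveBox docs for the options your version supports before relying on this.

```python
import subprocess


def clean_url_list(raw_text):
    """Return unique http(s) URLs from a pasted link dump, keeping their order."""
    seen = set()
    urls = []
    for line in raw_text.splitlines():
        url = line.strip()
        if not url.startswith(("http://", "https://")):
            continue  # skip blank lines and anything that is not a web link
        if url in seen:
            continue  # link dumps often repeat the same URL
        seen.add(url)
        urls.append(url)
    return urls


def archive_all(urls, data_dir):
    """Pass each URL to `archivebox add`, run from inside data_dir.

    Assumption: data_dir is an ArchiveBox collection that was already
    initialized. Returns the URLs whose command exited with an error.
    """
    failed = []
    for url in urls:
        result = subprocess.run(["archivebox", "add", url], cwd=data_dir)
        if result.returncode != 0:
            failed.append(url)
    return failed


# Example usage (hypothetical file and folder names):
#
#   from pathlib import Path
#   links = clean_url_list(Path("links.txt").read_text())
#   failures = archive_all(links, data_dir="archive")
#   print(f"{len(links) - len(failures)} of {len(links)} links archived")
```

Running one command per link is slower than handing ArchiveBox the whole list at once, but it makes it easy to see which links failed so you can retry them later.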

As always, I’ll be in the comments to respond to any questions or concerns from pedes!

Comments (5)
jubyeonin 5 points +5 / -0

Or go the boomer route and print stuff and post pictures of the printed stuff.

Trumpican 3 points +3 / -0

that's what i do, hey wait......

acasper [S] 1 point +1 / -0

Hehe that is also an option.