How to Browse the GeoCities Archive

July 9, 2022

Prelude

If you've been around the Internet long enough, you've heard of GeoCities. Maybe you had a web site hosted by GeoCities; maybe you went with Tripod or Angelfire instead. But millions of people had sites with GeoCities, until Yahoo shut them down in 2009.

I didn't have a site hosted with them, but I have long been a fan of the early Internet and of the enthusiasm, optimism, and personal rather than commercial nature of early web sites. It's hard to find those sites on the Internet today. Some still exist, a few of them dating back to that era. Sometimes blogs capture a similar quality, although their chronological nature lacks a certain je ne sais quoi that a manually organized site, like those on GeoCities, gives you. Sometimes you can find similar sites today on more text-centric, harder-to-commercialize protocols such as Gemini.

But maybe you're like me and when you read about the 600 GB torrent of all of GeoCities, you thought, man, wouldn't it be cool to explore that someday? But where am I going to find that much free space, and is that torrent ever going to finish downloading? Wouldn't it be unfortunate if I downloaded half a terabyte only to realize it wasn't going to work for some reason?

Indeed, I tried downloading it about a decade ago, probably in 2012 after I'd bought a 2 TB hard disk. I never finished the download; I didn't even get close. But I'm happy to say that today it's relatively easy to get a peek at GeoCities, it's approachable even if you've only ever used torrents to download Linux ISOs, and you don't actually need hundreds of gigabytes of free space.

Downloading and Extracting the Torrent

The first thing you'll need is the torrent itself. You'll want the patched one, which, unlike the original, will fully finish downloading if you have the space for it. The download link for the patched torrent is this magnet link.

But before you go download 641 GB, it's worth noting, if you're like me and use BitTorrent almost exclusively for Linux ISOs, that there are good clients nowadays, including some with the key feature of downloading only part of a torrent. That's super helpful if you only have 500 GB of free space on any given drive, like I did. The GeoCities torrent is helpfully divided up by the first two characters of each site, so for example the Labyrinth neighborhood is in the "La" part of the torrent, and if my username were ajtjp, it would be in the "aj" part. These parts aren't of equal size - "La" contains two top-level neighborhoods, for example, so it's going to be much larger than "Lb" - but it means that if there's a section of GeoCities you're particularly interested in, such as your own site, you can download a relatively small amount and find what you're looking for rather than the whole thing.
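If you're not sure which section a particular URL falls into, the rule is simple enough to sketch in Python. This is my own illustration, not something that ships with the torrent, and it rests on two assumptions: that the section key is the first two characters of the first path component after www.geocities.com (a neighborhood such as TimesSquare, or a Yahoo ID such as ajtjp), and that names starting with a capital letter live in the UPPERCASE half while everything else lives in LOWERCASE. The function name `torrent_section` is hypothetical.

```python
from urllib.parse import urlparse


def torrent_section(url: str) -> tuple[str, str]:
    # Assumption: the section is picked by the first path component after the
    # host, e.g. "TimesSquare" (a neighborhood) or "ajtjp" (a Yahoo ID).
    path = urlparse(url).path.strip("/")
    if not path:
        raise ValueError(f"no site name in URL: {url!r}")
    site = path.split("/", 1)[0]
    # Assumption: capitalized names live in the UPPERCASE half of the torrent,
    # everything else in LOWERCASE.
    half = "UPPERCASE" if site[0].isupper() else "LOWERCASE"
    # The section itself is named after the first two characters.
    return half, site[:2]


print(torrent_section(
    "http://www.geocities.com/TimesSquare/Ring/1700/civ2/zips/AlphaC15.zip"))
print(torrent_section("http://www.geocities.com/ajtjp/"))
```

For the AlphaC15.zip URL used as an example in the next paragraph, this gives ('UPPERCASE', 'Ti'), and the hypothetical ajtjp site lands in ('LOWERCASE', 'aj').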

I'm running Windows, and found qBittorrent to be well-suited for this purpose. As an example, I'll try to download the archive for http://www.geocities.com/TimesSquare/Ring/1700/civ2/zips/AlphaC15.zip, to see if it exists. This raises a point worth mentioning - the GeoCities archive is not 100% comprehensive. Though it obviously contains a lot, its creators did not have a list of absolutely everything on GeoCities, and thus some sites were missed.

Once you load the magnet link in qBittorrent, you'll be able to go to the Content tab, which looks something like this (I've already downloaded a couple of sections, so yours won't initially have the partially downloaded segments):

Now, open the Uppercase section (or Lowercase, as appropriate), and scroll down to the section corresponding to the first two letters of the site you want. Shift-click to select a range, then right-click and set the priority to Normal or higher to start the download:

Once it finishes, right-click on the torrent in qBittorrent and select "open destination folder"; this is where the files were downloaded. Navigate to the correct Uppercase/Lowercase folder, and you'll find the files, all in 100 MB sections. Now, quit qBittorrent (don't just minimize it to the tray), as it will still hold locks on the downloaded segments that prevent decompression. Once qBittorrent has quit, select all of the sections for your letters, right-click, and choose 7-Zip -> Extract Files in your context menu (you'll need 7-Zip, or another compatible decompression utility, installed; I recommend 7-Zip).
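If you'd rather script the extraction than click through the context menu, 7-Zip's command-line tool can do the same job. This is a minimal sketch with a few assumptions: that the 7z executable is on your PATH (otherwise substitute the full path to 7z.exe), and that the 100 MB pieces are volumes of a single split archive, in which case you point 7-Zip at the first volume and it picks up the rest. If they turn out to be independent archives, run it once per file instead. The function names and the example file name are hypothetical. The `-y` switch answers Yes to every prompt, which is the same choice I make for the "Confirm File Replace" dialogs during the second extraction.

```python
import subprocess
from pathlib import Path


def build_7z_command(archive: str, out_dir: str) -> list[str]:
    # x  : extract with full paths
    # -o : output directory (no space between -o and the path)
    # -y : assume Yes on all queries, including overwrite prompts
    return ["7z", "x", archive, f"-o{out_dir}", "-y"]


def extract(archive: str, out_dir: str) -> None:
    # Assumes 7z is on PATH; raises CalledProcessError if 7-Zip reports an error.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_7z_command(archive, out_dir), check=True)


# Hypothetical usage, with a made-up name for the first downloaded piece:
# extract(r"LOWERCASE\ak.7z.001", "ak-Extracted")
```

Separating the command builder from the call that runs it makes it easy to print and check the command before letting it loose on gigabytes of data.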

I like to configure it like this:

This changes the end of the destination path from LOWERCASE\*\ so that the files go into a folder specifically for the given letter segment (and yes, I switched to a letter with fewer sections while writing the blog to speed things up - Ta has 121 sections, or 12 GB, which was going to take a while at 3 Mbps!). You could probably leave it at the default, but if you're downloading multiple sections, this structure makes it easier to keep track of what's been downloaded and extracted.
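For a sense of why I switched, here's the back-of-the-envelope arithmetic as a small Python helper. It assumes decimal units (1 GB = 8,000 megabits) and a connection that stays saturated the whole time, which real torrents rarely manage, so treat the result as a lower bound.

```python
def download_hours(gigabytes: float, megabits_per_second: float) -> float:
    # Decimal units: 1 GB = 1,000 MB = 8,000 megabits.
    megabits = gigabytes * 8_000
    # Assumes the link stays fully used for the entire transfer.
    seconds = megabits / megabits_per_second
    return seconds / 3600


# Ta: 121 sections of roughly 100 MB each, so about 12 GB.
print(f"Ta at 3 Mbps: {download_hours(12, 3):.1f} hours")
print(f"Full torrent at 3 Mbps: {download_hours(641, 3) / 24:.1f} days")
```

At 3 Mbps, the 12 GB of Ta works out to about 8.9 hours, and the full 641 GB torrent to roughly 475 hours, or nearly 20 days.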

This is the result of the extraction process:

Next, right-click the only file in that folder, and select 7-Zip -> Extract Here from your context menu.

During the extraction, you'll likely get some "Confirm File Replace" dialogs, because parts of the archive contain files whose names differ only in capitalization.

This is an unfortunate consequence of GeoCities being designed to run on Unix, which allows file names that differ only by capitalization, while Windows does not. In the example, the lowercase file is larger, so it's presumably the correct version, but in some cases both versions may be intended to exist. There's no perfect way to resolve this without switching to a case-sensitive file system, so I tend to default to "Yes to all". My goal is browsing the archive, not hosting a 100% perfect mirror; if it were, I would switch to Unix or Linux for the task.
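If you want to know ahead of time which files will collide, you can group the archive's paths by their case-folded form. This is a minimal sketch that assumes you already have the list of paths inside the archive, for instance from 7-Zip's listing or from walking an extraction done on a case-sensitive file system; `case_collisions` is a hypothetical helper name. Any group with more than one entry is a set of files of which only one will survive extraction on Windows.

```python
from collections import defaultdict


def case_collisions(paths: list[str]) -> list[list[str]]:
    # Key every path by its case-folded form; paths that share a key would land
    # on the same file on a case-insensitive file system such as Windows' default.
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Only groups with more than one member are actual collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["site/Index.html", "site/index.html", "site/photo.jpg"]))
```

Here the two index pages are flagged as a collision, while photo.jpg is left alone because nothing else shares its name.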

After that process, you'll have the final result, all extracted!

This also illustrates the counter-argument to my folder structure: if you extract everything to the same folder (rather than to per-segment folders like my ak-Extracted), it'll be nicely organized as geocities/YAHOOIDS/letter/letter, giving you one comprehensive structure. If you want to download most of the archive, that plus a case-sensitive file system is the way to go!

Now you can start browsing the downloaded pages! Do exercise some caution, as the archivers did not run a virus scan when they created the archive, so there's likely some old malware hiding amongst the files. But you're ready to go exploring!
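Since the archive isn't comprehensive, one of the first things I want to know is whether a specific file made it in at all, such as the AlphaC15.zip example from earlier. Here's a minimal sketch that walks an extracted folder and matches the tail of the path while ignoring capitalization, so you don't need to know exactly which folders the archive puts above the neighborhood or Yahoo ID. `find_page` and the Ti-Extracted folder name are hypothetical, and an empty result only means the file isn't in the sections you've extracted.

```python
import os
from pathlib import Path


def find_page(root: str, relative_path: str) -> list[Path]:
    # Split the wanted path into case-folded components, e.g.
    # ["timessquare", "ring", "1700", "civ2", "zips", "alphac15.zip"].
    wanted = [part.casefold() for part in relative_path.strip("/").split("/")]
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Cheap check on the file name before comparing the whole tail.
            if name.casefold() != wanted[-1]:
                continue
            candidate = Path(dirpath, name)
            tail = [part.casefold() for part in candidate.parts[-len(wanted):]]
            if tail == wanted:
                matches.append(candidate)
    return matches


print(find_page("Ti-Extracted", "TimesSquare/Ring/1700/civ2/zips/AlphaC15.zip"))
```

Matching on the tail of the path also means the same call works whether you used per-segment folders like mine or one combined folder.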

In summary, an interesting new feature that so far has met expectations, and one I'll be playing around with more in the coming weeks.
