Hi! Recently I’m interested in digital archiving. I want to tidy up my own files and I’m also building my home server which will act (among many other purposes) as a storage for… everything, including archives - files I might never touch again but I also don’t want to lose.
I would appreciate some descriptions of how Lemmings are archiving their files. I mean mostly personal files, not bought media. In particular:
- family photos,
- home-related documents,
- job-related documents,
- school materials,
- medical documents,
- abandoned projects (software or other),
- travel related stuff,
- receipts, invoices,
- and more!
Some example questions I’m interested in:
- Do you ever delete anything or do you archive everything?
- Do you use dedicated software or do you just store plain old files on disk?
- Do you use archive formats? For instance ZIP, tar, etc.
- Do you use compression? Like gzip, zstd, xz, etc.
- What naming convention do you use?
- Do you use spaces in the filenames?
- What directory structure do you use?
Pictures from my phone go onto a thumb drive. Mostly of my cats, since Google didn’t back up two years’ worth for some reason. I learned my lesson. Now if I could learn my lesson on how to do this myself, since my partner currently does it for me.
My scheme might come in useful to somebody.
On the home server, there is a 2 TB solid state drive and a 2 TB spinning drive. Stuff goes on the SSD; it’s a regular ext4 volume mounted at /work. The spinning drive is configured to spin down after 10 min when not in use.
Once a week a script mounts the spinny one, runs an rsync from SSD to it, then unmounts. Thus the spinny one only runs about 20 min per week: it’s going to last forever. If the SSD borks it’s easily replaced and repopulated.
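A minimal sketch of that weekly job, assuming the spinning drive is found by filesystem label and the whole SSD volume is mirrored — the device path, mount point, and source path are made up for illustration, not the commenter’s actual ones:

```shell
#!/bin/sh
# Hypothetical weekly sync: mount the spinning drive, mirror the SSD
# onto it, then unmount so the drive can spin back down.
weekly_sync() {
    dev=$1   # e.g. /dev/disk/by-label/archive (assumed label)
    mnt=$2   # e.g. /mnt/archive
    src=$3   # e.g. /work (the SSD volume)

    mount "$dev" "$mnt"
    # -a preserves permissions/times; --delete makes an exact mirror
    rsync -a --delete "$src/" "$mnt/"
    umount "$mnt"
}
```

Called from a weekly cron or systemd timer entry, something like this keeps the spinning drive idle the rest of the time.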
Answering your questions specifically:
I mostly never delete anything. Storage space is pretty cheap these days. The exception is stuff that I’ve downloaded that’s large and likely to be easy to download again in the future, like popular TV shows or movies.
I store them as plain old files in a plain old directory tree. I actually don’t like using zip files for this sort of thing because if one gets corrupted somehow that could destroy everything in it. Why take the risk for minimal benefit? Compression doesn’t gain much, as I said storage space is pretty cheap these days.
No particular naming convention. I give the directories names that seem meaningful to me and I put them in a structure that seems meaningful. Some stuff is a bit more rigorously organized, for example I keep audio logs as a personal journal and those get automatically sorted into folders based on the date they were recorded. Same with photos. But the rest is just however seems right to me. Spaces are fine, it’s the 2020s, technology has advanced quite a bit since the olden days.
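That date-based sorting step could be sketched like this — the YYYY/MM layout and the use of each file’s modification time as its recording date are assumptions, since the comment doesn’t spell out the exact scheme:

```shell
#!/bin/sh
# sort_by_date SRC DEST: move each file in SRC into DEST/YYYY/MM based
# on its modification time. The layout is an assumption for illustration.
sort_by_date() {
    src=$1; dest=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        d=$(date -r "$f" +%Y/%m)   # GNU date: -r reads the file's mtime
        mkdir -p "$dest/$d"
        mv "$f" "$dest/$d/"
    done
}
```

For photos, a real version would probably read the EXIF date instead of the mtime, but the folder-per-date idea is the same.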
The result is that there’s a large amount of data that I would have no idea how to find or sort through easily. But I actually anticipated AI to some degree so that never bothered me, and now I can be pretty confident that within a few years I’ll have an agent running locally that I can point at my archive and say “hey, what was the name of my neighbors ten years ago, again? I’ve forgotten.” And it’ll dig out everything relevant to that. I’m already almost there for my audio journal, I’ve transcribed it all and built a little search engine for it.
I have a set of commands I run occasionally to back up my homedir. That contains pretty much anything that’s text, text-like and related metadata. Things like personal documents, code projects, etc. are all in there. Basically anything that isn’t enormous goes into that.
E.g. software installed into my homedir by things like Steam and Wine is currently skipped. The backup currently runs to about 1 GB, compressed.
Minecraft worlds are also skipped but get their own separate backup command set.
Never really got around to compiling those command sets into actual scripts. I kind of prefer to copy-paste them out of a text document and into the terminal so I can take action at each step if something goes wrong.
The major failing is that they each build a single tar.xz, which I then copy to an external 1TB drive. There’s no deduplication so that’s getting a bit full at this point.
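The core of such a command set might look like the following — the exclude patterns are guesses based on what the comment says gets skipped (Steam, Wine, Minecraft), not the commenter’s actual commands:

```shell
#!/bin/sh
# make_archive NAME DIR: build NAME-YYYY-MM-DD.tar.xz from DIR in the
# current directory. Exclude patterns are guesses based on the comment.
make_archive() {
    name=$1; dir=$2
    stamp=$(date +%Y-%m-%d)
    # -J compresses with xz; GNU tar's --exclude matches the named
    # component anywhere in the tree
    tar -cJf "$name-$stamp.tar.xz" \
        --exclude='.steam' --exclude='.wine' --exclude='.minecraft' \
        -C "$dir" .
}
```

Run as e.g. `make_archive homedir "$HOME"`, then copy the resulting file to the external drive — which is exactly where the no-deduplication problem comes from, since every run produces a full archive.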
Photos and media hoards (software installers, website rips, music) currently go on a single Storage drive that isn’t backed up. I should probably do something about the photos, tbh.
For photos I use Synology Photos (it’s a legacy setup that’s good enough, and I don’t want to migrate my parents and in-laws to Immich). Photos are backed up logically into folders as normal files, so in case the software stops working, it’s all still there.
For documents I use Paperless-ngx. It’s less friendly should it stop working (the documents are numbered sequentially, with no human-readable names), but they’re still PDF/JPG/whatever, so it’s just some more work to figure out what they are.
Both of these have lasted me for years now, so I’m not too worried for the future.
As for all other files, just some shared folders where I back stuff up based on my own use.
Most stuff at our home is analog and/or physical (be it documents, the few photos we care to keep, books; even movies and music are on DVDs and CDs). For digital files (including copies of whatever analog stuff we wish to back up):
- Good old plain files. As much as possible, I use standard file formats (i.e. no proprietary ones), because I want to be sure I’ll be free to move quickly and easily from any app and/or OS to another, like I did when I moved to GNU/Linux after almost 40 years using Apple.
- No zip or compression. Plain copies on encrypted external drives (note the plural), plus a remote backup on encrypted cloud storage that I use for archival purposes only; I use another cloud service for ‘active’ files. Both those clouds are EU-based and independent from the GAFAM (I’d rather spend my money with local businesses): the German Filen.io for encrypted archival, and the Swiss Infomaniak kDrive for active files.
- Local backups are done using rsync, while remote ones use the providers’ own dedicated apps, because that does the job and I never bothered testing anything else ;)
- The backup directory structure is an exact copy of my working directory structure. If I need to restore something, I can easily find it, and when I need to restore the entire backup, it’s just a matter of rsync-ing it back, which is really quick.
- I don’t use different services for different types of files.
- Legal papers, contracts,… We keep them as long as necessary (or required by law), then we toss them without any worries, both the print version and the digital one. Same with receipts. Since all my files are named based on their creation date, it’s easy to remove whatever becomes obsolete.
- Abandoned projects are trashed the moment I decide they have no obvious use and won’t have one in the future. But since any project I work on can last for many years, I seldom trash anything after I’ve started working on it; instead I mercilessly prune out most ‘good ideas’ and ‘maybes’, keeping only what’s truly worth my time.
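Since the backup tree mirrors the working tree, the whole-tree restore mentioned above really is a one-liner; a hedged sketch with placeholder paths (add `-n` first to preview what would be copied):

```shell
#!/bin/sh
# restore_tree BACKUP WORK: copy the backup tree back into the working
# tree. Paths are placeholders; -a preserves permissions and timestamps.
restore_tree() {
    rsync -a "$1/" "$2/"
}
```

The trailing slash on the source matters to rsync: it means “copy the contents of this directory”, so the backup lands directly inside the working tree rather than nested one level down.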
Edit: typos + missing link.
All of the kinds of files you listed currently live in a Nextcloud instance, although I also have a test Paperless instance, especially for the scanned documents.
The only things not in there are media files that can always be re-downloaded, i.e. files not created by me. Think music, movies, etc.