HACKER Q&A
📣 b20000

How do you keep file archives going back 20 years organized?


How do you deal with organizing files personally or for your business? What is your system and how do you regularly re-organize everything so you can always find what you need quickly? Do you keep everything on your desktop, laptop, or do you have dedicated small PCs for certain types of files? What file systems and what OSes? How do you share files and collab with your spouse? What do you do for networking, maintenance, backups etc?

** Personally, I'm less interested in cloud-based solutions, as I've recently exited all cloud stuff, including Google accounts.


  👤 khaledh Accepted Answer ✓
I use paperless-ng hosted on a small VM on DigitalOcean (with automatic backups). I pay about $8/month.

- For physical documents: scan and upload manually.

- For email attachments, I connected paperless-ng to my Gmail account so that any email with a special label gets its attachments synced automatically.

In paperless-ng I use tags for any topic I want to search by. Those can be specific or generic, no hard and fast rule here. One major feature I use is views: I create different views for frequently accessed topics, e.g. "House", "Taxes", "Health", "Year: 2022", "Year: 2021", etc.


👤 kimchidude
You say you’re not interested in cloud-based solutions, but I think this is at odds with any practical way to share files. Here’s what I do:

I divide my life into four core Dropbox folders: one each for family, work, projects and education. Work and projects contain folders labelled by year, which lets me access time-sensitive stuff pretty quickly. I’m always learning, so the education folder is subdivided into topics and I dump cool articles and links in there.

To find what I need, I do a lot of general querying. I used to keep separate external HDs for stuff, but I’ve found this just adds complexity I don’t need, and in one case an HD went dead and I lost a bunch of important photos. I’ve used the Dropbox approach for seven years now and have lost nothing.

My wife and I have Dropbox on all our devices, so we can access anything at any time and photo syncing is automatic. We also have a synced Simplenote account on our phones where we collaborate in real time on grocery lists, plans, movie lists (we chalk up anything we hear of and want to see), and recipes. Plans with other couples are done through a bunch of dedicated WhatsApp groups, which I’m not a huge fan of, but we have yet to find something better.

Important physical documents are scanned (stored somewhere in the ‘Family’ folder) and the docs themselves are filed in a filing cabinet. We have a safety deposit box for paper-wallet crypto, historic family documents and physically small family heirlooms.


👤 softwaredoug
For the longest time, I had this idiotic system where I copied my old “Documents” folder as a subdirectory of the new Documents folder.

After a while, this all ended up archived on an old Linux box in my basement running Samba so everyone could connect to it.

Eventually I gave up and archived this on a our service.

These days I treat anything on a PC as ephemeral. Almost all work is done in GitHub, Google Docs, etc. These are major services, so I’m not too worried in the near term.

Yet I shudder to think what I’m going to do when one of these single points of failure goes away.


👤 prepend
I have a local NAS that isn’t accessible from outside my LAN. I have a few different folders.

I have accounts for family members to access and I manually copy stuff back and forth to Dropbox for things I need remotely.

One flat folder for documents, with every filename prefixed with the relevant date. This is everything from receipts for my car maintenance to passports to mortgages.
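A rough sketch of how that date-prefix habit could be automated, for anyone who wants to try it. Everything here is an assumption rather than the commenter's actual workflow: the function names are made up, and the file's modification time stands in for "the relevant date", which will often be wrong for scans (the date on the document itself is usually what you want). It only plans renames; it never touches a file.

```python
import re
from datetime import date, datetime
from pathlib import Path
from typing import List, Tuple

# A name counts as "already dated" if it starts with YYYY-MM-DD plus a separator.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}[ _-]")


def date_prefixed_name(name: str, relevant: date) -> str:
    """Return the name with an ISO date prefix, unless it already has one."""
    if DATE_PREFIX.match(name):
        return name
    return f"{relevant.isoformat()} {name}"


def plan_renames(folder: Path) -> List[Tuple[Path, Path]]:
    """List (old, new) pairs for undated files in a flat folder.

    Assumption: the modification time is a stand-in for the relevant date.
    Nothing is renamed here; the caller decides what to do with the plan.
    """
    plan = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime).date()
        new_name = date_prefixed_name(path.name, modified)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan
```

Reviewing the plan by eye before calling `old.rename(new)` on each pair is probably the safer way to use it, since a wrong date in a filename is harder to spot later than a missing one.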

One folder for photos with a subfolder for each year.

One folder for taxes with paystubs, filings, and all tax related docs.

One folder for videos with subs for movies, tv, and autotorrented.


👤 dpifke
I use git-annex: https://git-annex.branchable.com/

Repositories are synced (via essentially `git push` and `git pull`) between my laptop and desktop and home server, plus encrypted offline backups on two removable drives which take turns alternating between my safe deposit box and a desk drawer. (All my machines are currently Linux, but nothing has changed between now and when my desktop was Windows/WSL2.) The one time (in ~15 years) I had some corruption (dying laptop SSD), the Git history proved invaluable in figuring out which copies were intact and which had synced the corrupted files.

Files are organized in the repository roughly by topic, e.g. "/Finances/MyBank/Statements/", with the date as part of the file naming convention, e.g. "2022-07-20 MyBank Statement.pdf". This lets me sort by date despite Git not preserving mtime on checkout. Renaming/moving a file is just a metadata update in Git, and Emacs dired mode (https://stackoverflow.com/questions/15881776/emacs-dired-ren...) is my friend for bulk renames.
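To illustrate why the date-in-the-filename convention makes up for the missing mtime, here is a minimal sketch that reads the date back out of names like "2022-07-20 MyBank Statement.pdf" and sorts on it. The helper names are hypothetical, and it assumes the exact "YYYY-MM-DD rest-of-name" layout from the example above.

```python
import re
from datetime import date
from typing import Iterable, List, Optional

# Assumed layout: ISO date, one space, then the rest of the name.
NAME_PATTERN = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (.+)$")


def leading_date(filename: str) -> Optional[date]:
    """Return the date a filename starts with, or None if it has no valid one."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.group(1, 2, 3))
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 40
        return None


def sort_by_leading_date(filenames: Iterable[str]) -> List[str]:
    """Sort dated names chronologically; undated names go last, alphabetically."""
    names = list(filenames)
    dated = [n for n in names if leading_date(n) is not None]
    undated = [n for n in names if leading_date(n) is None]
    return sorted(dated, key=lambda n: (leading_date(n), n)) + sorted(undated)
```

Because ISO dates sort the same way as plain strings, an ordinary `ls` or `sorted()` already gives chronological order for well-formed names; the parsing above mostly earns its keep by catching typos such as an impossible month.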

I use the same system, but different Git repositories/git-annex remotes, for work and personal. For company books, I use beancount (https://beancount.io/), with the ledgers for each account (e.g. "MyBank.beancount") in the same repository. Beancount links (the ^-prefixed markers) make connecting transactions and documents easy. For example, invoices get added as:

  2022-07-01 * "SomeCo, Ltd." "Security Deposit" ^invoice-someco-1234
     Liabilities:AP:SomeCo
     Assets:Prepaid:Deposits:SomeCo              700.00 USD

  2022-07-10 document Liabilities:AP:SomeCo "2022-07-10 SomeCo Invoice.pdf" ^invoice-someco-1234

After the invoice gets paid and the bank statement is reconciled, the checking ledger gets entries that look like:

  2022-07-11 * "SomeCo, Ltd." "ACH Payment" ^invoice-someco-1234 ^statement-checking-2022-07-20
     Assets:MyBank:Checking
     Liabilities:AP:SomeCo                       700.00 USD

  2022-07-20 document Assets:MyBank:Checking "2022-07-20 MyBank Statement.pdf" ^statement-checking-2022-07-20

(This renders links in the web interface for jumping between transaction legs and supporting documents.)
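For a feel of what those ^links buy you outside the web interface, here is a small plain-text sketch that indexes ledger lines by link. It is not beancount's parser or API, just a regex pass over the text, and the names (`entries_by_link`, `SAMPLE`) are invented for illustration. It assumes each directive starts with a date on its own line and that links appear on that same line.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A directive line starts with a date followed by a flag or keyword.
DIRECTIVE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2})\s+(\S+)")
# Links are ^-prefixed tokens; quoted strings are stripped first so a
# caret inside a payee or filename is not mistaken for a link.
QUOTED = re.compile(r'"[^"]*"')
LINK = re.compile(r"\^([A-Za-z0-9\-_/.]+)")

SAMPLE = '''
2022-07-01 * "SomeCo, Ltd." "Security Deposit" ^invoice-someco-1234
  Liabilities:AP:SomeCo
  Assets:Prepaid:Deposits:SomeCo   700.00 USD

2022-07-10 document Liabilities:AP:SomeCo "2022-07-10 SomeCo Invoice.pdf" ^invoice-someco-1234
'''


def entries_by_link(ledger_text: str) -> Dict[str, List[str]]:
    """Map each link name to the 'date kind' of every directive carrying it."""
    index = defaultdict(list)
    for line in ledger_text.splitlines():
        header = DIRECTIVE.match(line)
        if header is None:
            continue  # posting lines and blank lines carry no links
        day, kind = header.groups()
        for link in LINK.findall(QUOTED.sub("", line)):
            index[link].append(f"{day} {kind}")
    return dict(index)
```

Calling `entries_by_link(SAMPLE)` groups the deposit transaction and the invoice document under the same link name, which is roughly the jump the web interface offers. Because the ledger is plain text, `grep invoice-someco-1234 *.beancount` gets you most of the way there too.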

The running theme is Git + plain text, which lets me leverage familiar tools.