Version 0.1.0
This blog post is to earmark the initialisation of the various sub-systems and tools that I will be using to gain better control over my data.
Section 1 - Unified Universal File System Structure
Section 1, Sub Section 0 - UUFSS Intent
Unified Universal File System Structure (or UUFSS for short) is more of a methodology than an actual piece of software.
My motivation for creating it can be laid out as follows:
- I am sick and tired of having data 10, 20, even in some cases 30 directories deep.
- I am sick of having to constantly double-check myself (Does file X belong in Y or Z?)
- I am sick of having scattered backups, not knowing if a file really is backed up or if it's just copied somewhere on the same storage location.
- Not only that, but I'm annoyed by having multiple underlying file systems that work fundamentally differently under the hood (currently BTRFS, with a single storage location using NTFS).
- Likewise, I struggle with making sure that my data is easily consumable, and UUFSS is a means to fix that issue.
Section 1, Sub Section 1 - UUFSS Constraints & Assumptions
UUFSS needs constraints and assumptions so that it does not fall out of scope or attempt to do things that another tool should cover.
Assumption #1 - People are fallible, tired and inconsistent.
This is true, and it is exactly why I wanted to make the UUFSS in the first place. A clear directory structure and automation scripts, alongside a rigid ruleset, help ensure that the UUFSS is strictly adhered to and, if it is broken, that a system can understand what broke it, why it is broken and what needs to be done to fix it.
Assumption #2 - Storage is heterogeneous by nature and unreliable over time.
Again, this is true, and having something like UUFSS in place helps combat that, so that eventually integrating something like the 3-2-1 backup strategy or the grandfather-father-son (GFS) backup strategy is as painless as flicking on a light switch.
Assumption #3 - The structure should explain itself without external tools.
Whilst this is true, and the top-level directory structure will go a long way towards that understanding, the lower-level directories are where the bulk of the file classification happens.
Section 1, Sub Section 2 - UUFSS Structure
As architected, the UUFSS assumes BTRFS as the base file system, with @ being an easy way to denote a subvolume of the root file system.
$UUFSS/
├── @
├── @adult/
│ ├── archives
│ ├── comics
│ ├── dojin
│ ├── games
│ ├── hentai
│ ├── images
│ ├── literature
│ └── video
├── @archives/
│ ├── assets
│ ├── backups
│ ├── virtual-disc
│ └── virtual-hard-drive
├── @audio/
│ ├── audiobooks
│ ├── music
│ ├── playlists
│ ├── podcasts
│ ├── recordings
│ ├── soundfx
│ └── soundtracks/
│   ├── film
│   ├── television
│   └── video-game
├── @documents
├── @images/
│ ├── animated
│ ├── artwork/
│ │ ├── digital-art
│ │ ├── drawings
│ │ ├── logos
│ │ ├── paintings
│ │ └── sculpture
│ ├── camera
│ ├── charts/
│ │ ├── area-charts
│ │ ├── bar-charts
│ │ ├── bubble-charts
│ │ ├── flow-charts
│ │ ├── infographics
│ │ ├── line-charts
│ │ ├── maps
│ │ ├── matrices
│ │ ├── mind-maps
│ │ ├── miscellaneous
│ │ ├── organisational-charts
│ │ ├── pie-charts
│ │ ├── sankey-diagrams
│ │ ├── spiral-diagrams
│ │ ├── tree-maps
│ │ ├── venn-diagrams
│ │ └── word-clouds
│ ├── memes
│ ├── photos/
│ │ ├── family
│ │ ├── friends
│ │ ├── other
│ │ └── personal
│ ├── purpose-based
│ └── screenshots
├── @inbox
├── @literature
├── @projects/
│ ├── 00-09 Management & Meta/
│ │ ├── 00 Index
│ │ ├── 01 Inbox
│ │ ├── 02 Notes
│ │ ├── 03 ToDos
│ │ ├── 04 Bookmarks
│ │ ├── 05 Later
│ │ ├── 06 Templates
│ │ └── 09 Archives
│ ├── 10-19 Active Primary Projects/
│ │ ├── 10 Active Primary Project Documentation
│ │ ├── 11 Project Sabre
│ │ ├── 12 Project Clean Slate
│ │ ├── 13 Project AMELHA
│ │ ├── 14 Project Nova
│ │ ├── 15 Project Volpe8
│ │ ├── 16 Project Volpe Master
│ │ ├── 17 Project Ghost Volpe
│ │ ├── 18 Project Mitsuo Volpe
│ │ └── 19 Project Other Volpe
│ └── 20-29 Active Secondary Projects/
│   ├── 20 Active Secondary Project Documentation
│   ├── 21 Sub Project Dark Cherry
│   ├── 22 Sub Project Celestial Ingress
│   ├── 23 Sub Project Umbra Forest
│   └── 24 Project Xenia
├── @software/
│ ├── applications
│ ├── firmware
│ ├── scripts
│ ├── source
│ ├── systems
│ └── typefaces
└── @video/
  ├── Anime - TV
  ├── Anime - Movies
  ├── Cartoons
  ├── Collections
  ├── Movies
  ├── Multimedia Franchises
  └── Television
References:
- Roboyoshi's Datacurator Filetree
  - Used for the base file system
- Johnny Decimal
  - Used for the root directory @projects
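As a rough illustration of the tree above, the top-level domains could be laid down with a small script. This is only a sketch: the function name `create_skeleton` is made up for the example, and it creates ordinary directories rather than BTRFS subvolumes (on a real BTRFS root, `btrfs subvolume create` would presumably take the place of the `mkdir` call).

```python
from pathlib import Path

# Top-level domains copied from the tree above ("@" itself is the root subvolume).
TOP_LEVEL = [
    "@adult", "@archives", "@audio", "@documents", "@images",
    "@inbox", "@literature", "@projects", "@software", "@video",
]


def create_skeleton(root: Path) -> list[Path]:
    """Create any missing top-level directories and return the ones created."""
    created = []
    for name in TOP_LEVEL:
        target = root / name
        if not target.exists():
            # Assumption: a plain directory stands in for a BTRFS subvolume here.
            target.mkdir(parents=True)
            created.append(target)
    return created
```

Running it twice against the same root is harmless: the second call finds everything in place and returns an empty list.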
Section 1, Sub Section 3 - UUFSS Rules
Rules for UUFSS are as follows:
- Every file must have a single canonical home.
- Top-level directories are semantic, not technical.
- No user-created files live directly at the root.
- Lower levels may vary by domain, but must remain deterministic.
- Filenames must remain meaningful without metadata.
- Automation may assist, but it must not act silently without user input.
- Readability is prioritised over optimal packing.
- Duplication must be intentional and visible.
- Archives are cold storage by definition.
- The structure must degrade gracefully if UUFSS is abandoned.
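Two of the rules above lend themselves to a mechanical check: no files directly at the root, and no runaway nesting. The sketch below is one possible shape for such a check, not the actual UUFSS tooling. The name `check_tree` and the depth limit of 9 are assumptions made for the example, and the function only reports problems, in keeping with the rule that automation must not act silently.

```python
import os
from pathlib import Path

MAX_DEPTH = 9  # hypothetical limit, chosen only for this example


def check_tree(root: Path, max_depth: int = MAX_DEPTH) -> list[str]:
    """Return human-readable rule violations; nothing is moved or deleted."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        depth = len(rel.parts)
        if depth == 0:
            # Rule: no user-created files live directly at the root.
            for name in sorted(filenames):
                problems.append(f"file at root: {name}")
        elif depth > max_depth:
            problems.append(f"too deep ({depth} levels): {rel.as_posix()}")
            dirnames.clear()  # do not report every descendant as well
    return problems
```

The returned list can be printed or appended to a report, leaving the decision about what to do with each violation to the user.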
Section 1, Sub Section 4 - General Questions
- "At what point does a top-level domain become too broad and need to be split?"
"In theory, the base top-level domains shouldn't need to be split any further."
- "Is there a maximum acceptable depth for directory trees?"
"I would say anything under double digits is more than fine."
- "Are there any domains that should never exist at the root, even temporarily?"
"Again, in theory, you shouldn't need anything more than the top-level domains listed."
- "How should canonical status be indicated when multiple identical copies exist?"
"Well, in theory, if you are doing backups correctly, the canonical copy is denoted by its storage location. For example, if you have one copy on your daily driver PC, a second on your NAS and a third with the cloud provider of your choice, the canonical one is obviously the one on your daily driver."
- "Is deduplication a file system concern, a tooling concern or explicitly out of scope for UUFSS?"
"Well, in my opinion, deduplication is a tooling concern, not a UUFSS concern. How you do backups is always up to you."
- "How does data transition between active, archived and deprecated states?"
"This is actually a fascinating problem for UUFSS. Personally, I would assume anything older than a few months would fall into the archived state and deprecated is just an archived project that hasn't had an update in months."
- "Is 'archive' a terminal state, or can items be promoted back to active domains?"
"In my honest opinion, not all data should be archived; images and video come to mind. Documents, however, should be archived very aggressively, and this will be reflected in the tooling that I will be using for UUFSS."
- "Should time-based organisation ever override semantic organisation?"
"I do agree that some domains lend themselves to time-based organisation much better than others, especially images. Most people would group images in a way like London Trip 2024, so having it dealt with on a per-domain basis is the best way to go about it."
- "What actions are safe to fully automate without confirmation?"
"I would say anything to do with verification, be that directory names or file hashing, and probably generating reports about a user's file tree too."
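As an example of the kind of read-only verification that answer has in mind, the sketch below hashes every file under a directory and reports groups with identical content. The function names are invented for the example, and SHA-256 is an assumption; the post does not name a hash algorithm.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded into memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_report(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to the files sharing it, keeping only duplicates."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Because it only reads files and builds a report, a check like this fits the "safe to automate" category; deciding which copy is the intentional one stays with the user.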
- "How should tools surface uncertainty or low confidence classification?"
"This part is easy, just add it to the end of the generated reports."
- "Should tooling enforce rules strictly or merely warn on violation?"
"In my opinion, tooling should do both, plus a third thing: it should ignore a violation if the end-user has specifically said ‘I want you to ignore this rule.’ Which of the three applies depends on how integral the rule is to the UUFSS."
- "How much manual friction is acceptable during ingestion?"
"In theory, there shouldn't be any friction when it comes to ingesting data."
- "When speed and correctness conflict, which should default?"
"It should always default to correctness over speed."
- "What signals should distinguish 'missing', 'offline' and 'deleted' data?"
"Offline is pretty easy to work into tooling: just use system calls to check whether a location is mounted and, if not, the location is offline. Missing would have to rely on whether a file was moved and then re-added, and deleted data would just be a file that has been completely removed from all storage locations ... Although that would mean the user would also have to remove it from backup storage locations."
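One way the three states from that answer might be told apart is sketched below. The `Status` enum, the `classify` function and its parameters are all hypothetical, and the definition of "missing" is a simplification: here it means the file is gone from the canonical location while a backup copy survives, or while a backup is offline and deletion cannot be confirmed. The mount check defaults to `os.path.ismount`, but is passed in as a parameter so the logic can be exercised without real mount points.

```python
import os
from enum import Enum
from pathlib import Path
from typing import Callable


class Status(Enum):
    PRESENT = "present"
    OFFLINE = "offline"
    MISSING = "missing"
    DELETED = "deleted"


def classify(
    relative: str,
    canonical: Path,
    backups: list[Path],
    is_online: Callable[[Path], bool] = os.path.ismount,
) -> Status:
    """Classify one file, given its path relative to each storage location."""
    if not is_online(canonical):
        return Status.OFFLINE
    if (canonical / relative).exists():
        return Status.PRESENT
    online_backups = [b for b in backups if is_online(b)]
    if any((b / relative).exists() for b in online_backups):
        return Status.MISSING
    if len(online_backups) == len(backups):
        # Every location was checked and none holds the file.
        return Status.DELETED
    # Assumption: with a backup offline, deletion cannot be confirmed.
    return Status.MISSING
```

With the default `is_online`, each location has to be an actual mount point; a directory that merely exists on the system disk would be treated as offline.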
- "How should corruption or partial loss be documented?"
"It should be documented in the report that is generated upon the tooling finishing ingesting all of the data in the storage location."
- "If UUFSS was abandoned, what minimal conventions must remain for the data to stay legible?"
"For the most part, nothing would change if the UUFSS was abandoned per se."
- "Are there existing tools or ecosystems that UUFSS should remain compatible with even at a cost?"
"Okay, so ... yes and no. What I mean by that is that UUFSS uses some tooling (FileBot/Kid3/Tellico/Czkawka/etc.), but those tools don't necessarily rely on UUFSS to get the job done per se."
- "Is strict backwards compatibility a goal, or is controlled breakage acceptable?"
"For the most part, backwards compatibility is easy, but it's not a stated goal for this ... Project? Tooling? ... Whatever."