The database is an implementation detail
"The database is an implementation detail - agree?"
That should get some reactions!!!
And the answer is "sort of".
Some back-story
But first a bit of background.
When I was a kid there were some books or magazines or something that had a series of puzzles that I used to love.
They would describe something totally ordinary, but from the perspective of an alien who had no idea about how life on earth worked. I adored the idea of trying to figure out stuff, that we totally took for granted, but looking at it from another perspective.
Also, when I was a kid, my favourite book was "The Hitch-hiker's Guide to the Galaxy". It's still one of my favourites. On the surface, it's a silly sci-fi tale, but underneath, it's a deep book on philosophy[1] (and I ended up doing philosophy[2] at university). My favourite bit was, and still is, about the dolphins.
“For instance, on the planet Earth, man had always assumed that he was more intelligent than dolphins because he had achieved so much - the wheel, New York, wars and so on — whilst all the dolphins had ever done was muck about in the water having a good time. But conversely, the dolphins had always believed that they were far more intelligent than man — for precisely the same reasons.”
The same facts - dolphins muck about in the water while humans do important stuff - but two entirely different conclusions.
Because the same thing looks different depending upon your perspective[3].
What does this have to do with databases?
So far, we've been building up a series of "feature classes" that describe various bits of logic in our application. But we've pretty much ignored the database. Now we're at the stage where we need to start considering it.
But, from the point of view of the core functionality, the database does not matter. All the business logic cares about is that "we have a user object with a can?
method" and "we have a user-group object with a grant_access_to
method". These objects could be stored in database tables, they could be stored in flat files, in memory or in Redis. The business logic - what the app is actually for - doesn't care about storage. It only cares whether it can tell the objects it meets to do certain things and provide the responses that it expects.
However, now we're going to have a perspective shift.
Because, in order to get our original "feature" specification to pass, we need to start building the real-life objects that do the work on behalf of this business logic. And that means storage, it means user-interface, it means efficiency and it means performance.
Resets and configurations
The code I was showed you last time didn't work.
I had to do a load of tedious mucking about[4] because my locator and the events publisher both needed to be cleared out after every test. This was so previous registrations did not interfere with the current test.
I added a global after
block that called StandardProcedure.reset!
, which in turn removed all event subscribers from StandardProcedure::Events
, cleared out all registered objects from the locator, then reloaded all the defaults.
This in turn meant I had to add in a configure
option, which ended up being more complicated than I first expected. Calling StandardProcedure.configure
yields a block, passing in an instance of the StandardProcedure
object. That block populates the locator with the relevant objects - but crucially, it is not called until the StandardProcedure
instance is created. Because the locator is configured from multiple places (for example, each of our feature objects handles their own configuration), the configure
method actually stores an array of configuration blocks and calls them all, one by one, when required.
This ended up far more work than I was first expecting (about two hours), but sometimes your test suite needs privileged access to your objects so it can do what it needs.
The database structure
Our features mention objects with names like user
, user_group
, folder
, organisation
and permission
. Now we need to start thinking about what these actually are.
At this point, I've already made some deployment decisions about this system. Those decisions may be wrong, but we need to start somewhere and part of the idea of working this way is to try and make it easy to change our minds later[5].
I'm going to use sqlite3
for my data storage[6] because my clients are small organisations and, for the past five years, none of these systems have had to deal with continued heavy load. This means deployment becomes much simpler, as I don't need to worry about managing multiple servers. In fact, I'm hoping I can get away with a single Docker container per client[7], which using Dokku and Anvil, is incredibly easy to set up and maintain.
The file-system
I also know that this system is going to have some sort of "filesystem" represented within the database. Most businesses are very comfortable with the idea of putting documents into folders - whether they are stored on a network share or in Dropbox. And most projects fail, not for technical reasons, but for people reasons. Meet your audience where they are and gradually guide them forwards.
So, I'm making a design decision right now.
The system is going to have a hierarchy, built from "nodes". Folder
s will be a subclass[8] of Node
, as will Organisation
. I will use the Ancestry gem to organise the nodes into a hierarchy[9] and each node will have an optional, polymorphic, item
reference, which I'll talk about later.
Let me stress again - the business logic we've written so far does not care how the database is structured. So, if I get this wrong, I can change my mind and try something different[10] without worrying that I will be breaking everything that I've written so far.
So, I'm adding in a Node
base class, in a nodes
table. There will be a subclass of Node
called Folder
and a subclass of Folder
called Organisation
[11].
class Node < ApplicationRecord
include HasName
include IsActive
enum node_status: {primary: 0, linked: 1}
has_ancestry cache_depth: true, counter_cache: true
belongs_to :item, polymorphic: true, optional: true
has_many :permissions, dependent: :destroy
has_rich_text :description
def node
self
end
def folders
Folder.active.where(id: child_ids)
end
def organisations
Organisation.active.where(id: child_ids)
end
end
Don't worry about the HasName
and IsActive
modules - they simply define a to_s
method for models with a name
attribute and an enum status: {active: 0, inactive: -1}
so I have a simple "soft-delete"[12]. The folders
and organisations
methods are little helpers to filter the children
of this Node
(using Ancestry's child_ids
method).
We also want something representing a folder that is sym-linked across to another folder, so I'll add in a subclass of Folder
called LinkedFolder
. The special thing about LinkedFolder
is it's item
property points to the source Folder
and it's children
and child_ids
methods[13] returns the children
and child_ids
from that source folder.
I had a fancy idea to try and mimic how Unix filesystems work using hard-links. In Unix filesystems, there is a hierarchy of directory entries - each of these represents a "file" (which could be a directory, a volume, an actual file or something like a socket or process). The directory entry itself then points to an "inode" which represents the actual contents of that directory entry. So if you want to hard-link a file, say "example.txt", to appear in two separate directories, you would create two directory entries that both point to the same node. This is what the item
property on a Node
represents - although it's used slightly differently for LinkedFolder
s[14].
In the feature specification, we have a setup line @policy = a_file_attachment parent: @policies_folder
. The FileAttachment
is another ActiveRecord model, in the file_attachments
table and the class uses has_one_attached :file
to manage the actual file contents.
I created a ruby module called app/models/concerns/has_nodes.rb
that looks like this:
module HasNodes
extend ActiveSupport::Concern
included do
has_many :nodes, as: :item, dependent: :destroy
after_create :create_primary_node
end
def node
@node ||= nodes.primary.first
end
def parent_node
node.parent
end
def parent
@parent ||= parent_node&.item
end
def parent= item
raise ArgumentError.new("#{item} is not part of the node hierarchy") if item.present? && !item&.respond_to?(:node)
@parent = item
end
def child_nodes
node.children.active
end
def children
child_nodes.collect(&:item)
end
private
def create_primary_node
nodes.create! name: name, icon: icon, parent: @parent&.node
end
end
The specification for HasNodes
looks like this:
require "rails_helper"
RSpec.describe HasNodes, type: :model do
# standard:disable Lint/ConstantDefinitionInBlock
class HasNodes::FakeModel < ApplicationRecord
include HasNodes
self.table_name = :fake_models
end
# standard:enable Lint/ConstantDefinitionInBlock
before do
ActiveRecord::Base.connection.create_table :fake_models, force: true do |t|
t.string :name, default: ""
t.string :icon, default: "none"
end
end
context "when inserted into the hierarchy" do
it "is added to the top of the hierarchy" do
model = HasNodes::FakeModel.create! name: "My model", parent: nil
expect(model.parent_node).to be_nil
expect(model.parent).to be_nil
end
it "it added as the child of an existing item" do
parent_model = HasNodes::FakeModel.create! name: "Parent", parent: nil
child_model = HasNodes::FakeModel.create! name: "Child", parent: parent_model
expect(child_model.parent_node).to eq parent_model.node
expect(child_model.parent).to eq parent_model
end
it "it has child items" do
parent_model = HasNodes::FakeModel.create! name: "Parent", parent: nil
first_child_model = HasNodes::FakeModel.create! name: "First Child", parent: parent_model
second_child_model = HasNodes::FakeModel.create! name: "Second Child", parent: parent_model
expect(parent_model.child_nodes.count).to eq 2
expect(parent_model.children.count).to eq 2
expect(parent_model.children).to include first_child_model
expect(parent_model.children).to include second_child_model
end
it "copies its details to the node" do
model = HasNodes::FakeModel.create! name: "My model", parent: nil
expect(model.node).to_not be_nil
expect(model.node.name).to eq model.name
expect(model.node.icon).to eq model.icon
end
it "can only be added as the child of an item that HasNodes itself" do
non_node_object = Object.new
expect { HasNodes::FakeModel.create! name: "My model", parent: non_node_object }.to raise_error(ArgumentError)
end
it "automatically saves the node to the database when being assigned to an item that has been saved" do
existing_model = HasNodes::FakeModel.create! name: "My model"
node = existing_model.node
expect(node).to_not be_new_record
end
end
end
An object that implements HasNodes
has a primary Node
which is where it is "anchored" within the filesystem. It can also have an arbitrary number of other Node
s, which represent "hard-links" to that object. This means it can appear in the filesystem in multiple places. When you create a HasNodes
object, you can pass in a parent
object, which is either a Node
or another HasNodes
[15]. This tells HasNodes
where to this instance belongs within the hierarchy.
What this all means is that there are four classes of model within the system.
- There are some, like
User
andUserGroup
that sit outside of the filesystem hierarchy. - There are some, like
Folder
andOrganisation
that are descendants ofNode
and define the filesystem hierarchy. - There are some objects, like
FileAttachment
that are within the filesystem hierarchy by virtue of being attached to one or more nodes[16]. - And there are "pure"
Node
s (not subclasses) which position those objects within the hierarchy.
At the moment, I don't know if this is good design or over-engineering. The next few specifications will tell me - if they're hard to implement I should probably revisit this. But, once again, the business logic does not care, so I'm free to change my mind if it turns out I'm going down the wrong path.
The audit trail
After spending a good few hours sorting the reset!
and configuration
stuff, writing the migrations and models for the database[17] and speccing and debugging the HasNodes
module, I ran the full specification again. And now it fails on the final two steps:
When I look at the audit trail
Then I should see a log of how access to this folder was granted
I've written audit trails like these several times across several systems and I think I know how I need it to work.
But that's a tale for another day.
Like most of Douglas Adams' work. He was a philosopher at heart. ↩︎
Probably the only academic thing I've done that I've really enjoyed. It also turned out to be most valuable thing I've learnt, as it taught me how to manipulate abstract ideas - very important for software developers - and how to communicate those ideas to others - which is essential for pretty much everything you do. ↩︎
This is why discovering object-orientation was also the moment when programming clicked in my head. When you're in Object A, how you see things and what you can do is totally different to when you're in Object B. ↩︎
Another Hitch-Hiker's reference - "The Infinite Improbability Drive was a wonderful new method of crossing interstellar distances in a mere nothing'th of a second, without tedious mucking about in hyperspace." ↩︎
Defer the big decisions till as late as possible, because the start of the project is the time when you have the least understanding. ↩︎
In fact, I'm going to use LiteStack so everything runs off
sqlite3
- including ActiveJob and caching - and we should have no external dependencies such as Redis or Memcached. ↩︎I know lots of people will be screaming REDUNDANCY and HIGH AVAILABILITY at me. If this were Collabor8Online, or one of my larger clients who put tens of thousands of pounds worth of business through the system every day, then there's no way I would make these choices. But I know the StandardProcedure audience very well and they don't need that complexity. They've already been running on a single, small, servers for years and the odd hour of downtime is something they can live with. ↩︎
Using ActiveRecord's Single Table Inheritance ↩︎
The simplest thing with hierarchies is to use a
parent_id
field, but Ancestry uses apath
field that allows you to pull an entire subtree from the database in a single SQL query. And that is really valuable and an example of where our current "real-world" perspective has an impact on the design. ↩︎At least until I've put it live and my clients have got some real data in the system - at which point changing the database structure becomes a risky activity. ↩︎
I'm actually putting in a further subclass of
Organisation
calledOrganisation::Home
- if you look back at the specification, the system wants to copy folders from the "home" organisation to the newly created client organisation. TheOrganisation::Home
class handles this, and uses an ActiveRecordvalidates :type, uniqueness: {case_insensitive: true}
constraint to ensure that there is only ever one "home" organisation in the hierarchy. ↩︎As soon as you start creating audit trails (which we'll get to later) physically deleting records becomes a problem - not least because if something bad happens, you need to be able to see the actual things that caused the issues ↩︎
Which is added by the Ancestry gem and, by default, returns the children of this record from the tree. ↩︎
I read that Unix originally treated directories as inodes. The directory entry would point to an inode representing the directory contents - so the same directory could appear at multiple points in the filesystem. But it ended up causing a load of problems, so the Unix people decided to use soft-links rather than hard-links for directories. Instead of assuming I know better than them, I copied their strategy and "soft-linked" my
LinkedFolder
instead of "hard-linking" the sameFolder
object into multiple places. ↩︎Reading the spec back, I've realised I missed the bit about having a parent
Node
- it only checks for parentHasNode
s. So I'll have to go back and update that. ↩︎An inode in Unix terms ↩︎
I don't think I wrote a single unit test for the database models. I'm treating these as "dumb data objects" with very minimal behaviour. Any methods that they do have will be covered by a combination of the other sets of specs. This is probably frowned on by "clean code" purists but I suspect that, if I come across a method that needs a test writing, I'll recognise it as I start. ↩︎