Nearly every application you ever write is going to need to store and retrieve data. It wouldn’t be much use if it was just a blank slate every time you used it.
But what, where and how you store that data is a big deal. Because arguably, the data is the single most important thing in almost every application.
Rails was originally built to work with relational databases. Things like MySQL, Postgres, Oracle (or in my case, when I first got started, Microsoft SQL Server). However, over the last few years, a whole host of alternative databases have sprung up (commonly called NoSQL), and people have figured out ways of making them work with Rails too. Some examples are MongoDB, Redis and Riak.
So which database should you choose?
Well, first of all we need to understand what we’re talking about here.
Relational databases have been around for a long time. Back in the 1970s, a load of work was done to build some principles around data that followed certain structures. By following a set of rules you could guarantee that the data was accessible in certain ways and you could perform certain operations on that data that would always work. There was a whole load of maths underlying this and the language used for accessing this data was known as the Structured Query Language – or SQL.
NoSQL databases are defined by their opposition to SQL. They each have different characteristics, different strengths and weaknesses and are intended for different purposes. MongoDB is a “document-orientated” database – each item is a document with no particular structure. Redis is a “network available data-store” – each item is a defined data-structure, with little or no relation to other items. Riak is a “highly-available key-value store” that can scale incredibly easily. And so on. Apart from the fact that they are not relational and don’t use SQL, they have little in common.
And so we return to the question of which to choose … and, as ever, the answer is it depends.
If your data is unstructured but has relationships between individual items then you probably want Mongo or something similar. It’s kind of like having a network file-system where files can contain anything, including links to other files.
If your data is structured but each item is unrelated to the rest and you need to access it quickly, Redis is for you. It’s incredibly well written, it’s incredibly fast and I trust it (unlike certain other NoSQL databases I won’t mention here). But Redis’ flaw was, until very recently, that it didn’t scale – it would sit on a single box on your network and hold its information in memory. That has changed with Redis Cluster, but that does change some of its characteristics.
Which is where Riak comes in – if you’ve got a small amount of unstructured data, Riak is a waste of time. But as you scale to billions of items Riak is designed to let you add in new servers at any time to expand the capacity of your system and deal with server-failure gracefully.
There are many other NoSQL systems, but I have no experience with them.
Why would you choose a relational (SQL) database?
Well, if your data is structured and has relations to other bits of data (which is where the name comes from) then this is the one for you. You define “tables” of data, each with columns of a given type – so the size of each row of data is known. Certain columns can refer to other pieces of data (foreign keys) and rules can be attached defining what happens when those relations are added or removed. And nearly every relational database engine has a system of indexing, allowing certain data to be accessed very quickly.
The real strength of relational databases come in the relations. Suppose you have a customer who places many orders for products at your store. Now you want to find every product a given customer has bought; that’s a simple query for a relational database but may or may not be trivial for a NoSQL database (you would need to maintain the links yourself). This may not count for much today, but when your customer asks you to write a sales report in six months it could be invaluable.
In twenty years of software development, for me there have been very few cases where a relational database haven’t been the right answer. Sometimes, augmented with a touch of NoSQL on the side, but it’s rarely the wrong choice, unless you’re doing something unusual. It may take a bit longer to set up, it may be harder to design, but those bits of maths that Mr Codd and his friends came up with fifty years ago still hold good today.