Of late, I've been trying to wrap my head around "Cloud Computing," as I'm sure it's the next big thing coming down the itube. In fact, it's already here, as evidenced by the number of big players offering cloud services. Just when I think it can't get better, in steps Jim Starkey, the guy who created InterBase, which became part of Borland's product line and eventually gave rise to the Firebird RDBMS. He also made InterBase the first relational database to support multi-versioning, the blob column type (see The true story of BLOBs), event alerts, arrays and triggers. So when Mr. Starkey talks, we should listen.
Starkey started a new company with the express purpose of creating a database that will run in the cloud. What this essentially means is that the database and the underlying tables of data will reside in RAM! This is an old concept (remember loading programs into memory on the old IBM PCs to enhance performance?) retooled for the big itube in the sky. It makes perfect sense to go this route when one considers the amount of data accumulating on the internet in the last 5 years. Consider that in 2002, 532,897 terabytes of new data flowed across the Internet; in 2009 it's at least double that.
Recently, Starkey's insight into this was covered in a great article over on the highscalability.com website. The article discusses Cloud Based Memory Architectures, which envision building out architecture within the cloud, thus treating the cloud as a platform itself. Starkey's contribution to this is his newest creation, NimbusDB, a relational database designed to reside in the cloud. Interestingly enough, Firebird acquired NimbusDB in April 2009 and is planning to leverage this idea to make the Firebird database a leader in web development.
So, what will all of this give us simple web programmers?
- A scalable relational database in the cloud where you can use normal everyday SQL to perform summary functions, define referential integrity, and all that other good stuff.
- Transactions scale using a distributed version of MVCC, which I do not believe has been done before. This is the key part of the plan and a lot depends on it working.
- The database is stored primarily in RAM, which makes cloud-level scaling of an RDBMS possible.
- The database will handle all the details of scaling in the cloud. To the developer it will look like just one very large, highly available database.
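To make the "plain SQL on a RAM-resident database" idea a bit more concrete, here is a minimal sketch. NimbusDB isn't something I can run, so I'm using Python's built-in sqlite3 module with an in-memory database as a stand-in. The table names and the helper functions (build_demo_db, comment_counts, try_orphan_comment) are my own inventions for illustration, and a single-process SQLite database says nothing about how NimbusDB would scale.

```python
import sqlite3


def build_demo_db():
    # ":memory:" keeps the whole database in RAM; nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE posts (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE comments (
            id      INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL REFERENCES posts(id),
            body    TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO posts (id, title) VALUES (1, 'Databases in the cloud')")
    conn.executemany(
        "INSERT INTO comments (post_id, body) VALUES (?, ?)",
        [(1, "First!"), (1, "Nice write-up")],
    )
    conn.commit()
    return conn


def comment_counts(conn):
    # An everyday summary query: number of comments per post.
    rows = conn.execute("""
        SELECT p.title, COUNT(c.id)
        FROM posts p LEFT JOIN comments c ON c.post_id = p.id
        GROUP BY p.id
    """)
    return rows.fetchall()


def try_orphan_comment(conn):
    # Referential integrity: a comment pointing at a missing post is rejected.
    try:
        conn.execute("INSERT INTO comments (post_id, body) VALUES (99, 'orphan')")
    except sqlite3.IntegrityError:
        return False
    return True
```

Calling comment_counts(build_demo_db()) gives one row per post with its comment count, and try_orphan_comment returns False because the database itself refuses the bad row. That is the kind of "other good stuff" we would hope to keep in the cloud.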
I'm not sure if NimbusDB will support a compute grid and map-reduce type functionality. The low latency argument for data and code collocation is a good one, so I hope it integrates some sort of extension mechanism.
Why might NimbusDB be a good idea?
- Keeps simple things simple. Web scale databases like BigTable and SimpleDB make simple things difficult. They are full of quotas, limits, and restrictions because by their very nature they are just a key-value layer on top of a distributed file system. The database knows as little about the data as possible. If you want to build a sequence number for a comment system, for example, it takes complicated sharding logic to remove write contention. Developers are used to SQL and are comfortable working within the transaction model, so the transition to cloud computing would be that much easier. Now, to be fair, who knows if NimbusDB will be able to scale under high load either, but we need to make simple things simple again.
- Language independence. Notice that the IMDG (in-memory data grid) products are all language specific. They support some combination of .Net/Java/C/C++. This is because they need low level object knowledge to transparently implement their magic. This isn't bad, but it does mean if you use Python, Erlang, Ruby, or any other unsupported language then you are out of luck. As many problems as SQL has, one of its great gifts is programmatic universal access.
- Separates data from code. Data is forever, code changes all the time. That's one of the common reasons for preferring a database instead of an objectbase. This also dovetails with the language independence issue. Any application can access the data from any language and any platform, now and into the future. That's a good quality to have.
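The "complicated sharding logic" mentioned in the first point above is worth seeing once. Below is a rough sketch of a sharded counter, the usual workaround for write contention on a key-value store. The store here is just a Python dict, and the ShardedCounter class and its key naming scheme are hypothetical, not the API of BigTable, SimpleDB, or any real product.

```python
import random


class ShardedCounter:
    """Counter split across several keys in a key-value store (a dict here)."""

    def __init__(self, store, name, num_shards=8):
        self.store = store
        # Hypothetical key layout: one key per shard.
        self.keys = [f"{name}:shard:{i}" for i in range(num_shards)]
        for key in self.keys:
            store.setdefault(key, 0)

    def increment(self, rng=random):
        # Each writer picks a random shard, so concurrent writers
        # rarely contend on the same key.
        key = rng.choice(self.keys)
        self.store[key] += 1

    def value(self):
        # Reading the total means fetching and summing every shard.
        return sum(self.store[key] for key in self.keys)
```

Note that this only yields a total count. It does not hand out ordered, gap-free sequence numbers, which is harder still. In a relational database the same job is typically a sequence or an auto-increment column, which is the "simple things simple" argument in a nutshell.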
The smart money has been that cloud level scaling requires abandoning relational databases and distributed transactions. That's why we've seen an epidemic of key-value databases and eventually consistent semantics. It will be fascinating to see if Jim's combination of Cloud + Memory + MVCC can prove the insiders wrong.
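For anyone who hasn't run into MVCC before, here is a toy sketch of the basic idea: writers add new versions instead of overwriting, and each reader sees the data as of the moment its snapshot was taken. This is a single-process illustration only. The MVCCStore class is my own made-up example, it is not distributed, and it makes no claim about how NimbusDB implements any of this.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: writers append versions, readers use a snapshot."""

    def __init__(self):
        self._versions = {}               # key -> list of (commit_ts, value), oldest first
        self._clock = itertools.count(1)  # monotonically increasing commit timestamps
        self._last_commit = 0

    def begin(self):
        # A reader's snapshot is the latest commit timestamp when it starts.
        return self._last_commit

    def write(self, key, value):
        # Every write commits immediately as a new version; old versions are kept.
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def read(self, key, snapshot):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot:
                return value
        return None
```

A reader that took its snapshot before a later write keeps seeing the older value, so readers never block writers. Doing this across many machines in a cloud, with real transactions and conflict handling, is the hard part.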
The fact that Firebird exposes JDBC drivers bodes well for NimbusDB, as this will give us ColdFusion hackers a new avenue for data development in the coming cloud architecture.