Column-oriented MySQL for VLDB

InfiniDB – Open Source BI/Analytic Database

I recently ran across some blog posts about InfiniDB, a MySQL-based DW and BI analytic database from a company called Calpont. I *think* InfiniDB is their only product, though I could be wrong about that.

There are plenty of things to like about InfiniDB: it is multi-threaded and designed for multiple CPUs/cores, ACID compliant, recoverable, supports SQL standards and online DDL, MVCC, dynamic data compression, and it’s FREE! What attracted me first, though, was the open source implementation of columnar storage. That’s the current biggie in VLDB; think Vertica or Oracle’s Exadata.

If you read through the documentation you will see some terms that are very familiar (to Oracle folks, I mean), like blocks, extents and segments. They are even conceptually the same, except that a segment in InfiniDB holds a single column (which can be partitioned across multiple segments), whereas in Oracle a segment is a single table or index (which can also be partitioned).

InfiniDB uses an Extent Map to eliminate the need for many of the structures needed in traditional row-based databases. Here’s a blurb from the documentation that describes it.

The Extent Map provides the ability for InfiniDB to only retrieve the blocks needed to satisfy a query, but it also provides another benefit – that of logical range partitioning.

This is accomplished via the minimum and maximum values for each extent that are contained within the Extent Map. Extent elimination is first accomplished in InfiniDB via the column-oriented architecture (only needed columns are scanned), but the process is accelerated because of this logical horizontal partitioning that is implemented in the Extent Map.

This automatic extent elimination behavior is well suited for series, ordered, patterned, or time-based data where data is loaded frequently and often referenced by time. Near real-time loads with queries against the leading edge of the data can easily show good extent elimination for all of the date/time columns as well as an ascending key value. Any column with clustered values is a good candidate for extent elimination.

That’s pretty cool.
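
To make the idea concrete, here is a minimal Python sketch of min/max extent elimination. It is only an illustration of the concept described in the blurb, not InfiniDB's actual code or on-disk format: the `Extent` class, the `extents_to_scan` function, and the sample date values are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical extent map entry: one extent of one column."""
    extent_id: int
    min_value: int  # smallest column value stored in this extent
    max_value: int  # largest column value stored in this extent


def extents_to_scan(extent_map, low, high):
    """Return only the extents whose [min, max] range overlaps [low, high].

    Every other extent cannot contain a matching row, so it is skipped
    without reading any of its blocks.
    """
    return [e for e in extent_map
            if e.max_value >= low and e.min_value <= high]


# A date column loaded in time order, so each extent covers one month
# (dates encoded as YYYYMMDD integers for simplicity).
extent_map = [
    Extent(0, 20100101, 20100131),
    Extent(1, 20100201, 20100228),
    Extent(2, 20100301, 20100331),
]

# WHERE order_date BETWEEN 20100215 AND 20100220
hits = extents_to_scan(extent_map, 20100215, 20100220)
print([e.extent_id for e in hits])  # only extent 1 needs to be read
```

Because the data was loaded in date order, the ranges barely overlap and most extents drop out. If the same values were scattered randomly across extents, every min/max range would span nearly the whole year and nothing could be eliminated, which is why the documentation singles out clustered and time-based columns.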

You get a MySQL-style interface for running commands. Since it’s based on MySQL, the security is pretty much what you would expect if you are familiar with that database. Also, since it’s MySQL, you can have a single instance of the binaries and create/run multiple databases within that instance.

It will be interesting to see some performance benchmarks over time. With Amazon providing a hosted cloud version of MySQL and now this, it looks like MySQL and MySQL derivatives are well positioned for the future.

Take care,

LewisC
