Brian Aker on post-Oracle MySQL

A deep look at Oracle's motivations and MySQL's future

Brian Aker parted ways with the mainstream MySQL release, and with Sun Microsystems, when Sun was acquired by Oracle. These days, Aker is working on Drizzle, one of several MySQL offshoot projects. In time for next week’s MySQL Conference & Expo, Aker discussed a number of topics with us, including Oracle’s motivations for buying Sun and the rise of NoSQL.

The key to the Sun acquisition? Hardware:

Brian Aker: I have my opinions, and they’re based on what I see happening in the market. IBM has been moving their P Series systems into datacenter after datacenter, replacing Sun-based hardware. I believe that Oracle saw this and asked themselves “What is the next thing that IBM is going to do?” That’s easy. IBM is going to start pushing DB2 and the rest of their software stack into those environments. Now whether or not they’ll be successful, I don’t know. I suspect once Oracle reflected on their own need for hardware to scale up on, they saw a need to dive into the hardware business. I’m betting that they looked at Apple’s margins on hardware, and saw potential in doing the same with Sun’s hardware business. I’m sure everything else Sun owned looked nice and scrumptious, but Oracle bought Sun for the hardware.

The relationship between Oracle and the MySQL Community:

BA: I think Oracle is still figuring things out as far as what they’ve acquired and who they’ve got. All of the interfacing I’ve done with them so far has been pretty friendly. In the world of Drizzle, we still make use of the InnoDB plugin, though we are transitioning to the embedded version. Everything there has gone along swimmingly. In the MySQL ecosystem you have MariaDB and the other distributions. They’re doing the same things that Ubuntu did for Debian, which is that they’re taking something that’s there and creating a different sort of product around it. Essentially though, it’s still exactly the same product. I think some patches are flowing from MariaDB back into MySQL, or at least I’ve seen some notice of that. So for the moment it looks like everything’s as friendly as it is going to be.

Is NoSQL a fad or the next big thing?

BA: There are the folks who say “just go use gdbm or Berkeley DB.” What they don’t fundamentally understand is that when you get into a certain data size, you’re just going to be dealing with multiple computers. You can’t scale up infinitely. Those answers come from an immaturity of understanding that when you get to a certain data size, everything’s not going to fit on a single computer. When everything doesn’t fit onto a computer, you have to be able to migrate data to multiple nodes. You need some sort of scaling solution there.
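As an editorial illustration of what spreading data across multiple nodes can mean in practice, here is a minimal sketch of hash-based partitioning. It is not something Aker describes, and the node names and helper functions (`NODES`, `node_for`, `partition`) are hypothetical.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def node_for(key, nodes):
    """Pick a node by hashing the key (simple modulo partitioning)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition(keys, nodes):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement
```

With plain modulo partitioning like this, changing the number of nodes reassigns most keys, which is one reason real systems often use consistent hashing or a directory-based scheme instead.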

With Cassandra, and similar solutions, the only issues that come up are when they don’t fit the data’s usage pattern, for instance with data analytics. There is also still the “I need these predicates across a relational entity.” That’s the part where the key-value systems obviously fail. They have no knowledge of a relationship between two given items. So what happens then? Well, you can end up doing MapReduce. That’s great if you’ve got an awful lot of computers and you don’t really care about when the answer is going to be found. MapReduce works as a solution when your queries are operating over a lot of data; Google sizes of data. Few companies have Google-sized datasets though. The average sites you see, they’re 10-20 gigs of data. Moving to a MapReduce solution for 20 gigs of data, or even for a terabyte or two of data, makes no sense. Using MapReduce with NoSQL solutions for small sites? This happens because people don’t understand how to pick the right tools.
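To make the point about predicates across related items concrete, here is a small editorial sketch in plain Python. It assumes a hypothetical key-value store holding customers and orders, and answers "total order value per customer in a given country" with a map step and a reduce step. The data, key layout, and function names are all invented for illustration; no real MapReduce framework is used.

```python
from collections import defaultdict

# Hypothetical store contents: the store itself only sees opaque keys and values.
store = {
    "customer:1": {"name": "Ann", "country": "NO"},
    "customer:2": {"name": "Bo", "country": "US"},
    "order:10": {"customer": 1, "total": 30},
    "order:11": {"customer": 2, "total": 5},
    "order:12": {"customer": 1, "total": 12},
}


def map_phase(items):
    """Emit (customer_id, tagged record) pairs for every item in the store."""
    for key, value in items:
        kind, ident = key.split(":")
        if kind == "customer":
            yield int(ident), ("customer", value)
        elif kind == "order":
            yield value["customer"], ("order", value)


def reduce_phase(pairs, country):
    """Group by customer id, then sum order totals for customers in `country`."""
    groups = defaultdict(list)
    for customer_id, record in pairs:
        groups[customer_id].append(record)
    totals = {}
    for records in groups.values():
        customers = [v for tag, v in records if tag == "customer"]
        if customers and customers[0]["country"] == country:
            totals[customers[0]["name"]] = sum(
                v["total"] for tag, v in records if tag == "order"
            )
    return totals


result = reduce_phase(map_phase(store.items()), "NO")
```

Every record is read to answer one question, because the relationship between an order and its customer lives only in the application code. In a relational database the same question is a join with a `WHERE` clause, which is the trade-off being described for datasets in the tens of gigabytes.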

MySQL and location data:

BA: SQL goes very well with temporal data. SQL does very well with range data. I would say that SQL works very poorly today with location-based data. Is it the best thing out there, though? Probably. I’m still waiting for someone to really spend some time thinking about the location data problem, and come up with a true location store. I don’t believe that SQL databases are the solution for tomorrow’s location-based data. Location services are going to require something a lot better than what we have today, because all we have today is a set of cobbled-together hacks.

MySQL’s future:

BA: There hasn’t been a roadmap for MySQL for some time. Even before Sun acquired MySQL, it was languishing, and Sun’s handling of MySQL just further eroded the canonical MySQL tree. I’m waiting to see what Oracle announces at the MySQL Conference. I expect Oracle to scrap the current 5.5 plan and come up with a viable roadmap. It won’t be very innovative, but I am betting it will be a stable plan that users can look at.

I see a lot of excitement about multiple versions of MySQL. I’m hoping to see this push innovation as the different distributions differentiate themselves. I believe that the different MySQL distributions will all become forks eventually.

In the Drizzle world, the excitement is in the different sorts of plugins that have been written, and the opportunity for more. There has been a bunch of work around the replication system, and how it integrates with other systems. We have plugins now that allow Drizzle to replicate into things like RabbitMQ, Cassandra, Gearman, Voldemort, Memcached and other database systems. Having a replication system that was designed from day one to be pluggable is a game changer for some enterprises. Drizzle’s future? Everything is open source, and we will see where the community wants to take it.

I would like to see more focus on data bus architectures, i.e. geographical replication. In the past, replication was a lot about how to scale out. That’s dead and gone. Anybody who’s doing scale-out with replication is creating a future headache for themselves. What I’d like to actually see is more attention to how we pass data between datacenters. I would also like to see more work done on shared-nothing storage systems. There have been a few attempts at that with MySQL, but thus far, the attempts have been failures. The reasons for this? Poor code quality and difficulty of use. I believe we’ll see new shared-nothing solutions coming out that will work better than anything that’s been written so far.

  • Jay Pipes

    Quick correction. It’s RabbitMQ, not RapidMQ. :)

    Cheers,

    Jay