"relational database" entries

Squaring big data with database queries

Integrating open source tools into a data warehouse has its advantages.

Although next-gen big data tools such as Hadoop, Spark, and MongoDB are finding more and more uses, most organizations need to maintain data in traditional relational stores as well. Deriving the benefits of both key/value stores and relational databases takes a lot of juggling. Three basic strategies are currently in use.

  • Double up on your data storage. Log everything in your fast key/value repository and duplicate part of it (or perform some reductions and store the results) in your relational data warehouse.
  • Store data primarily in a relational data warehouse, and use extract, transform, and load (ETL) tools to make it available for analytics. These tools run a fine-toothed comb through the data to perform string manipulation, remove outlier values, and so on, and produce a data set in the format required by data processing tools.
  • Put each type of data into the repository best suited to it (relational, Hadoop, etc.), but run queries between the repositories and return results from one repository to another for post-processing.
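
As a rough illustration of the third strategy, the sketch below reduces raw events where they live and ships only the small result to the relational side for a join. Everything in it is an assumption made for illustration: an in-memory SQLite database stands in for the data warehouse, a Python list stands in for a Hadoop-style log store, and the names `CLICK_LOG`, `count_clicks`, `build_warehouse`, and `clicks_by_customer` are hypothetical. It is not how Teradata, Hive, or HCatalog implement cross-repository queries.

```python
import sqlite3
from collections import Counter

# Hypothetical stand-in for a Hadoop-style log store: raw click events.
CLICK_LOG = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 2, "page": "/docs"},
    {"customer_id": 1, "page": "/signup"},
    {"customer_id": 3, "page": "/docs"},
    {"customer_id": 1, "page": "/docs"},
]


def count_clicks(log):
    """Reduce the raw events where they live; only the small result leaves."""
    return Counter(event["customer_id"] for event in log)


def build_warehouse():
    """Hypothetical stand-in for the relational warehouse (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    return conn


def clicks_by_customer(conn, log):
    """Return the reduced result to the warehouse and post-process it there."""
    conn.execute(
        "CREATE TEMP TABLE click_counts (customer_id INTEGER, clicks INTEGER)"
    )
    conn.executemany(
        "INSERT INTO click_counts VALUES (?, ?)", count_clicks(log).items()
    )
    rows = conn.execute(
        "SELECT c.name, k.clicks FROM customers c "
        "JOIN click_counts k ON k.customer_id = c.id "
        "ORDER BY k.clicks DESC, c.name"
    ).fetchall()
    conn.execute("DROP TABLE click_counts")
    return rows


if __name__ == "__main__":
    print(clicks_by_customer(build_warehouse(), CLICK_LOG))
```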

The appeal of the first is its large-scale simplicity: it uses well-understood systems in parallel. The second gives business users the familiar access of a relational database. This article focuses on the third solution, which has advantages over the others: it avoids the redundancy of the first solution and is much easier to design and maintain than the second. I’ll describe how it is accomplished by Teradata, through its appliances and cloud solutions, but the building blocks are standard, open source tools such as Hive and HCatalog, so this strategy can be implemented by anyone. Read more…

Google’s Spanner is all about time

Did Google just prove the industry wrong? Early thoughts on the Spanner database.

In case you missed it, Google Research published another one of “those” significant research papers: a paper like the BigTable paper from 2006, which had ramifications for the entire industry (it was one of the opening volleys in the NoSQL movement).

Google’s new paper, about a distributed relational database called Spanner, follows up on a presentation from earlier in the year about a new database for AdWords called F1. If you recall, that presentation revealed Google’s migration of AdWords from MySQL to a new database that supported SQL and hierarchical schemas, two ideas that buck the trend from relational databases.

Meet Spanner

This new database, Spanner, is a database unlike anything we’ve seen. It’s a database that embraces ACID, SQL, and transactions, and that can be distributed across thousands of nodes spanning multiple data centers across multiple regions. The paper dwells on two main features that define this database:

  • Schematized Semi-relational Tables: A hierarchical approach to grouping tables that allows Spanner to co-locate related data into directories that can be easily stored, replicated, locked, and managed on what Google calls spanservers. They have a modified SQL syntax that allows for the data to be interleaved, and the paper mentions some changes to support columns encoded with Protobufs.
  • “Reification of Clock Uncertainty”: This is the real emphasis of the paper. The missing link in relational database scalability was a strong emphasis on coordination backed by a serious attempt to minimize time uncertainty. In Google’s new global-scale database, the variable that matters is epsilon, the time uncertainty. Google has achieved very low overhead (14ms introduced by Spanner in this paper for data centers at 1ms network distance) for read-write (RW) transactions that span the U.S. East Coast and U.S. West Coast (data centers separated by around 2ms of network time). It did so by creating a system that facilitates distributed transactions bound only by network distance (measured in milliseconds) and time uncertainty (epsilon).
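
To make the role of epsilon concrete, here is a minimal sketch of a clock that reports an interval rather than a single instant, plus the "commit wait" the Spanner paper describes: a write takes a timestamp at the upper edge of the interval and is held back until that timestamp has definitely passed. The class and function names (`TTInterval`, `TrueTimeSketch`, `commit`) and the polling step are hypothetical, the clock is just the local monotonic clock with an assumed fixed epsilon, and nothing here reflects Google's actual implementation.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """A time reading expressed as bounds instead of a single instant."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Hypothetical clock with a fixed, assumed uncertainty epsilon (seconds)."""

    def __init__(self, epsilon, clock=time.monotonic):
        self.epsilon = epsilon
        self._clock = clock

    def now(self):
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only once t has definitely passed, even in the worst case."""
        return self.now().earliest > t


def commit(tt, apply_write, sleep=time.sleep):
    """Pick a timestamp, then wait out the uncertainty before applying."""
    s = tt.now().latest        # no earlier than the true current time
    while not tt.after(s):     # commit wait: about 2 * epsilon in total
        sleep(tt.epsilon / 4)  # polling step is an arbitrary choice
    apply_write(s)
    return s


if __name__ == "__main__":
    tt = TrueTimeSketch(epsilon=0.004)  # 4ms is an illustrative value only
    start = time.monotonic()
    commit(tt, lambda ts: None)
    print(f"commit wait took {(time.monotonic() - start) * 1000:.1f} ms")
```

The wait grows with epsilon, which is why a smaller uncertainty bound translates directly into lower transaction overhead in this model.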

Read more…

Oracle’s NoSQL

Oracle's NoSQL Database is more than a product. It's also an acknowledgement.

Oracle's announcement of a NoSQL product isn't just a validation of key-value stores, but of the entire discussion of database architecture.

Percona's mini-conferences target the evolution of MySQL

Percona's goal is to bring MySQL expertise out of Silicon Valley and build community around MySQL in many locations.

Wrap-up of 2011 MySQL Conference

Key themes from MySQL 2011. Plus, what you sacrifice when you use a NoSQL solution.

Two dominant themes emerged at MySQL 2011: Mix your relational database with less formal solutions and move to the cloud. This may actually be the best environment MySQL has ever enjoyed.