Redundant Storage Cluster: For When It's Just Too Big

Bob Burgess (Radian6 Technologies)

Our system has a number of large items to store, retrieve quickly, and eventually age out. When that number reached over a million a day (with two million just around the corner), we knew we weren’t “in Kansas anymore”. Ordinary tables just weren’t cutting it. Something had to be done.

Although there are horizontal scaling solutions out there, such as HiveDB and HScale, we need the speed of horizontal scaling plus with the security of redundant storage. Our storage cluster meets these needs.

The seeds for our cluster were planted at last year’s UC. Listening to presentations about horizontal scaling, I imagined a system like that to solve our problem. Everything was there except for data safety. Then I started investigating the Google File System and Bigtables and it all fell into place: directory-based storage plus redundancy and automatic rebalancing!

The goal of my presentation is to completely describe the design concepts and details of our cluster and to share performance measurements and details of how it fits in our database. Hopefully, even if our storage cluster doesn’t exactly meet someone’s requirements, it will inspire them to design something similar. Being written in Lua under MySQL-Proxy, the presentation will also illustrate Lua programming details under Proxy. Also, the basics of federated tables and shell scripting (pertaining to MySQL) will be covered briefly.

Operational details of our Redundant Storage Cluster

The cluster consists of at least three storage cluster members and two load balancers (primary and backup). Based on MySQL-Proxy, it looks like a single MySQL database to a MySQL client. In fact, it’s often accessed through a federated table, so the rest of the database thinks it’s “just another table” in the same database. For our particular application, it stores a single piece of information (datatype “text”) keyed on a 64-bit integer.

The cluster offers three major advantages over a single database table: data redundancy, load-balanced reads, and information-lifecycle management (data age-out).

  • Data redundancy

A redundancy factor can be chosen, which represents how many cluster members can die without data loss. A factor of 2 (our choice, for example) will allow for the loss of one cluster member. Data items coming in get stored on, in our case, two different machines: the machine chosen by the load balancer and one other, chosen by having the most free space. The location of each copy is written to a directory, which is kept the same on all cluster members. If one machine goes down, any one data item can be found on another member. The more cluster members, the lower proportion of total space is dedicated to redundant storage. Another feature of the cluster (which I think is really cool) is that when (not if) a cluster member bites the dust, the cluster will reapply the redundancy factor to the data stored on the remaining members, copying data items to the surviving members to restore redundancy (and the DBA’s peace of mind). Alternatively, if a new cluster member is added, a background process will move a proportion of the data to the new member. Changes to the redundancy factor will trigger similar data redistribution. Also, individual data tables can be automatically taken offline for unattended optimization (required since, at least in our system, the data is stored in MyISAM tables).

  • Load-Balanced Reads

All requests (read or write) arrive at a random cluster member through the load balancer. If, according to the directory, the item is not on “this” member, it is requested from wherever it’s stored and passed back. No single cluster member has to handle all read requests.

  • Data Age-Out

All stored data is accompanied by its key (a 64-bit integer) and a partition key. Data items are horizontally partitioned within each cluster member: there is one data table per partition key. Whenever desired (in our case, at the end of every week), the oldest-partitioned table is archived and the item directory is updated to reflect the recently departed.

This is just a brief description of our system. The presentation will include a more detailed description, illustrated with code examples and explanations of design decisions. My goal is not only to share what we’ve done with Proxy and Lua, but also to inspire someone to take this idea and develop it further. Maybe they’ll speak on it next year!

Photo of Bob Burgess

Bob Burgess

Radian6 Technologies

Bob has been squeezing every drop out of MySQL for a little over a year, having previously spent several years roaming in the Garden of Oracle. Last year, he found himself suddenly (and without warning) responsible for storing and retrieving all the information amassed at Radian6, the social media monitoring company.

Comments on this page are now closed.

Comments

Picture of Bob Burgess
04/22/2009 9:07pm PDT

Thanks to everyone who attended. The slides will be available here (today or tomorrow) and also at www.radian6.com/mysql in a few days. I’ll eventually have the full Lua scripts available there also.

Co-presented By:

O'Reilly Media MySQL/Sun Microsystems
  • Kickfire
  • Virident
  • Infobright, Inc
  • JasperSoft
  • Intel
  • Advanced Micro Devices
  • BIRT Exchange by Actuate
  • Calpont
  • Canonical
  • Continuent
  • Dolphin Interconnect Solutions
  • Facebook
  • HiT Software, Inc.
  • IBM
  • iDashboards
  • Oracle
  • Pentaho
  • R1Soft
  • Schooner Information Technology
  • SQLstream
  • Ticketmaster
  • Zmanda, Inc.
  • Linux Journal

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Sharon Cordesse at scordesse@oreilly.com

Download the MySQL Sponsor/Exhibitor Prospectus

Media Partner Opportunities

Download the Media & Promotional Partner Brochure (PDF) for information on trade opportunities with O'Reilly conferences or contact mediapartners@ oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

MySQL Conference Newsletter

To stay abreast of conference news and to receive email notification when registration opens, please sign up for the MySQL Conference newsletter.

Contact Us

View a complete list of MySQL contacts.