Practical Distributed Processing Using MySQL Built-In Functionality

Bob Burgess (Radian6 Technologies)

A couple of years ago at our company, the idea came up to rank the most influential people in the social media space, per topic. Since the calculations for this are substantial and chew on a lot of data, I figured it should run in the database. “Oh yeah,” I said, “I’ll code that. How long could it take?” Famous last words.
It worked for a while running in the main database, but since then our data volume has grown considerably and the “influencer” calculation has become much more complex. Several months ago, because of the load this calculation placed on the main database, we moved it to a dedicated set of servers. The work is done there and the results are sent back, and it has been working well for months.

This system works because our application of it satisfies several requirements:

  • the calculation doesn’t require our “bulkiest” tables — the involved tables are long but not wide
  • in some cases, only a small subset of columns on wide tables is required
  • the calculation is relatively isolated from the main system
  • the resulting data is contained in individual MyISAM tables

This presentation will begin by reviewing several key concepts that are the foundation of this system, including:

  • replication
  • replication into blackhole tables with triggers
  • federated tables
  • scp (ssh file copy) using public-key authentication
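
As a rough illustration of two of these concepts (not the code from the presentation), the following MySQL sketch shows a replicated table declared with the BLACKHOLE engine on a worker server, with an insert trigger that copies only the needed columns into a narrow local table, followed by a FEDERATED table that points back at the main database. Every table, column, host, and account name here is hypothetical. The sketch assumes statement-based replication, since triggers defined on a replica do not fire for row-based events, and it assumes the FEDERATED engine has been enabled on the worker.

```sql
-- Hypothetical schema: all names below are illustrative assumptions.

-- On the worker (replica): the wide table from the main database is
-- declared with the BLACKHOLE engine, so replicated rows are discarded
-- instead of being stored.
CREATE TABLE post (
  post_id   BIGINT UNSIGNED NOT NULL,
  author_id INT UNSIGNED    NOT NULL,
  topic_id  INT UNSIGNED    NOT NULL,
  body      TEXT,
  PRIMARY KEY (post_id)
) ENGINE=BLACKHOLE;

-- Narrow local table holding only the columns the calculation needs.
CREATE TABLE post_slim (
  post_id   BIGINT UNSIGNED NOT NULL,
  author_id INT UNSIGNED    NOT NULL,
  topic_id  INT UNSIGNED    NOT NULL,
  PRIMARY KEY (post_id)
) ENGINE=MyISAM;

-- The trigger fires for each replicated INSERT and keeps the small subset.
CREATE TRIGGER post_after_insert AFTER INSERT ON post
FOR EACH ROW
  INSERT INTO post_slim (post_id, author_id, topic_id)
  VALUES (NEW.post_id, NEW.author_id, NEW.topic_id);

-- A FEDERATED table on the worker that reads and writes a table living on
-- the main database. The column definitions must match the remote table.
CREATE TABLE work_queue_remote (
  job_id   INT UNSIGNED NOT NULL,
  topic_id INT UNSIGNED NOT NULL,
  status   VARCHAR(16)  NOT NULL,
  PRIMARY KEY (job_id)
) ENGINE=FEDERATED
  CONNECTION='mysql://worker_user:secret@main-db.example.com:3306/influence/work_queue';
```

Only INSERT triggers are useful on a BLACKHOLE table: because no rows are stored, UPDATE and DELETE statements match nothing and their triggers never fire.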

After the concept review, there’s a high-level overview of our distributed system:

  • general architecture
  • queueing
  • worker coordination
  • result delivery
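
One common way to build a queue with worker coordination in plain MySQL is a job table from which each worker claims a row with a single UPDATE. The sketch below is an assumption about how such a queue could look, not a description of the system presented in the talk; the table name, its columns, and the @worker, @job_id, and @topic_id variables are all hypothetical.

```sql
-- Hypothetical job queue: one row per unit of work (here, one topic).
CREATE TABLE work_queue (
  job_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
  topic_id   INT UNSIGNED NOT NULL,
  status     ENUM('pending', 'running', 'done') NOT NULL DEFAULT 'pending',
  worker     VARCHAR(64) NULL,
  claimed_at DATETIME    NULL,
  PRIMARY KEY (job_id),
  KEY status_job (status, job_id)
) ENGINE=InnoDB;

-- Each worker identifies itself with a unique name.
SET @worker := 'worker-01';

-- Claim the oldest pending job. The single UPDATE is atomic, so two
-- workers running it at the same time cannot claim the same row.
UPDATE work_queue
   SET status = 'running', worker = @worker, claimed_at = NOW()
 WHERE status = 'pending'
 ORDER BY job_id
 LIMIT 1;

-- Find out which job this worker now holds.
SELECT job_id, topic_id
  INTO @job_id, @topic_id
  FROM work_queue
 WHERE status = 'running' AND worker = @worker
 ORDER BY job_id
 LIMIT 1;

-- ... run the calculation for @topic_id here ...

-- Mark the job finished so it is not picked up again.
UPDATE work_queue
   SET status = 'done'
 WHERE job_id = @job_id;
```

Recording claimed_at makes it possible to spot jobs that stayed in the running state too long, for example after a worker crash, and return them to pending.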

Finally, we’ll look at the code involved. A generalized (but functional) version of the work-distribution code will be available for download following the presentation. Even if this system isn’t an exact match to your intense-computation problem, I’m confident that seeing what we did will lead you to your own solution.

Bob Burgess

Radian6 Technologies

Bob has been squeezing every drop out of MySQL for a couple of years (and has become the 14th Canadian to get certified), having previously spent several years roaming in the Garden of Oracle. In late 2007, he found himself suddenly (and without warning) responsible for storing and retrieving all the information amassed at Radian6, the social media monitoring company.

Comments

Bob Burgess
04/20/2010 7:27am PDT

Thanks to all who attended! I just uploaded the slides; they should be available here very soon. Remember—email me for the sample script that goes with this presentation. Also, feel free to drop me a line with questions or comments.

—Bob
