How TokuDB Fractal Tree Databases Work

Average rating: ****.
(4.25, 4 ratings)

Insertion bottlenecks lie at the heart of database and file-system innovations, best practices, and system workarounds. Most databases and file systems are based on B-tree data structures, and suffer from the performance cliffs and unpredictable run times of B-trees. In this talk, we introduce the Fractal Tree data structure and explain how it works and how it provides dramatically improved performance in both theory and in practice. Although our company Tokutek is selling a transaction-safe fractal-tree storage engine for MySQL, this talk is primarily about the underlying technology.

From a theoretical perspective, if B is the block-transfer size, the B-tree performs O(log_B N) block transfers per insert in the worst case. In contrast, the Fractal Tree structure performs O((log_B N)/B) memory transfers per insert, which translates to run-time improvements of two orders of magnitude.

To relate that theory to practice, we present an algorithmic model for B-tree performance bottlenecks. We explain how the bottlenecks affect best practice and how database designers typically modify B-trees to try to mitigate the bottlenecks. Then we show how Fractal Tree structures can attain faster insertion rates, intuitively by transforming disk-seek bottlenecks into disk-bandwidth bottlenecks

We conclude with performance results showing how a Fractal Tree storage engine can maintain rich indexes more efficiently than B-trees. Surprisingly, Fractal Tree structures seem to maintain their order-of-magnitude competitive advantage over B-trees on SSDs as well as traditional rotating media.

Bradley C. Kuszmaul

Tokutek

Dr. Kuszmaul’s research focuses on developing computer systems and hardware that behave well both in practice and in theory. His entry won 5 out of 6 categories in Jim Gray’s 2007 sorting benchmark contest, sorting a terabyte in 197 seconds. He formerly architected Akamai’s distributed data collection system, was a Yale Professor of Computer Science and was a principal network architect for the Thinking Machines Connection Machine CM-5 Supercomputer. He was one of the developers of the MIT Cilk multithreaded programming system and wrote, in Cilk++, winning entries for 4 out of 12 of problems in the 2009 Intel Threading Challenge. Dr. Kuszmaul a founder and Chief Architect at Tokutek and is a Research Scientist in the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory (MIT CSAIL).

Comments on this page are now closed.

Comments

Zardosht Kasheff
04/27/2010 10:23am PDT

Rick, fractal trees are designed to be as good as B-trees on queries and insertions. It’s intended for general usage and not to fit in a niche.

Picture of Rick James
Rick James
04/15/2010 11:52am PDT

Geeky. Very technical talk about technology behind a product. Good for understanding why the product fits in its niche.

  • Oracle
  • Monty Program
  • Calpont
  • Facebook
  • Gear6
  • Infobright, Inc
  • JasperSoft
  • Joyent
  • Kickfire
  • NorthScale, Inc.
  • Percona
  • Schooner Information Technology
  • Solid Quality Mentors (SolidQ)
  • Intel
  • Pentaho
  • Linux Pro Magazine

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Yvonne Romaine at yromaine@oreilly.com

Download the O'Reilly MySQL Conference & Expo Sponsor/ Exhibitor Prospectus

Media Partner Opportunities

Download the Media & Promotional Partner Brochure (PDF) for information on trade opportunities with O'Reilly conferences or contact mediapartners@ oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

O'Reilly MySQL Conference Newsletter

To stay abreast of conference news and to receive email notification when registration opens, please sign up for the O'Reilly MySQL Conference newsletter (login required).

Contact Us

View a complete list of O'Reilly MySQL Conference contacts.