How TokuDB Fractal Tree Databases Work

Average rating: ****.
(4.25, 4 ratings)

Insertion bottlenecks lie at the heart of database and file-system
innovations, best practices, and system workarounds. Most databases
and file systems are based on B-tree data structures, and suffer from
the performance cliffs and unpredictable run times of B-trees. In
this talk, we introduce the Fractal Tree data structure and explain
how it works and how it provides dramatically improved performance in
both theory and in practice. Although our company Tokutek is selling
a transaction-safe fractal-tree storage engine for MySQL, this talk is
primarily about the underlying technology.

From a theoretical perspective, if B is the block-transfer size, the
B-tree performs O(log_B N) block transfers per insert in the worst
case. In contrast, the Fractal Tree structure performs O((log_B N)/B)
memory transfers per insert, which translates to run-time improvements
of two orders of magnitude.

To relate that theory to practice, we present an algorithmic model for
B-tree performance bottlenecks. We explain how the bottlenecks affect
best practice and how database designers typically modify B-trees to
try to mitigate the bottlenecks. Then we show how Fractal Tree
structures can attain faster insertion rates, intuitively by
transforming disk-seek bottlenecks into disk-bandwidth bottlenecks

We conclude with performance results showing how a Fractal Tree
storage engine can maintain rich indexes more efficiently than
B-trees. Surprisingly, Fractal Tree structures seem to maintain their
order-of-magnitude competitive advantage over B-trees on SSDs as well
as traditional rotating media.

Bradley C. Kuszmaul

Tokutek

Dr. Kuszmaul’s research focuses on developing computer systems and
hardware that behave well both in practice and in theory. His entry
won 5 out of 6 categories in Jim Gray’s 2007 sorting benchmark
contest
, sorting a terabyte in 197
seconds. He formerly architected Akamai’s distributed data collection
system, was a Yale Professor of Computer Science and was a principal
network architect for the Thinking Machines Connection Machine
CM-5 Supercomputer.
He was one of the developers of the MIT
Cilk
multithreaded programming
system and wrote, in Cilk++, winning entries for 4 out of 12 of
problems in the 2009 Intel Threading Challenge. Dr. Kuszmaul a
founder and Chief Architect at Tokutek and is a Research Scientist in
the Massachusetts Institute of Technology Computer Science and
Artificial Intelligence Laboratory (MIT CSAIL).

Comments on this page are now closed.

Comments

Zardosht Kasheff
04/27/2010 10:23am PDT

Rick, fractal trees are designed to be as good as B-trees on queries and insertions. It’s intended for general usage and not to fit in a niche.

Picture of Rick James
Rick James
04/15/2010 11:52am PDT

Geeky. Very technical talk about technology behind a product. Good for understanding why the product fits in its niche.

  • Oracle
  • Monty Program
  • Calpont
  • Facebook
  • Gear6
  • Infobright, Inc
  • JasperSoft
  • Joyent
  • Kickfire
  • NorthScale, Inc.
  • Percona
  • Schooner Information Technology
  • Solid Quality Mentors (SolidQ)
  • Intel
  • Pentaho
  • Linux Pro Magazine

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Yvonne Romaine at yromaine@oreilly.com

Download the O'Reilly MySQL Conference & Expo Sponsor/ Exhibitor Prospectus

Media Partner Opportunities

Download the Media & Promotional Partner Brochure (PDF) for information on trade opportunities with O'Reilly conferences or contact mediapartners@ oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

O'Reilly MySQL Conference Newsletter

To stay abreast of conference news and to receive email notification when registration opens, please sign up for the O'Reilly MySQL Conference newsletter (login required).

Contact Us

View a complete list of O'Reilly MySQL Conference contacts.