Parallel MySQL Import and Export to Hadoop

Aaron Kimball (Cloudera, Inc.)

Hadoop is a powerful data analysis platform, especially for working with unstructured or semi-structured data. Hadoop MapReduce analyses are often improved by including regularly structured data, such as the tables found in a MySQL database. In turn, Hadoop-based analysis often produces regularly structured output, which must then be integrated with existing datasets or other online systems.

This talk introduces Sqoop, the open source SQL-to-Hadoop tool. Sqoop helps users perform efficient imports of data from MySQL and other databases to Hadoop’s distributed file system, where it can be processed in concert with other data sources. Sqoop also allows users to export Hadoop-generated results back to MySQL and other systems for use with other data pipelines.

After this session, users will understand how MySQL and Hadoop fit together, and how to use Sqoop to move data between these systems. The talk will provide suggestions for best practices when integrating Sqoop and Hadoop in your data processing pipelines. We’ll also cover some deeper technical details of Sqoop’s architecture, and how it uses MySQL-specific tools to achieve high throughput.
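As a rough illustration of the import and export paths described above, the sketch below builds (but does not run) Sqoop command lines from Python. The flag names follow the Sqoop 1.x command-line interface and may differ from the version covered in the talk. The host, database, table, user, and HDFS paths are hypothetical placeholders, not values from the session. The `--direct` flag is the Sqoop 1.x option for using database-specific tools on the MySQL side instead of plain JDBC.

```python
import shlex
from typing import List


def sqoop_import_args(jdbc_url: str, table: str, username: str,
                      target_dir: str, num_mappers: int = 4,
                      direct: bool = False) -> List[str]:
    """Build an argv list for a parallel `sqoop import` (nothing is executed)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,               # JDBC URL of the source database
        "--username", username,
        "-P",                                # prompt for the password
        "--table", table,                    # table to copy into HDFS
        "--target-dir", target_dir,          # HDFS destination directory
        "--num-mappers", str(num_mappers),   # number of parallel map tasks
    ]
    if direct:
        # Assumption: direct mode is available for this database and version.
        args.append("--direct")
    return args


def sqoop_export_args(jdbc_url: str, table: str, username: str,
                      export_dir: str) -> List[str]:
    """Build an argv list for `sqoop export` from an HDFS directory to a table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,                    # existing destination table
        "--export-dir", export_dir,          # HDFS directory holding the results
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:mysql://db.example.com/shop"
    for argv in (
        sqoop_import_args(url, "orders", "etl", "/data/orders",
                          num_mappers=8, direct=True),
        sqoop_export_args(url, "order_summary", "etl", "/results/order_summary"),
    ):
        print(" ".join(shlex.quote(a) for a in argv))
```

Building the argument list separately from running it keeps the sketch testable without a Hadoop cluster; in practice the list could be passed to `subprocess.run` on a machine where Sqoop is installed.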

Aaron Kimball

Cloudera, Inc.

Aaron Kimball is a software engineer at Cloudera, Inc., the Commercial Hadoop company. Aaron is the principal developer of Sqoop, the SQL-to-Hadoop database import/export tool. Aaron has been working with Hadoop since early 2007 and contributes actively to its development. Through Cloudera, he also provides training to developers and system administrators working with Hadoop. Aaron holds a B.S. in Computer Science from Cornell University and an M.S. in Computer Science and Engineering from the University of Washington.
