Automated, Non-Stop MySQL Operations and Failover

Tags: dba, scale_out
Some failure scenarios cause unscheduled downtime and are difficult to resolve quickly. A typical example is a master crash. Suppose you run a single master and multiple slaves. If the master crashes, you need to pick one of the slaves, promote it to the new master, and have the other slaves start replicating from the new master. This is not trivial. Even if you can identify the most up-to-date slave, the other slaves might not have received all of the relay log events, which causes consistency problems. How do you bring them back in sync? Is it acceptable to run with the risk of data inconsistency? In this session, I will talk about how to automate master failover. I’ll explain how to script an automated master failover program that does the following:
  • Picking the new master (the latest one)
  • Making all slaves consistent (with the new master) within a short time
  • Restarting replication

This will be helpful for many people who do not want to spend much money on standby servers (e.g. DRBD) but still want to achieve fast failover.
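The first two failover steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program presented in the session: it assumes each slave's Master_Log_File and Read_Master_Log_Pos values (as reported by SHOW SLAVE STATUS) have already been collected, and that binary log names end in a numeric sequence such as mysql-bin.000012. The names SlaveStatus, read_position, pick_latest_slave and lagging_slaves are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class SlaveStatus(NamedTuple):
    """Subset of SHOW SLAVE STATUS: how far the I/O thread has read."""
    master_log_file: str      # Master_Log_File, e.g. "mysql-bin.000012"
    read_master_log_pos: int  # Read_Master_Log_Pos


def read_position(status: SlaveStatus) -> Tuple[int, int]:
    """Return a sortable (binlog sequence number, byte offset) pair."""
    # Assumption: the file name ends in ".<number>".
    sequence = int(status.master_log_file.rsplit(".", 1)[1])
    return (sequence, status.read_master_log_pos)


def pick_latest_slave(slaves: Dict[str, SlaveStatus]) -> str:
    """Choose the slave that received the most events from the dead master."""
    return max(slaves, key=lambda host: read_position(slaves[host]))


def lagging_slaves(
    slaves: Dict[str, SlaveStatus], latest: str
) -> Dict[str, Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Map each slave that is behind to the (from, to) range it is missing."""
    target = read_position(slaves[latest])
    return {
        host: (read_position(status), target)
        for host, status in slaves.items()
        if read_position(status) < target
    }


if __name__ == "__main__":
    # Hypothetical host names and positions, for illustration only.
    slaves = {
        "db2": SlaveStatus("mysql-bin.000012", 4500),
        "db3": SlaveStatus("mysql-bin.000012", 9120),
        "db4": SlaveStatus("mysql-bin.000011", 88000),
    }
    latest = pick_latest_slave(slaves)
    print("promote:", latest)
    for host, (start, end) in lagging_slaves(slaves, latest).items():
        print(host, "is missing events from", start, "to", end)
```

The ranges returned by lagging_slaves only describe what each slave is missing. Actually filling the gap (for example, by applying the corresponding events from the latest slave's relay logs) and then repointing the slaves at the new master are separate steps that this sketch does not attempt.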

Another interesting failure scenario is a slave crash. Because updates to master.info, relay-log.info, and the InnoDB log files are not atomic with respect to each other, data consistency might be broken when a slave crashes. In other words, the replication thread might stop after restart with a “duplicate key error”. Do you recover slaves by restoring a full backup? This is not fun. The MySQL development team and the community have done a lot of work in this area. I’ll talk about an approach for checking consistency between the relay logs and the InnoDB log file, and for recovering consistency without a full data restore.
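One way to picture the consistency check described above is as a comparison of two positions. The sketch below is hypothetical and is not the approach presented in the session. It assumes that all replicated tables use InnoDB, that the executed master position stored in relay-log.info has been read, and that the master position InnoDB last committed has been obtained by some other means (how to obtain it is outside this sketch). The names Position, RestartPlan, sort_key and plan_restart are illustrative.

```python
from typing import NamedTuple, Tuple


class Position(NamedTuple):
    """A position in the master's binary log."""
    log_file: str  # e.g. "mysql-bin.000007"
    log_pos: int   # byte offset within that file


class RestartPlan(NamedTuple):
    position: Position  # where the SQL thread should resume
    diagnosis: str      # "in_sync", "relay_info_behind" or "relay_info_ahead"


def sort_key(position: Position) -> Tuple[int, int]:
    """Order positions by binlog sequence number, then byte offset."""
    # Assumption: the file name ends in ".<number>".
    return (int(position.log_file.rsplit(".", 1)[1]), position.log_pos)


def plan_restart(relay_info: Position, innodb: Position) -> RestartPlan:
    """Compare the two recorded positions and pick the one to resume from.

    Assumption: the InnoDB position reflects what was really committed,
    so it is returned in every case; only the diagnosis differs.
    """
    if sort_key(relay_info) == sort_key(innodb):
        return RestartPlan(innodb, "in_sync")
    if sort_key(relay_info) < sort_key(innodb):
        # Resuming from relay-log.info would re-apply committed changes,
        # which is how a duplicate key error can appear after restart.
        return RestartPlan(innodb, "relay_info_behind")
    # Resuming from relay-log.info would skip events InnoDB did not keep.
    return RestartPlan(innodb, "relay_info_ahead")


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    plan = plan_restart(
        Position("mysql-bin.000007", 1000),
        Position("mysql-bin.000007", 1500),
    )
    print(plan.diagnosis, "-> resume at", plan.position)
```

With a plan like this, a slave could be repositioned (for instance with CHANGE MASTER TO and the chosen file and offset) instead of being rebuilt from a full backup, provided the stated assumptions hold.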

Non-stop scheduled maintenance is also an interesting topic. Maintenance tasks such as online schema changes, sharding, aggregating shards, and migration are not trivial, but third-party tools such as oak-online-alter-table/OSC, Spider, etc. have made things much easier. I’ll talk about how to achieve non-stop sharding operations.

Yoshinori Matsunobu

DeNA

Yoshinori Matsunobu is a database and infrastructure architect at DeNA, living in Tokyo. Yoshinori’s primary responsibility at DeNA is to make our database infrastructure more reliable, faster, and more scalable. Before joining DeNA, Yoshinori worked at MySQL/Sun/Oracle as a lead consultant in APAC for four years. Yoshinori has written eight MySQL-related technical books so far and has published technical articles about MySQL, Linux, and Java in a monthly database magazine since 2004.

Comments

John Schulz
04/12/2011 6:37pm PDT

Good description of what is involved in failover, but I was expecting to hear how it was automated. Either it was explained while I wasn’t looking or the speaker forgot to cover that part.
