We will discuss the technique of data smoothing in MySQL using group averaging and possibly running/moving averages. Data smoothing is the act of translating high-resolution data into lower resolution data for lower frequency trend analysis over a long period, and more importantly, to reduce unneeded resolution for data archival storage. Topics coverered in this session will include business needs and use cases, techniques for different normalization needs and different data sets, and a benchmark comparison between scripted/programmed solutions verses database-side, SQL query-side solution.
Any application that performs frequent polling for data and archives this data for long (or infinite duration) has run into the situation of ‘too much data’. Tables become huge and unsearchable, performance goes down, and over time – the resolution of this data becomes worthless compared to the cost to store it.
We will discuss the simple ‘gauge data’ smoothing process, the more complicated ‘counter data’ smoothing process, and the ‘waterfall smoothing’ technique for multiple levels of archival and smoothing. We will also discuss optimal indexing for efficient grouping. If time permits, we will show examples of stored procedures for the very tough methods, and also look at solving moving averages in SQL.
Many times developers fall back on programming and scripting languages such as php, perl, or python to solve this problem, as it can be hard to do strictly in SQL, but here we show some comparative run times that show magnitudes of performance gain.
If time permits, we will also go into using partitioning for maintenance and quick deletion of archived data, and using stored procedures for more complex smoothing applications.
Michael “Flip” McFadden earned his degree in Computer Science from Appalachian State University in 2000 and was hired on by the university to support the NASA Nearby Stars Survey as a Research Assistant and Data Reduction Specialist. He went on to earn his Linux Professionals Institute Level 1 certification in April 2004, and earned his MySQL 5.0 Certified Developer in May of 2008. Michael currently works for ScienceLogic as a Sr. Software Architect and can occasionally be found trolling on the #mysql IRC channel as ‘flipmcf’.
Comments on this page are now closed.
For information on exhibition and sponsorship opportunities at the conference, contact Sharon Cordesse at scordesse@oreilly.com
Download the MySQL Sponsor/Exhibitor Prospectus
Download the Media & Promotional Partner Brochure (PDF) for information on trade opportunities with O'Reilly conferences or contact mediapartners@ oreilly.com
For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com
To stay abreast of conference news and to receive email notification when registration opens, please sign up for the MySQL Conference newsletter.
View a complete list of MySQL contacts.
Comments
Wow !! what a session