Financial Informatics: Startup Low-cost Dataload Challenges and Solutions

Sean Kelly (Stumbleupon.com)

One of the biggest challenges financial data start-ups face is creating and updating a well-optimized, clean, usable historical record of a market. Often they spend resources only to find themselves wrestling with data-quality problems and historical-adjustment discrepancies.

Understanding how the universe of financial information maps onto a normalized model is critical to success. The words market, exchange, and security all have column-level implications for ease of design, deliverability to web applications, and performance under millions of batch operations.
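
As a rough illustration of what "column-level implications" can mean, here is a minimal sketch of a normalized model. The table and column names are assumptions made for this example, not the schema from the talk, and Python's built-in sqlite3 module stands in for MySQL so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
# In MySQL the price column would more likely be DECIMAL than REAL.
SCHEMA = """
CREATE TABLE exchange (
    exchange_id INTEGER PRIMARY KEY,
    code        TEXT NOT NULL UNIQUE,   -- e.g. 'NASDAQ'
    region      TEXT NOT NULL,          -- e.g. 'US'
    currency    TEXT NOT NULL           -- trading currency implied by the region
);
CREATE TABLE security (
    security_id INTEGER PRIMARY KEY,
    exchange_id INTEGER NOT NULL REFERENCES exchange(exchange_id),
    symbol      TEXT NOT NULL,
    UNIQUE (exchange_id, symbol)        -- a symbol is unique only within an exchange
);
CREATE TABLE daily_price (
    security_id INTEGER NOT NULL REFERENCES security(security_id),
    trade_date  TEXT NOT NULL,          -- ISO date string
    close       REAL NOT NULL,
    volume      INTEGER NOT NULL,
    PRIMARY KEY (security_id, trade_date)
);
"""


def build_model():
    """Create the three-table model in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = build_model()
conn.execute("INSERT INTO exchange VALUES (1, 'NASDAQ', 'US', 'USD')")
conn.execute("INSERT INTO security VALUES (1, 1, 'EXMP')")  # 'EXMP' is a made-up symbol
conn.execute("INSERT INTO daily_price VALUES (1, '2009-04-20', 10.5, 1000)")

# Currency is stored once per exchange and reached through joins,
# so it is never repeated on millions of price rows.
row = conn.execute(
    "SELECT e.code, e.currency, s.symbol, p.close "
    "FROM daily_price p "
    "JOIN security s ON s.security_id = p.security_id "
    "JOIN exchange e ON e.exchange_id = s.exchange_id"
).fetchone()
print(row)
```

Keeping the price table narrow, with a compound key of security and date, is one plausible way to keep batch loads cheap; other layouts may suit a given audience better.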

Several challenges in the prototype phase of development must be planned for before you can build reliable models, or data that matches what large financial sites publish.

  • Multiple-dependency task scheduling
  • Tracking variances in data quality
  • Defining and adjusting for corporate actions
  • Differences between daily and historical data
  • Determining which data feeds to get
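
To make the corporate-actions item concrete, the following is a minimal sketch of back-adjusting historical closes for stock splits. The function name, the data layout, and the sample prices are assumptions for illustration. Real feeds also carry dividends, spin-offs, and symbol changes, which this sketch ignores.

```python
from datetime import date


def adjust_for_splits(prices, splits):
    """Return closes adjusted backwards for splits.

    prices: list of (trade_date, close) as originally traded
    splits: list of (ex_date, ratio); a ratio of 2.0 means 2-for-1
    """
    adjusted = []
    for day, close in prices:
        factor = 1.0
        for ex_date, ratio in splits:
            # Only prices from before the ex-date need rescaling.
            if day < ex_date:
                factor *= ratio
        adjusted.append((day, close / factor))
    return adjusted


# Made-up sample: a 2-for-1 split takes effect on 2009-01-07.
prices = [
    (date(2009, 1, 5), 100.0),
    (date(2009, 1, 6), 102.0),
    (date(2009, 1, 7), 51.5),
]
splits = [(date(2009, 1, 7), 2.0)]
adjusted = adjust_for_splits(prices, splits)
```

Here the two pre-split closes are halved so that the series is continuous across the split. This is also one reason a daily feed and a historical feed can disagree: the daily value is typically as traded, while history may have been restated after later corporate actions.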

Development often focuses on loading an entire market (NASDAQ) or an entire region (US exchanges), which can cost a new venture valuable prototyping hours. A region usually defines the currency in which a security trades. Exchanges are the places (virtual or real) where buyer and seller meet, and securities are what is bought and sold on them. We’ll talk about:

  • Data Sources
  • How specific domains of data match your audience
  • How to break up data into logical pieces for scalability
  • An order of operations for adding in high-quality market data

Loading clean data, trapping errors, and dealing with data vendors are all challenging aspects of standing up your first application. Errors, once introduced, are usually very difficult to undo in a MySQL batch process. We’ll provide a prototype design of a stock-market machine, along with prototype code for:

  • Building a market data loader
  • Building a security lookup and history table
  • Building a historical corporate actions machine
  • Methods for backing up and restoring your data without having to reload millions of rows on commodity hardware
  • InnoDB vs MyISAM in the context of financial data in transactions
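
The talk's prototype code is not reproduced here. As a stand-in, the sketch below shows one way a loader can trap errors so that a bad batch leaves nothing behind. The `load_prices` function, the CSV layout, and the table are assumptions, and sqlite3 again substitutes for MySQL. The same pattern relies on a transactional engine such as InnoDB; MyISAM does not support transactions, so rows inserted before a failure would remain.

```python
import csv
import io
import sqlite3


def load_prices(conn, csv_text):
    """Load a CSV batch in one transaction; reject the whole batch on any bad row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for line_no, row in enumerate(reader, start=2):
                close = float(row["close"])  # raises ValueError on garbage
                if close <= 0:
                    raise ValueError(f"line {line_no}: non-positive close {close}")
                conn.execute(
                    "INSERT INTO daily_price (symbol, trade_date, close) "
                    "VALUES (?, ?, ?)",
                    (row["symbol"], row["trade_date"], close),
                )
    except (ValueError, KeyError, sqlite3.Error) as exc:
        return f"rejected batch: {exc}"
    return "loaded"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_price ("
    "symbol TEXT, trade_date TEXT, close REAL, "
    "PRIMARY KEY (symbol, trade_date))"
)

# Made-up sample batches; the second contains an impossible price.
good = "symbol,trade_date,close\nEXMP,2009-04-20,10.5\nEXMP,2009-04-21,10.7\n"
bad = "symbol,trade_date,close\nEXMP,2009-04-22,10.9\nEXMP,2009-04-23,-1\n"

print(load_prices(conn, good))
print(load_prices(conn, bad))
print(conn.execute("SELECT COUNT(*) FROM daily_price").fetchone()[0])
```

The second batch is rejected as a whole, so the valid row that preceded the bad one is rolled back as well and the table still holds only the two rows from the first batch.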

Sean Kelly

Stumbleupon.com

Sean Kelly has had his hands in the guts of many a start-up in SF, and as far back as Ft. Lauderdale for Galacticomm, prior to the widespread adoption of the Internet.

He was the IT Director and an Oracle DBA for StarMine, a San Francisco financial data analytics company recently acquired by Reuters. He also acted as MySQL DBA for financial start-up Cake Financial.

He lives and works in San Francisco for a tiny subsidiary of eBay, riding large piles of MySQL data for stumbleupon.com.
