The world is experiencing an Industrial Revolution of Data. In any given minute, the machines around us are tracking billions of mouse clicks, credit card swipes, and GPS coordinates. Increasingly, this data is being saved, aggregated, and analyzed. These massive data flows present big challenges to firms, but also new opportunities for deriving insights.
A new class of professionals, called data scientists, has emerged to address the Big Data revolution. In this talk, I first discuss three skills core to their workflow: munging, modeling, and visualization. Then I present a case study that applies these skills: the analysis of billions of call records to address customer churn at a North American telecom.
Munging is the process of transforming large data sets into a form suitable for analysis; this is often the most labor-intensive of the three steps. Modeling refers to the application of statistical learning to identify patterns or make predictions using features of the data. Data visualization is how these models are presented to human eyes.
The case study begins with a data set of several billion call records spread across millions of customers. This data was first munged to describe frequent calling networks. We next modeled how events propagated within these networks and found that customers with a cancellation event in their network were 700% more likely to terminate service than at baseline. Finally, we visualized this analysis by showing how cancellations spread in one metropolitan call network.
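
As a rough illustration of the munging and modeling steps, the sketch below collapses raw call records into a network of frequent contacts, then compares the cancellation rate of customers who have a cancelling contact against the overall rate. It is not the method used in the case study. The toy records, the `MIN_CALLS` threshold, and all function names are assumptions made for illustration, and the sketch ignores the timing of cancellations, which a real propagation model would need.

```python
from collections import Counter, defaultdict

# Hypothetical toy call records as (caller, callee) pairs; not case-study data.
calls = [
    ("a", "b"), ("a", "b"), ("a", "b"),
    ("b", "c"), ("b", "c"),
    ("c", "d"),
    ("e", "f"), ("e", "f"),
    ("g", "h"), ("g", "h"),
]
cancelled = {"a", "b"}  # hypothetical customers who terminated service
MIN_CALLS = 2           # assumed cutoff for treating a contact as "frequent"


def frequent_network(calls, min_calls):
    """Munging: reduce raw records to an undirected graph of frequent contacts."""
    counts = Counter(frozenset(pair) for pair in calls if pair[0] != pair[1])
    graph = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_calls:
            u, v = tuple(pair)
            graph[u].add(v)
            graph[v].add(u)
    return graph


def churn_rates(graph, cancelled):
    """Modeling: cancellation rate with a cancelling contact vs. overall rate."""
    customers = list(graph)
    exposed = [c for c in customers if graph[c] & cancelled]
    if not customers or not exposed:
        raise ValueError("need at least one customer with a cancelling contact")
    baseline = sum(c in cancelled for c in customers) / len(customers)
    exposed_rate = sum(c in cancelled for c in exposed) / len(exposed)
    return exposed_rate, baseline


def percent_more_likely(exposed_rate, baseline):
    """Express the comparison as 'X% more likely than at baseline'."""
    return (exposed_rate / baseline - 1) * 100


graph = frequent_network(calls, MIN_CALLS)
exposed_rate, baseline = churn_rates(graph, cancelled)
print(f"exposed: {exposed_rate:.2f}, baseline: {baseline:.2f}, "
      f"lift: {percent_more_likely(exposed_rate, baseline):.0f}% more likely")
```

Under this definition, a figure of 700% more likely corresponds to an exposed cancellation rate eight times the baseline rate.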
Michael Driscoll has a decade of experience developing large-scale databases and predictive algorithms for digital media, financial, and life sciences firms. He is the CEO and co-founder of Metamarkets, and Chairman of Dataspora LLC, a big data and analytics consultancy he founded in 2007. Previously, he founded the online retailer CustomInk.com and worked as a software engineer for the Human Genome Project. Michael holds a Ph.D. in Bioinformatics from Boston University and an A.B. from Harvard College.