Sketching techniques for real-time big data

Preview of an upcoming session at Strata Santa Clara

By Bahman Bahmani

In many modern web and big data applications, data arrives as a stream and must be processed on the fly. The data is usually too large to fit in main memory, so computations must be updated incrementally as each new piece of data arrives. Sketching techniques make these applications practical by summarizing the stream in compact data structures, which are highly efficient in memory, computation, and network communication.
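The following example makes the idea of an incremental, fixed-memory computation concrete. It is a minimal Python implementation of a Count-Min sketch, a widely used frequency-estimation structure that this post does not itself discuss. The structure keeps `depth` rows of `width` counters no matter how long the stream grows, and each arriving item updates exactly one counter per row. The class name, the default sizes, and the use of salted BLAKE2b hashes as stand-ins for the independent hash functions assumed in the theory are all illustrative choices, not details from the talk.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item counts for a stream in fixed memory (illustrative)."""

    def __init__(self, width=2048, depth=5):
        # Memory is width * depth counters, independent of stream length.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One salted BLAKE2b hash per row: an assumed stand-in for an
        # independent hash family.
        data = str(item).encode()
        for row in range(self.depth):
            salt = row.to_bytes(16, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Incremental update on arrival: touch one counter per row.
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # With non-negative counts, collisions only inflate counters, so the
        # minimum over rows is the tightest of the per-row upper bounds.
        return min(self.table[row][col] for row, col in self._columns(item))


# Example stream: two frequent items buried among 10,000 distinct rare ones.
cms = CountMinSketch()
stream = ["login:alice"] * 1000 + ["login:bob"] * 500 + [f"user{i}" for i in range(10000)]
for event in stream:
    cms.add(event)
print(cms.estimate("login:alice"), cms.estimate("login:bob"))
```

Estimates never fall below the true count. They can exceed it only when other items collide with the queried item in every row, so widening the table trades memory for accuracy.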

In the algorithms research community, sketching techniques first appeared in the 1980s, for example in the seminal work of Philippe Flajolet and G. Nigel Martin on probabilistic counting. They attracted wider attention in the late 1990s, partly inspired by the award-winning work of Noga Alon, Yossi Matias, and Mario Szegedy. Research on them has surged since the 2000s, and sketches have been designed not only for fundamental problems such as finding heavy hitters, but also for matrix computations, network algorithms, and machine learning. These techniques are now at an inflection point in their history, due to the following factors:
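The probabilistic counting idea of Flajolet and Martin fits in a few lines of code. Hash each item, record the position of the lowest set bit of each hash in a small bitmap, and then estimate the number of distinct items from the first unset position. The code below is a minimal Python illustration of that technique. It uses salted BLAKE2b hashes as stand-ins for independent hash functions and averages over several of them to reduce variance. It does not use the cheaper stochastic averaging of the published PCSA algorithm, and the class and parameter names are hypothetical.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin bias-correction constant


class FlajoletMartin:
    """Estimate the number of distinct items in a stream (illustrative)."""

    def __init__(self, num_hashes=64, bits=64):
        self.num_hashes = num_hashes
        self.bits = bits
        # One bitmap per hash function, packed into a Python int.
        self.bitmaps = [0] * num_hashes

    def _hash(self, item, i):
        # Salted BLAKE2b as an assumed stand-in for the i-th hash function.
        salt = i.to_bytes(16, "little")
        digest = hashlib.blake2b(str(item).encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little")

    def add(self, item):
        for i in range(self.num_hashes):
            h = self._hash(item, i)
            # rho = index of the lowest set bit (capped at `bits` when h == 0).
            rho = (h & -h).bit_length() - 1 if h else self.bits
            self.bitmaps[i] |= 1 << rho

    def estimate(self):
        total = 0
        for b in self.bitmaps:
            # R = index of the lowest zero bit in this bitmap.
            total += (~b & (b + 1)).bit_length() - 1
        mean_r = total / self.num_hashes
        return 2 ** mean_r / PHI


fm = FlajoletMartin()
for x in range(5000):
    fm.add(f"ip-{x}")
before = fm.estimate()
for x in range(5000):  # re-adding the same items leaves every bitmap unchanged
    fm.add(f"ip-{x}")
print(round(before), fm.estimate() == before)
```

Memory stays at 64 small bitmaps regardless of how many items arrive. Two sketches built with the same hash functions can be merged by OR-ing their bitmaps, which suits distributed and sensor settings.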

1. Untapped potential: Because these techniques are so new, their considerable practical potential has barely been tapped.

2. Breadth and maturity: They are now broad and mature enough to be used widely across a variety of big data applications, and even to serve as basic building blocks for new, highly efficient big data management systems.

3. Huge data volumes and velocities: As data volumes and velocities grow faster than our computing power, the efficiencies gained by applying sketching techniques to data analytics are increasingly a necessity.

4. Cloud computing: With the mass adoption of cloud computing and its pay-per-use cost model, the computational efficiency of sketching techniques can translate directly into major reductions in data infrastructure costs for both small and large businesses.

5. Mobile devices and sensors: With the Internet of Things becoming a reality and mobile devices reaching very high penetration rates, the savings that sketching techniques offer in computation, communication, and power usage can make the difference between feasibility and infeasibility for a wide range of intelligent sensor network applications.

Hence, these are exciting times for technologists, entrepreneurs, and business leaders alike to learn about the power and potential of sketching techniques and how they can benefit from adopting them. In my talk at Strata, I will touch on the points above and demystify these techniques by walking through concrete applications in areas such as security and social media analytics. I will show how sketching techniques offer an alternative way of thinking about and performing big data analysis, one that is significantly more nimble than the bulky conventional approaches many of us are used to.

I hope to see many of you at my talk on February 27!