2011 / August / 02

 
 
LivePerson Event: Real Time Analytics Approaches at the BIG data era

 

Last week, LivePerson hosted an event about Real Time Analytics and Big data.

Our main speaker at the event was Lars George, a solution architect at Cloudera and a true guru.

Pizza, beer, 160 big data junkies and the Israeli summer made this event the hottest place to learn and to meet big data entrepreneurs, developers, CTOs, strategy analysts and more.

 

For those who missed it, or want a refresher on Lars's presentation, we've uploaded the full slide deck.

 

 

Presentation slides: Realtime Analytics with Hadoop and HBase

 

Following Lars, Haggai Shachar (Director, Data Services) presented on Real Time Analytics and Big Data. The blog post below summarizes his talk at the event.

 

Big data is here and it’s big. Real Time just makes it faster.

 

Many companies and technology providers are looking at the new possibilities that this rapidly growing industry enables. The new world requires agility, fast response to change and the ability to make educated yet automated decisions.

BigData.jpg

The world wants Real Time!

 

Over the past 6 years, I’ve been diving deep into the analytics and big data world, including web analytics, advertising, business intelligence and machine learning algorithms. During this period I’ve witnessed different perspectives on real time. I’ve learned that when people talk about big data and real time, they usually mean high freshness of data, ad hoc queries, or both.

 

High freshness

High freshness of data refers to the effort to lower the latency between the time an event occurs and the time it becomes available for reporting.

 

On the one hand, Facebook is publishing its architecture for the (super cool) real time insights product it launched a few months ago. On the other hand, Yahoo’s tech leader is complaining about how difficult it is to develop what they call the “next-click”: affecting the visitor's experience on the page right after the current click.

 

It seems that even the big guys are struggling with the technology. Nati’s post nicely explains the difficulties and proposes an alternative approach.

 

My concern is different. At the end of the day, Facebook's implementation is based on counters in HBase: aggregated metrics per (like) URL. This fairly simple approach is easy to implement but forces many compromises on the product itself: it’s fixed, it’s not drill-downable and it takes time to process.

 

What if you could have a real-time analytics solution running on top of raw data?

Aggregations are for wussies!
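
To make the contrast between fixed counters and raw data concrete, here is a minimal Python sketch. It is not Facebook's implementation and it does not use HBase. The event tuples, the `country` dimension and the `drill_down` helper are all hypothetical, chosen only to show why a per-URL counter cannot answer a question nobody planned for, while raw events can.

```python
from collections import Counter

# Hypothetical raw "like" events: (url, country) pairs.
events = [
    ("/a", "IL"),
    ("/a", "US"),
    ("/b", "IL"),
    ("/a", "IL"),
]

# Counter approach: one number per URL, updated as events arrive.
# The country of each event is thrown away at write time.
likes_per_url = Counter()
for url, _country in events:
    likes_per_url[url] += 1

# Raw-data approach: the grouping dimension is chosen at query time.
def drill_down(events, url):
    """Count likes for one URL, broken down by country."""
    return Counter(country for u, country in events if u == url)
```

With the counters alone, `likes_per_url["/a"]` is all you can ask. With the raw events, `drill_down(events, "/a")` breaks the same likes down by country, and any other dimension kept in the event could be used the same way.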

Ad hoc queries

Over the last year, the world of data warehousing has gone through its most drastic changes in 30 years. While traditional databases (Oracle, Microsoft, MySQL) were all about scaling up a single server, the new technologies (Greenplum, Netezza, Asterdata, Vertica and others) are all about what I call Linearness.

 

Are you linear?

I believe that big data companies should drive themselves to be linear: linear in the cost of hardware, linear in query performance and linear in the accuracy of the response. Yes, accuracy. Who cares if last month's visits were 5483238 or 5483361? In many cases sampling is the key to success.

Using this concept, Facebook could have developed its insights feature to allow cool drill-downs and flexibility.
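
As a rough illustration of trading accuracy for speed, the sketch below estimates a count from a random sample of raw events and scales the result back up. The function name `estimate_count`, its parameters and the event shape are assumptions made for this example, not part of any product mentioned here.

```python
import random

def estimate_count(events, predicate, rate, seed=0):
    """Estimate how many events match `predicate` from a random sample.

    Each event is kept with probability `rate`; the number of matching
    sampled events is then scaled by 1 / rate.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    sampled_hits = sum(
        1 for e in events if rng.random() < rate and predicate(e)
    )
    return sampled_hits / rate
```

At `rate=1.0` every event is examined and the answer is exact. At `rate=0.1` the predicate runs on roughly a tenth of the events and the answer is approximate. A real system would avoid reading the unsampled events at all, which is where the saving comes from.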

 

linearness.jpg

 

To achieve linearness, a data warehouse must apply these 3 rules:

1. Shared Nothing - each node is independent and self-sufficient.

2. Massively Parallel Processing (MPP) - many CPUs working in parallel to execute a single program.

3. Columnar orientation - stores content by column rather than by row.
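
The three rules can be sketched together in a few lines of Python. This is a toy model only: the "nodes" are threads in one process rather than separate machines, and the record layout and the names `to_columns`, `partition`, `node_sum` and `total_visits` are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

COLUMNS = ("url", "visits")

# Row-oriented input: hypothetical page-visit records.
rows = [
    {"url": "/a", "visits": 3},
    {"url": "/b", "visits": 5},
    {"url": "/a", "visits": 2},
    {"url": "/c", "visits": 7},
]

def to_columns(rows):
    """Columnar orientation: one list per column instead of one dict per row."""
    return {name: [r[name] for r in rows] for name in COLUMNS}

def partition(rows, n_nodes):
    """Shared nothing: each node owns a disjoint slice of the rows."""
    return [to_columns(rows[i::n_nodes]) for i in range(n_nodes)]

def node_sum(shard):
    """Runs on one node and reads only the 'visits' column of its own shard."""
    return sum(shard["visits"])

def total_visits(rows, n_nodes=2):
    """MPP: every node computes a partial sum, then the partials are combined."""
    shards = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return sum(pool.map(node_sum, shards))
```

Because each shard is independent and the query touches a single column, adding nodes splits the work into smaller pieces without any node needing another node's data.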

 

These rules bring awesome opportunities to the big data world. Choose where you want to be, accurate or high performance, and how much you are planning to spend on it. It's all under your control.

 

Conclusion

Real time analytics is hot: advertising, personalization, stock trading, shift management and many other scenarios. Don’t wait for an invitation. Hop in ASAP or step back.

 

Think Linearness!


