Thursday, February 6, 2014

Reactive Real-time Big Data with Open Source Lambda Architecture Stack

5 Whys

1) Why "Reactive"? http://www.reactivemanifesto.org/
  • react to events
  • react to load
  • react to failure
  • react to users
2) Why "Real-time"? http://en.wikipedia.org/wiki/Real-time
3) Why "Big Data"? http://www.bigdata-startups.com/best-practices/
4) Why "Open Source"?
Security, Quality, Customizability, Freedom, Flexibility, Interoperability, Auditability, Support Options, Cost, Try Before You Buy 
http://www.pcworld.com/article/209891/10_reasons_open_source_is_good_for_business.html
http://www.redhat.com/about/whoisredhat/opensource.html

5) Why "Lambda Architecture"?
http://www.slideshare.net/tantrieuf31/lambda-architecture-for-real-time-big-data
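
To make the idea concrete, here is a minimal sketch of the three Lambda Architecture layers (batch, speed, serving) as a page-view counter. Everything in it (the event shape, `batch_view`, `SpeedLayer`, `query`) is a hypothetical illustration, not code from the slides or from any of the tools listed below; in a real stack the batch view would come from something like Hadoop and the real-time view from something like Storm.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts from the full, immutable event log."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally counts events the last batch run has not seen."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["page"]] += 1


def query(page, batch, realtime):
    """Serving layer: merge the batch view and the real-time view at query time."""
    return batch[page] + realtime[page]


# Hypothetical data: three events already in the master dataset...
master_dataset = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
batch = batch_view(master_dataset)

# ...and one event that arrived after the batch run started.
speed = SpeedLayer()
speed.on_event({"page": "/home"})

home_views = query("/home", batch, speed.realtime_view)
about_views = query("/about", batch, speed.realtime_view)
```

The point of the split is that the batch layer can be slow but simple and always correct, while the speed layer only has to cover the short window since the last batch run.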





The open source frameworks and tools I have tried:

● Netty (http://netty.io/), an asynchronous, event-driven network framework that makes HTTP systems easier to scale, originally developed at JBoss http://www.jboss.org
https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead

● Apache Kafka (http://kafka.apache.org/), publish-subscribe messaging rethought as a distributed commit log, open sourced by LinkedIn.
http://www.slideshare.net/amywtang/building-a-realtime-data-pipeline-apache-kafka-at-linked-in
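
The "commit log" idea can be sketched in a few lines: producers only append, and every consumer keeps its own read offset, so several consumers can read the same messages independently. This is an in-memory toy under my own assumptions; `CommitLog` and `Consumer` are hypothetical names and not the Kafka client API, and there are no partitions, replication, or persistence here.

```python
class CommitLog:
    """Append-only log; a record's offset is its position in the log."""

    def __init__(self):
        self._records = []

    def append(self, message):
        self._records.append(message)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset, so reads never remove messages."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


log = CommitLog()
for message in ["click:1", "click:2", "view:3"]:
    log.append(message)

analytics = Consumer(log)
archiver = Consumer(log)

first = analytics.poll()   # everything written so far
log.append("click:4")
second = analytics.poll()  # only the new record
archived = archiver.poll() # the slower consumer still sees all records
```

Because reading does not consume anything, a slow consumer (the archiver above) does not hold back a fast one.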

● Storm (http://storm-project.net/), a distributed real-time computation system, open sourced by Twitter
http://www.quora.com/Apache-Storm/What-are-some-of-the-use-cases-of-Apache-Storm

● Akka (http://akka.io/), an Actor Model toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM.
More use cases at http://doc.akka.io/docs/akka/2.2.3/intro/use-cases.html
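
A rough sketch of the actor idea, for readers who have not met it: an actor owns its state, and the only way to touch that state is to send a message to its mailbox, which the actor processes one message at a time. This is plain Python threading, not Akka; `CounterActor`, `tell`, and the message strings are names I made up for illustration.

```python
import queue
import threading


class CounterActor:
    """Owns a counter; other threads may only send it messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self.count = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        # Messages are handled one at a time, so no lock is needed on count.
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self.count += 1

    def stop(self):
        self.tell("stop")
        self._thread.join()


def send_many(actor, n):
    for _ in range(n):
        actor.tell("increment")


actor = CounterActor()
senders = [threading.Thread(target=send_many, args=(actor, 100)) for _ in range(4)]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()
actor.stop()  # queued after all increments, so every increment is processed first
total = actor.count
```

Four threads send concurrently, yet the count is exact, because only the actor's own thread ever modifies it.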

● Redis (http://redis.io/), an advanced in-memory key-value NoSQL database; all the fast statistical computations run here.
http://openmymind.net/redis.pdf
http://www.manning.com/carlson/

● OrientDB, an open source NoSQL DBMS with the features of both document and graph DBMSs, used for KPI report data management http://pettergraff.blogspot.it/2014/01/getting-started-with-orientdb.html

● Groovy (http://groovy.codehaus.org/) and Grails (http://grails.org/) for the scripting layer on the JVM, ad-hoc queries on Redis, and the front end

● Hadoop ecosystem (http://hadoop.apache.org/): HDFS, Hive, and HBase for batch processing

● RxJava (https://github.com/Netflix/RxJava), a library for composing asynchronous and event-based programs
https://www.coursera.org/course/reactive

● Hystrix (https://github.com/Netflix/Hystrix), a latency and fault tolerance library for distributed systems
http://techblog.netflix.com/2012/11/hystrix.html
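
The core pattern behind this kind of fault tolerance is the circuit breaker with a fallback: after repeated failures, stop calling the broken dependency for a while and answer from a fallback instead. The sketch below is a simplified, single-threaded illustration of that pattern, not Hystrix's API; `CircuitBreaker`, its parameters, and the `flaky`/`cached` functions are all hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures and short-circuits calls to the fallback."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, command, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: do not touch the dependency at all
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = command()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        # Success: close the circuit and forget past failures.
        self.failures = 0
        self.opened_at = None
        return result


calls = []


def flaky():
    calls.append(1)
    raise ConnectionError("backend down")


def cached():
    return "cached"


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
results = [breaker.call(flaky, cached) for _ in range(4)]
```

In this run the backend is only hit twice; the last two calls are answered from the fallback without waiting on the failing dependency, which is what keeps latency bounded under failure.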

● NVD3 (http://nvd3.org), reusable charts built on D3 (http://d3js.org/)
http://techslides.com/over-1000-d3-js-examples-and-demos/
https://github.com/anvaka/VivaGraphJS