Saturday, November 30, 2013
From the FPT Technology Roadmap to setting up a data science team
Here are the roadmap and the core values:
Core skills for a data engineering and science team
- Frontend JavaScript Developer (AngularJS)
  - deep expertise in the latest web technologies.
  - Frontend Dev Buzzword Compliance: HTML5, CSS3/SASS, Bootstrap, AngularJS …
  - strong experience with AngularJS and the surrounding toolchain (Yeoman, Grunt, etc.).
  - a focus on simplicity and great UX.
  - a strong interest in data visualization with d3.js.
  - the desire to quickly learn and adapt to the latest evolutions of frontend web technology.
  - no fear of working with node.js/PostgreSQL/redis based backends.
- Frontend Web/Mobile Developer (HTML5 / Android / iOS)
- Backend System Software Engineer
  - Big Data Buzzword Compliance: MapReduce, Hadoop, Hive, Solr/Lucene …
  - processes and analyzes massive amounts of data in minimal computing time.
  - works at the cutting edge of Big Data and real-time technologies.
  - a background in online advertising technology (ad serving, RTB).
- Data Scientist / Statistician
  - an academic degree in a quantitative field, e.g. Mathematics, Statistics, Computer Science, Physics, etc.
  - the desire to quickly learn and adapt to new technologies.
  - an analytical mind with a hands-on attitude.
  - substantial knowledge of at least one major programming language (preferably Python and/or C++, Java).
  - a solid understanding of statistics and machine learning techniques.
Friday, November 29, 2013
Streaming Native Advertising
The actor model (agent programming) can be used to process large data streams in real time, whether the processing is triggered manually or automatically.
Native advertising is a web advertising method in which the advertiser attempts to gain attention by providing content in the context of the user's experience. (http://en.wikipedia.org/wiki/Native_advertising)
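To make the actor idea above more concrete, here is a minimal sketch in plain Python (standard library only, not Akka). It assumes a hypothetical message shape of (ad_id, event_type) and a hypothetical CounterActor class; none of these names come from a real ad-serving system. The point it tries to show is that the actor owns its state and only changes it while handling one mailbox message at a time.

```python
import queue
import threading
from collections import Counter


class CounterActor:
    """A tiny actor: private state, a mailbox, and one thread that drains it."""

    _STOP = object()  # sentinel message that tells the actor to shut down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._counts = Counter()  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message):
        """Deliver a message asynchronously; the sender does not wait for processing."""
        self._mailbox.put(message)

    def stop(self):
        """Let the actor finish its queued messages, then return its final counts."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return dict(self._counts)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            # Assumed (hypothetical) message shape: (ad_id, event_type)
            ad_id, event_type = message
            self._counts[(ad_id, event_type)] += 1


def count_events(events):
    """Stream events into one actor and return the per-(ad, event) counts."""
    actor = CounterActor()
    for event in events:
        actor.tell(event)
    return actor.stop()
```

Because the mailbox is first-in, first-out, the stop sentinel is handled only after every earlier event, so no lock is needed around the counter.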
Tuesday, November 26, 2013
A simple but readable book for beginners in Qualitative Data Analysis
Qualitative Data Analysis shows that learning how to analyse qualitative data by computer can be fun. Written in a stimulating style, with examples drawn mainly from every day life and contemporary humour, it should appeal to a wide audience.
http://www.drapuig.info/files/Qualitative_data_analysis.pdf
Analytics for small business
This is a short blog post describing the potential and applications of stream data analytics (from a few KB of logs to a few TB of logs) for small businesses in Vietnam.
After making the slides and presenting at Barcamp Saigon, I received feedback from several interested people (some wanted to get in touch, some offered full-time jobs, some offered freelance projects, …). http://nguyentantrieu.info/blog/data-analytics-for-mobile-app-developement/
This shows the potential, both in terms of applications and in terms of awareness of the real benefit of turning useless data logs into a useful product. It is useful in several ways:
- seeing the trends and feedback around the product you sell to customers (a combination of the technical, business and operation teams)
- identifying potential customers (those likely to buy, with a high ROI - Return on Investment) (CRM done in a smart way)
- linking data from multiple sources => making decisions that are closer to reality and more likely to succeed (reporting on more KPIs)
- spotting hidden risks, fraud detection (monitoring abnormal e-commerce transactions, cheating in games, ...)
- targeting marketing campaigns, market research at large scale
Within the limits of a single post, I will only cover this briefly. The factors that matter most are real-time capability, a deployment that is not too complex, and a cost that is not too high (by taking advantage of open source projects & tools).
Companies/stores that I have worked with or know of that currently have a strong need in this area:
1) PhongCachMobile (Mobile Data Analytics on a pre-installed app shop https://play.google.com/store/apps/details?id=com.mc2ads.browser4x ). This is a freelance project whose idea came from taking part in a hackathon http://nguyentantrieu.info/blog/from-hackathon-idea-to-new-smart-store-on-smartphone/
2) TheGioiDiDong.com (??? not clear yet, but they asked questions when I presented at Barcamp Saigon 2013)
The idea comes from a nytimes article, Attention, Shoppers: Store Is Tracking Your Cell
Like dozens of other brick-and-mortar retailers, Nordstrom wanted to learn more about its customers — how many came through the doors, how many were repeat visitors — the kind of information that e-commerce sites like Amazon have in spades. So last fall the company started testing new technology that allowed it to track customers’ movements by following the Wi-Fi signals from their smartphones.
3) FPT Technology Solutions ( http://www.chungta.vn/tin-tuc/cong-nghe/2012/11/fts-xay-cong-nghe-chong-ket-xe/)
Using stream computing (vehicles, traffic density, data from sensors, …) + Analytics = automatic real-time traffic monitoring (minimizing the delay when a traffic jam occurs and showing it on notice boards?)
See more at http://nguyentantrieu.info/blog/stream-computing-natural-ways-for-solving-problems-faster/
I do not yet know how FIS implements this; a Google search turned up some useful information
4) GNT Vietnam (Game Analytics, Game Recommendation Engine, In-App Automation marketing)
This is an ambitious company, with ideas for using analytics to compete more intelligently in the demanding mobile game market in Japan and worldwide.
A quick Google search turned up a good slide deck:
Good books for reference
How does predicting human behavior combat risk, fortify healthcare, toughen crime fighting, and boost sales?
Real-time Reactive Analytics for the World
Original ideas from http://nguyentantrieu.info/blog/building-cloud-and-virtual-computing-platform-on-existing-physical-servers/
Design philosophy:
event-driven, scalable, resilient and responsive ( http://www.reactivemanifesto.org )
Open Source Links:
- http://akka.io : a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM
- https://github.com/peter-lawrey/Java-Chronicle : an ultra low latency, high throughput, persisted, messaging and event driven in memory database
- http://redis.io : advanced key-value store
- http://netty.io : an asynchronous event-driven network application framework
- https://github.com/Netflix/RxJava : a library for composing asynchronous and event-based programs using observable sequences for the Java VM
- http://kafka.apache.org : publish-subscribe messaging rethought as a distributed commit log
- https://github.com/orientechnologies/orientdb/wiki : open source graph database
Reference URLs:
- http://arxiv.org/pdf/1008.1459.pdf : Actor Model of Computation by Carl Hewitt
- http://en.wikipedia.org/wiki/Named_pipe
- http://sonalimendis.blogspot.in/2010/10/named-pipes-for-inter-process.html
- http://www.slideshare.net/drorbr/the-actor-model-towards-better-concurrency
- http://www.aosabook.org/en/nginx.html
- http://www.slideshare.net/joshzhu/nginx-internals
- http://redis.io/topics/pubsub
- http://techblog.netflix.com/2013/01/optimizing-netflix-api.html
Benchmark Test for Parallel Processing with Actor Model (with Akka framework)
Reference link: http://nguyentantrieu.info/blog/the-architecture-for-real-time-event-processing-with-reactive-actor-model/
Problem:
- Messages are distributed through all processing phases; each phase has an event-handler pool (of pre-allocated size).
- A message received at the first phase goes through all phases in a defined flow (a directed graph, aka topology).
- Supported operations: statistics (like counting, average, sum, …) and publishing a new event (when a specific rule matches)
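The problem statement above can be sketched roughly as follows. This is not the Akka code behind the benchmark, only a standard-library Python illustration under assumptions: the Phase class, the three handlers (validate, aggregate, publish), the "likes" and "id" message fields, and the threshold rule are all hypothetical names chosen for the example.

```python
import queue
import threading


class Phase:
    """One processing phase backed by a pre-allocated pool of handler threads."""

    def __init__(self, name, handler, pool_size):
        self.name = name
        self.handler = handler   # function: message -> message for next phase, or None
        self.downstream = []     # outgoing edges of the topology graph
        self.inbox = queue.Queue()
        for _ in range(pool_size):
            # Daemon threads stay parked on the inbox and end with the process.
            threading.Thread(target=self._work, daemon=True).start()

    def _work(self):
        while True:
            message = self.inbox.get()
            try:
                result = self.handler(message)
                if result is not None:
                    for phase in self.downstream:
                        phase.inbox.put(result)
            finally:
                self.inbox.task_done()

    def drain(self):
        """Block until every message put into this phase has been handled."""
        self.inbox.join()


def run_topology(messages, threshold, pool_size=4):
    lock = threading.Lock()
    stats = {"count": 0, "sum": 0}
    published = []

    def validate(message):
        # Drop malformed messages; returning None ends the flow for that message.
        return message if "likes" in message else None

    def aggregate(message):
        with lock:
            stats["count"] += 1
            stats["sum"] += message["likes"]
        # Hypothetical rule: emit a new event once a message reaches the threshold.
        if message["likes"] >= threshold:
            return {"type": "popular", "id": message["id"]}
        return None

    def publish(event):
        with lock:
            published.append(event["id"])
        return None

    first = Phase("validate", validate, pool_size)
    second = Phase("aggregate", aggregate, pool_size)
    third = Phase("publish", publish, pool_size)
    first.downstream.append(second)
    second.downstream.append(third)

    for message in messages:
        first.inbox.put(message)
    for phase in (first, second, third):  # drain in topological order
        phase.drain()

    average = stats["sum"] / stats["count"] if stats["count"] else 0.0
    return {"count": stats["count"], "sum": stats["sum"],
            "average": average, "published": sorted(published)}
```

Draining the phases in topological order works here because a handler forwards its result downstream before marking its own message as done, so an empty upstream phase means nothing more can arrive from it.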
Test results (on average, about 6,000 messages processed per second):
"TestActor-SIZE-100000" "TestActor processed 100000 messages, done in (milisecs):18450"
"TestActor-SIZE-200000" "TestActor processed 200000 messages, done in (milisecs):28214"
"TestActor-SIZE-500000" "TestActor processed 500000 messages, done in (milisecs):81132, average 1 milisecs could process 6"
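The throughput figure can be checked directly from the three runs above. The short calculation below uses only the message counts and elapsed times quoted in the log lines; the function and variable names are made up for this example.

```python
# (messages processed, elapsed milliseconds) taken from the three log lines
results = [(100_000, 18_450), (200_000, 28_214), (500_000, 81_132)]


def throughput_per_second(messages, elapsed_ms):
    """Messages handled per second for one benchmark run."""
    return messages / (elapsed_ms / 1000.0)


rates = [throughput_per_second(n, ms) for n, ms in results]

# Overall rate across all runs: total messages divided by total elapsed time
total_messages = sum(n for n, _ in results)
total_ms = sum(ms for _, ms in results)
overall = throughput_per_second(total_messages, total_ms)

for (n, ms), rate in zip(results, rates):
    print(f"{n} messages in {ms} ms: {rate:.0f} messages/second")
print(f"overall: {overall:.0f} messages/second")
```

The three runs come out at roughly 5,400, 7,100 and 6,200 messages per second, which is consistent with the stated figure of about 6 messages per millisecond.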
Monday, November 25, 2013
Realtime Processing with Storm
Storm is a distributed, reliable, fault-tolerant system for processing streams of data.
http://strataconf.com/stratany2012/public/schedule/detail/25246
http://info.mapr.com/ted-chicago-hug-4-23-12.html
http://cdn.oreillystatic.com/en/assets/1/event/85/Realtime%20Processing%20with%20Storm%20Presentation.pdf