Friday, June 12, 2015

TargetAd 2015 - workshop on computational advertising

http://www.targetad-workshop.net/

The recent advent of big data applications and platforms has led to a renaissance in many areas of machine learning and data mining. Computational advertising, a burgeoning field that generated revenue of over 20 billion dollars in the first half of 2013 in the US alone, has particularly benefited, and the industry has seen steady double-digit growth in the past few years. However, to maintain and improve upon this positive trend, researchers in academia and industry alike face numerous theoretical and practical challenges that require immediate attention.

The objective of the workshop is to bring together interdisciplinary practitioners and researchers from industrial and academic research labs to discuss state-of-the-art research and future directions in the fields of Ad Targeting, User Modeling, Recommender Systems, and related areas in the era of Big Data. We expect the workshop to help develop and strengthen a community of interested researchers and to yield future collaborations and exchanges.

Interesting papers: 
http://www.www2015.it/documents/proceedings/companion/p1269.pdf
http://www.journalofbigdata.com/content/pdf/s40537-014-0007-7.pdf

Thursday, June 11, 2015

Apache Spark and massive online courses

1) https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x


What you'll learn
  • Learn how to use Apache Spark to perform data analysis
  • How to use parallel programming to explore data sets
  • Apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real world data questions
  • Prepare for the Spark Certified Developer exam


What you'll learn
  • The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines
  • Exploratory data analysis, feature extraction, supervised learning, and model evaluation
  • Application of these principles using Apache Spark
  • How to implement scalable algorithms for fundamental statistical models

MITx: MAS.S69x Big Data and Social Physics

What you'll learn:
Installment 1: Social Physics: the revolution. Students will learn how big data is revolutionizing the social sciences, including management and medical science, and what changes this may make in our society.

Installment 2: Social Physics: who we are. Students will learn how big data has given us new insights into the role idea flow on social networks plays in our thinking, and how we use idea flow and social pressure to build social norms that make companies and society work smoothly.

Installment 3: Social Physics: idea machines. Students will learn how social networks and social network incentives can be used to create more productive, agile, and creative organizations.

Installment 4: Social Physics: data driven cities. Students will learn how big data is transforming our cities, and how insights from social physics can be used to both minimize poverty and make our cities more productive.

Installment 5: Social Physics: data driven societies. Students will learn how we can build data sharing infrastructure that protects individual privacy and freedom while at the same time encouraging greater innovation.

A three-day class on distributed computing, using the high-speed cluster programming framework, Spark. Throughout the class, there will be hands-on exercises with computing resources provided by the organizers.

The class will include introductions to the many Spark features, case studies from current users, best practices for deployment and tuning, future development plans, and hands-on exercises.

Friday, April 24, 2015

QUANTCAST: A SMALL BIG DATA COMPANY

The Big Rewards of Big Data
A problem well defined is a problem half-solved. —John Dewey
Up to this point, we’ve defined Big Data and its elements. We then described many of the technologies that organizations are using to harness its value. Now it’s time to see some of these technologies in action. This chapter examines three organizations in depth, exploring how they have successfully deployed Big Data tools and seen amazing results. Let’s start with a company that makes handling Big Data its raison-d’etre.

QUANTCAST: A SMALL BIG DATA COMPANY
How do advertisers reach their target audiences online? It’s a simple question with anything but a simple answer. Traditionally, advertisers reached audiences via television based on demographic targeting. As discussed in Chapter 2, thanks to the web, consumers today spend less time watching TV broadcasts and more time in their own personalized media environments (i.e., their own individual blogs, news stories, songs, and videos picked). While good for consumers, this media fragmentation has scattered advertisers’ audiences. Relative to even twenty years ago, it is harder for them to reach large numbers of relevant consumers.
But just as the web lets consumers choose media more selectively, it lets advertisers choose their audiences more selectively. That is, advertisers need not try to re-create the effectiveness of TV advertising; they can surpass it. For example, an hour-long prime-time show on network TV contains nearly 22 minutes of marketing content.1 If advertisers could precisely target consumers, they could achieve the same economics with just a few minutes of commercials. As a result, TV shows could be nearly commercial free. Ads on the web are individually delivered, so decisions on which ad to show to whom can be made one consumer at a time.
Enter Quantcast.
Founded in 2006 by entrepreneurs Konrad Feldman and Paul Sutter, Quantcast is a web measurement and targeting company headquartered in San Francisco, California. Now with 250 employees, Quantcast models marketers’ best prospects and finds similar or lookalike audiences across the world. Connecting advertisers with their best customers certainly isn’t easy, never mind maximizing yield for publishers and delivering relevant experiences for consumers. To do this, Quantcast software must sift through a veritable mountain of data. Each month, it analyzes more than 300 billion observations of media consumption (as of this writing). Today the company’s web visibility is second only to Google. Ultimately, Quantcast attempts to answer some very difficult advertising-related questions—and none of this would be possible without Big Data.
I wanted to know more about how Quantcast specifically uses Big Data, so I asked Jim Kelly, the company’s VP of R&D, and Jag Duggal, its VP of Product Management. Over the course of a few weeks, I spoke with them.
Steps: A Big Evolution
It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change. - Charles Darwin
Quantcast understood the importance of Big Data from its inception. The company adopted Hadoop from the get-go but found that its data volumes exceeded Hadoop’s capabilities at the time. Rather than wait for the Hadoop world to catch up (and miss a potentially large business opportunity in the process), Quantcast took Hadoop to the next level. The company created a massive data processing infrastructure that could process more than 20 petabytes of data per day—a volume that is constantly increasing. Quantcast built its own distributed file system (a centerpiece of its current software stack) and made it freely available to the open source community. The Quantcast File System2 (QFS) is a cost-effective alternative to the Hadoop Distributed File System (HDFS) mentioned in Chapter 4. QFS delivers significantly improved performance while consuming 50 percent less disk space.3
In a Big Data world, complacency is a killer. New data sources mean that the days of “set it and forget it” are long gone, and Charles Darwin’s quote is as relevant now as it was 100 years ago. In 2006, like just about every company in the world, Quantcast practically ignored data generated from mobile devices. Most Internet-related data originated from desktops and laptops before iPhones and Droids arrived. Of course, that has certainly changed over the past five years, and Quantcast now incorporates these new and essential data sources into its solutions.4 This willingness and ability to innovate has resulted in some nice press for the company. In February 2010, Fast Company ranked Quantcast forty-sixth on its list of the World’s Most Innovative Companies.5 To this day, the company continues to expand and diversify its analytics products.

From its inception in 2006, Quantcast focused on providing online audience measurement services, a critical part of the advertising industry for both advertisers and publishers. TV and radio stations need to use a mutually agreeable source for determining how many people they are reaching. Companies like Arbitron and Nielsen had provided similar services for radio and TV for decades. These companies used panels of users to extrapolate media consumption across the entire population.
For the most part, these companies’ Small Data approaches consist of simply porting their panel-based approaches to the Internet. As discussed in earlier chapters, Small Data tools and methods typically don’t work well with Big Data, something that Quantcast understood early on. It built a Big Data–friendly system tailored to the web’s unique characteristics. Millions of popular sites, social networks, channels, blogs, and forums permeate the web. Consumption is fragmented, making extrapolating from a panel extremely difficult. Luckily, since each web page is delivered individually to each user, such extrapolation is unnecessary. On the web, Quantcast measures the “consumption” of each page directly.
Buy Your Audience
In 2009, Quantcast began development of an “audience-buying” engine. With it, the company could leverage its vast troves of data on consumers’ online media consumption. As real-time ad exchanges such as the DoubleClick AdExchange arose, Quantcast quickly got on board. Today, Quantcast is a major player in a market that auctions off billions of ad impressions each day.
In November 2012, Quantcast released Quantcast Advertise. The self-service platform enables advertisers, agencies, and publishers to connect Big Data with discrete brand targets.6 With the right solutions, Big Data allows organizations to drill down and reach very specific audiences. “A flexible compute infrastructure was critical to our ability to produce more accurate audience measurement services. That same infrastructure produced more accurate ad targeting once ad inventory started to be auctioned in real-time,” Duggal told me.

We saw earlier in this book how Amazon, Apple, Facebook, Google, and other progressive companies eat their own dog food. Count Quantcast among the companies that use their own Big Data tools. What’s more, like Google, Quantcast makes some of its own internal Big Data solutions available for free to its customers.7 Quantcast audience segments allow users to understand and showcase any specific audience group for free. Once implemented, these segments appear in users’ full publisher profile on Quantcast.com. As a result, they can better represent their audiences. Figure 5.1 shows some sample data from its Quantified dashboard.




[Figure 5.1: Sample data from the Quantified dashboard. Source: Quantcast.com]
To be sure, “regular” web traffic, click-through, and purchase metrics might be sufficient for some businesses. However, Quantcast knows that it can’t serve myriad clients across the globe with a mind-set of one size fits all. No one company can possibly predict every Big Data need. Different businesses face vastly different data requirements, challenges, and goals. To that end, Quantcast provides integration between its products and third-party data and applications. What if customers could easily integrate their own data and applications with Quantcast-generated data? What if its clients wanted to conduct A/B testing, support out-of-browser and offline scenarios, and use multiple, concurrent analytic services—without impacting performance?
“Integration is central to everything we’re doing here,” says Kelly. “It’s the source of all the data we work with and the means by which it becomes relevant to the world.” And that advanced integration isn’t stopping anytime soon. Case in point: Quantcast created and offers an API built off the Microsoft Silverlight Analytics Framework.8
Results
Consider the following results from some of Quantcast’s recent customer campaigns:
  • A national after-market auto parts retailer relied upon digital advertising to attract new customers and drive online sales. Quantcast built predictive models based on customers who had actually completed an online purchase, distinguishing passersby from converting customers. The campaign eliminated most superfluous clicks, achieving a return on investment (ROI) greater than 200 percent.
  • A major wireless phone company achieved a 76 percent increase in conversion rates above its optimized content-targeted campaign. Quantcast lookalike data allowed lead generation to garner significantly higher conversion rates over content-targeted inventory purchased from the same inventory sources.
  • A leading hotelier gained deep insights into the demographics, interests, behaviors, and affinities of its customers. In the process, it ultimately doubled its bookings.
Lessons
Compared to many organizations, Quantcast is a relatively small company. This proves the point that an organization doesn’t need to be big to benefit from—and innovate with—Big Data. There’s no secret sauce, but embracing Big Data from its inception starts a company on the right path. Also, it’s critical to realize that Small Data tools just don’t play nice with Big Data. Understand this, and then spend the time, money, and resources to equip your employees and customers with powerful self-service tools.

Thursday, April 2, 2015

Making Fast Data work for you

Machine Intelligence + Human Insight + Reactive to Event Sourcing = Fast Data
“Realize that everything connects to everything else,” said Leonardo da Vinci.

Fast Data http://www.tibco.com/blog/2015/03/27/how-analytics-facilitates-fast-data/

For e-commerce


Friday, January 30, 2015

Wednesday, January 14, 2015

Where Apache Spark meets RFX-stream

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Installing and using Apache Spark: http://www.mc2ads.com/2014/02/install-apache-spark-and-fast-log.html

Advanced use cases

RFX-stream is a new kind of real-time data processing engine, based on Akka actors arranged in a pipeline:
http://www.mc2ads.com/p/rfx-for-big-data-developer.html
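
To give a rough feel for what an actor-based pipeline looks like, here is a minimal sketch in plain Python. It is not RFX-stream's or Akka's API: the `Stage` class, its `mailbox`, and the sample log lines are all hypothetical, chosen only to illustrate the idea that each stage owns a mailbox, handles one message at a time, and forwards its result to the next stage.

```python
import queue
import threading

_STOP = object()  # sentinel message that tells a stage to shut down


class Stage(threading.Thread):
    """One actor-like pipeline stage (hypothetical, for illustration only).

    It reads messages from its own mailbox one at a time, applies `handler`,
    and forwards any non-None result to the downstream stage.
    """

    def __init__(self, handler, downstream=None):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        self.handler = handler
        self.downstream = downstream

    def run(self):
        while True:
            msg = self.mailbox.get()
            if msg is _STOP:
                # Propagate shutdown so the whole pipeline drains in order.
                if self.downstream is not None:
                    self.downstream.mailbox.put(_STOP)
                return
            result = self.handler(msg)
            if result is not None and self.downstream is not None:
                self.downstream.mailbox.put(result)


# Pipeline: parse a raw log line -> keep only clicks -> collect results.
results = []
sink = Stage(results.append)  # list.append returns None, so nothing is forwarded
keep_clicks = Stage(lambda e: e if e["type"] == "click" else None, sink)
parse = Stage(lambda line: dict(zip(("type", "ad"), line.split(","))), keep_clicks)

stages = [parse, keep_clicks, sink]
for stage in stages:
    stage.start()

for line in ["click,ad-A", "view,ad-B", "click,ad-C"]:
    parse.mailbox.put(line)
parse.mailbox.put(_STOP)

for stage in stages:
    stage.join()

print([event["ad"] for event in results])
```

Because every stage processes its mailbox sequentially, message order is preserved from one end of the pipeline to the other, while the stages themselves run concurrently.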



Sunday, January 4, 2015

What is reactive analytics?

Reactive analytics is a new concept that describes the processes that transform data and events into analytics and real-world actions. It is used in the RFX framework.
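
As a rough illustration of the event -> analytics -> action flow, here is a minimal sketch in plain Python. It does not use the RFX framework: the `Event` schema, the `react` function, and the click threshold are assumptions made up for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    """A raw event, e.g. one ad click (hypothetical schema)."""
    user_id: str
    ad_id: str


def react(events: Iterable[Event],
          threshold: int,
          action: Callable[[str, int], None]) -> Counter:
    """Update a per-ad click count as each event arrives and call
    `action` at the moment a count reaches `threshold`."""
    clicks: Counter = Counter()
    for event in events:                              # data / event
        clicks[event.ad_id] += 1                      # analytics: running metric
        if clicks[event.ad_id] == threshold:          # decision
            action(event.ad_id, clicks[event.ad_id])  # action
    return clicks


fired: List[str] = []
stream = [Event("u1", "ad-A"), Event("u2", "ad-B"),
          Event("u3", "ad-A"), Event("u4", "ad-A")]
totals = react(stream, threshold=2,
               action=lambda ad_id, count: fired.append(ad_id))

print(fired, dict(totals))
```

The point of the sketch is that the metric is updated and acted upon while events are still arriving, rather than in a batch job that runs after the fact.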