What you'll learn
- How to use Apache Spark to perform data analysis
- How to use parallel programming to explore data sets
- How to apply log mining, textual entity recognition, and collaborative filtering to real-world data questions
- How to prepare for the Spark Certified Developer exam
What you'll learn
- The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines
- Exploratory data analysis, feature extraction, supervised learning, and model evaluation
- Application of these principles using Apache Spark
- How to implement scalable algorithms for fundamental statistical models
MITx: MAS.S69x Big Data and Social Physics
What you'll learn:
Installment 1: Social Physics: the revolution. Students will learn how big data is revolutionizing the social sciences, including management and medical science, and what changes this may make in our society.
Installment 2: Social Physics: who we are. Students will learn how big data has given us new insights into the role idea flow on social networks plays in our thinking, and how we use idea flow and social pressure to build social norms that make companies and society work smoothly.
Installment 3: Social Physics: idea machines. Students will learn how social networks and social network incentives can be used to create more productive, agile, and creative organizations.
Installment 4: Social Physics: data driven cities. Students will learn how big data is transforming our cities, and how insights from social physics can be used to both minimize poverty and make our cities more productive.
Installment 5: Social Physics: data driven societies. Students will learn how we can build data sharing infrastructure that protects individual privacy and freedom while encouraging greater innovation.
A three-day class on distributed computing using Spark, a high-speed cluster programming framework. Hands-on exercises run throughout the class, with computing resources provided by the organizers.
The class will also include introductions to Spark's many features, case studies from current users, best practices for deployment and tuning, and future development plans.