Talend Big Data Advanced – MapReduce

Talend provides a development environment that lets you interact with many source and target Big Data stores without having to learn and write complicated code.

This course covers Big Data batch Jobs that use the MapReduce framework.

Duration: 1 day (7 hours)
Target audience: Anyone who wants to use Talend Studio to interact with Big Data systems
Prerequisites: Completion of Talend Data Integration Basics and Talend Big Data Basics

Course objectives

After completing this course, you will be able to:

  • Connect to a Hadoop cluster from a Talend Job
  • Use context variables and metadata
  • Read and write files in HDFS in a Big Data batch Job
  • Use the Twitter API with Talend components
  • Schedule Big Data Job execution from Talend Administration Center (TAC)
  • Tune memory requests to YARN

Course agenda

Clickstream use case

  • Monitoring the Hadoop cluster
  • Setting up a development environment
  • Loading data into HDFS
  • Enriching logs
  • Computing statistics
  • Converting a standard Job to a Big Data batch Job
  • Understanding MapReduce jobs
  • Using Studio to configure resource requests to YARN

Sentiment analysis use case

  • Loading dictionary and time zone data into HDFS
  • Loading tweets into HDFS
  • Processing tweets with MapReduce
  • Scheduling Job execution