Talend Big Data Basics

Talend provides a development environment that enables users to interact with many Big Data sources and targets without having to understand or write complicated code.

Talend Big Data Basics introduces the Talend components for interacting with Big Data systems. These components ship with several products.

Duration: 1 day (7 hours)
Target audience: Anyone who wants to use Talend Studio to interact with Big Data systems
Prerequisites: Completion of Talend Data Integration Basics or Talend Data Integration Advanced
Course objectives
After completing this course, you will be able to:
  • Create cluster metadata manually, from configuration files, or automatically
  • Create HDFS and Hive metadata
  • Connect to your cluster to use HDFS, HBase, Hive, Pig, Sqoop, and MapReduce
  • Read data from and write data to HDFS (HDFS, HBase)
  • Read tables from and write tables to HDFS (Hive, Sqoop)
  • Process tables stored on HDFS with Hive
  • Process data stored on HDFS with Pig
  • Process data stored on HDFS with Big Data batch Jobs
Course agenda

Basic concepts

  • Opening a project
  • Monitoring the Hadoop cluster
  • Creating cluster metadata

Reading and writing data in HDFS

  • Storing a file on HDFS
  • Storing multiple files on HDFS
  • Reading data from HDFS
  • Using HBase to store sparse data on HDFS

Working with tables

  • Importing tables with Sqoop
  • Creating tables in HDFS with Hive

Processing data and tables in HDFS

  • Processing Hive tables with Jobs
  • Profiling Hive tables (optional)
  • Processing data with Pig
  • Processing data with batch Jobs

Troubleshooting guide

  • Troubleshooting your cluster