Hadoop Pig

Hadoop Pig on a Plate: Big Data Processing Made Simple

Hadoop Pig (the Apache Pig project) is a platform for analyzing very large datasets stored in a Hadoop cluster. It consists of a high-level programming language, Pig Latin, and a compiler that translates Pig Latin scripts into Hadoop MapReduce programs. These programs perform large-scale parallel computation on big data sources such as web server logs, call detail records, and sensor data.
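To make the MapReduce model concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce phases that such a program goes through. It is an illustration only, not code produced by Hadoop Pig: the sample log records and the function names (map_phase, shuffle, reduce_phase, count_by_status) are hypothetical, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict

# Hypothetical sample of web server log records: (client_ip, http_status).
LOG_RECORDS = [
    ("10.0.0.1", 200),
    ("10.0.0.2", 404),
    ("10.0.0.1", 200),
    ("10.0.0.3", 500),
    ("10.0.0.2", 200),
]


def map_phase(record):
    """Emit one (key, 1) pair per record, keyed by HTTP status."""
    _ip, status = record
    yield status, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def count_by_status(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_by_status(LOG_RECORDS))
```

Writing even this small count by hand takes three separate functions, which is the kind of overhead a higher-level language such as Pig Latin is meant to hide.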

For enterprises looking to incorporate Hadoop Pig-based big data analytics into their data management and business intelligence architectures, Talend Open Studio for Big Data makes the process simpler and faster.

A Graphical Layer Over Hadoop Pig

Hadoop Pig emerged from an effort to make it easier to leverage the powerful but complex MapReduce distributed processing framework. Talend Open Studio for Big Data goes further by incorporating Hadoop Pig functionality into an Eclipse-based graphical development environment. By selecting from a palette of graphical components and configuring component properties, data engineers can quickly and easily build data processing jobs that use Hadoop Pig operations like filtering, sorting, and aggregating.
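The filtering, aggregating, and sorting operations mentioned above can be pictured as a small dataflow. The Python sketch below mimics that flow on an in-memory list; it is not Talend-generated code and not Pig Latin, and the call records and function names (filter_records, aggregate_total, sort_descending, run_pipeline) are assumptions made for illustration. In Pig Latin the corresponding steps would roughly be expressed with the FILTER, GROUP with FOREACH, and ORDER operators.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, duration_in_seconds).
CALLS = [
    ("alice", 120),
    ("bob", 5),
    ("alice", 60),
    ("carol", 300),
    ("bob", 45),
]


def filter_records(records, min_seconds):
    """Filtering: keep only calls that lasted at least min_seconds."""
    return [record for record in records if record[1] >= min_seconds]


def aggregate_total(records):
    """Aggregating: sum the call duration per caller."""
    totals = defaultdict(int)
    for caller, seconds in records:
        totals[caller] += seconds
    return dict(totals)


def sort_descending(totals):
    """Sorting: order callers by total duration, longest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def run_pipeline(records, min_seconds=10):
    """Chain the three steps in the order a data processing job would apply them."""
    return sort_descending(aggregate_total(filter_records(records, min_seconds)))


if __name__ == "__main__":
    for caller, total in run_pipeline(CALLS):
        print(caller, total)
```

Each function stands in for one component that a data engineer would configure in a job; chaining them in run_pipeline plays the role of connecting those components.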

Behind the graphical console, Talend Open Studio for Big Data automatically generates the corresponding Pig Latin code. The end result is a set of MapReduce data processing jobs that you can deploy as stand-alone jobs, as executables, or as shared big data services.

Support for Other Powerful Hadoop Applications

Along with Hadoop Pig, the Hadoop project and its offshoots have produced other innovative big data management technologies, such as the Hadoop Distributed File System (HDFS), HBase, Hive, and Sqoop. As with Hadoop Pig, Talend Open Studio for Big Data makes these Hadoop tools easy to use through the Talend graphical development environment.

Hadoop Pig in Your Enterprise Data Flows

With Talend Open Studio for Big Data, the first pure open source big data management solution, you can seamlessly integrate powerful Hadoop application technologies into your overall data management flows. Talend delivers more comprehensive connectivity than any other data integration platform, enabling you to easily bridge between your Hadoop layer and:

  • File formats such as CSV, XML, JSON, Excel, and EBCDIC
  • RDBMS applications like Oracle, SQL Server, Informix, DB2, MySQL, and PostgreSQL
  • Enterprise applications including SAP, Microsoft CRM, SugarCRM, SAS, and Marketo
  • Cloud services such as Salesforce and Force.com

Learn more about Talend’s big data solutions from the many resources on this website, or download Talend Open Studio for Big Data today and start benefiting from the leading open source big data tool.