
Connect to Pig

The Bitnami Hadoop Stack includes Pig, a platform for analyzing large data sets. Pig consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

To use Pig, simply run:

$ pig

After a few moments, you will see the grunt prompt:

grunt>

In order to run the Pig tutorial scripts, you will first need to upload a file to HDFS:

$ hadoop fs -copyFromLocal /opt/bitnami/hadoop/pig/tutorial/data/excite.log.bz2 .
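
Before running the tutorial script, it can help to confirm that the file actually reached HDFS. The following is a minimal sketch, not part of the Bitnami stack: hdfs_exists is a hypothetical helper name, and it assumes the hadoop command is on your PATH. It relies on "hadoop fs -test -e", which exits with status 0 when the path exists.

```shell
# Hypothetical helper (not shipped with the stack): report whether a path
# exists in HDFS. "hadoop fs -test -e" exits with status 0 if the path exists.
hdfs_exists() {
    if hadoop fs -test -e "$1"; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}
```

For example, "hdfs_exists excite.log.bz2" checks the file uploaded above. A relative path such as this one resolves against the current user's HDFS home directory.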

This example uses script1-hadoop.pig, which you can run as follows:

$ cd /opt/bitnami/hadoop/pig/tutorial
$ pig ./scripts/script1-hadoop.pig

The process takes a few minutes. Once it finishes, output similar to the following indicates success:

Input(s):
Successfully read 944954 records (10409092 bytes) from: "hdfs://localhost:8020/user/hadoop/excite.log.bz2"

Output(s):
Successfully stored 13530 records (659954 bytes) in: "hdfs://localhost:8020/user/hadoop/script1-hadoop-results"

(...)

2018-02-20 09:11:36,947 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2018-02-20 09:11:36,976 [main] INFO  org.apache.pig.Main - Pig script completed in 3 minutes, 21 seconds and 433 milliseconds (201433 ms)
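
To look at what the script stored, you can list the output directory and print the first few records. This is a rough sketch under stated assumptions: show_pig_results is a hypothetical helper name, the default directory script1-hadoop-results is taken from the log output above, and the part-* file naming is the usual MapReduce convention rather than something shown in that output.

```shell
# Hypothetical helper: list a Pig output directory in HDFS, then print the
# first lines of its part files.
#   $1: output directory (default: script1-hadoop-results, as in the log above)
#   $2: number of lines to print (default: 10)
# The part-* pattern is an assumption based on the usual MapReduce naming.
# It is quoted so that HDFS expands it, not the local shell.
show_pig_results() {
    hadoop fs -ls "${1:-script1-hadoop-results}" || return 1
    hadoop fs -cat "${1:-script1-hadoop-results}/part-*" | head -n "${2:-10}"
}
```

Running "show_pig_results" with no arguments uses the defaults. Pass a directory and a line count to inspect the output of a different script.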
Last modification September 4, 2018