Apache Pig: Storing Data

In the previous chapter, we learned how to load data into Apache Pig. This chapter explains how to use the STORE operator to write a loaded relation back to the file system.

Syntax

The syntax of the STORE statement is given below.

STORE Relation_name INTO 'required_directory_path' [USING function];
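
The USING clause is optional. When it is omitted, Pig stores the relation with PigStorage and a tab delimiter. The shell sketch below writes a small Pig Latin script containing both forms and runs it in local mode only if a pig executable is on the PATH. The file names (student_sample.txt, store_syntax.pig) and the output directories (out_comma, out_default) are assumptions made for illustration; they are not part of this chapter's setup.

```shell
# Hypothetical local file names; the rows are copied from student_data.txt.
cat > student_sample.txt <<'EOF'
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
EOF

# Pig Latin script with two STORE statements: one with USING, one without.
cat > store_syntax.pig <<'EOF'
student = LOAD 'student_sample.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
-- Explicit storage function: fields are written comma-separated.
STORE student INTO 'out_comma' USING PigStorage(',');
-- No USING clause: Pig falls back to PigStorage with a tab delimiter.
STORE student INTO 'out_default';
EOF

# Run in local mode only when Pig is installed; otherwise leave the script on disk.
if command -v pig >/dev/null 2>&1; then
    # Clear output from an earlier run, since STORE needs a fresh directory.
    rm -rf out_comma out_default
    pig -x local store_syntax.pig
else
    echo "pig not found; wrote store_syntax.pig"
fi
```

If Pig runs, the part files under out_comma should be comma-separated and those under out_default tab-separated.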

Example

Assume we have a file named student_data.txt in HDFS with the following content.

001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai

Read it into a relation named student using the LOAD operator, as shown below.

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' 
   USING PigStorage(',')
   as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, 
   city:chararray );

Now let us store the relation in the HDFS directory /pig_Output/, as shown below.

grunt> STORE student INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');
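
A rerun of this STORE statement fails if /pig_Output already exists, because Pig refuses to write into an existing output directory. The following is a minimal shell sketch for clearing the directory first. It assumes the hdfs client is on the PATH and that the NameNode address is the one used in this chapter; the variable name out_dir is introduced here for illustration only.

```shell
# Assumed output location, taken from the STORE statement in this chapter.
out_dir='hdfs://localhost:9000/pig_Output'

if command -v hdfs >/dev/null 2>&1; then
    # "hdfs dfs -test -d" exits 0 when the path exists and is a directory.
    if hdfs dfs -test -d "$out_dir"; then
        echo "removing existing $out_dir"
        hdfs dfs -rm -r "$out_dir"
    else
        echo "$out_dir does not exist; nothing to remove"
    fi
else
    echo "hdfs not found; skipping cleanup of $out_dir"
fi
```

Removing the directory deletes any earlier results, so copy them elsewhere first if they are still needed.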

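Output

After the STORE statement runs, you get output like the following. Pig creates a directory with the specified name and stores the data in it.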

2015-10-05 13:05:05,429 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-10-05 13:05:05,429 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
   
HadoopVersion    PigVersion    UserId    StartedAt             FinishedAt             Features 
2.6.0            0.15.0        Hadoop    2015-10-0 13:03:03    2015-10-05 13:05:05    UNKNOWN  
Success!  
Job Stats (time in seconds): 
JobId          Maps    Reduces    MaxMapTime    MinMapTime    AvgMapTime    MedianMapTime    
job_14459_06    1        0           n/a           n/a           n/a           n/a
MaxReduceTime    MinReduceTime    AvgReduceTime    MedianReducetime    Alias    Feature   
     0                 0                0                0             student  MAP_ONLY 
OutPut folder
hdfs://localhost:9000/pig_Output/ 
 
Input(s): Successfully read 0 records from: "hdfs://localhost:9000/pig_data/student_data.txt"  
Output(s): Successfully stored 0 records in: "hdfs://localhost:9000/pig_Output"  
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0
Total records proactively spilled: 0
  
Job DAG: job_1443519499159_0006
  
2015-10-05 13:06:06,192 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

Verification

You can verify the stored data as shown below.

Step 1

First, use the ls command to list the files in the pig_Output directory, as shown below.

$ hdfs dfs -ls 'hdfs://localhost:9000/pig_Output/'
Found 2 items
-rw-r--r--   1 Hadoop supergroup          0 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/_SUCCESS
-rw-r--r--   1 Hadoop supergroup        224 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/part-m-00000

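The STORE statement created two files: an empty _SUCCESS marker and part-m-00000, which holds the data.

Step 2

Use the cat command to display the contents of part-m-00000, as shown below.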

$ hdfs dfs -cat 'hdfs://localhost:9000/pig_Output/part-m-00000' 
1,Rajiv,Reddy,9848022337,Hyderabad
2,siddarth,Battacharya,9848022338,Kolkata
3,Rajesh,Khanna,9848022339,Delhi
4,Preethi,Agarwal,9848022330,Pune
5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
6,Archana,Mishra,9848022335,Chennai
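
The stored ids read 1 to 6 rather than 001 to 006 because the LOAD statement declared id as int, so the value is written back as a number. Declaring the field as id:chararray would keep the leading zeros. The shell sketch below only imitates that numeric conversion with awk on a local copy of two rows; it does not run Pig, and the file names are hypothetical.

```shell
# Two rows copied from student_data.txt; the file names here are hypothetical.
cat > student_ids.txt <<'EOF'
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
EOF

# Adding 0 forces awk to treat the first field as a number, so "001" is
# written back as 1. This imitates an int-typed field; it is not Pig itself.
awk -F, -v OFS=, '{ $1 = $1 + 0; print }' student_ids.txt > student_ids_as_int.txt

cat student_ids_as_int.txt
```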
