Apache Pig DISTINCT Operator

The DISTINCT operator removes redundant (duplicate) tuples from a relation.

Syntax

The syntax of the DISTINCT operator is given below.

grunt> Relation_name2 = DISTINCT Relation_name1;

Example

Assume the HDFS directory /pig_data/ contains a file named student_details.txt with the following content.

student_details.txt

001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai
006,Archana,Mishra,9848022335,Chennai

Load this file into Pig as the relation student_details, as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') 
   as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

Now use the DISTINCT operator to remove the redundant (duplicate) tuples from the student_details relation and store the result in another relation named distinct_data, as shown below.

grunt> distinct_data = DISTINCT student_details;
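
Note that DISTINCT compares entire tuples, so it cannot be applied to a subset of fields directly. A minimal sketch of the usual workaround, assuming the same file and schema as above: project the fields of interest with FOREACH, then apply DISTINCT. The relation names cities and distinct_cities are illustrative and not part of the original example.

```pig
-- Same LOAD as in the example above (path and schema taken from the tutorial).
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Hypothetical relation: keep only the city field of each tuple.
cities = FOREACH student_details GENERATE city;

-- DISTINCT now removes duplicate single-field tuples, i.e. repeated cities.
distinct_cities = DISTINCT cities;

DUMP distinct_cities;
```

Projecting first keeps the deduplication limited to the chosen fields; applying DISTINCT to student_details itself only removes rows that match in every field.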

Verification

Verify the relation distinct_data using the DUMP operator, as shown below.

grunt> Dump distinct_data;

Output

It produces the following output, displaying the contents of the relation distinct_data.

(1,Rajiv,Reddy,9848022337,Hyderabad)
(2,siddarth,Battacharya,9848022338,Kolkata)
(3,Rajesh,Khanna,9848022339,Delhi)
(4,Preethi,Agarwal,9848022330,Pune)
(5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(6,Archana,Mishra,9848022335,Chennai)
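
As an additional sanity check, the tuple counts before and after DISTINCT can be compared. The sketch below assumes the same file and schema as above, and that the file contains no stray trailing whitespace (otherwise visually identical rows would count as different tuples). The relation names raw_grouped, raw_count, distinct_grouped, and distinct_count are illustrative.

```pig
-- Same LOAD and DISTINCT as in the example above.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
distinct_data = DISTINCT student_details;

-- GROUP ... ALL puts every tuple into a single group so COUNT sees the whole relation.
raw_grouped = GROUP student_details ALL;
raw_count = FOREACH raw_grouped GENERATE COUNT(student_details);

distinct_grouped = GROUP distinct_data ALL;
distinct_count = FOREACH distinct_grouped GENERATE COUNT(distinct_data);

DUMP raw_count;
DUMP distinct_count;
```

With the nine-line sample file, this should report 9 tuples for the raw relation and 6 after deduplication.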
