博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
mapduce 分区操作,根据特定值将数据分开
阅读量:5169 次
发布时间:2019-06-13

本文共 4700 字,大约阅读时间需要 15 分钟。

代码
import org.apache.commons.lang.ObjectUtils;import org.apache.hadoop.conf.Configuration;import org.apache.hadoop.conf.Configured;import org.apache.hadoop.fs.Path;import org.apache.hadoop.io.NullWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Job;import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;import org.apache.hadoop.util.Tool;import org.apache.hadoop.util.ToolRunner;/** * Created with IntelliJ IDEA. * User: @别慌 * Date: 2019-06-16 * Time: 0:44 * Description: */public class PartitionerMain extends Configured implements Tool {    public int run(String[] args) throws Exception {        //job类,组装MR        Job job=Job.getInstance(super.getConf(),PartitionerMain.class.getSimpleName());        //打包运行        job.setJarByClass(PartitionerMain.class);        //第一步:读取文件,解析kv对        job.setInputFormatClass(TextInputFormat.class);        TextInputFormat.addInputPath(job,new Path(args[0]));        //自定义map逻辑        job.setMapperClass(partitionMap.class);        job.setMapOutputKeyClass(Text.class);        job.setMapOutputValueClass(NullWritable.class);        //第三部  分区         job.setPartitionerClass(partitionerOwn.class);        //自定义reduce逻辑        job.setReducerClass(partitionReduce.class);        job.setOutputKeyClass(Text.class);        job.setOutputValueClass(NullWritable.class);        job.setNumReduceTasks(2);        //设置输出类        job.setOutputFormatClass(TextOutputFormat.class);        //输出路径不写死,通过参数传进来        TextOutputFormat.setOutputPath(job,new Path(args[1]));        //提交任务        boolean b =job.waitForCompletion(true);        return b?0:1;    }    public static void main(String args[]) throws Exception {        int run= ToolRunner.run(new Configuration(),new PartitionerMain(),args);        System.exit(run);    }}import org.apache.hadoop.io.NullWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Partitioner;/** * Created with IntelliJ IDEA. * User: @别慌 * Date: 2019-06-16 * Time: 0:56 * Description: */public class partitionerOwn extends Partitioner
{ public int getPartition(Text text, NullWritable nullWritable, int i) { String[] spilt=text.toString().split("\t"); String gameResult=spilt[5]; if (null != gameResult && ""!=gameResult){ if (Integer.parseInt(gameResult)>15){ return 0; }else { return 1; } } return 0; }}import org.apache.hadoop.io.LongWritable;import org.apache.hadoop.io.NullWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Mapper;import java.io.IOException;/** * Created with IntelliJ IDEA. * User: @别慌 * Date: 2019-06-16 * Time: 0:47 * Description: */public class partitionMap extends Mapper
{ protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { context.write(value, NullWritable.get()); }}import org.apache.hadoop.io.NullWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Reducer;import java.io.IOException;/** * Created with IntelliJ IDEA. * User: @别慌 * Date: 2019-06-16 * Time: 1:14 * Description: */public class partitionReduce extends Reducer
{ protected void reduce(Text key,Iterable
values,Context context) throws IOException, InterruptedException { context.write(key,NullWritable.get()); }}
View Code

pom.xml

4.0.0
com.ali
hdfs
1.0-SNAPSHOT
org.apache.hadoop
hadoop-common
2.9.0
org.apache.hadoop
hadoop-hdfs
2.9.0
org.apache.hadoop
hadoop-mapreduce-client-core
2.9.0
View Code

java项目架构

  • PartitionerMain
  • partitionerOwn
  • partitionMap
  • partitionReduce

hadoop   jar  mouse.jar   PartitionerMain    /partition   /outtest

                     jar包            main函数            hdfs上的目标文件      hdfs的输出文件(不能事先存在)

源文件(partition)

1    0    1    2017-07-31 23:10:12    834232    6    sadasaa2    9    6    2019-06-76 78:97:12    687907    45    dsdskd
View Code

 

转载于:https://www.cnblogs.com/tangsonghuai/p/11037345.html

你可能感兴趣的文章
spring-boot-资源处理
查看>>
phpcms v9二次开发之模型类的应用(2)
查看>>
多线程之异步操作
查看>>
C++ 指定输出宽度和填充
查看>>
ImageX分块处理和保存
查看>>
解决百度BMR的spark集群开启slaves结点的问题
查看>>
bzoj 1861 treap
查看>>
Codeforces 263E Rhombus (看题解)
查看>>
【原创】Java移位运算
查看>>
[转] 怎么减少编程中的 bug?
查看>>
蓝点中文Linux2.0 实验十三 进程与作业管理
查看>>
工作室招新管理系统需求分析
查看>>
C# 不借助第三个变量实现两整数交换
查看>>
js调用局部打印功能并还原
查看>>
web of knowledge分析文献引用情况(引)
查看>>
matlab写入二进制文件注意事项
查看>>
mac ssh key 的获取 (转)
查看>>
Java 学习 day04
查看>>
Jobdu 1005
查看>>
webapi 给自己挖的坑
查看>>