An Analysis of MapReduce Principles
What Is MapReduce?
Preface: if you want to know how many hearts are in a pile of cards, the direct way is to check the cards one by one and count the hearts. The MapReduce way is to deal the pile out to a set of nodes, have each node count how many hearts are in its own hand, and then add the partial counts together to get the result.
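The analogy can be sketched in a few lines of plain Java (the class and method names here are illustrative, not Hadoop APIs): each "hand" is counted independently, and the partial counts are summed.

```java
import java.util.Arrays;
import java.util.List;

// Toy version of the card-counting analogy (plain Java, not Hadoop):
// each "node" counts the hearts in its own hand in parallel (the map step),
// then the partial counts are summed (the reduce step).
public class HeartCount {
    // One partial count per hand, then a sum over the partials.
    static long countHearts(List<List<String>> hands) {
        return hands.parallelStream()
                .mapToLong(hand -> hand.stream().filter("heart"::equals).count())
                .sum();
    }

    public static void main(String[] args) {
        List<List<String>> hands = Arrays.asList(
                Arrays.asList("heart", "spade", "heart"),
                Arrays.asList("club", "heart"),
                Arrays.asList("diamond", "spade", "heart"));
        System.out.println("hearts = " + countHearts(hands)); // hearts = 4
    }
}
```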
Overview
Official description: MapReduce is a distributed computing model proposed by Google, originally used in the search domain to solve computation over massive data sets. A MapReduce job runs distributed across machines and consists of two phases: Map and Reduce. The framework provides default implementations for everything else; users only need to override the map() and reduce() functions to implement a distributed computation.
How It Works
Execution of the Map Phase
1. The framework divides the input files into InputSplits; by default each HDFS block corresponds to one InputSplit.
2. A RecordReader parses each InputSplit into <k1, v1> key-value pairs; by default each line becomes one pair.
3. The framework calls the map() function of the Mapper class; its input is <k1, v1> and its output is <k2, v2>. One InputSplit corresponds to one map task.
4. The framework partitions the <k2, v2> pairs emitted by map(); pairs in different partitions are handled by different reduce tasks. By default there is only one partition.
5. Within each partition the framework sorts and groups the data by k2; grouping means that all v2 values sharing the same k2 form one group.
6. On the map node the framework may run a combiner (a local reduce); this step is optional.
7. The framework writes the <k2, v2> output of each map task to the local Linux disk.
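The Map-phase steps above can be sketched without the framework. This is a plain-Java stand-in for RecordReader, Mapper, and the default HashPartitioner; the class and method names are hypothetical, but the partition rule matches Hadoop's HashPartitioner.

```java
import java.util.*;

// Framework-free sketch of the Map phase: split -> <k1, v1> records ->
// map emits <k2, v2> -> partition by key hash.
public class MapPhaseSketch {
    // RecordReader stand-in: turn an "InputSplit" (a block of text) into
    // <k1, v1> pairs, where k1 is the byte offset of the line and v1 the line.
    static LinkedHashMap<Long, String> readRecords(String split) {
        LinkedHashMap<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : split.split("\n")) {
            records.put(offset, line);
            offset += line.length() + 1; // +1 for the newline
        }
        return records;
    }

    // Mapper stand-in: map(k1, v1) emits <k2, v2> = <word, 1> per word.
    static List<Map.Entry<String, Long>> map(LinkedHashMap<Long, String> records) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String line : records.values()) {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    out.add(new AbstractMap.SimpleEntry<>(word, 1L));
                }
            }
        }
        return out;
    }

    // Same rule as Hadoop's default HashPartitioner: non-negative hash mod tasks.
    static int partition(String k2, int numReduceTasks) {
        return (k2.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        LinkedHashMap<Long, String> records = readRecords("hello you\nhello me");
        System.out.println(records);      // {0=hello you, 10=hello me}
        System.out.println(map(records)); // [hello=1, you=1, hello=1, me=1]
    }
}
```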
Execution of the Reduce Phase
1. The framework copies the output of the map tasks, partition by partition, over the network to the corresponding reduce nodes; this process is called shuffle.
2. On the reduce side, the framework merges, sorts, and groups the <k2, v2> data received for the same partition.
3. The framework calls the reduce() method of the Reducer class; its input is <k2, [v2...]> and its output is <k3, v3>. reduce() is called once per <k2, [v2...]> group.
4. The framework saves the reduce output to HDFS.
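The Reduce-phase steps can be sketched the same way; assuming plain-Java stand-ins (the names are hypothetical): a sorted map plays the role of the merge/sort/group step, and the reduce function sums each group's values.

```java
import java.util.*;

// Framework-free sketch of the Reduce phase: merge the copied <k2, v2>
// pairs, sort and group them by k2, then call reduce once per <k2, [v2...]>.
public class ReducePhaseSketch {
    // Merge + sort + group: TreeMap keeps keys sorted; the value list
    // collects every v2 that shares the same k2.
    static TreeMap<String, List<Long>> shuffleAndGroup(List<Map.Entry<String, Long>> mapOutput) {
        TreeMap<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> kv : mapOutput) {
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return groups;
    }

    // reduce(k2, [v2...]) -> <k3, v3>: here v3 is the sum of the v2s.
    static Map<String, Long> reduce(TreeMap<String, List<Long>> groups) {
        Map<String, Long> out = new LinkedHashMap<>();
        groups.forEach((k2, v2s) -> out.put(k2, v2s.stream().mapToLong(Long::longValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> mapOutput = Arrays.asList(
                new AbstractMap.SimpleEntry<>("hello", 1L),
                new AbstractMap.SimpleEntry<>("you", 1L),
                new AbstractMap.SimpleEntry<>("hello", 1L));
        System.out.println(reduce(shuffleAndGroup(mapOutput))); // {hello=2, you=1}
    }
}
```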
WordCount Case Analysis
Multi-file WordCount Case Analysis
The Shuffle Process in Detail
Shuffle is a process that spans both map and reduce: it moves the data produced by the map tasks across the network to the reduce tasks.
WordCount Example with Map and Reduce, and Viewing the Logs
Add the dependency:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.14</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.hx</groupId>
    <artifactId>hadoopDemo1</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>hadoopDemo1</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.3.0</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
The code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * @author Huathy
 * @date 2023-10-21 21:17
 * @description Assembles and submits the job
 */
public class WordCountJob {
    public static void main(String[] args) throws Exception {
        System.out.println("inputPath " + args[0]);
        System.out.println("outputPath " + args[1]);
        String path = args[0];
        String path2 = args[1];
        // Configuration parameters the job needs
        Configuration configuration = new Configuration();
        // Create the job
        Job job = Job.getInstance(configuration, "wordCountJob");
        // Note: this line is required, otherwise the Job class cannot be found on the cluster
        job.setJarByClass(WordCountJob.class);
        // Specify the input and output paths
        FileInputFormat.setInputPaths(job, new Path(path));
        FileOutputFormat.setOutputPath(job, new Path(path2));
        job.setMapperClass(WordMap.class);
        job.setReducerClass(WordReduce.class);
        // Map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Reduce output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Submit the job and wait for completion
        job.waitForCompletion(true);
    }

    /**
     * @author Huathy
     * @date 2023-10-21 21:39
     * @description Custom mapper class; defines the input and output types
     */
    public static class WordMap extends Mapper<LongWritable, Text, Text, LongWritable> {
        /**
         * The map function receives <k1, v1> and produces <k2, v2>.
         *
         * @param k1 byte offset of the current line
         * @param v1 content of the current line
         */
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
            // Split each line into words
            String[] words = v1.toString().split("\\W");
            // Iterate over the split words
            for (String word : words) {
                // Wrap each word as a <k2, v2> pair
                Text k2 = new Text(word);
                System.out.println("k2: " + k2.toString());
                LongWritable v2 = new LongWritable(1);
                // Emit <k2, v2>
                context.write(k2, v2);
            }
        }
    }

    /**
     * @author Huathy
     * @date 2023-10-21 22:08
     * @description Custom reducer class
     */
    public static class WordReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
        /**
         * Sums the values in v2s and emits the result as <k3, v3>.
         */
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException {
            long sum = 0L;
            for (LongWritable v2 : v2s) {
                sum += v2.get();
            }
            // Assemble <k3, v3>
            LongWritable v3 = new LongWritable(sum);
            System.out.println("k3: " + k2.toString() + " -- v3: " + v3.toString());
            context.write(k2, v3);
        }
    }
}
Run command and output log:
[root@cent7-1 hadoop-3.2.4]# hadoop jar wc.jar WordCountJob hdfs://cent7-1:9000/hello.txt hdfs://cent7-1:9000/out /home/hadoop-3.2.4/wc.jar
inputPath hdfs://cent7-1:9000/hello.txt
outputPath hdfs://cent7-1:9000/out
set jar /home/hadoop-3.2.4/wc.jar
2023-10-22 15:30:34,183 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2023-10-22 15:30:35,183 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2023-10-22 15:30:35,342 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1697944187818_0010
2023-10-22 15:30:36,196 INFO input.FileInputFormat: Total input files to process : 1
2023-10-22 15:30:37,320 INFO mapreduce.JobSubmitter: number of splits:1
2023-10-22 15:30:37,694 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1697944187818_0010
2023-10-22 15:30:37,696 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-10-22 15:30:38,033 INFO conf.Configuration: resource-types.xml not found
2023-10-22 15:30:38,034 INFO resource.ResourceUtils: Unable to find resource-types.xml.
2023-10-22 15:30:38,188 INFO impl.YarnClientImpl: Submitted application application_1697944187818_0010
2023-10-22 15:30:38,248 INFO mapreduce.Job: The url to track the job: http://cent7-1:8088/proxy/application_1697944187818_0010/
2023-10-22 15:30:38,249 INFO mapreduce.Job: Running job: job_1697944187818_0010
2023-10-22 15:30:51,749 INFO mapreduce.Job: Job job_1697944187818_0010 running in uber mode : false
2023-10-22 15:30:51,751 INFO mapreduce.Job: map 0% reduce 0%
2023-10-22 15:30:59,254 INFO mapreduce.Job: map 100% reduce 0%
2023-10-22 15:31:08,410 INFO mapreduce.Job: map 100% reduce 100%
2023-10-22 15:31:09,447 INFO mapreduce.Job: Job job_1697944187818_0010 completed successfully
2023-10-22 15:31:09,578 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=129
		FILE: Number of bytes written=479187
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=139
		HDFS: Number of bytes written=35
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=4916
		Total time spent by all reduces in occupied slots (ms)=5821
		Total time spent by all map tasks (ms)=4916
		Total time spent by all reduce tasks (ms)=5821
		Total vcore-milliseconds taken by all map tasks=4916
		Total vcore-milliseconds taken by all reduce tasks=5821
		Total megabyte-milliseconds taken by all map tasks=5033984
		Total megabyte-milliseconds taken by all reduce tasks=5960704
	Map-Reduce Framework
		Map input records=4
		Map output records=8
		Map output bytes=107
		Map output materialized bytes=129
		Input split bytes=94
		Combine input records=0
		Combine output records=0
		Reduce input groups=5
		Reduce shuffle bytes=129
		Reduce input records=8
		Reduce output records=5
		Spilled Records=16
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=259
		CPU time spent (ms)=2990
		Physical memory (bytes) snapshot=528863232
		Virtual memory (bytes) snapshot=5158191104
		Total committed heap usage (bytes)=378011648
		Peak Map Physical memory (bytes)=325742592
		Peak Map Virtual memory (bytes)=2575839232
		Peak Reduce Physical memory (bytes)=203120640
		Peak Reduce Virtual memory (bytes)=2582351872
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=45
	File Output Format Counters
		Bytes Written=35
[root@cent7-1 hadoop-3.2.4]#

Viewing MapReduce Task Logs
Enable YARN log aggregation so that the logs scattered across the NodeManager nodes are collected and managed centrally, which makes them easy to view. Modify yarn.log-aggregation-enable and yarn.log.server.url in yarn-site.xml:
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://cent7-1:19888/jobhistory/logs/</value>
</property>
Start the history server:
sbin/mr-jobhistory-daemon.sh start historyserver

Viewing in the UI: open http://192.168.56.101:8088/cluster, click History, open a Successful entry to see the success record, then click logs to see the job's logs.

Stopping a Task in the Hadoop Cluster
Exiting the terminal with Ctrl+C does not stop the task, because the task has already been submitted to Hadoop.
List the tasks with yarn application -list; kill a task's process with yarn application -kill [application_Id].
# List the running applications
[root@cent7-1 hadoop-3.2.4]# yarn application -list
2023-10-22 16:18:38,756 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
                Application-Id      Application-Name    Application-Type      User     Queue      State    Final-State    Progress    Tracking-URL
application_1697961350721_0002 wordCountJob MAPREDUCE root default ACCEPTED UNDEFINED 0% N/A
# Kill the task
[root@cent7-1 hadoop-3.2.4]# yarn application -kill application_1697961350721_0002
2023-10-22 16:18:55,669 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Killing application application_1697961350721_0002
2023-10-22 16:18:56,795 INFO impl.YarnClientImpl: Killed application application_1697961350721_0002

Hadoop's Serialization Mechanism
Why a serialization mechanism: as the run above shows, much of what Hadoop does at runtime is I/O. When writing Hadoop map and reduce code, we use the data types Hadoop provides; Hadoop has optimized their serialization so that only the core content is serialized, which reduces I/O overhead.
Characteristics of Hadoop's serialization mechanism:
Compact: makes efficient use of storage space.
Fast: little extra overhead for reading and writing data.
Extensible: can transparently read data written in older formats.
Interoperable: supports access from multiple languages.
Shortcomings of Java serialization:
Not compact: carries a lot of extra metadata, and is ill-suited to random access.
Large storage footprint: recursively writes out descriptions of the class's superclasses until there are none left.
Poor extensibility, whereas Hadoop's Writable makes user-defined types easy.
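The "compact" point above can be checked with stock JDK streams: standard Java serialization of a single Long carries class metadata, while writing the raw eight bytes, the way Hadoop's LongWritable does internally, stores only the payload. This is a size demo in plain Java, not Hadoop code.

```java
import java.io.*;

// Compare the serialized size of one long value under standard Java
// serialization versus a raw DataOutputStream write.
public class SerializationSize {
    // Standard Java serialization: includes class descriptor metadata.
    static int javaSerializedSize(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value); // autoboxed to Long
        }
        return bytes.size();
    }

    // Raw write: just the 8 payload bytes, like LongWritable.write() does.
    static int rawSize(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Java serialization: " + javaSerializedSize(100L) + " bytes");
        System.out.println("Raw DataOutput:     " + rawSize(100L) + " bytes"); // 8 bytes
    }
}
```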
The YARN Resource Manager in Detail
YARN currently supports three schedulers (they schedule tasks):
FIFO Scheduler: first-in, first-out scheduling. In workloads that mix real-time and offline tasks, strict first-in-first-out may not fit the business.
CapacityScheduler: can be seen as a multi-queue version of FIFO. Resources are divided into multiple queues, and each queue is FIFO internally.
FairScheduler: multiple queues and multiple users share resources. Recommended where fair task scheduling is needed.
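To illustrate the CapacityScheduler's multi-queue idea, a minimal capacity-scheduler.xml sketch might split the cluster into two queues. The queue names (dev, prod) and the percentages are made up for the example; the property names follow the yarn.scheduler.capacity.* convention.

```xml
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>dev,prod</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>40</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>60</value>
</property>
```

Within each queue, jobs still run first-in, first-out; the queues only divide the cluster's capacity.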