1. YARN parameter tuning

| Check item | Current value | New value |
|---|---|---|
| Java heap size of JobHistory Server | 1GB | 2GB |
| Java heap size of NodeManager | 1GB | 2GB |
| Java heap size of ResourceManager | 1GB | 2GB |
| Container memory (yarn.nodemanager.resource.memory-mb) | 24GB | 32GB |
| Minimum container memory (yarn.scheduler.minimum-allocation-mb) | 10GB | 8GB |
| Maximum container memory (yarn.scheduler.maximum-allocation-mb) | 40GB | 56GB |
| Map task memory (mapreduce.map.memory.mb) | 0MB | 12GB |
| Reduce task memory (mapreduce.reduce.memory.mb) | 0MB | 24GB |
| ApplicationMaster container memory (yarn.app.mapreduce.am.resource.mb) | 24GB | 32GB |
| Map task Java options base (mapreduce.map.java.opts) | -Djava.net.preferIPv4Stack=true | -Dmapreduce.map.java.opts=-Xmx2048m |
| Reduce task Java options base (mapreduce.reduce.java.opts) | -Djava.net.preferIPv4Stack=true | -Dmapreduce.reduce.java.opts=-Xmx2048m |
| Scheduler class (yarn.resourcemanager.scheduler.class) | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler |
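
For clusters where these settings are edited as raw XML rather than through a management UI, the memory rows of the table might be expressed roughly as in the sketch below. This is an illustration under assumptions, not the author's actual files: the sizes are converted from GB to MB (for example 32GB = 32768), the split between yarn-site.xml and mapred-site.xml follows the usual Hadoop layout, and the task heap is written as a plain -Xmx2048m on the assumption that a 2048 MB heap is what the new Java options value in the table is meant to achieve.

```xml
<!-- yarn-site.xml (sketch; sizes converted from GB to MB) -->
<configuration>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>32768</value> <!-- 32GB offered by each NodeManager -->
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>8192</value> <!-- 8GB -->
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>57344</value> <!-- 56GB -->
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
```

```xml
<!-- mapred-site.xml (sketch; sizes converted from GB to MB) -->
<configuration>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>12288</value> <!-- 12GB -->
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>24576</value> <!-- 24GB -->
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>32768</value> <!-- 32GB -->
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <!-- Assumption: the intent is a 2048 MB heap for each map task JVM -->
        <value>-Djava.net.preferIPv4Stack=true -Xmx2048m</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <!-- Assumption: the intent is a 2048 MB heap for each reduce task JVM -->
        <value>-Djava.net.preferIPv4Stack=true -Xmx2048m</value>
    </property>
</configuration>
```

When checking values like these, keep in mind that a container has to fit on a single node, so a request close to the 56GB maximum can only be placed on a NodeManager that offers at least that much memory.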

Capacity Scheduler queue configuration (yarn.scheduler.capacity.root.queues):

Current value:

<configuration>
    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>100</value>
    </property>
</configuration>

New value:

<configuration>
    <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10000</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,bigdata,analysis,prd</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>30</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.capacity</name>
        <value>30</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.capacity</name>
        <value>20</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.capacity</name>
        <value>20</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
        <value>3</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.user-limit-factor</name>
        <value>3</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.user-limit-factor</name>
        <value>3</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.user-limit-factor</name>
        <value>3</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>70</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.maximum-capacity</name>
        <value>90</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.maximum-capacity</name>
        <value>60</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.maximum-capacity</name>
        <value>90</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.state</name>
        <value>RUNNING</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.state</name>
        <value>RUNNING</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.state</name>
        <value>RUNNING</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.state</name>
        <value>RUNNING</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.acl_submit_applications</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.acl_submit_applications</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.acl_submit_applications</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.acl_administer_queue</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.acl_administer_queue</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.acl_administer_queue</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.acl_application_max_priority</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.acl_application_max_priority</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.acl_application_max_priority</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.maximum-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.maximum-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.maximum-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.default-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.bigdata.default-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.analysis.default-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prd.default-application-lifetime</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.node-locality-delay</name>
        <value>40</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.queue-mappings</name>
        <value></value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
        <value>1</value>
    </property>
</configuration>  

After changing the queue names, remember to refresh the queues by running this command: yarn rmadmin -refreshQueues

2. Spark parameter tuning

| Check item | Current value | New value |
|---|---|---|
| Java Heap Size of History Server in Bytes (history_server_max_heapsize) | 512MB | 2GB |

3. Kudu parameter tuning

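| Check item | Current value | New value | Notes |
|---|---|---|---|
| Master Advanced Configuration Snippet (Safety Valve) for gflagfile | | -max_num_columns=10000 -unlock_unsafe_flags=true | Raises Kudu's limit of 300 columns per table |
| Kudu Tablet Server Hard Memory Limit (memory_limit_hard_bytes) | 4GB | 32GB | |
| Kudu Tablet Server Block Cache Capacity (block_cache_capacity_mb) | 512MB | 4GB | |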

4. Impala parameter tuning

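| Check item | Current value | New value | Notes |
|---|---|---|---|
| Java Heap Size of Catalog Server in Bytes | 50MB | 12GB | |
| Impala Daemon Memory Limit (mem_limit) | 256MB | 80GB | |
| Impala Daemon Query Options Advanced Configuration Snippet (Safety Valve) (default_query_options) | | PARQUET_FALLBACK_SCHEMA_RESOLUTION=name | Prevents errors caused by queries reading mismatched columns |
| Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve) | | -use_local_tz_for_unix_timestamp_conversions=true -convert_legacy_hive_parquet_utc_timestamps=true | Uses the local time zone and stays compatible with Parquet files written by Hive |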

5. HDFS parameter tuning

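| Check item | Current value | New value | Notes |
|---|---|---|---|
| HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml (dfs.client.datanode-restart.timeout) | | 30 | Prevents Flink job submission from failing with: java.lang.NumberFormatException: For input string: "30s" |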
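
Written as a raw hdfs-site.xml entry, the override from the table might look like the following sketch. The value is a bare number of seconds with no unit suffix, presumably because the client that raised the NumberFormatException parses this property as a plain number and cannot handle a value such as "30s".

```xml
<!-- hdfs-site.xml (sketch): override taken from the table above -->
<configuration>
    <property>
        <name>dfs.client.datanode-restart.timeout</name>
        <!-- Plain number of seconds; no "s" suffix -->
        <value>30</value>
    </property>
</configuration>
```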