Problem:
spark-submit --master yarn --conf spark.default.parallelism=100 \
--deploy-mode cluster --driver-memory 4G --executor-memory 4G \
--num-executors 40 --executor-cores 2 \
--conf spark.yarn.executor.memoryOverhead=5g \
--class com.lz.hbase.CompanyInfo /tmp/test_langzi/original-spark_hbase01-1.0-SNAPSHOT.jar
The --num-executors 40 in the submit command above did not take effect: more than 40 executors were launched, saturating the YARN queue and blocking subsequent YARN jobs.
Cause:
Official parameter documentation:
--num-executors NUM    Number of executors to launch (Default: 2). If dynamic allocation is enabled, the initial number of executors will be at least NUM.
When dynamic allocation is enabled, num-executors only sets the initial (minimum) executor count. CDH enables dynamic allocation for Spark by default, so whenever the YARN queue has idle resources the actual executor count can grow beyond the configured num-executors.
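For reference, a sketch of the settings that interact here (flag names are real Spark configuration keys; the values are illustrative, and defaults can vary between CDH releases):

```shell
# Sketch: how executor count is governed under dynamic allocation.
# With spark.dynamicAllocation.enabled=true (the CDH default),
# --num-executors is only the starting point, not a cap.
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=0 \
  --conf spark.dynamicAllocation.maxExecutors=40 \
  --num-executors 40 \
  --class com.lz.hbase.CompanyInfo /tmp/test_langzi/original-spark_hbase01-1.0-SNAPSHOT.jar
# Without maxExecutors, the upper bound is effectively unlimited,
# so Spark will keep requesting executors while the queue has room.
```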
Solution:
Add --conf spark.dynamicAllocation.maxExecutors=40 to the submit command to cap the executor count.
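An alternative, assuming a fixed executor count is acceptable for the job, is to turn dynamic allocation off entirely, which makes --num-executors an exact count again:

```shell
# Alternative sketch: disable dynamic allocation so --num-executors is exact.
# Trade-off: idle executors are not released back to the YARN queue.
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 40 --executor-cores 2 \
  --class com.lz.hbase.CompanyInfo /tmp/test_langzi/original-spark_hbase01-1.0-SNAPSHOT.jar
```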
Appendix: spark-submit job template
spark-submit --master yarn --conf spark.default.parallelism=100 \
--conf spark.dynamicAllocation.maxExecutors=40 \
--deploy-mode cluster --driver-memory 4G --executor-memory 4G \
--num-executors 40 --executor-cores 3 \
--conf spark.yarn.executor.memoryOverhead=4G \
--class com.lz.hbase.CompanyInfo /tmp/test_langzi/original-spark_hbase01-1.0-SNAPSHOT.jar