
Troubleshoot Disk Space Issues with EMR Core Nodes

Check for these common causes of disk space use on the core node:

Local and temp files from the Spark application

When you run Spark jobs, the Spark applications create local files that can consume the remaining disk space on the core node. Check the size of the following directories on the core node:

  • <local-dir>/filecache
  • <local-dir>/usercache/<user>/filecache
  • <local-dir>/usercache/<user>/appcache/<app-id>/

Note: <local-dir> is specified by the yarn.nodemanager.local-dirs property in the /etc/hadoop/conf/yarn-site.xml file.
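
The directory check above can be scripted. The following is a minimal sketch, not an official AWS tool: it assumes yarn-site.xml sets yarn.nodemanager.local-dirs explicitly as a comma-separated list, and the function names (read_local_dirs, dir_size_bytes, report_cache_usage) are hypothetical helpers introduced here for illustration.

```python
import os
import xml.etree.ElementTree as ET


def read_local_dirs(yarn_site_path="/etc/hadoop/conf/yarn-site.xml"):
    """Return the directories listed in yarn.nodemanager.local-dirs."""
    root = ET.parse(yarn_site_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "yarn.nodemanager.local-dirs":
            value = prop.findtext("value", "")
            return [d.strip() for d in value.split(",") if d.strip()]
    return []  # property not set in this file


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, onerror=lambda e: None):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is not readable; ignore it
    return total


def report_cache_usage(local_dirs):
    """Map each existing filecache/usercache directory to its size in bytes."""
    usage = {}
    for local_dir in local_dirs:
        for sub in ("filecache", "usercache"):
            candidate = os.path.join(local_dir, sub)
            if os.path.isdir(candidate):
                usage[candidate] = dir_size_bytes(candidate)
    return usage


if __name__ == "__main__":
    conf = "/etc/hadoop/conf/yarn-site.xml"
    if os.path.exists(conf):
        usage = report_cache_usage(read_local_dirs(conf))
        # Largest directories first
        for path, size in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{size / 1024 ** 2:10.1f} MiB  {path}")
```

Run the script on the core node as a user that can read the YARN local directories. Subdirectories that the user cannot read are skipped silently, so the reported totals can be lower than the actual usage.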

If local files are consuming the remaining disk space, scale your cluster. For more information, see Scaling Cluster Resources.

Note: If the number of Spark executors does not scale up as expected, increase the storage capacity of the Amazon Elastic Block Store (Amazon EBS) volumes that are attached to the core node. Optionally, add more Amazon EBS volumes to the core node.

Spark application logs and job history files

When you run Spark jobs, Spark creates application logs and job history files on HDFS. These logs can consume the remaining disk space on the core node. To resolve this problem, check the directories where the logs are stored and, if necessary, change the retention parameters.

Spark application logs, which are the YARN container logs for your Spark jobs, are located in /var/log/hadoop-yarn/apps on the core node. These logs are moved to HDFS when the application is finished running. By default, YARN keeps application logs on HDFS for 48 hours. Perform the following steps to reduce the retention period.

  1. Open the /etc/hadoop/conf/yarn-site.xml file on each node in your Amazon EMR cluster (master, core, and task nodes).
  2. Reduce the value of the yarn.log-aggregation.retain-seconds property on all nodes.
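
The two steps above can be sketched as a small helper that rewrites the property in the text of yarn-site.xml. This is an illustration under stated assumptions: set_retain_seconds is a hypothetical name, the 86400-second (24-hour) value in the usage note is only an example, and ElementTree drops XML comments and the XML declaration when it serializes, so back up the original file first.

```python
import xml.etree.ElementTree as ET

PROPERTY = "yarn.log-aggregation.retain-seconds"


def set_retain_seconds(yarn_site_xml, seconds):
    """Return yarn-site.xml text with the retention property set to `seconds`."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == PROPERTY:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = str(seconds)
            break
    else:
        # Property not present yet: append a new <property> element
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROPERTY
        ET.SubElement(prop, "value").text = str(seconds)
    return ET.tostring(root, encoding="unicode")
```

For example, read /etc/hadoop/conf/yarn-site.xml into a string, call set_retain_seconds(text, 86400), and write the result back. Repeat this on every node, as described in the steps above.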

Note: After the application logs are copied to HDFS, they remain on the local disk so that Log Pusher can push the logs to Amazon Simple Storage Service (Amazon S3). The default retention period is four hours. To reduce the retention period, modify the /etc/logpusher/hadoop.config file.

Spark job history files are located in /var/log/spark/apps on the core node. When the filesystem cleaner runs, Spark deletes job history files that are older than seven days.

To reduce the default retention period, perform the following steps:

  1. Open the /etc/spark/conf/spark-defaults.conf file on the master node.
  2. Reduce the value of the spark.history.fs.cleaner.maxAge property.

By default, the filesystem history cleaner runs once a day. The frequency is specified in the spark.history.fs.cleaner.interval property. For more information, see Monitoring and Instrumentation.
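
The change to spark-defaults.conf can be sketched in the same way. The helper below is hypothetical (set_spark_property is not part of Spark) and assumes the common layout of one whitespace-separated key and value per line. The values 3d and 12h in the usage note are example settings, not recommendations.

```python
def set_spark_property(conf_text, key, value):
    """Return spark-defaults.conf text with `key` set to `value`.

    Replaces the first active line for `key`, or appends a new line;
    comments and unrelated lines are left unchanged.
    """
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 1)
        if parts[0] == key:
            lines[i] = f"{key} {value}"
            break
    else:
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"
```

For example, set_spark_property(text, "spark.history.fs.cleaner.maxAge", "3d") shortens the retention period to three days, and a second call with "spark.history.fs.cleaner.interval" and "12h" makes the cleaner run twice a day.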
