Copy Data From S3 to HDFS in EMR

To troubleshoot problems with S3DistCp, check the step and task logs.

Step logs:

1.    Open the Amazon EMR console, and then choose Clusters.

2.    Choose the Amazon EMR cluster from the list, and then choose Steps.

3.    In the Log files column, choose the appropriate step log:

  • controller: Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.
  • syslog: Describes the execution of Hadoop jobs in the step.
  • stderr: The standard error channel of Hadoop while it processes the step.
  • stdout: The standard output channel of Hadoop while it processes the step.
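If the cluster was launched with an Amazon S3 log URI, the same four step logs are normally archived there as well, which helps once the cluster has terminated. The following is a minimal sketch, assuming the usual `<log-uri>/<cluster-id>/steps/<step-id>/` layout with gzip-compressed files. The bucket name, cluster ID, and step ID are hypothetical placeholders, and `step_log_uris` is an illustrative helper, not part of any AWS SDK.

```python
from typing import Dict

# The four step logs described in the list above.
STEP_LOG_NAMES = ("controller", "syslog", "stderr", "stdout")


def step_log_uris(log_uri: str, cluster_id: str, step_id: str) -> Dict[str, str]:
    """Build the expected S3 URI of each step log.

    Assumes the archived layout <log-uri>/<cluster-id>/steps/<step-id>/<name>.gz;
    verify this against your own log bucket before relying on it.
    """
    base = f"{log_uri.rstrip('/')}/{cluster_id}/steps/{step_id}"
    return {name: f"{base}/{name}.gz" for name in STEP_LOG_NAMES}


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    uris = step_log_uris("s3://example-bucket/emr-logs", "j-EXAMPLECLUSTER", "s-EXAMPLESTEP")
    for name, uri in uris.items():
        print(f"{name}: {uri}")
```

Each printed URI can then be fetched with a tool such as `aws s3 cp` and decompressed for inspection.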

If you can't find the root cause of the failure in the step logs, check the S3DistCp task logs:

1.    Open the Amazon EMR console, and then choose Clusters.

2.    Choose the Amazon EMR cluster from the list, and then choose Steps.

3.    In the Log files column, choose View jobs.

4.    In the Actions column, choose View tasks.

5.    If there are failed tasks, choose View attempts to see the task logs.
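When log archiving to Amazon S3 is enabled, the task attempt (container) logs can usually be located outside the console too. The sketch below assumes they are stored under `<log-uri>/<cluster-id>/containers/<application-id>/`, which is the layout commonly seen on recent Amazon EMR releases but should be confirmed against your own bucket. All identifiers are hypothetical, and `container_log_prefix` is an illustrative helper only.

```python
def container_log_prefix(log_uri: str, cluster_id: str, application_id: str) -> str:
    """Return the S3 prefix expected to hold the task attempt logs of one YARN application.

    Assumes the layout <log-uri>/<cluster-id>/containers/<application-id>/.
    """
    return f"{log_uri.rstrip('/')}/{cluster_id}/containers/{application_id}/"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    prefix = container_log_prefix(
        "s3://example-bucket/emr-logs", "j-EXAMPLECLUSTER", "application_0000000000000_0001"
    )
    print(prefix)
```

Listing that prefix (for example with `aws s3 ls --recursive`) should show one folder per container, each with its own `stderr` and `syslog` files.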

Common errors

Reducer task fails due to insufficient memory:

If you see an error message similar to the following in the step's stderr log, the S3DistCp job failed because there wasn't enough memory to process the reducer tasks:


