Cloudera Hadoop Administrator (CCAH) & Developer (CCA) Certification Outline
Cloudera Certified Administrator for Apache Hadoop (CCA-500)
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 70%
Language: English, Japanese
Exam Sections and Blueprint
HDFS (17%)
- Describe the function of HDFS daemons
- Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing
- Identify current features of computing systems that motivate a system like Apache Hadoop
- Classify major goals of HDFS Design
- Given a scenario, identify appropriate use case for HDFS Federation
- Identify the components and daemons of an HDFS HA-Quorum cluster
- Analyze the role of HDFS security (Kerberos)
- Determine the best data serialization choice for a given scenario
- Describe file read and write paths
- Identify the commands to manipulate files in the Hadoop File System Shell (see the examples below)
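For reference, a few common File System Shell operations look like this (the paths are illustrative):

    hdfs dfs -mkdir -p /user/alice/data              # create a directory
    hdfs dfs -put localfile.txt /user/alice/data/    # copy a local file into HDFS
    hdfs dfs -ls /user/alice/data                    # list directory contents
    hdfs dfs -cat /user/alice/data/localfile.txt     # print a file to stdout
    hdfs dfs -rm -r /user/alice/data                 # delete a directory recursively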
YARN and MapReduce version 2 (MRv2) (17%)
- Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings
- Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
- Understand basic design strategy for MapReduce v2 (MRv2)
- Determine how YARN handles resource allocations
- Identify the workflow of MapReduce job running on YARN
- Determine which files you must change, and how, in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN (see the configuration sketch below)
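As a minimal sketch of that migration, assuming a standard CDH 5 layout, the two key changes are pointing MapReduce at YARN (mapred-site.xml) and enabling the shuffle auxiliary service (yarn-site.xml):

    <!-- mapred-site.xml: submit MapReduce jobs to YARN instead of the MRv1 JobTracker -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

    <!-- yarn-site.xml: NodeManagers must run the MapReduce shuffle service -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>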
Hadoop Cluster Planning (16%)
- Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster
- Analyze the choices in selecting an OS
- Understand kernel tuning and disk swapping
- Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
- Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA
- Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O (a worked illustration follows this list)
- Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
- Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
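As a worked illustration of the sizing arithmetic (the figures are assumptions, not Cloudera guidance): ingesting 2 TB of new data per day with the default 3x HDFS replication and roughly 25% headroom for intermediate output needs about 2 × 3 × 1.25 = 7.5 TB of raw disk per day, or about 2.7 PB for a year of retention; with 12 × 3 TB JBOD drives per worker (36 TB raw), that is on the order of 76 DataNodes.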
Hadoop Cluster Installation and Administration (25%)
- Given a scenario, identify how the cluster will handle disk and machine failures
- Analyze a logging configuration and logging configuration file format
- Understand the basics of Hadoop metrics and cluster health monitoring
- Identify the function and purpose of available tools for cluster monitoring
- Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig (see the sample command after this list)
- Identify the function and purpose of available tools for managing the Apache Hadoop file system
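For package-based installs (as opposed to Cloudera Manager parcels), several of these components ship as plain CDH 5 packages; a hypothetical install on a RHEL-family gateway node might look like:

    sudo yum install -y hive pig sqoop flume-ng oozie hue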
Resource Management (10%)
- Understand the overall design goals of each of Hadoop's schedulers
- Given a scenario, determine how the FIFO Scheduler allocates cluster resources
- Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN (see the allocations sketch after this list)
- Given a scenario, determine how the Capacity Scheduler allocates cluster resources
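As one example of scheduler configuration, a minimal Fair Scheduler allocations file (fair-scheduler.xml) might define weighted queues like this; the queue names and values are illustrative:

    <allocations>
      <queue name="production">
        <weight>3.0</weight>
        <minResources>10240 mb,10 vcores</minResources>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>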
Monitoring and Logging (15%)
- Understand the functions and features of Hadoop’s metric collection abilities
- Analyze the NameNode and JobTracker Web UIs
- Understand how to monitor cluster daemons
- Identify and monitor CPU usage on master nodes
- Describe how to monitor swap and memory allocation on all nodes
- Identify how to view and manage Hadoop’s log files (see the log4j excerpt below)
- Interpret a log file
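For context, daemon logging is controlled through log4j.properties; an excerpt like the following (the values are illustrative) caps each daemon log at 256 MB across 20 rotated files:

    hadoop.root.logger=INFO,RFA
    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.RFA.MaxFileSize=256MB
    log4j.appender.RFA.MaxBackupIndex=20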
CCA Spark and Hadoop Developer Exam (CCA175)
Number of Questions: 10–12 performance-based (hands-on) tasks on a CDH 5 cluster. See below for full cluster configuration.
Time Limit: 120 minutes
Passing Score: 70%
Language: English, Japanese (forthcoming)
Required Skills
Data Ingest
The skills to transfer data between external systems and your cluster. This includes the following:
- Import data from a MySQL database into HDFS using Sqoop
- Export data to a MySQL database from HDFS using Sqoop
- Change the delimiter and file format of data during import using Sqoop (see the import sketch after this list)
- Ingest real-time and near-real time (NRT) streaming data into HDFS using Flume
- Load data into and out of HDFS using the Hadoop File System (FS) commands
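As a hedged example, a Sqoop import that also overrides the field delimiter might look like this (the connection string, table, and paths are assumptions):

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username analyst \
      --password-file /user/analyst/.password \
      --table orders \
      --target-dir /user/analyst/orders \
      --fields-terminated-by '\t'

Replacing --fields-terminated-by with a format flag such as --as-avrodatafile changes the output file format instead, and sqoop export with --export-dir reverses the direction.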
Transform, Stage, Store
Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS. This includes writing Spark applications in both Scala and Python (a short Python sketch follows this list):
- Load data from HDFS and store results back to HDFS using Spark
- Join disparate datasets together using Spark
- Calculate aggregate statistics (e.g., average or sum) using Spark
- Filter data into a smaller dataset using Spark
- Write a query that produces ranked or sorted data using Spark
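A minimal PySpark sketch covering several of these tasks (the input layout and paths are assumptions) could look like:

    from pyspark import SparkContext

    sc = SparkContext(appName="OrderStats")

    # hypothetical tab-delimited input: order_id, customer_id, amount
    orders = sc.textFile("hdfs:///user/analyst/orders") \
               .map(lambda line: line.split("\t"))

    # filter to large orders, sum the amounts per customer, rank by total
    totals = orders.filter(lambda f: float(f[2]) > 100.0) \
                   .map(lambda f: (f[1], float(f[2]))) \
                   .reduceByKey(lambda a, b: a + b) \
                   .sortBy(lambda kv: kv[1], ascending=False)

    # store the results back to HDFS
    totals.saveAsTextFile("hdfs:///user/analyst/customer_totals")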
Data Analysis
Use Data Definition Language (DDL) to create tables inthe Hive metastore for use by Hive and Impala.
- Read and/or create a table in the Hive metastore in a given schema
- Extract an Avro schema from a set of datafiles using avro-tools
- Create a table in the Hive metastore using the Avro file format and an external schema file (see the DDL sketch below)
- Improve query performance by creating partitioned tables in the Hive metastore
- Evolve an Avro schema by changing JSON files
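As a sketch of the DDL involved (the table name, paths, and partition column are illustrative, and STORED AS AVRO assumes a Hive version that supports it, as later CDH 5 releases do):

    CREATE EXTERNAL TABLE orders_avro
    PARTITIONED BY (order_date STRING)
    STORED AS AVRO
    LOCATION '/user/analyst/orders_avro'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///user/analyst/schemas/orders.avsc');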