
Chapter 2: Overview of Spark and Its Ecosystem

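2-1 - Course Outline

1. Overview of Spark and its ecosystem

Background to Spark; Spark overview and features

History of Spark; Spark Survey

Spark compared with Hadoop; how Spark and Hadoop work together

Spark development languages; Spark run modes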

2-2 - Spark Overview and Features

Official website: https://spark.apache.org/

1. Overview

Apache Spark™ is a unified analytics engine for large-scale data processing.

In short, Apache Spark is a single, unified analytics engine for processing data at large scale.

2. Features

1. Speed

Run workloads 100x faster.

Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

2. Ease of Use

Write applications quickly in Java, Scala, Python, R, and SQL.

Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells.

3. Generality

Combine SQL, streaming, and complex analytics.

Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

4. Runs Everywhere

Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.

You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
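In practice, the cluster manager is chosen through the master URL handed to Spark, for example with `spark-submit --master`. The helper below is only a rough sketch of the URL shape for each option. The function name `master_url`, the host names, and the fallback ports are assumptions made for illustration, and the Kubernetes port in particular depends on how the API server is exposed.

```python
from typing import Optional


def master_url(manager: str, host: Optional[str] = None, port: Optional[int] = None) -> str:
    """Return a Spark master URL string for the given cluster manager (hypothetical helper)."""
    if manager == "local":
        return "local[*]"  # run in a single process, using all local cores
    if manager == "yarn":
        return "yarn"  # the cluster location comes from the Hadoop configuration
    if host is None:
        raise ValueError(f"{manager!r} needs a host")
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"  # 7077 is the usual standalone master port
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"  # 5050 is the usual Mesos master port
    if manager == "kubernetes":
        return f"k8s://https://{host}:{port or 443}"  # port 443 is an assumption
    raise ValueError(f"unknown cluster manager: {manager!r}")


if __name__ == "__main__":
    for name in ("local", "yarn"):
        print(name, "->", master_url(name))
    print("standalone ->", master_url("standalone", "node1"))
```

EC2 does not appear as a separate case because it is a place to host machines rather than a cluster manager: a cluster on EC2 would still use one of the URL forms above.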

 

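2-3 - Background: Why Spark Emerged

Limitations of MapReduce

1) Verbose, cumbersome code

2) Only the map and reduce operations are supported

3) Low execution efficiency

4) Poorly suited to iterative, interactive, and stream processing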
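To make the first two limitations concrete, here is a minimal pure-Python sketch of a word count forced into the MapReduce shape. It does not use Hadoop. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, and the sample input, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run one full map -> shuffle -> reduce pass over the input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Spark runs on Hadoop", "Hadoop runs MapReduce"]
    print(word_count(sample))
```

Even this trivial job has to be split into separate phases. An iterative algorithm would repeat the whole pass once per iteration, and in Hadoop MapReduce each job writes its output to disk before the next one starts, which is one reason iterative workloads run slowly there.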

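Proliferation of frameworks

1) Batch processing (offline): MapReduce, Hive, Pig

2) Stream processing (real time): Storm, JStorm

3) Interactive computing: Impala

Running a separate framework for each workload quietly drives up learning and operations costs.

===> Spark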

 

2-4 - History of Spark

 

2-5 - Spark Survey

 

 

2-6 - Spark Compared with Hadoop

 

 

2-7 - How Spark and Hadoop Work Together