Spark Tutorial – Learn Spark Programming
Introduction to Spark Programming
What is Spark? Spark is a general-purpose, lightning-fast cluster computing platform. It exposes development APIs that enable data workers to run streaming, machine learning, or SQL workloads that demand repeated access to data sets. Moreover, Spark can perform both batch processing and stream processing.
Also, it is designed to integrate with all the Big Data tools. For example, Spark can access any Hadoop data source and can run on Hadoop clusters.
One more common belief about Spark is that it is an extension of Hadoop, but that is not true. Spark is independent of Hadoop, since it has its own cluster manager; it can use Hadoop for storage only.
One of Spark's key features is its in-memory cluster computing capability, which greatly increases the processing speed of an application.
Basically, Apache Spark offers high-level APIs in Java, Scala, Python, and R. Although Spark itself is written in Scala, it offers rich APIs in all four languages. We can say it is a tool for running lightning-fast applications.
Most importantly, when compared with Hadoop, Spark can run workloads up to 100 times faster than Hadoop MapReduce when processing data in memory, and about 10 times faster when accessing data from disk.
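To make this concrete, here is a minimal word-count sketch in Scala, Spark's native language. The object name and input path are placeholders, not part of the original tutorial; the cache() call illustrates the in-memory computation that the speed comparison above rests on.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally with all available cores; on a real cluster,
    // the master is set by the launcher instead
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")
      .getOrCreate()

    // Read a text file into an RDD ("input.txt" is a placeholder path)
    val lines = spark.sparkContext.textFile("input.txt")

    // cache() keeps the split words in memory, so repeated actions
    // avoid re-reading from disk -- the in-memory speedup described above
    val words = lines.flatMap(_.split("\\s+")).cache()

    // Classic word count: pair each word with 1, then sum per key
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
    counts.take(10).foreach(println)

    spark.stop()
  }
}
```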
Spark History
At first, in 2009, Apache Spark was introduced in the UC Berkeley R&D Lab, which is now known as AMPLab. Afterwards, in 2010, it became open source under a BSD license. Further, Spark was donated to the Apache Software Foundation in 2013, and in 2014 it became a top-level Apache project.
Why Spark?
As we know, before Spark there was no general-purpose computing engine in the industry, since:
- To perform batch processing, we were using Hadoop MapReduce.
- Also, to perform stream processing, we were using Apache Storm / S4.
- Moreover, for interactive processing, we were using Apache Impala / Apache Tez.
- To perform graph processing, we were using Neo4j / Apache Giraph.
There was no powerful engine in the industry that could process data in both real-time and batch mode. There was also a requirement for a single engine that could respond in sub-second time and perform in-memory processing.
Basically, these features create the difference between Hadoop and Spark, and they also drive the comparison of Spark vs Storm.
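As a sketch of this "one engine for both modes" idea, the example below uses Spark's DataFrame API for a one-off batch read and Structured Streaming for a live source. The file path, column name, host, and port are all assumed placeholders.

```scala
import org.apache.spark.sql.SparkSession

object BatchAndStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BatchAndStream")
      .master("local[*]")
      .getOrCreate()

    // Batch mode: a one-off read of a static CSV file
    // ("data/events.csv" and the "type" column are placeholders)
    val batch = spark.read.option("header", "true").csv("data/events.csv")
    batch.groupBy("type").count().show()

    // Stream mode: the same DataFrame operations over a live socket source
    // (localhost:9999 is a placeholder; feed it with e.g. `nc -lk 9999`)
    val stream = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()

    // Continuously count distinct input lines and print them to the console
    val query = stream.groupBy("value").count()
      .writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```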
Apache Spark Components
In this Apache Spark Tutorial, we discuss the Spark components. Spark holds the promise of faster data processing as well as easier development, and that is possible only because of its components. All these Spark components resolved the issues that occurred while using Hadoop MapReduce.
To learn more, see Spark Ecosystem Components.