Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop
Deep dive into the new Tungsten execution engine
- https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets (Chinese-English bilingual)
What’s new for Spark SQL in Apache Spark 1.3 (Chinese-English bilingual)
Introducing Apache Spark Datasets (Chinese-English bilingual)
Introducing DataFrames in Apache Spark for Large Scale Data Science (Chinese-English bilingual)
Deep Dive into Spark SQL’s Catalyst Optimizer (Chinese-English bilingual)
Spark paper series: Spark: Cluster Computing with Working Sets (Chinese-English bilingual)
Livy : A REST Interface for Apache Spark
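Official site: http://livy.incubator.apache.org/ Overview: The management platforms currently available for Spark include spark job server and Zeppelin. Because both spark job server and Zeppelin have some shortcomings, for example spark job se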
Why Apache Spark is a Crossover Hit for Data Scientists [FWD]
Spark is a compelling multi-purpose platform for use cases that span investigative, as well as operational, analytics. Data science is a broad church. I a
Winning a Kaggle competition with Apache Spark and SparkML Machine Learning Pipelines
IBM Chief Data Scientist Romeo Kienzler demonstrates how to use the new DataFrames-based SparkML pipelines (with data from a recent Kaggle competition on
