Interactive SQL Queries on Big Data: Comparing the presto/spark/mapreduce Compute Engines
阿新 • Published: 2019-01-01
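The table used for the comparison has 146 columns and 15,920,816 rows; the data is 15 GB before compression.
Execution time of each statement is given below, in seconds.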
TextFile format
SQL executed | presto | spark | mr |
SELECT COUNT(*) FROM tmp.mb_crm1 | 5 | 9.264 | 21.711 |
SELECT sum(lately_land_btw) FROM tmp.mb_crm1; | 7 | 17.23 | 25.781 |
SELECT sum(cast(lately_land_btw as bigint)) num,mb_name FROM tmp.mb_crm1 where age>=25 group by mb_name order by num desc | 8 | 20.265 | 128.811 |
Parquet format
SQL executed | presto | spark | mr |
SELECT COUNT(*) FROM tmp.mb_crm1 | 1 | 5.255 | 24.142 |
SELECT sum(lately_land_btw) FROM tmp.mb_crm1; | 1 | 3.181 | 42.893 |
SELECT sum(cast(lately_land_btw as bigint)) num,mb_name FROM tmp.mb_crm1 where age>=25 group by mb_name order by num desc | 3 | 11.486 | 66.903 |
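Presto has a clear advantage: it is the fastest on all six runs, Spark comes second, and MapReduce (mr) is the slowest.
After switching to columnar storage (Parquet), Presto speeds up markedly, with query times dropping from 5-8 seconds to 1-3 seconds.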
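
The conclusions above can be checked with a little arithmetic on the measured times. The following is a minimal sketch, not part of the original benchmark: the query labels (`q1_count`, `q2_sum`, `q3_group_by`) and the function names are hypothetical shorthand introduced here, and the numbers are copied from the two tables. The presto column is reported in whole seconds, so the resulting ratios are only rough.

```python
# Timings in seconds, copied from the TextFile and Parquet tables.
# The query labels are shorthand introduced for this sketch:
#   q1_count    -> SELECT COUNT(*) ...
#   q2_sum      -> SELECT sum(lately_land_btw) ...
#   q3_group_by -> SELECT sum(cast(...)) ... group by mb_name order by num desc
TIMINGS = {
    "TextFile": {
        "q1_count": {"presto": 5, "spark": 9.264, "mr": 21.711},
        "q2_sum": {"presto": 7, "spark": 17.23, "mr": 25.781},
        "q3_group_by": {"presto": 8, "spark": 20.265, "mr": 128.811},
    },
    "Parquet": {
        "q1_count": {"presto": 1, "spark": 5.255, "mr": 24.142},
        "q2_sum": {"presto": 1, "spark": 3.181, "mr": 42.893},
        "q3_group_by": {"presto": 3, "spark": 11.486, "mr": 66.903},
    },
}


def slowdown_vs_presto(fmt):
    """For each query, how many times slower each engine is than presto."""
    return {
        query: {engine: t / times["presto"] for engine, t in times.items()}
        for query, times in TIMINGS[fmt].items()
    }


def parquet_speedup(engine):
    """TextFile time divided by Parquet time; a value above 1 means Parquet is faster."""
    return {
        query: TIMINGS["TextFile"][query][engine] / TIMINGS["Parquet"][query][engine]
        for query in TIMINGS["TextFile"]
    }


if __name__ == "__main__":
    for fmt in TIMINGS:
        print(fmt, "slowdown relative to presto:")
        for query, ratios in slowdown_vs_presto(fmt).items():
            print(f"  {query}: spark x{ratios['spark']:.1f}, mr x{ratios['mr']:.1f}")
    for engine in ("presto", "spark", "mr"):
        formatted = {q: round(s, 2) for q, s in parquet_speedup(engine).items()}
        print(engine, "TextFile/Parquet speedup:", formatted)
```

Running this on the numbers above also shows that the gain from Parquet is not uniform: presto and spark improve on every query, while mr is slower on the first two queries (21.711 to 24.142 seconds, and 25.781 to 42.893 seconds) and faster only on the group-by query.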