Accessing Elasticsearch from Python: Importing Spark DataFrame Data into Elasticsearch

阿新 • Published: 2021-01-08

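First, what is Elasticsearch?

As its name suggests, Elasticsearch is an elastic search database. Put simply, it is a search engine; you can think of it as a small Baidu. Spark is a big-data framework, or tool. Elasticsearch, for its part, also supports distributed deployment.

Much of our structured data lives on a distributed cluster, that is, on HDFS, and Spark can export it from there.

For Elasticsearch 5 and later there is a companion package, elasticsearch-hadoop, that supports data exchange between Hadoop and Elasticsearch. Without it, you would have to move the data through Elasticsearch's own client API or raw HTTP requests, which is more cumbersome and less elegant.

With elasticsearch-hadoop, importing data from Hadoop into Elasticsearch becomes straightforward.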
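For comparison, here is a rough sketch of what the manual HTTP route involves: you would have to assemble the newline-delimited body for Elasticsearch's `_bulk` endpoint yourself and then POST it. The helper name `build_bulk_body` and the sample documents are made up for illustration; the index and type names simply reuse the placeholders from later in this post.

```python
import json

def build_bulk_body(docs, index, doc_type, id_field="id"):
    """Build a newline-delimited body for the _bulk endpoint (hypothetical helper)."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch where the next document goes.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format expects a newline after the last line as well.
    return "\n".join(lines) + "\n"

# Illustrative sample documents, not real data.
docs = [{"id": 1, "content": "hello es"}, {"id": 2, "content": "hello spark"}]
payload = build_bulk_body(docs, index="index", doc_type="doc_type")
print(payload)
```

Doing this for every partition of a large dataset, plus batching and retries, is the bookkeeping that elasticsearch-hadoop takes off your hands.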

To do this, we first need the elasticsearch-hadoop-6.1.1.zip package, which can be downloaded from the Elastic website (www.elastic.co).

If you are using Jupyter, add the following configuration before creating the SparkSession:

import os
# set environment variable PYSPARK_SUBMIT_ARGS
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars elasticsearch-hadoop-6.1.1/dist/elasticsearch-spark-20_2.11-6.1.1.jar pyspark-shell'
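If you end up needing more than one jar, the same string can be assembled programmatically. This is only a small sketch: the helper names `build_submit_args` and `missing_jars` are my own, and the jar path is the one used above, assumed to be relative to the notebook's working directory.

```python
import os

# Path from the unpacked elasticsearch-hadoop-6.1.1.zip, relative to the working directory.
ES_SPARK_JAR = "elasticsearch-hadoop-6.1.1/dist/elasticsearch-spark-20_2.11-6.1.1.jar"

def build_submit_args(jar_paths):
    """Return a PYSPARK_SUBMIT_ARGS value that loads the given jars."""
    # --jars takes a comma-separated list; "pyspark-shell" has to come last.
    return "--jars {} pyspark-shell".format(",".join(jar_paths))

def missing_jars(jar_paths):
    """Return the jar paths that do not exist on disk."""
    return [path for path in jar_paths if not os.path.isfile(path)]

submit_args = build_submit_args([ES_SPARK_JAR])
for path in missing_jars([ES_SPARK_JAR]):
    # A wrong path tends to surface later as a confusing ClassNotFoundException.
    print("warning: jar not found:", path)
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
```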

Next, set up the entry point Spark uses to work with DataFrames, the SparkSession:

from pyspark.sql import SparkSession

# Parentheses let the builder chain span several lines
sparkSession = (
    SparkSession.builder
    .appName("es_books")
    .master("local")
    .enableHiveSupport()
    .getOrCreate()
)

Next, fetch the data:

df = sparkSession.sql()  # pass the SQL query string that selects your rows

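Then convert df into a format suitable for storage in Elasticsearch. Each row becomes a (key, JSON string) pair, and the JSON document keeps the id field so that it can serve as the document ID:

import json

def format_data(row):
    # The key is ignored on write; the value must be a JSON string
    # because the write configuration sets es.input.json.
    doc = {'id': row['id'], 'content': row['content']}
    return (row['id'], json.dumps(doc))

transed_data = df.rdd.map(format_data)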
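To see what such a pair looks like without starting Spark, a plain dict can stand in for a row (in Spark you would get one via `row.asDict()`). The sketch below is a variation, not the code above: the helper `to_json_pair`, the `created` column, and the sample values are all hypothetical, and `default=str` is my assumption for handling column types such as dates that `json.dumps` cannot encode on its own.

```python
import json
from datetime import date

def to_json_pair(row_dict, id_field="id"):
    """Turn a row (as a dict) into the (key, JSON string) pair used for writing."""
    # default=str converts values json cannot encode natively (dates, Decimals) to strings.
    return (row_dict[id_field], json.dumps(row_dict, default=str))

# Hypothetical row; "created" stands for any date-typed column.
sample = {"id": 42, "content": "hello es", "created": date(2021, 1, 8)}
key, value = to_json_pair(sample)
print(key)
print(value)
```

With a real DataFrame this would be applied as `df.rdd.map(lambda row: to_json_pair(row.asDict()))`.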

The Elasticsearch write configuration is of course also required:

es_write_conf = {
    # node to send data to; the connector discovers the rest of the cluster by default
    "es.nodes": "127.0.0.1",
    # port, in case it is not the default
    "es.port": "9200",
    # target resource, in the form 'index/doc-type'
    "es.resource": "index/doc_type",
    # the input values are already JSON strings
    "es.input.json": "yes",
    # field of each document to use as the Elasticsearch document ID
    "es.mapping.id": "id",
}
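If you write to several indices, a small builder keeps these settings in one place. This is a sketch under assumptions: the function name `build_es_write_conf` is hypothetical, and coercing every value to `str` reflects my understanding that PySpark hands this dict to the JVM as string key/value pairs, so an integer port is best avoided.

```python
def build_es_write_conf(host, port, index, doc_type, id_field):
    """Assemble the write configuration for one index/doc-type (hypothetical helper)."""
    conf = {
        "es.nodes": host,
        "es.port": port,
        "es.resource": "{}/{}".format(index, doc_type),
        "es.input.json": "yes",
        "es.mapping.id": id_field,
    }
    # Keep every value a string, e.g. 9200 becomes "9200".
    return {key: str(value) for key, value in conf.items()}

es_write_conf = build_es_write_conf("127.0.0.1", 9200, "index", "doc_type", "id")
print(es_write_conf)
```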

Now comes the most important step:

transed_data.saveAsNewAPIHadoopFile(
    path='-',  # not used by EsOutputFormat, but the argument is required
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_write_conf)
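As an alternative to going through the RDD, the elasticsearch-spark jar also provides a Spark SQL data source named `org.elasticsearch.spark.sql`, so the DataFrame can in principle be written directly. The wrapper below is only a sketch: `write_df_to_es` is a name I made up, and I am assuming the same jar and cluster settings as above. On this path the rows are not pre-serialized, so `es.input.json` should be left out.

```python
def write_df_to_es(df, resource, options, mode="append"):
    """Write a Spark DataFrame to Elasticsearch via the Spark SQL data source."""
    writer = df.write.format("org.elasticsearch.spark.sql").mode(mode)
    for key, value in options.items():
        # Each es.* setting is passed as a writer option.
        writer = writer.option(key, value)
    # The resource ('index/doc-type') is given as the save path.
    writer.save(resource)

# Example call, assuming df is the DataFrame fetched earlier:
# write_df_to_es(df, "index/doc_type",
#                {"es.nodes": "127.0.0.1", "es.port": "9200", "es.mapping.id": "id"})
```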

After that, you can access Elasticsearch from Python:

from elasticsearch import Elasticsearch

es = Elasticsearch(["127.0.0.1"])
body = '{"query" : {"match" : {"content" : "hello es"} }}'
# Fetch the documents matching the query from index "index", document type "doc_type"; result is a dict
result = es.search(index="index", doc_type="doc_type", body=body)
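The returned dict nests the matching documents under `hits.hits`. Here is a minimal sketch of pulling them out; the helper `extract_hits` is hypothetical, and `sample_result` is a hand-written dict shaped like a search response, not output from a real cluster.

```python
def extract_hits(result):
    """Return (id, score, source) tuples from a search response dict."""
    hits = result.get("hits", {}).get("hits", [])
    return [(hit["_id"], hit["_score"], hit["_source"]) for hit in hits]

# Hand-written sample in the shape of a search response (illustrative values only).
sample_result = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": 1,
        "max_score": 0.5,
        "hits": [
            {
                "_index": "index",
                "_type": "doc_type",
                "_id": "1",
                "_score": 0.5,
                "_source": {"id": 1, "content": "hello es"},
            }
        ],
    },
}

for doc_id, score, source in extract_hits(sample_result):
    print(doc_id, score, source["content"])
```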

This post assumes you have already set up the mapping for the corresponding type in Elasticsearch.
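If you have not done that yet, a mapping for the two fields used here could look roughly like the sketch below. The field types are my assumptions (`keyword` for id, `text` for the searchable content), the helper names are hypothetical, and the typed layout matches the Elasticsearch 6 style used throughout this post.

```python
def build_index_body(doc_type):
    """Build an index-creation body with a mapping for one document type."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # Assumed types: exact-match id, full-text content.
                    "id": {"type": "keyword"},
                    "content": {"type": "text"},
                }
            }
        }
    }

def create_index(es, index, doc_type):
    """Create the index; es is an elasticsearch.Elasticsearch client."""
    return es.indices.create(index=index, body=build_index_body(doc_type))

index_body = build_index_body("doc_type")
print(index_body)
```

Calling `create_index(es, "index", "doc_type")` once, before the Spark job runs, would then set the mapping up.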
