
Scala JDBC remote access to a Hive data warehouse

Requirement:

        Use a small amount of Scala code to connect to Hive remotely, query data from a Hive table, and dump the data to a local file. In addition, once the data has been queried from Scala, the ResultSet can be converted into an RDD or a DataFrame and processed further with Scala operators (a minimal sketch of this conversion follows below).
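To make that second point concrete, here is a minimal sketch of turning a JDBC ResultSet into an RDD and a DataFrame. It is only a sketch: it assumes a local SparkSession, reads every column as a string, and reuses the host, database, and table names from the code later in this article:

import java.sql.{DriverManager, ResultSet}

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ResultSetToDF {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ResultSetToDF")
      .master("local[*]")
      .getOrCreate()

    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val con = DriverManager.getConnection("jdbc:hive2://tdxy-bigdata-04:10000/p2p")
    val stmt = con.createStatement()
    val rs: ResultSet = stmt.executeQuery("select * from p2p_order_info_user")

    val meta = rs.getMetaData
    val columnCount = meta.getColumnCount
    // Build a schema from the JDBC metadata; Hive returns qualified names
    // like "table.column", so keep only the part after the dot
    val schema = StructType((1 to columnCount).map { i =>
      StructField(meta.getColumnName(i).split("\\.").last, StringType)
    })

    // A ResultSet is a forward-only cursor, so drain it into local memory
    // first (fine for small result sets) and parallelize afterwards
    val rows = Iterator.continually(rs).takeWhile(_.next())
      .map(r => Row.fromSeq((1 to columnCount).map(i => r.getString(i))))
      .toList
    con.close()

    val rdd = spark.sparkContext.parallelize(rows)   // RDD[Row]
    val df = spark.createDataFrame(rdd, schema)      // DataFrame
    df.show()

    spark.stop()
  }

}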

Step 1: Start the server and the required service (hiveserver2). The default port for remote connections is 10000.

hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000
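
Before moving on, it can be worth verifying that the service is actually listening on that port. A minimal check from Scala (the host name tdxy-bigdata-04 is the one used in the JDBC URL later in this article):

import java.net.{InetSocketAddress, Socket}

object HivePortCheck {

  def main(args: Array[String]): Unit = {
    val socket = new Socket()
    try {
      // Try to open a plain TCP connection to HiveServer2 within 3 seconds
      socket.connect(new InetSocketAddress("tdxy-bigdata-04", 10000), 3000)
      println("HiveServer2 is reachable on port 10000")
    } catch {
      case e: Exception => println(s"HiveServer2 is not reachable: $e")
    } finally {
      socket.close()
    }
  }

}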

Step 2: Create a Maven project and add the dependencies to pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.cyy.sparkSql</groupId>
    <artifactId>sparkSqlTest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <java.version>1.8</java.version>
        <spark.version>2.1.0</spark.version>
    </properties>

    <dependencies>
        <!-- spark-->
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.1.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.json/json -->
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20160810</version>
        </dependency>

    </dependencies>

</project>

Step 3: Code implementation:

import java.io.{File, PrintWriter}
import java.sql.{Connection, DriverManager, ResultSet, Statement}

import org.json.{JSONArray, JSONObject}

object sparkSqlTest {

  def main(args: Array[String]): Unit = {

    val driverName: String = "org.apache.hive.jdbc.HiveDriver"
    try {
      Class.forName(driverName)
    } catch {
      case e: ClassNotFoundException =>
        println(s"Missing Hive JDBC driver class: $e")
        System.exit(1)
    }

    // No credentials here; pass a user name and password to getConnection
    // if your HiveServer2 requires authentication
    val con: Connection = DriverManager.getConnection("jdbc:hive2://tdxy-bigdata-04:10000/p2p")
    val stmt: Statement = con.createStatement()

    // Sanity check: list the tables in the p2p database
    val res: ResultSet = stmt.executeQuery("show tables")
    while (res.next()) {
      println(res.getString(1))
    }
    res.close()

    val jsonArray: JSONArray = new JSONArray()
    val rs: ResultSet = stmt.executeQuery("select * from p2p_order_info_user")

    val meta = rs.getMetaData
    println(meta.getColumnCount)
    println(meta.getColumnName(2))

    while (rs.next()) {
      val jsonObject: JSONObject = new JSONObject()
      for (i <- 1 to meta.getColumnCount) {
        // Hive JDBC returns qualified names like "table.column";
        // keep only the column part
        val columnName = meta.getColumnName(i).split("\\.").last
        if (rs.getObject(i) != null) {
          jsonObject.put(columnName, rs.getString(i))
        } else {
          // JSONObject.NULL yields a real JSON null, not the string "null"
          jsonObject.put(columnName, JSONObject.NULL)
        }
      }
      println("JSON object: " + jsonObject.toString)
      jsonArray.put(jsonObject)
    }

    // Write the whole array in one go; printing "," after every object,
    // as the naive approach does, leaves a trailing comma and invalid JSON
    val writer = new PrintWriter(new File("src/order_table.txt"))
    writer.print(jsonArray.toString)
    writer.close()

    rs.close()
    stmt.close()
    con.close()
  }

}
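
Since the program dumps the rows as a single JSON array, the file can be loaded straight back into Spark. One caveat: the multiLine option used below requires Spark 2.2 or later; Spark 2.1, as pinned in the pom above, only reads line-delimited JSON (one object per line), so either upgrade or write one object per line instead. A minimal sketch:

import org.apache.spark.sql.SparkSession

object ReadDump {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadDump")
      .master("local[*]")
      .getOrCreate()

    // multiLine (Spark 2.2+) lets the json reader parse a whole-file JSON array
    val df = spark.read.option("multiLine", "true").json("src/order_table.txt")
    df.printSchema()
    df.show()

    spark.stop()
  }

}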

Execution result: