04 Storing Kafka data in HBase

阿新 • Published: 2019-02-11

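The test in the previous post used a plain Kafka consumer. If that consumer is replaced by one that writes to HBase, HBase can pull the data out of Kafka and store it.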

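HDFS must be started before HBase, and HBase also needs ZooKeeper.

Start HDFS: start-dfs.sh

Start HBase: start-hbase.sh

For HBase high availability, start an additional master on the other nodes: hbase-daemon.sh start master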

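Processes on each node:

Create the HBase consumer:

In IDEA, add hbase-site.xml and hdfs-site.xml to the project, and externalize the configuration into a properties file in the same way: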

kafka.properties:

zookeeper.connect=s128:2181,s129:2181,s130:2181
# consumer group
group.id=g4
zookeeper.session.timeout.ms=500
zookeeper.sync.time.ms=250
auto.commit.interval.ms=1000
auto.offset.reset=smallest
# Kafka topic to consume
topic=calllog
# HBase table name
table.name=ns1:calllogs
# number of partitions (buckets for the rowkey hash prefix)
partition.number=100
# caller flag
caller.flag=0
# format pattern of the hash (region) prefix
hashcode.pattern=00
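java.util.Properties only treats lines that begin with `#` or `!` as comments; everything after the `=` on a line, including a trailing `//`, becomes part of the value. The small check below (the class name `PropertiesCommentDemo` is illustrative) shows why the comments above sit on their own lines: with `table.name=ns1:calllogs //...`, `TableName.valueOf` would receive the comment text too.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative check of how java.util.Properties parses comments in kafka.properties.
public class PropertiesCommentDemo {

    // Parses properties text the same way Properties.load does for a file.
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader never fails, but load() declares IOException
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A trailing "//" comment is NOT stripped: it ends up inside the value.
        Properties bad = parse("table.name=ns1:calllogs //HBase table name\n");
        System.out.println("[" + bad.getProperty("table.name") + "]");
        // prints [ns1:calllogs //HBase table name]

        // A "#" comment on its own line is ignored, so the value stays clean.
        Properties good = parse("# HBase table name\ntable.name=ns1:calllogs\n");
        System.out.println("[" + good.getProperty("table.name") + "]");
        // prints [ns1:calllogs]
    }
}
```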

Create the HbaseDao class, which accesses HBase and performs the data operations:

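import java.text.DecimalFormat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * HBase data access object
 */
public class HbaseDao {
    // formats the hash prefix, e.g. pattern "00" turns 7 into "07"
    private DecimalFormat df = new DecimalFormat() ;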

    private Table table = null ;

    private int partitions ;

    private String flag  ;
    public HbaseDao(){
        try {
            Configuration conf = HBaseConfiguration.create();
            Connection conn = ConnectionFactory.createConnection(conf);
            TableName name = TableName.valueOf(PropertiesUtil.getProp("table.name"));
            table = conn.getTable(name);

            df.applyPattern(PropertiesUtil.getProp("hashcode.pattern"));

            partitions = Integer.parseInt(PropertiesUtil.getProp("partition.number"));
            flag = PropertiesUtil.getProp("caller.flag") ;
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    /**
     * Parse one call-log line and put it into HBase.
     */
    public void put(String log){
        if (log == null || log.equals("")) {
            return;
        }
        try {
            // parse the log line: caller,callee,callTime,callDuration
            String[] arr = log.split(",");
            if (arr.length == 4) {
                String caller = arr[0];
                String callee = arr[1];
                String callTime = arr[2];
                callTime = callTime.replace("/","") ;       // remove slashes
                callTime = callTime.replace(" ","") ;       // remove spaces
                callTime = callTime.replace(":","") ;       // remove colons

                String callDuration = arr[3];

                // build the rowkey (hash prefix computed from caller and month)
                String rowkey = genRowkey(getHashcode(caller, callTime), caller, callTime, flag, callee, callDuration);
                // build the Put object: one column per field in family f1
                Put put = new Put(Bytes.toBytes(rowkey));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("caller"), Bytes.toBytes(caller));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("callee"), Bytes.toBytes(callee));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("callTime"), Bytes.toBytes(callTime));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("callDuration"), Bytes.toBytes(callDuration));
                table.put(put);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

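    public String getHashcode(String caller ,String callTime){
        int len = caller.length();
        // last four digits of the caller's phone number
        String last4Code = caller.substring(len - 4);
        // year and month of the call (yyyyMM)
        String mon = callTime.substring(0,6);
        // XOR month and last four digits, then reduce modulo the partition count
        int hashcode = (Integer.parseInt(mon) ^ Integer.parseInt(last4Code)) % partitions ;
        return df.format(hashcode);
    }

    /**
     * Build the rowkey: hash,caller,time,flag,callee,duration
     * @param hash      hash (region) prefix from getHashcode
     * @param caller    calling number
     * @param time      call time with '/', ' ' and ':' removed
     * @param flag      caller flag (caller.flag)
     * @param callee    called number
     * @param duration  call duration
     * @return the comma-separated rowkey
     */
    public String genRowkey(String hash,String caller,String time,String flag,String callee,String duration){
        return hash + "," + caller + "," + time + "," + flag + "," + callee + "," + duration ;
    }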
}
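To see what rowkeys HbaseDao produces without a running cluster, here is a standalone re-implementation of its parsing, getHashcode and genRowkey logic. The class name `RowkeySketch` and the sample log line are illustrative; the constants mirror partition.number=100, hashcode.pattern=00 and caller.flag=0 from kafka.properties. Because the hash depends only on the caller's last four digits and the yyyyMM month, all calls of one caller in one month get the same two-digit prefix.

```java
import java.text.DecimalFormat;

// Standalone sketch of HbaseDao's rowkey logic (no HBase needed).
public class RowkeySketch {
    static final int PARTITIONS = 100;                       // partition.number
    static final String FLAG = "0";                          // caller.flag
    static final DecimalFormat DF = new DecimalFormat("00"); // hashcode.pattern

    // Same normalization as HbaseDao.put: drop '/', ' ' and ':'
    public static String normalizeTime(String callTime) {
        return callTime.replace("/", "").replace(" ", "").replace(":", "");
    }

    // Same as HbaseDao.getHashcode: (yyyyMM XOR last four digits) mod partitions
    public static String hashcode(String caller, String callTime) {
        String last4 = caller.substring(caller.length() - 4);
        String month = callTime.substring(0, 6);
        int hash = (Integer.parseInt(month) ^ Integer.parseInt(last4)) % PARTITIONS;
        return DF.format(hash);
    }

    // Same layout as HbaseDao.genRowkey: hash,caller,time,flag,callee,duration
    public static String rowkey(String log) {
        String[] arr = log.split(",");
        String caller = arr[0], callee = arr[1], duration = arr[3];
        String time = normalizeTime(arr[2]);
        return hashcode(caller, time) + "," + caller + "," + time + "," + FLAG + "," + callee + "," + duration;
    }

    public static void main(String[] args) {
        // 201801 ^ 1111 = 200734, and 200734 % 100 = 34
        System.out.println(rowkey("15811111111,13922222222,2018/01/05 12:30:00,0360"));
        // prints 34,15811111111,20180105123000,0,13922222222,0360
    }
}
```

A caller ending in 0000 shows the zero padding: 201801 % 100 = 1, which the "00" pattern formats as "01".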

Create HbaseConsumer (the HBase consumer):

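import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

/**
 * HBase consumer: pulls data from Kafka and stores it in HBase.
 */
public class HbaseConsumer {

    public static void main(String[] args) throws Exception {
        HbaseDao dao = new HbaseDao();
        // create the configuration for the old ZooKeeper-based high-level consumer
        // (this Scala consumer API was removed in Kafka 2.0)
        ConsumerConfig config = new ConsumerConfig(PropertiesUtil.props);

        // topic to consume
        String topic = PropertiesUtil.getProp("topic");
        // request one stream (one consuming thread) for the topic
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put(topic, 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> msgs =
                Consumer.createJavaConsumerConnector(config).createMessageStreams(map);

        List<KafkaStream<byte[], byte[]>> msgList = msgs.get(topic);

        for (KafkaStream<byte[], byte[]> stream : msgList) {
            ConsumerIterator<byte[], byte[]> it = stream.iterator();
            // hasNext() blocks until the next message arrives
            while (it.hasNext()) {
                byte[] message = it.next().message();
                // decode the Kafka message
                String msg = new String(message, StandardCharsets.UTF_8);
                // write it to HBase
                dao.put(msg);
            }
        }
    }
}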
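Both classes read their settings through PropertiesUtil, which the post does not show. Below is a hypothetical minimal version, assuming it simply loads kafka.properties from the classpath into a public static `Properties` field; the only members the code above relies on are `props` and `getProp`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reconstruction of PropertiesUtil: the original class is not shown in the post.
public class PropertiesUtil {
    // Passed directly to new ConsumerConfig(...) by HbaseConsumer
    public static final Properties props = new Properties();

    static {
        // Assumption: kafka.properties is on the classpath (e.g. src/main/resources)
        try (InputStream in = PropertiesUtil.class.getClassLoader()
                .getResourceAsStream("kafka.properties")) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the value for key, or null if it is not set
    public static String getProp(String key) {
        return props.getProperty(key);
    }
}
```

Loading from the classpath means the file travels inside the jar, so the same code runs in IDEA and on s128 without path changes.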

Package it into a jar and copy it to s128.

Because many dependency jars are needed, use an mvn command on Windows to download all of the artifact's dependencies
----------------------------------------

mvn -DoutputDirectory=./lib -DgroupId=com.chenway -DartifactId=CallLogConsumerModule -Dversion=1.0-SNAPSHOT dependency:copy-dependencies

Put all of the generated jars into the lib folder on s128.

Write the run-kafkaconsumer.sh script:

Run the HBase consumer script and the data-generation script:

./run-kafkaconsumer.sh

./calllog.sh

Then open the HBase shell.

View the data: scan 'ns1:calllogs'
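A full `scan` lists the whole table. Because every rowkey of one caller in one month starts with `hash,caller,yyyyMM`, the scan can also be narrowed to that range. The sketch below (class name and sample number are illustrative) recomputes the prefix with the same formula as HbaseDao.getHashcode, assuming partition.number=100 and hashcode.pattern=00, and prints an HBase shell `scan` with STARTROW/STOPROW; the stop row is the prefix with its last character incremented, a common prefix-scan trick.

```java
import java.text.DecimalFormat;

// Sketch: row range covering one caller's calls in one month, given HbaseDao's rowkey layout
// "hash,caller,time,flag,callee,duration".
public class ScanRangeSketch {
    private static final int PARTITIONS = 100;                       // partition.number
    private static final DecimalFormat DF = new DecimalFormat("00"); // hashcode.pattern

    // Same formula as HbaseDao.getHashcode, with the month given as yyyyMM
    public static String hashcode(String caller, String month) {
        String last4 = caller.substring(caller.length() - 4);
        int hash = (Integer.parseInt(month) ^ Integer.parseInt(last4)) % PARTITIONS;
        return DF.format(hash);
    }

    // Inclusive start row: every rowkey of this caller in this month begins with it
    public static String startRow(String caller, String month) {
        return hashcode(caller, month) + "," + caller + "," + month;
    }

    // Exclusive stop row: the prefix with its last character incremented ("201801" -> "201802")
    public static String stopRow(String caller, String month) {
        String prefix = startRow(caller, month);
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    public static void main(String[] args) {
        String caller = "15811111111", month = "201801";
        System.out.println("scan 'ns1:calllogs', {STARTROW => '" + startRow(caller, month)
                + "', STOPROW => '" + stopRow(caller, month) + "'}");
        // prints scan 'ns1:calllogs', {STARTROW => '34,15811111111,201801', STOPROW => '34,15811111111,201802'}
    }
}
```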
