hbase與solr的架構整合

阿新 • • 發佈：2019-02-19

大資料架構-使用HBase和Solr將儲存與索引放在不同的機器上

摘要：HBase可以通過協處理器Coprocessor的方式向Solr發出請求，Solr對於接收到的資料可以做相關的同步：增、刪、改索引的操作，這樣就可以同時使用HBase儲存量大和Solr檢索效能高的優點了，更何況HBase和Solr都可以叢集。這對海量資料儲存、檢索提供了一種方式，將儲存與索引放在不同的機器上，是大資料架構的必須品。關鍵詞：HBase, Solr, Coprocessor, 大資料, 架構 HBase和Solr可以通過協處理器Coprocessor的方式向Solr發出請求，Solr對於接收到的資料可以做相關的同步：增、刪、改索引的操作。將儲存與索引放在不同的機器上，這是大資料架構的必須品，但目前還有很多不懂得此道的同學，他們對於這種思想感到很新奇，不過，這絕對是好的方向，所以不懂得抓緊學習吧。

有個朋友給我的那篇部落格留言，說CDH也可以做這樣的事情，我還沒有試過，他還問我要與此相關的程式碼，於是我就稍微整理了一下，作為本篇文章的主要內容。關於CDH的事，我會盡快嘗試，有知道的同學可以給我留言。下面我主要講述一下，我測試對HBase和Solr的效能時，使用HBase協處理器向HBase新增資料所編寫的相關程式碼，及解釋說明。一、編寫HBase協處理器Coprocessor 一旦有資料postPut，就立即對Solr裡相應的Core更新。這裡使用了ConcurrentUpdateSolrServer，它是Solr速率效能的保證，使用它不要忘記在Solr裡面配置autoCommit喲。

/* *版權：王安琪 *描述：監視HBase，一有資料postPut就向Solr傳送，本類要作為觸發器新增到HBase *修改時間：2014-05-27 *修改內容：新增 */ package solrHbase.test; import java.io.UnsupportedEncodingException; import ***; publicclass SorlIndexCoprocessorObserver extends BaseRegionObserver { private static final Logger LOG = LoggerFactory

.getLogger(SorlIndexCoprocessorObserver.class); private static final String solrUrl = "http://192.1.11.108:80/solr/core1"; private static final SolrServer solrServer = new ConcurrentUpdateSolrServer( solrUrl, 10000, 20); /** * 建立solr索引 * * @throws UnsupportedEncodingException */ @Override public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e, final Put put, final WALEdit edit, final boolean writeToWAL) throws UnsupportedEncodingException { inputSolr(put); } public void inputSolr(Put put) { try { solrServer.add(TestSolrMain.getInputDoc(put)); } catch (Exception ex) { LOG.error(ex.getMessage()); } } }

注意：getInputDoc是這個HBase協處理器Coprocessor的精髓所在，它可以把HBase內的Put裡的內容轉化成Solr需要的值。其中String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("我在這")).trim();這裡有一個亂碼字元，在這裡看不到，請大家注意一下。

publicstatic SolrInputDocument getInputDoc(Put put) { SolrInputDocument doc = new SolrInputDocument(); doc.addField("test_ID", Bytes.toString(put.getRow())); for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) { String key = Bytes.toString(c.getKey()); String value = Bytes.toString(c.getValue()); if (value.isEmpty()) { continue; } String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("")).trim(); doc.addField(fieldName, value); } return doc; }

二、編寫測試程式入口程式碼main 這段程式碼向HBase請求建了一張表，並將模擬的資料，向HBase連續地提交資料內容，在HBase中不斷地插入資料，同時記錄時間，測試插入效能。

/* *版權：王安琪 *描述：測試HBaseInsert，HBase插入效能 *修改時間：2014-05-27 *修改內容：新增 */ package solrHbase.test; import hbaseInput.HbaseInsert; import ***; publicclass TestHBaseMain { private static Configuration config; private static String tableName = "angelHbase"; private static HTable table = null; private static final String columnFamily = "wanganqi"; /** * @param args */ public static void main(String[] args) { config = HBaseConfiguration.create(); config.set("hbase.zookeeper.quorum", "192.103.101.104"); HbaseInsert.createTable(config, tableName, columnFamily); try { table = new HTable(config, Bytes.toBytes(tableName)); for (int k = 0; k < 1; k++) { Thread t = new Thread() { public void run() { for (int i = 0; i < 100000; i++) { HbaseInsert.inputData(table, PutCreater.createPuts(1000, columnFamily)); Calendar c = Calendar.getInstance(); String dateTime = c.get(Calendar.YEAR) + "-" + c.get(Calendar.MONTH) + "-" + c.get(Calendar.DATE) + "T" + c.get(Calendar.HOUR) + ":" + c.get(Calendar.MINUTE) + ":" + c.get(Calendar.SECOND) + ":" + c.get(Calendar.MILLISECOND) + "Z 寫入: " + i * 1000; System.out.println(dateTime); } } }; t.start(); } } catch (IOException e1) { e1.printStackTrace(); } } }

下面的是與HBase相關的操作，把它封裝到一個類中，這裡就只有建表與插入資料的相關程式碼。

/* *版權：王安琪 *描述：與HBase相關操作，建表與插入資料 *修改時間：2014-05-27 *修改內容：新增 */ package hbaseInput; import ***; import org.apache.hadoop.hbase.client.Put; publicclass HbaseInsert { public static void createTable(Configuration config, String tableName, String columnFamily) { HBaseAdmin hBaseAdmin; try { hBaseAdmin = new HBaseAdmin(config); if (hBaseAdmin.tableExists(tableName)) { return; } HTableDescriptor tableDescriptor = new HTableDescriptor(tableName); tableDescriptor.addFamily(new HColumnDescriptor(columnFamily)); hBaseAdmin.createTable(tableDescriptor); hBaseAdmin.close(); } catch (MasterNotRunningException e) { e.printStackTrace(); } catch (ZooKeeperConnectionException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } public static void inputData(HTable table, ArrayList<Put> puts) { try { table.put(puts); table.flushCommits(); puts.clear(); } catch (IOException e) { e.printStackTrace(); } } }

三、編寫模擬資料Put 向HBase中寫入資料需要構造Put，下面是我構造模擬資料Put的方式，有字串的生成，我是由mmseg提供的詞典words.dic中隨機讀取一些詞語連線起來，生成一句字串的，下面的程式碼沒有體現，不過很easy，你自己造你自己想要的資料就OK了。

publicstatic Put createPut(String columnFamily) { String ss = getSentence(); byte[] family = Bytes.toBytes(columnFamily); byte[] rowKey = Bytes.toBytes("" + Math.abs(r.nextLong())); Put put = new Put(rowKey); put.add(family, Bytes.toBytes("DeviceID"), Bytes.toBytes("" + Math.abs(r.nextInt()))); ****** put.add(family, Bytes.toBytes("Company_mmsegsm"), Bytes.toBytes("ss")); return put; }

當然在執行上面這個程式之前，需要先在Solr裡面配置好你需要的列資訊，HBase、Solr安裝與配置，它們的基礎使用方法將會在之後的文章中介紹。在這裡，Solr的列配置就跟你使用createPut生成的Put搞成一樣的列名就行了，當然也可以使用動態列的形式。四、直接對Solr效能測試如果你不想對HBase與Solr的相結合進行測試，只想單獨對Solr的效能進行測試，這就更簡單了，完全可以利用上面的程式碼段來測試，稍微組裝一下就可以了。

privatestaticvoid sendConcurrentUpdateSolrServer(final String url, final int count) throws SolrServerException, IOException { SolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20); for (int i = 0; i < count; i++) { solrServer.add(getInputDoc(PutCreater.createPut(columnFamily))); } }

hbase與solr的架構整合

hbase與solr的架構整合

HBase與Sqoop的整合

HBase與Hive的整合案例二

HBase與Hive的整合案例一

架構師入門：Spring Cloud系列，Hystrix與Eureka的整合

架構師系列文：通過Spring Cloud元件Hystrix合併請求架構師入門：Spring Cloud系列，Hystrix與Eureka的整合

HBase與MapReduce整合操作

使用docker+consul+nginx整合分散式的服務發現與註冊架構

Maven整合Spring與Solr

hbase與flume整合程式設計

Linux-centos下安裝hue視覺化以及與hdfs、hive、hbase和mysql的整合

HBase-與Hive的區別、與Sqoop的整合

HBase權威指南學習記錄（五、hbase與MapReduce整合）

HBase與MapReduce整合2-Hdfs2HBase

Hbase與Mapreduce整合的案例

大資料（HBase-應用場景、原理與基本架構）

《Kafka筆記》4、Kafka架構，與其他元件整合

信息技術開拓視野——記IT戰略規劃與企業架構培訓課程

cas4.2.7與shiro進行整合

Sqoop_具體總結使用Sqoop將HDFS/Hive/HBase與MySQL/Oracle中的數據相互導入、導出

hbase與solr的架構整合

相關推薦