HBase Programming with MapReduce

阿新 • Published: 2019-02-14

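Trying my hand at it: write the output of a MapReduce job into HBase, the large-scale distributed database. The example computes the page views (PV) of each URL. Because the input is stored in RCFile format, the hive-exec jar has to be shipped with the job, and the HBase jar has to be loaded as well; if the cluster administrator has already put both jars into hadoop/lib on every node, this step can be skipped. Without further ado, here is the code.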
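Conceptually, the job is a group-by-URL count. The plain-Java sketch below (no Hadoop or HBase needed) simulates the three phases on in-memory rows: the map step takes column 4 of each row as the URL and emits (url, 1), a `TreeMap` stands in for the shuffle that groups pairs by key, and the reduce step sums the ones. Each result is printed the way the job's reducer lays out its `Put`: row key = URL, column `type:count`, value = the sum as a decimal string. The class name `UrlPvSketch`, the sample rows, and the meaning of their other columns are hypothetical; in the real job the rows come from RCFile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UrlPvSketch {
	// Hypothetical row layout: column 4 holds the URL, matching value.get(4) in the job's mapper.
	static final int URL_COLUMN = 4;

	/** Map + shuffle + reduce, all in memory: returns url -> page views. */
	public static Map<String, Integer> countPv(List<String[]> rows) {
		// "Shuffle": group the (url, 1) pairs emitted by the map step by their key.
		Map<String, List<Integer>> grouped = new TreeMap<>();
		for (String[] row : rows) {
			String url = row[URL_COLUMN]; // map: emit (url, 1)
			grouped.computeIfAbsent(url, k -> new ArrayList<>()).add(1);
		}
		// "Reduce": sum the ones for each URL.
		Map<String, Integer> pv = new TreeMap<>();
		for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
			int sum = 0;
			for (int one : e.getValue()) {
				sum += one;
			}
			pv.put(e.getKey(), sum);
		}
		return pv;
	}

	/** Renders one result as the reducer stores it: row key, family:qualifier, value. */
	public static String asHBaseCell(String url, int sum) {
		return url + " type:count=" + String.valueOf(sum);
	}

	public static void main(String[] args) {
		List<String[]> rows = new ArrayList<>();
		rows.add(new String[] {"u1", "s1", "t1", "ip1", "/index.html"});
		rows.add(new String[] {"u2", "s2", "t2", "ip2", "/about.html"});
		rows.add(new String[] {"u3", "s3", "t3", "ip3", "/index.html"});
		for (Map.Entry<String, Integer> e : countPv(rows).entrySet()) {
			System.out.println(asHBaseCell(e.getKey(), e.getValue()));
		}
		// prints:
		// /about.html type:count=1
		// /index.html type:count=2
	}
}
```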

package test.hbase;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

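// Project-specific RCFile InputFormat for the new mapreduce API (not a public
// library class); it delivers each row as a BytesRefArrayWritable of columns.
import com.test.dm.common.RCFileInputFormat;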

public class URLCountHbase {
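	/**
	 * Map phase: each RCFile row arrives as a BytesRefArrayWritable holding one
	 * BytesRefWritable per column. Column 4 holds the URL, so emit (url, 1).
	 */
	public static class HBaseMap extends
			Mapper<LongWritable, BytesRefArrayWritable, Text, IntWritable> {

		private final IntWritable one = new IntWritable(1);
		// Reused across calls to avoid allocating a new Text per record.
		private final Text url = new Text();

		@Override
		protected void map(LongWritable key, BytesRefArrayWritable value,
				Context context) throws IOException, InterruptedException {
			url.set(value.get(4).getBytesCopy());
			context.write(url, one);
		}

	}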

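	/**
	 * Reduce phase: sum the 1s for each URL and write one row to HBase.
	 * Row key = URL, column type:count = the sum as a decimal string.
	 * TableOutputFormat ignores the output key, so NullWritable is used.
	 */
	public static class HBaseReduce extends
			TableReducer<Text, IntWritable, NullWritable> {

		@Override
		protected void reduce(Text key, Iterable<IntWritable> values,
				Context context) throws IOException, InterruptedException {
			int sum = 0;
			for (IntWritable v : values) {
				sum += v.get();
			}
			Put put = new Put(Bytes.toBytes(key.toString()));
			// Put.add(family, qualifier, value) is the HBase 0.92 API used here;
			// HBase 1.x deprecates it in favor of Put.addColumn.
			put.add(Bytes.toBytes("type"), Bytes.toBytes("count"),
					Bytes.toBytes(String.valueOf(sum)));
			context.write(NullWritable.get(), put);
		}

	}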

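	/**
	 * (Re)creates the output table with a single column family "type".
	 * Warning: an existing table with the same name is disabled and dropped,
	 * so data from previous runs is lost.
	 */
	public static void createHbaseTable(String tablename) throws IOException {
		HTableDescriptor htd = new HTableDescriptor(tablename);
		HColumnDescriptor col = new HColumnDescriptor("type");
		htd.addFamily(col);
		// HBaseConfiguration.create() loads hbase-default.xml and hbase-site.xml;
		// the no-arg HBaseConfiguration constructor is deprecated.
		Configuration config = HBaseConfiguration.create();
		HBaseAdmin admin = new HBaseAdmin(config);
		if (admin.tableExists(tablename)) {
			System.out.println("table exists, dropping and recreating it: " + tablename);
			admin.disableTable(tablename);
			admin.deleteTable(tablename);
		}
		System.out.println("create new table: " + tablename);
		admin.createTable(htd);
	}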

	public static void main(String[] args) throws Exception {
		if (args.length < 1) {
			System.err.println("Usage: URLCountHbase <rcfile input path>");
			System.exit(2);
		}
		String tablename = "urlcount";
		// Use HBaseConfiguration.create() rather than new Configuration() so that
		// hbase-site.xml (e.g. the ZooKeeper quorum) is visible to TableOutputFormat.
		Configuration conf = HBaseConfiguration.create();
		// Ship hive-exec (RCFile classes) and the HBase jar with the job; this is
		// unnecessary if both jars are already in hadoop/lib on every node.
		final FileSystem fs = FileSystem.getLocal(conf);
		final Set<String> localfiles = new HashSet<String>();
		localfiles.add("/opt/hadoop/hive-0.8.1/lib/hive-exec-0.8.1.jar");
		localfiles.add("/opt/hadoop/hbase/hbase-0.92.1.jar");
		final Set<String> files = new HashSet<String>();
		for (String s : localfiles) {
			files.add(convertPath(s, fs));
		}
		cacheJars(conf, files);
		// TableOutputFormat reads the target table from the configuration. Set it
		// before creating the Job, because the Job takes a copy of conf.
		conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
		createHbaseTable(tablename);
		Job job = new Job(conf, "URLCount table with " + args[0]);
		job.setJarByClass(URLCountHbase.class);
		job.setNumReduceTasks(3);
		job.setMapperClass(HBaseMap.class);
		job.setReducerClass(HBaseReduce.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		job.setInputFormatClass(RCFileInputFormat.class);
		job.setOutputFormatClass(TableOutputFormat.class);
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
	
	 private static String convertPath(String path, FileSystem fs) {
	        final Path p = new Path(path);
	        return p.makeQualified(fs).toString();
	 }

	 private static void cacheJars(Configuration job, Set<String> localUrls) throws IOException {
	     if (localUrls.isEmpty()) {
	            return;
	        }
	        final String tmpjars = job.get("tmpjars");
	        final StringBuilder sb = new StringBuilder();
	        if (null != tmpjars) {
	            sb.append(tmpjars);
	            sb.append(",");
	        }
	        sb.append(org.apache.hadoop.util.StringUtils.arrayToString(localUrls.toArray(new String[0])));
	        job.set("tmpjars", sb.toString());
	  }
}
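A note on a design choice in `HBaseReduce`: the count is written as `Bytes.toBytes(String.valueOf(sum))`, i.e. the UTF-8 bytes of the decimal text, so scanning `urlcount` in the HBase shell shows a readable value. The alternative, `Bytes.toBytes((long) sum)`, stores an 8-byte big-endian long; it is not human-readable in the shell, but it is the layout HBase's atomic increment operations require if counts are later updated in place. The stdlib-only sketch below reproduces both byte layouts with `java.nio` so the difference is visible without an HBase dependency. The class name `CountEncoding` is hypothetical; it mirrors, but does not call, HBase's `Bytes` helpers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CountEncoding {
	// Same bytes as Bytes.toBytes(String.valueOf(sum)) in the reducer: UTF-8 decimal text.
	public static byte[] asText(long sum) {
		return String.valueOf(sum).getBytes(StandardCharsets.UTF_8);
	}

	// Same bytes as Bytes.toBytes(long): 8 bytes, big-endian
	// (ByteBuffer's default byte order is big-endian).
	public static byte[] asLong(long sum) {
		return ByteBuffer.allocate(Long.BYTES).putLong(sum).array();
	}

	public static long fromText(byte[] cell) {
		return Long.parseLong(new String(cell, StandardCharsets.UTF_8));
	}

	public static long fromLong(byte[] cell) {
		return ByteBuffer.wrap(cell).getLong();
	}

	public static void main(String[] args) {
		long sum = 300;
		System.out.println(Arrays.toString(asText(sum))); // [51, 48, 48]
		System.out.println(Arrays.toString(asLong(sum))); // [0, 0, 0, 0, 0, 0, 1, 44]
		System.out.println(fromText(asText(sum)) + " " + fromLong(asLong(sum))); // 300 300
	}
}
```

Whichever layout is chosen, readers of the table must use the matching decoder: a cell written as text has to be parsed, while a cell written as a long has to be read back as 8 big-endian bytes.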