Hadoop序列化與Writable接口(一)

阿新 • • 發佈：2017-07-05

temp 們的 ffi err 時間 sea 部分過程自身

Hadoop序列化與Writable接口(一)

序列化

序列化（serialization）是指將結構化的對象轉化為字節流，以便在網絡上傳輸或者寫入到硬盤進行永久存儲；相對的反序列化（deserialization）是指將字節流轉回到結構化對象的過程。

在分布式系統中進程將對象序列化為字節流，通過網絡傳輸到另一進程，另一進程接收到字節流，通過反序列化轉回到結構化對象，以達到進程間通信。在Hadoop中，Mapper，Combiner，Reducer等階段之間的通信都需要使用序列化與反序列化技術。舉例來說，Mapper產生的中間結果（<key: value1, value2...>

）需要寫入到本地硬盤，這是序列化過程（將結構化對象轉化為字節流，並寫入硬盤），而Reducer階段讀取Mapper的中間結果的過程則是一個反序列化過程（讀取硬盤上存儲的字節流文件，並轉回為結構化對象），需要註意的是，能夠在網絡上傳輸的只能是字節流，Mapper的中間結果在不同主機間洗牌時，對象將經歷序列化和反序列化兩個過程。

序列化是Hadoop核心的一部分，在Hadoop中，位於org.apache.hadoop.io包中的Writable接口是Hadoop序列化格式的實現。

Writable接口

Hadoop Writable接口是基於DataInput和DataOutput實現的序列化協議，緊湊（高效使用存儲空間），快速（讀寫數據、序列化與反序列化的開銷小）。Hadoop中的鍵（key）和值（value）必須是實現了Writable接口的對象（鍵還必須實現WritableComparable，以便進行排序）。

以下是Hadoop（使用的是Hadoop 1.1.2）中Writable接口的聲明：

package org.apache.hadoop.io;

import java.io.DataOutput;
import java.io.DataInput;
import java.io.IOException;

public interface Writable {
  /** 
   * Serialize the fields of this object to <code>out</code>.
   * 
   * @param out <code>DataOuput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /** 
   * Deserialize the fields of this object from <code>in</code>.  
   * 
   * <p>For efficiency, implementations should attempt to re-use storage in the 
   * existing object where possible.</p>
   * 
   * @param in <code>DataInput</code> to deseriablize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}

Writable類

Hadoop自身提供了多種具體的Writable類，包含了常見的Java基本類型（boolean、byte、short、int、float、long和double等）和集合類型（BytesWritable、ArrayWritable和MapWritable等）。這些類型都位於org.apache.hadoop.io包中。

技術分享

(圖片來源：safaribooksonline.com)

定制Writable類

雖然Hadoop內建了多種Writable類提供用戶選擇，Hadoop對Java基本類型的包裝Writable類實現的RawComparable接口，使得這些對象不需要反序列化過程，便可以在字節流層面進行排序，從而大大縮短了比較的時間開銷，但是當我們需要更加復雜的對象時，Hadoop的內建Writable類就不能滿足我們的需求了(需要註意的是Hadoop提供的Writable集合類型並沒有實現RawComparable接口，因此也不滿足我們的需要)，這時我們就需要定制自己的Writable類，特別將其作為鍵（key）的時候更應該如此，以求達到更高效的存儲和快速的比較。

下面的實例展示了如何定制一個Writable類，一個定制的Writable類首先必須實現Writable或者WritableComparable接口，然後為定制的Writable類編寫write(DataOutput out)和readFields(DataInput in)方法，來控制定制的Writable類如何轉化為字節流（write方法）和如何從字節流轉回為Writable對象。

package com.yoyzhou.weibo;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.VLongWritable;
import org.apache.hadoop.io.Writable;

/**
 *This MyWritable class demonstrates how to write a custom Writable class 
 *
 **/
public class MyWritable implements Writable{
		
		
	private VLongWritable field1;
	private VLongWritable field2;
		
	public MyWritable(){
		this.set(new VLongWritable(), new VLongWritable());
	}
		
		
	public MyWritable(VLongWritable fld1, VLongWritable fld2){
			
		this.set(fld1, fld2);
			
	}
		
	public void set(VLongWritable fld1, VLongWritable fld2){
		//make sure the smaller field is always put as field1
		if(fld1.get() <= fld2.get()){
			this.field1 = fld1;
			this.field2 = fld2;
		}else{
				
			this.field1 = fld2;
			this.field2 = fld1;
		}
		}
				
	//How to write and read MyWritable fields from DataOutput and DataInput stream
	@Override
	public void write(DataOutput out) throws IOException {
			
		field1.write(out);
		field2.write(out);
	}

	@Override
	public void readFields(DataInput in) throws IOException {
			
		field1.readFields(in);
		field2.readFields(in);
	}

	/** Returns true if <code>o</code> is a MyWritable with the same values. */
	@Override
	public boolean equals(Object o) {
		 if (!(o instanceof MyWritable))
		    return false;
		
		    MyWritable other = (MyWritable)o;
		    return field1.equals(other.field1) && field2.equals(other.field2);
		
	}
		
	@Override
	public int hashCode(){
			
		return field1.hashCode() * 163 + field2.hashCode();
	}
		
	@Override
	public String toString() {
		return field1.toString() + "\t" + field2.toString();
	}
		
}

未完待續，下一篇中將介紹Writable對象序列化為字節流時占用的字節長度以及其字節序列的構成。

參考資料

Tom White, Hadoop: The Definitive Guide, 3rd Edition

---To Be Continued---

Hadoop序列化與Writable接口(一)

temp 們的 ffi err 時間 sea 部分過程自身 Hadoop序列化與Writable接口(一) 序列化序列化（serialization）是指將結構化的對象轉化為字節流，以便在網絡上傳輸或者寫入到硬盤進行永久存儲；相對的反序列化（deserializat

Hadoop序列化與Writable接口(一)

Hadoop序列化與Writable接口(一)

序列化

Writable接口

Writable類

定制Writable類

參考資料

Hadoop序列化與Writable接口(一)

Serializable 指示一個類可以序列化；ICloneable支持克隆，即用與現有實例相同的值創建類的新實例（接口）；ISerializable允許對象控制其自己的序列化和反序列化過程（接口）

hadoop深入研究:(十)——序列化與Writable介面

asp.net mvc中如何處理字符串與對象之間的序列化與反序列化(一)

MapReduce常見演算法與自定義排序及Hadoop序列化

Flink的序列化與flink-hadoop-compatibility

java 序列化與反序列化（一）

一：Newtonsoft.Json 支持序列化與反序列化的.net 對象類型；

一：Newtonsoft.Json 支援序列化與反序列化的.net 物件型別；

C# 實體類序列化與反序列化一 (XmlSerializer)

hadoop深入研究:(十六)——Avro序列化與反序列化

使用Newtonsoft.Json.dll(JSON.NET)動態解析JSON、.net 的json的序列化與反序列化（一）

Java核心類庫-IO-對象流（實現序列化與反序列化）

Java IO-5 序列化與反序列化流

JAVA序列化與反射

契約類相關的序列化與反序列化

Hadoop Serialization -- hadoop序列化具體解釋 (2)【Text,BytesWritable,NullWritable】

Java序列化與反序列化

C#對象序列化與反序列化

Java將對象寫入文件讀出——序列化與反序列化

Hadoop序列化與Writable接口(一)

Hadoop序列化與Writable接口(一)

序列化

Writable接口

Writable類

定制Writable類

參考資料

相關推薦