Using a Custom Type as the Key of a hash_map

阿新 • Published: 2019-01-02

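I am a little embarrassed to admit that, after using Visual Studio 2003 for a long time, all I knew was that MFC and ATL had both been upgraded to 7.0. I had studied those two classic class libraries to some extent, but I never paid attention to what had changed in the C++ standard library.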

Today I tried stdext::hash_map, and it works well. Here are some notes.

The hash_map class lives in the header hash_map which, like all C++ standard library headers, has no file extension. Include it as follows:

          #include <hash_map>
          using namespace std;
          using namespace stdext;

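hash_map is a composite class. It derives from the _Hash class and contains a vector, a list, and a pair: the vector stores the buckets, the list handles collisions, and the pair holds each key->value entry. In simplified pseudocode: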

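          // Simplified model only, not the real implementation
          template<class _Tkey, class _Tval>
          class hash_map
          {
          private:
               typedef pair<_Tkey, _Tval> hash_pair;   // one key->value entry
               typedef list<hash_pair>    hash_list;   // collision chain of one bucket
               typedef vector<hash_list>  hash_table;  // the buckets
          };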

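Of course, this is only a simplified model. The C++ standard library's templates are notorious for their deep nesting, and to a beginner the library source reads like hieroglyphics. Microsoft's hash_map also aggregates the hash_compare functor class, which in turn aggregates the less functor class, so things get messy.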

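Here is how to use it:

     I. Simple variables as keys: integers, floating-point values, pointers
     A pointer is really just an integer, so the algorithm is the same. However, hash_map gives special treatment to char*, const char*, wchar_t*, and const wchar_t*.
     This is the simplest case. The following code is an integer example: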
            hash_map<int, int> IntHash;
            IntHash[1] = 123;
            IntHash[2] = 456;

            int val1 = IntHash[1];
            int val2 = IntHash[2];
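     Floating-point and pointer keys are used the same way as integers. How it works:
     1. When declaring a hash_map with a simple key type, the last two template parameters can be left at their defaults (the final one is the allocator for the stored pair nodes; the default is perfectly adequate and I see no need to change it).
     2. For simple types other than strings, hash_map computes the hash with the template function size_t hash_value(const _Kty& _Keyval). It uses the classic mask-and-XOR approach, and natural overflow yields the hash index. The Microsoft engineers may have been having a bit of fun: the mask is defined as 0xdeadbeef ("dead beef", or perhaps some programmer's nickname).
     3. For string-pointer keys, the non-template overloads inline size_t hash_value(const char *_Str) and inline size_t hash_value(const wchar_t *_Str) compute the hash by summing every character, again relying on natural overflow. With string keys, note that a custom less functor is required.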
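     It is reasonable to assume that a lookup in a hash table is expected to cost much less than an insertion, so insertion may be the more expensive operation. On that assumption, hash_map keeps each collision list sorted when it inserts, so that collision resolution during a lookup finds the wanted key more quickly.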
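     However, in the spirit of generic programming, hash_map also assumes that every key type supports "<" for ordering two values. That design leaves string pointers stranded: as everyone knows, the ordering of two string pointers says nothing about the ordering of the string contents. Consider the following code: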
          hash_map<const char*, int> CharHash;
          CharHash["a"] = 123;
          CharHash["b"] = 456;

          char szInput[64] = "";
          scanf("%63s", szInput);   // limit the read to the buffer size

          int val = CharHash[szInput];

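The end result is that no matter what string is typed in, the corresponding integer is never found, because the input pointer szInput can never be equal to the pointer of the string constant "a" or "b". The fix is as follows:
First write a functor CharLess derived from the functor base class binary_function (deriving is optional; it simply follows the standard convention and is more convenient to write, without getting tangled up in pointers to pointers and references to pointers).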

          struct CharLess : public binary_function<const char*, const char*, bool>
          {
          public:
               result_type operator()(const first_argument_type& _Left, const second_argument_type& _Right) const
               {
                    return(stricmp(_Left, _Right) < 0 ? true : false);
               }
          };

With this functor in place, a string-pointer hash_map can be used correctly:

          hash_map<const char*, int, hash_compare<const char*, CharLess> > CharHash;
          CharHash["a"] = 123;
          CharHash["b"] = 456;

          char szInput[64] = "";
          scanf("%63s", szInput);   // limit the read to the buffer size

          int val = CharHash[szInput];

     Now it works as expected. That completes the usage of simple types.

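     II. User-defined types: objects and structs
     This case is more complicated, so let us start with the easy part: the C++ standard library's string class.

     Fortunately, Microsoft provides a hash function for basic_string (string is a typedef of basic_string<char>), which makes string keys much simpler. One thing worth noting (and a little frustrating): even though hashing a string is supported, the standard hash_compare functor still failed for me because no "<" for string was found. The comparison operators for string are declared in <string>, so including that header should be enough; alternatively, we can write our own less functor again.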

          struct string_less : public binary_function<const string, const string, bool>
          {
          public:
               result_type operator()(const first_argument_type& _Left, const second_argument_type& _Right) const
               {
                    return(_Left.compare(_Right) < 0 ? true : false);
               }
          };

     Now we can write the following code:

          hash_map<string, int, hash_compare<string, string_less> > StringHash;
          StringHash["a"] = 123;
          StringHash["b"] = 456;

          string strKey = "a";

          int val = StringHash[strKey];

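     That is all it takes.

     Another commonly used string class, CString (whose design I find more free-spirited than the standard library's string), is somewhat more involved. The standard library obviously has no support for CString, yet CString does overload the comparison operators (frustrating). So we have to rewrite the hash_compare functor. It is worth mentioning that in Visual Studio 2003 CString is no longer tied to MFC; it is part of ATL and becomes available with #include <atlstr.h>. Rather than rewriting hash_compare from scratch, I simply derived from it: inheritance in a template library carries no performance cost, and it lets me be a little lazy.
     First, write a hash_value function: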

          inline size_t CString_hash_value(const CString& str)
          {
               size_t value = _HASH_SEED;
               size_t size = str.GetLength();
               if (size > 0) {
                    size_t temp = (size / 16) + 1;
                    size -= temp;
                    for (size_t idx = 0; idx <= size; idx += temp) {
                         value += (size_t)str[(int)idx];
                    }
               }
               return(value);
          }

     Next, rewrite the hash_compare functor:

          class CString_hash_compare : public hash_compare<CString>
          {
          public:
               size_t operator()(const CString& _Key) const
               {
                    return((size_t)CString_hash_value(_Key));
               }

               bool operator()(const CString& _Keyval1, const CString& _Keyval2) const
               {
                    return (comp(_Keyval1, _Keyval2));
               }
          };

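     The overrides above leave the base class's less functor alone: since CString has comparison operators, the default less functor can be used, and it is exposed here as comp. Now a new hash_map object can be declared:

          hash_map<CString, int, CString_hash_compare> CStringHash;

     All other operations are exactly the same.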

     Next, using custom objects as keys. First define:

          struct IHashable
          {
               virtual unsigned long hash_value() const = 0;
               virtual bool operator < (const IHashable& val) const = 0;
               virtual IHashable& operator = (const IHashable& val) = 0;
          };

     All of our own classes derive from this interface so that there is one convention. Next, define the class:

          class CTest : public IHashable
          {
          public:
               int m_value;
               CString m_message;
          public:
               CTest() : m_value(0)
               {
               }

               CTest(const CTest& obj)
               {
                    m_value = obj.m_value;
                    m_message = obj.m_message;
               }
          public:
               virtual IHashable& operator = (const IHashable& val)
               {
                    m_value   = ((CTest&)val).m_value;
                    m_message = ((CTest&)val).m_message;
                    return(*this);
               }

               virtual unsigned long hash_value() const
               {
                    // The hash is computed from the m_value field here; a more elaborate function could combine all fields
                    return(m_value ^ 0xdeadbeef);
               }

               virtual bool operator < (const IHashable& val) const
               {
                    return(m_value < ((CTest&)val).m_value);
               }
          };

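     To use objects of this class as hash keys, the following preparation is needed. Because the interface mandates the comparison operator, the standard less functor can be used here, so it is left as is: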

          template<class _Tkey>
          class MyHashCompare : public hash_compare<_Tkey>
          {
          public:
               size_t operator()(const _Tkey& _Key) const
               {
                    return(_Key.hash_value());
               }

               bool operator()(const _Tkey& _Keyval1, const _Tkey& _Keyval2) const
               {
                    return (comp(_Keyval1, _Keyval2));
               }
          };

     Then use it like this:

          hash_map<CTest, int, MyHashCompare<CTest> > MyHash;

          CTest test;
          test.m_value = 123;
          test.m_message = "This is a test";

          MyHash[test] = 2005;

          int val = MyHash[test];

     The correct number is returned.

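     III. Thoughts on hash_map:

     1. Performance: with inlined code and templates, hash_map should be very efficient, but a few points deserve attention:

     * From reading the code, string keys are slower than simple-type keys, and the performance of custom-type keys depends heavily on what we choose to hash. Keep it simple: that is the basic rule for using hash_map.
     * The bucket-count definitions can be changed by rewriting the hash_compare functor; with suitable values this can also yield better performance. If the bucket count is greater than 10, remember that it should be a prime number.
     * With custom types, the overloaded assignment operator (or the copy constructor) may become a performance bottleneck. Using object pointers as keys is a good idea, but then the less functor must be rewritten, for the same reason as with string-pointer keys.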

A test program:

#include "StdAfx.h"
#include <hash_map>
#include <string>
#include <iostream>
using namespace std;
using namespace stdext;

struct IHashable 
{ 
	virtual unsigned long hash_value() const = 0; 
	virtual bool operator < (const IHashable& val) const = 0; 
	virtual IHashable& operator = (const IHashable& val) = 0; 
};

//define the class
class ClassA: public IHashable{
public:
	ClassA(int a):c_a(a){}
	ClassA(const ClassA &A){c_a=A.c_a;}
	ClassA& operator=(const ClassA& A){
		c_a=A.c_a;
		return *this;
	}
	int getvalue()const { return c_a;}
	void setvalue(int a){c_a=a;}

	virtual IHashable& operator = (const IHashable& val) 
	{ 
		c_a   = ((ClassA&)val).c_a; 
		return(*this); 
	}

	virtual unsigned long hash_value() const
	{
		// The hash is computed from the c_a field here; a more elaborate function could combine all fields
		return(c_a ^ 0xdeadbeef);
	}

	virtual bool operator < (const IHashable& val) const 
	{ 
		return(c_a < ((ClassA&)val).c_a); 
	} 
private:
	int c_a;
};

//1 define the hash function
struct hash_A{
	size_t operator()(const class ClassA & A)const{
		// return hash<int>(classA.getvalue());
		return A.getvalue();
	}
};


template<class _Tkey> 
class MyHashCompare : public hash_compare<_Tkey> 
{ 
public: 
	size_t operator()(const _Tkey& _Key) const 
	{ 
		return(_Key.hash_value()); 
	}

	bool operator()(const _Tkey& _Keyval1, const _Tkey& _Keyval2) const 
	{ 
		return (comp(_Keyval1, _Keyval2)); 
	} 
};

//2 define the equal function
struct equal_A{
	bool operator()(const class ClassA & a1, const class ClassA & a2)const{
		return a1.getvalue() == a2.getvalue();
	}
	//bool operator()(int a1, int a2)const{
	//	return a1 == a2;
	//}
};


int main()
{

	//hash_map<ClassA, string, hash_A, equal_A> hmap;
	//hash_map<int, string, hash_compare<int,equal_A>> hmap;
	//typedef pair<int, string> MyPair;
	hash_map<ClassA, string, MyHashCompare<ClassA> > hmap;   // "> >": pre-C++11 compilers parse ">>" as a shift
	typedef pair<ClassA, string> MyPair;

	ClassA a1(12);
	//hmap[a1]="I am 12";
	//hmap.insert(MyPair(1,"I am 12"));
	hmap.insert(MyPair(a1,"I am 12"));

	//ClassA a2(198877);
	//hmap[a2]="I am 198877";	

	cout<<hmap[a1]<<endl;
	//cout<<hmap[a2]<<endl;

	system("pause");
	return 0;
}
