C++ Series: unordered_map

阿新 • Published: 2017-09-27


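1. Conclusion
Newer versions of hash_map have all been superseded by unordered_map, so this article only compares unordered_map and map.
Run-time efficiency: unordered_map is the fastest on average (constant-time lookup, insertion, and deletion on average), while map is slower (logarithmic time) but offers predictable performance and keeps its elements sorted by key.
Memory usage: map uses slightly less memory; unordered_map uses slightly more, and its usage grows linearly with the number of elements.
Use unordered_map when you need an unordered container with fast lookup and deletion and can accept the slightly higher memory cost; use map when you need an ordered container with steady lookup and deletion times, or when memory matters a great deal.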
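The ordering trade-off can be seen directly by iterating both containers. The following is a minimal sketch; the helper names `keys_in_iteration_order` and `make_sample` and the sample data are made up for this illustration.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collect the keys of any map-like container in the order iteration visits them.
template <typename Map>
std::vector<std::string> keys_in_iteration_order(const Map& m) {
    std::vector<std::string> keys;
    for (const auto& entry : m) keys.push_back(entry.first);
    return keys;
}

// Hypothetical sample data, inserted in a deliberately unsorted order.
template <typename Map>
Map make_sample() {
    Map m;
    m["pear"] = 3;
    m["apple"] = 1;
    m["mango"] = 2;
    return m;
}
```

For a `std::map` the returned keys are always sorted (apple, mango, pear). For a `std::unordered_map` the same three keys come back in an unspecified order that may differ between standard library implementations.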
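2. How it works

Internally, map is a balanced binary search tree (a red-black tree). hash_map is a hash table, typically implemented as one large vector whose elements each anchor a linked list that resolves collisions.

hash_map inserts an element as follows:
Take the key.
Compute its hash value with the hash function.
Derive the bucket index (usually the hash value modulo the number of buckets).
Store the key and value in that bucket.

It retrieves a value as follows:
Take the key.
Compute its hash value with the hash function.
Derive the bucket index (usually the hash value modulo the number of buckets).
Compare each element in the bucket with the key; if none is equal, the key is not present.
Otherwise, return the value of the matching record.

In short, the hash function generates the direct address (the bucket), and the comparison function resolves collisions inside that bucket.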
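The insert and lookup steps above can be sketched as a toy chained hash table. This is only an illustration under simplifying assumptions (a fixed, non-zero bucket count, no rehashing, string keys, int values); the class name `ToyHashMap` and its members are hypothetical and do not mirror any real library implementation.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy chained hash table: a vector of buckets, each bucket a linked list.
class ToyHashMap {
public:
    // bucket_count must be greater than zero; the table never grows.
    explicit ToyHashMap(std::size_t bucket_count = 8) : buckets_(bucket_count) {}

    // Insert: key -> hash value -> bucket index -> store (key, value) in the bucket.
    void put(const std::string& key, int value) {
        auto& chain = buckets_[bucket_of(key)];
        for (auto& entry : chain) {
            if (entry.first == key) {  // key already present: overwrite its value
                entry.second = value;
                return;
            }
        }
        chain.emplace_back(key, value);
    }

    // Lookup: key -> hash value -> bucket index -> compare keys inside the bucket.
    // Returns a pointer to the value, or nullptr when no key in the bucket matches.
    const int* get(const std::string& key) const {
        const auto& chain = buckets_[bucket_of(key)];
        for (const auto& entry : chain) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    // The hash function picks the bucket; equality comparison resolves collisions.
    std::size_t bucket_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```

With only a handful of buckets, many keys share a chain, which makes the role of the key comparison in `get` easy to observe in a debugger.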
3. Memory usage test

Test code:
Test environment: Windows, VS2015 C++, with string keys and int values.
1. unordered_map:
#include <unordered_map>
#include <string>
#include <iostream>
#include <windows.h>
#include <psapi.h>
#pragma comment(lib,"psapi.lib")
using namespace std;

// Print the current/peak working set and pagefile usage of this process, in KB.
void showMemoryInfo(void)
{
    HANDLE handle = GetCurrentProcess();
    PROCESS_MEMORY_COUNTERS pmc;
    GetProcessMemoryInfo(handle, &pmc, sizeof(pmc));
    cout << "Memory Use:" << pmc.WorkingSetSize/1024.0f << "KB/" << pmc.PeakWorkingSetSize/1024.0f << "KB, Virtual Memory Use:" << pmc.PagefileUsage/1024.0f << "KB/" << pmc.PeakPagefileUsage/1024.0f << "KB" << endl;
}

//define the class
/*-------------------------------------------*/
/* Function object
 * usable as a custom hash function for a hash map.
 * std::unordered_map already provides std::hash<std::string>,
 * so the test below does not need it.
 */
class str_hash {
public:
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0; i < str.size(); i++)
            h = 5 * h + str[i];
        return size_t(h);
    }
};

/*-------------------------------------------*/
/* Function object
 * usable as the key-equality function of a hash map
 * (different keys can map to the same hash value, so keys are compared on lookup).
 */
class str_compare
{
public:
    bool operator()(const string& str1, const string& str2)const
    {
        return str1 == str2;
    }
};

// Less-than ordering on strings; only the legacy stdext::hash_compare needs it.
struct CharLess
{
    bool operator()(const string& left, const string& right) const
    {
        return left.compare(right) < 0;
    }
};

int main()
{
    cout << "Test HashMap(unorder map) Memory Use Start..."<< endl;
    // The standard container hashes string keys with std::hash<string> by default.
    unordered_map<string, int> CharHash;
    for (int i = 0; i < 10000000; i++)
    {
        string key = to_string(i);
        CharHash[key] = i;
    }
    cout << "Test HashMap(unorder map) Memory Use End." << endl;
    showMemoryInfo();
    cin.get(); // keep the process alive for inspection; press Enter to exit
    return 0;
}

2. map:
#include <iostream>
#include <map>
#include <string>
#include <windows.h>
#include <psapi.h>
#pragma comment(lib,"psapi.lib")
using namespace std;

// Print the current/peak working set and pagefile usage of this process, in KB.
void showMemoryInfo(void)
{
    HANDLE handle = GetCurrentProcess();
    PROCESS_MEMORY_COUNTERS pmc;
    GetProcessMemoryInfo(handle, &pmc, sizeof(pmc));
    cout << "Memory Use:" << pmc.WorkingSetSize / 1024.0f << "KB/" << pmc.PeakWorkingSetSize / 1024.0f << "KB, Virtual Memory Use:" << pmc.PagefileUsage / 1024.0f << "KB/" << pmc.PeakPagefileUsage / 1024.0f << "KB" << endl;
}

int main()
{
    cout << "Test Map(Red-Black Tree) Memory Use Start..." << endl;
    // String keys, ordered by the default std::less<string>.
    map<string, int> CharMap;
    for (int i = 0; i < 10000000; i++)
    {
        string key = to_string(i);
        CharMap[key] = i;
    }
    cout << "Test Map(Red-Black Tree) Memory Use End." << endl;
    showMemoryInfo();
    cin.get(); // keep the process alive for inspection; press Enter to exit
    return 0;
}

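Test results:
[Screenshots: memory usage of map and unordered_map at 1,000 elements, 100,000 elements, and 10,000,000 elements]

In every case unordered_map used somewhat more memory than map, and the difference grew linearly with the number of elements.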
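The process-level numbers do not show where the extra memory goes. One contributor is the bucket array that a hash table keeps in addition to its element nodes. As a rough, portable complement to the Windows-only test, the sketch below fills a `std::unordered_map` and reports its bucket count and load factor; the names `TableStats` and `fill_and_inspect` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

struct TableStats {
    std::size_t elements;   // number of stored key/value pairs
    std::size_t buckets;    // size of the bucket array
    float load_factor;      // elements / buckets
};

// Fill a table with n string keys, as in the memory test, and inspect its layout.
TableStats fill_and_inspect(int n) {
    std::unordered_map<std::string, int> table;
    for (int i = 0; i < n; ++i) table[std::to_string(i)] = i;
    return {table.size(), table.bucket_count(), table.load_factor()};
}
```

Because the default maximum load factor is 1.0, the container keeps at least as many buckets as elements, so the bucket array itself grows roughly in proportion to the element count.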

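4. Performance characteristics

For infrequent lookups, map is the steadier choice. For frequent lookups, hash_map is more efficient, and the C++11 unordered_map is faster still, although it uses slightly more memory than hash_map. unordered_map is the hash-map implementation that came from Boost.

In fact, std::map corresponds to Java's TreeMap, and boost::unordered_map corresponds to Java's HashMap.
Python's dict is implemented as a hash table, so its lookups take constant time on average, versus logarithmic time for a C++ map. (The reference Python interpreter, CPython, is written in C and the HotSpot JVM in C++, so the underlying ideas and techniques carry over.)

If you need ordering and steady lookup times, hold fewer than about 1,000 elements, and query infrequently, consider map.
If you query very frequently (above roughly 100 elements, unordered_map is faster than map), do not need ordering, and hold more than about 1,000 elements, up to hundreds of thousands or millions, consider unordered_map. (With tens or hundreds of millions of elements, 4 GB of memory may not be enough, and the data has to be moved to disk, for example into a database.)
Compared with unordered_map, hash_map uses about 15 MB less memory at ten million elements and about 300 MB less at one hundred million; below one million elements, unordered_map uses less memory.
Insertion and deletion in unordered_map are about twice as fast as in hash_map, while lookup speed is about the same, or only about 1/50 to 1/100 faster.
Overall, use map when you need ordering or steady performance; otherwise use unordered_map. The same applies to the set types.
In short, unordered_map lookups are about five times faster than map lookups, insertion is faster too, and it saves some memory relative to hash_map. If ordering is not required, prefer a hash map (unordered_map is the hash-map implementation that came from Boost).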
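Speed ratios like these depend heavily on the compiler, the standard library, the key type, and the data size, so it is worth measuring on your own workload. The following is a minimal timing sketch using `std::chrono`; the helper names (`LookupResult`, `make_keys`, `build`, `time_lookups`) are hypothetical, and the code reports no particular numbers by itself.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

struct LookupResult {
    std::size_t hits;       // how many keys were found
    double milliseconds;    // wall-clock time spent on the lookups
};

// Generate the keys "0", "1", ..., "n-1", as in the memory test.
std::vector<std::string> make_keys(int n) {
    std::vector<std::string> keys;
    keys.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) keys.push_back(std::to_string(i));
    return keys;
}

// Build either a std::map or a std::unordered_map from the same keys.
template <typename Map>
Map build(const std::vector<std::string>& keys) {
    Map m;
    for (std::size_t i = 0; i < keys.size(); ++i) m[keys[i]] = static_cast<int>(i);
    return m;
}

// Look up every key once and time the whole loop.
template <typename Map>
LookupResult time_lookups(const Map& m, const std::vector<std::string>& keys) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (const auto& k : keys) hits += m.count(k);
    const auto stop = std::chrono::steady_clock::now();
    return {hits, std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

Calling `time_lookups` on a `std::map` and a `std::unordered_map` built from the same keys, in an optimized build and with several repetitions, gives a first impression of the difference on a given machine.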
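5. Using unordered_map

To use a custom type as an unordered_map key, you must provide a hash function (with boost::unordered_map, by overloading hash_value; with std::unordered_map, by specializing std::hash or passing a hasher type) and overload operator==.
For details, see this article (thanks to orzlzro for such a good write-up):
http://blog.csdn.net/orzlzro/article/details/7099231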
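For the standard container, the two requirements look roughly like this. The sketch assumes a made-up key type `Point` and passes a hasher type `PointHash` as the third template argument; the way the two member hashes are combined follows the well-known boost::hash_combine formula and is only one of many reasonable choices.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type used only for this illustration.
struct Point {
    int x;
    int y;
};

// Requirement 1: keys must be comparable for equality.
bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

// Requirement 2: a hash function object for the key type.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        const std::size_t hx = std::hash<int>{}(p.x);
        const std::size_t hy = std::hash<int>{}(p.y);
        // Mix the two hashes so that (1, 2) and (2, 1) differ.
        return hx ^ (hy + 0x9e3779b9 + (hx << 6) + (hx >> 2));
    }
};

using PointNames = std::unordered_map<Point, std::string, PointHash>;
```

A `PointNames` object can then be used like any other map, for example `names[Point{1, 2}] = "a";`.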
6. Pitfalls when using hash_map

/**
 *\author peakflys
 *\brief Demonstrates the problems caused by changing a hash_map key in place
 */
#include <cstring>
#include <iostream>
#include <ext/hash_map>
struct Unit
{
    char name[32];
    unsigned int score;
    Unit(const char *_name,const unsigned int _score) : score(_score)
    {
        strncpy(name,_name,32);
    }
};
int main()
{
    typedef __gnu_cxx::hash_map<char*,Unit*> uHMap;
    typedef uHMap::value_type hmType;
    typedef uHMap::iterator hmIter;
    uHMap hMap;
    Unit *unit1 = new Unit("peak",100);
    Unit *unit2 = new Unit("Joey",20);
    Unit *unit3 = new Unit("Rachel",40);
    Unit *unit4 = new Unit("Monica",90);
    hMap[unit1->name] = unit1;
    hMap[unit2->name] = unit2;
    hMap.insert(hmType(unit3->name,unit3));
    hMap.insert(hmType(unit4->name,unit4));
    for(hmIter it=hMap.begin();it!=hMap.end();++it)
    {
        std::cout<<it->first<<"\t"<<it->second->score<<std::endl;// normal operation
    }
    for(hmIter it=hMap.begin();it!=hMap.end();++it)
    {
        Unit *unit = it->second;
        //hMap.erase(it++);
        delete unit; // frees the Unit, but its entry stays in hMap; this corrupts the map's internal state and may crash
    }
    hmIter it = hMap.begin();
    strncpy(it->first,"cc",32);// forcibly overwrite the key in place
    for(hmIter it=hMap.begin();it!=hMap.end();++it)
    {
        std::cout<<it->first<<"\t"<<it->second->score<<std::endl;// infinite loop; see the explanation of operator++ below
        /* operator++ starts from _M_cur and first tries _M_cur->_M_next. When that is null, it walks
         * the vector until it finds a bucket whose _M_cur is not null. To walk the vector it needs the
         * bucket position of the current element (see the hash_map lookup process above), and the key
         * passed to _M_bkt_num_key(key) is the modified one. If the modified key maps to a bucket
         * position before the current element, the iteration loops forever.
         */
    }
    return 0;
}
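The safe alternatives are to never edit a key in place and to remove entries through the container while iterating. The sketch below shows both with `std::unordered_map` and `std::string` keys rather than the original `char*` keys; the helper names `rename_key` and `erase_below` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Change a key by erasing the old entry and inserting a new one, so the
// element is re-hashed into the correct bucket.
// Returns false when old_key is absent or new_key already exists.
bool rename_key(std::unordered_map<std::string, int>& scores,
                const std::string& old_key, const std::string& new_key) {
    auto it = scores.find(old_key);
    if (it == scores.end() || scores.count(new_key) != 0) return false;
    const int value = it->second;
    scores.erase(it);
    scores.emplace(new_key, value);
    return true;
}

// Remove entries while iterating: erase returns the iterator to the next
// element, so the loop never touches an invalidated iterator.
std::size_t erase_below(std::unordered_map<std::string, int>& scores, int threshold) {
    std::size_t removed = 0;
    for (auto it = scores.begin(); it != scores.end();) {
        if (it->second < threshold) {
            it = scores.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Storing values (or smart pointers) instead of raw `Unit*` pointers also avoids the dangling entries that the `delete` in the demonstration leaves behind.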
7. Reference example for VC

#include "stdafx.h"

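// Store: key -> hash function -> hash value modulo the bucket count gives the bucket index
// (resolve the collision if the bucket is occupied); store the key and value in that bucket.
// Fetch: key -> hash function -> hash value modulo the bucket count gives the bucket index;
// compare the keys in that bucket. If none is equal, return the end iterator;
// otherwise return an iterator to the matching element.

// 1. hash_map predefines a hash function (from key to hash value) and an equality comparison
// (for resolving collisions) for the following key types.
//struct hash<char*>
//struct hash<const char*>
//struct hash<char>
//struct hash<unsigned char>
//struct hash<signed char>
//struct hash<short>
//struct hash<unsigned short>
//struct hash<int>
//struct hash<unsigned int>
//struct hash<long>
//struct hash<unsigned long>
// For these built-in types, just declare hash_map<int, string> mymap; and use it like an ordinary map.

// 2. Custom hash and comparison functions
// When declaring your own hash function, note the following:

// Use a struct and overload operator().
// Return size_t.
// Take the key type being hashed as the parameter.
// Declare the function const.

// To define your own comparison function:
// Use a struct and overload operator().
// Return bool.
// Take two const parameters of the key type, to be compared.
// Declare the function const.

// Using the custom hash and comparison functions:
// hash_map<ClassA, string, hash_A, equal_A> hmap;

// 3. Commonly used hash_map functions

// hash_map's member functions are much like map's. For parameters and details, see the
// STL programming manual entry for Hash_map; only a few common functions are covered here.
//
// hash_map(size_type n): set this parameter if efficiency matters. n sets the number of hash buckets.
// The more buckets, the lower the chance of collisions and of reallocation. A larger n is faster but uses more memory.
//
// const_iterator find(const key_type& k) const: lookup; takes a key and returns an iterator.
//
// data_type& operator[](const key_type& k): the function I use most, because it is as convenient as indexing an array.
// Note, however, that if the container has no element with that key, operator[] inserts one.
// So use find when you only want to know whether a key is present, and use [] when you want to insert.
//
// insert: when the container does not contain the key, insert and operator[] behave much the same. As the container fills,
// each bucket holds more elements, and to stay efficient hash_map allocates more memory to create more buckets.
// Iterators obtained before an insert may therefore be invalid afterwards.
//
// erase: during insertion, hash_map may grow its memory automatically when buckets become too full.
// In the SGI STL, however, erase does not reclaim memory, so iterators to the other elements remain valid after erase.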
// VS2015 and later reject <hash_map> unless this macro is defined.
#define _SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS
#include <hash_map>
#include <cstring>
#include <string>
#include <iostream>
using namespace std;
using namespace stdext;
//define the class
/*-------------------------------------------*/
/* Function object
 * usable as the hash function of a hash_map
 * (the SGI hash_map has no default hash for std::string; unused in main below).
 */
class str_hash{
public:
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0 ; i < str.size() ; i ++)
            h = 5*h + str[i];
        return size_t(h);
    }
};

/*-------------------------------------------*/
/* Function object
 * usable as the key-equality function of a hash_map
 * (different keys can map to the same hash value, so keys are compared on lookup).
 */
class str_compare
{
public:
    bool operator()(const string& str1,const string& str2)const
    {return str1==str2;}
};

// Less-than ordering on C strings by content, for stdext::hash_compare.
struct CharLess
{
    bool operator()(const char* left, const char* right) const
    {
        return strcmp(left, right) < 0;
    }
};

int main()
{
    // Built-in key type
    hash_map<int,string> myHashMap;
    myHashMap[0] = "JesseCen";
    myHashMap[1] = "OZZ";
    hash_map<int,string>::iterator itrHash = myHashMap.find(0);
    if(itrHash != myHashMap.end())
    {
        cout<<"My Name is:"<<itrHash->second.c_str()<<endl;
    }

    // Custom key type under VC
    hash_map<const char*, int, hash_compare<const char*, CharLess> > CharHash;
    CharHash["a"] = 123;
    CharHash["b"] = 456;
    hash_map<const char*, int, hash_compare<const char*, CharLess> >::iterator itrChar = CharHash.find("b");
    if( itrChar != CharHash.end())
    {
        cout<<"The find number is:"<< itrChar->second<<endl;
    }

    return 0;
}
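The same ideas carry over to the standard container. The sketch below shows, for `std::unordered_map`, the difference between `find` (never inserts) and `operator[]` (inserts a default value when the key is missing), and uses `reserve` to size the table up front, which plays the role of the bucket-count constructor argument discussed in the comments. The helper names `contains_key` and `make_names` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Presence test that never modifies the container.
bool contains_key(const std::unordered_map<int, std::string>& m, int key) {
    return m.find(key) != m.end();
}

// Build a small table, reserving room for the expected number of elements
// so that inserting up to that many elements does not trigger a rehash.
std::unordered_map<int, std::string> make_names(std::size_t expected_elements) {
    std::unordered_map<int, std::string> names;
    names.reserve(expected_elements);
    names[0] = "JesseCen";
    names[1] = "OZZ";
    return names;
}
```

Note that evaluating `names[7]` on the result of `make_names` silently adds an empty string under key 7, whereas `contains_key(names, 7)` leaves the container unchanged.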
