An Introduction to SSE2 with a Simple Usage Example
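SSE2, short for Streaming SIMD Extensions 2, is a SIMD instruction set for the IA-32 architecture. Intel introduced it together with the first-generation Pentium 4 processor, which was released in November 2000. It extends the earlier SSE instruction set and is intended to fully replace MMX. Where SSE had 70 instructions, SSE2 added 144 new ones. In 2003, AMD adopted SSE2 when it released its 64-bit AMD64 processors, and in 2004 Intel extended SSE2 again with the SSE3 instruction set.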
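SSE2 extends the MMX instructions so that they operate on the XMM registers. This lets developers completely avoid the eight 64-bit MMX registers, which are aliased onto the original IA-32 floating-point registers. As a result, integer SIMD and scalar floating-point operations can be mixed without the mode switch that is required between MMX and x87 floating-point code. However, the wider SSE registers do not by themselves make the results any more precise. Part of the SSE2 instruction set also comprises a series of ...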
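To make the point about mixing integer SIMD with scalar floating-point code concrete, here is a minimal sketch. The function name mean_of_16_bytes is hypothetical and not part of the original post; it assumes an x86 compiler that provides <emmintrin.h>. Because only XMM registers are used, no _mm_empty() (EMMS) call is needed before the floating-point division.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: returns the arithmetic mean of 16 unsigned bytes.
float mean_of_16_bytes(const std::uint8_t* p)
{
    // Integer SIMD part: load 16 bytes into an XMM register.
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));

    // _mm_sad_epu8 against zero sums the bytes of each 8-byte half,
    // leaving two partial sums in the low and high 64-bit lanes.
    __m128i sums = _mm_sad_epu8(v, _mm_setzero_si128());
    int low  = _mm_cvtsi128_si32(sums);
    int high = _mm_cvtsi128_si32(_mm_srli_si128(sums, 8));

    // Scalar floating-point part: no MMX register was touched,
    // so no EMMS instruction is required before this division.
    return static_cast<float>(low + high) / 16.0f;
}
```

With the bytes 0 through 15 as input, the sum is 120 and the function returns 7.5.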
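Differences between MMX and SSE2: SSE2 lets the MMX instructions operate on the XMM registers. In other words, existing MMX code can be converted entirely to SSE2. However, because an XMM register is twice as wide as an MMX register, loop counters and memory accesses have to be adjusted accordingly. And even though one SSE2 instruction can operate on twice as much data as an MMX instruction, performance may not improve noticeably. There are two main reasons for this: SSE2 data accessed in memory that is not aligned on a 16-byte boundary ...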
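The loop-counter and memory-access changes mentioned above can be sketched as follows. Both function names are hypothetical and not from the original post. The first version steps through the data 16 bytes at a time (an MMX loop would step by 8), uses unaligned loads, and finishes the remainder with a scalar loop. The second version assumes that all three pointers are 16-byte aligned and that n is a multiple of 16, which allows the aligned load and store intrinsics.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: out[i] = a[i] + b[i] (wrapping) for any n and any alignment.
void add_bytes(const std::uint8_t* a, const std::uint8_t* b,
               std::uint8_t* out, std::size_t n)
{
    std::size_t i = 0;
    // One XMM register holds 16 bytes, so the counter advances by 16.
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_add_epi8(va, vb));
    }
    // Scalar tail for the last n % 16 bytes.
    for (; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    }
}

// Hypothetical helper: same result, but the caller must guarantee 16-byte
// aligned pointers and n being a multiple of 16.
void add_bytes_aligned(const std::uint8_t* a, const std::uint8_t* b,
                       std::uint8_t* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(out + i), _mm_add_epi8(va, vb));
    }
}
```

In C++11 and later, a buffer declared as alignas(16) std::uint8_t buf[32]; satisfies the alignment requirement of the second version. Passing a misaligned pointer to _mm_load_si128 or _mm_store_si128 typically faults at run time, so the unaligned version is the safer default when the alignment is not known.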
Compilers that support SSE2: (1) Microsoft Visual C++ and MASM; (2) the Intel C++ Compiler; (3) GCC 3 and later; (4) Sun Studio Compiler Suite.
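What processors without SSE2 support have in common: SSE2 is an extension of the IA-32 architecture, so no architecture that does not support IA-32 supports SSE2 either. Because ...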
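Since not every processor supports SSE2, a program can check for it at run time before taking an SSE2 code path. The following is a minimal sketch under stated assumptions: the function name cpu_has_sse2 is hypothetical, the target is an x86 or x86-64 processor, and the compiler is either MSVC (which provides __cpuid in <intrin.h>) or GCC/Clang (which provide __get_cpuid in <cpuid.h>). CPUID leaf 1 reports SSE2 in bit 26 of EDX.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid (MSVC)
#else
#include <cpuid.h>    // __get_cpuid (GCC and Clang)
#endif

// Hypothetical helper: true if CPUID leaf 1 reports SSE2 (EDX bit 26).
bool cpu_has_sse2()
{
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};      // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;
#else
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;                // CPUID leaf 1 is not available
    }
    return (edx & (1u << 26)) != 0;
#endif
}
```

On x86-64 this check should always succeed, because SSE2 is part of the baseline AMD64 instruction set; it is mainly useful for 32-bit builds that may run on older processors.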
SSE2 is a standard processor instruction set that is used by a growing number of third-party applications and drivers.
SSE2 was first introduced on the Intel Pentium 4, and its instructions are sometimes also known as the "Willamette" instructions. They are very similar in structure to the SSE instructions, but allow considerably more flexibility in crunching numbers. The biggest differences between SSE and SSE2 are the ability to work with double-precision (64-bit) floating-point values as well as 32-bit ones, and the ability to work on 128-bit integer types in the XMM registers. In total, 144 new instructions were added.
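The two additions described above, double-precision floating point and integer operations on XMM registers, can be illustrated with a short sketch. The function names are hypothetical; the intrinsics themselves (_mm_add_pd, _mm_add_epi32 and the matching unaligned loads and stores) are declared in <emmintrin.h>.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: adds two pairs of doubles with one packed addition.
void add_two_doubles(const double* a, const double* b, double* out)
{
    __m128d va = _mm_loadu_pd(a);          // two 64-bit doubles
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}

// Hypothetical helper: adds four 32-bit integers held in one XMM register.
void add_four_ints(const int* a, const int* b, int* out)
{
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
}
```

For example, adding {1.5, -2.0} and {0.25, 4.0} yields {1.75, 2.0}, and adding {1, 2, 3, 4} and {10, 20, 30, 40} yields {11, 22, 33, 44}.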
Simple usage example:
#include "stdafx.h"
//#include <mmintrin.h>  // MMX header file
//#include <xmmintrin.h> // SSE header file (includes the MMX header)
#include <emmintrin.h>   // SSE2 header file (includes the SSE header)

// Each function below processes exactly 16 bytes (one XMM register);
// the num parameter is kept from the original signatures but is not used.

// Element-wise addition of 16 bytes; results wrap around on overflow.
void Integer_Add(const unsigned char* p1, const unsigned char* p2, unsigned char* p3, int num)
{
    __m128i m1 = _mm_loadu_si128((const __m128i*)p1);
    __m128i m2 = _mm_loadu_si128((const __m128i*)p2);
    __m128i m3 = _mm_add_epi8(m1, m2);
    _mm_storeu_si128((__m128i*)p3, m3);
}

// Element-wise saturating subtraction of 16 bytes. The data is unsigned char,
// so the unsigned form _mm_subs_epu8 is used (_mm_subs_epi8 is the signed form).
void Integer_Sub(const unsigned char* p1, const unsigned char* p2, unsigned char* p3, int num)
{
    __m128i m1 = _mm_loadu_si128((const __m128i*)p1);
    __m128i m2 = _mm_loadu_si128((const __m128i*)p2);
    __m128i m3 = _mm_subs_epu8(m1, m2);
    _mm_storeu_si128((__m128i*)p3, m3);
}

// Element-wise rounded average of 16 unsigned bytes: (a + b + 1) >> 1.
void Integer_Avg(const unsigned char* p1, const unsigned char* p2, unsigned char* p3, int num)
{
    __m128i m1 = _mm_loadu_si128((const __m128i*)p1);
    __m128i m2 = _mm_loadu_si128((const __m128i*)p2);
    __m128i m3 = _mm_avg_epu8(m1, m2);
    _mm_storeu_si128((__m128i*)p3, m3);
}

int _tmain(int argc, _TCHAR* argv[])
{
    const int num = 16;
    unsigned char array1[num] = {0x00, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70,
                                 0x80, 0x90, 0xa0, 0xb0, 0xc0, 0xd0, 0xe0, 0xf0};
    unsigned char array2[num] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                                 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
    unsigned char array3[num] = {0};

    // Each call overwrites array3 with its own result.
    Integer_Add(array1, array2, array3, num);
    Integer_Sub(array1, array2, array3, num);
    Integer_Avg(array1, array2, array3, num);

    return 0;
}
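The example above adds bytes with _mm_add_epi8, which wraps around when a sum exceeds 255. SSE2 also provides saturating forms that clamp instead. The sketch below contrasts the two for unsigned bytes; the function names add_wrap and add_saturate are hypothetical and only serve the illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: 16-byte addition that wraps modulo 256.
void add_wrap(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out)
{
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi8(va, vb));
}

// Hypothetical helper: 16-byte unsigned addition that clamps at 255.
void add_saturate(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out)
{
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_adds_epu8(va, vb));
}
```

With every input byte set to 0xF0 and 0x20 respectively, add_wrap produces 0x10 in each position, while add_saturate produces 0xFF. Which behaviour is appropriate depends on the data; for pixel values, for instance, saturation is usually the better fit.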
References: