The efficiency of memcpy() depends on the platform
阿新 • Published: 2019-02-14
First, let's look at the memcpy() source code that ships with Microsoft's development tools (E:\Microsoft Visual Studio 9.0\VC\crt\src):
/***
*memcpy.c - contains memcpy routine
*
*       Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
*       memcpy() copies a source memory buffer to a destination buffer.
*       Overlapping buffers are not treated specially, so propogation may occur.
*
*******************************************************************************/

#include <cruntime.h>
#include <string.h>

#ifdef _MSC_VER
#pragma function(memcpy)
#endif  /* _MSC_VER */

/***
*memcpy - Copy source buffer to destination buffer
*
*Purpose:
*       memcpy() copies a source memory buffer to a destination memory buffer.
*       This routine does NOT recognize overlapping buffers, and thus can lead
*       to propogation.
*
*       For cases where propogation must be avoided, memmove() must be used.
*
*Entry:
*       void *dst = pointer to destination buffer
*       const void *src = pointer to source buffer
*       size_t count = number of bytes to copy
*
*Exit:
*       Returns a pointer to the destination buffer
*
*Exceptions:
*******************************************************************************/

void * __cdecl memcpy (
        void * dst,
        const void * src,
        size_t count
        )
{
        void * ret = dst;

#if defined (_M_IA64)

        {
        __declspec(dllimport)
        void RtlCopyMemory( void *, const void *, size_t count );

        RtlCopyMemory( dst, src, count );
        }

#else  /* defined (_M_IA64) */

        /*
         * copy from lower addresses to higher addresses
         */
        while (count--) {
                *(char *)dst = *(char *)src;
                dst = (char *)dst + 1;
                src = (char *)src + 1;
        }

#endif  /* defined (_M_IA64) */

        return(ret);
}
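The header comment above warns that overlapping buffers lead to "propagation". A minimal sketch of what that means, using a hand-written forward byte loop with the same shape as the CRT fallback (calling the real memcpy() on overlapping buffers is undefined behavior, so it is not used here). The names byte_copy_forward and overlap_demo are hypothetical and not part of the CRT:

```c
#include <stddef.h>
#include <string.h>

/* Forward byte-by-byte copy, same loop shape as the CRT fallback above.
 * Hypothetical helper, not part of the CRT. */
static void *byte_copy_forward(void *dst, const void *src, size_t count)
{
    char *d = (char *)dst;
    const char *s = (const char *)src;

    while (count--) {
        *d++ = *s++;
    }
    return dst;
}

/* Shift "abcdef" right by one byte inside the same buffer, two ways.
 * Each output buffer must hold at least 7 bytes. */
void overlap_demo(char *forward_result, char *memmove_result)
{
    strcpy(forward_result, "abcdef");
    strcpy(memmove_result, "abcdef");

    /* Overlapping forward copy: every byte just written is the next
     * byte read, so the first character propagates: "aaaaaa". */
    byte_copy_forward(forward_result + 1, forward_result, 5);

    /* memmove() is specified to handle overlap: "aabcde". */
    memmove(memmove_result + 1, memmove_result, 5);
}
```

Calling overlap_demo() with two 7-byte buffers leaves "aaaaaa" in the first and "aabcde" in the second, which is why the comment recommends memmove() whenever the buffers may overlap.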
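On 16- and 32-bit systems, copying one byte at a time wastes CPU cycles. These CPUs generally work on half-word- or word-aligned data and move 16 or 32 bits per memory access, so reading or writing a single byte, especially at an odd address, uses only a fraction of each access. For a CPU with 4-byte alignment such as ARM, a version like the following, which copies one 32-bit word per iteration and assumes both pointers are 4-byte aligned, is therefore far more efficient, and the gain over the byte-at-a-time version can be more than 4x: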
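#include <stdint.h>

/* Word-at-a-time copy. Assumes dest and src are both 4-byte aligned. */
void my_memcpy(void * dest, const void * src, unsigned int n)
{
    unsigned int i = 0;
    /* uint32_t instead of long: long is 8 bytes on some 64-bit platforms */
    uint32_t * Dest = (uint32_t *)dest;
    const uint32_t * Src = (const uint32_t *)src;

    /* copy the n / 4 whole words */
    for (i = 0; i < (n >> 2); i++) {
        Dest[i] = Src[i];
    }

    /* copy the remaining n % 4 bytes, which the word loop does not reach */
    for (i = n & ~3u; i < n; i++) {
        ((unsigned char *)dest)[i] = ((const unsigned char *)src)[i];
    }
}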
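The size of the speedup claimed above depends heavily on the CPU, the compiler, and the optimization level, so it is worth measuring on the target platform. The following is a rough timing sketch, not a rigorous benchmark. The names copy_bytes, copy_words, and time_copy are made up for this illustration, and an optimizing compiler may vectorize or even drop the repeated copies, which would distort the numbers:

```c
#include <stdint.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, unsigned int n);

/* One byte per iteration, like the CRT fallback loop. */
void copy_bytes(void *dst, const void *src, unsigned int n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n--) {
        *d++ = *s++;
    }
}

/* One 32-bit word per iteration.
 * Assumption: both pointers are 4-byte aligned and n is a multiple of 4. */
void copy_words(void *dst, const void *src, unsigned int n)
{
    uint32_t *d = (uint32_t *)dst;
    const uint32_t *s = (const uint32_t *)src;
    unsigned int i;

    for (i = 0; i < (n >> 2); i++) {
        d[i] = s[i];
    }
}

/* Returns the CPU time, in seconds, spent on `reps` copies of n bytes. */
double time_copy(copy_fn fn, void *dst, const void *src,
                 unsigned int n, int reps)
{
    clock_t start = clock();
    int r;

    for (r = 0; r < reps; r++) {
        fn(dst, src, n);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare the two, call time_copy() once with copy_bytes and once with copy_words on the same pair of uint32_t arrays (which guarantees 4-byte alignment) and a large repetition count, then compare the returned times.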
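Of course, if the buffers cannot be guaranteed to be 4-byte aligned on the target platform, code can be added to the routine above to check the alignment first, so that the fast word-by-word path is still taken whenever possible.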
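This kind of tuning should be used only where performance really matters. Otherwise, stick with the standard library function, because portability and maintainability are important too.

#include <stdint.h>
#include <string.h>

void my_memcpy(void * dest, const void * src, unsigned int n)
{
    unsigned int i = 0;

    /* take the word-by-word path only when both pointers are 4-byte aligned */
    if (((uintptr_t)src % 4 == 0) && ((uintptr_t)dest % 4 == 0)) {
        uint32_t * Dest = (uint32_t *)dest;
        const uint32_t * Src = (const uint32_t *)src;

        for (i = 0; i < (n >> 2); i++) {
            Dest[i] = Src[i];
        }
        /* copy the remaining n % 4 bytes */
        for (i = n & ~3u; i < n; i++) {
            ((unsigned char *)dest)[i] = ((const unsigned char *)src)[i];
        }
    } else {
        /* unaligned: fall back to the library routine */
        memcpy(dest, src, n);
    }
}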
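A further refinement, not in the original post, is to copy a few leading bytes until the destination reaches a 4-byte boundary and only then switch to word copies. This is a minimal sketch under the assumption that a word copy requires both pointers to be 4-byte aligned. The name aligned_copy is hypothetical. The pointer casts follow the same approach as the code above, so the same caveats about strict aliasing apply:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte copies for the unaligned head and tail,
 * 32-bit word copies for the aligned middle. */
void *aligned_copy(void *dest, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;

    /* The word loop is usable only if both pointers share the same
     * misalignment, so that aligning one also aligns the other. */
    if (((uintptr_t)d & 3u) == ((uintptr_t)s & 3u)) {
        /* Head: copy bytes until d sits on a 4-byte boundary. */
        while (n > 0 && ((uintptr_t)d & 3u) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: one 32-bit word per iteration. */
        while (n >= 4) {
            *(uint32_t *)d = *(const uint32_t *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }

    /* Tail, or the whole buffer when the two offsets differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

When the source and destination have different offsets within a word, this sketch simply copies byte by byte. Library implementations typically handle that case with shifts or unaligned loads, which is one more reason to prefer the standard memcpy() unless measurements show a real need.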
===============================================================================================================================