CS107-Lecture 5-Note

lsearch

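While following the design of lsearch in Lectures 4 and 5, one thing struck me: a programming problem has many possible answers, and a good one rarely lands on the page in a single stroke. It comes from repeated thinking and refinement, and from adapting to the requirements at hand (the language in use, the data types being handled). Back to the point, picking up from the last lecture: lsearch for a specific data type –> generic lsearch –> generic lsearch with a user-defined comparison function.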

Program. 1. lsearch (generic data type)

void *lsearch(void *key, void *base, int n, int elemSize, int (*cmpfn)(void *, void *))
{
    for (int i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(elemAddr, key) == 0) // memcmp has its limits, so a user-defined cmpfn is used here
            return elemAddr;
    }
    return NULL;
}

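line 1: Jerry writes the cmpfn parameter of lsearch as (*cmpfn). The parentheses matter. With them, cmpfn is a function pointer: it holds the starting address of a function that returns an int. Without them, int *cmpfn(void *, void *) describes a function that returns a pointer to int. The only spelling that is equivalent in a parameter list is the one that drops both the * and the parentheses, int cmpfn(void *, void *), because a parameter of function type is adjusted to a pointer to that function.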
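To make the role of the parentheses concrete, here is a minimal sketch. All four function names (twice, first, apply, apply_ptr) are hypothetical and chosen only for illustration; they do not come from the lecture.

```c
#include <assert.h>

/* A plain function returning int. */
int twice(int x) { return 2 * x; }

/* A plain function returning a pointer to int. */
int *first(int *arr) { return &arr[0]; }

/* With parentheses: fn is a pointer to a function that returns int. */
int apply(int (*fn)(int), int x) { return fn(x); }

/* Without parentheses: fn is (adjusted to) a pointer to a function
   that returns int *, so the result must be dereferenced to get an int. */
int apply_ptr(int *fn(int *), int *arr) { return *fn(arr); }
```

Passing twice to apply_ptr, or first to apply, would be rejected or warned about by the compiler as an incompatible pointer type, which is a quick way to see that the two declarations really describe different things.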

Program. 2. how to call lsearch

int array[] = {4, 2, 3, 7, 11, 6};
int size = 6;   //I'll just hard code it as 6.
int number = 7; //search for the number of 7
int *found = lsearch(&number, array, size, sizeof(int), IntCmp);

if (found == NULL) { /* :-( not found */ }
else               { /* :-) found */ }

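line 4: array needs no & in front of it, because an array name passed as an argument already decays to the address of its first element. Whatever the actual types of the first two arguments, inside lsearch they arrive as void * pointers, and base is cast to char * before any pointer arithmetic is done on it.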

Before calling lsearch, first implement the comparison function:

Program. 3. implement IntCmp()

int IntCmp(void *elem1, void *elem2)
{
    int *ip1 = elem1; // a void * cannot be dereferenced, so view each argument
    int *ip2 = elem2; // as the int * it really is (C converts void * implicitly)
    return *ip1-*ip2;
}
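Programs 1 to 3 can be assembled into one compilable unit, roughly as follows. This is a sketch under the assumption that the lecture's signatures are used unchanged; IndexOfInt is a hypothetical helper added here only so the result is easy to check.

```c
#include <assert.h>
#include <stddef.h>

/* Generic linear search, as in Program 1. */
void *lsearch(void *key, void *base, int n, int elemSize,
              int (*cmpfn)(void *, void *))
{
    for (int i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(elemAddr, key) == 0)
            return elemAddr;
    }
    return NULL;
}

/* Comparison function for int elements, as in Program 3. */
int IntCmp(void *elem1, void *elem2)
{
    int *ip1 = elem1;
    int *ip2 = elem2;
    return *ip1 - *ip2;
}

/* Hypothetical helper: index of value in array, or -1 if it is absent. */
int IndexOfInt(int *array, int n, int value)
{
    int *found = lsearch(&value, array, n, sizeof(int), IntCmp);
    return found == NULL ? -1 : (int)(found - array);
}
```

With the array {4, 2, 3, 7, 11, 6} from Program 2, searching for 7 gives index 3, and searching for a value that is not present gives -1.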

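Having covered this integer comparison function, Jerry compared C's function-pointer approach to generics with templates in other languages: for C and its decades-old specification, being this lightweight and fast is already pretty cool; templates in newer languages are more type safe and give the compiler more information at compile time, but they come with code bloat.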

Jerry: You have to recognize that this is not exactly the most elegant way, it just the best that C, with its specification that was more or less defined 35 years ago, can actually do. All the other languages you’ve ever heard of, they are all so much younger that they’ve learned from C’s mistakes, and they have better solution for supporting generics. There’re some plus to this. It’s very fast. You only use one copy of the code, ever, to do all of your linear searching. The template approach, it’s more type safe. You get more information at compiler time, but you get code bloat because you’ve got one instance of that lsearch algorithm for every single data type you ever searched for.

With the comparison function for int done, the next step is one for strings. Jerry gave a warning first: “This gets a lot more complicated when you start dealing with the problem of lsearching an array of C-strings. So, you’re going to have an array of char *’s, and you’re gonna have to search for a particular char * to see whether you have a match or not.”

Program. 4. implement StrCmp()

char *notes[] = {"Ab", "F#", "B", "Gb", "D"};
char *favoriteNote = "Eb";
char **found = lsearch(&favoriteNote, notes, 5, sizeof(char *), StrCmp);

int StrCmp(void *vp1, void *vp2)
{
    char *s1 = *(char **)vp1;
    char *s2 = *(char **)vp2;
    return strcmp(s1, s2); // needs <string.h>
}

line 1: how the string array notes is stored needs some explanation:

Jerry: They’re not in the heap, they’re actually global variables that happen to be constant. It’s like normal global variables, except they happen to be character arrays that reside up there, and these are replaced at load time with the base address of the A, F and the D.

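Anyone who has used Java should find this kind of storage familiar: string literals there are likewise kept in a shared constant pool. Java also has its own ways of avoiding needless String objects, which would otherwise hurt performance or waste memory. One is to use StringBuilder in place of String, because String objects are immutable and every modification produces a new object; another is to lean on the compiler in certain cases, for example when concatenating constant strings as in String test = "1"+"0"+"1";, where the compiler does not create 3 String objects along the way.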

line 3: the type of found, “char **”, needs some explanation:

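This follows the same pattern as int *found in the int example: when the elements being searched have type X, lsearch returns the address of a matching element, so found is declared as X *. Here X is char *, which makes found a char **.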

line 6, 7: vp1 cannot be dereferenced directly:

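The compiler cannot dereference a void *, because it does not know what type lives at that address. It does know that dereferencing a void ** yields a void *, and likewise that a char ** yields a char *, which is why vp1 is first cast to char ** and only then dereferenced. If the call to lsearch passed favoriteNote instead of &favoriteNote, the key could be used directly as char *s2 = (char *)vp2 (lsearch hands the key to cmpfn as its second argument), but that would make the comparison function asymmetric.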
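The two-hop dereference can be tried out with a sketch like the one below. It repeats lsearch from Program 1 so that it compiles on its own; IndexOfNote is a hypothetical helper, not something from the lecture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic linear search, repeated from Program 1. */
void *lsearch(void *key, void *base, int n, int elemSize,
              int (*cmpfn)(void *, void *))
{
    for (int i = 0; i < n; i++) {
        void *elemAddr = (char *)base + i * elemSize;
        if (cmpfn(elemAddr, key) == 0)
            return elemAddr;
    }
    return NULL;
}

/* Each argument is the address of a char *, so it is really a char **. */
int StrCmp(void *vp1, void *vp2)
{
    char *s1 = *(char **)vp1;   /* hop 1: void * -> char **, hop 2: -> char * */
    char *s2 = *(char **)vp2;
    return strcmp(s1, s2);
}

/* Hypothetical helper: index of note in notes, or -1 if it is absent. */
int IndexOfNote(char **notes, int n, char *note)
{
    char **found = lsearch(&note, notes, n, sizeof(char *), StrCmp);
    return found == NULL ? -1 : (int)(found - notes);
}
```

Because both sides are passed by address (&note for the key, and the element address computed inside lsearch), StrCmp treats its two arguments identically, which is the symmetry the paragraph above is after.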

Figure. 1. The two pointer hops and the dereference in StrCmp

That wraps up lsearch. Jerry then mentioned the assignments: “Now, for Assignment 2, search certainly comes up. As opposed to all of these examples, you know that there are some sorted flavor to the arrays that you’re searching there. If you haven’t read Assignment 2, again, I’ll try to be as generic as possible in my description. But you basically have the opportunity to binary search as opposed to linear search for Assignment 2.” At this point in the course, students should have finished Assignment 1 and read ahead on Assignment 2, and should try bsearch on problems they can already solve with lsearch.

bsearch

Jerry: There’s a built-in function called bsearch. It turns out that there’s a built-in function called lsearch as well. It’s not technically standard, but almost all compilers provide it, at least on UNIX systems. I’m gonna want you to use the generic bsearch algorithm which has more or less the same prototype as lsearch right here.

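This is the prototype of the built-in bsearch as written in lecture (the standard declaration in <stdlib.h> uses const void * for the pointers and size_t for the counts):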

void *bsearch(void *key, void *base, int n, int elemSize, int (*cmp)(void *, void*));

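Here Jerry stressed what int (*cmp)(void *, void *) has to be: cmp is a plain function, not a method. Even inside the big classes of Java and C++, it must be a global function unrelated to any class or a static function, because a method implicitly carries a this pointer that receives the address of the object it is called on. Jerry explained the difference between functions and methods (people working in Java and C++ usually say "method" rather than "function"):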

Jerry:“The difference between a function and a method, they look very similar, except that methods actually have the address of the relevant object lying around as this invisible parameter via this parameter called this.”
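As a sketch of what calling the library routine looks like, the block below uses the standard bsearch from <stdlib.h>, whose comparator takes const void * arguments. CompareInts and SortedIndexOf are hypothetical names introduced for this example, and the array must already be sorted in ascending order for the search to be valid.

```c
#include <assert.h>
#include <stdlib.h>

/* A plain function (no hidden this parameter) in the standard comparator form. */
int CompareInts(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow that x - y can hit */
}

/* Hypothetical helper: index of value in a sorted array, or -1 if absent. */
int SortedIndexOf(const int *sorted, size_t n, int value)
{
    const int *found = bsearch(&value, sorted, n, sizeof(int), CompareInts);
    return found == NULL ? -1 : (int)(found - sorted);
}
```

The call has the same shape as the lsearch calls earlier (key address, base, count, element size, comparator), so switching from linear to binary search is mostly a matter of keeping the array sorted.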

Implementing a stack in C

Mimicking templates or generics in C++ and Java, we use C's struct to get as close as possible to a “class” definition, restricting the stack to int data for now.

Program. 5. implement a stack data structure

stack.h

typedef struct {
    int *elems;
    int logicalLen; // number of elements currently in use
    int allocLen;   // number of elements allocated on the heap
}stack;

void StackNew(stack *s);
void StackDispose(stack *s); 
void StackPush(stack *s, int value);
void StackPop(stack *s);

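First, C has no class keyword; struct is the closest analogue.
Second, C has no public or private. (It does have const: Jerry notes that their compiler supports it, and it has been part of standard C since C89.)
Finally, technically, all three fields are exposed, effectively public; they should be manipulated through “functions”, not “methods”.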

Program. 6. how to call stack

stack s;      // reserves sizeof(stack) bytes (12 with 4-byte pointers); the contents are not cleared
StackNew(&s); // initialization: room for 4 ints is allocated, 0 are in use
for(int i=0; i<5; i++)
{
    StackPush(&s, i);
}
StackDispose(&s);

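Because room for 4 ints is reserved up front, the first 4 pushes are fast: each one just writes into space that is already there. Pushing a 5th element triggers the doubling strategy: find a new block with room for 4*2 ints, copy the first 4 elements over, dispose of the old block, and then push the 5th.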
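The notes do not show StackPush or StackDispose at this point, so the following is only a sketch of how the doubling strategy could look, assuming realloc is used to find the larger block and copy the old elements. The struct and StackNew are repeated so the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *elems;
    int logicalLen; /* number of elements currently in use */
    int allocLen;   /* number of elements allocated on the heap */
} stack;

void StackNew(stack *s)
{
    s->logicalLen = 0;
    s->allocLen = 4;
    s->elems = malloc(4 * sizeof(int));
    assert(s->elems != NULL);
}

void StackDispose(stack *s)
{
    free(s->elems);
}

/* Assumed implementation: double the capacity whenever the stack is full. */
void StackPush(stack *s, int value)
{
    if (s->logicalLen == s->allocLen) {
        s->allocLen *= 2;
        /* realloc copies the old elements and releases the old block if it moves. */
        s->elems = realloc(s->elems, s->allocLen * sizeof(int));
        assert(s->elems != NULL);
    }
    s->elems[s->logicalLen] = value;
    s->logicalLen++;
}
```

After the five pushes of Program 6, logicalLen is 5 and allocLen has grown from 4 to 8; pushes 6 through 8 then go back to being plain writes until the next doubling.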

Program. 7. implement StackNew()

void StackNew(stack *s) // s is a local variable holding an address; it points to the caller's stack struct (12 bytes with 4-byte pointers)
{
    s->logicalLen = 0;
    s->allocLen = 4;
    s->elems = malloc(4*sizeof(int));
    assert(s->elems != NULL);
}

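line 5: malloc is the predecessor of new in C++ and Java. Operator new implicitly takes the data type into account, as in new int[4] or new double[20]; malloc is only told a number of bytes, finds a block of that size in the heap, and returns its address.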

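line 6: malloc is usually paired with assert. assert is actually not a function, it is a macro. If the tested condition is true, nothing happens; if it is false, assert terminates the program and reports the file name and the line number of the assert that broke. Without the assert, if the allocation fails (unlikely as that is), malloc returns NULL, and somewhere later in the program that NULL gets dereferenced and the program crashes.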
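The contrast with new int[4] and new double[20] can be sketched as below: malloc only ever sees a byte count, so the caller multiplies by sizeof, and the assert sits directly after the allocation. NewIntArray and NewDoubleArray are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n ints, each set to value. */
int *NewIntArray(int n, int value)
{
    int *arr = malloc(n * sizeof(int));       /* malloc is told bytes, not a type */
    assert(arr != NULL);                      /* stop here rather than crash later */
    for (int i = 0; i < n; i++)
        arr[i] = value;
    return arr;
}

/* Hypothetical helper: allocate n doubles, each set to value. */
double *NewDoubleArray(int n, double value)
{
    double *arr = malloc(n * sizeof(double)); /* same call, different byte count */
    assert(arr != NULL);
    for (int i = 0; i < n; i++)
        arr[i] = value;
    return arr;
}
```

NewIntArray(4, 0) and NewDoubleArray(20, 0.0) correspond to the two new expressions mentioned above; the caller is responsible for passing each result to free. Note that assert is compiled out when NDEBUG is defined, so it is a debugging aid rather than production error handling.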

Small tip: when chasing a seg fault or bus error, check the asserts around the line where it happened.