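Compiler Principles Lab: Lexical Analysis
1. Lab Overview
1.1 Lab Requirements
Choose a high-level programming language (C in this report) and, using a suitable lexical-analysis approach, design and implement a lexical analyzer for it.
Suggestion: pick the language used in the "Computer Programming" course.
Hint: follow one of these two technical routes:
regular expression → NFA → DFA → minDFA → program
or regular grammar → NFA → DFA → minDFA → program.
Requirement: the analyzer must write its results to a file on disk and must handle errors.
1.2 Lab Objectives
1) Deepen the understanding and application of compiler principles and of the techniques for building a lexical analyzer, and further improve programming skills;
2) Develop the overall ability to analyze and solve problems;
3) Organize the material and write a well-formed lab report.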
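2. System Analysis
2.1 System Requirements
Following the lexical rules of C, the words to be analyzed fall into these categories:
(1) Keywords
Such as if, else, while, and int.
(2) Identifiers
An identifier must begin with a letter, which may be followed by letters or digits. Identifiers name things such as variables, constants, and procedures.
(3) Constants
Constants of various types: integers (1, 30), floating-point numbers (2.16), strings ("AHD"), and characters ('A').
(4) Operators and delimiters
Such as +, *, <=, and the comma.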
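The categories above can be turned into a small classifier. The following is only an illustrative sketch, not the report's implementation: the names Category, isIdentifier, classifyNumber, and classify are assumptions, the keyword list is a small assumed subset, and strings, characters, and symbols are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical category names for illustration; not taken from the report.
enum class Category { Keyword, Identifier, Integer, Float, Unknown };

// An identifier starts with a letter and continues with letters or digits.
bool isIdentifier(const std::string& word) {
    if (word.empty() || !std::isalpha(static_cast<unsigned char>(word[0])))
        return false;
    for (char c : word)
        if (!std::isalnum(static_cast<unsigned char>(c)))
            return false;
    return true;
}

// Digits, optionally followed by one '.' and more digits (e.g. 30 or 2.16).
Category classifyNumber(const std::string& word) {
    std::size_t i = 0;
    while (i < word.size() && std::isdigit(static_cast<unsigned char>(word[i]))) ++i;
    if (i == 0) return Category::Unknown;            // must start with a digit
    if (i == word.size()) return Category::Integer;  // digits only
    if (word[i] != '.') return Category::Unknown;
    std::size_t fractionStart = ++i;
    while (i < word.size() && std::isdigit(static_cast<unsigned char>(word[i]))) ++i;
    if (i == word.size() && i > fractionStart) return Category::Float;
    return Category::Unknown;
}

Category classify(const std::string& word) {
    assert(!word.empty());
    // Assumed keyword subset; a real C lexer would list every keyword.
    static const std::set<std::string> keywords = {"if", "else", "while", "int"};
    if (keywords.count(word)) return Category::Keyword;
    if (isIdentifier(word)) return Category::Identifier;
    return classifyNumber(word);
}
```

Checking the keyword set before the identifier rule matters, because every keyword also satisfies the identifier pattern.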
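2.2 System Functions
Read in a C source program (already preprocessed) and output one triple for each word.
2.3 Implementation Steps
Build the lexical analyzer in the following order:
(1) Design a regular expression for each class of word and draw the finite-state automaton.
(2) Convert each regular expression into an NFA M, and merge these into a single NFA M'.
(3) Convert NFA M' into the equivalent DFA M''.
(4) Minimize DFA M'' into DFA M'''.
(5) Implement the lexical analyzer from DFA M''' in C.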
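Step (3) is usually done with the textbook subset construction. The sketch below shows that technique on a tiny NFA for a*b; it is not the code used in this report, which builds its automaton by hand. The types Nfa and Dfa, the EPS marker, and all function names are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

const char EPS = '\0';  // assumed marker for an epsilon edge

// Hypothetical NFA: integer states, edges keyed by (state, character).
struct Nfa {
    int start = 0;
    std::set<int> accepting;
    std::map<std::pair<int, char>, std::set<int>> edges;
};

// DFA whose states are numbered in order of discovery; state 0 is the start.
struct Dfa {
    int stateCount = 0;
    std::set<int> accepting;
    std::map<std::pair<int, char>, int> edges;
};

// All NFA states reachable from `states` through epsilon edges alone.
std::set<int> epsilonClosure(const Nfa& nfa, std::set<int> states) {
    std::vector<int> work(states.begin(), states.end());
    while (!work.empty()) {
        int s = work.back();
        work.pop_back();
        auto it = nfa.edges.find(std::make_pair(s, EPS));
        if (it == nfa.edges.end()) continue;
        for (int t : it->second)
            if (states.insert(t).second) work.push_back(t);
    }
    return states;
}

// Subset construction: each DFA state stands for a set of NFA states.
Dfa subsetConstruction(const Nfa& nfa, const std::vector<char>& alphabet) {
    Dfa dfa;
    std::map<std::set<int>, int> ids;   // NFA state set -> DFA state id
    std::queue<std::set<int>> pending;  // sets whose edges are not built yet
    std::set<int> first = epsilonClosure(nfa, {nfa.start});
    ids[first] = dfa.stateCount++;
    pending.push(first);
    while (!pending.empty()) {
        std::set<int> current = pending.front();
        pending.pop();
        int from = ids[current];
        for (int s : current)
            if (nfa.accepting.count(s)) dfa.accepting.insert(from);
        for (char c : alphabet) {
            std::set<int> moved;
            for (int s : current) {
                auto it = nfa.edges.find(std::make_pair(s, c));
                if (it != nfa.edges.end())
                    moved.insert(it->second.begin(), it->second.end());
            }
            if (moved.empty()) continue;  // no edge: the DFA rejects here
            std::set<int> target = epsilonClosure(nfa, moved);
            auto result = ids.emplace(target, dfa.stateCount);
            if (result.second) {          // a state set not seen before
                ++dfa.stateCount;
                pending.push(target);
            }
            dfa.edges[std::make_pair(from, c)] = result.first->second;
        }
    }
    return dfa;
}

// Run the DFA over the whole input and report whether it ends in acceptance.
bool accepts(const Dfa& dfa, const std::string& input) {
    assert(dfa.stateCount > 0);
    int state = 0;
    for (char c : input) {
        auto it = dfa.edges.find(std::make_pair(state, c));
        if (it == dfa.edges.end()) return false;
        state = it->second;
    }
    return dfa.accepting.count(state) > 0;
}

// Example NFA for a*b: 0 -eps-> 1, 1 -a-> 1, 1 -b-> 2, with 2 accepting.
Nfa sampleNfa() {
    Nfa nfa;
    nfa.accepting = {2};
    nfa.edges[std::make_pair(0, EPS)] = {1};
    nfa.edges[std::make_pair(1, 'a')] = {1};
    nfa.edges[std::make_pair(1, 'b')] = {2};
    return nfa;
}
```

Calling subsetConstruction(sampleNfa(), {'a', 'b'}) yields a three-state DFA for the sets {0,1}, {1}, and {2}. Minimization in step (4) would then merge the first two, since they behave identically on every input.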
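3. System Design
3.1 Finite-State Automaton Design
Note on the state machine: because words have a fairly complex structure, an edge in this design is labeled with a function rather than a single character. If the current input satisfies that function, the machine moves from the current state to the state the edge leads to.
The final states show which token classes the automaton can separate:
INT | integer
FLOAT | floating-point number
CHAR | character
CHARS | string
IDENT | identifier (including keywords)
SYMBOL | symbol
Keywords are separated from identifiers in a helper routine.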
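To illustrate edges labeled with functions, here is a deliberately simplified sketch. It stores each state's edges in a std::vector rather than the linked list used in the report's source, it covers only integers, floats, and identifiers, and it omits the report's error edges (so "0a" is not rejected here). Edge, State, Token, buildMachine, and scan are all hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// An edge is labeled with a predicate instead of a single character.
struct Edge {
    bool (*accepts)(char);  // the edge can be taken when this returns true
    int target;             // index of the state the edge leads to
};

struct State {
    const char* tokenName = nullptr;  // token class if scanning stops here
    std::vector<Edge> edges;          // tried in order; the first match wins
};

struct Token {
    std::string kind;    // "INT", "FLOAT", "IDENT", or "ERROR"
    std::size_t length;  // number of characters consumed
};

bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool isLetter(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool isDot(char c) { return c == '.'; }

// Assumed toy machine:
// 0 = start, 1 = integer, 2 = just after '.', 3 = float, 4 = identifier.
std::vector<State> buildMachine() {
    std::vector<State> m(5);
    m[0].edges = {{isDigit, 1}, {isLetter, 4}};
    m[1].tokenName = "INT";
    m[1].edges = {{isDigit, 1}, {isDot, 2}};
    m[2].edges = {{isDigit, 3}};
    m[3].tokenName = "FLOAT";
    m[3].edges = {{isDigit, 3}};
    m[4].tokenName = "IDENT";
    m[4].edges = {{isLetter, 4}, {isDigit, 4}};
    return m;
}

// Follow edges for as long as some predicate accepts the next character.
Token scan(const std::vector<State>& machine, const std::string& text, std::size_t pos) {
    assert(!machine.empty() && pos <= text.size());
    int state = 0;
    std::size_t i = pos;
    bool moved = true;
    while (moved && i < text.size()) {
        moved = false;
        for (const Edge& e : machine[state].edges) {
            if (e.accepts(text[i])) {
                state = e.target;
                ++i;
                moved = true;
                break;
            }
        }
    }
    const char* name = machine[state].tokenName;
    return Token{name ? name : "ERROR", i - pos};
}
```

With this machine, scan(buildMachine(), "12.4;", 0) stops at the semicolon in the FLOAT state after consuming four characters, while "12.x" stops in the unnamed state after the dot and is reported as an error.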
3.2 Category Codes for Tokens
Code | Token and description | Code | Token and description
0 | INT (integer) | 30 | >=
1 | FLOAT (floating-point number) | 31 | <=
2 | CHAR (character) | 32 | >
3 | CHARS (string) | 33 | <
4 | IDENT (identifier) | 34 | ==
5 | if | 35 | =
6 | else | 36 | !=
7 | int | 37 | ++
8 | char | 38 | +=
9 | float | 39 | +
10 | double | 40 | /
11 | long | 41 | -
12 | short | 42 | \
13 | return | 43 | ;
14 | while | 44 | (
15 | break | 45 | )
16 | | 46 | {
17 | | 47 | }
18 | | 48 | [
19 | | 49 | ]
20 | | 50 | :
21 | | 51 | ->
22 | | 52 | ?
23 | | 53 | ,
24 | | 54 | .
25 | | 55 | *
26 | | 56 |
27 | | 57 |
28 | | 58 |
29 | | 59 |
Note: the source code and sample output in section 4 use slightly different values. Symbol codes there start at 20 (num_before_symbol) instead of 30, CHAR and CHARS both use code 2, and IDENT uses code 3.
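3.3 Basic Data Structures and Code Design
The code is written in C++. Two basic structures are defined for the state machine, STATE and LIST, with STATE declared a friend of LIST. A STATE represents one state of the machine, including special states such as error and start. A LIST instance always belongs to a STATE instance and represents one edge: the edge's label is a function pointer, and the edge leads to the state that is reached when the input satisfies that function.
The program also uses the map template from the STL, declared as map<string,string> type, to store the keywords and their category codes.
The program outputs a set of triples, each defined as <word, meaning, category code>. For example, the identifier abc yields <abc, IDENT, 4>. If a word is malformed, the program prints error: name instead. The end of the output reports how many words were recognized and how many errors were found.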
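The triple format and the keyword lookup can be sketched as follows. This is an illustration under assumptions, not the report's code: categoryCodes and formatTriple are hypothetical names, the map holds int codes instead of the strings used in the report, and only a subset of the table in section 3.2 is included.

```cpp
#include <cassert>
#include <map>
#include <string>

// Category codes copied from the table in section 3.2 (subset only).
const std::map<std::string, int>& categoryCodes() {
    static const std::map<std::string, int> codes = {
        {"INT", 0}, {"FLOAT", 1}, {"CHAR", 2}, {"CHARS", 3}, {"IDENT", 4},
        {"if", 5}, {"else", 6}, {"int", 7}, {"while", 14},
    };
    return codes;
}

// Build "<word, meaning, code>". An identifier that is also a key of the
// table (a keyword) is reported under its own name, not as IDENT.
std::string formatTriple(const std::string& word, const std::string& tokenClass) {
    const std::map<std::string, int>& codes = categoryCodes();
    bool isKeyword = tokenClass == "IDENT" && codes.count(word) > 0;
    const std::string& meaning = isKeyword ? word : tokenClass;
    auto entry = codes.find(meaning);
    assert(entry != codes.end());  // tokenClass must be a known class
    return "<" + word + ", " + meaning + ", " + std::to_string(entry->second) + ">";
}
```

Keeping keywords in the same table as the token classes means the automaton only needs one IDENT final state; the keyword check happens afterwards, as described in section 3.1.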
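4. System Implementation
4.1 Running the System
• Give the file to analyze and the output file name directly on the command line. If no arguments are given, the input defaults to "input.txt" and the output to "output.txt".
• The results are written to the output file.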
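The default-argument rule can be isolated in a small helper. This is a minimal sketch; resolvePaths is a hypothetical name and does not appear in the report's source, which handles the arguments inline in main.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returns {input path, output path}. Missing arguments fall back to the
// defaults named in the report: input.txt and output.txt.
std::pair<std::string, std::string> resolvePaths(int argc, const char* const argv[]) {
    assert(argc >= 1);  // argv[0] is the program name
    std::string input = (argc > 1) ? argv[1] : "input.txt";
    std::string output = (argc > 2) ? argv[2] : "output.txt";
    return {input, output};
}
```

A main function could call resolvePaths(argc, argv) and pass the two paths to freopen or to file streams.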
4.2 Results
• input.txt
Line 7 of the input (int 0a = 2;) contains the error.
int main()
{
freopen("input.txt","r",stdin);
char input[255],*s = input;
int t = 1;
float p = 12.4;
int 0a = 2;

init();

while(gets(s))
{
curn = 0;
printf("Line %d:\n",t++);
while((*s)!= 0)
{
while((*s) == ' ') s++;
curn = 0;
curtype = 0;
print(s,start->start(s));
s += curn;
}
}

return 0;
}
• output.txt
********Line 1*********:
<int, int, 7>
<main, IDENT, 3>
<(, (, 34>
<), ), 35>
********Line 2*********:
<{, {, 36>
********Line 3*********:
<freopen, IDENT, 3>
<(, (, 34>
<"input.txt",CHARS, 2>
<,, ,, 43>
<"r", CHARS, 2>
<,, ,, 43>
<stdin, IDENT, 3>
<), ), 35>
<;, ;, 33>
********Line 4*********:
<char, char, 8>
<input, IDENT, 3>
<[, [, 38>
<255, INT, 0>
<], ], 39>
<,, ,, 43>
<*, *, 45>
<s, IDENT, 3>
<=, =, 25>
<input, IDENT, 3>
<;, ;, 33>
********Line 5*********:
<int, int, 7>
<t, IDENT, 3>
<=, =, 25>
<1, INT, 0>
<;, ;, 33>
********Line 6*********:
<float, float, 9>
<p, IDENT, 3>
<=, =, 25>
<12.4, FLOAT, 1>
<;, ;, 33>
********Line 7*********:
<int, int, 7>
error: 0a
<=, =, 25>
<2, INT, 0>
<;, ;, 33>
********Line 8*********:
********Line 9*********:
<init, IDENT, 3>
<(, (, 34>
<), ), 35>
<;, ;, 33>
********Line 10*********:
********Line 11*********:
<while, while, 14>
<(, (, 34>
<gets, IDENT, 3>
<(, (, 34>
<s, IDENT, 3>
<), ), 35>
<), ), 35>
********Line 12*********:
<{, {, 36>
********Line 13*********:
<curn, IDENT, 3>
<=, =, 25>
<0, INT, 0>
<;, ;, 33>
********Line 14*********:
<printf, IDENT, 3>
<(, (, 34>
<"Line %d:\n",CHARS, 2>
<,, ,, 43>
<t, IDENT, 3>
<++, ++, 27>
<), ), 35>
<;, ;, 33>
********Line 15*********:
<while, while, 14>
<(, (, 34>
<(, (, 34>
<*, *, 45>
<s, IDENT, 3>
<), ), 35>
<!=, !=, 26>
<0, INT, 0>
<), ), 35>
********Line 16*********:
<{, {, 36>
********Line 17*********:
<while, while, 14>
<(, (, 34>
<(, (, 34>
<*, *, 45>
<s, IDENT, 3>
<), ), 35>
<==, ==, 24>
<' ', CHAR, 2>
<), ), 35>
<s, IDENT, 3>
<++, ++, 27>
<;, ;, 33>
********Line 18*********:
<curn, IDENT, 3>
<=, =, 25>
<0, INT, 0>
<;, ;, 33>
********Line 19*********:
<curtype, IDENT, 3>
<=, =, 25>
<0, INT, 0>
<;, ;, 33>
********Line 20*********:
<print, IDENT, 3>
<(, (, 34>
<s, IDENT, 3>
<,, ,, 43>
<start, IDENT, 3>
<-, -, 31>
<>, >, 23>
<start, IDENT, 3>
<(, (, 34>
<s, IDENT, 3>
<), ), 35>
<), ), 35>
<;, ;, 33>
********Line 21*********:
<s, IDENT, 3>
<+=, +=, 28>
<curn, IDENT, 3>
<;, ;, 33>
********Line 22*********:
<}, }, 37>
********Line 23*********:
<}, }, 37>
********Line 24*********:
********Line 25*********:
<return, return, 13>
<0, INT, 0>
<;, ;, 33>
********Line 26*********:
<}, }, 37>
*******************************
1 error!
116 Word Have Been Found Out!
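Source code:
#include <iostream>
#include <string>
#include <string.h>
#include <map>
#include <stdio.h>
#define num_before_symbol 20   // category code of the first entry in symbol[]
using namespace std;
// Edge predicates: each one tests the input at *a and, on success,
// advances the global token length curn.
bool isNum(char *a);
bool isWord(char *a);
bool isSymbol(char *a);
bool isNULL(char *a);
map<string,string>type;   // token class or keyword -> category code
char symbol[][10] = {">=","<=","<",">","==","=","!=","++","+=","+","/","-","\\",";","(",")","{","}",
    "[","]",":","->","?",",",".","*","\0"};
int curn = 0;      // length of the token currently being scanned
int curtype = 0;   // index into symbol[] when the token is a symbol
int nerror = 0;    // number of lexical errors found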
class STATE;
class LIST;
// One state of the automaton. Its outgoing edges are kept in a linked list.
class STATE
{
    LIST *list;             // first outgoing edge; NULL for a final state
    static STATE *error;    // the shared error state
public:
    static int count;       // number of tokens recognized so far
    int type;               // 1 for SYMBOL and the error state, 0 otherwise
    char *name;             // state name; NULL for the error state
    void enlist(bool (*fun)(char *),STATE *out);
    const STATE *next(char *in)const;
    const STATE *start(char *)const;
    STATE(const char *name);
    ~STATE();
};
// One edge: when fun accepts the input, the machine moves to output.
class LIST{
    LIST *next;
    bool (*fun)(char *);
    STATE *output;
    LIST(bool (*fun)(char *),STATE *out);
    ~LIST();
    friend class STATE;
};
STATE *STATE::error = 0;
int STATE::count = 0;
LIST::LIST(bool (*fun)(char *),STATE *out)
{
    this->next = NULL;
    this->fun = fun;
    this->output = out;
}
LIST::~LIST()   // deleting the head frees the rest of the list recursively
{
    if(this->next!=NULL)
        delete this->next;
}
// Try the outgoing edges in insertion order and follow the first that matches.
const STATE *STATE::next(char *in)const
{
    LIST *p = list;
    while(p!=NULL)
    {
        if(p->fun(in))
            return p->output;
        else
            p = p->next;
    }
    return error;
}
// Run the automaton from this state, consuming one character per transition.
const STATE *STATE::start(char *s)const
{
    const STATE *p;
    if(list == NULL)    // no outgoing edges: a final state or the error state
    {
        if(this != error)
            count++;
        else
        {
            while(isWord(s))   // skip the rest of the bad word; isWord advances curn
                s++;
        }
        return this;
    }
    p = this->next(s);
    if(p == error)
        return error;   // TODO: should the error carry a prefix?
    return p->start(s+1);
}
STATE::STATE(const char *name)
{
    this->list = NULL;
    this->name = NULL;
    if(name == 0)       // a null name marks the error state
    {
        error = this;
        this->type = 1;
        return;
    }
    if(strcmp(name,"SYMBOL"))
        this->type = 0;
    else
        this->type = 1;
    this->name = new char[strlen(name)+1];   // +1 for the terminating '\0'
    strcpy(this->name,name);
}
STATE::~STATE()
{
    if(list)
    {
        delete list;
        list = 0;
    }
    if(name)
    {
        delete[] name;
        name = 0;
    }
}
// Append a new edge, labeled with fun and leading to out, to this state.
void STATE::enlist(bool (*fun)(char *),STATE *out)
{
    LIST *p = new LIST(fun,out);
    LIST *cur = this->list;
    if(cur == NULL)
        this->list = p;
    else
    {
        while(cur->next!=NULL)
            cur = cur->next;
        cur->next = p;
    }
}
// Returns true when s is a prefix of a.
bool mystrcmp(char *a,char *s)
{
    int i = 0;
    while(s[i]!='\0')
    {
        if(a[i]!=s[i])
            return false;
        i++;
    }
    return true;
}
bool isNum(char *a)
{
    if(a[0]<='9' && a[0]>='0')
    {
        curn++;
        return true;
    }
    return false;
}
bool isDot(char *a)
{
    if(a[0] == '.')
    {
        curn++;
        return true;
    }
    return false;
}
bool isWord(char *a)
{
    if((a[0]<='Z' && a[0]>='A') || (a[0]>='a' && a[0]<='z'))
    {
        curn++;
        return true;
    }
    return false;
}
// True when the next character ends a number or identifier.
// This edge only peeks, so it must leave curn unchanged.
bool isNotNumOrWord(char *a)
{
    if(a[0]<='9' && a[0]>='0')
        return false;
    if(isWord(a))
    {
        curn--;   // undo the increment made by isWord
        return false;
    }
    return true;
}
bool isSymbol(char *a)
{
    int i = 0;
    // symbol[] is scanned in order, so a symbol listed after one of its own
    // prefixes is never matched: "->" is split into "-" and ">".
    while(strcmp(symbol[i],"\0"))
    {
        if(mystrcmp(a,symbol[i]))
        {
            curtype = i;
            curn = strlen(symbol[i]);
            return true;
        }
        i++;
    }
    return false;
}
bool isDQuotation(char *a)
{
    if(a[0] == '"')
    {
        curn++;
        return true;
    }
    return false;
}
bool isNotDQuotation(char *a)
{
    if(a[0] != '"')
    {
        curn++;
        return true;
    }
    return false;
}
bool isNotSQuotation(char *a)
{
    if(a[0]!='\'')
    {
        curn++;
        return true;
    }
    return false;
}
// Matches a single quote that is not escaped by a backslash.
// Note: this reads a[-1], so a must not point at the first byte of the buffer.
bool isSQuotation(char *a)
{
    if(*(a-1)!='\\' && (*a)== '\'')
    {
        curn++;
        return true;
    }
    return false;
}
// Intermediate states (start, s1..s6) and final states of the automaton.
STATE *start = new STATE("start");
STATE *s1 = new STATE("s1");
STATE *s2 = new STATE("s2");
STATE *s3 = new STATE("s3");
STATE *s4 = new STATE("s4");
STATE *s5 = new STATE("s5");
STATE *s6 = new STATE("s6");
STATE *INT = new STATE("INT");
STATE *FLOAT = new STATE("FLOAT");
STATE *IDENT = new STATE("IDENT");
STATE *SYMBOL = new STATE("SYMBOL");
STATE *CHAR = new STATE("CHAR");
STATE *CHARS = new STATE("CHARS");
STATE error(0);
// Wire up the edges of the automaton and fill the category-code table.
void init()
{
    start->enlist(isNum,s1);
    start->enlist(isWord,s3);
    start->enlist(isSymbol,SYMBOL);
    start->enlist(isDQuotation,s4);
    start->enlist(isSQuotation,s5);
    s1->enlist(isNum,s1);
    s1->enlist(isDot,s2);
    s1->enlist(isNotNumOrWord,INT);
    s1->enlist(isWord,&error);   // TODO: skip the rest of the malformed token and flag the error type
    s2->enlist(isNum,s2);
    s2->enlist(isNotNumOrWord,FLOAT);
    s2->enlist(isWord,&error);
    s3->enlist(isWord,s3);
    s3->enlist(isNum,s3);
    s3->enlist(isNotNumOrWord,IDENT);
    s4->enlist(isNotDQuotation,s4);   // an input of """ gives a wrong result
    s4->enlist(isDQuotation,CHARS);
    s5->enlist(isNotSQuotation,s6);
    s5->enlist(isSQuotation,&error);
    s6->enlist(isSQuotation,CHAR);
    type["INT"] = "0";
    type["FLOAT"] = "1";
    type["CHAR"] = "2";
    type["CHARS"] = "2";
    type["IDENT"] = "3";
    type["if"] = "5";
    type["else"] = "6";
    type["int"] = "7";
    type["char"] = "8";
    type["float"] = "9";
    type["double"] = "10";
    type["long"] = "11";
    type["short"] = "12";
    type["return"] = "13";
    type["while"] = "14";
    type["break"] = "15";
    // TODO: how should quotation marks be classified?
}
// Print one token as <word, meaning, category code>, or report an error.
void print(char *s,const STATE *p)
{
    int i = 0;
    char temp[255];
    for(i = 0;i<curn;i++)   // copy the curn characters of the token
        temp[i] = s[i];
    temp[i] = '\0';
    if(p->name == 0)        // only the error state has no name
    {
        printf("error: %s\n",temp);
        nerror++;
        return;
    }
    printf("<%s",temp);
    if(p->type == 0)        // not a symbol; curtype alone cannot tell, since ">=" has index 0
    {
        if(type.find(temp) == type.end())   // not a keyword: print the token class
            cout<<", "<<p->name<<", "<<type[p->name]<<">"<<endl;
        else                                // keyword: print it under its own name
            cout<<", "<<temp<<", "<<type[temp]<<">"<<endl;
    }
    else
        printf(", %s, %d>\n",symbol[curtype],num_before_symbol+curtype);
}
int main(int argc,char *argv[])
{
    // Fall back to input.txt and output.txt when no file names are given.
    const char *inName = (argc<2) ? "input.txt" : argv[1];
    const char *outName = (argc<3) ? "output.txt" : argv[2];
    if(freopen(inName,"r",stdin) == NULL)
    {
        fprintf(stderr,"cannot open %s\n",inName);
        return 1;
    }
    if(freopen(outName,"w",stdout) == NULL)
    {
        fprintf(stderr,"cannot open %s\n",outName);
        return 1;
    }
    char input[255],*s = input;
    int t = 1;
    init();
    while(fgets(input,sizeof(input),stdin))    // fgets is used because gets() is unsafe
    {
        input[strcspn(input,"\r\n")] = '\0';   // fgets keeps the newline; cut it off
        printf("********Line %d*********:\n",t++);
        while((*s) == ' ') s++;                // skip leading spaces
        while((*s)!='\0')
        {
            curn = 0;
            curtype = 0;
            const STATE *result = start->start(s);
            if(curn == 0) curn = 1;   // unrecognized character: consume it so the loop advances
            print(s,result);
            s += curn;
            while((*s) == ' ') s++;   // skip spaces
        }
        s = input;
        printf("\n");
    }
    printf("*******************************\n%d error!\n%d Word Have Been Found Out!\n",nerror,STATE::count);
    return 0;
}