Principles of Compilers

Overview

In this course we mainly learned what a compiler does and how the compilation process works.

A compiler is like a translator: it translates source code into target code, such as the assembly language supported by the target machine.

The compilation process has six phases:

graph TD
    start(Source Code) --> op1(Lexical Analysis)
    op1 --> op2(Syntax Analysis)
    op2 --> op3(Semantic Analysis)
    op3 --> op4(Intermediate Code Generator)
    op4 --> op5(Machine Independent Code Optimiser)
    op5 --> op6(Code Generator)
    op6 --> e(Target Code)

Of these phases, the course focused on lexical analysis and syntax analysis.

一、Lexical Analysis

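This phase scans the source code as a stream of characters and groups them into tokens of different types, such as identifiers, keywords, and operators.

For example, the statement below is split into five tokens: the keyword int, the identifier val, the operator =, the constant 10, and the separator ;.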

int val = 10;
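A minimal sketch of how such a scanner might look in C++ is given here. The names `TokenType`, `Token`, and `tokenize` are made up for illustration, and the sketch assumes a tiny language in which `int` is the only keyword and `=`, `+`, `*` are the only operators.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token categories for this sketch (hypothetical names, not from the course).
enum class TokenType { Keyword, Identifier, Number, Operator, Separator, Unknown };

struct Token {
    TokenType type;
    std::string text;
};

// Split a source string into tokens, skipping whitespace.
// Assumption: "int" is the only keyword; '=', '+', '*' are the only operators.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: a letter or '_' followed by letters, digits, '_'.
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            std::string word = src.substr(start, i - start);
            tokens.push_back({word == "int" ? TokenType::Keyword : TokenType::Identifier, word});
        } else if (std::isdigit(c)) {
            // Integer constant: one or more digits.
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenType::Number, src.substr(start, i - start)});
        } else if (c == '=' || c == '+' || c == '*') {
            tokens.push_back({TokenType::Operator, std::string(1, src[i])});
            ++i;
        } else if (c == ';') {
            tokens.push_back({TokenType::Separator, ";"});
            ++i;
        } else {
            tokens.push_back({TokenType::Unknown, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling `tokenize("int val = 10;")` on the statement above yields five tokens in source order. A real lexer would usually be driven by the regular expressions or automata described next rather than by hand-written character tests.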

To recognize tokens, we need a formal way to describe them, such as regular expressions and finite automata.

1. Regular Expression

A regular expression describes a set of strings that share a pattern.

We use regular expressions to match the strings we need.

For example:

(a | b)*

This regex matches every string made up only of the characters 'a' and 'b', including the empty string.
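As a quick check of this pattern, the sketch below uses the standard `<regex>` library. The helper name `matchesAB` is hypothetical, and the pattern is written without spaces because a space would be matched literally.

```cpp
#include <regex>
#include <string>

// Returns true when the whole string matches the regular expression (a|b)*.
// std::regex_match requires the entire input to match, not just a substring.
bool matchesAB(const std::string& s) {
    static const std::regex pattern("(a|b)*");
    return std::regex_match(s, pattern);
}
```

With this helper, `matchesAB("abba")` and `matchesAB("")` are accepted, while `matchesAB("abc")` is rejected because 'c' is not in the pattern.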

2. Finite Automata

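A finite automaton (FA) is an abstract machine that recognizes the set of strings described by a regular expression.

It has a set of states, including a start state and a set of accepting states, and rules for moving from one state to another on each input symbol.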

There are two types of FA:

  • DFA: Deterministic Finite Automaton
  • NFA: Nondeterministic Finite Automaton

1) DFA

"Deterministic" means that, for each state, each input symbol leads to exactly one target state.
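To make this concrete, here is a small table-driven DFA sketch. The example language (strings over {a, b} that end in "abb") and the names `kNext` and `dfaAccepts` are chosen for illustration and do not come from the course material.

```cpp
#include <string>

// Transition table of a DFA over {a, b} that accepts strings ending in "abb".
// kNext[state][symbol]: column 0 is 'a', column 1 is 'b'.
// State 0 is the start state; state 3 is the only accepting state.
constexpr int kNext[4][2] = {
    {1, 0},  // state 0: nothing useful seen yet
    {1, 2},  // state 1: input so far ends in "a"
    {1, 3},  // state 2: input so far ends in "ab"
    {1, 0},  // state 3: input so far ends in "abb"
};

// Run the DFA: exactly one move per input symbol, no backtracking.
bool dfaAccepts(const std::string& input) {
    int state = 0;
    for (char c : input) {
        if (c != 'a' && c != 'b') return false;  // symbol outside the alphabet
        state = kNext[state][c == 'a' ? 0 : 1];
    }
    return state == 3;
}
```

Because every (state, symbol) pair has exactly one entry in the table, the run time is linear in the input length.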

2) NFA

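"Nondeterministic" means that, for a given state, an input symbol may lead to zero, one, or several target states; an NFA may also change state without consuming any input (an ε-transition).

Every NFA can be converted to an equivalent DFA, for example by the subset construction.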
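The conversion can be sketched with the subset construction, in which each reachable set of NFA states becomes one DFA state. The code below is a simplified illustration: it assumes an NFA without ε-transitions, and the type aliases, function names, and the example NFA (strings over {a, b} ending in "abb") are all hypothetical.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>

using StateSet = std::set<int>;
// NFA transitions: (state, symbol) -> set of possible next states (no epsilon moves).
using NfaDelta = std::map<std::pair<int, char>, StateSet>;
// DFA transitions: (set of NFA states, symbol) -> set of NFA states.
using DfaDelta = std::map<std::pair<StateSet, char>, StateSet>;

// Example NFA for strings ending in "abb": state 0 loops on a and b,
// and may "guess" on an 'a' that the final "abb" has started.
NfaDelta exampleNfa() {
    NfaDelta nfa;
    nfa[std::make_pair(0, 'a')] = {0, 1};
    nfa[std::make_pair(0, 'b')] = {0};
    nfa[std::make_pair(1, 'b')] = {2};
    nfa[std::make_pair(2, 'b')] = {3};
    return nfa;
}

// Subset construction: explore every set of NFA states reachable from {start}.
DfaDelta subsetConstruction(const NfaDelta& nfa, int start, const std::string& alphabet) {
    DfaDelta dfa;
    StateSet startSet{start};
    std::set<StateSet> seen;
    std::queue<StateSet> work;
    seen.insert(startSet);
    work.push(startSet);
    while (!work.empty()) {
        StateSet current = work.front();
        work.pop();
        for (char symbol : alphabet) {
            StateSet next;  // union of the moves of all NFA states in `current`
            for (int s : current) {
                auto it = nfa.find(std::make_pair(s, symbol));
                if (it != nfa.end()) next.insert(it->second.begin(), it->second.end());
            }
            dfa[std::make_pair(current, symbol)] = next;
            if (seen.insert(next).second) work.push(next);  // newly discovered DFA state
        }
    }
    return dfa;
}

// A DFA state accepts if it contains at least one accepting NFA state.
bool dfaAccepts(const DfaDelta& dfa, int start, const StateSet& nfaAccepting,
                const std::string& input) {
    StateSet state{start};
    for (char c : input) {
        auto it = dfa.find(std::make_pair(state, c));
        if (it == dfa.end()) return false;  // symbol outside the alphabet
        state = it->second;
    }
    for (int s : state) {
        if (nfaAccepting.count(s)) return true;
    }
    return false;
}
```

For the example NFA, `subsetConstruction(exampleNfa(), 0, "ab")` reaches the four DFA states {0}, {0,1}, {0,2}, and {0,3}. In the worst case the construction can produce exponentially many DFA states, which is the usual price of removing nondeterminism.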

二、Syntax Analysis

This phase checks whether the token stream produced by lexical analysis conforms to the grammar of the source language.

We use a context-free grammar (CFG) to describe that grammar.

1. CFG

A CFG has four components:

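  • Non-terminals (V): syntactic variables, each of which denotes a set of strings.

  • Terminal symbols (Σ): the basic symbols from which strings are formed, that is, the tokens.

  • Productions (P): the rewriting rules. In a CFG, the left side of every production is a single non-terminal.

  • Start symbol (S): the non-terminal from which every derivation begins.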

Example: productions for arithmetic expressions:

\[E \rightarrow identifier\\ E \rightarrow E + E\\ E \rightarrow E * E \]
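One possible way to hold the four components in a program is sketched below. The struct and function names (`Production`, `Grammar`, `expressionGrammar`, `isWellFormed`) are illustrative assumptions, not part of any library or of the course material.

```cpp
#include <set>
#include <string>
#include <vector>

struct Production {
    std::string lhs;               // left side: a single non-terminal
    std::vector<std::string> rhs;  // right side: a sequence of grammar symbols
};

struct Grammar {
    std::set<std::string> nonTerminals;   // V
    std::set<std::string> terminals;      // Σ
    std::vector<Production> productions;  // P
    std::string start;                    // S
};

// The arithmetic expression grammar from the example above.
Grammar expressionGrammar() {
    Grammar g;
    g.nonTerminals = {"E"};
    g.terminals = {"identifier", "+", "*"};
    g.productions.push_back({"E", {"identifier"}});
    g.productions.push_back({"E", {"E", "+", "E"}});
    g.productions.push_back({"E", {"E", "*", "E"}});
    g.start = "E";
    return g;
}

// Check the CFG conditions: the start symbol and every left side are
// non-terminals, and every right-side symbol is a declared symbol.
bool isWellFormed(const Grammar& g) {
    if (!g.nonTerminals.count(g.start)) return false;
    for (const Production& p : g.productions) {
        if (!g.nonTerminals.count(p.lhs)) return false;
        for (const std::string& symbol : p.rhs) {
            if (!g.nonTerminals.count(symbol) && !g.terminals.count(symbol)) return false;
        }
    }
    return true;
}
```

Storing each right side as a vector of symbol names keeps the sketch simple; a parser generator would more likely map symbols to integer ids.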

2. Syntax Analyzer

The syntax analyzer checks the input against the CFG. The output of this phase is a parse tree.

For example:

input: num1 + num2 * num1

Derivation process (writing id for identifier):

\[E \Rightarrow E * E\\ \Rightarrow E + E * E\\ \Rightarrow id + E * E\\ \Rightarrow id + id * E\\ \Rightarrow id + id * id \]

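The Tree:

          E
        / | \
       E  *  E
     / | \   |
    E  +  E  id
    |     |
    id    id

Here id + id is grouped under the left operand of *. Because this grammar is ambiguous, a different derivation of the same input would put + at the root instead.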

3. Two Types of Parsing

1) Top-Down

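Top-down parsing builds the parse tree from the start symbol down to the input tokens. The LL parser is an example.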
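A hand-written recursive-descent recognizer is one simple form of top-down parsing. The sketch below does not use the ambiguous expression grammar directly; it assumes an LL(1)-friendly rewrite with the usual precedence of * over +, and the names `ExprParser` and `parseExpression` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed LL(1) rewrite of the expression grammar:
//   E  -> T E'
//   E' -> + T E' | epsilon
//   T  -> F T'
//   T' -> * F T' | epsilon
//   F  -> id
// Each non-terminal becomes one function that consumes tokens left to right.
struct ExprParser {
    const std::vector<std::string>& tokens;
    std::size_t pos;

    explicit ExprParser(const std::vector<std::string>& t) : tokens(t), pos(0) {}

    // Consume the next token if it equals `expected`.
    bool match(const std::string& expected) {
        if (pos < tokens.size() && tokens[pos] == expected) {
            ++pos;
            return true;
        }
        return false;
    }

    bool parseE() { return parseT() && parseEPrime(); }
    bool parseEPrime() {
        if (match("+")) return parseT() && parseEPrime();
        return true;  // epsilon production
    }
    bool parseT() { return parseF() && parseTPrime(); }
    bool parseTPrime() {
        if (match("*")) return parseF() && parseTPrime();
        return true;  // epsilon production
    }
    bool parseF() { return match("id"); }
};

// Accept the token sequence only if E derives all of it.
bool parseExpression(const std::vector<std::string>& tokens) {
    ExprParser parser(tokens);
    return parser.parseE() && parser.pos == tokens.size();
}
```

For instance, `parseExpression({"id", "+", "id", "*", "id"})` succeeds, while `parseExpression({"id", "+"})` fails because F cannot match the end of input. A table-driven LL(1) parser makes the same decisions from a parsing table instead of function calls.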

2) Bottom-Up

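Bottom-up parsing starts from the input tokens and reduces them step by step to the start symbol. Examples are the SLR parser, the LR parser and the LALR parser.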
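The shift-reduce loop shared by these parsers can be sketched with a hand-built SLR(1) table. To keep the table small, the sketch assumes a toy grammar (S' → E, E → E + id, E → id) rather than the full expression grammar or any of the course grammars; the names `Action` and `slrParse` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy grammar assumed for this sketch:
//   (0) S' -> E    (1) E -> E + id    (2) E -> id
// kind: 's' = shift to state `value`, 'r' = reduce by production `value`, 'a' = accept.
struct Action {
    char kind;
    int value;
};

bool slrParse(std::vector<std::string> tokens) {
    // ACTION table built by hand from the LR(0) item sets and FOLLOW(E) = {+, $}.
    static const std::map<std::pair<int, std::string>, Action> action = {
        {{0, "id"}, {'s', 2}},
        {{1, "+"}, {'s', 3}}, {{1, "$"}, {'a', 0}},
        {{2, "+"}, {'r', 2}}, {{2, "$"}, {'r', 2}},
        {{3, "id"}, {'s', 4}},
        {{4, "+"}, {'r', 1}}, {{4, "$"}, {'r', 1}},
    };
    // GOTO table for the only non-terminal, E.
    static const std::map<int, int> gotoE = {{0, 1}};
    // Number of right-side symbols of each production.
    static const int rhsLength[] = {1, 3, 1};

    tokens.push_back("$");       // end-of-input marker
    std::vector<int> states{0};  // stack of parser states
    std::size_t pos = 0;
    while (true) {
        auto it = action.find(std::make_pair(states.back(), tokens[pos]));
        if (it == action.end()) return false;  // empty table entry: syntax error
        const Action act = it->second;
        if (act.kind == 'a') return true;
        if (act.kind == 's') {
            states.push_back(act.value);
            ++pos;
        } else {
            // Reduce: pop one state per right-side symbol, then follow GOTO on E.
            for (int i = 0; i < rhsLength[act.value]; ++i) states.pop_back();
            auto next = gotoE.find(states.back());
            if (next == gotoE.end()) return false;
            states.push_back(next->second);
        }
    }
}
```

With this table, `slrParse({"id", "+", "id"})` is accepted and `slrParse({"id", "+"})` is rejected. In a real SLR(1) parser the ACTION and GOTO tables are generated from the grammar instead of being written by hand.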

三、Course Project

The professor gave us three context-free grammars and asked us to choose one.

We then had to write a lexical analyzer and a syntax analyzer based on SLR(1) parsing in C++.

Writing a semantic analyzer was optional.

I chose the most difficult grammar and signed up to write a semantic analyzer as well, but unfortunately I did not manage to finish it.