Tutorial: Write a Finite State Machine to parse a custom language in pure Python

阿新 • Published: 2018-12-29

1. Analyze the structure

First, three simple examples of the POSH syntax, one per line:

VB(noise+3)

NNS(acoustics) & RB(not)

(NNS(acoustics) & RB(not)) | JJ(muted)

Let’s not dive too deep into the meaning of my syntax. Instead we are going to focus on the stream of characters and how they may be parsed. A couple quick observations:

There is a prefix like “VB”, then a set of parentheses.
Inside the parentheses is a subject.
Each prefix + ( + subject + ) we can call a Rule.
Then there may be an operator, like ‘&’ or ‘|’, that strings rules together.
Sometimes groups of rules are also enclosed in parentheses.
If I read left to right, character by character, I can note which characters at each position will change the state.

2. Draw a state diagram

It’s easiest for me to start with one state and ask myself, “Self, where can I go from here?” For example, if I start in the state PREFIX while looking at the very first character of the first example, the ‘V’ in “VB(noise+3)”, I see I can go only two places: 1) to another prefix character, in this case the ‘B’, or 2) to the first ‘(’ parenthesis. If I graph this out it looks like this:

Here I am saying that when I go to the next character I either stay in the prefix state or move to the subject state if I see the “(” character.
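To make that first fragment of the diagram concrete, here is a minimal sketch of just these two states. The helper names next_state and trace are my own for illustration and are not part of the final program.

```python
S_PRE = "STATE: PREFIX"
S_SUBJ = "STATE: SUBJECT"


def next_state(state, char):
    """Return the state reached from `state` after reading `char`."""
    if state == S_PRE and char == "(":
        return S_SUBJ  # the opening parenthesis starts the subject
    return state       # any other character keeps us where we are


def trace(text, state=S_PRE):
    """Feed `text` through the two-state sketch and list each step."""
    visited = []
    for char in text:
        state = next_state(state, char)
        visited.append((char, state))
    return visited


for char, state in trace("VB(no"):
    print(repr(char), "->", state)
```

The 'V' and 'B' stay in PREFIX, the '(' moves to SUBJECT, and everything after it stays there, since this fragment has no way out of SUBJECT yet.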

We need to do this for every single state. We keep going character by character, asking what the possible options are and what each option results in. The final diagram looks like this:

POSH Syntax State Diagram with transitions

Notice that I have given each transition a name, and each state as well.

A note on coverage: This is where TDD (test-driven development) comes in. A later goal will be to cover every transition and every state (by running a “coverage” report).
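One lightweight way to think about transition coverage is to number the transitions, record which ones fire for a set of test inputs, and report the ones that never did. The sketch below is only an illustration of that idea: the four numbered transitions are a hypothetical subset, and fired_transitions and uncovered are names I made up for it.

```python
import re

# Hypothetical, numbered subset of transitions: (number, src, dst, condition)
TRANSITIONS = (
    (1, "PREFIX", "PREFIX", r"[A-Za-z]"),
    (2, "PREFIX", "SUBJECT", r"\("),
    (3, "SUBJECT", "SUBJECT", r"[^)]"),
    (4, "SUBJECT", "END_RULE", r"\)"),
)


def fired_transitions(text, state="PREFIX"):
    """Return the numbers of the transitions that fire while reading `text`."""
    fired = set()
    for char in text:
        for number, src, dst, condition in TRANSITIONS:
            if src == state and re.match(condition, char):
                fired.add(number)
                state = dst
                break
    return fired


def uncovered(inputs):
    """Return the transition numbers that no input in `inputs` exercised."""
    seen = set()
    for text in inputs:
        seen |= fired_transitions(text)
    return {number for number, *_ in TRANSITIONS} - seen


print(uncovered(["VB"]))           # only transition 1 fires for this input
print(uncovered(["VB(noise+3)"]))  # this input exercises all four
```

A real coverage report on the source code is still worth running; this only checks that the test inputs walk every arrow in the diagram.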

3. List our transitions and states and convert them to code

For the transitions I will give each a short all-caps name and a longer lowercase name (so I can implement those in Python later):

T_SKIP = transition_skip
T_NEW_GROUP = transition_new_group
T_APPEND_CHAR_PRE = transition_append_pre
T_ADD_OP = transition_add_op
T_ADD_OP_NEW_RULE = transition_add_op_new_rule
T_END_GROUP = transition_end_group
T_END_RULE = transition_end_rule
T_APPEND_CHAR_SUBJ = transition_append_subj
T_ADD_GROUP_OP = transition_add_op_new_group

Now for the states, give each one a short name and then a longer full text name to help identify them.

S_NEW_GROUP = "STATE: NEW_GROUP"
S_END_GROUP = "STATE: END_GROUP"
S_PRE = "STATE: PREFIX"
S_OP = "STATE: OPERATOR"
S_END_RULE = "STATE: END_RULE"
S_SUBJ = "STATE: SUBJECT"

4. Create a transition table of state changes

Each entry in our transition table must contain:

start state (src)
end state (dst)
rule for transition (condition)
callback for the transition (callback)

In Python it will look something like this:

# For transition 1
{'src': S_NEW_GROUP, 'dst': S_PRE, 'condition': "[A-Za-z|+|-|\d]", 'callback': T_APPEND_CHAR_PRE}
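To show how such entries might be looked up, here is a small sketch with three entries and a lookup helper. It is not the full table: the conditions are simplified assumptions, the callbacks are plain strings standing in for the real callback functions, and find_transition is a name I introduce only for this example.

```python
import re

S_NEW_GROUP = "STATE: NEW_GROUP"
S_PRE = "STATE: PREFIX"
S_SUBJ = "STATE: SUBJECT"

# Simplified stand-in table; callbacks are strings here, not functions.
FSM_MAP = (
    {"src": S_NEW_GROUP, "dst": S_PRE, "condition": r"[A-Za-z]",
     "callback": "T_APPEND_CHAR_PRE"},
    {"src": S_PRE, "dst": S_PRE, "condition": r"[A-Za-z]",
     "callback": "T_APPEND_CHAR_PRE"},
    {"src": S_PRE, "dst": S_SUBJ, "condition": r"\(",
     "callback": "T_SKIP"},
)


def find_transition(state, char):
    """Return the first entry whose src is `state` and whose condition matches."""
    for entry in FSM_MAP:
        if entry["src"] == state and re.match(entry["condition"], char):
            return entry
    return None  # no transition defined for this (state, character) pair


print(find_transition(S_NEW_GROUP, "V")["dst"])
print(find_transition(S_PRE, "(")["callback"])
print(find_transition(S_SUBJ, "x"))
```

Keeping the table as plain data means adding a transition is a one-line change, and the lookup code never needs to know what any particular state means.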

Now we number all of our possible state changes so we are sure we don’t miss one.

The final table looks something like this:

5. Complete our program class design

We start with three classes:

A Rule class: It will hold our Prefix, Suffix, and operator (left hand side).
A RuleGroup class: This will have a parent RuleGroup (a poor man’s pushdown automaton).
And the Rule_Parse_FSM class, whose main goal is to iterate over the input string and handle the transitions while calling the callbacks.
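As a rough idea of what the two data classes could look like, here is a minimal sketch. The attribute names rules, level, rule_count, parent and op are taken from the printed RuleGroup at the end of this tutorial; everything else (the constructor arguments, the add_rule helper, the exact repr code) is my assumption and may differ from the real classes.

```python
class Rule:
    """One prefix(subject) unit, plus the operator on its left-hand side."""

    def __init__(self, prefix="", subject="", op=None):
        self.prefix = prefix
        self.subject = subject
        self.op = op  # '&', '|' or None for the first rule in a group

    def __repr__(self):
        return "<Rule: %s %s(%s)>" % (self.op or "", self.prefix, self.subject)


class RuleGroup:
    """A list of rules that remembers the group that contains it."""

    def __init__(self, parent=None, level=0):
        self.rules = []
        self.level = level    # nesting depth, 0 for the outermost group
        self.rule_count = 0
        self.parent = parent  # lets us climb back out when a group ends
        self.op = None

    def add_rule(self, rule):
        """Hypothetical helper: append a rule and keep the count in sync."""
        self.rules.append(rule)
        self.rule_count += 1


root = RuleGroup()
root.add_rule(Rule("NN", "hello"))
root.add_rule(Rule("NN", "world", op="&"))
child = RuleGroup(parent=root, level=root.level + 1)
print(root.rules, child.parent is root)
```

The parent link is what makes this a poor man’s stack: opening a parenthesis creates a child group, and closing it simply steps back to the parent.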

Rule Class:

RuleGroup Class:

And finally the Rule_Parse_FSM class:

A quick overview of the parse logic:

Run takes the input string character by character and sends each character to process next.
Process takes only the relevant transitions (those whose source state matches the current state) and sends the character to an evaluator.
The evaluator (iterate_re_evaluators) takes the cmd logic (which happens to be a very short regex; I stress very short and, I hope, not confusing). If it matches, it fires a state change.
The state changer stores the new state and calls the callbacks.
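The four steps above can be sketched in a few lines. This is a stripped-down stand-in, not the real Rule_Parse_FSM: the class name MiniFSM, the method names process_next and update_state, and the five-entry table are assumptions for illustration, callbacks are left out, and unmatched characters are simply logged as skipped.

```python
import re

S_NEW_GROUP = "STATE: NEW_GROUP"
S_PRE = "STATE: PREFIX"
S_SUBJ = "STATE: SUBJECT"
S_END_RULE = "STATE: END_RULE"

# Reduced table: just enough to parse a single rule such as NN(hello).
FSM_MAP = (
    {"src": S_NEW_GROUP, "dst": S_PRE, "condition": r"[A-Za-z]"},
    {"src": S_PRE, "dst": S_PRE, "condition": r"[A-Za-z]"},
    {"src": S_PRE, "dst": S_SUBJ, "condition": r"\("},
    {"src": S_SUBJ, "dst": S_SUBJ, "condition": r"[^)]"},
    {"src": S_SUBJ, "dst": S_END_RULE, "condition": r"\)"},
)


class MiniFSM:
    def __init__(self, input_str):
        self.input_str = input_str
        self.current_state = S_NEW_GROUP
        self.log = []

    def run(self):
        """Walk the input one character at a time."""
        for char in self.input_str:
            if not self.process_next(char):
                self.log.append("skip %r in %s" % (char, self.current_state))
        return self.log

    def process_next(self, char):
        """Keep only transitions that start in the current state."""
        candidates = [t for t in FSM_MAP if t["src"] == self.current_state]
        return self.iterate_re_evaluators(char, candidates)

    def iterate_re_evaluators(self, char, candidates):
        """Fire the first transition whose short regex matches the character."""
        for transition in candidates:
            if re.match(transition["condition"], char):
                self.update_state(transition["dst"], char)
                return True
        return False

    def update_state(self, new_state, char):
        """Record the change; the real class would also call the callback here."""
        self.log.append("%s -> %s : %s" % (char, self.current_state, new_state))
        self.current_state = new_state


log = MiniFSM("NN(hello)").run()
print("\n".join(log))
```

Because the loop only ever consults the table, all of the language-specific behaviour lives in the table entries and, in the full program, in the callbacks.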

Now we need callbacks that do something different depending on the transition from state to state. We already outlined them, but now we need to write each one. Here is what we end up with:

It’s very convenient to pass in fsm_obj, which is an instance of the Rule_Parse_FSM class. Each transition does something different. Take transition_add_op, for example: when we hit an operator like ‘&’ or ‘|’, we find the Rule it belongs to (which will eventually be the complete left-hand side of the rule) and assign that operator to the instance.
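A callback of that shape might look roughly like the sketch below. How the real program locates “the Rule it belongs to” is not shown here, so I assume the FSM object exposes it as current_rule and the character being read as current_char; both attribute names, and the StubFSM stand-in class, are hypothetical.

```python
class Rule:
    def __init__(self, prefix="", subject=""):
        self.prefix = prefix
        self.subject = subject
        self.op = None


class StubFSM:
    """Stand-in for Rule_Parse_FSM carrying only what this callback needs."""

    def __init__(self, current_rule, current_char):
        self.current_rule = current_rule  # assumed attribute name
        self.current_char = current_char  # assumed attribute name


def transition_add_op(fsm_obj):
    """Attach the operator character just read to the rule it belongs to."""
    fsm_obj.current_rule.op = fsm_obj.current_char


rule = Rule("NN", "world")
fsm = StubFSM(current_rule=rule, current_char="&")
transition_add_op(fsm)
print(rule.op)
```

Passing the whole FSM object keeps every callback signature identical, which is what lets the transition table refer to them interchangeably.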

6. Test our Program

The complete program may be located here.

Running this file gives us:

N -> STATE: NEW_GROUP : STATE: PREFIX
N -> STATE: PREFIX : STATE: PREFIX
( -> STATE: PREFIX : STATE: SUBJECT
h -> STATE: SUBJECT : STATE: SUBJECT
e -> STATE: SUBJECT : STATE: SUBJECT
l -> STATE: SUBJECT : STATE: SUBJECT
l -> STATE: SUBJECT : STATE: SUBJECT
o -> STATE: SUBJECT : STATE: SUBJECT
) -> STATE: SUBJECT : STATE: END_RULE
skip ' ' in STATE: END_RULE
& -> STATE: END_RULE : STATE: OPERATOR
skip ' ' in STATE: OPERATOR
N -> STATE: OPERATOR : STATE: PREFIX
N -> STATE: PREFIX : STATE: PREFIX
( -> STATE: PREFIX : STATE: SUBJECT
w -> STATE: SUBJECT : STATE: SUBJECT
o -> STATE: SUBJECT : STATE: SUBJECT
r -> STATE: SUBJECT : STATE: SUBJECT
l -> STATE: SUBJECT : STATE: SUBJECT
d -> STATE: SUBJECT : STATE: SUBJECT
) -> STATE: SUBJECT : STATE: END_RULE

<RuleGroup: {'rules': [<Rule:  NN(hello)>, <Rule: & NN(world)>], 'level': 0, 'rule_count': 2, 'parent': None, 'op': None}>

Now we have completely implemented our state machine in 200 lines of Python code. Yay!

Again the final code may be located HERE
