- Tokenize the source
- Parse statements
- Execute the AST
The first step is to be able to recognize the different parts of the source string into different pieces. For example "1+2" would be 3 tokens: [Number:"1", Operator:"+", Number:"2"]
Tokens are responsible of just having a content and knowing the type of data but don't yet know if it's valid to have a given type after another.
Parsing statement uses the tokens produces by the first step, and basically cascade the parsing from "expression", "and operator", "or operator" down to the "base statement". The order of the cascade actually defines the order of precedence of the different pieces, for example multiplications must be done before an addition: 3*2+4 must be done as (3*2)+4 and not 3*(2+4) as the result is not the same. In this step the parser can detect syntax errors, for example missing a parenthesis. At the end of this process we will have a AST (https://en.wikipedia.org/wiki/Abstract_syntax_tree) which can then be run.
Executing the AST is after that quite simple, each node knows how to execute its own operation and nothing more. Also if the AST is well built the execution of it should be fast. Ideally AST optimizations should be a step before the execution, and an AST could ideally be transformed into a binary code directly run by the CPU (that would produce a JIT if done on run-time).
Now you may wonder but why do I need to write all that, and yes it's not a small piece of software? Yes there is available parsers like the Javascript function "eval" or use 3rd party libs which would implement a scripting language for you. Well the answer is pretty simple: eval would be easy to use and cheap for the developer (me). However it would have a major issue: security, any valid code would be able to run, for example even nasty operations. 3rd party libs may actually much bigger than what we would ideally want and may be harder to integrate or debug in case of issues.
Stay tuned to see my parser first in action!
No comments:
Post a Comment