SAOC 2024 "Learning about AST Nodes and Semantic Analysis in Compiler Design" Weekly update #1
Dennis
dennis.onyeka.4 at gmail.com
Sun Sep 22 20:33:22 UTC 2024
**Tasks Accomplished**
### Learning AST Nodes and Semantic Analysis in Compiler Design:
Before delving into how to decouple AST nodes from semantic
functions, I looked at how compilers work in general and the
processes involved.
A typical compiler works this way:
Character Stream => |**Lexer**| => Tokens => |**Parser**| => AST
=> |**Semantic Routines**| => Intermediate Representation
(Optimization) => |**Code Generator**| => Assembly Code
**Character stream:** The source code, i.e. the input that the
programmer wrote.
**Lexer/scanner:** Lexing (lexical analysis) is the process of
breaking a string down into meaningful units. The resulting units
are called tokens.
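To make the lexing step concrete, here is a minimal sketch of a tokenizer for a tiny arithmetic language. The token categories (`NUMBER`, `IDENT`, `OP`) and the `tokenize` function are hypothetical names chosen for illustration. They are not taken from dmd, whose lexer is far more involved.

```python
import re

# Hypothetical token categories for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Split a source string into (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        # Whitespace separates tokens but is not a token itself.
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 2 + 40"))
```

The character stream goes in as one string and comes out as a flat list of tokens, which is what the parser consumes next.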
**Parser:** The job of the parser is to obtain a sequence of
tokens from the lexical analyzer and verify that the sequence can
be generated by the grammar of the source language. It detects
and reports any syntax errors and produces a tree from which
intermediate code can be generated.
In practice, the output of the parser is an abstract syntax tree
(AST).
**Abstract syntax tree (AST):** The AST is like a blueprint that
represents the structure of my code. It breaks down the code into
smaller chunks and organizes them in a tree-like structure so
that the compiler can work with it.
An important fact I learnt is that the AST only contains
information related to analyzing the source text and ignores
extra syntactic information used for parsing text.
In the dmd compiler codebase, AST nodes are classes and structs,
while the semantic routines are functions tightly coupled to the
AST classes.
I also learnt about the core differences between an AST and a
parse tree. In summary, an AST focuses on the essential elements
and their relationships: it captures the underlying structure and
meaning of the code and leaves out unnecessary syntactic details.
A parse tree captures the complete structure of the input code,
including all the syntactic details such as parentheses,
semicolons, and other language-specific constructs.
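The difference can be seen in a small sketch. The recursive-descent parser below handles only `+`, `*`, integers, and parentheses, and it takes an already-tokenized list of strings. The node classes `Num` and `BinOp` and the `parse` function are hypothetical and are not dmd's classes. Note that the parentheses guide the parser but leave no node behind in the resulting AST: grouping is encoded purely by the tree shape.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def parse(tokens):
    """Parse a list of token strings into an AST of Num and BinOp nodes."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node  # the parentheses themselves are discarded
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return Num(int(tok))

    def term():
        node = primary()
        while peek() == "*":
            advance()
            node = BinOp("*", node, primary())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = BinOp("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(["(", "1", "+", "2", ")", "*", "3"]))
```

A full parse tree for the same input would also carry nodes for `(` and `)` and for each grammar rule that was applied. The AST keeps only the operators and operands.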
A simple AST node I constructed for practice:
https://github.com/dchidindu5/test_demo/blob/main/README.md
**Semantic Analysis:** This is the compilation stage where the
compiler checks whether syntactically valid code is also
meaningful. Its major role is type checking, to confirm that
variable declarations, functions, and control flow adhere to the
semantics of the language.
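As an illustration of the type-checking role, here is a minimal sketch of a semantic pass over a toy AST. All names (`IntLit`, `StrLit`, `Var`, `Add`, `VarDecl`, `type_of`, `check`) are hypothetical, and the rules are deliberately simplistic: an initializer must have exactly the declared type, and both operands of `Add` must have the same type. The rules of a real language such as D are much richer.

```python
from dataclasses import dataclass


@dataclass
class IntLit:
    value: int


@dataclass
class StrLit:
    value: str


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


@dataclass
class VarDecl:
    name: str
    declared_type: str
    init: object


def type_of(node, symbols):
    """Return the type name of an expression, or raise TypeError."""
    if isinstance(node, IntLit):
        return "int"
    if isinstance(node, StrLit):
        return "string"
    if isinstance(node, Var):
        if node.name not in symbols:
            raise TypeError(f"undefined variable {node.name!r}")
        return symbols[node.name]
    if isinstance(node, Add):
        left = type_of(node.left, symbols)
        right = type_of(node.right, symbols)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise TypeError(f"unknown node {node!r}")


def check(decls):
    """Check declarations in order and return the resulting symbol table."""
    symbols = {}
    for decl in decls:
        actual = type_of(decl.init, symbols)
        if actual != decl.declared_type:
            raise TypeError(
                f"{decl.name}: declared {decl.declared_type}, got {actual}"
            )
        symbols[decl.name] = decl.declared_type
    return symbols


program = [
    VarDecl("a", "int", IntLit(1)),
    VarDecl("b", "int", Add(Var("a"), IntLit(2))),
]
print(check(program))
```

In this sketch the checking logic lives in free functions outside the node classes, which is the kind of separation the refactoring below aims for.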
The stages described so far make up the frontend of the dmd
compiler.
- To fully understand the directory for the dmd codebase, I used
this as a guide, which outlines the files and what they perform.
https://github.com/dlang/dmd/blob/master/compiler/src/dmd/README.md
- Looked through each file I would work on.
### Initial Refactoring of DMD AST
- Chose the attrib.d AST node file, as recommended by my mentor.
- Examined the imports and commented out `import dmd.dsymbolsem`,
which is a semantic import.
- Built the compiler, which produced errors.
- Looked at the error messages and moved the affected semantic
function to dsymbolsem.d, which is a semantic analysis file.
- The affected function was `newScope`.
- Converted it into a visitor, a design pattern that keeps an
operation separate from the classes it operates on. I had trouble
mastering it, so my mentor sent me a previous pull request on
visitors:
[Extract dsymbol.Dsymbol.importAll and turn it into a visitor](https://github.com/dlang/dmd/pull/15870/)
- Applied the same approach to the `newScope` function.
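The general shape of this refactoring can be sketched as follows. This is not the dmd code: the class and method names below are hypothetical stand-ins, and a scope is modelled as a plain dictionary. The point is that the node classes keep only an `accept` method, while the scope logic lives in a separate visitor class. The `visit_*` methods return nothing and store their answer in a `result` field instead.

```python
class Visitor:
    """Base visitor: one method per node class; defaults defer to the parent."""

    def visit_symbol(self, node):
        pass

    def visit_attrib_declaration(self, node):
        self.visit_symbol(node)

    def visit_storage_class_declaration(self, node):
        self.visit_attrib_declaration(node)


# AST node classes: no semantic logic, only dispatch to the visitor.
class Symbol:
    def accept(self, visitor):
        visitor.visit_symbol(self)


class AttribDeclaration(Symbol):
    def accept(self, visitor):
        visitor.visit_attrib_declaration(self)


class StorageClassDeclaration(AttribDeclaration):
    def __init__(self, storage_class):
        self.storage_class = storage_class

    def accept(self, visitor):
        visitor.visit_storage_class_declaration(self)


# Semantic side: the operation that used to be a method on the nodes.
class NewScopeVisitor(Visitor):
    def __init__(self, scope):
        self.scope = scope
        self.result = scope  # default: the scope is passed through unchanged

    def visit_attrib_declaration(self, node):
        self.result = self.scope

    def visit_storage_class_declaration(self, node):
        # Derive a new scope instead of mutating the incoming one.
        self.result = {**self.scope, "storage_class": node.storage_class}


def new_scope(node, scope):
    """Free function wrapper so callers need not know about the visitor."""
    visitor = NewScopeVisitor(scope)
    node.accept(visitor)
    return visitor.result


print(new_scope(StorageClassDeclaration("static"), {"depth": 1}))
```

With this layout, the file that defines the nodes no longer needs to import anything from the semantic side. Only the visitor depends on the node classes.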
**First error encountered:**
```
src/dmd/dsymbolsem.d(7494): Error: function `extern (C++) Scope* dmd.dsymbolsem.newScopeVisitor.visit(Scope* sc)` does not override any function, did you mean to override alias `dmd.visitor.Visitor.visit`?
src/dmd/dsymbolsem.d(7494): Functions are the only declarations that may be overridden
```
**First commit:**
https://github.com/dlang/dmd/commit/c01f76b25b4eb210d92d0ab858dd025ee72bfc6a
**Solution**
My mentor helped me discover that the method signature in
newScopeVisitor was not exactly the same as in the base class
Visitor. An overriding method must have the same name, return
type, and parameters as the method it overrides, and mine did
not.
I fixed it by using the exact name and parameter and no return
value, because the `visit` methods in Visitor are `void`
functions (they do not return any value).
**Challenges**
I am still refactoring the code and working through new errors.
**Current commit:**
https://github.com/dlang/dmd/compare/master...dchidindu5:dmd:practice1?expand=1
https://github.com/dlang/dmd/commit/36489c94755a502f7141168ed6e006ef95339062
**Summary:**
This week was focused on building a strong theoretical foundation
in compiler design, particularly around AST nodes and semantic
analysis, while also getting acquainted with the practical
aspects of contributing to the DMD compiler project.
**Resources:**
AST
https://medium.com/basecs/leveling-up-ones-parsing-game-with-asts-d7a6fc2400ff
https://pgrandinetti.github.io/compilers/page/what-is-semantic-analysis-in-compilers/
Visitors
https://www.geeksforgeeks.org/visitor-method-design-patterns-in-c/
D language Book
http://ddili.org/ders/d.en/index.html