Notes for Nand2Tetris: Assembler
This is a note for Nand2Tetris Unit 6.
Unit 6.1
Cross compiler: a compiler that runs on one computer and produces code intended for another computer.
Basic Assembler Logic
Repeat:
- Read the next Assembly language command
- Break it into the different fields it is composed of
- Lookup the binary code for each field
- Combine these codes into a single machine language command
- Output this machine language command
Until end-of-file reached
Symbols
Symbols are used for labels and variables.
The assembler must replace symbolic names with addresses.
Introducing... Symbol table.
Allocation of variables
First we look the variable up in the table. If it is already there, we just read its address. If it is not, we allocate a memory location for it and record that address in the table.
Labels
...
Forward references
Sometimes a program jumps to a label before the label has been defined.
Possible solutions:
- Leave blank until label appears, then fix
- In a first pass, just figure out all label addresses; translate in a second pass
Unit 6.2
Assembly program elements:
- White space
  - Empty lines / indentation
  - Line comments
  - In-line comments
- Instructions
  - A-instruction
  - C-instruction
- Symbols
  - References
  - Label declarations
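As a rough illustration of how these elements might be told apart, here is a minimal sketch. The function names (`clean_line`, `classify`, `parse_elements`) are made up for this note and are not taken from the course or from my assembler. It assumes comments start with `//` and that all white space can be discarded.

```python
def clean_line(line: str) -> str:
    """Drop a // comment (line or in-line) and remove all white space."""
    code = line.split("//", 1)[0]
    return "".join(code.split())


def classify(line: str) -> str:
    """Classify an already cleaned, non-empty line."""
    if line.startswith("@"):
        return "A"        # A-instruction, possibly a symbol reference
    if line.startswith("(") and line.endswith(")"):
        return "LABEL"    # label declaration such as (LOOP)
    return "C"            # everything else is treated as a C-instruction


def parse_elements(source: str) -> list:
    """Return (kind, text) pairs, skipping empty lines and comments."""
    elements = []
    for raw in source.splitlines():
        line = clean_line(raw)
        if line:
            elements.append((classify(line), line))
    return elements


program = "// comment\n\n   @i   // in-line\n(LOOP)\n  D=M\n"
print(parse_elements(program))
```

Treating "everything else" as a C-instruction keeps the sketch short; a real assembler would probably validate the line before accepting it.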
Unit 6.3
Translating A-instruction
Translation to binary:
- If $value$ is a decimal constant, generate the equivalent 15-bit binary constant
- If $value$ is a symbol, it is handled later (see Unit 6.4)
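The decimal case can be sketched as follows. The function name `translate_a_instruction` is hypothetical, and the sketch deliberately rejects symbols, since those are dealt with separately. The leading `0` is the A-instruction op-code bit, followed by the 15-bit value.

```python
def translate_a_instruction(instruction: str) -> str:
    """Translate '@<decimal>' into a 16-bit A-instruction string."""
    value = instruction[1:]
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"not a decimal constant: {instruction}")
    number = int(value)
    if number > 0b111111111111111:  # largest 15-bit value (32767)
        raise ValueError(f"constant does not fit in 15 bits: {instruction}")
    # Op-code bit 0, then the value padded to 15 binary digits.
    return "0" + format(number, "015b")


print(translate_a_instruction("@21"))  # 0000000000010101
```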
Translating C-instruction
...
The overall assembly logic
For each instruction
- Parse the instruction: break it into its underlying fields
- A-instruction: translate the decimal value into a binary value
- C-instruction: for each field in the instruction, generate the corresponding binary code; assemble the translated binary codes into a complete 16-bit machine instruction
- Write the 16-bit instruction to the output file
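The C-instruction step could look roughly like the sketch below. The bit patterns follow the Hack format `111 a cccccc ddd jjj`, but the `COMP` table here is only a partial subset chosen for illustration, and the names `parse_c_instruction` and `translate_c_instruction` are my own, not part of any specification.

```python
# Partial comp table: the 'a' bit followed by the six 'c' bits.
COMP = {
    "0": "0101010", "1": "0111111", "-1": "0111010",
    "D": "0001100", "A": "0110000", "M": "1110000",
    "D+1": "0011111", "A+1": "0110111", "M+1": "1110111",
    "D-1": "0001110", "M-1": "1110010",
    "D+A": "0000010", "D+M": "1000010",
    "D-A": "0010011", "D-M": "1010011", "M-D": "1000111",
}
DEST = {"": "000", "M": "001", "D": "010", "MD": "011",
        "A": "100", "AM": "101", "AD": "110", "AMD": "111"}
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}


def parse_c_instruction(instruction: str) -> tuple:
    """Split 'dest=comp;jump' into its fields; dest and jump may be empty."""
    dest, _, rest = instruction.rpartition("=")
    comp, _, jump = rest.partition(";")
    return dest, comp, jump


def translate_c_instruction(instruction: str) -> str:
    """Look up each field and assemble the 16-bit machine instruction."""
    dest, comp, jump = parse_c_instruction(instruction)
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]


print(translate_c_instruction("D=A"))    # 1110110000010000
print(translate_c_instruction("0;JMP"))  # 1110101010000111
```

Storing the `a` bit together with the `c` bits in one table avoids a separate check for whether the computation uses `A` or `M`.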
Unit 6.4
Handling symbols
Symbols:
- variable symbols: represent memory locations where the programmer wants to maintain values
- label symbols: represent destinations of goto instructions
- pre-defined symbols: represent special memory locations
The Hack language specification describes 23 pre-defined symbols: ...
Translating @$preDefinedSymbol$:
Replace $preDefinedSymbol$ with its value
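A minimal sketch of this replacement, assuming the pre-defined symbols are kept in a plain dictionary. The helper names `predefined_symbols` and `translate_predefined` are hypothetical; the symbol values are those of the Hack specification (R0 to R15, the five pointer names, SCREEN and KBD).

```python
def predefined_symbols() -> dict:
    """Build the table of the 23 pre-defined Hack symbols."""
    table = {f"R{i}": i for i in range(16)}  # R0..R15 -> 0..15
    table.update({
        "SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
        "SCREEN": 16384, "KBD": 24576,
    })
    return table


def translate_predefined(instruction: str, table: dict) -> str:
    """Replace the symbol with its value and emit the A-instruction."""
    value = table[instruction[1:]]
    return "0" + format(value, "015b")


table = predefined_symbols()
print(len(table))                               # 23
print(translate_predefined("@SCREEN", table))   # 0100000000000000
```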
Translating @$label$:
...
Variable symbols:
- Any symbol XXX appearing in an assembly program which is not pre-defined and is not defined elsewhere using the (XXX) directive is treated as a $variable$
- Each variable is assigned a unique memory address, starting at 16 (specified by the Hack language)
Translating @$variableSymbol$:
- If you see it for the first time, assign a unique memory address
- Replace $variableSymbol$ with its value
Symbol table
- Initialization: Add the pre-defined symbols
- First pass: Add the label symbols
- Second pass: Add the variable symbols
To resolve a symbol, look up its value in the symbol table.
The assembly process
- Initialization
  - Construct an empty symbol table
  - Add the pre-defined symbols to the symbol table
- First pass
  - Scan the program
  - For each "instruction" of the form (XXX): add the pair (XXX, $address$) to the symbol table, where $address$ is the number of the instruction following (XXX)
- Second pass
  - Set $n$ to 16
  - Scan the entire program again; for each instruction:
    - If the instruction is @$symbol$, look up $symbol$ in the symbol table:
      - If ($symbol$, $value$) is found, use $value$ to complete the instruction's translation
      - If not found:
        - Add ($symbol$, $n$) to the symbol table
        - Use $n$ to complete the instruction's translation
        - $n$++
    - If the instruction is a C-instruction, complete the instruction's translation
    - Write the translated instruction to the output file
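The two passes can be sketched as below. This is a simplified illustration, not my actual assembler: the name `resolve_symbols` is made up, the input is assumed to be already stripped of comments and white space, and instead of emitting binary it returns symbol-free assembly so that the symbol handling stays in focus.

```python
def resolve_symbols(lines: list, predefined: dict) -> tuple:
    """Return (symbol-free instructions, final symbol table)."""
    symbols = dict(predefined)  # initialization

    # First pass: record the address of the instruction following each (XXX).
    address = 0
    for line in lines:
        if line.startswith("(") and line.endswith(")"):
            symbols[line[1:-1]] = address
        else:
            address += 1  # only real instructions occupy an address

    # Second pass: replace symbols, allocating variables from address 16.
    n = 16
    resolved = []
    for line in lines:
        if line.startswith("("):
            continue  # label declarations generate no code
        if line.startswith("@") and not line[1:].isdigit():
            symbol = line[1:]
            if symbol not in symbols:
                symbols[symbol] = n  # first time seen: new variable
                n += 1
            line = f"@{symbols[symbol]}"
        resolved.append(line)
    return resolved, symbols


program = ["@i", "M=1", "(LOOP)", "@i", "D=M", "@END", "D;JEQ",
           "@LOOP", "0;JMP", "(END)", "@END", "0;JMP"]
resolved, symbols = resolve_symbols(program, {"R0": 0})
print(resolved)
print(symbols["LOOP"], symbols["END"], symbols["i"])  # 2 8 16
```

Note that `@END` appears before `(END)` in the example. Because the first pass has already recorded the label, the second pass does not mistake it for a variable.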
Project 6
My assembler for the Hack computer, written in Python:
import argparse