Practical Compiler Construction

Whoami

  • Sreedeep CV A.K.A Cripto
  • wannabe reverse engineer
  • Compiler Designer
  • student

Why compiler Design

Learn forward engineering 

Learn C 

Become language hacker

Become Reverse engineer

types of Interpreters

Single-pass compilers

Tree-walk interpreters

Transpilers

Just-in-time compilation

The Lox Language

  • A High-Level Language

  • Dynamic typing

  • Automatic memory management

  • Closures

  • object oriented

  • Inheritance

Chunks of ByteCode

  • structurally bytecode resembles machine code
  • using “chunk” to refer to sequences of bytecode
  • each instruction has a one-byte operation code
  • 								
    									typedef struct {
    										int count;
    										int capacity;
    										uint8_t* code;
    									  } Chunk;
    								
    							

1.Allocate a new array with more capacity.

2.Copy the existing elements from the old array to the new one.

3.Store the new capacity.

4.Delete the old array.

5.Update code to point to the new array.

6.Store the element in the new array now that there is room.

7.Update the count.

Lexer or scanner

converts Source code into tokens

Basic REPL (read-evaluate-print-loop)

							

								static void repl() {
									char line[1024];
									for (;;) {
									printf("> ");
								
									if (!fgets(line, sizeof(line), stdin)) {
										printf("\n");
										break;
									}
								
									interpret(line);
									}
								}

								int main(int argc, const char* argv[]) {
									initVM();
								  
									if (argc == 1) {
									  repl();
									} else if (argc == 2) {
									  runFile(argv[1]);
									} else {
									  fprintf(stderr, "Usage: clox [path]\n");
									  exit(64);
									}
								  
									freeVM();
									return 0;
								  }
								  
							
						

A Lexer implimentation

							
									#include "common.h"
									#include "scanner.h"
									
									typedef struct {
									  const char* start;
									  const char* current;
									  int line;
									} Scanner;
									
									Scanner scanner;
									
							
						
  • The start pointer marks the beginning of the current lexeme being scanned, and current points to the current character being looked at.

A virtual Machine

  • simulated chip written in software that interprets the bytecode one instruction at a time
  • adds overhead, which is a key reason bytecode is slower than native code

						#ifndef clox_vm_h
						#define clox_vm_h
						
						#include "chunk.h"
						
						typedef struct {
						  Chunk* chunk;
						  uint8_t* ip;
						  Value stack[STACK_MAX];
						  Value* stackTop;
						
						} VM;
						
						void initVM();
						void freeVM();
						
						#endif
						  
  • its an instruction execution Machine
  • run() is the most important function in all of clox. When the interpreter executes a user’s program(90% of its time)

Compiling Expressions

  • Single-Pass Compilation

  • It parses the user’s source code to understand what it means.Then it takes that knowledge and outputs low-level instructions that produce the same semantics

  • A parser produces an AST

  • then a code generator traverses the AST and outputs target code

Parse with presidence

Hash Table

  • A heash table associates a set of keys with a set of values. Each key/value pair is an entry in the table.
  • Given a key, you can look up its corresponding value. You can add new key/value pairs and remove entries by key. If you add a new value for an existing key, it replaces the previous entry.
  • A contiguous array of buckets that you index directly into.
  • Separate chaining
  • A hash function takes some larger blob of data and “hashes” it to produce a fixed-size integer hash code whose value depends on all of the bits of the original data.

Mark-Sweep Garbage Collection

  • Marking: We start with the roots and traverse or trace through all of the objects those roots refer to. This is a classic graph traversal of all of the reachable objects. Each time we visit an object, we mark it in some way
  • Sweeping: Once the mark phase completes, every reachable object in the heap has been marked. That means any unmarked object is unreachable. We go through and free each one

Local Variables

  • Local variables are one of the most-used parts of a language. If locals are slow, everything is slow. So we want a strategy for local variables that’s as efficient as possible
  • we use stack to represent Local Variables