CSE 351
Lecture 1 - June 21
- We will cover three main topics:
- The prefix for binary is `0b`; the hex prefix is `0x`
- With $n$ binary digits we can represent $2^n$ things
- With $n$ digits of base $b$ we can represent $b^n$ things
- ASCII characters are one byte each; a byte can represent 256 characters (ASCII itself defines 128)
- lol we have so much more memory now, hello UTF encoding
Lecture 2 - June 23
- The CPU executes instructions which are also stored in memory
- CSE 351 is about data movement - how is memory stored or found?
- Base-2 is the default (although DNA/quantum computing is researched)
- We manipulate memory as chunks of bytes (even if it doesn’t need all bits - hello bools!)
- Definition “word”: an 8-byte chunk of memory
- x86-64 systems use 64-bit words
- The potential address space is 2^64 addresses
- Pointers are also stored in memory like any other data
- Big-endian: least significant byte stored at the largest address
- Little-endian: least significant byte stored at the smallest address
- Bi-endian (used on ARM/M1)
- The pointer always points to the lowest address of the data
Lecture 3 - June 26
- Pointers are declared: `type* ptr;`
- The `type` encodes size information; the `*` designates it is a pointer
- To get the address of a variable, write: `type* ptr = &var;`
- The address must be stored in a pointer
- The “dereference” operator `*` is used to “grab” the data a pointer points to
- `NULL` is a special pointer to nothing
- Addition of a pointer and a scalar scales the scalar by the size of the type
- Subtraction of two pointers gives a value scaled (divided) by the size of the type
- We can use this to find the number of `type`s between two pointers
- Arrays are sets of contiguous locations in memory that store the same type of data object
- C declaration: `type name[size]`
- Indexing actually works with: `*(array_name + n)`
- “Strings” in C are actually just arrays of chars terminated by the null character
- i.e. `char str[] = "hi"` allocates an array of size three
- The null character is `'\0'`
- not every array in C has this null character
- In C, declaration does not mean initialization: there can be garbage data
Lecture 4 - June 28
- `0` is false, everything else is true
- Bitwise operators:
- AND: `&`
- OR: `|`
- XOR: `^`
- NOT: `~`
- Logical operators: AND `&&`, OR `||`, NOT `!`
- Unsigned integers work as expected
- Signed integers use: Two’s Complement
- The weight of the most significant bit is negative
- This means `-x == ~x + 1`
- Bitmasking allows us to change our inputs as we want
Lecture 5 - June 30
- Signed/unsigned integers are represented in memory the same way
- Type casting includes:
- Changes in bit-widths
- Changes in interpretation
- Full changes in representations
- Implicit casting happens when there is a type mismatch, but a well defined conversion between the two types
- Explicit casts: `(new_type) expression`
- When casting between bit-widths, we must extend the data
- Zero extension: padding with extra zeros on the left (preserves the value)
- Sign extension: pad signed data with copies of the most significant bit (preserves sign and value)
- Operations between signed and unsigned values implicitly cast all values to unsigned representations
- Fixed-width binary arithmetic is modular
- This results in integer overflow
- Bit shifting: move all bits a specified amount in a direction, filling in what is lost
- Left shift: `x << n` fills in `n` zeros on the right
- Logical right shift: `x >> n` where `x` is unsigned, fills in zeros on the left
- Arithmetic right shift: `x >> n` where `x` is signed, fills in copies of the most significant bit on the left
- Bit shifting can be interpreted as integer multiplication or division by powers of two
- To force unsigned ints, we append `u` or `U` to the literal
Lecture 6 - July 5
- Macros in C: `#define SEARCH replace` performs textual search-and-replace before compilation
- IEEE 754 floating point encoding uses normalized scientific binary notation:
- This includes the sign, mantissa, and exponent
- 32- and 64-bit encodings for `float` and `double`
- Sign is represented using just a single bit
- Exponent is biased notation:
- The value is shifted by: $2^{w - 1} - 1$ where $w$ is the width of the exponent field
- Bias is a zero followed by all ones: `0b01...1`
- Bias is an unsigned int
- Positive is one bigger
- Mantissa:
- 32 bit encoding (float):
- first bit is sign
- next 8 bits are exponent
- last 23 bits are mantissa
- 64 bit encoding (double):
- first bit is sign
- next 11 bits are exponent
- last 52 bits are mantissa
- Steps to go from FP -> decimal:
- Append the bits of $M$ to implicit leading 1 to form the mantissa.
- Multiply the mantissa by $2^{E - \text{bias}}$
- Multiply by the sign $(-1)^S$
- Multiply out the exponent by shifting the binary point.
- Convert from binary to decimal.
- Steps to go from decimal -> FP:
- Convert decimal to binary.
- Convert binary to normalized scientific notation.
- Encode the sign as $S$ (0 or 1).
- Encode $E$ as exponent + bias in unsigned.
- Encode the 23 bits after the binary point (padding with trailing 0’s as needed) into $M$
- Precision is the number of bits in a computer word to represent a value
- Gives capacity for accuracy
- Accuracy is a measure of the difference between the true and computer representations of a value
- If the exponent and mantissa are zeros, the float is zero
- If the exponent is all ones and mantissa is zero, it is positive or negative infinity
- If the exponent is all ones and mantissa is not zero, it is not a number (`NaN`)
- The mantissa will store some value about what went wrong
- When the exponent is zero and the mantissa is not zero, we have denormalized numbers
- This fills in the gaps between 0 and a real number
- FP errors:
- Overflow: exp too large
- Underflow: exp too small
- Rounding: between possible numbers
Lecture 7 - July 7
- Instruction set architecture (ISA) is the processor spec necessary to write assembly code
- System state: all information that reflects the cumulative effect of past instructions
- Instruction set: list and format of all instructions the CPU can execute
- The effect on the system state of each instruction
- The spectrum of ISAs falls between CISC and RISC:
- CISC: complex instruction set computer, easy for programmers but harder for hardware
- RISC: reduced instruction set computer, easier for hardware but low-level programming
- x86-64 is CISC, we will only look at integral data and use “AT&T” syntax
- In x86-64 assembly, instructions are specified using a short instruction name followed by 0-3 operands
- We will primarily look at instructions with 1-2 operands
- “AT&T” syntax typically follows:
- Main types of x86-64 instructions:
- Data transfer: copy data from one place to another
- Arithmetic and logical operations
- Control flow
- Each instruction includes a “size specifier” letter appended:
- `b` for byte
- `w` for word (two bytes)
- `l` for long word (4 bytes)
- `q` for quad word (8 bytes)
- e.g. `addb` does integer addition using a byte data width
- Operands for x86-64 are one of three types:
- Immediates: constant integer data, uses `$` prefix
- Registers: name of any of the 16 general purpose registers, uses `%` prefix
- Memory: a specific address, usually dereferenced; indicated by parentheses around a `%` register
- For binary instructions (two operands), the operands can be used in any combination except:
- Immediate cannot be used as a destination operand
- No memory-to-memory operation (both operands cannot be Memory)
- A register is a location in the CPU which stores small amount of data (word size) which can be accessed quickly
- Fixed number per architecture
- x86-64 has 16 with 8-byte word size
- Referred by name, different for smaller divisions
- Some registers are reserved
- Register names referring to smaller divisions always refer to least significant bytes of the register
- `%ax` refers to the lowest two bytes of `%rax`
- When a smaller register name is used, the upper bytes are untouched except:
- Any instruction that generates a 32-bit value for a register also sets the higher bits to all zeros (x86-64)
- General way to express a Memory operand: `D(Rb,Ri,S)`
- `D`: displacement value, must be an immediate or constant
- `Rb`: base register, name of the register which acts as the “base” of our address calculation
- `Ri`: index register, name of the register whose value will be scaled and added to the base
- `S`: scale factor, scales the value in `Ri` by the specified number; must be 1, 2, 4, or 8
- The computed address is: `Reg[Rb] + Reg[Ri]*S + D` where `Reg[]` means “the value in the register specified”
- Most instructions will dereference this, giving the data in memory at that address
- Default values of the Memory operand: `D = 0`, `Reg[Rb] = 0`, `Reg[Ri] = 0`, `S = 1`
- GCC and Clang can compile down to x86-64 or ARM instruction sets
- Assembly doesn’t have “data types”, only byte amounts
- “Moving” data actually is more similar to copying data
Lecture 8 - July 10
- Calling conventions: for assembly functions
- We store the first argument in `%rdi`
- We store the second argument in `%rsi`
- The return value is stored in `%rax`
- Addressing modes:
- We can use parentheses to dereference pointers, i.e. `(%rax)`
- Remember registers can store pointers as well!
- `leaq src, dst`: load effective address
- Breaks the rules :/
- `src` is an address expression, `dst` is a register
- Sets `dst` to the address computed by the `src` expression
- Only does math, doesn’t do memory stuff
- As multiplication is an expensive operation, it is often broken down into bit shifts and additions
- `%rip` stores a pointer to the next instruction we will execute
- There are one-bit “condition codes” which are implicitly set:
- `CF`: carry flag
- `ZF`: zero flag
- `SF`: sign flag
- `OF`: overflow flag
- Compare `cmpq a, b` sets condition codes by computing `b - a`
- `testq a, b` does the same, setting codes with `a & b`
- `testq a, a` allows us to test based on an existing value
- `jmp` is unconditional
- `set*` sets the lowest byte of a register to 0 or 1 based on current condition codes
Lecture 9 - July 12
- Extension instructions: `movz` and `movs`
- Similar to the `mov` command, but the source operand is shorter than the destination operand
- Perform zero extension (`movz`) and sign extension (`movs`)
- Requires two width specifier suffixes
- Conditionals are typically made up of two instructions:
- Logical instruction that sets/changes the condition codes (`cmp`, `test`)
- A conditional jump instruction (`j*`)
- Operand to a jump instruction is a label
- Name followed by a colon, i.e. `main:`
- Labels are symbolic, representing the location of the next instruction found
- Loops are made by using a jump statement to go backwards to beginning of loop body
- Conditionals can be at the top or bottom of the loop
- Switch statement branches condition depending on a value
- Need to `break` to prevent falling through cases
- Jump table has an “array” of pointers to code
- Those code blocks are contiguous
- `ja`: jump to the default case (unsigned comparison, so negative indices also land there)
- `jmp *.L4(,%rdi,8)`: indexes the jump table starting at `.L4`
Lecture 10/11 - July 17
- The stack is a section of memory which takes up the highest possible address space
- Holds local variables and procedural context
- It grows downwards, occupying lower addresses as it needs more space
- The end of the stack is indicated by the address stored in the stack pointer `%rsp`
- The stack pointer can be modified by `subq` and `addq`: allocation and deallocation
- This doesn’t affect any data stored in memory
- The stack pointer can also be modified with `push src` and `pop dst`
- `push` decrements `%rsp`, then copies data from the source to `(%rsp)`
- `pop` copies data from `(%rsp)` to the destination, then increments `%rsp`
- Calling conventions: ensure procedures can pass data and control to one another
- We store a return address on the stack: a pointer to what the caller should execute next
- To pass control to another procedure we use the `call label` instruction
- To return control we use the `ret` instruction
- To pass arguments we must use specific registers
- “Diane’s Silk Dress Cost $89” is a mnemonic for the argument registers (`%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`)
- All other arguments must be placed on the stack in reverse order
- The return value must be placed in `%rax`
- This could be a pointer to the return value
- Stack frames are a way of enabling recursion, dividing our stack into chunks
- Each of these frames are for only one call
- Stack frame layout:
- return address from `call`
- (optional) `%rbp` frame pointer, indicator of the beginning of the current frame
- Old register values need to be pushed, followed by local variables
- If this calls another procedure, more register values and arguments 7+
- As any procedure can overwrite registers, we have register saving conventions
- Callee-saved registers: it is the callee’s responsibility to save the old value before using the register
- Restores the old value before returning
- Caller-saved registers: it is the responsibility of the caller to save the old value before passing control
Lecture 12 - July 19
- A buffer is a region of memory used to temporarily store data
- A buffer overflow is writing past the end of the buffer into adjacent memory
- Stack smashing is writing past the end of a local array in the stack
- We can write back enough in the frame to overwrite the return address
- `gets()` can write user input past the end of an array
- Code injection:
- Write exploit code/binary and put it at the start of the buffer
- Pad to the end of the stack frame
- Replace return address with buffer address (exploit code)
- Good practice when exploiting someone’s code: don’t mess with past stack frames, so normal operation can resume
Lecture 13 - July 21
- Process of building and running an executable has four phases - CALL:
- Compiling
- Assembling
- Linking
- Loading
- Building an executable is the first three phases
- Compiling:
- Compiler translates a text file of source code to a text file of assembly
- In C, a preprocessor step runs commands that start with `#`
- Interpret the meaning of the code and convert to assembly
- We can supply compiler optimization flags to `gcc` to change certain attributes
- Assembly:
- Assembler converts a text file of assembly code to a binary object file
- These contain object code which is incomplete machine code
- The object code is incomplete because it lacks the addresses associated with the labels of the final executable
- In order to patch the object code, we create two tables:
- Symbol table: list of globally-accessible labels
- Relocation table: list of the addresses to be patched
- Linking:
- The linker stitches together the object and static library files necessary to build the final executable
- Links all the symbol and relocation tables
- “Unresolved symbols” (i.e. undefined function) will lead to failure
- Disassembling:
- Reading binary code and interpreting it as instructions is disassembling
- `ghidra` my beloved
- Loading:
- The loader takes an executable and starts a running process
- Sets up memory sections and initializes register values
- Primarily handled by the OS
- Arrays in assembly:
- The name of an array `arr` in C is turned into a placeholder for the starting address of the array
- Array subscripting (`arr[idx]`) in assembly is `D(,Ri,S)` with `D` the array name/label, `Reg[Ri]` the index, and `S = sizeof(T)`
- Multidimensional arrays are a contiguous chunk of memory for all arrays
- Accessing an element is an address calculation followed by a single memory access
- Multilevel arrays do not require a large contiguous block of memory, but extra memory accesses
- Stores pointers vs arrays, allows variable array sizes
Lecture 14 - July 24
- A struct in C is a user-defined, structured group of variables
- Defining a struct: `struct struct_tag {type1 field1; ...};`
- Declaring a struct variable instance: `struct struct_tag my_struct_var;`
- The initial `struct` does not need to be written if you typedef it
- It is also possible to declare and initialize a struct at once; use `*ptr` to declare a pointer
- A struct name can (typically) only be used after it was defined and in scope
- Used to allow compiler or program to understand the size and layout of the struct
- A struct can have fields of any type except itself (but can have pointers to itself)
- Fields are accessed with the `.` operator, or `->` from a pointer
- As two-word data type names are unwieldy, we often combine them with `typedef`:
- i.e. `typedef unsigned int uint;`
- This can be done with structs after definition or while defining: `typedef struct {type1 field1; ...} struct_tag;`
- A primitive object of `K` bytes in memory is aligned if its address % `K` == 0
- The struct layout follows the ordering of the fields with padding inserted such that each field is aligned
- The padding (unused space) is known as internal fragmentation
- The overall size of the struct also needs to be aligned: `overall_size % K_max == 0` where `K_max` is the size of the largest type in the fields
- This padding is called external fragmentation
Lecture 15 - July 26
- IEC prefixes (Ki, Mi, Gi, ...) use base 1024 and refer to large numbers in computing
- Caches ($) are intermediate places of storing memory for a short time
- Data is transferred between caches and memory in blocks, machine-specified units larger than a word
- When accessing memory, the machine checks the cache first:
- If the data is in the cache, that is a cache hit
- Otherwise it must be fetched from memory and stored in the cache, a cache miss
- Placement policy determines where the block is placed
- Replacement policy determines what memory is removed from the cache
- The principle of locality: programs tend to use data at addresses near to what was used recently
- Temporal locality: referenced items are likely to be referenced in the near future
- Spatial locality: items with nearby addresses tend to be referenced close together in time
- Cache performance metrics:
- Hit Time: how long a cache hit takes
- Miss Penalty: how long it takes to fetch a block of data from memory
- Hit/Miss Rate: the fraction of memory accesses that result in a hit or miss respectively
- Average Memory Access Time (AMAT) = `hit time + miss rate * miss penalty`
- There are typically multiple levels of caching
- Some levels are closer (quicker) to the CPU
- Block size might differ between cache levels
- AMAT is calculated for overall system caching
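A worked AMAT example under assumed numbers (1-cycle hit time, 5% miss rate, 100-cycle miss penalty — all hypothetical):

```latex
\text{AMAT} = \text{hit time} + \text{miss rate} \times \text{miss penalty}
            = 1 + 0.05 \times 100 = 6 \text{ cycles}
```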
Lecture 16 - July 28
- Caches are one of the quickest parts of the memory hierarchy
- Cache size is the capacity of the cache, always a multiple of the block size
- The block number of address `A` is simply: `A / K` (integer division)
- The block offset of `A` is: `A % K`
- Direct-mapped placement strategy: hash and index the blocks
- Higher types of memory on the hierarchy cost more (money)
- Caches act as caches for each other on different levels
- On a CPU, each core has a d-cache and i-cache
- These are L1 level
- d-cache is for data, what we typically focus on
- i-cache is for instructions, fetched via `%rip`
- We then have a L2 unified cache between the two
- Below, we have a L3 unified cache between all the different cores
- We hash addresses for caches
- The least significant bits from the address map to the cache
- As we overwrite the cache blocks with new ones, we need a way to determine which memory value it is from
- We use the leading (remaining) bits from the address and store them in the tag field
- The tag field does not count toward the cache’s stated data capacity
- TIO address breakdown:
- Index: tells you where to look in cache
- Tag: checks if the block is the correct block
- Offset: finds the byte you’re looking for
Lecture 17 - July 31
- Repeating patterns of block access which are offset can lead to no relevant data being stored in cache
- Associativity: we can now have an arbitrary number of blocks per set
- Kinda like a fixed-length linked list hashmap
- Direct-mapped caches are $E = 1$
- We need a valid bit to ensure that the data is not randomly initialized
- When replacing data in a cache, we typically replace the least recently used (LRU)
- In hardware, this is often approximated as “not most recently used”
- Three types of misses:
- Compulsory miss: first access to a block
- Conflict miss: multiple data objects map to the same slot
- Capacity miss: the active set of blocks of memory is larger than the cache
Lecture 18 - August 2
- There is the issue of writing to memory with caches
- Write-hit: data already in cache
- Write-through: write immediately to next level
- Write-back: defer write to next level until line is evicted
- Requires us to keep track of what has been modified, using the dirty bit
- Write-miss: data not in cache:
- Write allocate: load into cache, then execute the write-hit policy
- No-write allocate: simply write to next level immediately
- Caches are incredibly important for matmuls
- We can block the matrices into sub-matrices and matmul them - see linalg textbook
- We want to make sure that $3r^2 < C$: three $r \times r$ blocks must fit in a cache of size $C$
- Always use a reasonably small working set of data (spatial locality)
Lecture 19 - August 4
- Static global data: fixed size at compile time, exists for the entire runtime
- Stack allocated data: size known at runtime, no resizing
- It exists only for the lifetime of a function
- Dynamically allocated data: between the two extremes
- Use `malloc` and `free` to do so
- Explicit allocation: programmer allocates and frees space
- C or other low-level languages
- Implicit allocator: programmer only allocates space
- Java, Python, high-level languages
- `%brk` points to the top of the heap, which grows up
- When we need more space, we increment `%brk`
- `malloc`
- Returns a `void*`: an arbitrary pointer
- Takes in a `size_t size`, an integer representing an amount of memory in bytes
- Returns a pointer to allocated memory, or `NULL` if it failed
- Good practice is: `(int*) malloc(n*sizeof(int));`
- `calloc`: `malloc`, but zeros all of the data
- `realloc`: changes the size of a previously allocated block
- `sbrk`: function to increase or shrink the heap (change `%brk`)
- `free`
- Takes in a pointer obtained by `malloc`
- Returns the space to the heap
- No action occurs if you call `free` on `NULL`
- `malloc` allocates in words of 8 bytes
- Internal fragmentation happens when payload is smaller than the block
- External fragmentation: unused/unallocated memory between blocks in the heap
- We keep a header field for all allocations
- This stores the size of the block in a word
- As all of the sizes are multiples of 8, the last three bits of the size are always zero
- Thus, we can use the last bit to store if it is free or not
Lecture 20 - August 7
- x86 typically uses 16-byte alignment
- One-word marker is the end of the list: size zero but allocated
- First fit allocation:
- Start at the beginning of the list, then jump until we have a free block of correct size
- Next fit allocation:
- Start the search where the previous search finished
- Best fit allocation:
- Search through the entire list, return the free block with closest size
- When we allocate in a free block with excess space, we need to split it
- This means we write a correct size header for the remaining free block
- When freeing data, we could just zero the last bit
- This leads to the problem of false fragmentation
- Instead, whenever we free a block we coalesce with the next free blocks
- This works easily with the next free blocks
- To coalesce backwards, we store the size in a word after the payload as well
- Explicit free lists:
- Allocated block stays the same
- Free blocks store a next and previous pointer in memory, where the payload would be
- This makes up a doubly-linked list
- The blocks are not necessarily sequentially linked in memory
- Explicit free list allocation policies:
- LIFO: insert block at the beginning of the free list
- Simple and constant time performance
- Bad caching performance
- Address-ordered policy: insert free list blocks in address order
- Requires linear time search
- Improved caching performance
- We don’t need the preceding boundary tag if that block is not free
- We can skip the tag by keeping an extra flag: “is the preceding block free?”
- Comparisons between explicit and implicit free lists:
- Explicit free lists are faster when most of the memory is full
- Explicit free lists are more challenging to implement
- Implicit free lists require fewer pointers, leading to a smaller minimum block size
- Explicit free lists are most commonly used with segregated free lists
- These are different lists of multiple sizes or object types
- Minimum block size:
- When we split a block, we need to know if the leftover is worth it
- For an explicit free list, we need 32 bytes
- 4 words which are pointers and sizes for free blocks
Lecture 21 - August 9
- We use exceptional control flow to manage multiple processes at a time and low-level mechanisms
- Exceptions (hardware)
- Signals (OS)
- Typical control flow is sequential and reacts only to program state
- This can’t handle modern challenges, i.e. system interrupts or program stalling
- Kernel: a core component of an operating system, privileged instructions
- The lowest layer and security mechanism
- For managing exceptions, we use an exception table
- This classifies different exceptions into their behaviors/responses
- Asynchronous exceptions:
- Interrupts: caused by events external to the processor
- e.g. Timer interrupt, an interrupt allowing the OS to take back control from programs
- Synchronous exceptions:
- Traps:
- Intentional way to transfer control to the OS to perform a function
- E.g. syscalls
- Returns control to the next instruction
- Faults:
- Unintentional but can be recovered
- E.g. divide-by-zero
- Either re-executes the instruction or aborts
- Aborts:
- Unintentional and unrecoverable
- E.g. hardware failure
- Aborts the current program
- Each program gets its own process, an illusion that a processor is entirely for itself
- A process is an interface between the program and hardware
- Programs get their own logical control flow
- It gives programs a private address space
- The OS tells each processor when to switch between programs
- To switch between processes, we need to store execution context
- This is context switching
- It saves current registers, schedules the next process, and loads the saved registers and address space
- Two processes run concurrently if their instructions run at the same time
- Kernel code manages processes as part of a user process, switching between them
- In Linux, when creating a new process:
- First, the current process is `fork()`‘ed
- `fork()` creates a copy of the current process
- It returns 0 to the child process and the child’s process ID (PID) to the parent
- Second, `exec*()` is called on the new process from the fork
- `exec*()` replaces the current process’ code and address space with those for another program
- A process graph is a way to visualize executions and processes
Lecture 22 - August 11
- `exec*()` functions load a fresh instance of the specified program - the “Loader” from “CALL”
- Execution context:
- Data
- Static data and executables are copied from the program executable
- Code
- Copied from the program executable
- Heap
- Empty: no memory has been requested yet
- Stack
- Set up the `main` stack frame and command-line arguments
- Registers
- `main` takes two inputs, passed in `%rdi` and `%rsi`
- `%rip` and `%rsp` need to point to `main` and the stack
- Ending a process - three typical ways:
- Executing a `return` statement from `main`
- Calling the library function `exit()`
- From an abort by the exception handler
- 1 and 2 provide a status code that can be read by the parent process
- A terminated process must be reaped to read the status code and delete resources
- A process which is terminated but not yet reaped is a zombie process
- Reaping is done by the parent process and can be done explicitly or implicitly
- Explicitly: using `wait()` or `waitpid()` calls
- Implicitly when the parent process terminates
- When a parent process terminates before a child process, responsibility for reaping is given to `init` or `systemd`
- Abstracting memory for specific programs is known as virtual memory
- Virtual address space: the amount of bytes a process thinks is available
- Physical address space: the amount of bytes that is physically available on RAM
- There is also extra space allocated on disk, called swap space for excess temporary memory
- The computer needs to map the virtual address space into physical address spaces
Lecture 23 - August 14
- The memory management unit (MMU) maps virtual addresses to physical addresses
- An n-bit CPU generated address is split into virtual page number and page offset
- The page offset is the same in the physical and virtual address
- We use a page table lookup table
- We replace virtual page numbers with physical page numbers to generate physical addresses
- PTE: page table entry (PPN + management bits)
- Page tables store PPN and a valid bit (is the page in RAM?)
- If the valid bit is zero and the data is null, it is not mapped at all
- If it is zero and data is not null, the page is stored in disk (or other virtual memory)
- Each process has its own page table
- When we switch between processes, we also switch the page table (along with other parts of memory)
- Page hit: VM reference is in physical memory
- Page fault: VM reference is not in physical memory
- There are also read, write, and execute bits for the data
Lecture 24 - August 16