CSE 351
Lecture 1 - June 21
- We will cover three main topics:
- The prefix for binary is `0b`; the hex prefix is `0x`
- With $n$ binary digits we can represent $2^n$ things
- With $n$ digits of base $b$ we can represent $b^n$ things
- ASCII characters are one byte each; a byte can represent 256 characters (ASCII itself defines 128)
- lol we have so much more memory now, hello UTF encoding
Lecture 2 - June 23
- The CPU executes instructions which are also stored in memory
- CSE 351 is about data movement - how is memory stored or found?
- Base-2 is the default (although DNA/quantum computing is researched)
- We manipulate memory as chunks of bytes (even if it doesn’t need all bits - hello bools!)
- Definition “word”: an 8-byte chunk of memory
- x86-64 systems use 64-bit words
- The potential address space is 2^64 addresses
- Pointers are also stored in memory like any other data
- Big-endian: least significant byte stored at the largest address
- Little-endian: least significant byte stored at the smallest address
- Bi-endian (used on ARM/M1)
- The pointer always points to the lowest address of the data
Lecture 3 - June 26
- Pointers are declared: `type* ptr;`
- The `type` encodes size information; the `*` designates it is a pointer
- To get the address of a variable, write: `type* ptr = &var;`
- The address must be stored in a pointer
- The “dereference” operator `*` is used to “grab” the data a pointer points to
- `NULL` is a special pointer to nothing
- Addition of a pointer and a scalar scales the scalar by the size of the type
- Subtraction of two pointers gives a value scaled (divided) by the size of the type
- We can use this to find the number of `type`s between two pointers
- Arrays are sets of contiguous locations in memory that store the same type of data object
- C declaration: `type name[size]`
- Indexing actually works with: `*(array_name + n)`
- “Strings” in C are actually just arrays of chars terminated by the null character
- i.e. `char str[] = "hi"` allocates an array of size three
- The null character is `'\0'`
- not every array in C has this null character
- In C, declaration does not mean initialization: there can be garbage data
Lecture 4 - June 28
- `0` is false, everything else is true
- Bitwise operators:
- AND: `&`
- OR: `|`
- XOR: `^`
- NOT: `~`
- Logical operators: AND `&&`, OR `||`, NOT `!`
- Unsigned integers work as expected
- Signed integers use: Two’s Complement
- The weight of the most significant bit is negative
- This means `-x == ~x + 1`
- Bitmasking allows us to change our inputs as we want
Lecture 5 - June 30
- Signed/unsigned integers are represented in memory the same way
- Type casting includes:
- Changes in bit-widths
- Changes in interpretation
- Full changes in representations
- Implicit casting happens when there is a type mismatch, but a well defined conversion between the two types
- Explicit casts: `(new_type) expression`
- When casting between bit-widths, we must extend the data
- Zero extension: padding with extra zeros on the left (preserves the value)
- Sign extension: pad signed data with copies of the most significant bit (preserves sign and value)
- Operations between signed and unsigned values implicitly cast all values to unsigned representations
- Fixed-width binary arithmetic is modular
- This results in integer overflow
- Bit shifting: move all bits a specified amount in a direction, filling in what is lost
- Left shift: `x << n` fills in `n` zeros on the right
- Logical right shift: `x >> n` where `x` is unsigned, fills in zeros on the left
- Arithmetic right shift: `x >> n` where `x` is signed, fills in copies of the most significant bit on the left
- Bit shifting can be interpreted as integer multiplication or division by powers of two
- To force unsigned ints, we append `u` or `U` to the literal
Lecture 6 - July 5
- Macros in C: `#define SEARCH replace` performs textual search-and-replace before compilation
- IEEE 754 floating point encoding uses normalized scientific binary notation:
- This includes the sign, mantissa, and exponent
- 32- and 64-bit encodings for `float` and `double`
- Sign is represented using just a single bit
- Exponent is biased notation:
- The value is shifted by: $2^{w - 1} - 1$ where $w$ is the width of the exponent field
- Bias is a zero followed by all ones: `0b01...1`
- Bias is an unsigned int
- Positive is one bigger
- Mantissa:
- 32 bit encoding (float):
- first bit is sign
- next 8 bits are exponent
- last 23 bits are mantissa
- 64 bit encoding (double):
- first bit is sign
- next 11 bits are exponent
- last 52 bits are mantissa
- Steps to go from FP -> decimal:
- Append the bits of $M$ to implicit leading 1 to form the mantissa.
- Multiply the mantissa by $2^{E - \text{bias}}$
- Multiply by the sign $(-1)^S$
- Multiply out the exponent by shifting the binary point.
- Convert from binary to decimal.
- Steps to go from decimal -> FP:
- Convert decimal to binary.
- Convert binary to normalized scientific notation.
- Encode the sign as $S$ (0 or 1).
- Encode $E$ as exponent + bias in unsigned.
- Encode the 23 bits after the binary point (padding with trailing 0’s as needed) into $M$
- Precision is the number of bits in a computer word to represent a value
- Gives capacity for accuracy
- Accuracy is a measure of the difference between the true and computer representations of a value
- If the exponent and mantissa are zeros, the float is zero
- If the exponent is all ones and mantissa is zero, it is positive or negative infinity
- If the exponent is all ones and mantissa is not zero, it is not a number (`NaN`)
- The mantissa will store some value about what went wrong
- When the exponent is zero and the mantissa is not zero, we have denormalized numbers
- This fills in the gaps between 0 and a real number
- FP errors:
- Overflow: exp too large
- Underflow: exp too small
- Rounding: between possible numbers
Lecture 7 - July 7
- Instruction set architecture (ISA) is the processor spec necessary to write assembly code
- System state: all information that reflects the cumulative effect of past instructions
- Instruction set: list and format of all instructions the CPU can execute
- The effect on the system state of each instruction
- The spectrum of ISAs falls between CISC and RISC:
- CISC: complex instruction set computer, easy for programmers but harder for hardware
- RISC: reduced instruction set computer, easier for hardware but low-level programming
- x86-64 is CISC, we will only look at integral data and use “AT&T” syntax
- In x86-64 assembly, instructions are specified using a short instruction name followed by 0-3 operands
- We will primarily look at instructions with 1-2 operands
- “AT&T” syntax typically follows:
- Main types of x86-64 instructions:
- Data transfer: copy data from one place to another
- Arithmetic and logical operations
- Control flow
- Each instruction includes a “size specifier” letter appended:
- `b` for byte
- `w` for word (two bytes)
- `l` for long word (4 bytes)
- `q` for quad word (8 bytes)
- e.g. `addb` does integer addition using a byte data width
- Operands for x86-64 are one of three types:
- Immediates: constant integer data, uses `$` prefix
- Registers: name of any of the 16 general purpose registers, uses `%` prefix
- Memory: a specific address, usually dereferenced; indicated by parentheses around a `%` register
- For binary instructions (two operands), the operands can be used in any combination except:
- Immediate cannot be used as a destination operand
- No memory-to-memory operation (both operands cannot be Memory)
- A register is a location in the CPU which stores small amount of data (word size) which can be accessed quickly
- Fixed number per architecture
- x86-64 has 16 with 8-byte word size
- Referred by name, different for smaller divisions
- Some registers are reserved
- Register names referring to smaller divisions always refer to least significant bytes of the register
- `%ax` refers to the lowest two bytes of `%rax`
- When a smaller register name is used, the upper bytes are untouched except:
- Any instruction that generates a 32-bit value for a register also sets the higher bits to all zeros (x86-64)
- General way to express a Memory operand: `D(Rb,Ri,S)`
- `D`: displacement value, must be an immediate or constant
- `Rb`: base register, name of the register which acts as the “base” of our address calculation
- `Ri`: index register, name of the register whose value will be scaled and added to the base
- `S`: scale factor, scales the value in `Ri` by the specified number; must be 1, 2, 4, or 8
- The computed address is: `Reg[Rb] + Reg[Ri]*S + D` where `Reg[]` means “the value in the register specified”
- Most instructions will dereference this, giving the data in memory at that address
- Default values of the Memory operand: `D = 0`, `Reg[Rb] = 0`, `Reg[Ri] = 0`, `S = 1`
- GCC and Clang can compile down to x86-64 or ARM instruction sets
- Assembly doesn’t have “data types”, only byte amounts
- “Moving” data actually is more similar to copying data
Lecture 8 - July 10
- Calling conventions: for assembly functions
- We store the first argument in `%rdi`
- We store the second argument in `%rsi`
- The return value is stored in `%rax`
- Addressing modes:
- We can use parentheses to dereference pointers, i.e. `(%rax)`
- Remember registers can store pointers as well!
- `leaq src, dst`: load effective address
- Breaks the rules :/
- `src` is an address expression, `dst` is a register
- Sets `dst` to the address computed by the `src` expression
- Only does math, doesn’t do memory stuff
- As multiplication is an expensive operation, it is often broken down into bit shifts and additions
- `%rip` stores a pointer to the next instruction we will execute
- There are one-bit “condition codes” which are implicitly set:
- `CF`: carry flag
- `ZF`: zero flag
- `SF`: sign flag
- `OF`: overflow flag
- Compare `cmpq a, b` sets condition codes by computing `b - a`
- `testq a, b` does the same, setting codes with `a & b`
- `testq a, a` allows us to test based on an existing value
- `jmp` is unconditional
- `set*` sets the lowest byte of a register to 0 or 1 based on current condition codes
Lecture 9 - July 12
- Extension instructions: `movz` and `movs`
- Similar to the `mov` command, but the source operand is shorter than the destination operand
- Perform zero extension (`movz`) and sign extension (`movs`)
- Requires two width specifier suffixes
- Conditionals are typically made up of two instructions:
- Logical instruction that sets/changes the condition codes (`cmp`, `test`)
- A conditional jump instruction (`j*`)
- Operand to a jump instruction is a label
- Name followed by a colon, i.e. `main:`
- Labels are symbolic, representing the location of the next instruction found
- Loops are made by using a jump statement to go backwards to beginning of loop body
- Conditionals can be at the top or bottom of the loop
- Switch statement branches condition depending on a value
- Need to `break` to prevent falling through cases
- Jump table has an “array” of pointers to code
- Those code blocks are contiguous
- `ja`: jump to the default case (unsigned comparison, so negative indices also land there)
- `jmp *.L4(,%rdi,8)`: indexes the jump table starting at `.L4`
Lecture 10/11 - July 17
- The stack is a section of memory which takes up the highest possible address space
- Holds local variables and procedural context
- It grows downwards, occupying lower addresses as it needs more space
- The end of the stack is indicated by the address stored in the stack pointer `%rsp`
- The stack pointer can be modified by `subq` and `addq`: allocation and deallocation
- This doesn’t affect any data stored in memory
- The stack pointer can also be modified with `push src` and `pop dst`
- `push` decrements `%rsp`, then copies data from the source to `(%rsp)`
- `pop` copies data from `(%rsp)` to the destination, then increments `%rsp`
- Calling conventions: ensure procedures can pass data and control to one another
- We store a return address on the stack: a pointer to what the caller should execute next
- To pass control to another procedure we use the `call label` instruction
- To return control we use the `ret` instruction
- To pass arguments we must use specific registers
- “Diane’s Silk Dress Cost $89” is a mnemonic for the argument registers (`%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`)
- All other arguments must be placed on the stack in reverse order
- The return value must be placed in `%rax`
- This could be a pointer to the return value
- Stack frames are a way of enabling recursion, dividing our stack into chunks
- Each of these frames are for only one call
- Stack frame layout:
- return address from `call`
- (optional) `%rbp` frame pointer, indicator of the beginning of the current frame
- Old register values need to be pushed, followed by local variables
- If this calls another procedure, more register values and arguments 7+
- As any procedure can overwrite registers, we have register saving conventions
- Callee-saved registers: it is the callee’s responsibility to save the old value before using the register
- Restores the old value before returning
- Caller-saved registers: it is the responsibility of the caller to save the old value before passing control
Lecture 12 - July 19
- A buffer is a region of memory used to temporarily store data
- A buffer overflow is writing past the end of the buffer into adjacent memory
- Stack smashing is writing past the end of a local array in the stack
- We can write back enough in the frame to overwrite the return address
- `gets()` can write user input past the end of an array
- Code injection:
- Write exploit code/binary and put it at the start of the buffer
- Pad to the end of the stack frame
- Replace return address with buffer address (exploit code)
- Good practice when exploiting someone’s code: don’t mess with past stack frames, so normal operation can resume
Lecture 13 - July 21
- Process of building and running an executable has four phases - CALL:
- Compiling
- Assembling
- Linking
- Loading
- Building an executable is the first three phases
- Compiling:
- Compiler translates a text file of source code to a text file of assembly
- In C, a preprocessor step runs commands that start with `#`
- Interpret the meaning of the code and convert to assembly
- We can supply compiler optimization flags to `gcc` to change certain attributes
- Assembly:
- Assembler converts a text file of assembly code to a binary object file
- These contain object code which is incomplete machine code
- The object code is incomplete because it lacks the addresses associated with the labels of the final executable
- In order to patch the object code, we create two tables:
- Symbol table: list of globally-accessible labels
- Relocation table: list of the addresses to be patched
- Linking:
- The linker stitches together the object and static library files necessary to build the final executable
- Links all the symbol and relocation tables
- “Unresolved symbols” (i.e. undefined function) will lead to failure
- Disassembling:
- Reading binary code and interpreting it as instructions is disassembling
- `ghidra` my beloved
- Loading:
- The loader takes an executable and starts a running process
- Sets up memory sections and initializes register values
- Primarily handled by the OS
- Arrays in assembly:
- The name of an array `arr` in C is turned into a placeholder for the starting address of the array
- Array subscripting (`arr[idx]`) in assembly is `D(,Ri,S)` with `D` the array name/label, `Reg[Ri]` the index, and `S = sizeof(T)`
- Multidimensional arrays are a contiguous chunk of memory for all arrays
- Accessing an element is an address calculation followed by a single memory access
- Multilevel arrays do not require a large contiguous block of memory, but extra memory accesses
- Stores pointers vs arrays, allows variable array sizes
Lecture 14 - July 24
- A struct in C is a user-defined, structured group of variables
- Defining a struct: `struct struct_tag {type1 field1; ...};`
- Declaring a struct variable instance: `struct struct_tag my_struct_var;`
- The initial `struct` does not need to be written if you typedef it
- It is also possible to declare and initialize a struct at once; use `*ptr` to declare a pointer
- A struct name can (typically) only be used after it was defined and in scope
- Used to allow compiler or program to understand the size and layout of the struct
- A struct can have fields of any type except itself (but can have pointers to itself)
- Fields are accessed with the `.` operator, or `->` from a pointer
- As two-word data type names are unwieldy, we often combine them with `typedef`:
- i.e. `typedef unsigned int uint;`
- This can be done with structs after definition or while defining: `typedef struct {type1 field1; ...} struct_tag;`
- A primitive object of `K` bytes in memory is aligned if its address % `K` == 0
- The struct layout follows the ordering of the fields with padding inserted such that each field is aligned
- The padding (unused space) is known as internal fragmentation
- The overall size of the struct also needs to be aligned: `overall_size % K_max == 0` where `K_max` is the size of the largest type in the fields
- This padding is called external fragmentation
Lecture 15 - July 26
- IEC prefixes (Ki, Mi, Gi, ...) use base 1024 and refer to large numbers in computing
- Caches ($) are intermediate places of storing memory for a short time
- Data is transferred between caches and memory in blocks, machine-specified units larger than a word
- When accessing memory, the machine checks the cache first:
- If the data is in the cache, that is a cache hit
- Otherwise it must be fetched from memory and stored in the cache, a cache miss
- Placement policy determines where the block is placed
- Replacement policy determines what memory is removed from the cache
- The principle of locality: programs tend to use data at addresses near to what was used recently
- Temporal locality: referenced items are likely to be referenced in the near future
- Spatial locality: items with nearby addresses tend to be referenced close together in time
- Cache performance metrics:
- Hit Time: how long a cache hit takes
- Miss Penalty: how long it takes to fetch a block of data from memory
- Hit/Miss Rate: the fraction of memory accesses that result in a hit or miss respectively
- Average Memory Access Time (AMAT) = `hit time + miss rate * miss penalty`
- There are typically multiple levels of caching
- Some levels are closer (quicker) to the CPU
- Block size might differ between cache levels
- AMAT is calculated for overall system caching
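A worked AMAT example under assumed numbers (1-cycle hit time, 5% miss rate, 100-cycle miss penalty — all hypothetical):

```latex
\text{AMAT} = \text{hit time} + \text{miss rate} \times \text{miss penalty}
            = 1 + 0.05 \times 100 = 6 \text{ cycles}
```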
Lecture 16 - July 28
- Caches are one of the quickest parts of the memory hierarchy
- Cache size is the capacity of the cache, always a multiple of the block size
- The block number of address `A` is simply: `A / K` (integer division)
- The block offset of `A` is: `A % K`
- Direct-mapped placement strategy: hash and index the blocks
- Higher types of memory on the hierarchy cost more (money)
- Caches act as caches for each other on different levels
- On a CPU, each core has a d-cache and i-cache
- These are L1 level
- d-cache is for data, what we typically focus on
- i-cache is for instructions, fetched via `%rip`
- We then have a L2 unified cache between the two
- Below, we have a L3 unified cache between all the different cores
- We hash addresses for caches
- The least significant bits from the address map to the cache
- As we overwrite the cache blocks with new ones, we need a way to determine which memory value it is from
- We use the leading (remaining) bits from the address and store them in the tag field
- The tag field does not count toward the cache’s stated data capacity
- TIO address breakdown:
- Index: tells you where to look in cache
- Tag: checks if the block is the correct block
- Offset: finds the byte you’re looking for
Lecture 17 - July 31
- Repeating patterns of block access which are offset can lead to no relevant data being stored in cache
- Associativity: we can now have an arbitrary number of blocks per set
- Kinda like a fixed-length linked list hashmap
- Direct-mapped caches are $E = 1$
- We need a valid bit to ensure that the data is not randomly initialized
- When replacing data in a cache, we typically replace the least recently used (LRU)
- In hardware, this is often approximated as “not most recently used”
- Three types of misses:
- Compulsory miss: first access to a block
- Conflict miss: multiple data objects map to the same slot
- Capacity miss: the active set of blocks of memory is larger than the cache
Lecture 18 - August 2
- There is the issue of writing to memory with caches
- Write-hit: data already in cache
- Write-through: write immediately to next level
- Write-back: defer write to next level until line is evicted
- Requires us to keep track of what has been modified, using the dirty bit
- Write-miss: data not in cache:
- Write allocate: load into cache, then execute the write-hit policy
- No-write allocate: simply write to next level immediately
- Caches are incredibly important for matmuls
- We can block the matrices into sub-matrices and matmul them - see linalg textbook
- We want to make sure that $3r^2 < C$: three $r \times r$ blocks must fit in a cache of size $C$
- Always use a reasonably small working set of data (spatial locality)
Lecture 19 - August 4
- Static global data: fixed size at compile time, exists for the entire runtime
- Stack allocated data: size known at runtime, no resizing
- It exists only for the lifetime of a function
- Dynamically allocated data: between the two extremes
- Use `malloc` and `free` to do so
- Explicit allocation: programmer allocates and frees space
- C or other low-level languages
- Implicit allocator: programmer only allocates space
- Java, Python, high-level languages
- `%brk` points to the top of the heap, which grows up
- When we need more space, we increment `%brk`
- `malloc`
- Returns a `void*`: an arbitrary pointer
- Takes in a `size_t size`, an integer representing an amount of memory in bytes
- Returns a pointer to allocated memory, or `NULL` if it failed
- Good practice is: `(int*) malloc(n*sizeof(int));`
- `calloc`: `malloc`, but zeros all of the data
- `realloc`: changes the size of a previously allocated block
- `sbrk`: function to increase or shrink the heap (change `%brk`)
- `free`
- Takes in a pointer obtained by `malloc`
- Returns the space to the heap
- No action occurs if you call `free` on `NULL`
- `malloc` allocates in words of 8 bytes
- Internal fragmentation happens when payload is smaller than the block
- External fragmentation: unused/unallocated memory between blocks in the heap
- We keep a header field for all allocations
- This stores the size of the block in a word
- As all of the sizes are multiples of 8, the last three bits of the size are always zero
- Thus, we can use the last bit to store if it is free or not
Lecture 20 - August 7
- x86 typically uses 16-byte alignment
- One-word marker is the end of the list: size zero but allocated
- First fit allocation:
- Start at the beginning of the list, then jump until we have a free block of correct size
- Next fit allocation:
- Start the search where the previous search finished
- Best fit allocation:
- Search through the entire list, return the free block with closest size
- When we allocate in a free block with excess space, we need to split it
- This means we write a correct size header for the remaining free block
- When freeing data, we could just zero the last bit
- This leads to the problem of false fragmentation
- Instead, whenever we free a block we coalesce with the next free blocks
- This works easily with the next free blocks
- To coalesce backwards, we store the size in a word after the payload as well
- Explicit free lists:
- Allocated block stays the same
- Free blocks store a next and previous pointer in memory, where the payload would be
- This makes up a doubly-linked list
- The blocks are not necessarily sequentially linked in memory
- Explicit free list allocation policies:
- LIFO: insert block at the beginning of the free list
- Simple and constant time performance
- Bad caching performance
- Address-ordered policy: insert free list blocks in address order
- Requires linear time search
- Improved caching performance
- We don’t need the preceding boundary tag if that block is not free
- We can skip the tag by keeping an extra flag: “is the preceding block free?”
- Comparisons between explicit and implicit free lists:
- Explicit free lists are faster when most of the memory is full
- Explicit free lists are more challenging to implement
- Implicit free lists require fewer pointers, leading to a smaller minimum block size
- Explicit free lists are most commonly used with segregated free lists
- These are different lists of multiple sizes or object types
- Minimum block size:
- When we split a block, we need to know if the leftover is worth it
- For an explicit free list, we need 32 bytes
- 4 words which are pointers and sizes for free blocks
Lecture 21 - August 9
- We use exceptional control flow to manage multiple processes at a time and low-level mechanisms
- Exceptions (hardware)
- Signals (OS)
- Typical control flow is sequential and reacts only to program state
- This can’t handle modern challenges, i.e. system interrupts or program stalling
- Kernel: a core component of an operating system, privileged instructions
- The lowest layer and security mechanism
- For managing exceptions, we use an exception table
- This classifies different exceptions into their behaviors/responses
- Asynchronous exceptions:
- Interrupts: caused by events external to the processor
- e.g. Timer interrupt, an interrupt allowing the OS to take back control from programs
- Synchronous exceptions:
- Traps:
- Intentional way to transfer control to the OS to perform a function
- E.g. syscalls
- Returns control to the next instruction
- Faults:
- Unintentional but can be recovered
- E.g. divide-by-zero
- Either re-executes the instruction or aborts
- Aborts:
- Unintentional and unrecoverable
- E.g. hardware failure
- Aborts the current program
- Each program gets its own process, an illusion that a processor is entirely for itself
- A process is an interface between the program and hardware
- Programs get their own logical control flow
- It gives programs a private address space
- The OS tells each processor when to switch between programs
- To switch between processes, we need to store execution context
- This is context switching
- It saves current registers, schedules the next process, and loads the saved registers and address space
- Two processes run concurrently if their instructions run at the same time
- Kernel code manages processes as part of a user process, switching between them
- In Linux, when creating a new process:
- First, the current process is `fork()`‘ed
- `fork()` creates a copy of the current process
- It returns 0 to the child process and the child’s process ID (PID) to the parent
- Second, `exec*()` is called on the new process from the fork
- `exec*()` replaces the current process’ code and address space with those for another program
- A process graph is a way to visualize executions and processes
Lecture 22 - August 11
- `exec*()` functions load a fresh instance of the specified program - the “Loader” from “CALL”
- Execution context:
- Data
- Static data and executables are copied from the program executable
- Code
- Copied from the program executable
- Heap
- Empty: no memory has been requested yet
- Stack
- Set up the `main` stack frame and command-line arguments
- Registers
- `main` takes two inputs, passed in `%rdi` and `%rsi`
- `%rip` and `%rsp` need to point to `main` and the stack
- Ending a process - three typical ways:
- Executing a `return` statement from `main`
- Calling the library function `exit()`
- From an abort by the exception handler
- 1 and 2 provide a status code that can be read by the parent process
- A terminated process must be reaped to read the status code and delete resources
- A process which is terminated but not yet reaped is a zombie process
- Reaping is done by the parent process and can be done explicitly or implicitly
- Explicitly: using `wait()` or `waitpid()` calls
- Implicitly when the parent process terminates
- When a parent process terminates before a child process, responsibility for reaping is given to `init` or `systemd`
- Abstracting memory for specific programs is known as virtual memory
- Virtual address space: the amount of bytes a process thinks is available
- Physical address space: the amount of bytes that is physically available on RAM
- There is also extra space allocated on disk, called swap space for excess temporary memory
- The computer needs to map the virtual address space into physical address spaces
Lecture 23 - August 14
- The memory management unit (MMU) maps virtual addresses to physical addresses
- An n-bit CPU generated address is split into virtual page number and page offset
- The page offset is the same in the physical and virtual address
- We use a page table lookup table
- We replace virtual page numbers with physical page numbers to generate physical addresses
- PTE: page table entry (PPN + management bits)
- Page tables store PPN and a valid bit (is the page in RAM?)
- If the valid bit is zero and the data is null, it is not mapped at all
- If it is zero and data is not null, the page is stored in disk (or other virtual memory)
- Each process has its own page table
- When we switch between processes, we also switch the page table (along with other parts of memory)
- Page hit: VM reference is in physical memory
- Page fault: VM reference is not in physical memory
- There are also read, write, and execute bits for the data
Lecture 24 - August 16