Skip to main content Link Menu Expand (external link) Document Search Copy Copied

CSE 332

Lecture 1 - June 21

  • What this course is:
    • Classic data structurs and algos
    • Queues, dicts, graphs, sorting
    • Parallelism and concurrency (new! exciting!)
  • Data structures are clever ways of storing information to perform efficent compututation
    • There are always tradeoffs with each option
  • Abstract Data Type (ADT) is a mathematical description of a “thing” with a set of operations
  • Data Structure: specific orginization of data and family of algos for implementing ADT
  • Implementation: the actual concrete code
  • We can use circular arrays or double-linked lists for queues
    • Both work!

Lecture 2 - June 23

  • What do we care about? (DSA)
    1. Correctness - does it work?
    2. Performance - speed! (and memory)
  • Evaluating the program is difficult - better to evaluate the algorithm
  • Analyzing code: counting code constructs:
    • Basic operations (indexing, assignment): $\text{constant time}$
    • Loops: $\text{num iterations} \times \text{loop body}$
    • Conditionals: $\text{time of condition} + \text{slowest branch}$
    • Function calls: $\text{timee of functions body}$
    • Recursion: $\text{solve recurrence equation}$
  • Complexity cases:
    • Worst-case complexity: max num of steps algo takes on “most challenging” input $n$
    • Best-case complexity: min num of steps algo takes on “easiest” input of size $n$
    • Average-case complexity: what does “average” mean?
    • Amortized-case complexity: we learn about this later
  • Linear search, best case: $O(1)$, worst case: $O(n)$
  • Asymptotic analysis: big $O$ notation
    • A function $f$ is in $O(\tilde{f})$ if $f$ and $\tilde{f}$ have the same asymptoic behavior

Lecture 3 - June 26

  • How to show $f(n)$ is in $O(g(n))$?
    • Pick a $c$ large enough to cover the constant factors
    • Pick a $n_0$ large enough to cover the lower order terms
  • Dropping coefficents is ususally okay, be careful otherwise
  • Upper bound: Big-O
  • Lower bound: Big-Omega, same as upper bound but for min
  • Big-Theta bound (tight bound): $f(n)$ is in the upper and lower bounds of $g(n)$
  • Worst/best-case vs asymptotic analysis
    • Think of it as comparing scenarios
  • Amortization: worst-case is too pessimistic
    • “what is the average run time of operations in the worst sequence?”
    • Not the average (average is an average input, this is average operations over a sequence of worst-case operations)

Lecture 4 - June 28

  • Priority Queue: Highest Priority, First Out
    • Holds comparable data
    • insert: adds an item at the end
    • deleteMin: finds, returns, and removes the minimum element in the queue
  • Heap: only pay for functionality needed (half-sorted)
    • Similar to trees!
    • Both operations are: $\Theta(log n)$
  • Tree terminology:
    • depth: how many levels from root to node
    • height: how far from “deepest” descendent leaf node
  • Binary tree: each node has max 2 children
  • n-ary tree: each node has max n children
  • Perfect tree: each row is completely full
  • Complete tree: each row is completely full except the bottom row
    • The bottom row must be filled up from left to right
  • (Binary Min-) Heap:
    • Complete binary tree structure
    • Every (non-root) node has a priority value greater than its parent
    • insert:
      • Preserve complete tree structure property
      • This might break heap order property
      • Percolate up to restore heap order property
    • deleteMin:
      • Remove root node
      • Move bottom side node to root
      • This might break heap order property
      • Percolate down to restore property
  • We can use an array to store the heap
    • We will skip index 0 to make the math easier
    • To get to the child nodes from node i, go to i*2 and i*2 + 1
    • Parent is integer division
  • To increase or decrease a key by an amount, we change it then percolate
  • To delete, we simply decrease the key by infinity, then delete min

Lecture 5 - June 30

  • Naively building a heap: insert for each element
    • This is $O(n * log n)$
  • We can do better: Floyd’s buildHeap
    • $O(n)$ runtime
    • Randomly add all elements into the array
    • Percolate down from each element one level above the leaves, up to the root
    • As each level increases the amount of operations but halves the number of iterations, it is $O(n)$
  • Counting recursive code:
    • Each recursive method call:
      • Perform some non-recursive work: $w(n)$
      • Call the method $T(n)$ on a smaller part of the list $T(n-1)$ (where $T(n)$ is the time or cost function)
      • Do some base case work: $b(n)$
  • Total cost: $T(n) = w(n) + T(n - 1)$ where some base case: $T(1) = b(n)$
    • This is basically the recurrence function/relation
    • General formula and closed form are ways to “solve” the function
  • How to solve a recurrence function?
  • Unrolling: substitution until we find a pattern
    1. Write a recurrence
    2. Find general formula: expand
      • Should still have a $T$ function on both sides
    3. Find closed form: find when base case occurs, then get to the base case
      • Now no $T$ or $i$: all constants or $n$
    4. Asymptotic analysis

Lecture 6 - June 3

  • Recurrence relation: Tree Method
    • We draw an actual tree of the work required for function calls
    • Allows us to find the amount of nodes quickly and the work per node
  • Common recurrences:
    • $T(n) = T(\frac{n}{2}) + c \in \Theta (log n)$: Binary search
    • $T(n) = 2T(\frac{n}{2}) + n \in \Theta (n log n)$: Merge sort
    • $T(n) = 2T(\frac{n}{2}) + c \in \Theta (n)$: Recursive binary sum
    • $T(n) = T(n - 1) + c \in \Theta (n)$: Recursive sum
  • Dictionary: associate associates key, value pairs
    • Set of unique key, value pairs
    • (For some) keys must be comparable
    • Operations:
      • insert(k, v): place k, v in dictionary
      • find(k): return the v associated with k
      • delete(k): delete and return the k, v
  • Set is just a dictionary with boolean values
  • Naieve implementation operations are $\Theta (n)$
    • find(k) is $\Theta (log n)$ for sorted arrays

Lecture 7 - June 5

  • Many data structure operations come down to preserving structure and order properties
  • Comparable keys allow for ordering in storage
  • Lazy deletion: we only “mark” as deleted - is the element there?
    • Allows for removing in batches
    • Increases find(k) runtime
  • Binary search trees (BST) have “average” $\Theta (log n)$ runtime, but could be $\Theta (n)$
    • All elements are either roots, left subtrees, or right subtrees
  • BST Delete:
    • Case 1: Leaf
      • Just remove the leaf
    • Case 2: One child
      • Connect parent to the child node
    • Case 3: Two children
      • We can replace with smallest element from right tree or largest from left tree, predecessor or successor
      • Find the max on the left subtree, swap with parent then delete
      • Find the min on the right subtree, swap with parent then delete
  • Worse case for BST occurs when the tree is unbalanced
  • Ideas to balance the BST: (while keeping fast runtime)
    • Left and right subtree of root have same number of nodes
      • Too weak
    • Left and right subtree of root have same height
      • Too weak
    • Left and right subtree of every tree have same number of nodes
      • Perfect tree, too strong
    • Left and right subtree of every tree have same height
      • Perfect tree, too strong
    • Left and right subtree of each node’s heights differ by at most 1
      • Ensures root height is $\Theta (log n)$
      • Quick: $\Theta (1)$ rotations
  • AVL Balance Property: balance(node) = height(node.left) - height(node.right)
  • AVL Tree: a BST with guaranteed balancing
    • BST with the added condition that for all nodes, -1 <= balance(node) <= 1
    • We need to keep track of the heights

Lecture 8 - June 7

  • AVL Operations:
    • Find: same as BST
    • Insert: BST insert, then check and fix balance
  • Let p be the problem node where an imbalance occurs
  • Two main cases: inserted node is in the outside or inside (e.g. right-right vs right-left)
    • Outside cases are solved with a “single rotation”
    • Inside cases are solved with a “double rotation”
  • Insert:
    • BST insert: afterwards, each node’s height in the path to the bottom might have changed
    • Recursive backtracking: calculate new height and detect any height imbalances
    • If imbalance, find case and rotate. Only the deepest p needs to be addressed
  • Single rotation: (left-left)
    • Move child of p to position of p
    • p becomes the “other” child
    • Old “other” child takes p’s new empty child
    • Other subtrees move accordingly
  • Double rotation:
    • Rotate into an outside case with p’s child and grandchild
    • Rotate p and p’s new child
      • These rotations are in different directions

Lecture 9 - June 10

  • Motivation behind B-Trees: $\Theta (1)$ memory access operations are not insignificant
    • Getting data from disk is many, many operations (even if linear time)
    • Thus, minimizing disk access is a goal
    • This is of course if data doesn’t fit in cache (i.e. databases)
  • If we’re accessing a byte in memory, might as well access a whole block and store it in cache
    • Temporal locality: if an address is referenced, it tends to be referenced
    • Spatial locality: if an address is referenced, addresses close by are more likely to be referenced soon
  • The amount of data moved from disk to memory is block size, memory to cache is line size
    • These are not under program control
  • Problem: large dictionaries requre that most of memory will be stored on disk
  • Desire: a balanced tree that is even shallower than AVL trees to minimize disk accesses
  • A key idea: increase branching factor of the tree
  • B Trees:
    • Two types of nodes
      • Internal nodes, having M-1 keys and M children
      • Leaf nodes, storing up to L sorted data items
    • Order property:
      • Subtree between keys a and b contains only data which is: a <= x < b
    • Structure properties:
      • Root: If tree has <= L items, root is a leaf, else has between 2 and M children
      • Internal nodes: have between ceil(M/2) and M children (i.e. at least half full)
      • Leaf nodes: all leaves at the same depth, have between ceil(L/2) and L data items
    • Any M > 2 and L will work, but we pick them based on disk-block size
  • Many trees stored in one internal node, binary search can be inconsequential compared to disk access
  • Only bring one leaf of data items into memory, no need for loading unnecessary items into memory
  • Adding to a leaf which is not already full is easy, just add like a sorted list
  • When a leaf is full, we split it into two leaves and add them to the parent
  • If the parent is full (so we cannot add another child), we split the parent
  • If the parents parent … we eventually split the root, where we create a new root with the two children
    • This is how a B-Tree gains height
  • Deletion:
    • First remove the item
    • Then if the leaf contains too few, adopt from a neighbor
      • Unless neighbor is at minimum, then combine to create new child
      • If parent is now at minimum, adopt from a neighbor
      • repeat…

Lecture 10 - June 12

  • Now switching from tree-based data structures to array-based ones
  • Hashing: take in a key, convert it into an integer
    • This allows us to index into an array with the integer
  • If two keys have the same hash, we have a collision
    • Always will happen: infinite keys, finite array length
  • Hash table runtime average is close to $\Theta (1)$
  • Hash tables don’t allow us to order well unlike trees
  • We expect the number of key-value pairs to be much less than the possible domain
  • Ideal hash function is fast and avoids excessive collisions
  • Often the case that the client implements some hash function, then program mods by the array length
    • In java, the client will override the hashCode method
  • Use prime number as a table size
    • Real life data tends to follow a pattern, primes don’t follow them
  • Try to use the most identifiable fields into the hash function
  • Trade-off between how many fields and amount of collisions
    • We use educated guesses
  • If the key space is an int, using int and modulo is intuitive
    • Even if every value has a distinct key, still collisions because mod
  • Number one step is to rely on expertise of others
  • Separate chaining:
    • All keys that map to the same table location are added to a list
  • Worst case runtime for find is $\Theta (n)$
    • Not worth optimizing if your hash function isn’t terrible
  • Load factor is the average number of elements per bucket
    • Want to keep load factor constant, prevent it from being too large
    • Typically if lambda is greater than one, increase the table size
  • delete is the same as find intuitively

Lecture 11 - July 17

  • Open Addressing: attempt to not store chains
    • Resolving collisions by trying a sequence of other positions in the table
  • Linar Probing: when inserting, if index x is full, then try x+1
    • Repeat as necessary…
    • We are “probing” the array “linarlly”
  • Probing: ith probe into array: (h(key) + f(i)) % TableSize
  • Open addressing is much worse as load factor increses
  • To find using probing, we follow probe sequence until we find the key or empty spot
  • Delete: use lazy deletion, replace value with a flag
  • Primary clustering
    • Linar probing tends to produce clusters of values
    • This leads to runtime issues
  • Quadratic probing: ith probe into array: (h(key) + i**2) % TableSize
    • Possible to have infinite cycles because of mod
  • If TableSize is prime and the load factor is less than 0.5, quadratic probing will find empty slot in maximum TableSize / 2 probe
  • Secondary clutering happens when the two keys hash to the same index, probing doesn’t help
  • Double hashing: f(i) = i*g(key)
    • Idea is that two hashes won’t collide much
    • Ensure g(key) != 0 otherwise infinite loop
    • Can still lead to infinite cycles
    • Important to pick good functions with guarantees
  • When we resize the array we need to rehash everything
    • Good to resize separate chaining when load factor approaches one, open addressing half
  • Good to make size twice as big, but that won’t be prime!
    • Keep a hardcoded list of primes at good intervals
    • If it continues to grow, say screw it and double (I think)
  • View slides or Effective Java for tips on how to hash different data types

Lecture 12

  • Sorting goals:
    • Stable: preserve original ordering in case of ties
    • In-Place: only use a constant amount of auxiliary storage
    • Fast: $O(n log n)$ and/or good constant factors
  • Simple algorithms: $O(n^2)$
    • Insertion sort
    • Selection sort
    • Bubble sort
  • Fancier algorithms: $O(n log n)$
    • Heap sort
    • Merge sort
    • Quick sort (avg)
  • Specialized algorithms: $O(n)$
    • Bucket sort
    • Radix sort
  • Insertion sort:
    • Maintain a sorted subarray, insert the unsorted elements into it
    • Stable!
    • In-place!
    • Slow :(

Lecture 12 - July 19

  • Selection sort:
    • Maintain a sorted subarray, find the smallest remaining element and append it to sorted
    • Stable!
    • In-place!
    • $O(n^2)$, but good constant factors
  • Bubble sort: (it sucks!)
    • $O(n^2)$ and bad constant factors
    • Never use
  • Heap sort:
    • Put all elements into a heap, then remove the elements
    • Not stable
    • Can be in-place by treating the inital array as a heap, then inserting into back of array
    • $O(n log n)$, but bad constant factors
  • AVL sort:
    • We pretend this doesn’t exist!
    • Not in-place, worse constant factors
    • Heap sort is strictly better
  • Merge sort: divide and conquer!
    • Sort left and right halves of elements recursivly, then combine into one list
    • Stable (always pick from left array first)
    • Not in-place: $O(n)$
    • $O(n log n)$, bad constant factors because of copying
  • Quick sort:
    • Make sure to use the 332 quicksort
    • Algorithm:
      1. Pick a pivot element, hopefully the median element
      2. Divide the array into two pieces, less than and greater than pivot
      3. Recursivly sort the two pieces
      4. Combine the sorted pieces
    • Picking a good pivot:
      1. Pick first or last element: fast to pick but likely worst-case
      2. Pick random element: good, but randomness is expensive
      3. Median of 3: pick median of first, middle, and last element, overall good
    • How to split into two pieces: Hoare partitioning
      1. Swap pivot with arr[lo] (move it out of the way)
      2. Use two pointers l and r starting at lo+1 and hi-1 and move inwards
      3. Put pivot back in middle: swap with arr[l]
    • Not stable
    • In-place!
    • $O(n log n)$ average case, worst constant factors
    • In practice, way, way better

Lecture 13 - July 21

  • Comparison sorting: CUTOFF strategy
    • Use comparison sort until the array is small enough, then use a sort with good constants
  • All comparison sorting algorithms have a lower bound of $O(n log n)$
  • Bucket sort:
    • Find the min and max value, make an auxillary array of that range
    • Iterate through the array and count how many of each number
    • Copy the auxillary array into the original array
    • Stable!
    • Not in-place
      • Possibly very large memory usage
    • Good speed $O(n + k)$ and constants
    • Non-integers: make each bucket a linked list of items
  • Radix sort:
    • For each digit (starting from LSD), implement bucket sort and repeat
    • As it is stable, this allows for ordering
    • Stable
    • Not in-place
    • Very fast, $O(d(n+k))$ and good constant factors

Lecture 14 - July 24

  • A graph is a mathematical representation of a set of objects connected by links
    • Verticies and edges!
  • Terminology:
    • Undirected graphs: edges have no direction, two-way, facebook friends
    • Degree of a vertex: number of edges containing that vertex (undirected)
    • Directed graphs: edges have a direction, pointing, instagram follows
    • In-Degree of a vertex: the number of in-bound edges
    • Out-Degree of a vertex: the number of outbound edges
    • Weighted graphs: each edge has a weight or cost, represting relationships
    • Walk: a sequence of adjacent vertices
    • Path: a walk that doesn’t repeat a vertex
    • Cycle: a walk that doesn’t repeat a vertex except the first and last vertex
    • Path length: number of edges in a path
    • Path cost: the sum of the weights in the path
    • Undirected graph connectivity: an undirected graph is connected if there exists a path between each pair of vertices
    • Undirected graph completeness: an undirected graph is complete if there is an edge between each pair of vertices
    • Directed strong connectivity: a directed graph is strongly connected if there is a path between each pair of vertices
    • Directed weakly connectivity: a directed graph is weakly connected if there exists a path between each pair of vertices ignoring edge directions
    • Directed graph completeness: a directed graph is complete if for all pairs of vertices there is an edge in each direction
  • Self-edges: we pretend they don’t exist… (in 332)
  • Trees are an undirected, acyclic, and connected graph
  • Trees are rooted graphs where we think of edges as directed
  • Directed Acyclic Graphs (DAGs):
    • Directed graph with no cycles
  • Maximum number of edges:
    • Undirected: $0 \leq \mid E \mid < \mid V \mid ^2$
    • Directed: $0 \leq \mid E \mid \leq \mid V \mid ^2$
    • So: $\mid E \mid \in O(\mid V \mid ^2)$
  • Sparse graph: $\mid E \mid \in \Theta (\mid V \mid)$
  • Dense graph: $\mid E \mid \in \Theta (\mid V \mid ^2)$
  • Graphs: the data structure
    • Common operations:
      • Is $(v, u)$ an edge?
      • “What are the neighbors of $v$?
    • Two standards:
      • Adjacency Matrix
      • Adjacency List
  • Adjacency Matrix: a 2D array of booleans representing the presence of an edge
    • Properties:
      • Get a vertex’s out-bound edges: $O(\mid V \mid)$
      • Get a vertex’s in-bound edges: $O(\mid V \mid)$
      • Decide if edge exists: $O(1)$
      • Insert an edge: $O(1)$
      • Delete an edge: $O(1)$
    • Space requirements: $O(\mid V \mid ^2)$
    • Adding vertices is not really possible
    • Better for dense graphs (you’re already using the memory and complexity)
    • Can be weighted by using non-booleans as type
      • Then you must define “not an edge”
  • Adjacency List: an array of vertices each with a linked list of edges
    • Properties:
      • Get a vertex’s out-bound edges: $O(d)$
      • Get a vertex’s in-bound edges: $O(\mid V \mid + \mid E \mid )$
      • Decide if edge exists: $O(d)$
      • Insert an edge: $O(1)$
      • Delete an edge: $O(d)$
    • Space requirements: $O(\mid V \mid + \mid E \mid )$
    • Better for sparse graphs (more operations depend on the number of elements)
  • Processing a node: “doing something” at a node
  • Visiting a node: exploring it

Lecture 15 - July 26

  • Basic idea: follow nodes and mark them to prevent cycles
  • Use slide code for the algorithms
  • DFS uses a stack, BFS uses a queue (that’s the main difference)
  • Iterative DFS:
    • Use a stack, we process after popping each node
  • Recursive DFS:
    • The “stack” is the program’s recursive stack
  • BFS:
    • Similar algorithm structure, just use a queue
  • Comparisons:
    • DFS
      • Typically uses much less memory
      • Applications: topological sorting, cycle detection
    • BFS
      • Typically uses more memory
      • Applications: shortest paths
    • Iterative Deep DFS (IDDFS)
      • Use DFS with increasing depth limits
      • Good memory and finds the shortest path
  • To get the final path, we include the predecessor node in the mark
    • To get path, we then backtrack
    • Often implemented with an array
  • Shortest paths for weighted graphs: we often assume no negative weights
  • Dijkstra’s Algorithm:
    • Initially, start node has cost 0 and all other nodes have infinite cost
    • At each step:
      1. Pick closesst unvisited vertex v
      2. Add it to the set of visited vertices
      3. Update distances for nodes with edges from v

Lecture 16 - July 28

  • Dijkstra’s is a greedy algorithm
    • At each step, it does what seems best at that step
    • It explores every node, so it must find globablly optimal path
    • Unoptimal runtime is: $O(\mid V \mid ^2 + \mid E \mid )$
  • Can make more optimal by using a minHeap: $O(\mid V \mid log \mid V \mid + \mid E \mid log \mid V \mid )$
    • $O( \mid V \mid log \mid V \mid )$ for a sparse graph
  • Parallelism
  • We typically have worked with sequential programming: one thing happens after another
  • Removing this assumption means many new things to think about:
    • How can we divide work between different threads and synchronize?
    • How can we parallelize algorithms?
    • How can we support concurrent access with datastructures?
  • This means we should be able to do multiple things once in one program
  • Parallelism: using extra resources to solve a problem faster
  • Concurrency: correctly and efficiently manage access to shared resources
  • Threads can communicate by writing to shared locations
    • Need to make sure that it doesn’t write in bad way!
  • Message-passing: threads communicate by sending and receiving messages
  • Wait for one thread to coordinate all of the worker threads
  • Java threading basics:
    • We will use java.lang.Thread to start off with, then use a better library later
    • Define and instantiate a subclass C of java.lang.Thread, overriding run`
    • Call the objects start method
      • run method runs in the current thread
  • Fork-Join Framework:
    • To start parallelism, we must use ForkJoinPool.invoke(RECURSIVE_ACTION)
      • This calls the object’s compute field
      • Each program should only have one ForkJoinPool, use commonPool to get static
    • Subclass RecursiveAction to use fork, compute, and join
      • compute is the code we want to run
      • fork starts a new thread running compute
      • join waits for the thread to finish
      • Better for code which don’t return anything
    • Subclasss RecursiveTask if you do want to return values

Lecture 17 - July 31

  • The fork-join pattern is quite common for solving many different types of problems
    • All that changes is the base-case and way we combine results
    • These are called reductions
    • It is required that the operation is associative
  • A parallel map is applying an operation on each input element independently
    • E.g. multiplying each element of an array by two
    • Better to subclass RecursiveAction as nothing is returned
  • To use divide-and-conquer parallelism, we need to be able to dive the problem up efficently
    • We can still use maps on sequential data to speedup the process, think HF DS maps
  • Analyzing Fork-Join algorithms:
    • $T_P$ is the time a program takes to run if there are $P$ processors available
    • $T_1$ is the time it takes to run if there is one processor, note: not sequential algorithm
      • This is called the work, the total run time of all parts of the algorithm
    • $T_{\infty}$ is the span, how long it takes to run on an infinite amount of processors
      • Also called: critical path length or computational depth
  • We can describe a parallel program execution as a DAG:
    • Nodes are constant-time pieces of work the program performs
      • The number of nodes adds up to $T_1$
    • Edges represent computational dependencies, what we must complete before moving onto another node
      • $T_{\infty}$ is the length of the longest path in the DAG
  • A parallel reduction is two balanced trees on top of each other
    • This allows $T_1$ and $T_{\infty}$ to become simple graph properties
    • $T_1$ is $O(2n)$, because two trees
    • $T_{\infty}$ is $O(log n)$, because that’s the height of the trees
  • The speedup on $P$ processors is: $\frac{T_P}{T_1}$
    • Note that this is often dishonest, as a true sequential algorithm might be faster
  • Perfect speedup is when $\frac{T_P}{T_1} = P$, and is rare
  • Perfect linear speedup is the case where doubling $P$ halves the running time
  • $\frac{T_1}{T_{\infty}}$ is the parallelism of an algorithm
    • Parallel reduction: $\frac{n}{logn}$, so there is an exponential speedup
  • The ForkJoin Framework has the expected time bound: $T_P$ is $O(\frac{T_1}{P} + T_{\infty}$
    • There is some randomness involved
    • This works as long as each thread is expected to be about the same amount of work
    • Also, all threads are expected to do a small but not tiny amount of work
      • This is approx 5000 (within an order of magnitude) computational steps
  • Amdahl’s Law:
    • $\frac{T_1}{T_P} = \frac{1}{S + \frac{1 - S}{P}}$
    • As $P$ goes to $\infty$, $\frac{T_1}{T_{\infty}} = \frac{1}{S}$
  • Takeaways from Amdahl’s Law:
    • This means that scaling computational resources leads to non-linear speedups
    • We cannot always rely on extra compute for speedups when there are chunks of sequential code
    • We can always find new algorithms work with the Law
    • We can still use parallelism to solve bigger problems, but not faster
      • This happens when some parallel parts grow at a faster rate then the sequential parts

Lecture 18 - August 2

  • Parallel-Prefix Sum:
    • We run this in two parts: first create a tree where each node is the sum of a portion of the array
      • The leaves are the sums of the one-element leaves (we would use sequential cutoff)
    • We run through the tree, passing along as an argument the sum of the values to the left of the node
    • This allows the “sequential” problem to have in $O(logn)$ span
    • This pattern is good for problems where you want to do operation to the left of i
  • Pack: produce an array with the same order where all elements satisfy some condition
    1. Perform a parallel map to produce a bit (binary) vector of elements that satisfy a condition
    2. Perform parallel prefix sum to get the number of indicies to the left which satisfy the condition
    3. Run a parallel map for each index to move the index value from the input map to the new output map
      • Use bitsum to get the output indicief
      • Use bitvector to check if the value satisfies the condition
  • Parallel Quicksort:
    • Parallelize sorting both partitions
    • Parallelize partitioning using an auxillary array
  • Parallel Mergesort:
    • We can parallelize recursivly sorting the halves easilly, but we need more for exponential speedup
    • To do so, we need to parallelize the merge operation:
      • Determine the median element of the larger array: $O(1)$
      • Binary search to find the position of the median in the smaller array: $O(logn)$
      • Recursivly merge the left half of the large array with the left portions of the small array
      • Recursivly merge the right half in the same way

Lecture 19 - August 4

  • Concurrent programming is about controlling access to shared resources
  • In concurrent programming, threads are largely doing their own thing
    • Operations are interleaved, happening before, after, or at the same time as other threads
  • We must enforce mutual exclusion on memory which might change
    • We do this by “hanging a sign” telling other threads to wait
    • Checking there is no sign and then hanging the sign must be one operation
    • The work done while the sign is hanging is a critical section
    • We typically do this using synchronization primitives within a language
  • A mutual-exclusion lock (lock or mutex) is an abstract datatype for concurrent programming
    • new creates a new lock which is “not held”
    • acquire takes a lock and blocks it until it is currently “not held”, then sets it to “held” and returns
    • release takes a lock and sets it to “not held”
  • It is important to ensure that exceptions do not prevent a lock from ever getting released
  • Locks which allow a thread re-acquire a lock it holds are called reentrant locks
  • We use the synchronized keyword in Java to evaluate, acquire, and release a lock
    • This seems somewhat similar to the Python with statement
  • As any object can be a lock in Java, we can just use the instance (this) as the lock
  • If the entire method should be synchronized, we can use that as a method keyword instead

Lecture 20 - August 7

  • A race condition is a bug in a program that makes program correctness dependent on the order that threads execute
  • There are two kinds of data races:
    1. When one thread reads and another writes at the same time
    2. When both threads attempt to write at the same time
  • Even if we think that it wouldn’t cause bad behavior, never have race conditions
  • When running programs, there are often hidden intermediate states
    • This is like times where the rep inv isn’t true
    • If a program reads during this intermediate state, things get weird
    • These are critical sections
  • One reason why data races are not allowed is because of compiler optimizations
  • The Grand Comprimise:
    • The programmer promises not to write data races
    • The compiler will make optimizations seem as if there was a global interweaving
  • We can also declare fields as volitile, making accesses not count as data races
    • They are good when there is only one shared field
  • Concurrency Programming Guidelines
  • Conceptually split memory into three parts: all memory should be at least one of the following
    1. Memory that is thread-local
      • It is typically better to copy data and have more thread-local memory
    2. Memory that is immutable (after initialization)
      • Make new objects instead of updating fields
    3. Memory that is synchronized, where locks are needed
  • Approaches to synchronization:
    1. Avoid data races
    2. Use consistant locking (and document it)
    3. Use more course-grained locking and only be more specific if performance is hurt
    4. Keep critical sections small and do not perform expensive operations in them
    5. Think of what needs to be atomic, then determine a locking strategy
    6. Always rely on the standard libraries
  • Deadlock: threads being blocked forever
    • This happens where there is a cycle of waiting
    • We can define an ordering for which locks are allocated

Lecture 21 - August 9

  • Topological Sort
    • Given a DAG, output all vertices in order such that no parent after children
      • Think of class pre-reqs
    • This can be used in backpropagation algorithms or other dependency graphs
    • There can be multiple legal permutations per DAG
  • Simple Topo-sort algorithm
    1. Find all the in-degrees for each vertex
      • These can be stored in a data structure
    2. While not all vertices are output:
      • Choose a vertex $v$ labed with in-degree of 0
      • Output $v$ and remove it from the graph (or add to an outside set)
      • For each vertex $w$ adjacent to $v$, decrement in-degree of $w$
  • By using a queue, we can lower the time complexity to $O(V + E)$

Lecture 22 - August 11

  • Minimum spanning trees: the minimal cost graph such that all nodes are connected
    • Undirected graph
  • Two main algorithms to find the MST:
    • Prim’s algorithm: similar to Dijkstra’s
      • Randomly pick a vertex, then use Dijkstra’s from there
      • Pick the vertices which are the lowest cost
      • The random pick of vertex leads to different possible MSTs (if they exists)
    • Kruskals’ algorithm: completely different
      • We make clusters of sub-MSTs, then they end up unifying
  • Prim’s:
    1. Pick random element
    2. Djikstra’s connecting unconnected nodes to connected nodes
    3. Create MST by connecting edges with their preds
  • Kruskals’:
    1. Sort all edges by weight
    2. Add edges in order if they connect nodes to MSTs they don’t already connect to
  • Disjoint Set ADT:
    • Union(x, y): combines the two sets which contain x and y
      • Constant time
    • Find(x): returns the name of the set containing x
      • Amortized constant time

Lecture 23 - August 14

  • A decision problem: a problem that takes some input and returns true or false
  • Halting problem: takes in some code and an input, returns if it terminates
  • Defintion of $P$ in $P = NP$, a problem is $P$ if it satisfies:
    1. It’s a decision problem
    2. There exists a polynomial time algorithm to solve it
  • Euler Circuit Problem:
    • Input: Undirected graph $G$
    • Output: True iff there is a path in $G$ that visits each edge exactly once
    • Polynomial time solution: true iff
      • Degree of every vertex is even
      • Connected graph
  • Definition of $NP$, non-deterministic polynomial: a problem is in $NP$ iff:
    1. It’s a decision problem
    2. For every input where the output is true, there exists some quick way to verify output is true
  • Every problem in $P$ is also $NP$, i.e. $P \subset NP$
  • Is $NP \subset P$? Is $P = NP$?
    • We don’t know…
    • Most people believe that $P \neq NP$
  • Hamiltonian Circuit Problem:
    • Input A graph $G$
    • Output: True iff there exists a cycle in $G$ that visits every vertex exactly once
    • Much more challenging than Euler Circut Problem
    • $NP$-Complete
  • $NP$-Complete: they are the hardest problems in $NP$
    • A problem is $NP$-complete if:
      1. It’s $NP$ itself
      2. Every problem in $NP$ is polynomial time reducible to it
    • This means that if any $NP$-complete problem is in $NP$, then $P = NP$
  • Reduction:
    • Consider two decision problems $A$ and $B$
    • $A$ reduces to $B$ if we can solve $A$ using a solution to problem $B$
    • Notation: $A \leq B$
    • We say that $A$ reduces to $B$ in polynomial time if we can solve $A$ using a polynomial number of calls to $B$
    • If $B \in P$ and $A \leq B$ then $A \in P$
  • The fact that there are thousands of $NP$-complete problems and we haven’t been able to find one that is in $P$, thus people generally believe that $P \neq NP$
  • $NP$-Hard: $NP$-complete problems which aren’t in $NP$
    • Example problem: $N \times N$ chessboard, verify a move is optimal
  • What to do when faced with a probelm that you think is hard?
    • Confirm that it is actually hard
      • Take an $NP$-complete problem, take your problem, show that the $NP$-complete problem reduces to your problem
      • If so, stop trying to find a general solution to your problem lol
      • Instead, possibly use an approximation algorithm (within a constant factor of true)
      • Possibly used a randomized algorithm, repeat using the randomized algorithm
      • Consider special cases for your problem

Lecture 24 - August 16

  • $NP$-Hard: every $NP$ problem can be reduced to it
  • $NP$-Complete: $NP$-hard and $NP$
  • If an $NP$-hard problem is in $NP$, then $P = NP$
  • 3-Colorable:
    • Input: undirected graph $G$
    • Output: true iff you can color each vertex of the graph such that no two adjacent verticies have the same color
    • $NP$-Complete
  • 2-Colorable: a related problem
    • Not $NP$-Complete
    • All verticies that are at distance $i+1$ from the start must be colored differently from those at distance $i$
    • Proof is to color it, return true if it works, else false
  • 2-Colorable reduction to 3-Colorable
    • Add a dummy vertex to each other node - this takes up one of the three colors
    • Thus, 3-Colorable on that graph will test if it is 2-Colorable
  • 3-SAT:
    • Terminology:
      • Literal: a boolean variable (or its negation)
      • Clause: a series of literals, OR’d together
    • Input: a series of clauses AND’d together, each clause has at most 3 literals
    • Output: true iff there is a setting of boolean variables such that the expression is true
    • $NP$-Complete
      • The general problem of satisfiability is reduciable to 3-SAT
  • Vertex Cover:
    • Input: undirected graph $G$, integer $k$
    • Output: true iff there is a set of verticies of size $k$ such that every edge has at least one vertex in the set
    • $NP$-Complete
    • “The set of vertices covers all edges”
    • Special cases:
      • If the graph is a tree, there is a linear time solution
      • If the graph is 2-Colorable, there is a polynomial time solution
    • Approximation algorithm for finding minimum vertex cover
      • At most $2x$ larger than optimal
      • While there are edges in $G$:
      • Pick an arbitrary one
      • Put both nodes in the vertex cover and delete nodes/edges connecting to those nodes