CSE 332
Lecture 1 - June 21
- What this course is:
- Classic data structures and algorithms
- Queues, dicts, graphs, sorting
- Parallelism and concurrency (new! exciting!)
- Data structures are clever ways of storing information to perform efficient computation
- There are always tradeoffs with each option
- Abstract Data Type (ADT) is a mathematical description of a “thing” with a set of operations
- Data Structure: specific organization of data and family of algorithms for implementing an ADT
- Implementation: the actual concrete code
- We can use circular arrays or doubly-linked lists for queues
- Both work!
Lecture 2 - June 23
- What do we care about? (DSA)
- Correctness - does it work?
- Performance - speed! (and memory)
- Evaluating the program is difficult - better to evaluate the algorithm
- Analyzing code: counting code constructs:
- Basic operations (indexing, assignment): $\text{constant time}$
- Loops: $\text{num iterations} \times \text{loop body}$
- Conditionals: $\text{time of condition} + \text{slowest branch}$
- Function calls: $\text{time of function's body}$
- Recursion: $\text{solve recurrence equation}$
- Complexity cases:
- Worst-case complexity: max num of steps algo takes on “most challenging” input $n$
- Best-case complexity: min num of steps algo takes on “easiest” input of size $n$
- Average-case complexity: what does “average” mean?
- Amortized-case complexity: we learn about this later
- Linear search, best case: $O(1)$, worst case: $O(n)$
- Asymptotic analysis: big $O$ notation
- A function $f$ is in $O(\tilde{f})$ if $f$ and $\tilde{f}$ have the same asymptotic behavior
Lecture 3 - June 26
- How to show $f(n)$ is in $O(g(n))$?
- Pick a $c$ large enough to cover the constant factors
- Pick an $n_0$ large enough to cover the lower order terms
- Dropping coefficients is usually okay, but be careful otherwise
- Upper bound: Big-O
- Lower bound: Big-Omega, same as upper bound but for min
- Big-Theta bound (tight bound): $f(n)$ is in the upper and lower bounds of $g(n)$
- Worst/best-case vs asymptotic analysis
- Think of it as comparing scenarios
- Amortization: worst-case is too pessimistic
- “what is the average run time of operations in the worst sequence?”
- Not the same as average-case complexity (which averages over inputs); this averages the cost of operations over a worst-case sequence of operations
Lecture 4 - June 28
- Priority Queue: Highest Priority, First Out
- Holds comparable data
- `insert`: adds an item at the end
- `deleteMin`: finds, returns, and removes the minimum element in the queue
- Heap: only pay for functionality needed (half-sorted)
- Similar to trees!
- Both operations are: $\Theta(\log n)$
- Tree terminology:
- `depth`: how many levels from the root to the node
- `height`: how far from the "deepest" descendant leaf node
- Binary tree: each node has max 2 children
- `n`-ary tree: each node has max `n` children
- Perfect tree: each row is completely full
- Complete tree: each row is completely full except the bottom row
- The bottom row must be filled up from left to right
- (Binary Min-) Heap:
- Complete binary tree structure
- Every (non-root) node has a priority value greater than or equal to its parent's
- `insert`:
- Preserve complete tree structure property
- This might break heap order property
- Percolate up to restore heap order property
- `deleteMin`:
- Remove root node
- Move the last node in the bottom row to the root
- This might break heap order property
- Percolate down to restore property
- We can use an array to store the heap
- We will skip index 0 to make the math easier
- To get to the child nodes from node `i`, go to `i*2` and `i*2 + 1`
- The parent of node `i` is `i/2` (integer division; see the sketch after this list)
- To increase or decrease a key by an amount, we change it then percolate
- To delete an arbitrary element, we simply decrease its key to $-\infty$, then deleteMin
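A minimal sketch of the array-backed min-heap just described (index 0 skipped, children at `i*2` and `i*2 + 1`, parent at `i/2`); the class name, fixed capacity, and lack of resizing or error checks are simplifications for illustration:

```java
// Array-backed binary min-heap; index 0 is unused to keep the index math simple.
public class MinHeap {
    private int[] arr = new int[64]; // fixed capacity for brevity
    private int size = 0;            // number of elements currently stored

    public void insert(int value) {
        arr[++size] = value;   // preserve complete-tree shape: add at the end
        percolateUp(size);     // restore the heap-order property
    }

    public int deleteMin() {
        int min = arr[1];      // root holds the minimum
        arr[1] = arr[size--];  // move the last node to the root
        percolateDown(1);      // restore the heap-order property
        return min;
    }

    private void percolateUp(int i) {
        while (i > 1 && arr[i] < arr[i / 2]) {  // smaller than parent?
            swap(i, i / 2);
            i = i / 2;
        }
    }

    private void percolateDown(int i) {
        while (2 * i <= size) {
            int child = 2 * i;                              // left child
            if (child + 1 <= size && arr[child + 1] < arr[child]) {
                child++;                                    // pick the smaller child
            }
            if (arr[i] <= arr[child]) break;                // heap order already holds
            swap(i, child);
            i = child;
        }
    }

    private void swap(int i, int j) {
        int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
    }
}
```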
Lecture 5 - June 30
- Naively building a heap:
- `insert` for each element
- This is $O(n \log n)$
- We can do better: Floyd’s
buildHeap
- $O(n)$ runtime
- Randomly add all elements into the array
- Percolate down from each element one level above the leaves, up to the root
- As each level increases the amount of operations but halves the number of iterations, it is $O(n)$
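A minimal sketch of Floyd's `buildHeap`, reusing the layout and `percolateDown` from the heap sketch above; it assumes the `size` elements are already sitting in `arr[1..size]`:

```java
// Floyd's buildHeap: elements are already in the array in arbitrary order.
// Percolate down every non-leaf node, from the last internal node up to the root.
private void buildHeap() {
    for (int i = size / 2; i >= 1; i--) {
        percolateDown(i);   // nodes size/2 + 1 .. size are leaves and need no work
    }
}
```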
- Counting recursive code:
- Each recursive method call:
- Perform some non-recursive work: $w(n)$
- Call the method on a smaller part of the list: $T(n-1)$ (where $T(n)$ is the time/cost function)
- Do some base case work: $b(n)$
- Total cost: $T(n) = w(n) + T(n - 1)$ with some base case, e.g. $T(1) = b(1)$
- This is basically the recurrence function/relation
- General formula and closed form are ways to “solve” the function
- How to solve a recurrence function?
- Unrolling: substitution until we find a pattern
- Write a recurrence
- Find general formula: expand
- Should still have a $T$ function on both sides
- Find closed form: find when base case occurs, then get to the base case
- Now no $T$ or $i$: all constants or $n$
- Asymptotic analysis
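As a worked example (not from the slides), unrolling $T(n) = T(n-1) + c$ with constant base-case work $T(1) = b$ follows exactly these steps:

$$
\begin{aligned}
T(n) &= T(n-1) + c \\
     &= T(n-2) + 2c \\
     &= \dots \\
     &= T(n-i) + ic && \text{general formula: still has a } T \text{ on the right} \\
     &= T(1) + (n-1)c && \text{base case reached when } n - i = 1 \\
     &= b + (n-1)c \;\in\; \Theta(n)
\end{aligned}
$$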
Lecture 6 - July 3
- Recurrence relation: Tree Method
- We draw an actual tree of the work required for function calls
- Allows us to find the amount of nodes quickly and the work per node
- Common recurrences:
- $T(n) = T(\frac{n}{2}) + c \in \Theta(\log n)$: Binary search
- $T(n) = 2T(\frac{n}{2}) + n \in \Theta(n \log n)$: Merge sort
- $T(n) = 2T(\frac{n}{2}) + c \in \Theta(n)$: Recursive binary sum
- $T(n) = T(n - 1) + c \in \Theta(n)$: Recursive sum
- Dictionary: associates `key`, `value` pairs
- Set of unique `key`, `value` pairs
- (For some implementations) keys must be comparable
- Operations:
- `insert(k, v)`: place `k`, `v` in the dictionary
- `find(k)`: return the `v` associated with `k`
- `delete(k)`: delete and return the `k`, `v` pair
- Set is just a dictionary with boolean values
- Naive implementation operations are $\Theta(n)$
- `find(k)` is $\Theta(\log n)$ for sorted arrays
Lecture 7 - July 5
- Many data structure operations come down to preserving structure and order properties
- Comparable keys allow for ordering in storage
- Lazy deletion: we only “mark” as deleted - is the element there?
- Allows for removing in batches
- Increases `find(k)` runtime
- Binary search trees (BST) have "average" $\Theta(\log n)$ runtime, but could be $\Theta(n)$
- Every element is either the root, in the left subtree (smaller keys), or in the right subtree (larger keys)
- BST Delete: (a sketch follows this list)
- Case 1: Leaf
- Just remove the leaf
- Case 2: One child
- Connect parent to the child node
- Case 3: Two children
- We can replace with the smallest element from the right subtree or the largest from the left subtree, i.e. the successor or predecessor
- Find the max of the left subtree, copy it into the node being deleted, then delete that max node
- Or find the min of the right subtree, copy it in, then delete that min node
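A minimal sketch of the three delete cases, using the successor (min of the right subtree) for the two-children case; the `Node` class and method shape are illustrative, not the slide code:

```java
// Illustrative BST delete; Node is a bare key/left/right record.
class Bst {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Returns the new root of this subtree after deleting `key`.
    static Node delete(Node node, int key) {
        if (node == null) return null;
        if (key < node.key) {
            node.left = delete(node.left, key);
        } else if (key > node.key) {
            node.right = delete(node.right, key);
        } else {
            // Cases 1 & 2: zero or one child -- splice the node out.
            if (node.left == null) return node.right;
            if (node.right == null) return node.left;
            // Case 3: two children -- copy in the successor, then delete it.
            Node succ = node.right;
            while (succ.left != null) succ = succ.left;   // min of the right subtree
            node.key = succ.key;
            node.right = delete(node.right, succ.key);
        }
        return node;
    }
}
```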
- Worst case for a BST occurs when the tree is unbalanced
- Ideas to balance the BST (while keeping fast runtime):
- Left and right subtree of `root` have the same number of nodes
- Too weak
- Left and right subtree of `root` have the same height
- Too weak
- Left and right subtree of every node have the same number of nodes
- Perfect tree, too strong
- Left and right subtree of every node have the same height
- Perfect tree, too strong
- Left and right subtree heights of each node differ by at most 1
- Ensures `root` height is $\Theta(\log n)$
- Quick: $\Theta(1)$ rotations
- AVL Balance Property:
- `balance(node) = height(node.left) - height(node.right)`
- AVL Tree: a BST with guaranteed balancing
- BST with the added condition that for all `node`s, `-1 <= balance(node) <= 1`
- We need to keep track of the heights
Lecture 8 - July 7
- AVL Operations:
- Find: same as BST
- Insert: BST insert, then check and fix balance
- Let `p` be the problem node where an imbalance occurs
- Two main cases: the inserted node is on the outside or the inside (e.g. right-right vs right-left)
- Outside cases are solved with a "single rotation"
- Inside cases are solved with a "double rotation"
- Insert:
- BST insert: afterwards, each node's height on the path down to the new leaf might have changed
- Recursive backtracking: calculate new heights and detect any height imbalances
- If there is an imbalance, find the case and rotate. Only the deepest `p` needs to be addressed
- Single rotation: (left-left; see the sketch after this list)
- Move the child of `p` to the position of `p`
- `p` becomes the "other" child
- The old "other" child takes `p`'s now-empty child slot
- Other subtrees move accordingly
- Double rotation:
- Rotate into an outside case with `p`'s child and grandchild
- Rotate `p` and `p`'s new child
- These rotations are in different directions
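A minimal sketch of the left-left single rotation; the `AvlNode` class and helper names are illustrative, and each node is assumed to cache its height (a double rotation is just two of these in opposite directions):

```java
// Single rotation for the left-left (outside) case:
// p's left child moves up into p's position, and p becomes its right child.
class Avl {
    static class AvlNode {
        int key, height;
        AvlNode left, right;
    }

    static AvlNode rotateWithLeftChild(AvlNode p) {
        AvlNode child = p.left;
        p.left = child.right;       // child's old right subtree becomes p's left subtree
        child.right = p;            // p becomes the "other" (right) child
        p.height = 1 + Math.max(height(p.left), height(p.right));
        child.height = 1 + Math.max(height(child.left), height(child.right));
        return child;               // child is the new root of this subtree
    }

    static int height(AvlNode n) { return n == null ? -1 : n.height; }
}
```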
Lecture 9 - July 10
- Motivation behind B-Trees: $\Theta (1)$ memory access operations are not insignificant
- Getting data from disk is many, many operations (even if linear time)
- Thus, minimizing disk access is a goal
- This is of course if data doesn’t fit in cache (i.e. databases)
- If we’re accessing a byte in memory, might as well access a whole block and store it in cache
- Temporal locality: if an address is referenced, it tends to be referenced again soon
- Spatial locality: if an address is referenced, addresses close by are more likely to be referenced soon
- The amount of data moved from disk to memory is block size, memory to cache is line size
- These are not under program control
- Problem: large dictionaries require that most of the data will be stored on disk
- Desire: a balanced tree that is even shallower than AVL trees to minimize disk accesses
- A key idea: increase branching factor of the tree
- B Trees:
- Two types of nodes
- Internal nodes, having up to M-1 keys and up to M children
- Leaf nodes, storing up to L sorted data items
- Order property:
- Subtree between keys a and b contains only data which is: a <= x < b
- Structure properties:
- Root: If tree has <= L items, root is a leaf, else has between 2 and M children
- Internal nodes: have between ceil(M/2) and M children (i.e. at least half full)
- Leaf nodes: all leaves at the same depth, have between ceil(L/2) and L data items
- Any M > 2 and L will work, but we pick them based on disk-block size
- Two types of nodes
- Many keys are stored in one internal node; binary search within a node is inconsequential compared to a disk access
- Only bring one leaf of data items into memory, no need for loading unnecessary items into memory
- Adding to a leaf which is not already full is easy, just add like a sorted list
- When a leaf is full, we split it into two leaves and add them to the parent
- If the parent is full (so we cannot add another child), we split the parent
- If the parent's parent is also full … we eventually split the root, where we create a new root with the two children
- This is how a B-Tree gains height
- Deletion:
- First remove the item
- Then if the leaf contains too few, adopt from a neighbor
- Unless neighbor is at minimum, then combine to create new child
- If parent is now at minimum, adopt from a neighbor
- repeat…
Lecture 10 - July 12
- Now switching from tree-based data structures to array-based ones
- Hashing: take in a key, convert it into an integer
- This allows us to index into an array with the integer
- If two keys have the same hash, we have a collision
- Always will happen: infinite keys, finite array length
- Hash table runtime average is close to $\Theta (1)$
- Hash tables don’t allow us to order well unlike trees
- We expect the number of key-value pairs to be much less than the possible domain
- Ideal hash function is fast and avoids excessive collisions
- Often the case that the client implements some hash function, then program mods by the array length
- In Java, the client will override the `hashCode` method
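A minimal sketch of a client-side `hashCode` override that combines a couple of identifying fields, roughly in the Effective Java style; the `Student` class and its fields are made up for illustration:

```java
// A client type whose hash combines its identifying fields.
// The table then indexes with hashCode() mod the table size.
public class Student {
    private final String name;
    private final int studentId;

    public Student(String name, int studentId) {
        this.name = name;
        this.studentId = studentId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Student)) return false;
        Student other = (Student) o;
        return studentId == other.studentId && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        int result = Integer.hashCode(studentId);
        result = 31 * result + name.hashCode();  // 31: a small odd prime, cheap to multiply by
        return result;
    }
}
```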
- Use prime number as a table size
- Real-life data tends to follow patterns, and primes don't line up with them
- Try to use the most identifiable fields into the hash function
- Trade-off between how many fields and amount of collisions
- We use educated guesses
- If the key space is an int, using int and modulo is intuitive
- Even if every value has a distinct key, there are still collisions because of the mod
- Number one step is to rely on expertise of others
- Separate chaining:
- All keys that map to the same table location are added to a list
- Worst case runtime for `find` is $\Theta(n)$
- Not worth optimizing if your hash function isn't terrible
- Load factor $\lambda$ is the average number of elements per bucket
- Want to keep the load factor constant, prevent it from being too large
- Typically if $\lambda$ is greater than one, increase the table size
- `delete` is intuitively the same as `find`
Lecture 11 - July 17
- Open Addressing: attempt to not store chains
- Resolving collisions by trying a sequence of other positions in the table
- Linear Probing: when inserting, if index `x` is full, then try `x+1` (see the sketch after this list)
- Repeat as necessary…
- We are "probing" the array "linearly"
- Probing: the `i`th probe into the array is `(h(key) + f(i)) % TableSize`
- Open addressing gets much worse as the load factor increases
- To find using probing, we follow the probe sequence until we find the key or an empty spot
- Delete: use lazy deletion, replace the value with a flag
- Primary clustering
- Linear probing tends to produce clusters of values
- This leads to runtime issues
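A minimal sketch of insert and find with linear probing (probe sequence `(h(key) + i) % TableSize`); the class name, fixed prime table size, and lack of resizing or lazy-deletion flags are all simplifications:

```java
// Linear probing over an Integer[] table: null means the slot is empty.
public class LinearProbingTable {
    private final Integer[] table = new Integer[11];  // small prime table size

    public void insert(int key) {
        for (int i = 0; i < table.length; i++) {
            int index = (hash(key) + i) % table.length;   // i-th probe
            if (table[index] == null || table[index] == key) {
                table[index] = key;                       // empty slot (or key already present)
                return;
            }
        }
        throw new IllegalStateException("table full -- should have resized earlier");
    }

    public boolean find(int key) {
        for (int i = 0; i < table.length; i++) {
            int index = (hash(key) + i) % table.length;
            if (table[index] == null) return false;       // empty spot: key can't be further along
            if (table[index] == key) return true;
        }
        return false;
    }

    private int hash(int key) {
        return ((key % table.length) + table.length) % table.length;  // non-negative mod
    }
}
```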
- Quadratic probing: the `i`th probe into the array is `(h(key) + i**2) % TableSize`
- Possible to have infinite cycles because of the mod
- If TableSize is prime and the load factor is less than 0.5, quadratic probing will find an empty slot in at most TableSize / 2 probes
- Secondary clustering happens when two keys hash to the same index, so probing doesn't help
- Double hashing: `f(i) = i*g(key)`
- Idea is that two different hash functions won't collide much
- Ensure `g(key) != 0`, otherwise infinite loop
- Can still lead to infinite cycles
- Important to pick good functions with guarantees
- When we resize the array we need to rehash everything
- Good to resize when the load factor approaches one for separate chaining, or one half for open addressing
- Good to make size twice as big, but that won’t be prime!
- Keep a hardcoded list of primes at good intervals
- If it continues to grow, say screw it and double (I think)
- View slides or Effective Java for tips on how to hash different data types
Lecture 12
- Sorting goals:
- Stable: preserve original ordering in case of ties
- In-Place: only use a constant amount of auxiliary storage
- Fast: $O(n \log n)$ and/or good constant factors
- Simple algorithms: $O(n^2)$
- Insertion sort
- Selection sort
- Bubble sort
- Fancier algorithms: $O(n \log n)$
- Heap sort
- Merge sort
- Quick sort (avg)
- Specialized algorithms: $O(n)$
- Bucket sort
- Radix sort
- Insertion sort:
- Maintain a sorted subarray, insert the unsorted elements into it
- Stable!
- In-place!
- Slow :(
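A minimal sketch of insertion sort as described above (grow a sorted prefix, insert the next element into it):

```java
// Insertion sort: arr[0..i) stays sorted; insert arr[i] into its place within that prefix.
class InsertionSort {
    static void sort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int value = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j] > value) {  // shift larger elements one slot right
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = value;                 // drop the value into the gap
        }
    }
}
```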
Lecture 12 - July 19
- Selection sort:
- Maintain a sorted subarray, find the smallest remaining element and append it to sorted
- Stable!
- In-place!
- $O(n^2)$, but good constant factors
- Bubble sort: (it sucks!)
- $O(n^2)$ and bad constant factors
- Never use
- Heap sort:
- Put all elements into a heap, then remove the elements
- Not stable
- Can be in-place by treating the initial array as a heap, then placing removed elements into the back of the array
- $O(n \log n)$, but bad constant factors
- AVL sort:
- We pretend this doesn’t exist!
- Not in-place, worse constant factors
- Heap sort is strictly better
- Merge sort: divide and conquer!
- Sort the left and right halves of the elements recursively, then combine into one list
- Stable (always pick from left array first)
- Not in-place: $O(n)$
- $O(n \log n)$, bad constant factors because of copying
- Quick sort:
- Make sure to use the 332 quicksort
- Algorithm:
- Pick a pivot element, hopefully the median element
- Divide the array into two pieces, less than and greater than pivot
- Recursively sort the two pieces
- Combine the sorted pieces
- Picking a good pivot:
- Pick first or last element: fast to pick but likely worst-case
- Pick random element: good, but randomness is expensive
- Median of 3: pick median of first, middle, and last element, overall good
- How to split into two pieces: Hoare partitioning (see the sketch after this list)
- Swap the pivot with `arr[lo]` (move it out of the way)
- Use two pointers `l` and `r` starting at `lo+1` and `hi-1` and move inwards
- Put the pivot back in the middle: swap with `arr[l]`
- Not stable
- In-place!
- $O(n \log n)$ average case ($O(n^2)$ worst case)
- In practice, way, way better
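A minimal sketch of quicksort with the partition idea above; for brevity it uses the first element as the pivot instead of median-of-3, and it swaps the pivot back to `arr[r]` rather than `arr[l]`, so the index details differ slightly from the slide version. Call it as `quickSort(arr, 0, arr.length - 1)`:

```java
// Quicksort with a Hoare-style partition: pivot parked at arr[lo],
// l and r sweep inward, then the pivot is swapped back between the pieces.
class QuickSort {
    static void quickSort(int[] arr, int lo, int hi) {   // sorts arr[lo..hi] inclusive
        if (lo >= hi) return;                            // 0 or 1 elements: already sorted
        int p = partition(arr, lo, hi);
        quickSort(arr, lo, p - 1);                       // recursively sort the "less than" piece
        quickSort(arr, p + 1, hi);                       // and the "greater than" piece
    }

    static int partition(int[] arr, int lo, int hi) {
        int pivot = arr[lo];        // median-of-3 would swap its choice into arr[lo] first
        int l = lo + 1, r = hi;
        while (l <= r) {
            if (arr[l] <= pivot) {
                l++;                // arr[l] is already on the correct side
            } else if (arr[r] > pivot) {
                r--;                // arr[r] is already on the correct side
            } else {
                swap(arr, l, r);    // both out of place: swap and keep moving inward
                l++;
                r--;
            }
        }
        swap(arr, lo, r);           // put the pivot back between the two pieces
        return r;
    }

    static void swap(int[] arr, int i, int j) {
        int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
    }
}
```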
Lecture 13 - July 21
- Comparison sorting: CUTOFF strategy
- Use the recursive sort until the array is small enough, then switch to a sort with good constant factors (e.g. insertion sort)
- All comparison sorting algorithms have a lower bound of $\Omega(n \log n)$
- Bucket sort:
- Find the `min` and `max` value, make an auxiliary array of that range (see the sketch after this list)
- Iterate through the array and count how many of each number
- Copy the auxiliary array back into the original array
- Stable!
- Not in-place
- Possibly very large memory usage
- Good speed $O(n + k)$ and constants
- Non-integers: make each bucket a linked list of items
- Radix sort:
- For each digit (starting from LSD), implement bucket sort and repeat
- As it is stable, this allows for ordering
- Stable
- Not in-place
- Very fast, $O(d(n+k))$ and good constant factors
Lecture 14 - July 24
- A graph is a mathematical representation of a set of objects connected by links
- Vertices and edges!
- Terminology:
- Undirected graphs: edges have no direction, two-way, facebook friends
- Degree of a vertex: number of edges containing that vertex (undirected)
- Directed graphs: edges have a direction, pointing, instagram follows
- In-Degree of a vertex: the number of in-bound edges
- Out-Degree of a vertex: the number of outbound edges
- Weighted graphs: each edge has a weight or cost, representing relationships
- Walk: a sequence of adjacent vertices
- Path: a walk that doesn’t repeat a vertex
- Cycle: a walk that doesn’t repeat a vertex except the first and last vertex
- Path length: number of edges in a path
- Path cost: the sum of the weights in the path
- Undirected graph connectivity: an undirected graph is connected if there exists a path between each pair of vertices
- Undirected graph completeness: an undirected graph is complete if there is an edge between each pair of vertices
- Directed strong connectivity: a directed graph is strongly connected if there is a path between each pair of vertices
- Directed weak connectivity: a directed graph is weakly connected if there exists a path between each pair of vertices ignoring edge directions
- Directed graph completeness: a directed graph is complete if for all pairs of vertices there is an edge in each direction
- Self-edges: we pretend they don’t exist… (in 332)
- A tree is an undirected, acyclic, and connected graph
- Rooted trees are trees where we pick a root and think of the edges as directed
- Directed Acyclic Graphs (DAGs):
- Directed graph with no cycles
- Maximum number of edges:
- Undirected: $0 \leq \mid E \mid < \mid V \mid ^2$
- Directed: $0 \leq \mid E \mid \leq \mid V \mid ^2$
- So: $\mid E \mid \in O(\mid V \mid ^2)$
- Sparse graph: $\mid E \mid \in \Theta (\mid V \mid)$
- Dense graph: $\mid E \mid \in \Theta (\mid V \mid ^2)$
- Graphs: the data structure
- Common operations:
- Is $(v, u)$ an edge?
- "What are the neighbors of $v$?"
- Two standards:
- Adjacency Matrix
- Adjacency List
- Common operations:
- Adjacency Matrix: a 2D array of booleans representing the presence of an edge
- Properties:
- Get a vertex’s out-bound edges: $O(\mid V \mid)$
- Get a vertex’s in-bound edges: $O(\mid V \mid)$
- Decide if edge exists: $O(1)$
- Insert an edge: $O(1)$
- Delete an edge: $O(1)$
- Space requirements: $O(\mid V \mid ^2)$
- Adding vertices is not really possible
- Better for dense graphs (you’re already using the memory and complexity)
- Can be weighted by using non-booleans as type
- Then you must define “not an edge”
- Properties:
- Adjacency List: an array of vertices each with a linked list of edges
- Properties:
- Get a vertex’s out-bound edges: $O(d)$
- Get a vertex’s in-bound edges: $O(\mid V \mid + \mid E \mid )$
- Decide if edge exists: $O(d)$
- Insert an edge: $O(1)$
- Delete an edge: $O(d)$
- Space requirements: $O(\mid V \mid + \mid E \mid )$
- Better for sparse graphs (more operations depend on the number of elements)
- Properties:
- Processing a node: “doing something” at a node
- Visiting a node: exploring it
Lecture 15 - July 26
- Basic idea: follow nodes and mark them to prevent cycles
- Use slide code for the algorithms
- DFS uses a stack, BFS uses a queue (that’s the main difference)
- Iterative DFS:
- Use a stack, we process after popping each node
- Recursive DFS:
- The “stack” is the program’s recursive stack
- BFS:
- Similar algorithm structure, just use a queue
- Comparisons:
- DFS
- Typically uses much less memory
- Applications: topological sorting, cycle detection
- BFS
- Typically uses more memory
- Applications: shortest paths
- Iterative Deepening DFS (IDDFS)
- Use DFS with increasing depth limits
- Good memory usage and finds shortest paths (in number of edges)
- To get the final path, we include the predecessor node in the mark
- To get path, we then backtrack
- Often implemented with an array
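A minimal sketch of BFS that marks each vertex with its predecessor so the path can be backtracked afterwards; the `List<List<Integer>>` adjacency-list representation and class name are assumptions:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative BFS over an adjacency list of vertices numbered 0..n-1.
class GraphSearch {
    // Returns pred[], where pred[v] is the vertex we came from to reach v (-1 if unreached).
    // Backtracking pred[] from a target to `start` recovers a shortest path in edges.
    static int[] bfs(List<List<Integer>> adj, int start) {
        int[] pred = new int[adj.size()];
        boolean[] visited = new boolean[adj.size()];
        Arrays.fill(pred, -1);

        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        visited[start] = true;

        while (!queue.isEmpty()) {
            int u = queue.remove();              // FIFO order: explore level by level
            for (int v : adj.get(u)) {
                if (!visited[v]) {
                    visited[v] = true;           // mark so cycles don't loop forever
                    pred[v] = u;                 // remember the predecessor for backtracking
                    queue.add(v);
                }
            }
        }
        return pred;
    }
}
```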
- Shortest paths for weighted graphs: we often assume no negative weights
- Dijkstra’s Algorithm:
- Initially, start node has cost 0 and all other nodes have infinite cost
- At each step:
- Pick the closest unvisited vertex `v`
- Add it to the set of visited vertices
- Update distances for nodes with edges from `v`
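A minimal sketch of Dijkstra's algorithm using a priority queue of `{vertex, cost}` pairs (the minHeap variant mentioned in the next lecture); the adjacency representation with `{neighbor, weight}` edges is an assumption:

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative Dijkstra sketch: adj.get(u) holds edges as {neighbor, weight} pairs.
class Dijkstra {
    static int[] shortestPaths(List<List<int[]>> adj, int start) {
        int[] dist = new int[adj.size()];
        boolean[] visited = new boolean[adj.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[start] = 0;                                  // start node has cost 0, others "infinity"

        // Min-heap of {vertex, cost when added}, cheapest first.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[] {start, 0});

        while (!pq.isEmpty()) {
            int u = pq.remove()[0];                       // closest not-yet-visited vertex
            if (visited[u]) continue;                     // skip stale heap entries
            visited[u] = true;
            for (int[] edge : adj.get(u)) {
                int v = edge[0], w = edge[1];
                if (!visited[v] && dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;                // found a cheaper path to v
                    pq.add(new int[] {v, dist[v]});
                }
            }
        }
        return dist;                                      // cost of the shortest path to each vertex
    }
}
```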
Lecture 16 - July 28
- Dijkstra’s is a greedy algorithm
- At each step, it does what seems best at that step
- It explores every node, so it must find the globally optimal path
- Unoptimized runtime is: $O(\mid V \mid ^2 + \mid E \mid )$
- Can make it more optimal by using a minHeap: $O(\mid V \mid \log \mid V \mid + \mid E \mid \log \mid V \mid )$
- $O(\mid V \mid \log \mid V \mid)$ for a sparse graph
- Parallelism
- We typically have worked with sequential programming: one thing happens after another
- Removing this assumption means many new things to think about:
- How can we divide work between different threads and synchronize?
- How can we parallelize algorithms?
- How can we support concurrent access with data structures?
- This means we should be able to do multiple things at once in one program
- Parallelism: using extra resources to solve a problem faster
- Concurrency: correctly and efficiently manage access to shared resources
- Threads can communicate by writing to shared locations
- Need to make sure that they don't write in a bad way!
- Message-passing: threads communicate by sending and receiving messages
- Often one thread waits for and coordinates all of the worker threads
- Java threading basics:
- We will use `java.lang.Thread` to start off with, then use a better library later
- Define and instantiate a subclass `C` of `java.lang.Thread`, overriding `run`
- Call the object's `start` method
- Calling the `run` method directly just runs it in the current thread
- Fork-Join Framework:
- To start parallelism, we must use `ForkJoinPool.invoke(RECURSIVE_ACTION)`
- This calls the object's `compute` method
- Each program should only have one `ForkJoinPool`; use `commonPool()` to get the shared static one
- Subclass `RecursiveAction` to use `fork`, `compute`, and `join`
- `compute` is the code we want to run
- `fork` starts a new thread running `compute`
- `join` waits for the thread to finish
- Better for code which doesn't return anything
- Subclass `RecursiveTask` if you do want to return values
Lecture 17 - July 31
- The fork-join pattern is quite common for solving many different types of problems
- All that changes is the base-case and way we combine results
- These are called reductions
- It is required that the operation is associative
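A minimal sketch of one such reduction, summing an array with `RecursiveTask`; the class name and `CUTOFF` constant are illustrative:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel array sum: split the range in half, fork one half, compute the other,
// then join and combine. Below the cutoff, just sum sequentially.
class SumTask extends RecursiveTask<Long> {
    static final int CUTOFF = 5000;      // "small but not tiny" sequential chunk
    final int[] arr;
    final int lo, hi;                    // sums arr[lo..hi)

    SumTask(int[] arr, int lo, int hi) {
        this.arr = arr;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= CUTOFF) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += arr[i];
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        SumTask left = new SumTask(arr, lo, mid);
        SumTask right = new SumTask(arr, mid, hi);
        left.fork();                      // run the left half in another thread
        long rightSum = right.compute();  // do the right half ourselves
        return left.join() + rightSum;    // wait for the left half and combine
    }

    static long sum(int[] arr) {
        return ForkJoinPool.commonPool().invoke(new SumTask(arr, 0, arr.length));
    }
}
```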
- A parallel map is applying an operation on each input element independently
- E.g. multiplying each element of an array by two
- Better to subclass `RecursiveAction` as nothing is returned
- To use divide-and-conquer parallelism, we need to be able to divide the problem up efficiently
- We can still use maps on sequential data to speedup the process, think HF DS maps
- Analyzing Fork-Join algorithms:
- $T_P$ is the time a program takes to run if there are $P$ processors available
- $T_1$ is the time it takes to run if there is one processor, note: not sequential algorithm
- This is called the work, the total run time of all parts of the algorithm
- $T_{\infty}$ is the span, how long it takes to run on an infinite amount of processors
- Also called: critical path length or computational depth
- We can describe a parallel program execution as a DAG:
- Nodes are constant-time pieces of work the program performs
- The number of nodes adds up to $T_1$
- Edges represent computational dependencies, what we must complete before moving onto another node
- $T_{\infty}$ is the length of the longest path in the DAG
- Nodes are constant-time pieces of work the program performs
- A parallel reduction is two balanced trees on top of each other
- This allows $T_1$ and $T_{\infty}$ to become simple graph properties
- $T_1$ is $O(n)$: about $2n$ nodes, because there are two trees
- $T_{\infty}$ is $O(\log n)$, because that's the height of the trees
- The speedup on $P$ processors is: $\frac{T_1}{T_P}$
- Note that this is often dishonest, as a true sequential algorithm might be faster than $T_1$
- Perfect speedup is when $\frac{T_1}{T_P} = P$, and is rare
- Perfect linear speedup is the case where doubling $P$ halves the running time
- $\frac{T_1}{T_{\infty}}$ is the parallelism of an algorithm
- Parallel reduction: $\frac{n}{\log n}$, so there is a huge amount of available parallelism ($\log n$ is exponentially smaller than $n$)
- The ForkJoin Framework has the expected time bound: $T_P$ is $O(\frac{T_1}{P} + T_{\infty})$
- There is some randomness involved
- This works as long as each thread is expected to do about the same amount of work
- Also, all threads are expected to do a small but not tiny amount of work
- This is approx 5000 (within an order of magnitude) computational steps
- Amdahl’s Law:
- $\frac{T_1}{T_P} = \frac{1}{S + \frac{1 - S}{P}}$
- As $P$ goes to $\infty$, $\frac{T_1}{T_{\infty}} = \frac{1}{S}$
- Takeaways from Amdahl’s Law:
- This means that scaling computational resources leads to sub-linear speedups
- We cannot always rely on extra compute for speedups when there are chunks of sequential code
- We can always try to find new algorithms that work with the Law
- We can still use parallelism to solve bigger problems, but not faster
- This happens when the parallel parts grow at a faster rate than the sequential parts
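A quick worked example (not from the slides): if one third of the program is sequential ($S = \frac{1}{3}$) and $P = 10$, the speedup is capped well below 10:

$$
\frac{T_1}{T_{10}} = \frac{1}{S + \frac{1-S}{P}} = \frac{1}{\frac{1}{3} + \frac{2/3}{10}} = \frac{1}{0.4} = 2.5,
\qquad
\lim_{P \to \infty} \frac{T_1}{T_P} = \frac{1}{S} = 3
$$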
Lecture 18 - August 2
- Parallel-Prefix Sum:
- We run this in two parts: first create a tree where each node is the sum of a portion of the array
- The leaves are the sums of one-element ranges (in practice we would use a sequential cutoff)
- We run through the tree, passing along as an argument the sum of the values to the left of the node
- This allows the "sequential" problem to have an $O(\log n)$ span
- This pattern is good for problems where you want to do some operation over everything to the left of index `i`
- Pack: produce an array with the same order where all elements satisfy some condition
- Perform a parallel map to produce a bit (binary) vector of elements that satisfy the condition
- Perform a parallel prefix sum over the bit vector to get the number of indices to the left which satisfy the condition
- Run a parallel map for each index to move the value from the input array to the new output array
- Use `bitsum` to get the output indices
- Use `bitvector` to check if the value satisfies the condition
- Parallel Quicksort:
- Parallelize sorting both partitions
- Parallelize partitioning using an auxiliary array
- Parallel Mergesort:
- We can parallelize recursively sorting the halves easily, but we need more for an exponential speedup
- To do so, we need to parallelize the merge operation:
- Determine the median element of the larger array: $O(1)$
- Binary search to find the position of that median in the smaller array: $O(\log n)$
- Recursively merge the left half of the large array with the left portion of the small array
- Recursively merge the right half in the same way
Lecture 19 - August 4
- Concurrent programming is about controlling access to shared resources
- In concurrent programming, threads are largely doing their own thing
- Operations are interleaved, happening before, after, or at the same time as other threads
- We must enforce mutual exclusion on memory which might change
- We do this by “hanging a sign” telling other threads to wait
- Checking there is no sign and then hanging the sign must be one operation
- The work done while the sign is hanging is a critical section
- We typically do this using synchronization primitives within a language
- A mutual-exclusion lock (lock or mutex) is an abstract datatype for concurrent programming
- `new` creates a new lock which is "not held"
- `acquire` takes a lock and blocks until it is currently "not held", then sets it to "held" and returns
- `release` takes a lock and sets it to "not held"
- It is important to ensure that exceptions do not prevent a lock from ever getting released
- Locks which allow a thread to re-acquire a lock it already holds are called reentrant locks
- We use the `synchronized` keyword in Java to evaluate, acquire, and release a lock
- This seems somewhat similar to the Python `with` statement
- As any object can be a lock in Java, we can just use the instance (`this`) as the lock
- If the entire method should be `synchronized`, we can use that as a method keyword instead
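A minimal sketch showing both forms of `synchronized`; the counter itself is an illustrative example, not from the slides:

```java
// A thread-safe counter: every access to `count` happens while holding the lock on `this`.
class Counter {
    private int count = 0;

    public void increment() {
        synchronized (this) {        // evaluate `this`, acquire its lock, release on exit
            count++;                 // critical section
        }
    }

    // Equivalent shorthand: the whole method body is synchronized on `this`.
    public synchronized int get() {
        return count;
    }
}
```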
Lecture 20 - August 7
- A race condition is a bug in a program that makes program correctness dependent on the order that threads execute
- There are two kinds of data races:
- When one thread reads and another writes at the same time
- When both threads attempt to write at the same time
- Even if we think that it wouldn’t cause bad behavior, never have race conditions
- When running programs, there are often hidden intermediate states
- This is like the times when the representation invariant isn't true
- If a program reads during this intermediate state, things get weird
- These are critical sections
- One reason why data races are not allowed is because of compiler optimizations
- The Grand Compromise:
- The programmer promises not to write data races
- The compiler will make optimizations appear as if there were a global interleaving
- We can also declare fields as `volatile`, making accesses to them not count as data races
- They are good when there is only one shared field
- Concurrency Programming Guidelines
- Conceptually split memory into three parts: all memory should be at least one of the following
- Memory that is thread-local
- It is typically better to copy data and have more thread-local memory
- Memory that is immutable (after initialization)
- Make new objects instead of updating fields
- Memory that is synchronized, where locks are needed
- Memory that is thread-local
- Approaches to synchronization:
- Avoid data races
- Use consistent locking (and document it)
- Use coarse-grained locking and only get more fine-grained if performance is hurt
- Keep critical sections small and do not perform expensive operations in them
- Think of what needs to be atomic, then determine a locking strategy
- Always rely on the standard libraries
- Deadlock: threads being blocked forever
- This happens where there is a cycle of waiting
- We can define an ordering in which locks must be acquired
Lecture 21 - August 9
- Topological Sort
- Given a DAG, output all vertices in an order such that no parent appears after its children
- Think of class pre-reqs
- This can be used in backpropagation algorithms or other dependency graphs
- There can be multiple legal permutations per DAG
- Given a DAG, output all vertices in order such that no parent after children
- Simple Topo-sort algorithm
- Find all the in-degrees for each vertex
- These can be stored in a data structure
- While not all vertices are output:
- Choose a vertex $v$ labeled with in-degree of 0
- Output $v$ and remove it from the graph (or add to an outside set)
- For each vertex $w$ adjacent to $v$, decrement in-degree of $w$
- Find all the in-degrees for each vertex
- By using a queue of in-degree-0 vertices, we can lower the time complexity to $O(\mid V \mid + \mid E \mid)$
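A minimal sketch of the queue-based topological sort just described; the `List<List<Integer>>` adjacency-list representation is an assumption:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class TopoSort {
    // Returns the vertices in a topological order; assumes the graph is a DAG.
    static List<Integer> topoSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] inDegree = new int[n];
        for (List<Integer> edges : adj)
            for (int v : edges) inDegree[v]++;        // count every vertex's in-degree

        Queue<Integer> ready = new ArrayDeque<>();    // vertices with in-degree 0
        for (int v = 0; v < n; v++)
            if (inDegree[v] == 0) ready.add(v);

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.remove();
            order.add(u);                             // "output" u
            for (int v : adj.get(u)) {
                if (--inDegree[v] == 0) ready.add(v); // v's last remaining parent was output
            }
        }
        return order;                                 // fewer than n entries would mean a cycle
    }
}
```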
Lecture 22 - August 11
- Minimum spanning trees: the minimum-cost set of edges such that all nodes stay connected (forming a tree)
- Undirected graph
- Two main algorithms to find the MST:
- Prim's algorithm: similar to Dijkstra's
- Randomly pick a starting vertex, then grow the tree from there like Dijkstra's
- Repeatedly pick the lowest-cost vertex to connect next
- The random pick of vertex leads to different possible MSTs (if multiple exist)
- Kruskal's algorithm: completely different
- We make clusters of sub-MSTs, then they end up unifying
- Prim’s:
- Pick a random starting element
- Run a Dijkstra's-style expansion, connecting unconnected nodes to connected nodes
- Create the MST by connecting each node with its predecessor
- Kruskal's:
- Sort all edges by weight
- Add edges in order if they connect components that aren't already connected to each other
- Disjoint Set ADT:
- `Union(x, y)`: combines the two sets which contain `x` and `y`
- Constant time
- `Find(x)`: returns the name of the set containing `x`
- Amortized constant time
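A minimal sketch of the Disjoint Set ADT with path compression and union-by-size, which is what gives the near-constant amortized bounds; the class shape is illustrative:

```java
// Disjoint sets over {0, ..., n-1}: find returns a set's representative ("name"),
// union merges the sets containing x and y.
class DisjointSets {
    private final int[] parent;
    private final int[] size;

    DisjointSets(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;   // every element starts in its own set
            size[i] = 1;
        }
    }

    int find(int x) {
        if (parent[x] == x) return x;
        parent[x] = find(parent[x]);   // path compression: point directly at the root
        return parent[x];
    }

    void union(int x, int y) {
        int rootX = find(x), rootY = find(y);
        if (rootX == rootY) return;            // already in the same set
        if (size[rootX] < size[rootY]) {       // union by size: hang the small tree under the big one
            int tmp = rootX; rootX = rootY; rootY = tmp;
        }
        parent[rootY] = rootX;
        size[rootX] += size[rootY];
    }
}
```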
Lecture 23 - August 14
- A decision problem: a problem that takes some input and returns true or false
- Halting problem: takes in some code and an input, returns if it terminates
- Definition of $P$ in $P = NP$, a problem is in $P$ if it satisfies:
- It’s a decision problem
- There exists a polynomial time algorithm to solve it
- Euler Circuit Problem:
- Input: Undirected graph $G$
- Output: True iff there is a path in $G$ that visits each edge exactly once
- Polynomial time solution: true iff
- Degree of every vertex is even
- Connected graph
- Definition of $NP$, non-deterministic polynomial: a problem is in $NP$ iff:
- It’s a decision problem
- For every input where the output is true, there exists some quick way to verify output is true
- Every problem in $P$ is also $NP$, i.e. $P \subset NP$
- Is $NP \subset P$? Is $P = NP$?
- We don’t know…
- Most people believe that $P \neq NP$
- Hamiltonian Circuit Problem:
- Input: a graph $G$
- Output: True iff there exists a cycle in $G$ that visits every vertex exactly once
- Much more challenging than the Euler Circuit Problem
- $NP$-Complete
- $NP$-Complete: they are the hardest problems in $NP$
- A problem is $NP$-complete if:
- It's in $NP$ itself
- Every problem in $NP$ is polynomial time reducible to it
- This means that if any $NP$-complete problem is in $P$, then $P = NP$
- A problem is $NP$-complete if:
- Reduction:
- Consider two decision problems $A$ and $B$
- $A$ reduces to $B$ if we can solve $A$ using a solution to problem $B$
- Notation: $A \leq B$
- We say that $A$ reduces to $B$ in polynomial time if we can solve $A$ using a polynomial number of calls to $B$
- If $B \in P$ and $A \leq B$ then $A \in P$
- There are thousands of $NP$-complete problems and we haven't been able to find one that is in $P$, so people generally believe that $P \neq NP$
- $NP$-Hard: problems at least as hard as $NP$-complete ones, but which aren't necessarily in $NP$ themselves
- Example problem: $N \times N$ chessboard, verify a move is optimal
- What to do when faced with a problem that you think is hard?
- Confirm that it is actually hard
- Take an $NP$-complete problem, take your problem, show that the $NP$-complete problem reduces to your problem
- If so, stop trying to find a general solution to your problem lol
- Instead, possibly use an approximation algorithm (within a constant factor of true)
- Possibly use a randomized algorithm, repeating it as needed
- Consider special cases for your problem
- Confirm that it is actually hard
Lecture 24 - August 16
- $NP$-Hard: every $NP$ problem can be reduced to it
- $NP$-Complete: $NP$-hard and $NP$
- If an $NP$-hard problem is in $P$, then $P = NP$
- 3-Colorable:
- Input: undirected graph $G$
- Output: true iff you can color each vertex of the graph such that no two adjacent vertices have the same color
- $NP$-Complete
- 2-Colorable: a related problem
- Not $NP$-Complete
- All verticies that are at distance $i+1$ from the start must be colored differently from those at distance $i$
- Proof is to color it, return true if it works, else false
- 2-Colorable reduction to 3-Colorable
- Add a dummy vertex connected to every other node - it takes up one of the three colors
- Thus, 3-Colorable on that graph will test if it is 2-Colorable
- 3-SAT:
- Terminology:
- Literal: a boolean variable (or its negation)
- Clause: a series of literals, OR’d together
- Input: a series of clauses AND’d together, each clause has at most 3 literals
- Output: true iff there is a setting of boolean variables such that the expression is true
- $NP$-Complete
- The general problem of satisfiability is reducible to 3-SAT
- Terminology:
- Vertex Cover:
- Input: undirected graph $G$, integer $k$
- Output: true iff there is a set of vertices of size $k$ such that every edge has at least one vertex in the set
- $NP$-Complete
- “The set of vertices covers all edges”
- Special cases:
- If the graph is a tree, there is a linear time solution
- If the graph is 2-Colorable, there is a polynomial time solution
- Approximation algorithm for finding minimum vertex cover
- At most $2x$ larger than optimal
- While there are edges in $G$:
- Pick an arbitrary one
- Put both nodes in the vertex cover and delete nodes/edges connecting to those nodes