We will cover probability, combinatorics, and some applications
Counting is helpful because:
It helps with algo analysis
It is a building block for probability
A set’s cardinality (size) is the number of distinct elements in it
Sum Rule: if we choose one element from either of two disjoint (non-overlapping) sets, the number of options is the sum of the two cardinalities
Product Rule: when we have a sequential process of choosing elements, the number of possibilities is the product of the number of options at each step
This is equal to the size of the Cartesian Product of the sets
Note that this works regardless of which specific elements are available at each step, as long as the number of options at a given step is always the same
What is the size of the power set of $S$?
$2^{\mid S \mid}$
Like the product rule, as each element can be in or out of a subset
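A quick brute-force check of the sum rule, the product rule, and the power-set count, using small made-up example sets:

```python
# Quick brute-force check of the counting rules above (small made-up sets).
from itertools import product

S = {"a", "b", "c", "d"}

# Product rule view of the power set: each element is independently "in" or
# "out", so there are 2 * 2 * ... * 2 = 2^|S| subsets.
choices = list(product([True, False], repeat=len(S)))
assert len(choices) == 2 ** len(S)  # 16 subsets when |S| = 4

# Sum rule: choosing one element from either of two disjoint sets.
A, B = {1, 2, 3}, {10, 20}
assert len(A | B) == len(A) + len(B)  # 5 total options
```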
Lecture 2 - March 29
When parsing HW problems, look out for “sequence” and “distinct”
Remember $0! = 1$
K Permutation:
The number of $k$ element sequences of distinct symbols from a universe of $n$ symbols is: $P(n, k) = n (n - 1) \dots (n - k + 1) = \frac{n!}{(n - k)!}$
Said as “P n K”, “n permute k”, or “n pick k”
Permutation vs subsets: does order matter?
K Combination:
The number of $k$ element subsets from a set of $n$ symbols is: $C(n, k) = \frac{P(n, k)}{k!} = \frac{n!}{k!(n - k)!}$
Said as “n choose k” typically
Also written as: $\binom{n}{k}$
We can think of this as finding the number where ordering matters, then dividing by the number of possible orders for each subset
We can go back and forth between combinations and permutations by dividing and multiplying by the number of possible orderings within a subset (typically $k!$ for subsets of size $k$); see the sketch below
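A minimal sketch of that relationship using Python's built-in `math.perm` and `math.comb`, with arbitrary example values of $n$ and $k$:

```python
# Sketch relating permutations and combinations (n and k are just example values).
import math

n, k = 7, 3
p = math.perm(n, k)   # n! / (n - k)! = 210 ordered length-k sequences
c = math.comb(n, k)   # n! / (k! (n - k)!) = 35 unordered k-element subsets

# Dividing out the k! orderings of each subset turns permutations into combinations.
assert p == c * math.factorial(k)
print(p, c)  # 210 35
```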
Path counting is really just combinations
Overcounting is cool! (We just have to correct for it)
e.g. anagrams of “SEATTLE”: we count all permutations of the 7 letters, then divide by the product of the factorials of the counts of each repeated letter
Here E and T each appear twice, so we divide by $2! \cdot 2!$
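Worked out, that gives $\frac{7!}{2! \cdot 2!} = \frac{5040}{4} = 1260$ anagrams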
Complementary Counting:
We count the complement of the set we want (everything that wouldn’t be valid), then subtract that from the cardinality of the universe!
Binomial Theorem: $(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n - i} y^{i}$
Each term is a product of $x$’s and $y$’s, and $\binom{n}{i}$ determines how many of each term there are
Useful when we need to plug in specific numbers for $x$ and $y$
Principle of Inclusion-Exclusion: when we want the size of a set which is the OR of multiple conditions, we take the size of the union of the sets that satisfy each individual condition, correcting for outcomes counted more than once
Sum rule for non-disjoint sets: $\mid A \cup B \mid = \mid A \mid + \mid B \mid - \mid A \cap B \mid$
Sum rule for three non-disjoint sets follows similar logic, subtracting the intersection of each pair of sets, then adding the intersection of all sets
Typically only use for two or three conditions (maybe four)
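Written out for three sets, that pattern is: $\mid A \cup B \cup C \mid = \mid A \mid + \mid B \mid + \mid C \mid - \mid A \cap B \mid - \mid A \cap C \mid - \mid B \cap C \mid + \mid A \cap B \cap C \mid$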
Lecture 4 - April 3
Pigeonhole Principle
If there are more items than spots, we know that some spot has more than one item
Strong Pigeonhole: If we have $n$ pigeons and $k$ pigeonholes, there is at least one pigeonhole with at least $\frac{n}{k}$ pigeons, rounded up
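A quick numeric check of the strong version (numbers chosen just for illustration): $n = 25$ pigeons in $k = 6$ pigeonholes means some hole holds at least $\lceil \frac{25}{6} \rceil = 5$ pigeons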
Placing dividers (stars and bars):
If we need to count the number of ways to form a group out of several types of interchangeable elements, use dividers
For groups of size $n$ drawn from $k$ types: $\binom{n+k-1}{k-1}$
Can think about it as a binary string permutation problem
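A sketch checking the dividers formula against brute-force enumeration, with arbitrary small values of $n$ and $k$:

```python
# Sketch checking stars and bars against brute force (small illustrative n, k).
import math
from itertools import product

n, k = 6, 3  # groups of size 6 drawn from 3 types

# Brute force: count solutions to x1 + x2 + x3 = n with each xi >= 0.
brute = sum(1 for xs in product(range(n + 1), repeat=k) if sum(xs) == n)

formula = math.comb(n + k - 1, k - 1)
assert brute == formula  # both are 28 for n = 6, k = 3
print(formula)
```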
Lecture 5 - April 5
Probability is a way of quantifying our uncertainty
Sample space, $\Omega$, is all the possible outcomes of an experiment
Single coin flip: $\Omega = \lbrace H, T \rbrace$
An event, $E \subseteq \Omega$, is a subset of all possible outcomes
A probability, $\mathbb{P}: \Omega \rightarrow [0, 1]$, is a function that maps each element of $\Omega$ to its likelihood of occurring
Other notation: $Pr[\omega]$ or $P(\omega)$
All probabilities must be between 0 and 1 and the sum of the probability for each element in the sample space must sum to 1
Probability of an event should be the sum of the probabilities of the elements in the event
A probability space is a pair of sample space and probability function
Uniform Probability Space: all outcomes have the same likelihood
Events are “mutually exclusive” if they cannot happen simultaneously
Axioms:
Non-negativity: $\mathbb{P}(\omega) \geq 0$ for all $\omega \in \Omega$
Law of Total Probability: $\mathbb{P}(E) = \sum_{\forall i}\mathbb{P}(E \mid A_i)\, \mathbb{P}(A_i)$, where the $A_i$ partition the sample space
The probability of an event is the sum, over the pieces of the partition, of the probability of the event given that piece, multiplied by the probability of that piece
A partition of the sample space is a family of subsets that are pairwise disjoint and together cover the entire sample space
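A minimal sketch of this computation with made-up numbers for a three-piece partition:

```python
# Law of total probability with made-up numbers: a partition A1, A2, A3 of the
# sample space, plus P(E | Ai) for each piece.
p_A = [0.5, 0.3, 0.2]             # P(A1), P(A2), P(A3); must sum to 1
p_E_given_A = [0.10, 0.40, 0.80]  # P(E | A1), P(E | A2), P(E | A3)

p_E = sum(pe * pa for pe, pa in zip(p_E_given_A, p_A))
print(p_E)  # 0.33
```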
Humans are very bad at understanding very large or small numbers - past a certain amount we ignore magnitudes
It is good to think of tests as an “update” to our priors, not a revelation of truth
When we condition on an event $B$, we still have a valid probability space: the sample space becomes $B$ and the probability measure becomes $\mathbb{P}(\omega \mid B)$
Conditioning on multiple events means conditioning on their intersection
Lecture 8 - April 12
(Statistical) independence is when the probabilities of two things don’t depend on each other:
$\mathbb{P}(A \cap B) = \mathbb{P}(A) \cdot \mathbb{P}(B)$
We can see where this comes from: $\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} \rightarrow \mathbb{P}(A \cap B) = \mathbb{P}(A \mid B) \cdot \mathbb{P}(B)$, so when $\mathbb{P}(B) > 0$, independence is the same as $\mathbb{P}(A \mid B) = \mathbb{P}(A)$
Conditional independence is independence of two events given another event: $\mathbb{P}(A \cap B \mid C) = \mathbb{P}(A \mid C) \cdot \mathbb{P}(B \mid C)$
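A concrete check of the independence definition on an example of my own (two fair dice, not taken from the lecture):

```python
# Check independence of A = "first die is even" and B = "sum is 7" on two fair dice.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0
B = lambda w: w[0] + w[1] == 7

p_A, p_B = prob(A), prob(B)
p_AB = prob(lambda w: A(w) and B(w))
print(p_A, p_B, p_AB, p_AB == p_A * p_B)  # 1/2 1/6 1/12 True
```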
Lecture 9 - April 14
Medical terms for tests:
$\mathbb{P}(D)$ is “prevalence”
$\mathbb{P}(T \mid D)$ is “sensitivity”
$\mathbb{P}(\bar{T} \mid \bar{D})$ is “specificity”
ONE WEIRD TRICK:
Think of these problems with a large, concrete population (large enough that each possible outcome shows up at least once)
Intuition Trick: Bayes’ Factor:
When you test positive, multiply prior by the Bayes’ Factor: $\frac{\text{sensitivity}}{\text{false positive rate}} = \frac{1 - FNR}{FPR}$
Also called the “likelihood ratio”
It is an estimate of how much I should update the prior by
The approximation works best when the prior is quite low
When there are overwhelming size differences between groups, even a small response error rate can drown out the small groups
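A sketch comparing exact Bayes' rule with the Bayes'-factor shortcut above, using made-up values for prevalence, sensitivity, and specificity:

```python
# Compare the exact posterior with the Bayes'-factor shortcut (hypothetical numbers).
prior = 0.001          # P(D), prevalence (made up)
sensitivity = 0.99     # P(T | D)
specificity = 0.95     # P(not T | not D), so FPR = 0.05

fpr = 1 - specificity

# Exact posterior via Bayes' rule / law of total probability.
p_T = sensitivity * prior + fpr * (1 - prior)
posterior = sensitivity * prior / p_T

# Shortcut: multiply the prior by the Bayes factor (good when the prior is small).
bayes_factor = sensitivity / fpr
approx = prior * bayes_factor

print(round(posterior, 4), round(approx, 4))  # ~0.0194 vs 0.0198
```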
Lecture 10 - April 17
We often implicitly define the sample space:
This commonly occurs when the size of the sample space is infinite
When working with sample spaces with infinite size:
We can use infinite sums of probabilities (and closed forms)
We can use complement events
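As a standard illustration of the first point (an example added here, not spelled out above): flipping a coin with heads probability $p$ until the first head has infinitely many outcomes, but the probabilities form a geometric series with a closed form, $\sum_{k=1}^{\infty} (1 - p)^{k - 1} p = p \cdot \frac{1}{1 - (1 - p)} = 1$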
Random Variables:
It is any function that has domain $\Omega$ and outputs a real number, i.e., $X: \Omega \rightarrow \mathbb{R}$
$X(\omega)$ is a summary of $\omega$
It doesn’t change the problem, but it can simplify it!
One sample space can have many random variables
We always use capital letters as random variables
We commonly use lowercase variables as the values the random variable could take on
Random variables are a function
We do not use typical function notation, instead something like: $X = 2$
The support (the range) is the set of values $X$ can take on
The event the random variable takes on a value: $\lbrace \omega : X(\omega) = x \rbrace$
The probability of that event is: $\mathbb{P}(X = x)$
The function that gives us $\mathbb{P}(X = x)$ is the Probability Mass Function, or PMF
This is written as: $p_X(x)$ or $f_X(x)$
Think of the PMF as a piecewise function where values outside the support are zero because it must take in all real numbers as an input
The Cumulative Distribution Function (CDF) gives the probability $X \leq x$
Written as: $F_X(x) = \mathbb{P}(X \leq x)$
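A small sketch building the PMF and CDF for an example random variable, $X =$ the sum of two fair dice (an example chosen here for illustration):

```python
# PMF and CDF for X = sum of two fair six-sided dice.
from fractions import Fraction
from itertools import product
from collections import Counter

outcomes = list(product(range(1, 7), repeat=2))   # the sample space Omega
counts = Counter(a + b for a, b in outcomes)      # X(omega) = a + b

def pmf(x):
    # p_X(x) = P(X = x); zero outside the support {2, ..., 12}
    return Fraction(counts.get(x, 0), len(outcomes))

def cdf(x):
    # F_X(x) = P(X <= x), defined for any real x (floor handles non-integers)
    return sum(pmf(k) for k in range(2, int(x) + 1)) if x >= 2 else Fraction(0)

print(pmf(7))  # 1/6
print(cdf(3))  # 1/12  (P(X = 2) + P(X = 3) = 1/36 + 2/36)
```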
Lecture 11 - April 19
CDF will always be defined over all real numbers (we must support all of them)
We will often use floor functions when we only want to “support” integers
Typically it is zero below the support and one above the support
The “expectation” of a random variable $X$ is: $\mathbb{E}[X] = \sum_k k \cdot \mathbb{P}(X = k)$
The “weighted average” of values $X$ could take on
Weighted by the probability we actually see the value
Think about it as the expectation of a drive in football:
$X$ is the possible scores $0, 2, 3, 6, 7$
We multiply each score by the likelihood that it happens, then sum all of these for our expected score
The expectation is not necessarily the most common outcome (it may not even be a possible value)
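A sketch of that drive example with made-up probabilities for each outcome (the notes don't specify them):

```python
# Expected points per drive, with hypothetical probabilities for each outcome.
scores = [0, 2, 3, 6, 7]
probs = [0.60, 0.02, 0.15, 0.03, 0.20]  # made up; must sum to 1

assert abs(sum(probs) - 1) < 1e-9
expected = sum(k * p for k, p in zip(scores, probs))
print(expected)  # 2.07 expected points per drive with these numbers
```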
The expectation of the sum of two random variables is equal to the sum of the expectations for each variable
Pairwise independence: every pair of events in the set is independent
Mutual independence: for every subset of the events, the probability of the intersection of all of them is equal to the product of the individual probabilities of those events
Stronger than pairwise independence
Two random variables $X$ and $Y$ are independent if for all $k$ and $l$ (in the supports), $\mathbb{P}(X = k, Y = l) = \mathbb{P}(X = k) \cdot \mathbb{P}(Y = l)$
Note that commas are often used instead of $\cap$ for random variables
Pairwise independence of events gives the intuition here
Mutual independence of random variables: $X_1, X_2, \dots, X_n$ are mutually independent if for all $x_1, x_2, \dots, x_n$, $\mathbb{P}(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \mathbb{P}(X_1 = x_1) \cdot \mathbb{P}(X_2 = x_2) \cdot \dots \cdot \mathbb{P}(X_n = x_n)$
We don’t need to check all subsets, but we do need to check all possible values in the ranges
While many formulas are written over all values, including values outside the range (support) of a random variable, those outside values have probability zero, so they usually don’t need to be checked or included
Lecture 12 - April 21
Linearity of expectation: for any two random variables $X$ and $Y$, $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$
This extends to more than two random variables
Simple summation proof
$X$ and $Y$ do not need to be independent
Constants are also fine, matching algebra intuition: $\mathbb{E}[aX + b] = a \cdot \mathbb{E}[X] + b$
Indicator random variables are 1 if the event occurs and zero if it does not
$\mathbf{1}[A]$
The expectation of an indicator variable is the probability the event occurs: $\mathbb{E}[\mathbf{1}[A]] = \mathbb{P}(A)$
How to compute complicated expectations:
Decompose the random variable into the sum of simple random variables
Apply linearity of expectation
Compute the simple expectations
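A sketch of this recipe on a classic example of my own choosing (expected number of fixed points of a random permutation, not an example from the notes above):

```python
# Decompose-into-indicators recipe: expected number of fixed points of a
# uniformly random permutation of n items.
import math
from itertools import permutations

n = 5

# By linearity: X = sum of indicators 1[item i stays in place], and each
# indicator has expectation 1/n, so E[X] = n * (1/n) = 1.
by_linearity = n * (1 / n)

# Brute-force check over all n! permutations.
total_fixed = sum(sum(1 for i, v in enumerate(p) if i == v)
                  for p in permutations(range(n)))
by_brute_force = total_fixed / math.factorial(n)

print(by_linearity, by_brute_force)  # 1.0 1.0
```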
Lecture 13 - April 24
Variance is another one-number summary of a random variable; it measures how spread out its values are around the expectation
We typically square the deviations instead of using absolute values (think norms)