# CSE 446

## Lecture 1 - Sept 27

- ML algorithms attempt to *learn* decision rules, rather than hardcoding them
- We reviewed basic probability
- IID: independent, identically distributed

- Maximum likelihood estimation (MLE): take the argmax of the log-likelihood; the log turns products into sums (easier derivatives) and avoids numerical underflow
- MLE for a Gaussian (see the sketch after this list):
- MLE of the mean does not depend on the variance
- MLE of the variance depends on the mean; plugging in the estimated mean is what makes it biased
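
A minimal numpy sketch of the Gaussian MLE above (the seed, sample size, and true parameters are arbitrary choices). The mean estimate is just the sample average, and the variance estimate plugs that average back in, dividing by $n$ rather than $n - 1$, which is exactly where the bias comes from:

```python
import numpy as np

rng = np.random.default_rng(0)

# IID samples from a Gaussian with known parameters.
true_mean, true_var = 2.0, 3.0
x = rng.normal(true_mean, np.sqrt(true_var), size=10_000)

# Gaussian MLE: sample mean, and variance measured around the *estimated* mean.
mu_mle = x.mean()
var_mle = np.mean((x - mu_mle) ** 2)  # divides by n, not n - 1 -> biased

print(f"mu_mle  = {mu_mle:.3f}  (true {true_mean})")
print(f"var_mle = {var_mle:.3f}  (true {true_var})")
```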

## Lecture 2 - Oct 2

- Under normality assumptions, as the number of examples grows to infinity, the estimated parameters approach the optimal ones (consistency of the MLE)
- MLE is a pipeline: we first need to specify a model, then maximize its likelihood
- Q: learn more about biased vs. unbiased estimators
- Somewhat model-agnostic
- A/B testing: experiments are constantly being run
- Customer segmentation - find the clusters of customer groups
- Data exploration - understand the latent dimensions of the dataset
- Prediction, both classification and regression
- Linear regression!!!
- Collect training pairs of data
- Error is represented by loss…?
- Write down the probability of the observed data under the model (Gaussian PDF)
- Take the log-likelihood and set its derivative equal to zero
- $\hat{W}_\text{MLE} = (\sum_{i=0}^n x_i x_i^T)^{-1} \sum_{i=0}^n x_i y_i$
- We can (**and do**) use the matrix form as well (see the sketch below)
- $\text{arg min} (y - XW)^T (y - XW)$
- $\text{arg min} \Vert y - XW \Vert^2_2$
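
A minimal numpy sketch of the closed-form estimator above, on synthetic data (the true weights and noise scale are arbitrary choices), checked against `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ w_true + Gaussian noise.
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=n)

# Closed-form MLE / least squares: (X^T X)^{-1} X^T y.
# np.linalg.solve is preferred over an explicit inverse for stability.
w_mle = np.linalg.solve(X.T @ X, X.T @ y)

# numpy's least-squares routine gives the same answer.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_mle)
print(w_lstsq)
```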

- It is the same as least squares estimation
- This is *precisely* because the errors are modeled as Gaussian
- We can also model the noise with a Laplace distribution, which leads to absolute-value (L1) error
- Let’s check this out later (a quick sketch is below)
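
A quick sketch of that claim for a location parameter (the seed, scale, and sample size are arbitrary choices): under Laplace noise the MLE minimizes absolute (L1) error, whose minimizer is the median, while under Gaussian noise it minimizes squared (L2) error, whose minimizer is the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples from a Laplace distribution centered at loc = 1.0.
x = rng.laplace(loc=1.0, scale=2.0, size=10_001)

def l1(mu):  # negative Laplace log-likelihood, up to constants
    return np.abs(x - mu).sum()

def l2(mu):  # negative Gaussian log-likelihood, up to constants
    return ((x - mu) ** 2).sum()

mu_mean, mu_median = x.mean(), np.median(x)

# The median minimizes L1 error; the mean minimizes L2 error.
print(f"L1: median {l1(mu_median):.1f} <= mean {l1(mu_mean):.1f}")
print(f"L2: mean   {l2(mu_mean):.1f} <= median {l2(mu_median):.1f}")
```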