CSE 446
Lecture 1 - Sept 27
- ML algorithms attempt to learn decision rules from data rather than hardcoding them
- We reviewed basic probability
- IID: independent, identically distributed
- Maximum likelihood estimation (MLE): take the argmax of the log-likelihood; working in log space simplifies the derivation and avoids numerical scaling issues
- MLE for a Gaussian:
- MLE of the mean does not depend on the variance
- MLE of the variance depends on the mean; since the true mean is unknown, we plug in the MLE of the mean (a quick sketch follows these notes)
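A minimal sketch of these Gaussian MLE formulas (my own illustration, assuming NumPy and a synthetic sample; the "true" parameters below are made up):

```python
import numpy as np

# Hypothetical sample from a Gaussian with chosen "true" parameters.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1000)

mu_mle = x.mean()                     # MLE of the mean: the sample average
var_mle = np.mean((x - mu_mle) ** 2)  # MLE of the variance: plug in mu_mle, divide by n
var_unbiased = x.var(ddof=1)          # the unbiased estimator divides by n - 1 instead

print(mu_mle, var_mle, var_unbiased)
```

The variance MLE uses the estimated mean and divides by n, which is why it is slightly biased compared with the n - 1 version.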
Lecture 2 - Oct 2
- Under standard assumptions, as the number of examples grows to infinity the MLE parameters approach the true (optimal) ones (consistency)
- MLE is a general pipeline: we first need to specify a probabilistic model for the data
- Q: learn more about biased vs. unbiased estimators
- Somewhat model-agnostic
- A/B testing: experiments run constantly
- Customer segmentation - find the clusters of customer groups
- Data exploration - understand the latent dimensions of the dataset
- Prediction, both classification and regression
- Linear regression!!!
- Collect training pairs $(x_i, y_i)$
- Error is measured by a loss function
- Write the likelihood of the data under a Gaussian noise model (Gaussian PDF)
- Take the log-likelihood, differentiate, and set the gradient equal to zero
- $\hat{W}_{\text{MLE}} = \left(\sum_{i=0}^n x_i x_i^T\right)^{-1} \sum_{i=0}^n x_i y_i$
- We can (and do) use the matrix form as well
- $\arg\min_W \, (y - XW)^T (y - XW)$
- $\arg\min_W \, \|y - XW\|_2^2$
- This is the same as least squares estimation (a NumPy sketch at the end of these notes checks the closed form numerically)
- The equivalence comes from modeling the errors as Gaussian
- We can also model the errors with a Laplace distribution, which leads to an absolute-value (L1) error
- Let’s check this out later
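Checking that last point briefly (my own derivation sketch, assuming a Laplace noise model): if $y_i = x_i^T w + \epsilon_i$ with $\epsilon_i \sim \text{Laplace}(0, b)$, then $p(y_i \mid x_i, w) = \frac{1}{2b} \exp\left(-\frac{|y_i - x_i^T w|}{b}\right)$, so the negative log-likelihood is $n \log(2b) + \frac{1}{b} \sum_{i=1}^n |y_i - x_i^T w|$, and maximizing the likelihood is the same as minimizing the sum of absolute errors.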
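And a minimal NumPy sketch (my own, not from the lecture; the data sizes, weights, and noise level are made up) that checks the closed-form solution $\hat{W} = (X^T X)^{-1} X^T y$ on synthetic data with Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))                      # design matrix of training inputs
w_true = np.array([1.0, -2.0, 0.5])              # hypothetical true weights
y = X @ w_true + rng.normal(scale=0.1, size=n)   # targets with Gaussian noise

# Normal equations: solve (X^T X) w = X^T y instead of forming an explicit inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to w_true when the noise is small
```

Solving the linear system is preferred over computing the inverse directly for numerical stability.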