# CSE 446

## Lecture 1 - Sept 27

• ML algorithms attempt to learn decision rules, vs hardcoding them
• We reviewed basic probability
• IID: independent, identically distributed
• Maximum likelihood estimation (MLE): take the argmax of the log-likelihood; the log makes the derivation easier and helps with numerical scaling
• MLE for a Gaussian (see the sketch below):
  • MLE of the mean does not depend on the variance
  • MLE of the variance depends on the mean estimate (plug in the MLE mean)
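A minimal NumPy sketch (my own, not from lecture) of the Gaussian MLE formulas $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu})^2$; the synthetic sample and its parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # synthetic sample, assumed for illustration

# MLE of the mean: the sample average (does not involve the variance)
mu_hat = x.mean()

# MLE of the variance: average squared deviation from the *estimated* mean
# (note the 1/n factor rather than 1/(n-1); the MLE is biased)
var_hat = np.mean((x - mu_hat) ** 2)

print(mu_hat, var_hat)  # expect roughly 2.0 and 9.0
```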

## Lecture 2 - Oct 2

• Under Gaussian assumptions, as the number of examples grows to infinity, the MLE parameters converge to the optimal (true) values
• MLE is a pipeline: we supply a model, then maximize its likelihood
• The procedure itself is somewhat model-agnostic
• A/B testing: experiments are run constantly
• Customer segmentation - find the clusters of customer groups
• Data exploration - understand the latent dimensions of the DS
• Prediction, both classification and regression
• Linear regression!!!
• Collect training pairs of data
• Error is represented by a loss function
• Write the likelihood of the data under the model (Gaussian PDF on the residuals)
• Take the log-likelihood and set its gradient to zero
• $\hat{W}_{\text{MLE}} = \left( \sum_{i=1}^n x_i x_i^T \right)^{-1} \sum_{i=1}^n x_i y_i$
• We can (and do) use the matrix form as well
• $\arg\min_W \, (y - XW)^T (y - XW)$
• $\arg\min_W \, \| y - XW \|_2^2$
• This is the same as least-squares estimation (see the sketch after this list)
• The equivalence comes from modeling the errors as Gaussian
• Modeling the errors as Laplace instead leads to absolute-value (L1) error (see the derivation below)
• Let’s check this out later
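A minimal NumPy sketch (my own, not from lecture) checking that the summation form of $\hat{W}_{\text{MLE}}$, the matrix form $\arg\min_W \|y - XW\|_2^2$, and NumPy's least-squares solver all agree; the synthetic data, noise level, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))                # feature matrix, assumed for illustration
w_true = np.array([1.0, -2.0, 0.5])        # hypothetical true weights
y = X @ w_true + 0.1 * rng.normal(size=n)  # Gaussian noise, matching the MLE model

# Summation form: (sum_i x_i x_i^T)^{-1} (sum_i x_i y_i)
A = sum(np.outer(x_i, x_i) for x_i in X)
b = sum(x_i * y_i for x_i, y_i in zip(X, y))
w_sum = np.linalg.solve(A, b)

# Matrix form: argmin_W ||y - XW||_2^2, via the normal equations X^T X W = X^T y
w_mat = np.linalg.solve(X.T @ X, X.T @ y)

# Library baseline
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_sum, w_mat), np.allclose(w_mat, w_lstsq))  # True True
```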
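And a short derivation sketch (my own notation, not from lecture) of why the noise model determines the loss: drop the terms of the negative log-likelihood that do not depend on $W$:

```latex
\text{Gaussian noise } \epsilon_i \sim \mathcal{N}(0, \sigma^2):\quad
-\log p(y \mid X, W) = \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T W)^2 + \text{const}
\;\Rightarrow\; \hat{W} = \arg\min_W \sum_i (y_i - x_i^T W)^2

\text{Laplace noise } \epsilon_i \sim \mathrm{Laplace}(0, b):\quad
-\log p(y \mid X, W) = \frac{1}{b} \sum_{i=1}^n \lvert y_i - x_i^T W \rvert + \text{const}
\;\Rightarrow\; \hat{W} = \arg\min_W \sum_i \lvert y_i - x_i^T W \rvert
```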