
CSE 446

Lecture 1 - Sept 27

  • ML algorithms attempt to learn decision rules from data, rather than hardcoding them
  • We reviewed basic probability
    • IID: independent, identically distributed
  • Maximum likelihood estimation (MLE): take the argmax of the log-likelihood; the log simplifies the derivation and helps with numerical scaling
  • MLE Gaussian:
    • MLE of the mean does not depend on the variance
    • MLE of the variance depends on the true mean (formulas sketched below)
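
As a quick reference (standard results, not verbatim from lecture): for $n$ IID samples $x_1, \dots, x_n$ from $\mathcal{N}(\mu, \sigma^2)$, the MLEs are

$$\hat{\mu}_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{\text{MLE}})^2$$

Note that $\hat{\sigma}^2_{\text{MLE}}$ is biased: in expectation it underestimates $\sigma^2$ by a factor of $\frac{n-1}{n}$, and the unbiased sample variance divides by $n - 1$ instead.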

Lecture 2 - Oct 2

  • Under the usual assumptions, as the number of examples grows to infinity the MLE parameters converge to the true (optimal) ones; this property is called consistency
  • MLE is a recipe rather than a model: we first choose a probabilistic model, then maximize its likelihood
  • Q: learn more about biased and unbiased estimators (see the variance note under Lecture 1)
  • The MLE recipe itself is somewhat model-agnostic
  • A/B testing - experiments run constantly in practice
  • Customer segmentation - find the clusters of customer groups
  • Data exploration - understand the latent dimensions of the dataset
  • Prediction, both classification and regression
  • Linear regression!!!
    • Collect training pairs of data
    • Error is measured by a loss function
    • Write the likelihood of the data under the model, assuming Gaussian noise (Gaussian PDF)
    • Take the log-likelihood, differentiate, and set the gradient to zero
    • $\hat{W}_{\text{MLE}} = \left( \sum_{i=1}^{n} x_i x_i^T \right)^{-1} \sum_{i=1}^{n} x_i y_i$
      • We can (and do) use the matrix form as well
      • $\text{arg min} \, (y - XW)^T (y - XW)$
      • $\text{arg min} \, \Vert y - XW \Vert_2^2$
    • This is the same as least-squares estimation (see the sketch after this list)
      • The equivalence comes from modeling the errors as Gaussian
      • Modeling the noise with a Laplace distribution instead leads to absolute-value (L1) error (worked out below)
      • Let’s check this out later
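
A minimal sketch (my own check, not from lecture) of the closed-form solution above, using NumPy on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_true + Gaussian noise
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Closed-form MLE / least squares: w = (X^T X)^{-1} X^T y,
# computed with a linear solve instead of an explicit inverse
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with NumPy's built-in least-squares routine
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_hat, w_lstsq)
print(w_hat)  # close to w_true
```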
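
And a quick check of the Laplace claim (a standard result, not from lecture): with noise $\epsilon_i \sim \text{Laplace}(0, b)$, the log-likelihood is, up to constants,

$$\log p(y \mid X, W) = -\frac{1}{b} \sum_{i=1}^{n} \vert y_i - x_i^T W \vert + \text{const}$$

so maximizing it is the same as minimizing $\sum_{i=1}^{n} \vert y_i - x_i^T W \vert$, exactly as the Gaussian case recovers squared error.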