Overview
Official course description
Machine learning (ML) is about algorithms which are fed with (large quantities of) real-world data, and which return a compressed “model” of the data. An example is the “world model” of a robot: the input data are sensor data streams, from which the robot learns a model of its environment — needed, for instance, for navigation. Another example is a spoken language model: the input data are speech recordings, from which ML methods build a model of spoken English — useful, for instance, in automated speech recognition systems. There exist many formalisms in which such models can be cast, and an equally large diversity of learning algorithms. However, there is a relatively small number of fundamental challenges which are common to all of these formalisms and algorithms. The lecture introduces such fundamental concepts and illustrates them with a choice of elementary model formalisms (linear classifiers and regressors, radial basis function networks, clustering, neural networks, hidden Markov models). Furthermore, the lecture also provides a refresher of required mathematical material from probability theory and linear algebra.
Literature
Primary text:
Hastie, Tibshirani, Friedman: “The Elements of Statistical Learning” (Second Edition), Springer
Recommended reading:
Shalev-Shwartz, Ben-David: “Understanding Machine Learning: From Theory to Algorithms”, Cambridge University Press
Further useful references for the math background:
Linear algebra and probability reviews available at http://cs229.stanford.edu/syllabus.html
Grading
The grades for this lecture will be determined as follows:
- final exam (100 %)
There will be no other formal requirements.
Final exam
All rules, times, etc. are consolidated in the Final exam announcement.
Lecture style, tutorials, homeworks and further information
Online teaching
Online classes are carried out as follows:
- Video recordings of the class content that would normally have been presented in the lecture slot. The videos can either be watched via the embedded player or downloaded by clicking on the name of the video.
- Online quizzes that can be carried out and repeated at any time.
- The slides uploaded for each lecture, as usual.
- Question & Answer video conferencing sessions
The video conferencing sessions take place in Microsoft Teams, in the respective course, on
- Wednesdays, starting at 9:00
- Thursdays, starting at 17:30
The instructor will keep the meeting running for at least ten minutes. If no student shows up within these ten minutes, the meeting is ended.
Online tutorials
- Offered via video conferencing in MS Teams in the “Team” of this lecture
- Weekly tutorial classes offered by TAs:
- Mondays, 15:45-17:00
- Wednesdays, 17:15-18:30
- Content:
- Repetition and discussion of lecture content
- Discussion of upcoming and graded homework
- No mandatory attendance.
→ attendance is highly recommended in order to succeed
Homeworks
Flavour
- one assignment sheet per week, published on Moodle
- contents of each assignment sheet:
- ∼3 tasks: theory (manually computing a predictor, proving, …)
- ∼1 task: programming (implementing ML algorithms)
→ programming language C/C++
Submission
- weekly deadline: Friday, 12:00 (noon)
- submission format:
- theory: via Moodle
- programming: via Moodle
- submissions in groups of 1 – 3
→ depending on class size and homework participation
→ might be subject to adjustments
Grading (of non-mandatory homeworks)
- exercises graded with points by TAs
- The points that students receive for their homework have no influence on the final grade, i.e., doing the exercises is not mandatory.
- However: students who do not achieve at least 50% of the points on the exercises should expect that they have not had enough training in the content and will therefore most likely have difficulties in the final exam.
Code demos
- kNN regression (see the sketch below this list)
- kNN classification
- Linear regression (on 1d input)
- Linear regression (on 2d input)
- Bivariate Gaussian density
- Z score
- Training error
- Generalization error approximation by validation set approach
- Generalization error approximation by different methods for synthetic data
- Generalization error approximation by different methods for cancer data
- Bias – Variance tradeoff
- Linear vs. non-linear classification
- Linear vs. quadratic discriminant analysis
- kMeans clustering
- PCA compression
- PCA synthesis
- Regression using quadratic polynomial basis expansion (1D)
- Regression using quadratic polynomial basis expansion (2D)
- Regression using kernel model
- Gradient Descent methods applied in linear regression
- Neural Network regression in Keras
- Neural Network weight initialization
- Neural Network batch normalization
- Neural Network image classification with MLP and CNN
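The demo files themselves are available on the course page. Purely as an illustration of the kind of algorithm the first demo covers, here is a minimal kNN-regression sketch in Python (assuming only numpy); it is not the course's demo code, and all names and data are hypothetical.

```python
# Minimal kNN regression sketch (illustrative only, not the course demo code).
# Predicts the mean target of the k nearest training points under the
# Euclidean distance. Assumes numpy is available.
import numpy as np

def knn_regress(X_train, y_train, X_query, k=3):
    """Return kNN-regression predictions for each row of X_query."""
    preds = []
    for x in X_query:
        # Euclidean distances from the query point to all training points
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Prediction: average of their target values
        preds.append(y_train[nearest].mean())
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 2 * np.pi, size=(50, 1))      # 1d inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)  # noisy targets
    X_new = np.linspace(0, 2 * np.pi, 5).reshape(-1, 1)
    print(knn_regress(X, y, X_new, k=5))
```

Running the script prints predictions for a few query points of a noisy sine curve; increasing k smooths the predictions, which connects to the bias–variance tradeoff demo listed above.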
Lecture content
Content until March 12 (i.e. in-person teaching)
- Lecture slides of February 5
- Lecture slides of February 6
- Lecture slides of February 12
- Lecture slides of February 13
- Lecture slides of February 19
- Lecture slides of February 20
- Lecture slides of February 26
- Lecture slides of February 27
- Lecture slides of March 4 (repetition class)
- Lecture slides of March 5 (second part of repetition)
- Lecture slides of March 5
- Lecture slides of March 11
- Lecture slides of March 12
Material for March 18
- Lecture slides (part 1/3 and 2/3)
- Lecture slides (part 3/3)
- Video: Introduction to Bias vs. Variance
- Video: Proof of Bias-Variance decomposition
- Video: Bias-Variance decomposition for kNN regression
Material for March 19
- Lecture slides (part 1/6 and 2/6)
- Lecture slides (part 3,4,5,6 of 6)
- Video: Proof of Bias-Variance decomposition for kNN regression
- Video: Demo for Bias-Variance Tradeoff in kNN regression
- Video: Introduction to estimation of prediction error
- Video: What is training error?
- Video: Training error by example
- Video: Why training error is still important
Material for March 25
- Lecture slides
- Video: What is generalization error?
- Video: Expected generalization and the strange T
- Video: Introduction to empirical error estimation
Material for March 26
- Lecture slides (part 1/4)
- Lecture slides (part 2,3,4 of 4)
- Video: More advanced generalization error estimators
- Video: Introduction to classification
- Video: Linear methods in classification
- Video: Classification by linear regression
Material for April 1
- Lecture slides
- Video: Introduction to Linear Discriminant Analysis
- Video: The theorem behind LDA
- Video: Proving the theorem behind LDA
- Video: The LDA algorithm
- Video: How to measure prediction error & further classification methods
Material for April 2
- Lecture slides
- Video: Introduction to unsupervised learning
- Video: Introduction to clustering
- Video: Towards efficient combinatorial clustering
- Video: Proof of the loss reformulation
- Video: The K-means clustering algorithm
- Video: Examples
Material for April 15
- Lecture slides
- Video: Introduction to PCA & Compression motivation
- Video: Compression motivation cont.
- Video: Compression example
- Video: Data predictor motivation
- Video: Data predictor motivation cont.
- Video: Data predictor – Simple example
Material for April 16
- Lecture slides (part 1,2,3/7)
- Lecture slides (part 4,5,6,7/7)
- Video: Data predictor – Digits example
- Video: How to compute PCA
- Video: PCA algorithms
- Video: Building more complex models by basis expansions
- Video: Least squares regression for basis expansion models
- Video: Quadratic polynomial model
- Video: General polynomial models
Material for April 22
- Lecture slides (part 1,2/5)
- Lecture slides (part 3,4,5/5)
- Video: Kernel-based models
- Video: Least squares regression for kernel-based models
- Video: Motivation for Ridge Regression
- Video: Theory of Ridge Regression
- Video: Kernel Ridge Regression
Material for April 23
- Lecture slides for April 23
- Video: Introduction to Neural Networks and the multilayer perceptron
- Video: Definition and graphical representation of the MLP
- Video: Activations and how to do regression with the MLP
- Video: How to do classification with the MLP
Material for April 29
- Lecture Slides
- Video: Some important remarks on MLPs
- Video: Our first “deep” network
- Video: Introduction to Gradient Descent methods
- Video: Understanding Gradient Descent
Material for April 30
- Lecture slides
- Video: Stochastic Gradient Descent
- Video: Mini-batch Gradient Descent and some remarks
- Video: Example of using Gradient Descent for Linear Regression
Material for May 6
- Lecture slides
- Video: Introduction towards backpropagation
- Video: Forward propagation
- Video: A central statement on how to compute gradient entries
- Video: The backpropagation formula
Material for May 7
- Lecture slides (parts 1,2 / 5)
- Lecture slides (parts 3,4,5 / 5)
- Video: Proof of the backpropagation formula
- Video: The backpropagation algorithm
- Video: Training an FFNN
- Video: The vanishing / exploding gradients problem
- Video: The batch normalization layer (1)
Material for May 13
- Lecture slides
- Video: The batch normalization layer (2)
- Video: Overfitting in FFNN training
- Video: Motivation for CNNs
- Video: The convolutional layer
- Video: The pooling layer and CNN architectures
- Video: Image classification by MLP and CNN
Final exam discussion lecture (May 14)
The final lecture will again be a live lecture, without a recording. It will take place in the original lecture slot.