Week 1
- Object detection: find the boundary of the object, what the object is, and where it is.
- Semantic Segmentation: labels each pixel, dividing the image into labeled regions.
- Linear Classifier: draws a line (or hyperplane) in feature space to separate different classes of data.
- Overfitting: the model matches the training set too closely and fails to predict correctly on new data.
- Image Classification challenges:
- image resolution
- variability in how objects appear (lighting, viewpoint, scale, occlusion, etc.)
- It is common to have more training data than testing data.
- Class Imbalance: some classes have only a limited amount of data.
- K nearest neighbor classifier: predicts a sample's label from the labels of its closest training examples (see the sketch below).
- It is rarely used in practice because prediction is slow and it overfits easily.
- Hyperparameter: a parameter that is fixed during training rather than learned. k in the K nearest neighbor classifier is a hyperparameter.
- k is usually an odd number to avoid ties when voting.
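A minimal NumPy sketch of a k-nearest-neighbor classifier (Euclidean distance, majority vote); the data and function names here are illustrative, not from the course:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of a single sample x by majority vote of its k nearest neighbors."""
    # Euclidean distance from x to every training sample
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels (an odd k helps avoid ties)
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny example: two 2-D clusters
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 1
```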
- Linear Decision boundary: a straight line, plane, or hyperplane that separates different classes in a feature space.
Week 2
- Linear Regression: fits a line (a linear function of the inputs) to the data to predict a continuous output value.
- Under mild conditions, linear regression has a closed-form optimal solution.
- Mean Squared Error (MSE): Average of the squared differences between observed and predicted values.
- Good for linear regression.
- Supervised Learning: train a model on a labeled training set so that it maps inputs to outputs while minimizing error.
- How to find the minimum with respect to w? Differentiate the MSE with respect to w and set the derivative to 0 (see the derivation below).
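A sketch of that derivation, assuming X is the design matrix (one row per sample) and y the vector of targets; setting the gradient of the MSE to zero gives the normal-equation solution, provided X^T X is invertible (the "mild condition" above):

```latex
\mathcal{L}(w) = \tfrac{1}{n}\lVert Xw - y\rVert^2, \qquad
\nabla_w \mathcal{L} = \tfrac{2}{n} X^\top (Xw - y) = 0
\;\Rightarrow\; X^\top X\, w = X^\top y
\;\Rightarrow\; w^* = (X^\top X)^{-1} X^\top y
```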
- Polynomial Regression: fits a curved (polynomial) function to the data instead of a straight line.
- Machine learning assumption: the training set is drawn from the same probability distribution as the test data.
- Example: train a model on the heights of 6-12 year olds, but test it on the heights of 18-24 year olds; the model will not generalize well.
- Ultimate Goal: make the errors as small as possible.
- Regularization is a technique used in machine learning to prevent overfitting by introducing additional constraints or penalties to the model’s loss function.
- Maximum Likelihood Estimation: find the parameter that maximizes the likelihood of the observed data under a given probabilistic model.
- MLE estimates converge to the true parameter value as the amount of data grows (consistency).
- MLE is found by taking the derivative of the log-likelihood and solving for zero.
- MLE is asymptotically unbiased but may be biased in small samples.
- MLE has the lowest variance possible asymptotically (efficient estimator).
- MLE is equivalent to minimizing the KL divergence between the data distribution and the model distribution.
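A short worked example under a Bernoulli (coin-flip) model with k heads in n flips: take the log-likelihood, differentiate, and set the derivative to zero.

```latex
\log L(\theta) = k\log\theta + (n-k)\log(1-\theta), \qquad
\frac{d}{d\theta}\log L(\theta) = \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0
\;\Rightarrow\; \hat{\theta}_{\mathrm{MLE}} = \frac{k}{n}
```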
- Binary Classification: predicting between two classes.
- Cross-Entropy Loss: Measures the difference between predicted and actual labels. It ensures that high-confidence incorrect predictions get large gradients (forcing corrections).
- Squash Function (Sigmoid): Converts raw scores to probabilities (0 to 1).
- To get probabilities, divide each output by the sum of all outputs. What happens if some outputs are negative? Exponentiate first so every term is positive.
Week 3
- SoftMax: transforms the outputs into probabilities and ensures they sum to 1.
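A minimal NumPy sketch of softmax; subtracting the max before exponentiating is a standard numerical-stability trick, and the exponentiation answers the "negative outputs" question above:

```python
import numpy as np

def softmax(z):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    z = z - np.max(z)   # numerical stability: avoids overflow in exp()
    e = np.exp(z)       # exponentiate so every term is positive
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, -1.0])))  # probabilities summing to 1
```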
- ReLU (Rectified Linear Unit): ReLU(x) = max(x, 0). It zeroes out negative values and keeps positive values unchanged.
- Goal: use a neural network to transform the samples so that they become linearly separable.
- The more hidden layers you have, the larger the set of problems you can approximate.
- Don't put sigmoid functions in the middle of the hidden layers, but sigmoid can be used on the output layer (see the sketch below).
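A minimal PyTorch sketch of this kind of network: ReLU in the hidden layers, sigmoid only at the output (binary classification); the layer sizes are arbitrary.

```python
from torch import nn

# ReLU in the hidden layers; sigmoid only at the output layer.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Sigmoid(),  # squashes the output into (0, 1) for binary classification
)
```

(With nn.BCEWithLogitsLoss the final Sigmoid would be dropped, since that loss applies the sigmoid internally.)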
- Loss functions:
- MSE: regression
- BCE(Binary Cross Entropy): binary classification
- Cross Entropy: multi-class classification
- Several approaches for training neural networks:
- Batch gradient descent: each update uses the entire training set.
- Stochastic gradient descent: each update uses one sample at a time (one iteration per sample); converges faster.
- Mini-batch: each update uses a batch of B samples (see the training-loop sketch below).
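A minimal PyTorch mini-batch training loop on made-up regression data (batch size B = 32 is arbitrary); setting B = 1 recovers SGD and B = dataset size recovers batch descent.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data, for illustration only
X = torch.randn(512, 10)
y = torch.randn(512, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)  # B = 32

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):                 # one epoch = one pass over the data
    for xb, yb in loader:              # each iteration uses one mini-batch
        loss = loss_fn(model(xb), yb)  # forward pass + loss
        opt.zero_grad()
        loss.backward()                # backward pass: compute gradients
        opt.step()                     # weight update
```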
- Computational Graph
Week 4
- Forward Pass: input data moves through the layers of a neural network to produce an output.
- Backward Pass (Backpropagation): computing gradients using the chain rule.
- Weight Updates: adjusting weights based on the gradients using optimization techniques like Stochastic Gradient Descent (SGD).
- Activation Functions: Non-linearity in hidden layers (e.g., ReLU, sigmoid).
- Batch Processing: use the mini-batch concept to speed up training (see the sketch below).
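A sketch of one forward and backward pass for a tiny one-hidden-layer network (ReLU hidden activation, MSE loss), written out with the chain rule in NumPy; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))             # mini-batch of 4 samples, 3 features
y = rng.normal(size=(4, 1))             # targets
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)

# Forward pass
z1 = x @ W1 + b1                        # linear layer 1
h  = np.maximum(z1, 0)                  # ReLU activation
yhat = h @ W2 + b2                      # linear layer 2
loss = np.mean((yhat - y) ** 2)         # MSE loss

# Backward pass (chain rule, layer by layer)
dyhat = 2 * (yhat - y) / y.shape[0]     # dL/dyhat
dW2 = h.T @ dyhat;  db2 = dyhat.sum(axis=0)
dh  = dyhat @ W2.T                      # gradient flowing into the ReLU
dz1 = dh * (z1 > 0)                     # ReLU local derivative: pass gradient where input > 0
dW1 = x.T @ dz1;    db1 = dz1.sum(axis=0)

# Weight update (plain SGD step)
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```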
Week 5
- CNN
- Cross-correlation vs. convolution: convolution flips the kernel before sliding it over the input; cross-correlation does not (see the sketch below).
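A small NumPy sketch of the difference: convolution flips the kernel before sliding it, cross-correlation does not (deep-learning "convolution" layers actually compute cross-correlation).

```python
import numpy as np

def cross_correlate_1d(x, k):
    """Slide kernel k over signal x without flipping it."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def convolve_1d(x, k):
    """True convolution = cross-correlation with a flipped kernel."""
    return cross_correlate_1d(x, k[::-1])

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])
print(cross_correlate_1d(x, k))          # [-2. -2.]
print(convolve_1d(x, k))                 # [ 2.  2.]
print(np.convolve(x, k, mode='valid'))   # matches convolve_1d
```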
Week 6
- Max pooling
- Stride
- Conv -> ReLU -> Pooling
- Regularization: L2 penalty
- Global average pooling can replace flattening.
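A minimal PyTorch sketch of the Conv -> ReLU -> Pooling pattern, ending with global average pooling in place of flattening the full feature map; the channel counts and the 10-class head are arbitrary.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # conv
    nn.ReLU(),                                    # ReLU
    nn.MaxPool2d(kernel_size=2, stride=2),        # max pooling, stride 2 halves H and W
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),                      # global average pooling replaces flattening the feature map
    nn.Flatten(),                                 # (N, 32, 1, 1) -> (N, 32)
    nn.Linear(32, 10),                            # classifier head
)

x = torch.randn(8, 3, 32, 32)   # batch of 8 RGB 32x32 images
print(model(x).shape)           # torch.Size([8, 10])
```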
- Backprop for CNNs
- Local derivative for max pooling.
- Local derivative for a convolution layer: similar to the forward convolution operation; the downstream gradient is computed by applying the filter to the upstream gradient.
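A tiny NumPy sketch of the max-pooling local derivative: the upstream gradient is routed only to the position that held the maximum; every other position receives zero gradient.

```python
import numpy as np

# Forward: 2x2 max pool over one 2x2 window
window = np.array([[1.0, 3.0],
                   [2.0, 0.5]])
out = window.max()                      # forward output: 3.0

# Backward: suppose dL/dout = 5.0 arrives from upstream
upstream = 5.0
mask = (window == window.max())         # True only at the argmax position
d_window = upstream * mask              # gradient routed to the max element only
print(d_window)
# [[0. 5.]
#  [0. 0.]]
```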
Week 7
- PyTorch
- CNN
Week 8
- Data preprocessing
- Batch Normalization is a technique that improves speed (especially when the network is deep) and stability when training neural networks.
- What is Normalization?
- Re-centering and re-scaling the layer's inputs.
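A minimal NumPy sketch of that normalization step (re-center to zero mean and re-scale to unit variance over the batch, then apply a scale gamma and shift beta, which are learnable in a real layer); in PyTorch this is nn.BatchNorm1d / nn.BatchNorm2d.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then re-scale and re-shift."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # re-centered and re-scaled input
    return gamma * x_hat + beta              # scale and shift

x = np.random.randn(64, 16) * 3.0 + 7.0      # batch of 64, 16 features, shifted and scaled
out = batch_norm(x)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1 per feature
```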
- Parameter initialization
- Regularization
- L2 Regularization(weight decay)
- Dropout
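A minimal PyTorch sketch of both regularizers: the L2 penalty (weight decay) through the optimizer's weight_decay argument, and dropout as a layer that is active in train() mode and disabled in eval() mode; sizes and rates are arbitrary.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),          # randomly zero 50% of activations during training
    nn.Linear(64, 10),
)

# weight_decay adds an L2 penalty on the weights to the update rule
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

model.train()   # dropout active
model.eval()    # dropout disabled at evaluation time
```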
- Hyperparameter search
- Grid search: try every combination of candidate values, training on the entire dataset for a small number of epochs.
- Random search: sample hyperparameter combinations at random (see the sketch below).
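A minimal sketch contrasting the two searches; train_and_evaluate is a placeholder standing in for training for a few epochs and returning a validation score.

```python
import itertools
import random

def train_and_evaluate(lr, hidden):
    """Placeholder: stands in for training a model for a few epochs
    and returning its validation accuracy."""
    return random.random()

# Grid search: every combination of the candidate values
lrs = [1e-3, 1e-2, 1e-1]
hiddens = [32, 64, 128]
grid_results = {(lr, h): train_and_evaluate(lr, h)
                for lr, h in itertools.product(lrs, hiddens)}

# Random search: sample a fixed budget of random combinations
random_results = {}
for _ in range(9):
    lr = 10 ** random.uniform(-4, -1)     # sample the learning rate on a log scale
    h = random.choice([32, 64, 128])
    random_results[(lr, h)] = train_and_evaluate(lr, h)

best = max(grid_results, key=grid_results.get)
print("best grid config:", best)
```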
Week 9
- Image Analysis
- Object Localization
- Bounding box: described by its position, width, and height.
- Model
- CNN -> flatten -> softmax classification (predicts the class)
- The same flattened features -> fully connected layer with linear activation (predicts the bounding box)
- Total loss: cross-entropy loss for the classification plus lambda * MSE loss for the box (lambda adjusts the balance between the two losses); see the sketch below.
- This is multitask learning.
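A minimal PyTorch sketch of this localization model: a shared CNN backbone feeding a classification head (trained with cross-entropy) and a linear bounding-box head (trained with MSE), with the two losses combined through a weight lambda. The layer sizes, the 4-number box encoding, and lambda = 1.0 are illustrative.

```python
import torch
from torch import nn

class LocalizationNet(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(              # shared CNN feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.cls_head = nn.Linear(32, num_classes)  # softmax classification (via CE loss)
        self.box_head = nn.Linear(32, 4)            # linear activation: box (x, y, w, h)

    def forward(self, x):
        feats = self.backbone(x)
        return self.cls_head(feats), self.box_head(feats)

model = LocalizationNet()
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 20, (8,))
boxes = torch.rand(8, 4)

logits, pred_boxes = model(images)
lam = 1.0                                           # lambda balances the two losses
loss = nn.CrossEntropyLoss()(logits, labels) + lam * nn.MSELoss()(pred_boxes, boxes)
loss.backward()
```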
- Transfer Learning
- Limitation
- What if there are multiple objects?
- R-CNN: region-based CNN
- Fast R-CNN
- Faster R-CNN
Week 11