Table of Contents
- Chapter 1: Basics
- Chapter 2: Data Processing
- Chapter 3: Neural Networks
- Chapter 4: Convolutional Neural Networks
- Chapter 5: Recurrent Neural Networks
- Chapter 6: Generative Models
- Chapter 7: Attention Mechanisms
Chapter 1: Basics
The following 5 questions are just to warm you up with Python programming.
Q.1: FizzBuzz
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
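A minimal sketch of one possible solution in plain Python:

```python
# FizzBuzz: print 1..100, replacing multiples of 3, 5, and 15 as described above.
for n in range(1, 101):
    if n % 15 == 0:
        print("FizzBuzz")
    elif n % 3 == 0:
        print("Fizz")
    elif n % 5 == 0:
        print("Buzz")
    else:
        print(n)
```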
Q.2: Quick Sort
Implement a quick sort (divide-and-conquer) algorithm. You can read about it here.
Sample Input: [9, 8, 7, 5, 6, 3, 1, 2, 4]
Sample Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
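One possible divide-and-conquer sketch (a simple, not-in-place variant; an in-place partition scheme is a good follow-up exercise):

```python
def quick_sort(arr):
    """Quick sort: pick a pivot, partition, and recurse on both sides."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([9, 8, 7, 5, 6, 3, 1, 2, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```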
Q.3: Missing Element
You are given two arrays. One is a shuffled version of the other, but with one element missing. Write a program to find the missing element.
Sample Input: [2, 3, 4, 5, 6, 7, 5, 8], [6, 8, 7, 4, 5, 2, 3]
Sample Output: 5
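A possible sketch using collections.Counter, which also handles duplicates (note the two 5s in the sample input):

```python
from collections import Counter

def find_missing(full, partial):
    """Return the single element present in `full` but missing from `partial`."""
    diff = Counter(full) - Counter(partial)   # keeps only positive count differences
    return next(iter(diff))

print(find_missing([2, 3, 4, 5, 6, 7, 5, 8], [6, 8, 7, 4, 5, 2, 3]))  # 5
```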
Q.4: Pair Sum
You are given an array. Write a program to output all possible pairs that sum to a specific value k.
Sample Input: [1, 3, 2, 2], k = 4
Sample Output: (1, 3) (2, 2)
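One way to sketch it with a single pass and a set of already-seen values:

```python
def pair_sums(arr, k):
    """Return each unordered pair (as a sorted tuple) that sums to k, reported once."""
    seen, pairs = set(), set()
    for x in arr:
        if k - x in seen:
            pairs.add(tuple(sorted((k - x, x))))
        seen.add(x)
    return pairs

print(pair_sums([1, 3, 2, 2], k=4))  # {(1, 3), (2, 2)}
```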
Q.5: Multiplication Table
Write a program that outputs a multiplication table like the following picture.
The following 5 questions are for NumPy. If you have used MATLAB, you will be just fine. The questions are extracted from numpy-100, and you can practice more if you have time (minimal sketches for Q.6-Q.10 follow the questions).
Q.6
Create an 8x8 checkerboard matrix using the tile function.
Q.7
Normalize a 5x5 random matrix.
Q.8
Multiply a 5x3 matrix by a 3x2 matrix (real matrix product).
Q.9
Given a 1D array, negate all elements which are between 3 and 8, in place.
Q.10
Consider two random arrays A and B, and check whether they are equal.
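Minimal NumPy sketches for Q.6-Q.10. Here "normalize" is taken as zero mean and unit variance, and "between 3 and 8" as strict inequalities; adjust these conventions as you prefer.

```python
import numpy as np

# Q.6: 8x8 checkerboard by tiling a 2x2 pattern.
checkerboard = np.tile(np.array([[0, 1], [1, 0]]), (4, 4))

# Q.7: normalize a 5x5 random matrix (zero mean, unit variance).
Z = np.random.random((5, 5))
Z = (Z - Z.mean()) / Z.std()

# Q.8: real matrix product of a 5x3 and a 3x2 matrix.
P = np.ones((5, 3)) @ np.ones((3, 2))

# Q.9: negate all elements between 3 and 8, in place.
a = np.arange(11)
a[(a > 3) & (a < 8)] *= -1

# Q.10: check whether two random arrays are equal.
A = np.random.randint(0, 2, 5)
B = np.random.randint(0, 2, 5)
equal = np.array_equal(A, B)   # exact element-wise comparison
close = np.allclose(A, B)      # tolerance-based comparison, useful for floats
```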
Chapter 2: Data Processing
In this chapter, we will work through the first 10 image processing knocks. If anything is unclear, please refer to the original image processing knocks, which are written in Japanese.
Q.1: Channel Swapping
Change the channel order from RGB -> BGR.
Q.2: Grayscale
Convert a color image to a grayscale one. The linear formula is
Y = 0.2126 R + 0.7152 G + 0.0722 B
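A minimal NumPy sketch of the weighted sum above, assuming `img` is an H x W x 3 array in RGB order (if you load images with OpenCV, remember it returns BGR, which is exactly Q.1):

```python
import numpy as np

def rgb2gray(img):
    """Grayscale via Y = 0.2126 R + 0.7152 G + 0.0722 B (img is an H x W x 3 RGB array)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (0.2126 * r + 0.7152 * g + 0.0722 * b).astype(np.uint8)
```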
Q.3: Binarization
Binarize an image using a threshold of 128.
Q.4: Binarization of Otsu
Otsu's method is an automatic thresholding algorithm that minimizes the intra-class intensity variance (equivalently, maximizes the inter-class variance).
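A brute-force sketch of the threshold search (a histogram-based version is faster, but the idea is the same):

```python
import numpy as np

def otsu_threshold(gray):
    """Brute-force search for the threshold that maximizes inter-class variance (uint8 input)."""
    pixels = gray.ravel().astype(np.float64)
    best_t, best_var = 0, -1.0
    for t in range(1, 255):
        c0, c1 = pixels[pixels < t], pixels[pixels >= t]
        if c0.size == 0 or c1.size == 0:
            continue
        w0, w1 = c0.size / pixels.size, c1.size / pixels.size
        var_between = w0 * w1 * (c0.mean() - c1.mean()) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# binary = np.where(gray >= otsu_threshold(gray), 255, 0).astype(np.uint8)
```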
Q.5: HSV Conversion
Implement RGB -> HSV and HSV -> RGB conversion.
In this case, invert the hue H (add 180), convert the result back to RGB, and display the image.
Q.6: Discretization of Color
Quantize the image into four levels as follows.
val = 32   (  0 <= val <  64)
      96   ( 64 <= val < 128)
      160  (128 <= val < 192)
      224  (192 <= val < 256)
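One way to implement the rule above with integer division, assuming a uint8 input:

```python
import numpy as np

def quantize(img):
    """Map every value to the centre of its 64-wide bin: 0-63 -> 32, 64-127 -> 96, and so on."""
    return (np.asarray(img) // 64 * 64 + 32).astype(np.uint8)
```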
Q.7: Average Pooling
Perform average pooling on a 128x128 image with an 8x8 kernel.
Q.8: Max Pooling
Perform max pooling on a 128x128 image with an 8x8 kernel.
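A sketch covering both Q.7 and Q.8 with NumPy's reshape trick, assuming the image height and width are divisible by the kernel size:

```python
import numpy as np

def pool(img, k=8, mode="mean"):
    """Grid pooling; works for H x W and H x W x C arrays whose H and W are divisible by k."""
    h, w = img.shape[:2]
    blocks = img.reshape(h // k, k, w // k, k, -1)          # split into k x k blocks
    pooled = blocks.mean(axis=(1, 3)) if mode == "mean" else blocks.max(axis=(1, 3))
    return pooled

# avg = pool(img, 8, "mean")   # Q.7
# mx  = pool(img, 8, "max")    # Q.8
# To keep the original 128x128 size (as in the reference output images), tile each pooled
# value back over its block: np.repeat(np.repeat(pooled, 8, axis=0), 8, axis=1)
```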
Q.9: Gaussian Filter
Implement the Gaussian filter (3x3, standard deviation 1.3) and remove the noise from a noisy image.
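A from-scratch sketch for a grayscale image with zero padding; apply it per channel for color images. The median filter of Q.10 is the same sliding-window loop with np.median in place of the weighted sum.

```python
import numpy as np

def gaussian_filter(gray, k=3, sigma=1.3):
    """Zero-padded k x k Gaussian filtering of a grayscale (2D) image."""
    # Build and normalize the Gaussian kernel.
    ax = np.arange(k) - k // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()

    pad = k // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="constant")
    out = np.zeros(gray.shape, dtype=np.float64)
    for i in range(gray.shape[0]):
        for j in range(gray.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return np.clip(out, 0, 255).astype(np.uint8)
```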
Q.10: Median Filter
Implement the median filter (3x3) and remove the noise from a noisy image.
Chapter 3: Neural Networks
Q.1: Linear Regression
- Generate a synthetic dataset of 1000 examples with additive noise. Each sample consists of 2 features drawn from the standard normal distribution. The true parameters generating the dataset are \(w = [2, -3.4]\) and \(b = 4.2\). Sample the noise from a normal distribution with mean 0 and standard deviation 0.01, so the synthetic labels are \(y = Xw^T + b + \epsilon\). Plot the features against the labels; the linear correlation should be clearly visible.
- Build a model \(\hat{y} = Xw^T + b\). The model will learn \(w\) and \(b\).
- Define a squared loss function.
- Initialize the model parameters with random values drawn from a normal distribution with mean 0 and standard deviation 0.01, and set the bias to 0.
- Use minibatch stochastic gradient descent to train the model.
- Print the errors between the true parameters and the learned parameters (a minimal end-to-end sketch follows this list).
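A minimal end-to-end sketch in PyTorch; the learning rate (0.03), batch size (10), and number of epochs (3) are assumptions rather than values given above:

```python
import torch

# Synthetic data: y = Xw + b + noise, with true w = [2, -3.4] and b = 4.2.
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
X = torch.normal(0, 1, (1000, 2))
y = X @ true_w + true_b + torch.normal(0, 0.01, (1000,))

# Model parameters: weights ~ N(0, 0.01^2), bias = 0.
w = torch.normal(0, 0.01, (2,), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def model(X):
    return X @ w + b

def squared_loss(y_hat, y):
    return ((y_hat - y) ** 2 / 2).mean()

lr, batch_size = 0.03, 10
for epoch in range(3):
    perm = torch.randperm(len(X))
    for i in range(0, len(X), batch_size):
        idx = perm[i:i + batch_size]
        loss = squared_loss(model(X[idx]), y[idx])
        loss.backward()
        with torch.no_grad():              # minibatch SGD update
            w -= lr * w.grad
            b -= lr * b.grad
            w.grad.zero_()
            b.grad.zero_()

print("error in w:", true_w - w.detach())
print("error in b:", true_b - b.item())
```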
Q.2: Softmax Regression
- We will use the Fashion-MNIST dataset with a batch size of 256 and build a single-layer neural network (a compact end-to-end sketch follows this list).
- Initialize the model parameters with Gaussian noise (normal distribution with mean 0 and standard deviation 0.01) and set the bias to 0.
- Define the softmax operation.
- Define the model. The only difference from the previous question is to apply the softmax operation to the output because now we are dealing with a classification problem.
- Define a cross-entropy loss.
- Train the model with minibatch stochastic gradient descent and plot the learning curve.
- Calculate the accuracy of the model.
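A compact PyTorch sketch. It uses the built-in nn.CrossEntropyLoss (which fuses softmax and cross-entropy) rather than a from-scratch softmax, and the learning rate and number of epochs are assumptions:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms

# Fashion-MNIST with a batch size of 256.
trans = transforms.ToTensor()
train_set = torchvision.datasets.FashionMNIST("./data", train=True, transform=trans, download=True)
test_set = torchvision.datasets.FashionMNIST("./data", train=False, transform=trans, download=True)
train_iter = DataLoader(train_set, batch_size=256, shuffle=True)
test_iter = DataLoader(test_set, batch_size=256)

# A single linear layer; weights ~ N(0, 0.01^2), bias = 0, as described above.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
nn.init.normal_(net[1].weight, std=0.01)
nn.init.zeros_(net[1].bias)

loss = nn.CrossEntropyLoss()                          # softmax + cross-entropy in one module
trainer = torch.optim.SGD(net.parameters(), lr=0.1)   # assumed learning rate

for epoch in range(5):                                # assumed number of epochs
    for X, y in train_iter:
        trainer.zero_grad()
        loss(net(X), y).backward()
        trainer.step()

# Test accuracy.
correct, total = 0, 0
with torch.no_grad():
    for X, y in test_iter:
        correct += (net(X).argmax(dim=1) == y).sum().item()
        total += y.numel()
print("test accuracy:", correct / total)
```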
Q.3: Multilayer Perceptrons
- We will build a neural network with a hidden layer (256 hidden units) and non-linear activation functions for the Fashion-MNIST dataset.
- Initialize the model parameters with random values drawn from a normal distribution.
- Define a model with ReLU activation function.
- We will use the same cross-entropy loss.
- Train the model with minibatch stochastic gradient descent and plot the learning curve.
- Calculate the accuracy of the model.
Q.4: Regularization
We do not want the model to memorize the training data, so we will use regularization techniques to improve the model's generalization. Implement the following regularization techniques on a synthetic dataset or any dataset you like, and compare training with and without regularization (a skeleton showing where each technique plugs in follows this list).
- Dropout
- Early Stopping
- Weight decay (L-2 regularization)
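A skeleton showing where each technique plugs in; the training and validation loops are elided, and the hyperparameters (dropout probability, weight decay coefficient, patience) are placeholders:

```python
import torch
from torch import nn

# Dropout is a layer in the model; weight decay (L2) is an optimizer argument.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                      # dropout regularization (placeholder probability)
    nn.Linear(256, 10),
)
trainer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-4)  # L2 penalty

# Early stopping: keep the best validation loss, stop after `patience` epochs without improvement.
best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    net.train()
    # ... run one training epoch with `trainer` here ...
    net.eval()
    val_loss = 0.0
    # ... compute the validation loss here ...
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(net.state_dict(), "best.pt")  # remember the best model so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```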
Chapter 4: Convolutional Neural Networks
Q.1: 2D Convolution
- Implement a 2D convolution layer. There are 2 parameters: the kernel and a scalar bias. You can initialize the kernel with random values and the bias with zero.
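A minimal sketch with an explicit cross-correlation helper; the layer holds the kernel and the scalar bias as its only parameters:

```python
import torch
from torch import nn

def corr2d(X, K):
    """2D cross-correlation (what deep learning frameworks call 'convolution')."""
    h, w = K.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

class Conv2D(nn.Module):
    """Convolution layer with a learnable kernel and a scalar bias."""
    def __init__(self, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias
```

For example, `Conv2D((1, 2))` creates a layer with a 1x2 kernel that can be trained toward the edge detection kernel used in Q.2.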
Q.2: Edge detection
- Apply the convolution layer you implemented for edge detection in images by using edge detection kernels.
Q.3: Padding
- Either extend your 2D convolution to support padding or use PyTorch's Conv2d and experiment with padded convolutions.
Q.4: Stride
- Either extend your 2D convolution to support stride or use PyTorch's Conv2d and experiment with strided convolutions.
Q.5: Pooling
- Implement a 2D pooling layer that can handle both max and average pooling.
Q.6: LeNet
- Implement a LeNet in PyTorch.
- Train LeNet on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Q.7: AlexNet
- Implement AlexNet in PyTorch.
- Train AlexNet on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Q.8: VGG
VGG introduces the idea of building networks from repeated blocks. Reference
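The block idea can be captured in a small helper like the following; VGG-11 then stacks such blocks (one common configuration is ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))) followed by the fully connected classifier:

```python
from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    """A VGG block: `num_convs` 3x3 convolutions (padding 1) with ReLU, then 2x2 max pooling."""
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
```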
- Implement a VGG-11 in PyTorch. You can reduce the number of channels in VGG since we are dealing with a small dataset.
- Train VGG on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Q.9: NiN Blocks
- Implement a NiN model in PyTorch. Reference
- Train NiN on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Q.10: GoogLeNet (Inception)
- Implement Inception blocks (GoogLeNet) in PyTorch.
- Train GoogLeNet on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Q.11: Batch Normalization
- Implement Batch Normalization
- Apply Batch Normalization in LeNet.
- Train LeNet on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Q.12: ResNet
- Implement a ResNet-18 in PyTorch.
- Train ResNet on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Q.13: DenseNet
- Implement a DenseNet in PyTorch.
- Train DenseNet on the Fashion-MNIST dataset.
- Plot the learning curve.
- Calculate the accuracy of the model.
Chapter 5: Recurrent Neural Networks
Q.1 Text Preprocessing
- Download The Time Machine
- Read the lines from the downloaded text.
- Write a function to tokenize each line (both word and character level).
- Create a vocabulary (dictionary) to map each string token to a numerical index (i.e., count the unique tokens in the corpus and assign each a numerical index according to its frequency).
- You can preprocess the time machine dataset however you like if you don't want to follow the above steps (a minimal tokenization and vocabulary sketch follows this list).
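A minimal sketch of word-level tokenization and frequency-sorted vocabulary construction; the cleanup regex and the `<unk>` token at index 0 are conventions, not requirements:

```python
import collections
import re

def tokenize(lines, level="word"):
    """Split each line into word tokens; character level is simply list(line)."""
    return [line.split() if level == "word" else list(line) for line in lines]

def build_vocab(tokens, min_freq=0):
    """Map tokens to indices, most frequent first; index 0 is reserved for '<unk>'."""
    counter = collections.Counter(tok for line in tokens for tok in line)
    idx_to_token = ["<unk>"] + [t for t, c in counter.most_common() if c >= min_freq]
    token_to_idx = {t: i for i, t in enumerate(idx_to_token)}
    return token_to_idx, idx_to_token

# Example on a tiny cleaned corpus (lowercase, non-letters removed).
lines = ["the time machine by h g wells", "the time traveller spoke"]
lines = [re.sub("[^a-z]+", " ", line.lower()).strip() for line in lines]
tokens = tokenize(lines)
token_to_idx, idx_to_token = build_vocab(tokens)
```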
Q.2 Reading Sequence Data
- Consider both random sampling and sequential partitioning.
- Prepare a data loader for the time machine dataset.
Q.3 Character-level Language Model by RNN
- Read the dataset.
- Define an RNN model in PyTorch.
- Train and Predict.
- You can also implement an RNN from scratch.
- You might want to plot perplexity vs epochs.
Q.4 Gated Recurrent Units (GRU)
- Explain the GRU (what problem is it trying to address?).
- Use the same time machine dataset.
- Implement a GRU.
- Compare the results with Q.3
- You might want to plot perplexity vs epochs.
Q.5 Long Short-Term Memory (LSTM)
- Explain LSTM.
- Use the same time machine dataset.
- Implement an LSTM.
- Compare the results with Q.3 and Q.4.
- You might want to plot perplexity vs epochs.
Chapter 6: Generative Models
Q.1 Autoencoders
- Explain what an autoencoder is.
- Implement an autoencoder.
- Train the autoencoder on the MNIST dataset.
- Visualize the results.
Q.2 Variational Autoencoders (VAE)
- Note that VAE and AE are different.
- Explain the difference.
- Implement a VAE.
- Train the VAE on the MNIST dataset.
- Visualize the results.
Q.3 Generative Adversarial Networks (GAN)
- Explain GAN.
- Implement a basic GAN without any convolution by using MNIST.
- Visualize the results.
Q.4 Deep Convolutional GAN (DCGAN)
Main features of DCGAN (sketched in code at the end of this question) are:
- Use convolution without using any pooling layer.
- Use batchnorm in both generator and discriminator.
- Don’t use fully connected hidden layers.
- Use ReLU activation in all layers of the generator except the output, which uses Tanh.
- Use LeakyReLU activation in all layers of the discriminator except the output, which uses no activation.
- Implement a DCGAN.
- Train the DCGAN on the MNIST dataset.
- Visualize the results.
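A minimal sketch of a generator/discriminator pair that follows the guidelines above, assuming MNIST images are resized to 64x64 and the latent vector has 100 dimensions (both are common choices, not requirements):

```python
import torch
from torch import nn

latent_dim = 100  # assumed latent size; images assumed resized to 64x64

# Generator: transposed convolutions, BatchNorm, ReLU everywhere except the Tanh output.
G = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),         # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),           # 16x16
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),            # 32x32
    nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),                                 # 64x64
)

# Discriminator: strided convolutions, BatchNorm, LeakyReLU; no activation on the output logit.
D = nn.Sequential(
    nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),                                  # 32x32
    nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2),             # 16x16
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),           # 8x8
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),          # 4x4
    nn.Conv2d(256, 1, 4, 1, 0),                                                    # 1x1 logit
)

z = torch.randn(16, latent_dim, 1, 1)
fake = G(z)                        # (16, 1, 64, 64)
logits = D(fake).view(-1)          # pair with nn.BCEWithLogitsLoss during training
```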
Q.5 Wasserstein GAN (WGAN)
Reference Paper: WGAN
- What is mode collapse?
- Problem with BCE loss
- Wasserstein loss
- Conditions on the Wasserstein critic and how 1-Lipschitz continuity is enforced
- Implement WGAN and train it on MNIST and CIFAR-10 datasets.
Q.6 Conditional GAN
Reference Paper: Conditional GAN
- What is the difference between a conditional and an unconditional GAN?
- How does a conditional GAN work? (You can refer to the paper.)
- Implement conditional GAN on MNIST and CIFAR-10 dataset (or use cat dataset).
- What is controllable generation?
- What is the difference between conditional and controllable generation?
In the following task, you need to train a GAN with the CelebA dataset, then use that GAN for controllable generation as follows. Here is an example of controllable generation with a pre-trained classifier.
- Implement a controllable generation method (classifier gradient).
- Train an unconditional GAN on the CelebA dataset.
- Train a classifier for a certain feature (e.g., a binary classifier for with/without eyeglasses).
- Use that pretrained classifier to change the z vector (run gradient ascent; see the sketch after this list).
- What is z-space disentanglement?
- Find out methods for z-space disentanglement.
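A minimal sketch of the classifier-gradient step, assuming you already have a trained generator `G` (mapping z to images) and a pretrained binary classifier `clf` for the target attribute; both names are placeholders:

```python
import torch

def nudge_z(G, clf, z, steps=10, lr=0.1):
    """Gradient ascent on the classifier score with respect to z; only z is updated."""
    z = z.clone().detach().requires_grad_(True)
    for _ in range(steps):
        score = clf(G(z)).mean()     # how strongly the classifier detects the target feature
        score.backward()
        with torch.no_grad():
            z += lr * z.grad         # move z toward images with a higher classifier score
            z.grad.zero_()
    return z.detach()
```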
Q.7 Cycle GAN
Reference Paper: Cycle GAN
- What is unpaired image-to-image translation?
- Overview of Cycle GAN (Two GANs)
- Explain the different losses (cycle consistency loss, least-squares loss, identity loss).
- Implement a Cycle GAN.
- You can try any dataset from here.
Q.8 GAN Evaluation
Explain how to evaluate GANs.
Explore common metrics for evaluating GANs, such as the Inception Score and the Fréchet Inception Distance (FID).
Explain how they work, and evaluate your previously trained CIFAR-10 GAN with these metrics.
Chapter 7: Attention Mechanisms
Q.1 Attention Pooling
- Explain what attention mechanism is.
- How is the attention mechanism different from a fully connected layer or a pooling layer?
- Consider a function \(y_i = 2\sin(x_i) + x_i^{0.8} + \epsilon\). Generate a synthetic dataset of 50 examples.
- Implement average pooling.
- Implement non-parametric attention pooling (Nadaraya-Watson kernel regression); a minimal sketch follows this list.
- Implement a parametric attention pooling.
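A NumPy sketch of the average-pooling baseline and the non-parametric Nadaraya-Watson estimator on the synthetic data from above (the noise standard deviation of 0.5 is an assumption):

```python
import numpy as np

# Synthetic data from y = 2 sin(x) + x^0.8 + eps, 50 training examples.
np.random.seed(0)
x_train = np.sort(np.random.rand(50) * 5)
y_train = 2 * np.sin(x_train) + x_train ** 0.8 + np.random.normal(0, 0.5, 50)
x_test = np.arange(0, 5, 0.1)

# Average pooling: every query gets the same prediction, the mean of all training labels.
y_avg = np.full_like(x_test, y_train.mean())

# Non-parametric attention pooling (Nadaraya-Watson with a Gaussian kernel):
# the weight of query x_q on key x_i is the softmax over i of -(x_q - x_i)^2 / 2.
diff = x_test[:, None] - x_train[None, :]
attn = np.exp(-diff ** 2 / 2)
attn /= attn.sum(axis=1, keepdims=True)
y_nw = attn @ y_train

# The parametric version learns a scalar width w inside the kernel: -((x_q - x_i) * w)^2 / 2.
```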
Q.2 Attention Scoring Functions
- Explain additive attention scoring function with a toy example.
- Explain scaled dot-product scoring function with a toy example.
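A minimal sketch of scaled dot-product attention with a toy example; because all keys are identical here, the attention weights come out uniform:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V; scaling by sqrt(d) keeps the score variance stable."""
    d = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)
    weights = torch.softmax(scores, dim=-1)
    return weights @ V, weights

# Toy example: 1 query, 10 key-value pairs, key dim 2, value dim 4.
Q = torch.normal(0, 1, (1, 2))
K = torch.ones(10, 2)
V = torch.arange(40, dtype=torch.float32).reshape(10, 4)
out, w = scaled_dot_product_attention(Q, K, V)   # out: (1, 4), w: (1, 10) uniform weights
```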
Q.3 Bahdanau Attention
To understand this attention mechanism, we need to consider a language translation problem (encoder-decoder architecture).
- Implement an RNN with Bahdanau attention
- Dataset : French-English
Q.4 Multi-Head Attention
- Why do we need multi-head attention instead of a single attention pooling?
- Implement a multi-head attention with a toy example.
- Verify the output tensor shape.
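A quick shape check with PyTorch's built-in nn.MultiheadAttention (the batch_first flag needs a reasonably recent PyTorch); here 100-dimensional embeddings are split across 5 heads:

```python
import torch
from torch import nn

attn = nn.MultiheadAttention(embed_dim=100, num_heads=5, batch_first=True)
X = torch.ones(2, 4, 100)        # (batch, query length, embed_dim)
Y = torch.ones(2, 6, 100)        # (batch, key/value length, embed_dim)
out, weights = attn(X, Y, Y)     # queries attend over the keys/values
print(out.shape)                 # torch.Size([2, 4, 100])
print(weights.shape)             # torch.Size([2, 4, 6]), averaged over heads by default
```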
Q.5 Self Attention and Positional Encoding
- Explain self attention with an example.
- Explain positional encoding with an example.
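A minimal sketch of the sinusoidal positional encoding (assuming an even embedding dimension):

```python
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    """P[pos, 2i] = sin(pos / 10000^(2i/d)), P[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)            # (max_len, 1)
    div = torch.pow(10000.0, torch.arange(0, d_model, 2, dtype=torch.float32) / d_model)
    P = torch.zeros(max_len, d_model)
    P[:, 0::2] = torch.sin(position / div)
    P[:, 1::2] = torch.cos(position / div)
    return P   # added to the token embeddings before the first attention layer

P = sinusoidal_positional_encoding(max_len=60, d_model=32)   # shape (60, 32)
```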
Q.6 Transformer
- Give an overview of the Transformer model.
- Implement a Transformer.
- Train it with the previous French-English dataset.
Q.7 Vision Transformer
Reference Paper: ViT
- Give an overview of ViT.
- Implement a ViT.
- Train it with CIFAR-10 without using pre-trained weights.