Adan Häfliger - Portfolio

(Please reload the page if math formatting is not working)

Deep Semantic Mapping Between Images:

In this project I tried to map similar elements of two images based on the feature maps of a pretrained Convolutional Neural Network. The gist of the method is:

  1. For two images $I_1, I_2$, we feed each image to a pretrained ConvNet (like VGG19)
  2. From the ConvNet we extract two feature maps (tensors) $L_1, L_2$ of dimensions $(H_1, W_1, C)$ and $(H_2, W_2, C)$ respectively, where $H_1$ is the height and $W_1$ the width of the feature map $L_1$. Note that $C$ is the same for both images, and we take the feature maps after the ReLU activation, ensuring that all values are non-negative.
  3. We flatten $L_1$ and $L_2$ to turn them into matrices of dimensions $(H_1 * W_1, C)$ and $(H_2 * W_2, C)$
  4. We can now multiply $L_1$ with $L_2^T$ to compute the "affinity matrix" $A = L_1 L_2^T$ (see here), with $A \in (H_1 * W_1, H_2 * W_2)$. A minimal code sketch of these steps follows the list.
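
In code, the four steps look roughly like this (a minimal sketch assuming PyTorch and torchvision; the layer index is an illustrative choice, not necessarily the one used in the project):

```python
import torch
import torchvision.models as models

# Illustrative layer choice: index 26 is a ReLU (relu4_4) in torchvision's VGG19.
vgg = models.vgg19(pretrained=True).features.eval()

def feature_map(image, layer_index=26):
    """Run `image` (1, 3, H, W) through VGG19 up to a post-ReLU layer."""
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == layer_index:
            break
    return x                                     # (1, C, H', W')

@torch.no_grad()
def affinity(image1, image2):
    """Compute the affinity matrix A of shape (H1*W1, H2*W2)."""
    L1 = feature_map(image1).squeeze(0)          # (C, H1, W1)
    L2 = feature_map(image2).squeeze(0)          # (C, H2, W2)
    C = L1.shape[0]
    L1 = L1.reshape(C, -1).T                     # (H1*W1, C)
    L2 = L2.reshape(C, -1).T                     # (H2*W2, C)
    return L1 @ L2.T                             # (H1*W1, H2*W2)
```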

Here $A$ is a matrix that encodes how the feature vector at each position of $L_1$ correlates, across all $C$ channels, with the feature vector at each position of $L_2$. It can be used directly as an indicator of which pixels are similar (each feature-map position corresponds to a region of pixels, since the feature map is just a down-scaled version of the image).

To illustrate this "pixel correlation", imagine we compute $A$ for two images. By applying a softmax to every row of $A$, we obtain, for each position of $L_1$, how similar it is to every position of $L_2$, and these positions can be mapped back to pixels in the down-scaled image. If we have a class label for one of the images, we can use it to perform segmentation on the other image (a minimal sketch follows). We can also combine multiple affinity matrices to get better matches.
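
A minimal sketch of this label-transfer idea (NumPy assumed; the softmax temperature is an illustrative choice):

```python
import numpy as np

def transfer_labels(A, labels2, n_classes, temperature=0.05):
    """A: (H1*W1, H2*W2) affinity matrix; labels2: (H2*W2,) class ids
    for the annotated image, at feature-map resolution."""
    # Softmax over each row: how strongly every location of image 2
    # votes for a given location of image 1.
    logits = A / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    one_hot = np.eye(n_classes)[labels2]          # (H2*W2, n_classes)
    scores = weights @ one_hot                    # (H1*W1, n_classes)
    return scores.argmax(axis=1)                  # one label per location of image 1
```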

Segmenting one image of a cat using a variable number of segmented cat images, by computing affinity matrices with the original image and using the most similar pixel's class label.

Now we can use $A$ to actually compute a mapping between both images in the following way:

  1. Apply Non-negative Matrix Factorization to $A$ (all its entries are non-negative, so we can do it), giving two matrices such that $L R^T \approx A$, where $L \in (H_1 * W_1, K)$ and $R \in (H_2 * W_2, K)$
  2. Reshape $L$ and $R$ to get $K$ soft mask pairs that map semantically similar elements in both images (see the sketch after this list)
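
A minimal sketch of the factorization step using scikit-learn's NMF ($K$ and the initialization are illustrative choices):

```python
import numpy as np
from sklearn.decomposition import NMF

def soft_mask_pairs(A, shape1, shape2, K=5):
    """Factorize A ≈ L @ R.T and reshape the factors into K mask pairs.
    shape1 = (H1, W1) and shape2 = (H2, W2) are the feature-map sizes."""
    model = NMF(n_components=K, init="nndsvd", max_iter=500)
    L = model.fit_transform(A)          # (H1*W1, K)
    R = model.components_.T             # (H2*W2, K)

    masks1 = L.T.reshape(K, *shape1)    # K soft masks for image 1
    masks2 = R.T.reshape(K, *shape2)    # K soft masks for image 2
    return masks1, masks2
```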

NMF supposedly leads to a "part-based" representation, which in our case extracts semantically similar elements from $A$! You can see how we are able to map the face of a cat to the face of another cat without mapping the dog, in an unsupervised manner:

A nice property is that it works for almost any type of image, since the model was pretrained on a dataset with a thousand classes (ImageNet):

Finally, I was interested in using this mapping to "localize" Neural Style Transfer. I re-implemented the algorithm and changed the loss term to take into account which parts of the image the style should be transferred between, using my mask pairs as the mapping (a rough sketch follows). Here is a test case where both masks were the square on the left.
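
A rough sketch of how a mask pair can localize the style loss, here by restricting the Gram matrices to the masked regions (PyTorch assumed; this is an illustrative variant rather than the exact loss term I used):

```python
import torch

def masked_gram(features, mask):
    """features: (C, H, W) feature map; mask: (H, W) soft mask in [0, 1]."""
    C, H, W = features.shape
    masked = features * mask.unsqueeze(0)        # zero out everything outside the mask
    flat = masked.reshape(C, H * W)
    return flat @ flat.T / (mask.sum() * C + 1e-8)

def masked_style_loss(features_generated, features_style, mask_gen, mask_style):
    """Style loss computed only over the regions selected by a mask pair."""
    g1 = masked_gram(features_generated, mask_gen)
    g2 = masked_gram(features_style, mask_style)
    return torch.mean((g1 - g2) ** 2)
```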

The results were interesting but not perfect (here the loss also includes a photorealism term from https://arxiv.org/pdf/1703.07511.pdf):

(The PhD student who supervised me during this project pushed this further and figured out that we didn't need to compute the affinity matrix at all: simply applying NMF to the feature maps of many images can lead to great semantic separation, see here.)

Slides, github, report.

Deep Learning on Japanese Macaques:

In this project I collected many images and videos of Japanese macaques thanks to a collaboration with Prof. Koda from the Primate Research Institute, Kyoto University. I then used this data to train classifiers for different tasks and a model to localize individuals automatically. To localize specimens, I fine-tuned a pretrained Mask R-CNN model to detect and segment Japanese macaques using my own data and annotations (a minimal sketch of the fine-tuning setup follows).
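
The fine-tuning itself follows the usual torchvision recipe of swapping the prediction heads of a COCO-pretrained Mask R-CNN for a single "macaque" class (a minimal sketch; the mask-head hidden size is the library default, not a project-specific choice):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + macaque

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box classification/regression head.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask prediction head.
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
```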

You can see my Mask R-CNN based model in action (I used code from here; each frame is processed independently, hence the colors change):

Key points included figuring out how to deal with occlusions and holes when creating the polygon annotations, as well as which dataset to pre-train the model on. It turns out that COCO outperformed ImageNet, probably because it contains many animal classes. A next step would be to exploit the temporal dimension as well.

I also visualized how the age and sex classifiers made their predictions using the Grad-CAM algorithm (a minimal sketch follows), and compared the effect of different datasets. The goal is to know which parts of the image matter for those classifications:
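
A minimal Grad-CAM sketch (PyTorch assumed; the target layer depends on the classifier architecture and is passed in explicitly here):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Return a (H, W) heat map for `class_idx` on a (1, 3, H, W) image.
    `model` should already be in eval mode."""
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    acts, grads = activations[0], gradients[0]            # (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)         # per-channel importance
    cam = F.relu((weights * acts).sum(dim=1))               # (1, h, w)
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[2:], mode="bilinear")
    return (cam / (cam.max() + 1e-8)).squeeze()
```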

It turns out that with my data the classifiers aren't perfect, but the age model trained with dataset 2 (DS2 in the figure above) is promising, as it mostly highlights the face and ears of the monkey.

Finally, I made an attempt at using the large amount of unlabeled data to fine-tune the model in a semi-supervised way, but it wasn't effective at all; it was harder than I thought. The key thing I got out of this project is an appreciation of how hard it is to create and manage a dataset. Slides.

Monkey annotator app:

For the above project, I needed a specialist to annotate the videos with the name of the specimen being filmed. For this reason, I built a web app based on Django, PostgreSQL and React to let someone annotate the videos for me online. I wanted to explore the single-page-application concept and learn about recent technologies. Building this app taught me how to deploy an nginx web server, how to turn Django into my backend API using the Django REST Framework (a minimal sketch follows), how to use React to display the data queried from that backend, and how to store and retrieve videos in an AWS S3 bucket. Although simple, this project got me to practice both deployment and development.
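
A minimal sketch of the backend side with the Django REST Framework (the model and field names here are hypothetical, not the ones from the actual app):

```python
# models.py / serializers.py / views.py of a Django app, condensed for brevity.
from django.db import models
from rest_framework import serializers, viewsets

class Video(models.Model):
    title = models.CharField(max_length=200)
    file_url = models.URLField()                             # e.g. a link into the S3 bucket
    specimen_name = models.CharField(max_length=100, blank=True)

class VideoSerializer(serializers.ModelSerializer):
    class Meta:
        model = Video
        fields = ["id", "title", "file_url", "specimen_name"]

class VideoViewSet(viewsets.ModelViewSet):
    """Exposes list/retrieve/update endpoints the React frontend can query."""
    queryset = Video.objects.all()
    serializer_class = VideoSerializer
```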

The interface looked like this:

League of Legends Forecaster:

In this project for the ADA course, we crawled publicly available League of Legends match information using the Riot API. For every match we generated player features based on the 20 previous matches of each player. We then fed those features to a neural network predicting each team's chance of winning based on the players' features and the roles they played (a rough sketch follows). The distributions of the features used to tune the model were also shown on another page. Below are screenshots taken from the app. github
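
A rough sketch of the kind of model involved (PyTorch assumed; the feature sizes and layer widths are illustrative, not the ones from the project):

```python
import torch.nn as nn

N_FEATURES_PER_PLAYER = 20   # hypothetical: stats aggregated over each player's past matches
N_PLAYERS = 10               # 5 per team, ordered by role

class WinPredictor(nn.Module):
    """Maps the concatenated per-player features to a win probability."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_PLAYERS * N_FEATURES_PER_PLAYER, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),    # probability that the first team wins
        )

    def forward(self, x):
        return self.net(x)
```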

NHL-Inspector:

In this data visualization project we used D3.js to develop an interactive website displaying statistics about the National Hockey League. The main concepts were to use the official colors of the teams and to offer two interchangeable modes: one showing an overview of the competition and one with more detailed statistics. There was also a timeline where the user could change the date, with an animation updating the visualization.

Screencast, Process Book, Website

Visual Defect Inspection:

This time I collected a dataset and trained a Convolutional Neural Network to classify surfaces of a certain product into either a "can be sold" or a "can't be sold" class. Since the images were quite large (too big for my GPU at the time) and the surfaces were circular, I rotated and cropped the images to augment my dataset, and also removed the non-uniform background, which was a bit tricky because of the varying lighting conditions (a rough sketch follows):
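
A rough sketch of the augmentation step (OpenCV assumed; the crop size and rotation angles are illustrative, and the circle's center and radius are assumed to be known already):

```python
import cv2
import numpy as np

def rotated_crops(image, center, radius, angles=range(0, 360, 30), crop=512):
    """Rotate the circular surface around its center and cut fixed-size
    patches, so one large image yields many training samples."""
    # Mask out everything outside the circular surface (removes the background).
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.circle(mask, (int(center[0]), int(center[1])), radius, 255, -1)
    surface = cv2.bitwise_and(image, image, mask=mask)

    crops = []
    for angle in angles:
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(surface, M, (surface.shape[1], surface.shape[0]))
        x, y = int(center[0]), int(center[1])
        crops.append(rotated[y - crop // 2: y + crop // 2,
                             x - crop // 2: x + crop // 2])
    return crops
```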

I can't say too much about this project.

Digit Reader:

I made a small program to extract digits from a video of a seven-segment display on a specific machine. An image is worth a thousand words in this case:

We classify the digit in each green square using a CNN. The system could have been built without machine learning, but this was kind of a warm-up project. The algorithm to find and extract the digits (green squares) uses traditional thresholding techniques and OpenCV; the red color of the display was a big help (a rough sketch follows).
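
A rough sketch of the locating step (OpenCV 4 assumed; the HSV thresholds are illustrative and depend on the camera and display):

```python
import cv2

def find_digit_boxes(frame, min_area=100):
    """Threshold the red display and return bounding boxes of the lit digits."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Red hues wrap around 0 in OpenCV's HSV space (H ranges over 0-179).
    mask = cv2.inRange(hsv, (0, 120, 120), (10, 255, 255))
    mask |= cv2.inRange(hsv, (170, 120, 120), (179, 255, 255))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]
    return sorted(boxes, key=lambda b: b[0])   # left-to-right order
```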

Game Development:

CockpitVR:

In this project we developed a VR game for the HTC Vive using Unity3D, with some interesting controls built around the Vive controllers. The player controls a vehicle (the hope was a giant robot) and has to reach the end of the level without falling off. The controls use the pitch, yaw and roll of the two controllers to rotate and move the vehicle. Also, by pressing the trigger, the player can control the arm and shoot a projectile, wheeew!

We also added an asymmetric multiplayer mode where one player used a tablet to interfere with the player using the HTC Vive:


The game was minimalist and mostly a proof of concept. Report, github; please take a look at the video (yes, I lost weight and hair).

Stray Souls in Wonderland:

We made this game in one weekend for a Game Jam. Our entry can be found here. And here is a play-through of someone trying the game (beware of rude language).

Aerial Cascade:

I started a mobile game developed with UE4 where you control a flying spaceship using touch controls; more soon.

Hexagon Tactics:

I ported Theliquidfire's Tactical RPG to hexagonal tiles; more soon.

Androfoot:

A game developed for Android that was awarded a top-3 project (we won free chocolate!) in the software engineering class back in 2014. I wrote the user-interaction code and made the game multiplayer over local Wi-Fi. Github.

Computer Graphics:

Knife Game:

In the Advanced Computer Graphics course we implemented a ray tracer. Then, as a final project, we were tasked with generating an image for the theme "Flirting with Disaster". For my image I added textures, bump mapping and a Torrance-Sparrow BRDF (for the metals), but I couldn't get subsurface scattering for the hand to work:
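
For reference, the Torrance-Sparrow microfacet BRDF has the standard form below, with $D$ the microfacet distribution, $G$ the geometric shadowing term and $F$ the Fresnel term (my notation here, not necessarily the report's):

$$f_r(\omega_i, \omega_o) = \frac{D(\omega_h)\, G(\omega_i, \omega_o)\, F(\omega_i, \omega_h)}{4\, \cos\theta_i\, \cos\theta_o}$$

where $\omega_h$ is the half-vector between the incoming direction $\omega_i$ and the outgoing direction $\omega_o$.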

Terrain Generation:

github, slides.