Algorithms

PCA - A technique for dimensionality reduction that performs a linear mapping of the data to a lower-dimensional space, maximizing the variance of the data in the low-dimensional representation. We created a wrapper around the scikit-learn implementation that offers two usage modes: Strategy TF (pca_s1) computes PCA independently for each timestep, while Strategy G (pca_s4) groups all timesteps and computes PCA once. The terminology is borrowed from the dt-SNE paper.
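As a rough illustration of the difference between the two strategies, the sketch below calls scikit-learn directly on placeholder data; the variable names are hypothetical and do not reflect the wrapper's actual interface.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: 10 timesteps, each a (samples x dimensions) matrix.
rng = np.random.default_rng(0)
timesteps = [rng.random((500, 50)) for _ in range(10)]

# Strategy TF (pca_s1): fit a separate PCA for each timestep.
proj_tf = [PCA(n_components=2).fit_transform(X_t) for X_t in timesteps]

# Strategy G (pca_s4): fit one PCA on all timesteps stacked together,
# then project every timestep with that single shared model.
pca_g = PCA(n_components=2).fit(np.vstack(timesteps))
proj_g = [pca_g.transform(X_t) for X_t in timesteps]
```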

t-SNE - This method converts the nD distances between data points into joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional mD embedding and the high-dimensional nD data. This usually results in good neighborhood preservation. Our implementation is based on the scikit-learn implementation, with the perplexity left at its default value (30).
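A minimal sketch of the corresponding scikit-learn call, on placeholder data, keeping the default perplexity:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.random((500, 50))  # placeholder (samples x dimensions) matrix

# scikit-learn's default perplexity is 30; we keep it unchanged.
X_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)
```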

UMAP - This recent DR technique has a mathematical foundation in Riemannian geometry and algebraic topology. According to recent studies [EMK*19, BMH*19], UMAP offers high-quality projections with lower computational cost and better global structure preservation than t-SNE, making it an interesting competitor in the DR arena. We consider in our evaluation both the global (G-UMAP) and per-timeframe (TF-UMAP) variants of this technique.
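Conceptually, G-UMAP and TF-UMAP mirror the two PCA strategies above. The sketch below uses the umap-learn package on placeholder data and is not the exact code used in our runs.

```python
import numpy as np
import umap  # umap-learn package

rng = np.random.default_rng(0)
timesteps = [rng.random((500, 50)) for _ in range(10)]

# TF-UMAP: fit UMAP independently on each timestep.
proj_tf = [umap.UMAP(n_components=2).fit_transform(X_t) for X_t in timesteps]

# G-UMAP: fit a single UMAP model on all timesteps stacked together,
# then project every timestep with that shared model.
reducer = umap.UMAP(n_components=2).fit(np.vstack(timesteps))
proj_g = [reducer.transform(X_t) for X_t in timesteps]
```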

dt-SNE - This method extends t-SNE to dynamic data by adding a stability term to the cost function, weighted by a parameter lambda, which penalizes points that move too much between consecutive timesteps.
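Schematically, and up to normalization constants (the exact weighting is given in the dt-SNE paper), the cost combines the per-timestep t-SNE cost with a penalty on how far each point moves between consecutive timesteps:

```latex
C = \sum_{t} C_{\text{t-SNE}}^{[t]}
    + \lambda \sum_{t} \sum_{i} \bigl\lVert \mathbf{y}_i^{[t]} - \mathbf{y}_i^{[t+1]} \bigr\rVert^{2}
```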

Autoencoders - In the context of dimensionality reduction, we take a (usually) hourglass-shaped neural network and train it to reconstruct its input. After training, the middle layer acts as a compact (latent) representation of the original data. The middle layer must have as many neurons as the dimensionality of the space we want to project our data into. We tested four different “types” of autoencoders (a minimal training sketch follows the list below):

- AE: dense autoencoders with fully connected layers.
- C2AE: convolutional autoencoders, used only on the image-based datasets.
- VAE: variational autoencoders with fully connected layers, which try to obtain better internal representations by avoiding overfitting.
- C2VAE: variational autoencoders with convolutional layers, possibly offering the best of both worlds regarding input reconstruction ability.
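
As a minimal sketch of the dense (AE) variant, assuming Keras/TensorFlow, with illustrative layer sizes and hyperparameters rather than the exact configurations used in the repository:

```python
import numpy as np
from tensorflow.keras import layers, Model

n_dims = 50
rng = np.random.default_rng(0)
X = rng.random((1000, n_dims))  # placeholder (samples x dimensions) matrix

# Hourglass-shaped network: the 2-neuron bottleneck is the projection space.
inputs = layers.Input(shape=(n_dims,))
h = layers.Dense(32, activation='relu')(inputs)
latent = layers.Dense(2, name='latent')(h)             # 2D latent representation
h = layers.Dense(32, activation='relu')(latent)
outputs = layers.Dense(n_dims, activation='sigmoid')(h)

autoencoder = Model(inputs, outputs)
encoder = Model(inputs, latent)

# Train the network to reconstruct its own input.
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X, X, epochs=50, batch_size=64, verbose=0)

# After training, the encoder maps the data to its 2D embedding.
X_2d = encoder.predict(X)
```

The convolutional variants replace the dense layers with convolutional layers, and the variational variants add the usual sampling layer and KL term to the loss.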

We experimented with different optimizers, architectures, training routines, etc. The model names encode details such as the number of layers, neurons per layer, and training epochs; they can be decoded by inspecting the notebooks/scripts linked below.

The notebooks/scripts associated with each run can be found at https://github.com/EduardoVernier/dynamic-projections/tree/master/Models. Information about replication and tests with new datasets can be found in the Replication section.