Excerpt from "Chemical Data Visualization and Analysis with Incremental Generative Topographic Mapping: Big Data Challenge", https://pubs.acs.org/doi/full/10.1021/ci500575y?src=recsys
However, both PCA and SOM have some clear drawbacks.
PCA, as a linear method of dimensionality reduction, may handle
nonlinear data poorly. In some cases, a small number of principal components
explains only a small part of data variance. As noted by Bengio et al.(16)
“the expressive power of linear features is very limited: they cannot
be stacked to form deeper, more abstract representations since the
composition of linear operations yields another linear operation”. This
drastically hampers the ability of PCA to reveal disentangled factors
responsible for data variation, especially in the case of Big Data.
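
Two of the points above can be checked numerically. The following is a minimal NumPy sketch on synthetic data, not the authors' code: all array shapes, the random seed, and the variable names are assumptions chosen for illustration. The first part shows that two stacked linear maps collapse into a single linear map; the second computes explained-variance ratios for isotropic data, a case in which the first two principal components capture only a small share of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Part 1: stacking linear maps yields another linear map ---
# Hypothetical layer shapes: 5 descriptors -> 3 features -> 2 features.
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(2, 3))
X = rng.normal(size=(100, 5))           # 100 toy samples

stacked = (X @ W1.T) @ W2.T             # apply layer 1, then layer 2
collapsed = X @ (W2 @ W1).T             # one equivalent linear map
stacking_adds_nothing = np.allclose(stacked, collapsed)

# --- Part 2: explained variance of the leading principal components ---
# Isotropic toy data: no direction dominates, so PCA cannot compress it well.
X_iso = rng.normal(size=(500, 50))
Xc = X_iso - X_iso.mean(axis=0)         # centre each column
s = np.linalg.svd(Xc, compute_uv=False) # singular values of centred data
ratio = s**2 / np.sum(s**2)             # explained-variance ratio per component
top2 = ratio[:2].sum()                  # share captured by a 2-D PCA plot

print("stacked == collapsed:", stacking_adds_nothing)
print("variance explained by first two PCs:", round(float(top2), 3))
```

Real chemical descriptor sets are rarely isotropic, so the second figure is only a worst-case illustration of how little a two-component plot may retain.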
Another problem comes from the low information richness of PCA plots,
resulting from the tendency to concentrate most of the data points in a
certain region in the form of a Gaussian cloud, while leaving the rest
of the plot poorly populated.(17)
This behavior could be explained with the help of the probabilistic
interpretation of PCA, which casts it as a factor analysis based on a
single multivariate normal distribution function.(18)
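
For reference, the probabilistic PCA model is commonly written as below. The symbols (latent vector z of dimension q, loading matrix W, mean μ, noise variance σ²) are introduced here for illustration and do not appear in the excerpt.

```latex
\begin{aligned}
z &\sim \mathcal{N}(0,\ I_q), \qquad
x \mid z \sim \mathcal{N}(W z + \mu,\ \sigma^2 I_D), \\
\Rightarrow\quad x &\sim \mathcal{N}(\mu,\ W W^{\top} + \sigma^2 I_D).
\end{aligned}
```

Because the marginal density of x is a single Gaussian, the model has no means of representing several separate clusters, which is consistent with the single dense cloud seen on PCA plots.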
SOM is a nonlinear dimensionality reduction method. Due to its
topology-preserving character, SOM provides more information-rich plots
than PCA. However, SOM suffers from its purely empirical nature and lacks
solid statistical foundations.(19)
As a result, the output information is truncated to the assignment of a
molecule into its residence node, and the indication of how well it
fits into this node. SOM tools, by default, would not report whether
other nodes might have hosted a molecule as well, at only slightly
higher quantization errors (mean dissimilarity between each molecule and
the code vector of its residence node). Since SOM does not define
any probability distribution function, powerful tools of statistical
analysis and inference cannot be applied. The training algorithm for SOM
does not optimize an objective function(20)
and, therefore, does not guarantee convergence. The choice of SOM
parameters (learning rate and width of neighborhood functions) proceeds
essentially in an empirical manner, without any statistical
justification.
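
The hard-assignment limitation described above can be illustrated with a toy map. This is a minimal sketch under stated assumptions, not the output of any particular SOM tool: the 3 x 3 grid, the two-dimensional descriptor space, and the names `code_vectors` and `molecule` are hypothetical, and the code vectors are simply placed on a regular grid rather than trained.

```python
import numpy as np

# Hypothetical 3 x 3 SOM whose code vectors sit on a regular grid in a
# 2-D descriptor space (a toy stand-in for a trained map).
grid = np.array([(i, j) for i in range(3) for j in range(3)])
code_vectors = grid.astype(float)

molecule = np.array([0.48, 1.0])        # toy descriptor vector

# Euclidean distance from the molecule to every code vector.
distances = np.linalg.norm(code_vectors - molecule, axis=1)
order = np.argsort(distances)
winner, runner_up = order[0], order[1]

# A default SOM report keeps only the winner and its error; the runner-up,
# which fits almost as well, is discarded.
print("residence node:", grid[winner], "error:", distances[winner])
print("runner-up node:", grid[runner_up], "error:", distances[runner_up])
```

Here the molecule lies almost midway between two nodes, yet a hard assignment reports only one of them, with no probability attached to either.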
These shortcomings prompted Bishop et al.(21)
to suggest generative topographic mapping (GTM) as a probabilistic
extension of SOM. GTM overcomes most of the limitations of SOMs without
introducing disadvantages. GTM is a probabilistic topology-preserving
dimensionality reduction method,(21) which projects the D-dimensional
chemical space onto a two-dimensional space. It has been shown that GTM
could be used not only as a chemical data visualization tool(17, 22, 23) but also to build classification(17, 22) and regression(24) structure–property models.
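
To make the probabilistic projection concrete, the sketch below computes GTM responsibilities and the resulting two-dimensional positions for a few toy data points. It is a hedged illustration only: it follows the standard GTM formulation (a grid of latent nodes mapped into data space through radial basis functions, with isotropic Gaussian noise of inverse variance beta), but the weights `W` are random rather than fitted by EM, the grid sizes, `beta`, and `width` are assumed values, and nothing here reproduces the incremental variant discussed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 5          # dimensionality of the (toy) descriptor space
beta = 2.0     # inverse noise variance (assumed, not fitted)
width = 1.0    # RBF width (assumed)

def square_grid(side):
    """Regular side x side grid of points in [-1, 1]^2."""
    axis = np.linspace(-1.0, 1.0, side)
    return np.array([(a, b) for a in axis for b in axis])

Z = square_grid(4)   # (K, 2) latent nodes on the 2-D map
C = square_grid(2)   # (M, 2) RBF centres

# Basis matrix: Phi[k, m] = exp(-||z_k - c_m||^2 / (2 * width^2)).
sq = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-sq / (2.0 * width**2))

W = rng.normal(size=(C.shape[0], D))  # random weights stand in for trained ones
Y = Phi @ W                           # (K, D) images of the nodes in data space

X = rng.normal(size=(10, D))          # ten toy "molecules"

# Responsibilities R[n, k] = p(node k | x_n), assuming a uniform prior over nodes.
d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
log_r = -0.5 * beta * d2
log_r -= log_r.max(axis=1, keepdims=True)                 # numerical stability
R = np.exp(log_r)
R /= R.sum(axis=1, keepdims=True)

# Each molecule is placed at the responsibility-weighted mean of the nodes.
projection = R @ Z                                        # (N, 2)
print(projection.round(3))
```

Unlike the SOM assignment, every molecule here carries a full probability distribution over the nodes, so a molecule shared between neighboring nodes is visible in `R` rather than being forced into a single one.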