This report illustrates some of the work I did during my research internship at the Information and Network Dynamics Lab of EPFL. The first part of the work consisted of analysing high-dimensional political data using dimensionality reduction techniques such as PCA and t-SNE; the second part consisted of building predictive models for vote results.
The Swiss Confederation is a semi-direct democracy that regularly gives its citizens the opportunity to vote on a variety of issues at every political level. Several times a year, Swiss citizens vote in referenda, which are mandatory for constitutional amendments and optional for challenges to new or revised laws. As part of the open government initiative, Swiss authorities publish data about the federal votes, which offers a unique opportunity to gain a better understanding of the political landscape in Switzerland. Figure 1 illustrates the vote results of the referendum on EEA membership (Arrêté fédéral sur l'espace économique européen). On the linear color scale, red represents rejection and green represents acceptance of the proposal.
Figure 1: Vote results of the Swiss Referendum on membership in the European Economic Area (EEA)
Another interesting approach in data analysis is dimensionality reduction. In essence, it is a transformation of high-dimensional data to a lower-dimensional vector space that ideally corresponds to the intrinsic dimensionality of the observed data and can reproduce most of its variability. The most common dimensionality reduction technique, Principal Component Analysis (PCA), constructs a lower-dimensional linear subspace by applying an orthogonal transformation that converts a set of possibly correlated variables into a set of linearly uncorrelated variables, called principal components. The components are ordered so that the first one captures the largest possible variance of the projected data, and each subsequent one captures the largest remaining variance while being orthogonal to the preceding components.
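As a brief formal sketch of this definition, the variance-maximisation view of PCA can be written as follows. The symbols X, Σ and w_k are notation introduced here for illustration and do not appear in the original text.

```latex
% X \in \mathbb{R}^{n \times d}: data matrix whose column means have been subtracted
\begin{align}
  \Sigma &= \frac{1}{n-1} X^\top X
    && \text{(sample covariance matrix)} \\
  w_1 &= \arg\max_{\|w\|_2 = 1} \; w^\top \Sigma \, w
    && \text{(first principal axis)} \\
  w_k &= \arg\max_{\substack{\|w\|_2 = 1 \\ w \perp w_1, \dots, w_{k-1}}} \; w^\top \Sigma \, w
    && \text{(subsequent axes)}
\end{align}
```

The maximisers are the eigenvectors of Σ ordered by decreasing eigenvalue, and the projection of a data point x onto the first k axes is the vector of coordinates (w_1^T x, ..., w_k^T x).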
In the context of Switzerland's political landscape, we use PCA to identify ideological patterns. Our underlying dataset consists of the outcomes by municipality of the federal votes from June 1981 to May 2019, excluding municipalities that were merged during this period. The dataset can be represented as a high-dimensional vector space in which each dimension corresponds to a federal vote and each municipality is a data point. We then perform a PCA using Singular Value Decomposition to extract the first two principal components, which can be interpreted as concepts capturing most of the variability of the original complete set of federal votes. Figure 2 shows a scatter plot of the data points projected onto these principal components, clustered by language. The axes can be shown to correspond to traditional ideological divides (left versus right and liberal versus conservative) [1]. In the visualisation in Figure 3, we use a color gradient to assign each municipality a colour that corresponds to its position in the two-dimensional space, so that ideologically similar municipalities share a similar colour. The cultural difference between French-speaking and German-speaking municipalities, commonly referred to as the "Röstigraben", is clearly apparent on the map.
Figure 2: Scatter plot of PCA
Figure 3: Voting patterns in Swiss municipalities
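The PCA step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used for the figures: the helper name `pca_svd` is hypothetical, and the `votes` matrix is synthetic random data standing in for the real municipality-by-vote table.

```python
import numpy as np


def pca_svd(votes, n_components=2):
    """Project the rows of `votes` onto the top principal components via SVD."""
    # Center each vote (column) so components describe variance around the mean.
    centered = votes - votes.mean(axis=0)
    # Thin SVD: centered = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of every municipality in the reduced space.
    projected = centered @ components.T
    # Share of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return projected, components, explained[:n_components]


# Synthetic stand-in (assumption): 200 "municipalities" x 30 "votes",
# generated from two hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 30))
votes = 50 + 10 * latent @ loadings + rng.normal(scale=1.0, size=(200, 30))

projected, components, explained = pca_svd(votes)
print(projected.shape)   # one 2-D point per municipality
print(explained.sum())   # fraction of variance kept by the two components
```

With real data, each row of `projected` would give the coordinates of one municipality in a scatter plot such as Figure 2, and the same two coordinates could be mapped to a colour for a map such as Figure 3.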
t-distributed stochastic neighbor embedding (t-SNE) is another dimensionality reduction technique that, in contrast to PCA, can capture non-linear relationships in the voting data. Simply explained, it works by minimising the Kullback-Leibler divergence between the distribution that measures pairwise similarities of the input objects and the distribution that measures pairwise similarities of the corresponding points in the lower-dimensional embedding. Consistent with the principal component analysis, the scatter plot in Figure 4 illustrates the clustering of voting behavior by the language spoken in each municipality.
Figure 4: Scatter plot of t-SNE
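To make the objective described above concrete, the following sketch evaluates the divergence between input similarities and embedding similarities for two candidate embeddings. It is a simplified illustration under stated assumptions: all function names are hypothetical, the data is synthetic, and a single shared Gaussian bandwidth `sigma` replaces the per-point, perplexity-based bandwidths used by full t-SNE. It only evaluates the objective. An actual embedding is obtained by minimising it over the low-dimensional points, for example with a library implementation such as scikit-learn's `sklearn.manifold.TSNE`.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def input_similarities(X, sigma=3.0):
    """Gaussian similarities P in the input space (simplified: one shared sigma)."""
    logits = -pairwise_sq_dists(X) / (2 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)          # a point is not its own neighbor
    cond = np.exp(logits - logits.max(axis=1, keepdims=True))
    cond /= cond.sum(axis=1, keepdims=True)    # row-wise conditional probabilities
    return (cond + cond.T) / (2 * len(X))      # symmetrize; entries sum to 1


def embedding_similarities(Y):
    """Student-t (one degree of freedom) similarities Q in the embedding."""
    w = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the quantity t-SNE minimises over the embedding."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Synthetic stand-in (assumption): two well-separated groups in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 10)),
               rng.normal(5.0, 1.0, size=(40, 10))])

P = input_similarities(X)
Y_good = X[:, :2]                  # layout that keeps the two groups apart
Y_bad = rng.permutation(Y_good)    # same points assigned to the wrong rows

Q_good = embedding_similarities(Y_good)
kl_good = kl_divergence(P, Q_good)
kl_bad = kl_divergence(P, embedding_similarities(Y_bad))
print(kl_good, kl_bad)
```

An embedding that preserves the neighborhood structure of the input yields a lower divergence than one that scrambles it, which is why minimising this quantity tends to keep similar municipalities close together in the plot.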
[1] Etter V, Herzen J, Grossglauser M, Thiran P. 2014 Mining Democracy.
[2] Etter V, Khan ME, Grossglauser M, Thiran P. 2016 Online Collaborative Prediction of Regional Vote Results.