Interactively Exploring Patterns of Dialect Syntax

AVML - Advances in Visual Methods for Linguistics
24-26 September 2014, Tübingen

Thomas Mayer / thomas.mayer@uni-marburg.de

Main goals

  • Present some new and interesting ways of visualizing language data
    • interaction
    • different color spaces
    • cartograms
  • Stress the importance of visualizations for educational purposes

Overview

Syntax Hessischer Dialekte (SyHD)

  • The SyHD project (http://www.syhd.info, project leader Jürg Fleischer) aims to systematically investigate German dialect syntax, using data from one federal state (Hesse) as an example.
  • For this purpose, the project gathers data from approx. 160 locations within Hesse (several NORM informants per location).
  • The data presented here are taken from the project (thanks to Stephanie Leser and Jürg Fleischer).

Hessen Dialekterkenner

  • Basic idea: compare the user's answers with the data from all informants at all locations
  • Return the best-matching location
  • Similar to Adrian Leemann's Dialäkt Äpp for Swiss German dialects (but based on syntactic rather than phonological/morphological features)
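The matching step can be sketched as follows. This is a minimal illustration, not the Dialekterkenner's actual code: the informant records (a `location` plus an `answers` object keyed by question ID), the question IDs, and the scoring rule (share of agreeing answers per location) are all hypothetical assumptions.

```javascript
// Minimal sketch (hypothetical data layout and scoring rule): score every
// location by the share of its informants' answers that agree with the
// user's answers, then return the best-scoring location.
function bestMatchingLocation(userAnswers, informants) {
  const stats = new Map(); // location -> { hits, total }
  for (const { location, answers } of informants) {
    const s = stats.get(location) || { hits: 0, total: 0 };
    for (const [question, answer] of Object.entries(userAnswers)) {
      if (!(question in answers)) continue; // informant did not answer this question
      s.total += 1;
      if (answers[question] === answer) s.hits += 1;
    }
    stats.set(location, s);
  }
  let best = null;
  let bestScore = -1;
  for (const [location, { hits, total }] of stats) {
    const score = total > 0 ? hits / total : 0;
    if (score > bestScore) {
      best = location;
      bestScore = score;
    }
  }
  return { location: best, score: bestScore };
}

// Toy data, for illustration only
const demoInformants = [
  { location: "Location A", answers: { q1: "a", q2: "b" } },
  { location: "Location A", answers: { q1: "a", q2: "a" } },
  { location: "Location B", answers: { q1: "b", q2: "b" } },
];
console.log(bestMatchingLocation({ q1: "a", q2: "b" }, demoInformants));
// → { location: "Location A", score: 0.75 }
```

Scoring by proportion rather than by raw count keeps locations with many informants from being favored just because they contribute more answers.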

Structure of the Dialekterkenner

Three main widgets:

  1. Current question widget: displays the current question and provides buttons for the possible answers
  2. All questions widget: gives an overview of all questions and the answers given so far
  3. Map widget: shows the map of Hesse

Structure of the data

  • 65 different features
  • 850 informants
  • 160 locations

Too many categories for decision-tree modeling, so we selected the most interesting features manually.

Decision tree

Created with the scikit-learn package for Python
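Since the tree is built offline in Python but used in a browser, it has to be exported in some form. The sketch below assumes a hypothetical JSON export in which each inner node asks whether the user chose a given answer to a given question (a yes/no split, as scikit-learn's binary trees produce on one-hot-encoded answers) and each leaf names a location. It illustrates the idea and is not the format the project used.

```javascript
// Walk a (hypothetical) exported decision tree. Inner nodes have
// { question, answer, yes, no }; leaves have { location }.
// `ask(question)` returns the user's answer to that question.
function classify(node, ask) {
  while (!("location" in node)) {
    node = ask(node.question) === node.answer ? node.yes : node.no;
  }
  return node.location;
}

// Toy tree, for illustration only
const tree = {
  question: "q1",
  answer: "a",
  yes: {
    question: "q2",
    answer: "b",
    yes: { location: "Location A" },
    no: { location: "Location B" },
  },
  no: { location: "Location C" },
};

const answers = { q1: "a", q2: "b" };
console.log(classify(tree, (q) => answers[q])); // → "Location A"
```

In the app itself, `ask` would have to be asynchronous, since each answer arrives through a button click in the current-question widget; the tree then also determines which question is shown next.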

Implementation

Bostock, Michael, Vadim Ogievetsky and Jeffrey Heer. 2011. D3: Data-driven documents. IEEE Transactions on Visualization & Computer Graphics (Proc. InfoVis), 17(12), 2301–2309.

Dougenik, James A., Nicholas R. Chrisman and Duane R. Niemeyer. 1985. An algorithm to construct continuous area cartograms. Professional Geographer, 37(1), 75–81.

First version with Google Maps

Second version with D3.js

Intermezzo

Color coding and the usefulness of interaction in visualization

Color coding

  • A two-dimensional input can be mapped to the three-dimensional L*a*b* color space using the function below

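// Map a hue coordinate c and a lightness coordinate l (both assumed to lie
// in [0, 1]) to an [L, a, b] triple. L ranges over 0.09-0.70 and the chroma
// radius over 0.125-0.436; d3.lab expects L on a 0-100 scale, so these
// values presumably need to be multiplied by 100 before being passed to it.
function cl2pix(c, l) {
  const TAU = 2 * Math.PI;         // one full turn in radians
  const L = l * 0.61 + 0.09;       // lightness grows linearly with l
  const angle = TAU / 6 - c * TAU; // hue angle, offset by 60 degrees
  const r = l * 0.311 + 0.125;     // chroma (distance from the gray axis) grows with l
  const a = Math.sin(angle) * r;
  const b = Math.cos(angle) * r;
  return [L, a, b];
}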

The code was adapted from the GNU C code by David Dalrymple (http://davidad.net/colorviz/, accessed on January 25, 2014) and translated into JavaScript.

Color coding (cont'd)

  • The HTML color code is generated with the function d3.lab from the D3 library, which takes the three values [L, a, b] as input.
  • The main reason for choosing the L*a*b* color space is that it gives smoother transitions between hues, without visible boundaries.
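What such a conversion from [L, a, b] to an HTML color involves can be sketched with the standard CIE formulas. This is a minimal sketch, not D3's code: it assumes the D65 white point and the sRGB matrix of IEC 61966-2-1, whereas d3.lab may use other reference constants, so the resulting colors can differ slightly. Like d3.lab, it expects L on a 0 to 100 scale; since cl2pix returns L between 0.09 and 0.70, its output would presumably have to be multiplied by 100 first.

```javascript
// Minimal sketch: CIE L*a*b* (D65 white point assumed) -> sRGB hex string.
// L is expected on a 0-100 scale; a and b roughly within [-128, 127].
function labToHex(L, a, b) {
  // Lab -> XYZ
  const fy = (L + 16) / 116;
  const fx = fy + a / 500;
  const fz = fy - b / 200;
  const delta = 6 / 29;
  const finv = (t) => (t > delta ? t ** 3 : 3 * delta * delta * (t - 4 / 29));
  const X = 0.95047 * finv(fx);
  const Y = 1.0 * finv(fy);
  const Z = 1.08883 * finv(fz);

  // XYZ -> linear sRGB
  const lin = [
    3.2404542 * X - 1.5371385 * Y - 0.4985314 * Z,
    -0.969266 * X + 1.8760108 * Y + 0.041556 * Z,
    0.0556434 * X - 0.2040259 * Y + 1.0572252 * Z,
  ];

  // Gamma-encode, clamp out-of-gamut values to [0, 1], and format as hex
  return "#" + lin
    .map((c) => {
      const v = c <= 0.0031308 ? 12.92 * c : 1.055 * Math.pow(c, 1 / 2.4) - 0.055;
      const byte = Math.round(Math.min(1, Math.max(0, v)) * 255);
      return byte.toString(16).padStart(2, "0");
    })
    .join("");
}

console.log(labToHex(100, 0, 0)); // → "#ffffff"
console.log(labToHex(0, 0, 0));   // → "#000000"
```

Clamping is needed because many L*a*b* triples lie outside the sRGB gamut; a clamped color no longer lies exactly on the intended smooth scale.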

HSV vs. L*a*b* color space

Color scale

MDS representation with different color spaces

URL: http://bl.ocks.org/tmayer/9225521 (Data from Jelena Prokić)

MDS representation with different color spaces

URL: http://bl.ocks.org/tmayer/9354953 (Data from Jelena Prokić)
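For the MDS maps, the two MDS dimensions have to be turned into the two inputs of cl2pix. A minimal sketch, assuming (hypothetically) that each dimension is simply min-max rescaled to [0, 1], with the first dimension then used as hue c and the second as lightness l:

```javascript
// Hypothetical preprocessing: rescale 2-D MDS coordinates to [0, 1] per
// dimension so that they can serve as the (c, l) inputs of cl2pix.
function normalizeToUnitSquare(points) {
  const xs = points.map((p) => p[0]);
  const ys = points.map((p) => p[1]);
  const [xMin, xMax] = [Math.min(...xs), Math.max(...xs)];
  const [yMin, yMax] = [Math.min(...ys), Math.max(...ys)];
  // A constant dimension is mapped to the middle of the range
  const scale = (v, lo, hi) => (hi > lo ? (v - lo) / (hi - lo) : 0.5);
  return points.map(([x, y]) => [scale(x, xMin, xMax), scale(y, yMin, yMax)]);
}

console.log(normalizeToUnitSquare([[-2, 1], [0, 3], [2, 2]]));
// → [[0, 0], [0.5, 1], [1, 0.5]]
```

Rescaling each dimension separately uses the full color range but stretches the shorter MDS dimension; scaling both by the same factor would preserve the relative distances of the MDS solution instead.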

Some distortion techniques

Fisheye distortion
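
A circular fisheye lens can be sketched with Sarkar and Brown's graphical fisheye function g(x) = (d + 1)x / (dx + 1). This is a standard formulation chosen for illustration; whether the demos shown here use exactly this function is an assumption.

```javascript
// Circular (radial) fisheye distortion. x is the distance from the focus
// divided by the lens radius, d >= 0 the distortion factor. Points near the
// focus are spread apart; points on or outside the lens stay where they are.
function circularFisheye(point, focus, radius, distortion) {
  const dx = point[0] - focus[0];
  const dy = point[1] - focus[1];
  const dist = Math.hypot(dx, dy);
  if (dist === 0 || dist >= radius) return [point[0], point[1]];
  const x = dist / radius;
  const g = ((distortion + 1) * x) / (distortion * x + 1);
  const k = (g * radius) / dist; // factor by which the offset from the focus grows
  return [focus[0] + dx * k, focus[1] + dy * k];
}

// With d = 3, a point halfway to the lens border moves to 80% of the radius.
console.log(circularFisheye([5, 0], [0, 0], 10, 3)); // → [8, 0]
```

Since g(1) = 1, the lens border stays fixed, so the magnified region joins the undistorted surroundings without a gap.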

Cartesian distortion
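
A Cartesian variant applies the same one-dimensional fisheye function to each axis separately, measuring x from the focus towards the edge of the view on the side where the coordinate lies. Again a sketch under the assumption that the Sarkar and Brown function is used; the view bounds are hypothetical parameters.

```javascript
// One-dimensional fisheye on [lo, hi] with focus f and distortion factor d.
function fisheye1D(v, f, lo, hi, d) {
  const edge = v < f ? lo : hi;
  const span = edge - f;
  if (span === 0) return v;
  const x = (v - f) / span; // normalized distance from the focus, in [0, 1]
  const g = ((d + 1) * x) / (d * x + 1);
  return f + g * span;
}

// Cartesian fisheye: each coordinate is distorted independently.
function cartesianFisheye([px, py], [fx, fy], [xMin, xMax, yMin, yMax], d) {
  return [fisheye1D(px, fx, xMin, xMax, d), fisheye1D(py, fy, yMin, yMax, d)];
}

console.log(cartesianFisheye([5, -5], [0, 0], [-10, 10, -10, 10], 3)); // → [8, -8]
```

Because each coordinate is transformed independently, horizontal and vertical lines stay straight, which keeps grids and rectangular layouts readable; the circular variant bends them.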

Cartograms

  • Cartograms give more space to regions with a high point density by distorting them so that each region's size corresponds to a statistical variable (Bak et al. 2009).
Bak, Peter, Matthias Schaefer, Andreas Stoffel, Daniel Keim and Itzhak Omer. 2009. Density equalizing distortion of large geographic point sets. Cartography and Geographic Information Science (CaGIS), 36(3), 237–250.
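One iteration of the continuous-area cartogram algorithm of Dougenik, Chrisman and Niemeyer (1985), listed under Implementation, can be sketched as follows. Bak et al. (2009) describe a different, density-equalizing technique for point sets, which this sketch does not implement. Assumptions: regions are simple polygons given as rings of [x, y] vertices, all values are positive, and the force formulas follow common open-source reimplementations of the paper; whether the Hesse cartogram was computed this way is not stated here.

```javascript
// Signed polygon area (shoelace formula); positive for counter-clockwise rings.
function ringArea(ring) {
  let s = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[(i + 1) % ring.length];
    s += x1 * y2 - x2 * y1;
  }
  return s / 2;
}

// Area-weighted polygon centroid.
function ringCentroid(ring) {
  const a = ringArea(ring);
  let cx = 0;
  let cy = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[(i + 1) % ring.length];
    const f = x1 * y2 - x2 * y1;
    cx += (x1 + x2) * f;
    cy += (y1 + y2) * f;
  }
  return [cx / (6 * a), cy / (6 * a)];
}

// One iteration: every vertex is pushed away from the centroids of regions
// that are too small and pulled toward those of regions that are too large.
function dougenikIteration(regions, values) {
  const areas = regions.map((r) => Math.abs(ringArea(r)));
  const totalArea = areas.reduce((s, a) => s + a, 0);
  const totalValue = values.reduce((s, v) => s + v, 0);
  const info = regions.map((ring, i) => {
    const desired = (totalArea * values[i]) / totalValue;
    const radius = Math.sqrt(areas[i] / Math.PI);
    return {
      centroid: ringCentroid(ring),
      radius,
      mass: Math.sqrt(desired / Math.PI) - radius, // > 0: region must grow
      sizeError: Math.max(areas[i], desired) / Math.min(areas[i], desired),
    };
  });
  const meanSizeError = info.reduce((s, r) => s + r.sizeError, 0) / info.length;
  const reduction = 1 / (1 + meanSizeError); // damps the forces

  return regions.map((ring) =>
    ring.map(([x, y]) => {
      let nx = x;
      let ny = y;
      for (const { centroid: [cx, cy], radius, mass } of info) {
        const d = Math.hypot(x - cx, y - cy);
        if (d === 0) continue;
        const t = d / radius;
        const force = d > radius ? (mass * radius) / d : mass * t * t * (4 - 3 * t);
        nx += (force * reduction * (x - cx)) / d;
        ny += (force * reduction * (y - cy)) / d;
      }
      return [nx, ny];
    })
  );
}

// Toy example: the first square "should" be three times as large as the second.
const squares = [
  [[0, 0], [1, 0], [1, 1], [0, 1]],
  [[2, 0], [3, 0], [3, 1], [2, 1]],
];
const moved = dougenikIteration(squares, [3, 1]);
console.log(moved.map((r) => Math.abs(ringArea(r)).toFixed(3)));
```

The displacement of a vertex depends only on its coordinates, so a vertex shared by two neighboring regions moves identically in both, which keeps the map contiguous. The iteration is meant to be repeated until the size errors are small enough.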

Cartogram of Hesse

Hessen Dialekterkenner
http://th-mayer.de/syhd/

Cartogram of the world's languages

Conclusions

  • Interactive visualization methods help to detect patterns in the data
  • Cartograms may help to show dense language data on a map
  • The L*a*b* color space shows smoother transitions between hues than HSV
  • Visualizations are useful for presenting data

Thank you for your attention!

thomas.mayer@uni-marburg.de