Providing a basis for query design for a systematic review

The past few weeks I’ve been working on the query design for my systematic review on algorithmic accountability. I encountered two problems:

  1. ‘Algorithmic accountability’ is a relatively new term, whereas the problems, themes, and concepts connected with it are of course touched upon in earlier work in various disciplines;
  2. I wanted to find a systematic way to approach the query design, which also accounts for the diversity of the fields and terms used to discuss matters related to algorithmic accountability.

I eventually settled on the following approach, using computational methods. Out of the material that was identified as relevant prior to the review, only the articles that included keywords (27) were selected for this exploration. From these articles, the collocations of the keywords were extracted.


Collocation mapping
In charting the keywords’ collocations, each keyword was first related to the other keywords of the same article. For instance, if the keywords of an article are ‘big data’, ‘algorithms’, and ‘accountability’, then the relations would be mapped as follows:

big data          –> algorithms
big data          –> accountability
algorithms      –> big data
algorithms      –> accountability
accountability –> big data
accountability –> algorithms
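
As a minimal sketch of this step (not necessarily how the relations were actually prepared), the directed pairs for one article can be generated in Python. The function name `keyword_relations` is a hypothetical helper introduced here for illustration.

```python
from itertools import permutations


def keyword_relations(keywords):
    """Return every ordered (source, target) pair of distinct keywords."""
    # Drop repeated keywords while keeping their original order.
    unique = list(dict.fromkeys(keywords))
    # permutations(..., 2) yields both a -> b and b -> a for each pair.
    return list(permutations(unique, 2))


relations = keyword_relations(["big data", "algorithms", "accountability"])
for source, target in relations:
    print(f"{source} -> {target}")
```

Running this for each article and concatenating the results gives one long list of source/target pairs, which is the kind of edge list a network tool can import.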


After the relations were prepared, these collocations were mapped in Gephi.

The nodes with the most incoming/outgoing connections (degree >= 35) were then filtered out.
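
To illustrate what such a degree threshold does, here is a rough Python sketch that counts incoming plus outgoing connections per keyword and lists the keywords at or above a threshold. It assumes that repeated pairs count as one connection, which may differ from how Gephi was configured. The helper names and the toy edge list are made up for the example.

```python
from collections import Counter


def node_degrees(edges):
    """Count incoming plus outgoing connections per node."""
    degrees = Counter()
    # set() collapses repeated pairs so each distinct directed edge counts once
    # (an assumption for this sketch).
    for source, target in set(edges):
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def nodes_at_or_above(edges, threshold):
    """Return the nodes whose degree is at least `threshold`, sorted by name."""
    degrees = node_degrees(edges)
    return sorted(node for node, degree in degrees.items() if degree >= threshold)


# Toy data; the review itself used a threshold of 35 on the full keyword network.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("a", "b")]
high_degree_nodes = nodes_at_or_above(toy_edges, 3)
```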

The value of mapping these keywords is that it gives some perspective on which terms are used in which fields (or, more accurately: with what other kinds of terms, thereby hinting at the field). Four (very rough) clusters could be detected by modularity. The first revolves around governance (e.g. government, governance, accountability – though there are also smaller nodes referring to, for instance, journalism). The second deals mainly with legal aspects (e.g. GDPR, right to explanation), and the third with more general data-related issues (e.g. regulation, automation, surveillance). The last predominantly deals with ethics. The interesting thing about this last cluster is that, aside from the ethics node, all other nodes in it come from one paper (this paper had a lot of keywords, thereby constituting its own cluster) – which also hints at the limitations of this method on its own.

While the mapping provides some insight into which terms are used in what kinds of debates, it doesn’t yet point to which combinations might be fruitful for query design. Thus, the edges table was subsequently exported from Gephi, and the edge weight was used as a measure of the strength of the relations between keywords. The double relations (a –> b / b –> a) were merged into one and their edge weights were added together.
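
The merging of double relations can be sketched in Python as follows. This is an illustration under assumptions, not the exact procedure used: the edges are given as in-memory (source, target, weight) tuples rather than read from the exported table, and `merge_directed_weights` and the toy weights are hypothetical.

```python
from collections import defaultdict


def merge_directed_weights(edges):
    """Combine a -> b and b -> a into one undirected pair with summed weight."""
    merged = defaultdict(float)
    for source, target, weight in edges:
        # Sorting the two endpoints gives both directions the same key.
        key = tuple(sorted((source, target)))
        merged[key] += weight
    # Strongest relations first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Toy weights, made up for the example.
toy_weighted_edges = [
    ("big data", "algorithms", 2),
    ("algorithms", "big data", 2),
    ("accountability", "algorithms", 1),
    ("algorithms", "accountability", 1),
]
ranked_pairs = merge_directed_weights(toy_weighted_edges)
```

The top of the resulting ranking is then a list of candidate term combinations for the query.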


Now I have a systematic basis for deciding upon my query, for I can demonstrate which terms seem to be more strongly connected. That doesn’t mean it won’t still be hard, but at least I have some more grounding!

Summer 2018

This summer has been so loaded with great events and activities that it’s been hard to keep track. Here’s an overview of some of my academic summer’s highlights.

Data visualization in society seminar

I’m really thrilled to be writing a chapter together with Daniela van Geenen for Helen Kennedy and Martin Engebretsen’s book project Data visualization in Society. Part of the project was a great seminar at the University of Agder’s study centre Metochi on Lesbos. This was a great way to get feedback on one’s work, and to streamline the book in its entirety. I feel this is going to be a really great book for practitioners, students and academics.

Gephi Field Notes plugin, developments and talk at KCL

Another thing I’m quite passionate about is the development of the Gephi Field Notes plugin. Together with the Digital Humanities Lab and the Gephi developers, we’re trying to fine-tune the plugin. Moreover, we got to give a talk at King’s College London about the plugin. It was really great to meet so many people who, like us, are really excited about this project.

Arduino Workshop at the Datafied Society

Karin van Es organized an Arduino Workshop by Creative Coding Utrecht at the Datafied Society, and it was absolutely great. It was so much fun to solder again, and to tinker with the little wires, LEDs and resistors. I hope to be doing a lot more of this. In any case, we want to develop some Arduino setups for our students to use for data gathering on data walks.

Doing research

Of course, summers are mainly for research. Thus, I’ve been doing a *lot* of writing these past weeks. See also my Research projects to get an idea of what I’m currently working on. Spoiler: it’s a lot.

Meeting my new colleagues

With so many great things going on, you’d almost forget that I’m starting my Ph.D. project in September! I was really happy to meet my new colleagues during a Ph.D. meetup at the end of June. We got to do some fun activities and get to know one another (and the city) a little better.

Upcoming UDS Summer School

Before starting my Ph.D., I have the pleasure of teaching in the Utrecht Data School Summer School. We’re really excited that we’re opening up the summer school to external students as well, and we’re really happy with the turnout. I’m looking forward to teaching a lot of (future) students a crash course in digital methods.