Last week I had the chance to give a talk at the Barcelona Unseminar in Bioinformatics, an initiative from a group of bioinformatics researchers who aim to promote the unseminar culture in Barcelona as a tool to share our knowledge (and ignorance!), experiences and interests around biological research and open science.

The topic of the talk was ‘Data Vizz – When science meets art’, and the aim of the unseminar was to give researchers tools to communicate their results better and to improve how they visualize data. For that reason we introduced concepts from the fields of design, infographics, typography and, of course, data visualization.

On the weekend of March 12-13th the Procomuns event took place in Barcelona, an encounter that aims to highlight the relevance of the commons-oriented approach to peer production and the collaborative economy, while proposing public policies and providing technical guidelines to build software platforms for collaborative communities.

As part of the program there was a data contest where participants were asked to develop visualizations that allow the identification of new relations among the data gathered around the Commons Collaborative Economy in the P2P value project. The data consists of a dataset containing more than 300 cases of Commons Based Peer Production; it can be found here.

I participated with a data visualization displaying a network that connects these cases based on how similar they are: two projects are connected to each other when they share a given number of tags. My proposal was basically a quick shot, and much of the effort went into the data curation phase. I am relatively happy with the data visualization, as I expected to detect communities of projects sharing specific subsets of tags, but the resulting network was a highly connected graph that serves more as a data exploration tool. Despite that, I got the first prize, which was a happy surprise. Anyway, I am considering dedicating more time to the data so I can get better results on the data visualization side.
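The linking rule described above can be sketched in a few lines of R. This is only an illustration under assumptions: the project names, the tags, the function name `tag_edges` and the threshold `min_shared` are made up for the example, and none of it is taken from the P2P value dataset or from the code on GitHub.

```r
# Toy data: each (hypothetical) project has a set of tags
projects <- list(
  alpha = c("wiki", "education", "open-source"),
  beta  = c("wiki", "open-source", "maps"),
  gamma = c("maps", "mobility"),
  delta = c("education", "wiki", "open-source", "maps")
)

# Build an edge list: two projects are linked when they share
# at least `min_shared` tags (threshold chosen for illustration)
tag_edges <- function(projects, min_shared = 2) {
  # every unordered pair of project names, one pair per row
  pairs <- t(combn(names(projects), 2))
  # number of tags the two projects of each pair have in common
  shared <- apply(pairs, 1, function(p)
    length(intersect(projects[[p[1]]], projects[[p[2]]])))
  edges <- data.frame(from = pairs[, 1], to = pairs[, 2],
                      shared = shared, stringsAsFactors = FALSE)
  # keep only the pairs that reach the threshold
  edges[edges$shared >= min_shared, , drop = FALSE]
}

edges <- tag_edges(projects, min_shared = 2)
print(edges)
```

Raising `min_shared` prunes the weaker links, which is one possible way to look for tighter groups of projects when the graph turns out to be highly connected.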

winning-dataviz

You can also find the source code on GitHub.

I have been a last.fm user since 2005 and I’ve always tried to scrobble as much music as possible (34,711 songs scrobbled at present). Last.fm lets you keep a historical archive of your music listening habits, which is cool, but I have always missed having an overview from the artist’s point of view. Is a musician being listened to more than last year? Is an artist listened to more after an album release? What happens when they are on tour? Could we visualize a change in listeners’ behavior due to specific events?

One effect I wanted to visualize is the likely increase in an artist’s playcount after the artist dies. One can assume that the number of plays increases, but how long does this effect last? Which artists get the greatest increase in their playcounts? So I built a data visualization to get some insight into the effects of an artist’s death:
last-fm

My latest projects involving the exploration of multidimensional data keep ending up with one of the dataviz techniques that I like most: small multiples. This is a concept introduced by Edward Tufte and is described as:
“Illustrations of postage-stamp size are indexed by category or a label, sequenced over time like the frames of a movie, or ordered by a quantitative variable not used in the single image itself.”

To summarize, it’s the use of the same basic graphic or chart to display different slices of a data set.
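As a rough sketch of the idea, the following base R snippet draws the same line chart once per category, with shared axes. The data is invented for the example, and the names `draw_small_multiples`, `region`, `year` and `value` are assumptions for illustration, not part of any project mentioned here.

```r
# When run as a script (no interactive session), draw to a null
# device so that no output file is created
if (!interactive()) pdf(NULL)

# Invented data: one yearly series per (hypothetical) region
set.seed(1)
df <- data.frame(
  region = rep(c("A", "B", "C", "D"), each = 5),
  year   = rep(2010:2014, times = 4),
  value  = round(runif(20, min = 0, max = 100))
)

# Draw one small panel per region, all with the same axes,
# so that the panels can be compared at a glance
draw_small_multiples <- function(df) {
  slices <- split(df, df$region)
  n <- length(slices)
  # arrange the panels in a grid that is as square as possible
  cols <- ceiling(sqrt(n))
  rows <- ceiling(n / cols)
  old <- par(mfrow = c(rows, cols), mar = c(2, 2, 2, 1))
  on.exit(par(old))
  for (name in names(slices)) {
    s <- slices[[name]]
    plot(s$year, s$value, type = "l", main = name,
         xlim = range(df$year), ylim = range(df$value),
         xlab = "", ylab = "")
  }
  # return the number of panels drawn, invisibly
  invisible(n)
}

panels <- draw_small_multiples(df)
```

Keeping `xlim` and `ylim` identical across panels is the important design choice here: the comparison between slices only works if every panel uses the same scale. With ggplot2, `facet_wrap()` produces a similar layout in a single call.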

Below are two samples of small multiples we are building at Siris Academics. The first one displays several regions of Italy, quantified by multiple metrics. The nice thing here is that these metrics are grouped by concept, such as demography, economics, etc., so based on the shape you can tell whether a region is performing well in a specific category or not.
snapshot_multiples_A4

I definitely had a lot of fun building this piece:
pulsar-plot

If you talk about data visualization and album covers in music culture, there is one cover that is among the most representative of these two fields, if not the most representative: the cover of Joy Division’s Unknown Pleasures album, which became an iconic image representing the band.
Unknownpleasures

If, like me, you are a runner, it is quite likely that you use one of the many apps available to keep track of your workouts. If you want to plot all your activities together, it is quite easy thanks to this post on the Flowing Data blog. A more recent post from the Visual Cinnamon blog also shows how to add a map overlay, so I decided to give it a quick try in order to plot my running tracks and proudly gaze at my humble workouts.

I have all my activity tracked in Strava, at least for the last 3 years (before that I was using another service, which closed suddenly without giving me the chance to retrieve my data, too bad). Anyway, Strava has a nice feature, a bulk activity export, so all your data is available.

With my data ready to be used, I just made some slight modifications to the R code from the Visual Cinnamon post, as I wanted to differentiate workouts from one year to another, so here is the code:

library(plotKML)
library(ggplot2)
library(ggmap)


#generate a list with available years from data
years = 2012:2014
files = vector(mode="list", length=length(years))
names(files) = years

#read files from each year's folder; the lists are indexed by name
#(as.character), not by number, and full paths are kept so that
#readGPX can find each file
for (y in years)
  files[[as.character(y)]] = dir(path=as.character(y), pattern="\\.gpx$", full.names=TRUE)

#routes list to store all the tracks by year
routes = vector(mode="list", length=length(years))
names(routes) = years


#store all routes in a data frame, by year
for (y in years) {
	#reset the accumulators so each year only holds its own tracks
	index <- c()
	latitude <- c()
	longitude <- c()
	for (i in seq_along(files[[as.character(y)]])) {
	     
	     route <- readGPX(files[[as.character(y)]][i])
	     location <- route$tracks[[1]][[1]]
	     
	     index <- c(index, rep(i, nrow(location)))
	     latitude <- c(latitude, location$lat)
	     longitude <- c(longitude, location$lon)
	 }
	routes[[as.character(y)]] <- data.frame(index, latitude, longitude)
}

At this point we can generate the plot for all the workouts, as shown below:

tracks20122015

I just want to plot workouts from my town, so I center the map on that location (the data also contains tracks from different races, but I have discarded them because I would need to zoom out the map to show a wider geographical area, and my interest is only in my local workouts):

#get map from my location
mymap <- qmap(location = "Sant Vicenç dels Horts", 
zoom = 13, color="bw", legend="none")

And finally add a layer with the paths, so we can plot map and tracks together, once per year:

#inside a for loop the plot has to be printed explicitly
colors = c("#ffff00", "#ff0000", "#00ffff")
for (i in seq_along(years))
  print(mymap + geom_path(aes(x = longitude, y = latitude, 
  group = factor(index)), colour=colors[i], 
  data = routes[[as.character(years[i])]], alpha=0.5))

tracks_year_2014

I try to keep up to date with the data sections of digital newspapers, because data-driven journalism is one of my interests: it is an exciting field that mixes different disciplines. Recently the newspaper La Vanguardia started a data journalism section called VangData, which looks quite promising.

Today they published a chart showing the average height in OECD countries; here is a snapshot:
vangdata_20150410

Although the chart fulfills its purpose, which is to rank the countries by the height of their population, one of the principles of effective data visualization, the data-ink ratio, definitely has room for improvement here.
