360Giving has launched a visualization challenge built around questions whose answers could help funders maximize their impact on the charitable causes they support. The questions aim to give a better understanding of the grantmaking sector. Here we look at one specific question, which tackles thematic trends:
Who has funded what over the years?
The question is open-ended as to what counts as a theme, since the grantmaking sector has no standard categorization. The 360Giving dataset is rich in information about the sector, but it does not directly state which theme a grant belongs to.
Discovering themes from the data itself
The 360Giving dataset describes each grant in fair detail through several fields: information about the grant itself, the funding organization, the recipient, dates, the amount awarded, and so on. Taken together, the descriptive fields form a large corpus of text documents about grants, so Natural Language Processing techniques can be applied to automatically discover the hidden thematic structure of this corpus.
More specifically, Topic Modelling has been applied to discover themes from the collection of grants. Given a corpus of unstructured text documents (e.g. news articles, tweets, etc.) and no prior annotation, this text mining approach outputs a set of topics. Each topic is represented by its top-ranked terms and by a weight for every document that expresses how strongly the document relates to the topic. In our case, the text documents are the text fields that describe each grant (title and description). From that textual information the themes emerge, together with the relatedness of each grant to each theme.
Because Topic Modelling can associate a grant with multiple themes, only a subset of grants is taken into account for each theme: those whose relatedness to the theme is statistically significant. This means that not every grant is processed and visualized, only the most representative ones for each theme.
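The post does not say which Topic Modelling algorithm was used. As a rough illustration of the inputs and outputs described above, here is a minimal sketch that assumes non-negative matrix factorization (NMF) as a stand-in, written in plain Python with the standard multiplicative update rules. The sample texts and all function names are hypothetical and are not taken from the 360Giving data or the original analysis.

```python
import random
from collections import Counter

# Hypothetical grant texts (title + description), for illustration only.
grants = [
    "veterans travel commemoration visit war",
    "war veterans commemoration travel overseas",
    "youth sports club equipment training",
    "sports training youth coaching club",
]

def build_matrix(docs):
    """Return (vocabulary, document-term count matrix)."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: j for j, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for word, count in Counter(doc.split()).items():
            row[index[word]] = float(count)
        rows.append(row)
    return vocab, rows

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) as W (docs x topics) times H (topics x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def top_terms(h, vocab, n_terms):
    """For each topic, list the n_terms highest-weighted vocabulary terms."""
    result = []
    for row in h:
        ranked = sorted(range(len(vocab)), key=lambda j: -row[j])
        result.append([vocab[j] for j in ranked[:n_terms]])
    return result

vocab, matrix = build_matrix(grants)
doc_topic, topic_term = nmf(matrix, k=2)
for topic_id, terms in enumerate(top_terms(topic_term, vocab, 3)):
    print(topic_id, terms)
```

Each row of `doc_topic` holds one grant's weights across the topics, and each row of `topic_term` ranks the vocabulary for one topic, which matches the two outputs described above. A real analysis would more likely rely on an established library and proper text preprocessing.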
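The exact significance criterion is not spelled out here. One simple way to approximate the idea, purely as an assumption, is to keep for each theme only the grants whose weight lies more than one standard deviation above that theme's mean weight. The weights, grant identifiers, and the `z` threshold below are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical grant-to-theme weights, as a topic model might produce.
weights = {
    "grant-a": [0.90, 0.02],
    "grant-b": [0.10, 0.05],
    "grant-c": [0.05, 0.80],
    "grant-d": [0.08, 0.03],
    "grant-e": [0.12, 0.04],
}

def representative_grants(weights, n_themes, z=1.0):
    """For each theme, keep grants whose weight exceeds mean + z * std."""
    selected = {}
    for theme in range(n_themes):
        column = [w[theme] for w in weights.values()]
        cutoff = mean(column) + z * pstdev(column)
        selected[theme] = [g for g, w in weights.items() if w[theme] > cutoff]
    return selected

subset = representative_grants(weights, n_themes=2)
print(subset)
```

With this rule a grant that is only weakly related to every theme, such as `grant-b`, drops out entirely, which is the effect described above: not all grants reach the visualization.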
Here are some examples of themes that emerged from Topic Modelling, along with the terms that describe them:
Above is a very specific theme: a series of grants awarded between 2004 and 2006, all belonging to the 'Heroes Return programme' and funded by the Big Lottery Fund.
This theme, grants for World War II veterans, is quite specific, and so is its period. Other discovered themes are broader in both meaning and time span. Three examples follow:
There is no fixed number of topics to use, nor a fixed number of terms per topic. There should be enough topics to distinguish between the highly representative themes in the text, but not so many that the topics lose their interpretability. After some qualitative iterations, 15 topics and 20 terms per topic gave satisfactory results.
With some context on how the themes were obtained, let's visualize the data.
Funding of discovered themes, throughout the years
A view of grantmaking activity since 1998: a timeline of the themes, sized by the relatedness of their grants, the amount of funding, or the number of grants.
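The data behind such a timeline can be thought of as one cell per theme and year, carrying the three measures used for sizing. The sketch below is only an illustration under assumptions: the records are hypothetical, and summing the weights is just one possible way to express the relatedness measure.

```python
from collections import defaultdict

# Hypothetical records: (theme, award year, amount awarded, relatedness weight).
grants = [
    ("veterans", 2004, 5000.0, 0.9),
    ("veterans", 2004, 3000.0, 0.7),
    ("veterans", 2005, 4000.0, 0.8),
    ("youth sport", 2005, 10000.0, 0.6),
]

def timeline(grants):
    """Aggregate grants into (theme, year) cells with the three sizing measures."""
    cells = defaultdict(lambda: {"count": 0, "amount": 0.0, "relatedness": 0.0})
    for theme, year, amount, weight in grants:
        cell = cells[(theme, year)]
        cell["count"] += 1
        cell["amount"] += amount
        cell["relatedness"] += weight
    return dict(cells)

for (theme, year), cell in sorted(timeline(grants).items()):
    print(theme, year, cell)
```

Switching the sizing of the timeline then amounts to reading a different key from each cell.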
The timeline above shows the contributions of the most closely related grants to each theme, but it is also interesting to look at the data from a different angle:
Instead of looking at overall contributions to the themes and exploring the main contributors to a specific theme, it is worth starting from a funding organization and displaying the grants most strongly related to the themes, so that its grantmaking activity can be grasped at a glance.
Funding organization: contributions to themes
Search for a funding organization to see which of its grants are related to the discovered topics.
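The lookup behind this view can be sketched as a simple filter. Everything here is an assumption made for illustration: the record layout, the `min_weight` cutoff, the function name, and the sample grants, including the placeholder funder "Example Trust".

```python
# Hypothetical grant records with the theme and weight assigned by a topic model.
grants = [
    {"funder": "Big Lottery Fund", "title": "Heroes Return visit",
     "theme": "veterans", "weight": 0.9},
    {"funder": "Big Lottery Fund", "title": "Community hall repairs",
     "theme": "community", "weight": 0.3},
    {"funder": "Example Trust", "title": "Youth football kit",
     "theme": "youth sport", "weight": 0.8},
]

def grants_by_funder(grants, query, min_weight=0.5):
    """Return a funder's grants that are strongly related to a theme, strongest first."""
    needle = query.casefold()
    matches = [
        g for g in grants
        if needle in g["funder"].casefold() and g["weight"] >= min_weight
    ]
    return sorted(matches, key=lambda g: -g["weight"])

for grant in grants_by_funder(grants, "big lottery"):
    print(grant["theme"], grant["title"], grant["weight"])
```

The case-insensitive substring match stands in for the search box; the weight cutoff mirrors the idea of showing only the grants with the strongest relation to a theme.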