We usually think of the journalist as the essential part of a newspaper. Indeed journalists are, but other professional groups are also important in the production of a newspaper. In former years typesetters were essential for the production of a newspaper. Now a new type of profession is popping up in the news business: the data-analytic computer nerd. In Denmark the company Kaas & Mulvad brands itself as a specialist in finding news and patterns in complex data. The two guys behind the company have a background in journalism, but in one of their articles they are not afraid of mentioning Python, the web framework Django, Google Fusion and Google Chart. They even run a course: “Django for journalists”! Kaas & Mulvad point to a couple of computer-supported journalism (“computer-assisted reporting”) efforts, e.g., the controversial Tampa Bay Mug Shots, which shows the faces and names of people booked in the last 24 hours in a few counties. The website is associated with the St. Petersburg Times and extracts data from public information (a county sheriff’s website). In Denmark, the newspaper Information has been at the forefront of data journalism, with web developer Johannes Wehner working not in Django but in Drupal. Information was the only Danish media outlet to receive the 391,832-document Wikileaks War Logs corpus. They write (in my poor translation):
To find a path through the enormous amount of information, we first and foremost constructed a searchable database in which it was possible to search in a large number of different ways: on individual words in the text, on certain dates, on the type of report, on topics and on geographical coordinates, regions, etc.
Information has also published material from the Afghanistan leak. Wehner publishes analyses of the different material on the datablog with plots and maps. For data analysts he has also published a comma-separated values file with the threat reports from Afghanistan. My plot displays a simple histogram of the Afghanistan threat reports data (somewhat similar to one of Wehner’s plots). It shows an unfortunate increase in the number of threats through the years (until 2009). The Danish Foreign Ministry has a website giving an overview of Danish achievements in Afghanistan. This is mostly positive, e.g., five million returning refugees, landmine clearing and two million girls in school. I suppose that this is not a Danish achievement alone(!), but a result of the efforts of a number of countries, e.g., the United States, the United Kingdom and the Netherlands, as well as of Afghanistan itself. Comparing the threat reports with the information from the Foreign Ministry, there seems to be a discrepancy between the negative and the positive news from the two sources. Some of the discrepancy can be explained by one type of threat: the threats of the Taliban against schools. As schools for girls become more widespread, the nasty Taliban have more opportunities to target schools. But whether these threats form a major part of the total number of threats I do not know. Information shows only around ten of them.
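Neither Information’s searchable database nor the layout of Wehner’s CSV file is described in detail here, so the following is only a minimal Python sketch of the two ideas under stated assumptions: a small database of reports that can be searched on words, report type, region and date range, and the per-year counts behind a histogram like mine. The column names (date, type, region, latitude, longitude, summary), the function names and the four example rows are all invented for illustration; the real file may be organised quite differently.

```python
import csv
import io
import sqlite3

# Invented example rows (hypothetical); the real threat-report CSV may use other columns.
SAMPLE_CSV = """date,type,region,latitude,longitude,summary
2007-03-14,Threat Report,Kandahar,31.61,65.71,Threat against local school
2008-06-02,Threat Report,Kabul,34.53,69.17,Possible attack on convoy
2009-01-20,Threat Report,Kandahar,31.62,65.70,Threat to girls school reported
2009-11-05,Threat Report,Herat,34.35,62.20,IED threat on main road
"""


def build_database(csv_text):
    """Load CSV rows into an in-memory SQLite table named reports."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports (date TEXT, type TEXT, region TEXT, "
        "latitude REAL, longitude REAL, summary TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders are filled from each row dictionary.
    conn.executemany(
        "INSERT INTO reports VALUES "
        "(:date, :type, :region, :latitude, :longitude, :summary)",
        rows,
    )
    return conn


def search(conn, word=None, report_type=None, region=None, start=None, end=None):
    """Return (date, region, summary) rows matching all given filters."""
    clauses, params = [], []
    if word:
        clauses.append("summary LIKE ?")  # substring match on the report text
        params.append(f"%{word}%")
    if report_type:
        clauses.append("type = ?")
        params.append(report_type)
    if region:
        clauses.append("region = ?")
        params.append(region)
    if start:
        clauses.append("date >= ?")  # ISO dates compare correctly as text
        params.append(start)
    if end:
        clauses.append("date <= ?")
        params.append(end)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT date, region, summary FROM reports" + where + " ORDER BY date"
    return conn.execute(sql, params).fetchall()


def reports_per_year(conn):
    """Count reports per year, i.e., the bar heights of a simple histogram."""
    rows = conn.execute(
        "SELECT substr(date, 1, 4) AS year, COUNT(*) FROM reports "
        "GROUP BY year ORDER BY year"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = build_database(SAMPLE_CSV)
    for year, count in reports_per_year(db).items():
        print(year, "#" * count)  # crude text histogram
    print(search(db, word="school", start="2008-01-01"))
```

The word search here is a plain SQL `LIKE` substring match, which in SQLite is case-insensitive for ASCII letters. For hundreds of thousands of documents a full-text index, such as SQLite’s FTS5 extension, would likely be a better choice.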