14 Apr 2011 • on digital tools DH computer

Spring of DH - THATCamp Florence

Considerate la vostra semenza:
fatti non foste a viver come bruti,
ma per seguir virtute e canoscenza.
Dante Alighieri, Inferno, Canto XXVI
(In English: "Consider your origins: you were not made to live like brutes, but to pursue virtue and knowledge.")

What better place to follow Dante's wise words than THATCamp Florence? Driven by this idea, as well as by an insane curiosity to finally understand how a THATCamp works in practice (after having read about it on Twitter for over a year), I decided to go and see. Luckily for me, my decision was generously supported by a Mellon - THATCampFlorence Grant.

For once, my high expectations were not set too high: a Camp really is a place where you can discuss openly and freely, meet new people, improve your digital toolbox, or simply enjoy a highly stimulating environment. All this in Fiesole, near Florence, thus with the added benefits of great food, better wine, amazing panoramas and welcoming spring weather. It was particularly exciting to see how THATCamp ateliers are democratically selected and organised in the plenary session. It felt like being in the agora or, to keep some geographical coherence, on the piazza of a medieval commune. As a bonus, during the THATCamp I also received confirmation that in June I will start my new job at the digital project of the Rachel Carson Center in Munich. The only downside of my THATCamp experience was that my (allegedly) cheap&hip downtown ho(s)tel was overbooked and sent me to a cheap (full stop) hotel a couple of blocks away.

Choosing what to attend was one of the most difficult tasks, but in the end I attended five BootCamp workshops on topics as diverse as WordPress, Omeka, Zotero for advanced users, online stats tools for historians, and copyright issues.

The WordPress and Zotero workshops were very useful in giving me further hints on how to use tools that are already central to my DH workflow. In particular, I gathered a long list of useful plugins that may help me integrate the two tools and improve the look and feel of the website of the Nature&Nation network. I have already installed a COinS plugin to make the site readable by Zotero and the ZotPress plugin to publish the network's group library on the site. I have also attempted to use the ScholarPress Workshop plugin, which allows you to gather applications and papers for a workshop and have them delivered directly to your Zotero library. Unfortunately, I have some kind of problem with my webhost and have not yet been able to get it working. I am also playing with the idea of implementing CommentPress in a subsection of the site, as a means of setting up a kind of e-journal.

As regards Zotero's power as an analysis tool, I learnt instead of the existence of ZoteroMaps (which lets you map places of publication or places cited in attachments) and of Timeline (which lets you visualize the chronological distribution of sources). Both could become very useful in the analysis of large archival records, but I have yet to figure out whether the ZoteroMaps geographical data can be easily exported to desktop GIS tools for further analysis.

Moreover, the Omeka workshop really helped me to understand what Omeka is, to start playing with it on Omeka Playdate, and to see how I could put it to use, particularly in my upcoming job at the Rachel Carson Center.

I was less impressed by the workshop on online statistical tools for historians, but only because I already use R (as a desktop tool) for my statistical analyses and do not, at the moment, feel the need for an online facility. The guys at the Sorbonne, however, seem to do really great stuff in the analysis of historical data and in setting up a "historian-friendly" interface to R's computational power. I also learnt of the existence of a new R IDE (rstudio.org) that could become my new software of choice. Moreover, I have to think about whether Business Intelligence (now that I know what it is) may be, as suggested, an answer to some of the needs of the historian's workflow. (Obviously, I don't have an answer yet.)

The statistical analysis atelier I attended during the actual THATCamp had very similar ups and downs. Probably one of the most interesting issues at stake is the need for sharing data within the historical community (not yet a very common practice). As stated during the two sessions, historians are in fact like ogres: always hungry for data. What we need are open historical data repositories, but there could be major copyright issues with archives and other rights holders (an issue, however, that was not discussed in the BootCamp workshop about copyright). The other major issue is that traditional statistical methods do not (yet) take into account the fuzziness and frequent unreliability of historical data; maybe we should think about implementing Bayesian models in our analyses.

The aforementioned workshop about copyright was probably the least useful, but at least I learnt that in any case I will need a lawyer to tackle the complicated EU copyright legislation. That is already an improvement on my initial feeling that you can virtually never publish any source.

The atelier about data visualization was instead extremely interesting: I became aware of a whole bunch of nice tools that allow you to visualize large datasets with ease (you may find the list on the site). During the atelier we also discussed the need for different visualization tools for outreach, analysis and education, and for standardization among these tools, so that data can be easily transferred from one to the other.

I am sure that I will also put to good use the discussion we had during the crowdsourcing atelier. Crowdsourcing promises to be one of the best ways to gather and annotate historical material beyond traditional archives and to obtain more and better sources for our research. The main issue we debated here was how to involve as many people as possible in a crowdsourcing project (even if it seems that crowdsourcing may also work well with a small, selected community): probably the best solution is to give back to the community and keep all the gathered data free/open/libre so as to foster cooperation. Another topic of discussion was the need for filtering data, and the possibility of also rewarding people who are active in the filtering task. In this case it seemed that a small community could be more of an asset (in this regard, during the same days I also read on Twitter about clubsourcing, that is "training est'd communities of interest to help w/ crowdsource solutions. Via Peter Hedlund of VFH", via @edmj, Fri, 25 Mar 2011 17:57:41).

The text mining atelier also proved very useful, as it provided an extensive list of interesting tools. I am not sure, however, that I will be able to integrate these tools into my workflow, since on a tight budget I do not know how to digitize and OCR all the data I need. I will probably have to wait for interesting data to be digitized by someone else...

It was a pity that no coordinator was found for the planned GIS BootCamp session, but luckily my proposal for a GIS atelier got enough votes to be organised. I thus had the exciting opportunity to discuss with like-minded people (including a couple of future colleagues) the problems that arise when trying to use GIS and online mapping tools in the humanities. I even gathered a list of useful tools that may help me to correctly geolocate historical and disused toponyms, one of the main problems that I have recently faced in my work. Nonetheless, I still feel that as historians we should open up our data and, if we manage to locate a disused toponym, share the coordinates with the community. One option would be to add them to existing tools such as geonames.org; another would be to set up a dedicated service for historical places, since it seems that geonames.org does not allow for embedding the much needed temporal data (i.e. when the toponym was last used). Further discussion is needed in this regard; hopefully we will have further occasions in the future.
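
To make the idea of a shareable toponym record with temporal data more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class `HistoricalPlace`, its field names (`first_attested`, `last_used`) and the sample values are all invented for this example and do not correspond to any existing service. The output format is GeoJSON, which most desktop GIS tools can read.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoricalPlace:
    """A toponym with coordinates and an optional period of use (hypothetical schema)."""

    name: str
    latitude: float
    longitude: float
    first_attested: Optional[int] = None  # year, if known
    last_used: Optional[int] = None       # year, if known

    def in_use(self, year: int) -> bool:
        """Return True unless the known dates rule the given year out."""
        if self.first_attested is not None and year < self.first_attested:
            return False
        if self.last_used is not None and year > self.last_used:
            return False
        return True

    def to_geojson_feature(self) -> dict:
        """Build a GeoJSON Feature; GeoJSON orders coordinates as [longitude, latitude]."""
        return {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [self.longitude, self.latitude],
            },
            "properties": {
                "name": self.name,
                "first_attested": self.first_attested,
                "last_used": self.last_used,
            },
        }


def to_feature_collection(places: List[HistoricalPlace]) -> str:
    """Serialise a list of places as a GeoJSON FeatureCollection string."""
    collection = {
        "type": "FeatureCollection",
        "features": [place.to_geojson_feature() for place in places],
    }
    return json.dumps(collection, indent=2)


# Invented sample record, for illustration only.
example = HistoricalPlace("Example toponym", 46.0, 11.0,
                          first_attested=1300, last_used=1850)
print(to_feature_collection([example]))
```

Keeping the dates in the feature's properties means the temporal information travels with the coordinates, so a shared file could be filtered by year in whatever tool opens it.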

The Camp was also truly interactive: it was virtually possible to attend more than one session at a time thanks to a sustained and informative Twitter stream and to extensive session reports posted on the blog. The latter may also help those who did not have the opportunity to attend to get an idea of what was discussed and done. They will also be useful to me as a set of resources for the course in "Digital History" that I have been asked today to teach in the MA in history at the University of Trento. As for the Twitter stream, I have a copy of it on my disk, and will post it here as soon as I find a way to automatically parse it and substitute handles and hashtags with actual links. If you have ideas on this, please contact me @wilkohardenberg.
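
One possible approach to that substitution, as a minimal sketch in Python using only the standard library. It assumes that each tweet is available as a plain-text string and that HTML output is wanted; the two URL templates and the function name `linkify` are assumptions made for this example, not something prescribed by Twitter, and should be adjusted as needed.

```python
import html
import re
from urllib.parse import quote

# Assumed URL patterns; adjust to whatever the service actually uses.
PROFILE_URL = "https://twitter.com/{}"
HASHTAG_URL = "https://twitter.com/hashtag/{}"

# A handle or hashtag must not be preceded by a word character, so that
# e-mail addresses such as name@example.org are left untouched.
TOKEN = re.compile(r"(?<!\w)([@#])(\w+)")


def linkify(tweet: str) -> str:
    """Return the tweet as HTML, with handles and hashtags turned into links."""
    parts = []
    last = 0
    for match in TOKEN.finditer(tweet):
        sigil, name = match.group(1), match.group(2)
        template = PROFILE_URL if sigil == "@" else HASHTAG_URL
        # Escape the plain text between matches so stray markup shows literally.
        parts.append(html.escape(tweet[last:match.start()]))
        parts.append('<a href="{}">{}{}</a>'.format(
            template.format(quote(name)), sigil, name))
        last = match.end()
    parts.append(html.escape(tweet[last:]))
    return "".join(parts)


print(linkify("Thanks @edmj for the #thatcamp notes & more"))
```

Running `linkify` over every line of a saved stream would give HTML ready to paste into a post. The pattern is deliberately simple: it treats any run of word characters after `@` or `#` as a name, so unusual cases (for instance text that already contains HTML entities) would need extra care.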