Lightning Talk: Humanities Projects and Google Summer of Code
Classical Language Toolkit
Since 2005, Google has offered summer stipends designed to introduce students to open-source software development through a program called Google Summer of Code (GSoC). While several academically oriented projects participated in the most recent iteration of GSoC, these tended largely to represent STEM fields. In contrast, I have served in recent years as a mentor (and former student) for an open-source humanities project and would like to raise awareness of the program in the Digital Humanities community. This lightning talk sets out to accomplish the following: 1. introduce GSoC to a Digital Humanities audience; 2. give a brief overview of my experience participating both as a student and as a mentor in GSoC; and 3. encourage open-source Digital Humanities projects to consider applying for GSoC 2020.
On the Feedback Loop Between Digital-Pedagogy Research and Digital-Humanities Researchers in DH Tool Building Practices
Indiana University, United States of America
The line between digital-humanities research and digital-humanities pedagogy often seems impermeable. From edited collections to conference submissions, research and pedagogy are structurally separate (Eichmann-Kalwara, Jorgensen, and Weingart, 2017), and the tools we use to enact digital analysis are bifurcated along similar lines. This presentation tracks the design and functionality choices that shaped a network-analysis tool, Net.Create, over several cycles of use by both digital-history and digital-pedagogy research teams. The tool was initially developed by the first presenter to support a team of five researchers engaged in simultaneous synchronous entry of network-analysis capta (Drucker, 2009) from open-prose text for a digital-history project. Its next iteration supported video and screen-capture data of undergraduate students as a research team explored how network analysis can support history reading comprehension in large active-learning undergraduate classrooms. Two more feedback cycles between these environments resulted in further changes to the tool. The presentation will detail some of the design changes that brought the tool into line with 8 of the 11 features of an ideal network-analysis tool that Scott Weingart recently proposed at GHI (Weingart, 2018). We found that research-driven feature choices requested by the digital-history teams fostered more robust student learning. More surprisingly, analysis of the pedagogy-research videos identified several network-theoretical and digital-history-methods supports that directly improved the capta process and coding-schema clarity for the digital-history research teams. Simultaneous entry and live visualization in particular were fostered by the tool's movement between research and teaching environments.
These indirect interactions between digital-history researchers and undergraduate history learners make a case for better integration of disciplinary-expert research and rigorous in-classroom pedagogy research in tool-building practices for digital humanists.
Quantifying the Degree of Planned Obsolescence in Online Digital Humanities Projects
1Electronic Textual Cultures Lab, University of Victoria, Canada; 2King’s College London; 3Center for the Study of Digital Libraries, Texas A&M University
Many of the online projects in the digital humanities have an implied planned obsolescence: they degrade over time once their content and tools cease to receive updates. We presented papers at Digital Humanities 2017 and 2018 that explored the abandonment and average lifespan of online projects in the digital humanities and contrasted how things changed over the course of a year. However, we believe that managing and characterizing the degradation of online digital humanities projects is a complex problem that demands further analysis.
In this proposal, we dive deeper into the distinctive signs of abandonment in order to quantify the planned obsolescence of online digital humanities projects. Our workflow covers each project included in the Book of Abstracts published after every Digital Humanities conference from 2006 to 2018. We periodically create a set of WARC files for each project, which are processed and analyzed using Python (Rossum, van, 1995) and Apache Spark (Apache Software Foundation, 2017) to extract the analytics used in our statistical analysis. More specifically, our analysis incorporates the retrieved HTTP response codes, the number of redirects, DNS metadata, and a detailed examination of the contents and links returned by traversing the base node. This combination of metrics and techniques allows us to assess the degree of change of a project over time.
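As an illustration only, the kind of scoring this workflow enables might be sketched as follows. The field names, weights, and thresholds below are hypothetical assumptions for the sake of the example, not the authors' actual model; they simply combine the signals named above (HTTP status, redirect chains, DNS resolution, dead links) into a single degradation measure.

```python
# Hypothetical sketch: combining crawl metrics of the kind described above
# into a single degradation score. All field names and weights are
# illustrative assumptions, not the study's actual model.

def degradation_score(snapshot):
    """Return a score in [0.0, 1.0]; higher means stronger signs of abandonment."""
    score = 0.0
    status = snapshot["http_status"]
    if status >= 500:
        score += 0.5          # server errors: strong abandonment signal
    elif status in (404, 410):
        score += 0.6          # missing or "gone" content
    elif 300 <= status < 400 and snapshot["redirects"] > 2:
        score += 0.2          # long redirect chains often indicate domain churn
    if not snapshot["dns_resolves"]:
        score += 0.4          # dead DNS: the project has likely vanished
    if snapshot["dead_links"] and snapshot["total_links"]:
        # proportion of broken outbound links found when traversing the base node
        score += 0.3 * snapshot["dead_links"] / snapshot["total_links"]
    return min(score, 1.0)

snapshot = {
    "http_status": 404,
    "redirects": 0,
    "dns_resolves": True,
    "dead_links": 12,
    "total_links": 40,
}
print(round(degradation_score(snapshot), 2))  # prints 0.69
```

In practice the per-project metrics would be extracted from the WARC files at scale (e.g. via Spark jobs) before any such scoring; the sketch only shows how heterogeneous signals can be folded into one comparable quantity.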
Finally, this study aims to answer three questions. First, can we identify the signals of abandoned projects using computational methods? Second, can the degree of abandonment be quantified? And third, which features are most relevant when identifying instances of abandonment? Ultimately, we intend this study to be a step toward better preservation strategies that account for the planned obsolescence of digital humanities projects.
Apache Software Foundation (2017). Apache Spark: Lightning-fast cluster computing. http://spark.apache.org (accessed 11 April 2017).
Rossum, G. van (1995). Python Tutorial, Technical Report CS-R9526. Amsterdam: Centrum voor Wiskunde en Informatica (CWI). https://ir.cwi.nl/pub/5007/05007D.pdf.
Paratexts from the Early English Book Trade: A Work-in-Progress Database
York College/CUNY, United States of America
By the seventeenth century, the print marketplace was a ubiquitous presence in the lives of both authors and readers, influencing the ways they understood and produced new works. Printers, publishers, and booksellers quickly understood how to properly frame books so that readers would not only know where to buy a book, but would learn to identify categories such as genre and authorship as measures of quality and good taste. These agents of print were responsible for translating this new technology into a recognizable format, and using it to establish relationships with new and returning readers through prefaces, dedications, and even in more structural elements like tables of contents and errata. While Leah Marcus notes that “the printer and the publisher play a striking part [in creating] a strong authorial presence” (193) for printed books, many paratexts play a much larger role in setting up a connection between the stationer as a textual gatekeeper and the reader as a potential customer.
We currently have access to a wide range of digital projects for the study of early modern literature—e.g. Early English Books Online (EEBO), Database of Early English Books (DEEP), and English Broadside Ballads Archive (EBBA), to name a few—as well as to a growing number of projects that aid in the study of the book trade—e.g. The London Book Trades Database, the British Book Trade Index, and the (not publicly available) Stationers’ Register Online. While larger repositories tend to make the study of paratexts difficult (either by inconsistently cataloguing information or omitting it altogether), historical databases typically limit themselves to the already-daunting task of tracing individuals’ biographies and social networks. Book history scholars still need tools that consider how social and labor networks played out not only historically and legally, but also textually through the authorship of dedications, epistles to the reader, and errata. This 15-minute presentation will introduce attendees to a new database for researching paratextual materials authored by early modern stationers. This database will demonstrate the value of considering paratexts from the early book trade as a unique genre, encourage researchers to find new research questions, and invite users to contribute to growing the existing dataset.
Following a discussion of the project’s rationale and an overview of the database in its current iteration, this presentation will conclude by reflecting on the challenges and lessons of working on digital humanities projects without much institutional support or collaborators. Why should early career researchers undertake digital projects? What are the realities of pursuing such projects in light of other institutional demands, and what kinds of realistic timelines and setbacks should be taken into consideration? The audience for this presentation should include researchers interested in book history, project management, and database development, particularly but not limited to those studying early modern England.
Jupyter Notebooks and Reproducible Research in DH
George Mason University, United States of America
With the use of computational methods and digital sources in the humanities, there is growing concern regarding the need for increased transparency and reproducibility of workflows and code as part of digital scholarly activity. From Alan Liu’s work on reproducible workflows as part of the WhatEvery1Says project to Matthew Burton’s analysis of the digital humanities landscape, scholars are increasingly drawing attention to the need for systematic transparency about data handling and computational methods. Fortunately, the digital humanities are not alone in this endeavor. Researchers in the sciences have been developing processes and technologies around the production and dissemination of code as part of the scientific publishing process. As a result, there are multiple tools and communities of practice already in development which humanities scholars can adapt to document, execute, and publish their computational analysis.
In this presentation, I approach the challenge of reproducible research from the perspective of an individual scholar. Drawing on examples from my dissertation, I highlight strategies for and advantages of integrating narrative, code, and visualizations within interactive documents such as Jupyter notebooks. I will discuss ways scholars can adapt the existing tools and processes around reproducible research to the context of individual digital humanities projects, with attention to the different platforms available and current research on reproducible research in scholarly communication and the sciences. By showing how code can be integrated with visualizations and written analysis using existing tools and software, this presentation argues for the importance of documented code and methods as part of the scholarly output of computational analysis in the digital humanities and the necessity of expanding the paradigm of humanities publishing to support such work.
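To make the interleaving of narrative and code concrete, the sketch below builds a minimal Jupyter notebook file programmatically using only the standard library. The cell contents are hypothetical placeholders; the point is simply that a notebook is a plain JSON document in which markdown prose and executable code sit side by side, which is what makes it a natural vehicle for documented, shareable analysis.

```python
import json

# A minimal, hand-built Jupyter notebook (nbformat 4) pairing narrative
# markdown with executable code. Cell contents are illustrative only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "## Word frequencies\nWe count tokens before analyzing them below.",
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": "from collections import Counter\ncounts = Counter(corpus.split())",
        },
    ],
}

# Serializing the dict to disk yields an .ipynb file Jupyter can open,
# execute, and re-export; outputs (tables, plots) are stored in the same file.
with open("analysis.ipynb", "w", encoding="utf-8") as fh:
    json.dump(notebook, fh, indent=1)
```

Because the entire document (prose, code, and, once executed, outputs) lives in one version-controllable file, a reader can re-run the analysis end to end, which is the core of the reproducibility argument made above.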