#SI4: Digital Textuality Paper Session 3
Friday, 26/Jul/2019:
9:00am - 10:30am

Session Chair: Emily Sherwood
Location: Marquis B, Marriott City Center
capacity 42

Frankenstein Variorum: Finding Insights in Comparisons

Emma Ruth Slayton (1), Elisa Beshero-Bondar (2), Jack Quirk (1), Scott Weingart (1), Avery Wiscomb (1)

(1) Carnegie Mellon University, United States of America; (2) University of Pittsburgh, United States of America

Frankenstein is arguably one of the most influential works of science fiction, but the novel exists in multiple versions, with considerable controversy over which is the "best" text. Mary Shelley re-wrote Frankenstein several times, leading to major changes that are difficult to track from the first manuscript through three print editions, and one set of handwritten edits. To track the changes across these five versions, the Frankenstein Variorum project publishes all the editions in the standard language of the Text Encoding Initiative (TEI) and displays variant readings between editions, highlighting “hotspots” of variation. By enabling evaluation of these changes, the Variorum helps researchers come to new understandings of the text and Mary Shelley’s intentions during her writing and editing process. While past presentations of the Variorum at ADHO and Balisage have discussed our team's efforts in building the TEI of the project, this presentation concentrates on the design of an accessible visual interface, contextual framing of annotations, and geographic orientation applied to the Variorum edition.
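
As an illustration only, and not the project's actual collation pipeline, the idea of locating variant readings between two versions can be sketched with Python's standard difflib module. The function name variant_spans and the two sample sentences are invented for this sketch; they are not quotations from any edition of the novel.

```python
from difflib import SequenceMatcher


def variant_spans(base, other):
    """Return (base_reading, other_reading) pairs where two versions differ.

    Comparison is word by word; an empty string marks an insertion or deletion.
    """
    a, b = base.split(), other.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented sample sentences standing in for two versions of one passage.
version_a = "the creature walked slowly toward the village"
version_b = "the creature crept slowly toward the distant village"

variants = variant_spans(version_a, version_b)
for reading_a, reading_b in variants:
    print(repr(reading_a), "->", repr(reading_b))
```

Counting how many such pairs fall within each passage would give a rough measure of where variation clusters, which is one simple way to think about a "hotspot".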

Since the Variorum tracks changes in Frankenstein over time, it aims to provide a distinctive experience for its contextual annotations, including a GIS component that not only identifies locations mentioned in the novel but also follows Mary Shelley's travels in the years that she wrote and revised the text. The annotations, too, yield contextual insight into the most significant "hotspots" of variation, insight as yet undeveloped in previous print and digital editions of Frankenstein. The integration of an ESRI Story Map, which maps both the novel and the author’s travels, will enable researchers interested in the physical location of events in Frankenstein, as well as in cataloging changes in the mention of places between different versions of the novel, to explore these and other issues more fully. By comparing animated spatial journeys against the text, we add a new layer of context for those interested in exploring the complexities of the writing process in relation to space and place. A key technological challenge for our team is connecting the digital contextual work, made in the ESRI Story Map, to the TEI of the Variorum edition. We will also discuss how an online interactive map can showcase the geographic context of the work, and we invite participation and response from the audience in interacting with the Story Map.

The Impact of Literature on Early AI Research

Avery Jacob Wiscomb, Daniel Evans

Carnegie Mellon University, United States of America

At a time when tech visionaries and engineers are calling for serious moral reflection on the future of machine learning and artificial intelligence (AI), we trace one thread in the history of thought about the sciences of the artificial. This text analytics project compares the complete papers of Herbert A. Simon (about 1.1 GB of plain text) to 408 books taken from his personal library. Our work in progress attempts to identify some of the literary or philosophical sources present in Simon’s early discussions of AI and “the artificial brain” in the 1950s and 60s. Despite his later reputation as a computer scientist, Simon was trained as a student of political science at the University of Chicago, where he took courses in philosophy, biology, and economics. His research was acclaimed for its interdisciplinary nature and spanned these fields. We were therefore curious whether ideas from the books Simon read also appear in his writings about AI and related technologies; traditional literature seems to color, in distinctive ways, Simon’s views of contemporaneous affairs and his vision for the future of the machines we have in our pockets and homes today. Throughout his life, Simon also maintained that the computer, the organization, and even the individual human mind were all “species of the genus information processor,” which some have labeled a fundamentally anti-humanist position, at odds with a scholar who read Proust, Shakespeare, and Aristotle. To map the interplay of ideas in Simon's work, we use Word2Vec and other software to compare word embeddings in his writings against some of the books that were in his library and later donated to Carnegie Mellon University, where he taught for more than 50 years. We also report on our early exploration of Simon's corpus using the text-mining application Sifaka, which is built on top of the open-source search engine Lucene.
By tracing Simon's influences, we seek to inform essential problems concerning the future of machine learning and AI by turning to its literary and philosophical past. We argue that similar analyses could help situate and ground historical sources for the digital in the history of humanistic inquiry from which AI springs.
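
To make the idea of comparing word usage across two corpora concrete, here is a minimal sketch. It is not the project's method: it deliberately swaps Word2Vec's trained embeddings for plain co-occurrence count vectors so that it runs with the standard library alone, and it compares them with cosine similarity. The function names, the window size, and the two toy sentences are all assumptions invented for illustration.

```python
import math
from collections import Counter


def cooccurrence_vector(tokens, target, window=2):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Invented toy "corpora"; real input would be tokenized papers and library books.
papers = "the machine can think and the machine can learn".split()
library = "the mind can think and the mind can reason".split()

similarity = cosine(
    cooccurrence_vector(papers, "machine"),
    cooccurrence_vector(library, "mind"),
)
print(round(similarity, 3))
```

A high score suggests that the two words keep similar company in the two corpora. Trained embeddings capture the same intuition in a denser form that generalizes better.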

Institutional Challenge: Text Encoding Rare 19th Century Job Printer Volume

Lisa Hermsen, Rebekah Walker

Rochester Institute of Technology, United States of America

This paper will describe the use of TEI to support text analysis of a rare category of print: the products of industry or business printing, the jobbing printers that produced them, and the clients who used them. We have identified five objects from a job printer’s firm, dated 1885-1920: a prices volume; an address book; a library catalogue; and a Volume no. 3 (a work manual, and the focus of this project). The rarity of this collection and the messiness of its creation require text encoding in order to analyze accurately the contributions of this printing house and its place within a larger network of business practices of the time. TEI will be used to manage transcription of the collection and to provide the most promising access to the original source material.

Volume no. 3 is a work manual overflowing with print job details regarding vellum binding, cat gut, durable paper, and much more. It also includes passages regarding the ethics of good work, assignment of duties, and the firm’s observation of Sundays and Holidays. The volume is particularly interesting in that it highlights the firm’s reputation for manufacturing accounting books in what the printer described as “the Bankers Way.” This volume describes special binding materials and methods necessary to throw the accounting book flat and recommends ruling for the pages suited for double-entry bookkeeping. While double-entry accounting had existed for centuries, this practice is thought to have transformed new industry, wage labor, and capital investment in England in the nineteenth century. This collection, therefore, includes valuable information about not only book and print history, but also how printing influenced and affected the history of finance and accounting.

The organization of Volume no. 3 presents a difficulty for transcription. It is loosely organized by an alphabetical index of topics and accompanying page numbers. Over time, topics were inserted, pages perhaps glued in place, and page numbers crossed out. Many topics appear with multiple page numbers that may or may not be accurate. Within the volume, there are sketches, receipts, and price lists. Often the script is interrupted by insertions that are marked with a later date and initialed.

We will use text encoding to cull information from the collection, primarily working with undergraduate classes and capstone students starting in Spring 2019. We will use TEI not only to make the text machine-readable, but also to provide a scheme for transcribing all five volumes in the collection. Our ultimate goal is to publish an indexed, searchable website containing full-text transcription and analysis, giving easier and quicker access to more people: the current scholars working on the project; students working with the materials; and future, specialized researchers.

This paper session, presented by the primary scholar working with the collection and the digital librarian helping shape and foster classroom engagement, will present work in progress for what is a seminal digital humanities project at our institution: initial analysis of encoded texts and student work inspired by the encoded text.

The Conglomerate Era

Dan Sinykin

University of Notre Dame, United States of America

I analyze how the conglomeration of US publishing changed literature. In this talk, I report on my findings based on a new corpus of book reviews and on computational modeling performed over my Random House and nonprofit press corpora in my data capsule in HathiTrust.

With a collaborator, I developed a corpus that indicates where, out of more than five hundred possible publications, each of more than a million titles was reviewed between 1950 and 2000. We produced from this a corpus of the 1% most-reviewed US novels with metadata on the race and gender of the author and the publisher. Through social network analysis, I show how publishers compare in terms of the immediate reception of their novels in the period, with particular attention to disproportionate gender distributions.
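
One simple form such a network analysis might take is sketched below; it is an assumption about the general shape of the computation, not a description of the study. Review records link outlets to publishers through the titles reviewed, and projecting that two-mode network onto publishers gives edges weighted by the number of outlets two publishers share. All records, names, and function names here are invented placeholders.

```python
from collections import defaultdict

# Invented (outlet, title, publisher) review records; not drawn from the corpus.
reviews = [
    ("Outlet A", "Novel 1", "Press X"),
    ("Outlet B", "Novel 1", "Press X"),
    ("Outlet A", "Novel 2", "Press Y"),
    ("Outlet C", "Novel 3", "Press Y"),
]


def outlets_by_publisher(records):
    """Map each publisher to the set of outlets that reviewed any of its titles."""
    result = defaultdict(set)
    for outlet, _title, publisher in records:
        result[publisher].add(outlet)
    return dict(result)


def shared_outlets(records):
    """Project the outlet-publisher network onto publishers.

    Each edge weight is the number of review outlets two publishers have in common.
    """
    by_pub = outlets_by_publisher(records)
    pubs = sorted(by_pub)
    return {
        (p, q): len(by_pub[p] & by_pub[q])
        for i, p in enumerate(pubs)
        for q in pubs[i + 1:]
    }


edges = shared_outlets(reviews)
print(edges)
```

Adding author metadata to each record would allow the same projection to be computed separately by gender or race, which is one way to surface disproportionate distributions.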

In my computational modeling, I work with another collaborator to adapt the model he used in an essay in Cultural Analytics to model genre in post-WWII US literature; we apply it to discover latent patterns in Random House's list from 1950 to 2000. The model includes topic modeling, stylistic features, and extra-textual features including race and gender.

Additionally, I use the machine learning technique of text classification to analyze how conglomerate and nonprofit novels differ in literary form. This is work in progress on which I will report at the conference.
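
For readers unfamiliar with text classification, the following is a minimal sketch of one common approach, a multinomial Naive Bayes classifier with add-one smoothing. The abstract does not say which classifier is used, so this choice is an assumption. The training snippets are invented placeholders with no relation to the actual corpora or findings.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-label word counts and document counts from (label, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented placeholder snippets; real input would be the full text of novels.
training = [
    ("conglomerate", "fast plot thriller sequel"),
    ("conglomerate", "thriller plot twist"),
    ("nonprofit", "lyric fragment memory essay"),
    ("nonprofit", "memory fragment translation"),
]

model = train_naive_bayes(training)
prediction = classify("a thriller with a twist", model)
print(prediction)
```

In practice the accuracy of such a classifier on held-out novels, together with the features that drive its decisions, is what indicates whether the two groups differ formally.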

I bring together these two lines of analysis (social network analysis of review reception and computational modeling of full text) to illuminate how women and writers of color at Random House and nonprofit presses responded formally through autofiction to interpersonal misogyny, structural patriarchy, and racism in publishing.

