# SC5: Analysis of Visual Corpora with Deep Learning (Panel)
1Cornell University; 2University of Richmond; 3Yale University
Neural networks have revolutionized computer vision, and are beginning to be applied in humanities contexts. There are significant practical difficulties in working with these methods, but also exciting opportunities. Can we repurpose tools developed in other contexts to answer humanities questions in creative new ways? What can we do with these new technologies?
We can now access massive new digitized image collections, but it is difficult to analyze them. Unlike text, which can be broken into independently meaningful words, pixels are only meaningful in their original context. Deep learning models, specifically convolutional neural networks, have started to bridge this gap. Neural networks are not a panacea, however, and come with computational and interpretive challenges.
This panel presents three case studies, in which scholars use neural networks to investigate large corpora of visual materials. In addition to showing how these methods are being used to address humanities questions, we also discuss the computational and explanatory challenges of working with neural networks. These case studies are:
1. Formal elements of moving images. This paper shows how face detection and recognition algorithms, applied to frames extracted from a corpus of moving images, can capture many formal elements present in the media. Locating and identifying faces makes it possible to algorithmically extract time-coded labels that correspond directly to concepts and taxonomies established within film theory. Knowing the size of detected faces, for example, provides a direct link to the concept of shot framing. The blocking of a scene can similarly be deduced from the relative positions of identified characters within a series of shots.
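The link between face size and shot framing can be sketched as a simple rule: the larger a detected face is relative to the frame, the tighter the framing. The thresholds and labels below are illustrative assumptions, not values from the paper, and a real pipeline would first run a face detector to obtain the bounding box.

```python
# Hypothetical sketch: mapping a detected face's size to shot framing.
# The ratio thresholds are illustrative assumptions, not the paper's values.

def shot_framing(face_height_px: int, frame_height_px: int) -> str:
    """Classify shot framing from the relative height of a detected face."""
    ratio = face_height_px / frame_height_px
    if ratio > 0.4:
        return "close-up"      # face dominates the frame
    if ratio > 0.15:
        return "medium shot"   # face prominent, upper body visible
    return "long shot"         # face small relative to the frame

# Example: a 250-pixel-tall face in a 720-pixel-tall frame
print(shot_framing(250, 720))  # → medium shot
```

Applied to every extracted frame, a rule of this kind yields the time-coded framing labels described above.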
2. Machine-reading the avant-garde. We used neural networks to transform page images from modernist journals into numerical representations ("computational cut-ups"). We then used those cut-ups as input for classifiers to answer two separate questions: which pages contain music, and which pages are from a Dadaist journal? The successes and failures of these computational cut-ups illustrate the workings of the neural network, and allow us to question the boundaries between established categories.
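The classification step on top of the cut-ups can be sketched with a minimal nearest-centroid classifier. Everything here is a stand-in: the random vectors imitate CNN activations for page images, and the paper's actual features, classes, and classifier may differ.

```python
import numpy as np

# Illustrative sketch: classifying page-image feature vectors
# ("computational cut-ups") with a nearest-centroid classifier.
# The feature vectors are random stand-ins for CNN activations.

rng = np.random.default_rng(0)

# Pretend training data: 20 "music" pages and 20 "other" pages,
# each summarized as a 512-dimensional feature vector.
music = rng.normal(loc=1.0, size=(20, 512))
other = rng.normal(loc=-1.0, size=(20, 512))

centroids = {"music": music.mean(axis=0), "other": other.mean(axis=0)}

def classify(vec: np.ndarray) -> str:
    """Assign a page to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: np.linalg.norm(vec - centroids[c]))

print(classify(np.full(512, 1.0)))  # → music
```

The interest of the paper lies less in the classifier itself than in inspecting which pages land on the "wrong" side of such a boundary.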
3. Visual clustering for collection-scale analysis. Scholarship often proceeds by finding surprising connections between apparently different materials, but searching for connections between images has been difficult. In this paper we present a method that uses pre-trained convolutional neural networks and new methods of dimensionality reduction to create a navigable space of visual similarity for large image collections. Advanced WebGL programming allows the user to explore and interact with these semantic clusters among hundreds of thousands of images in a web browser.
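The projection step behind such a similarity space can be sketched with linear PCA via SVD; this is a simplified stand-in, since the paper relies on newer dimensionality-reduction methods, and the random matrix below merely imitates CNN embeddings for a large image collection.

```python
import numpy as np

# Minimal sketch of the layout step: reduce high-dimensional CNN
# features to 2-D coordinates for a navigable similarity space.
# PCA via SVD stands in for the paper's newer reduction methods.

def project_2d(features: np.ndarray) -> np.ndarray:
    """Project an (n_images, n_dims) feature matrix to (n_images, 2)."""
    centered = features - features.mean(axis=0)
    # The top two right-singular vectors give the best 2-D linear embedding.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(1)
feats = rng.normal(size=(1000, 128))  # stand-in for 1,000 image embeddings
coords = project_2d(feats)
print(coords.shape)  # → (1000, 2)
```

The resulting 2-D coordinates are what a WebGL front end would render, letting a browser pan and zoom across clusters of visually similar images.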
The panel is for anyone interested in computational approaches to visual and material culture. While this panel is not a hands-on tutorial, it will be accessible to those unfamiliar with image processing. We will highlight open-source code and tutorials for applying all three of these approaches to new corpora.