Literalism: Reading Machines Reading
Literalism traces the advent of today's large, statistical models of language back to a pivotal moment in the history of computing. Models ranging from topic models to deep neural networks have their roots in the 1950s, when researchers began to think of computers not merely as number crunchers but, more broadly, as generalized symbolic processors with access to meaning. A key aspect of this paradigm shift is the way it was supported by an entire scriptural economy, the emergence of which I chart through a combination of media archaeology, close textual analysis, and NLP methods. In doing so, Literalism shows how various textual formalisms have been integral to language modeling from the start. From blocky, machine-readable typefaces to n-grams, document-term matrices, and more, researchers have used these formalisms to remediate natural language into computationally tractable forms. If language modeling has today become all but pervasive in its disciplinary reach, this expansion has been accompanied by an equally broad proliferation of these strange, intermediary representations of text.
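By way of illustration only (not drawn from the book), the following is a minimal Python sketch of the kind of remediation at issue: a hypothetical two-document toy corpus rendered first as bigrams and then as a document-term matrix, the rows of which retain term frequencies but discard word order.

```python
from collections import Counter

# Hypothetical toy corpus standing in for "natural language."
docs = [
    "the machine reads the text",
    "the text reads the machine",
]

def ngrams(tokens, n):
    """Slide a window of size n over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Remediation 1: bigrams -- running prose reduced to adjacent word pairs.
bigram_counts = [Counter(ngrams(doc.split(), 2)) for doc in docs]

# Remediation 2: a document-term matrix -- each document becomes a row of
# term frequencies over a shared, sorted vocabulary.
vocabulary = sorted({term for doc in docs for term in doc.split()})
dtm = [[Counter(doc.split())[term] for term in vocabulary] for doc in docs]

print(vocabulary)  # ['machine', 'reads', 'text', 'the']
print(dtm)         # [[1, 1, 1, 2], [1, 1, 1, 2]]
```

Note that the two documents, which say rather different things, collapse into identical rows once word order is set aside: a small instance of how these intermediary representations both enable computation and constrain what a model can register of meaning.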
Such representations, I argue, constitute a unique kind of textuality. They therefore mark a significant site of semiotic activity, one open to analysis from the vantage of the material text tradition, avant-garde poetics, and media theory. In this sense, NLP requires its own media history. My intent with Literalism, then, is to write just such a history, demonstrating how computational approaches to language modeling must be extended and enriched by these other disciplinary frameworks, particularly when it comes to assessing the claims on meaning that such models can and cannot support.