Representing Knowledge in the Digital Humanities (Saturday, September 24, 2011)
Shaw, Ryan. Assistant Professor, School of Information and Library Science, University of North Carolina at Chapel Hill
Title: Breaking the Historian’s Code: Finding Patterns of Historical Representation
Abstract: Historical narrative is a rich and complex form of knowledge representation. In The Savage Mind Lévi-Strauss described what he called “the historian’s code” (p. 259): the recursive conceptual structure that enables historians to represent the past as broadly or as narrowly as they wish. This structure fades into the background when we fall under the spell of a good historical narrative, and we feel that we are experiencing the past “as it happened” rather than a representation of it. This can blind us to the possibility of other representations of the past. The traditional remedy for this blindness has been to study more history: reading multiple overlapping narratives is what enables us to locate the specific point of view in each one (Ankersmit, 1983, p. 219). Comparing narratives that select different sets of events at different levels of specificity makes the historian’s code visible. New techniques for “distant reading” of digitized texts promise new ways of seeing the differences in perspective that distinguish historical narratives. I am currently exploring the use of natural language processing (NLP) techniques to identify events in historical narratives and group them into narrative chains at different levels of specificity. The goal is to help readers understand historical discourse by deriving alternative representations that can be more easily manipulated, visualized and compared than the original narratives. In this initial stage the project is focused on two sets of documents related to the civil rights movement: 300 interview transcripts from the Southern Oral History Program [1] and the full text of 87 books on the civil rights movement published by the UNC Press [2]. The specific NLP techniques being employed are named entity recognition, event extraction, and event chain mining.
Named entity recognition involves identifying named entities (people, organizations, events) in texts and linking them to authoritative identifiers in databases containing additional facts about those entities. Event extraction involves identifying sentences that communicate some event, e.g. a strike, a protest, or a legislative act. Specifically, event extraction involves training a classifier to match sentences to a semantic frame, a conceptual structure that describes a particular type of event along with its participants and setting. To identify passages narrating more complex events we must extract not just individual events, but chains of events from texts. The procedure of identifying commonly occurring event chains from a global set of extracted event frames is known as event chain mining. The identified event chains can then be used as schemas or story templates for exploring the corpus, or event chains drawn from different parts of the corpus can be compared and contrasted (e.g. those drawn from oral histories versus those drawn from scholarly monographs).
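To make the last step concrete, here is a minimal sketch in Python of one simple way event chain mining could be approximated. It is not the project's actual method. It assumes that event extraction has already produced, for each document, an ordered list of event frames, and that named entity recognition has attached entity identifiers to each frame. The `EventFrame` class, the `mine_event_pairs` function, the frame labels, and the entity identifiers are all hypothetical names introduced for illustration, and the toy input is invented rather than drawn from the corpus. The sketch counts ordered pairs of frames that share a participant and keeps the pairs that recur across documents.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class EventFrame:
    """One extracted event: a frame type plus the entities taking part."""
    frame: str                 # hypothetical frame label, e.g. "Protest"
    participants: frozenset    # hypothetical entity identifiers


def mine_event_pairs(documents, min_support=2):
    """Count ordered frame pairs that share at least one participant.

    `documents` is a list of documents, each a list of EventFrame objects
    in narrative order. A pair is counted at most once per document, so
    each count is the number of documents that support the pair.
    """
    support = Counter()
    for events in documents:
        seen = set()
        # combinations() preserves input order, so `earlier` precedes `later`.
        for earlier, later in combinations(events, 2):
            if earlier.participants & later.participants:
                seen.add((earlier.frame, later.frame))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy input, invented for illustration only (not drawn from the corpus).
docs = [
    [EventFrame("Arrest", frozenset({"person:A"})),
     EventFrame("Protest", frozenset({"person:A", "org:X"})),
     EventFrame("Legislation", frozenset({"org:X"}))],
    [EventFrame("Arrest", frozenset({"person:B"})),
     EventFrame("Protest", frozenset({"person:B"}))],
    [EventFrame("Protest", frozenset({"org:Y"})),
     EventFrame("Legislation", frozenset({"org:Y"}))],
]

frequent_pairs = mine_event_pairs(docs)
```

Pairs that recur in this way could be linked end to end into longer chains, and running the same count separately over the oral histories and the monographs would give two sets of pairs to compare. A fuller treatment would likely replace raw counts with an association score and would have to handle the noise in automatically extracted frames.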
Lévi-Strauss, Claude. The Savage Mind. Chicago: University of Chicago Press, 1966.
Ankersmit, Frank R. Narrative Logic: A Semantic Analysis of the Historian’s Language. The Hague: M. Nijhoff, 1983.
[1] www.sohp.org/
[2] lcrm.lib.unc.edu/voice/