Category Archive: Abstracts: Papers

Jul 21

Making the most of free, unrestricted texts–a first look at the promise of the Text Creation Partnership

Representing Knowledge in the Digital Humanities (Saturday, September 24, 2011)

Welzenbach, Rebecca. Text Creation Partnership Project Outreach Librarian, MPublishing, University of Michigan Library

Title: Making the most of free, unrestricted texts–a first look at the promise of the Text Creation Partnership

Abstract: In April 2011, the Text Creation Partnership announced that 2,231 transcribed and SGML/XML-encoded texts from the Eighteenth Century Collections Online (ECCO) corpus were freely available to the public, with no restrictions on their use or distribution. This is the first set of TCP texts to have all restrictions lifted. We have already seen significant interest in studying, manipulating, and publishing these texts, which has given us a glimpse of what might happen in a few years, when the much larger EEBO-TCP archive also becomes available to the public. The release was met with enthusiasm by power users eager to work directly with the XML files, but with frustration by those who expected a full-service platform for interacting with the texts. This presentation will discuss the mixed reactions to the release of the ECCO-TCP texts, offer examples of how people are starting to work with them, and highlight some of the questions, challenges, and opportunities that have arisen for the TCP as a result.

Jul 21

From Uncertainty to Virtual Reality: Knowledge Representation in Rome Reborn

Stinson, Philip. Assistant Professor, Department of Classics, University of Kansas

Title: From Uncertainty to Virtual Reality: Knowledge Representation in Rome Reborn

Abstract: Graphic representations of ancient Rome have become more visually powerful in the late twentieth and early twenty-first centuries with the innovations afforded by digital technologies, but the use value of these images is under debate today. This paper explores the interplay among different types of knowledge representation, an under-theorized area of research in the digital humanities, in the acclaimed Rome Reborn project, now also known as Ancient Rome 3D in Google Earth. Rome Reborn is perhaps the largest and most complex visualization endeavor in the digital humanities to date. The author of this paper belonged to the original project team (UCLA 1999-2001) and is on the Scientific Committee of the current iteration (UVA). Rome Reborn incorporates distinct classes of knowledge (historical sources, archaeological remains, and deductive logic or inference) as a basis for reconstructing the appearance of ancient Rome’s monuments (mainly temples, public buildings and residential structures), urban infrastructure (streets, aqueducts), and topography (hills of Rome, Tiber River). All forms of knowledge used in the making of Rome Reborn are represented through the medium of an interactive virtual reality model consisting of millions of polygonal surfaces with applied colors, textures, and simulated light and shadow effects. This paper will perform an autopsy on Rome Reborn and expose its interwoven visual representations of historical, archaeological, and conjectural knowledge. The relationship of secure knowledge representations, which are sparse in the model, to the more prevalent conjectural or speculative knowledge representations will be clarified, with the aim of identifying Rome Reborn’s underlying epistemological structure.
Analysis of Rome Reborn in this manner holds the potential to advance the methodological discourse in the digital humanities for the visual representation of knowledge when multiple forms of knowledge require systemization and when levels of uncertainty are high.

Jul 21

The hermeneutics of data representation

Plenary Session
Sperberg-McQueen, Michael. Black Mesa Technologies

Title: The hermeneutics of data representation

When we consult a file on disk, or receive a data stream on a network port, we see a sequence of bits. What does it mean? And can we tell the difference between a meaningful sequence of bits and garbage? Any work involving the machine-readable representation of knowledge must consider both how to validate the representation mechanically (to detect and possibly recover from transmission or storage errors) and how to verify the information semantically and reason about it systematically. The talk will survey some possible approaches to each of these problems and point to current technologies that seem promising in addressing them. At another level, however, data representation has another kind of meaning. Like any cultural artifact, a data representation tells a story about the culture that made it. What do our choices of data representation say about our culture? And what does XML have to do with Kant’s definition of enlightenment?
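The mechanical half of the problem can be illustrated with Python’s standard library. This is an illustrative sketch, not a method from the talk: it assumes the sender publishes a SHA-256 digest alongside the file, and it uses XML well-formedness as the syntactic check. Neither test says anything about whether the content is semantically true, which is the second, harder problem the abstract raises.

```python
import hashlib
import xml.etree.ElementTree as ET


def checksum_ok(data: bytes, expected_hex: str) -> bool:
    """Detect storage or transmission corruption by comparing SHA-256 digests."""
    return hashlib.sha256(data).hexdigest() == expected_hex


def well_formed(data: bytes) -> bool:
    """Syntactic check only: does the byte stream parse as XML at all?"""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


original = b"<text><p>Hello</p></text>"
published_digest = hashlib.sha256(original).hexdigest()  # assumed to travel with the file
damaged = b"<text><p>Hello</q></text>"                    # one corrupted byte

print(checksum_ok(original, published_digest), well_formed(original))  # True True
print(checksum_ok(damaged, published_digest), well_formed(damaged))    # False False
```

A checksum only detects damage; recovering from it would need redundancy (for example a second copy or an error-correcting code), which this sketch does not attempt.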

Jul 21

Breaking the Historian’s Code: Finding Patterns of Historical Representation

Shaw, Ryan. Assistant Professor, School of Information and Library Science, University of North Carolina at Chapel Hill

Title: Breaking the Historian’s Code: Finding Patterns of Historical Representation

Abstract: Historical narrative is a rich and complex form of knowledge representation. In The Savage Mind, Lévi-Strauss described what he called “the historian’s code” (p. 259): the recursive conceptual structure that enables historians to represent the past as broadly or as narrowly as they wish. This structure fades into the background when we fall under the spell of a good historical narrative, and we feel that we are experiencing the past “as it happened” rather than a representation of it. This can blind us to the possibility of other representations of the past. The traditional remedy for this blindness has been to study more history: reading multiple overlapping narratives is what enables us to locate the specific point of view in each one (Ankersmit, 1983, p. 219). By comparing narratives that select different sets of events at different levels of specificity, we can make the historian’s code visible. New techniques for “distant reading” of digitized texts promise new ways of seeing the contours of difference in perspective that distinguish historical narratives. I am currently exploring the use of natural language processing (NLP) techniques to identify events in historical narratives and group them into narrative chains at different levels of specificity. The goal is to help readers understand historical discourse by deriving alternative representations that can be more easily manipulated, visualized and compared than the original narratives. In this initial stage the project is focused on two sets of documents related to the civil rights movement: 300 interview transcripts from the Southern Oral History Program and the full text of 87 books on the civil rights movement published by the UNC Press. The specific NLP techniques being employed are named entity recognition, event extraction, and event chain mining.
Named entity recognition involves identifying named entities (people, organizations, events) in texts and linking them to authoritative identifiers in databases containing additional facts about those entities. Event extraction involves identifying sentences that communicate some event, e.g. a strike, a protest, or a legislative act. Specifically, event extraction involves training a classifier to match sentences to a semantic frame, a conceptual structure that describes a particular type of event along with its participants and setting. To identify passages narrating more complex events we must extract not just individual events, but chains of events from texts. The procedure of identifying commonly occurring event chains from a global set of extracted event frames is known as event chain mining. The identified event chains can then be used as schemas or story templates for exploring the corpus, or event chains drawn from different parts of the corpus can be compared and contrasted (e.g. those drawn from oral histories versus those drawn from scholarly monographs).
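A minimal sketch of the counting step in event chain mining, assuming event extraction has already reduced each document to an ordered list of frame labels. The frame names and document IDs are invented for illustration, and treating a “chain” as a contiguous run of n frames counted once per document is a deliberate simplification; the project’s actual mining procedure is not described in the abstract.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mine_event_chains(
    docs: Dict[str, List[str]], n: int = 2, min_support: int = 2
) -> List[Tuple[Tuple[str, ...], int]]:
    """Return chains of n consecutive event frames found in at least
    `min_support` documents, most frequent first."""
    support = Counter()
    for frames in docs.values():
        # Use a set so each chain counts at most once per document.
        chains = {tuple(frames[i:i + n]) for i in range(len(frames) - n + 1)}
        support.update(chains)
    frequent = [(chain, k) for chain, k in support.items() if k >= min_support]
    return sorted(frequent, key=lambda ck: (-ck[1], ck[0]))


# Hypothetical output of event extraction: one frame label per extracted event.
docs = {
    "interview_01": ["Arrest", "Protest", "Negotiation"],
    "interview_02": ["Protest", "Negotiation", "Legislation"],
    "monograph_07": ["Arrest", "Protest", "Legislation"],
}
print(mine_event_chains(docs))
# [(('Arrest', 'Protest'), 2), (('Protest', 'Negotiation'), 2)]
```

Running the same function separately over the oral-history subset and the monograph subset would give two chain inventories that can be compared, as the abstract proposes.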

Lévi-Strauss, Claude. The Savage Mind. Chicago: University of Chicago Press, 1966.

Ankersmit, Frank R. Narrative Logic: A Semantic Analysis of the Historian’s Language. The Hague: M. Nijhoff, 1983.

Jul 21

Prosopography and Computer Ontologies: towards a formal representation of the ‘factoid’ model by means of CIDOC-CRM

Pasin, Michele. Research Associate, King’s College London

Title: Prosopography and Computer Ontologies: towards a formal representation of the ‘factoid’ model by means of CIDOC-CRM

Abstract: Structured Prosopography provides a formal model for representing prosopography: a branch of historical research that has traditionally focused on identifying the people who appear in historical sources. Pre-digital print prosopographies, such as Martindale 1992, presented their materials as narrative articles about the individuals they contain. Since the 1990s, KCL’s Department of Digital Humanities (formerly the Centre for Computing in the Humanities) has been involved in developing structured prosopographical databases. It has had direct involvement in Prosopographies of the Byzantine World (PBE and PBW), Anglo-Saxon England (PASE), Medieval Scotland (PoMS) and now, more generally, northern Britain (“Breaking of Britain”: BoB), and is currently in discussions about others. DDH has developed a general “factoid-oriented” structural model that, although it downplays or eliminates narratives about people, has to a large extent served the needs of these various projects well.

DDH’s factoid-oriented prosopographical models are currently all expressed using the entity-attribute-relationship model of the relational database. The structure formally identifies the obvious items of interest, Persons and Sources, and extends to related things such as Offices or Places. In our prosopographical model the Factoid is a central idea: it represents the spot in a primary source where something is said about one or more persons, and it links people to the information about them via the spots in primary sources that assert that information. Because each factoid records what a source says about people, the factoid approach prioritises the sources and our historians’ reading of them. Our data about a person is therefore not so much a narrative summary written by the prosopographer (as in the articles about persons in pre-digital prosopographies) as a collection of information about what the sources say about him or her. It can thus represent the multiple, perhaps contradictory, voices of different sources simultaneously: one saying that she was a Saxon, another that she was from Northumbria.
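The factoid structure can be sketched as plain Python data classes. The class and field names are hypothetical and only approximate the relational tables described above; the point is that a query about a person returns what each source says, side by side, rather than a single merged biography.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Person:
    pid: str
    name: str


@dataclass(frozen=True)
class Source:
    sid: str
    title: str


@dataclass(frozen=True)
class Factoid:
    """A spot in a primary source where something is said about one or more persons."""
    source: Source
    location: str                 # where in the source the statement occurs
    persons: Tuple[Person, ...]   # the persons the statement is about
    assertion: str                # what the source says, not what the prosopographer concludes


def what_sources_say(person: Person, factoids: List[Factoid]) -> List[str]:
    """Collect every source statement about a person, keeping contradictions side by side."""
    return [
        f"{f.source.title}, {f.location}: {f.assertion}"
        for f in factoids
        if person in f.persons
    ]


# Hypothetical person and sources, mirroring the Saxon/Northumbria example above.
p1 = Person("P1", "Unnamed woman")
src_a = Source("S1", "Source A")
src_b = Source("S2", "Source B")
factoids = [
    Factoid(src_a, "f. 12r", (p1,), "she is a Saxon"),
    Factoid(src_b, "ch. 4", (p1,), "she was from Northumbria"),
]

for line in what_sources_say(p1, factoids):
    print(line)
```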

Bradley and Short (2005) give a more complete overview of the factoid model than there is room for here. Recent developments, particularly the WWW and its related Semantic Web technologies, have made it possible both to interconnect dispersed data and to query it semantically. Central technologies supporting this approach are ontology languages such as RDF and OWL. Modelling the work of prosopography in a framework such as OWL is in many ways similar to relational database modelling: it includes the idea of classes and slots, which correspond quite closely to entities and attributes in RDB modelling, and it handles relationships between data in a comparable way. So far, the structured database approach has served our work in prosopography well, producing digital resources that have been well received by the research community and the broader public. What, then, are the advantages of rendering our prosopographical material, which is already highly structured, as an ontology instead? First, ontology systems such as OWL, as part of the Semantic Web initiative, are designed to interoperate across independent resources. Our recently started project, The Breaking of Britain, which will produce separate prosopographical databases of people in Scotland and Northern England, will be our first experience of needing substantial linking and searching between such databases. Second, ontology systems provide a mechanism for bringing first-order logic more fully into searching. (Zöllner-Weber 2009 offers something of an introduction to this issue, although in the field of literary scholarship.) First-order logic could help with genealogical data (where an assertion that A was the son of B can also be interpreted as stating that B was the mother or father of A), and could facilitate the complex date searching that medieval materials require.
Finally, we expect that a full ontological expression of our approach to prosopography will provide a richer and more transparent formal expression of the semantics attached to our approach than we have at present. As a result, such a prosopographical ontology will embody a shared conceptualisation of the field useful to both computers and people (cf. Gruber, 2000).
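The genealogical inference mentioned above can be illustrated with a small hand-rolled forward-chaining loop over (subject, relation, object) triples. This is a stand-in, not an OWL reasoner: in OWL, comparable effects would typically be declared through property axioms such as owl:inverseOf and owl:TransitiveProperty and computed by a reasoner. The relation names and rules are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def infer(facts: Set[Fact]) -> Set[Fact]:
    """Apply three hypothetical rules repeatedly until nothing new can be derived:
    1. son_of(A, B) or daughter_of(A, B)    =>  parent_of(B, A)
    2. parent_of(X, Y)                      =>  ancestor_of(X, Y)
    3. ancestor_of(X, Y), ancestor_of(Y, Z) =>  ancestor_of(X, Z)
    """
    known = set(facts)
    while True:
        derived = set()
        for s, r, o in known:
            if r in ("son_of", "daughter_of"):
                derived.add((o, "parent_of", s))
            elif r == "parent_of":
                derived.add((s, "ancestor_of", o))
        ancestors = [(s, o) for s, r, o in known if r == "ancestor_of"]
        for x, y in ancestors:
            for y2, z in ancestors:
                if y == y2:
                    derived.add((x, "ancestor_of", z))
        if derived <= known:  # fixed point reached: no new facts
            return known
        known |= derived


facts = {("A", "son_of", "B"), ("B", "daughter_of", "C")}
closure = infer(facts)
print(("B", "parent_of", "A") in closure)    # True
print(("C", "ancestor_of", "A") in closure)  # True
```

Only the asserted son_of and daughter_of facts are stored; the parent and ancestor relations are derived at query time, which mirrors the idea of letting logic answer questions the data entry never stated explicitly.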

One of the central approaches to modelling in the Semantic Web world is to develop a model that incorporates elements of other, compatible schemas. Borrowed elements must not only match structurally; the semantics of the classes and slots in the shared model must also match conceptually. At present, the existing ontology we know of that best matches our interests is CIDOC-CRM (CIDOC 2006 and Doerr 2003), which has been mapped to both RDFS and OWL (Schiemann 2010); it is, however, aimed at the needs of the museum and archive community as a way of representing cultural heritage materials.
CIDOC-CRM has several advantages as the base for a prosopographical ontology: it is sympathetic to a historical view of the materials it represents, and it identifies relevant entities, including persons, places, events and sources. Page 11 of the CIDOC-CRM specification points out that “The CRM does not propose a specific form to support reasoning about possible identity”, but we are not asking the ontology to do that for these projects. Asserting identity is the work of our historian partners; the point of the ontology is not to derive the identity of persons from the ontologically expressed data, but merely to express what the historians assert about their materials (including assertions of the identity of individuals) in ways that support sophisticated searching.

There has been a stream of argument about the black-and-white nature of assertions made through computer ontologies, suggesting that this bipolar nature is a significant flaw when they are applied to Humanities materials. Indeed, we suspect that much of the discussion about computer ontologies, which still often centres on the relatively simple problems from science and engineering that have been used as examples (see, for example, the discussion in Gruber 1993), has put Humanists (even Digital Humanists) off. Veltman 2004 makes this kind of argument when he claims that the preservation of culture requires dealing with meanings that change across places and times, and that computer ontologies try to “create data structures that assume a single world-view” (p. 7). This would indeed be a significant concern and, not accidentally, fits with Louch’s (1969) reasoning about why narrative, with its subtlety of expression, remains for many historians the main vehicle for research output. However, the story need not be as pessimistic as Veltman seems to believe, since ontology modelling does not inevitably produce a single view of the material.

Indeed, our factoid approach can show that formal structuring, if designed correctly, need not impose a single perspective on the data it models, as Veltman implies, but can accommodate a range of views from different sources. In the factoid model, statements about people are not made baldly, as in “Alfred the Great learned his Latin from Plegmund, Asser, Grimbald and John”. Instead, the statement records what a contemporary source says: “The source of the West-Saxon Version of Pope Gregory the Great’s ‘Pastoral Care’ (section 7.18) asserts that Alfred the Great learned his Latin from Plegmund …”. By introducing the source as an intermediary, the model can also accommodate contradictory statements from different sources.

CIDOC provides classes meant to facilitate representing this hiatus between a fact and the reading of that fact (expressed by the source, or by us in virtue of our editorial role). On page 8 of the CIDOC specification, the E13 Attribute Assignment class is presented as what allows the “documentation of how the respective assignment came about, and whose opinion it was”. By using this class, all the “properties assigned in such an action can also be seen as directly attached to the respective item or concept, possibly as a collection of contradictory values”. Examples of how to use this mechanism to construct a reified description of reality can also be found in the discussion of the ‘exhibition problem’ (Eide, 2008). An alternative solution would be to extend CIDOC with ad hoc classes; in the context of building ontologies for the philosophical domain, an approach of this kind has previously been documented by one of the authors (Pasin, 2009, p. 7). There, a purpose-built ‘Interpretation Event’ class extends CIDOC and allows different users to organize networks of philosophical ideas in a subjective manner. Despite a few early examples, the question of how to properly map our structured factoid model into an OWL ontology based on CIDOC-CRM remains a difficult and interesting one. In this paper we will report in some detail on our most recent attempts to address this problem, reviewing the pros and cons of the existing approaches and exploring in some depth the aspects of the mapping that required significant enrichment or extension of CIDOC.
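The reification pattern behind E13 Attribute Assignment can be sketched with plain triples, without any RDF library. E13, P140 (assigned attribute to), P141 (assigned) and P70 (documents, used here in its inverse direction) are CIDOC-CRM terms, but the identifiers, their underscore spelling, and the sample data are assumptions made for illustration; this is not the mapping the paper reports.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical E13 Attribute Assignment nodes about the same person.
triples: List[Triple] = [
    ("aa1", "rdf:type", "E13_Attribute_Assignment"),
    ("aa1", "P140_assigned_attribute_to", "person:P1"),
    ("aa1", "P141_assigned", "identity:Saxon"),
    ("aa1", "P70i_is_documented_in", "doc:A"),
    ("aa2", "rdf:type", "E13_Attribute_Assignment"),
    ("aa2", "P140_assigned_attribute_to", "person:P1"),
    ("aa2", "P141_assigned", "identity:Northumbrian"),
    ("aa2", "P70i_is_documented_in", "doc:B"),
]


def assignments_for(entity: str, triples: List[Triple]) -> Dict[str, List[str]]:
    """Return each value assigned to `entity`, with the documents asserting it.
    Contradictory values are kept side by side rather than resolved."""
    nodes: Dict[str, Dict[str, str]] = defaultdict(dict)
    for node, prop, value in triples:
        nodes[node][prop] = value  # simplification: one value per property per node
    result: Dict[str, List[str]] = defaultdict(list)
    for props in nodes.values():
        if (props.get("rdf:type") == "E13_Attribute_Assignment"
                and props.get("P140_assigned_attribute_to") == entity):
            result[props["P141_assigned"]].append(props.get("P70i_is_documented_in", "unknown"))
    return dict(result)


print(assignments_for("person:P1", triples))
# {'identity:Saxon': ['doc:A'], 'identity:Northumbrian': ['doc:B']}
```

Because each assignment is a node of its own, the two identities never overwrite each other; the person simply carries a collection of values, each tied to the document that asserts it.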

Bradley, John and Harold Short (2005). “Texts into databases: the Evolving Field of New-style Prosopography”. In Literary and Linguistic Computing Vol. 20 Suppl. 1: 3-

CIDOC (2006). The CIDOC Conceptual Reference Model. International Council of Museums.

Doerr, Martin (2003). “The CIDOC conceptual reference module: an ontological approach to semantic interoperability of metadata”. In AI Magazine Vol. 24 No. 3. Online version available at

Eide, Øyvind (2008). “The Exhibition Problem. A Real-life Example with a Suggested Solution”. In Literary and Linguistic Computing Vol. 23 No. 1, pp. 27-37.

Gruber, Tom (1993). “Towards Principles for the Design of Ontologies Used for Knowledge Sharing”. In Nicola Guarino and Roberto Poli (eds.) Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic Publishers.

Gruber, Tom (2000). “Every Ontology is a Treaty”. Interview for Semantic Web and Information Systems SIG of the Association for Information Systems. SIGSEMIS Bulletin vol. 1 (3).

Louch, Alfred R. (1969). “History as Narrative”. In History and Theory Vol. 8 No. 1, pp. 54-70.

Martindale, J.R. (1992). The Prosopography of the Later Roman Empire, 3: A.D. 527-641. Cambridge: Cambridge University Press.

Pasin, Michele and Motta, Enrico (2009). “Ontological Requirements for Annotation and Navigation of Philosophical Resources”. In Synthese. Online at

Schiemann, Bernard, Martin Oischinger and Günther Görz (2010). Erlanger CRM/OWL. Online at

Veltman, Kim H. (2004). “Towards a Semantic Web for Culture”. In Journal of Digital Information. Vol 4 No 4. Online at

Zöllner-Weber, Amélie (2009). “Ontologies and Logic Reasoning as Tools in Humanities”. In Digital Humanities Quarterly Vol 3 No 4. Online at

Jul 21

Representing Geographic Knowledge: Opportunities and Challenges from the Atlanta Maps Project at Emory University

Page, Michael. Geospatial Coordinator, Robert W. Woodruff Library, Emory University;
Varner, Stewart. Digital Scholarship Coordinator, Robert W. Woodruff Library, Emory University

Title: Representing Geographic Knowledge: Opportunities and Challenges from the Atlanta Maps Project at Emory University

Abstract: Printed maps have long been a means of surveying an area, creating inventories, and providing tools for navigation or reference. Maps create superficial representations of a space, but in the process they often tacitly record the more complex social and political history of a place. As a result, maps are attractive scholarly resources for a wide range of researchers. Emerging geographic technologies such as GIS offer new opportunities to humanities scholars interested in understanding how meaning is created, and contested, spatially. For example, scholars can now represent spatially situated changes over time much more clearly than was previously possible. In doing so, these maps may reveal information that was hidden in text-based scholarship. Furthermore, geolocation projects can grow and evolve in ways that more static resources never could. However, GIS also presents some important challenges. No matter how dynamic and interactive maps become, they will always be representations. As such, they will always highlight some aspects of a place while neglecting others. A city is a chaotic, organic space where power and resistance to power shape each other and the spaces they occupy. A map, on the other hand, organizes spaces and freezes the chaos at particular moments and from particular perspectives. The ability of a map to present the city as a controlled and orderly space is what makes it both useful and potentially deceptive. The scientific authority of GIS increases the danger that the line between geographic data and politically situated perspectives will be blurred.

This presentation will illustrate these opportunities and challenges with experience gained from developing a digital map of Atlanta at the Emory University Libraries. Using GIS to produce a rich digital representation of Atlanta, Georgia, the first phase of this project produced a digital map based on a 1928 atlas of the city. The map is so intricately detailed that it includes everything from roads and railways to building footprints and manhole covers. Because construction in the city all but ceased shortly after the atlas was published, first because of the Great Depression and then because of World War Two, the map provides a relatively reliable image of the city as it was during the two decades leading up to the Civil Rights Era. Using this digital map as a foundation, the second phase of the project will involve building a geo-database in which the geometric features will be given both descriptors (name, type, ownership, etc.) and links to other digital objects (photographs, audio, maps, etc.). For example, the library holds extensive historical records from a local African American funeral home. These records document the address, age and cause of death of thousands of individuals. By representing this information geographically on the digital map, scholars may be able to find new patterns that illuminate a relationship between utility infrastructure and public health or enhance our understanding of racial segregation. Eventually, this map could be used to expose a wide variety of library resources and enhance numerous research projects. We hope that our presentation will inspire similar projects, elicit suggestions for improvement, and contribute to a discussion about the proper use of GIS in humanities work.
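A minimal sketch of the record-linking step, assuming each funeral-home record carries a street address that exactly matches an address attribute on a building footprint in the geo-database. All addresses, attributes, and records below are invented, and the `sewer` descriptor is hypothetical; real historical geocoding usually also has to cope with renumbered streets and spelling variants.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical geo-database: building footprints keyed by address, with descriptors.
features: Dict[str, Dict[str, str]] = {
    "12 Elm St": {"type": "residence", "sewer": "yes"},
    "14 Elm St": {"type": "residence", "sewer": "no"},
    "3 Oak Ave": {"type": "residence", "sewer": "no"},
}

# Hypothetical funeral-home records: address, age, and cause of death.
records: List[Dict[str, str]] = [
    {"address": "14 Elm St", "age": "2", "cause": "typhoid"},
    {"address": "3 Oak Ave", "age": "40", "cause": "typhoid"},
    {"address": "12 Elm St", "age": "71", "cause": "stroke"},
    {"address": "99 Pine Rd", "age": "55", "cause": "typhoid"},
]


def deaths_by_attribute(records, features, cause: str, attribute: str) -> Tuple[Counter, List[str]]:
    """Join records to footprints by exact address and count deaths from `cause`
    per value of a feature attribute. Addresses with no matching footprint are
    returned separately instead of being silently dropped."""
    counts: Counter = Counter()
    unmatched: List[str] = []
    for rec in records:
        feature = features.get(rec["address"])
        if feature is None:
            unmatched.append(rec["address"])
        elif rec["cause"] == cause:
            counts[feature[attribute]] += 1
    return counts, unmatched


counts, unmatched = deaths_by_attribute(records, features, "typhoid", "sewer")
print(dict(counts), unmatched)  # {'no': 2} ['99 Pine Rd']
```

Reporting unmatched addresses matters for the concern raised above: records that silently fail to geocode would make the resulting map look more complete, and more authoritative, than the evidence warrants.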

Jul 21

Viral Venuses: The Potential of Digital Pedagogy in Feminist Classrooms

Hill, DaMaris. Doctoral Student, English-Creative Writing Program, University of Kansas

Title: Viral Venuses: The Potential of Digital Pedagogy in Feminist Classrooms

Abstract: The legacy of the exploitation of Sarah Baartman, the “Venus Hottentot”, has yet to be resolved by the reemergence of humanist values or by the illusion of gender-neutral digital environments. In 2008 the National Endowment for the Humanities defined digital humanities as an umbrella term used to describe the different activities surrounding technology and humanities scholarship. A digital humanities approach to this humanities-based feminist studies course seemed imminent, particularly considering the influence of social media and digital media on popular culture. Teaching this course challenged me to align liberatory pedagogy, feminist instructional theory, and digital technology into a digi-feminist pedagogical practice.

This course aims to understand how the body, the female body in particular, has figured in philosophy, cultural studies, history, literature, and visual culture, including digital spaces. The course includes analysis of how standards of beauty change across time and cultural groups, and of the impact of these standards on women as individuals and on social and political outcomes. Many students readily recognize the connections between social media and physical appearance. Social media outlets feature large quantities of photographs and images that influence standards of beauty on a very personal level. Additionally, digital devices are widely regarded in our culture as accessories that accentuate attractiveness and act as indicators of personality. The Internet, instantly accessible via cell phone, laptop and iPad, has become the initial site of exploration and research for most students. One of the pedagogical aims of the course is to facilitate learning and generate outcomes in digital forms. In addition to class discussion and small-group exploration of feminist issues, some of the digital tools used to facilitate learning include:

• Spencer Art Museum Digital Archives
• Individual online course conferences
• Blackboard
• YouTube Clips and live streams
• Final projects facilitated using webpages
• Electronic database searches
• Digital music and audio files
• Modified identity boxes, Photoshop image design
• E-readers (Nook, iPad, etc.)

The assignments are in digital formats and the final project includes developing a webpage.

Therefore, the assignments are easily shared with peer groups outside the classroom via Facebook, Tumblr, Twitter, MySpace, blogs, and other social media outlets. An unarticulated pedagogical experiment is being conducted as a result. Will the students share what they have learned with others through social media outlets? I suspect that the 33 students who enrolled in the course will foster interest in feminist studies and the Venus Hottentot by sharing their assignments and webpages with an audience that extends beyond our physical class. Will Venus studies go viral? Will the students’ perspectives on feminist issues appear in webpages, emails, Tumblr posts, tweets, and blogs that are read and visited by many Internet users around the world?

Jul 21

Literature Unbound: Networks of scholarly communication and knowledge creation in digital literary magazines

Green, Harriett. English and Digital Humanities Librarian, University of Illinois at Urbana-Champaign

Title: Literature Unbound: Networks of scholarly communication and knowledge creation in digital literary magazines

Abstract: Online-only, or “born-digital”, literary magazines and journals are proliferating faster than ever before. Once considered transitory upstarts and publications of last resort, they now form a numerous and thriving branch of literary publishing that promotes a rich lode of literature from both emerging and lauded writers. This paper examines a selection of digital literary journals to analyze what their publication records reveal about the role and status of digital literary journals in scholarly communication and evaluative scholarship in creative writing and writing studies. The initial study presented in the paper covers digital literary journals that have been published on a regular basis for at least the past five years. These titles include Blackbird, The Cortland Review, Mudlark, Painted Bride Quarterly, 2RiverView, and Cerise Press. The data analyzed in the paper include the genres of works published in the journals, the formats of those works, the frequency of publication of the different genres, the affiliations of authors, and the structure of each journal’s editorial process. The author will analyze these data to explore how digital literary publishing has transformed the dissemination of literary works: How do digital literary journals exploit their digital platforms to publish works in new types of formats? How do a journal’s publication frequency, distribution of genres, and author affiliations begin to reflect its status as a venue for disseminating a scholar’s creative and critical works? As part of this examination, the author will compare the publication frequency of these digital journals with that of a selection of established print journals over a similar timespan, including Antioch Review, Kenyon Review, Sewanee Review, Ploughshares, and Paris Review.
The study will then explore how these literary magazines’ editorial structures and processes establish them as legitimate arbiters and evaluators of creative literary scholarship. Ultimately, this study seeks to open a dialogue on how digital literary magazines are becoming established conduits in the networks of scholarly communication for creative writing and writing studies, and how their innovative publishing methods, which exploit their digital media platforms, challenge us to reconsider how writing is presented and consumed.

Jul 21

Employing Geospatial Genealogy to Reveal Residential and Kinship Patterns in a Pre-Holocaust Ukrainian Village

Egbert, Stephen. Department of Geography, University of Kansas; Roekard, Karen. Independent Scholar

Title: Employing Geospatial Genealogy to Reveal Residential and Kinship Patterns in a Pre-Holocaust Ukrainian Village

Abstract: By incorporating data from a variety of historical records into geographic information systems (GIS), we are conducting research into visualizing what can be learned about residential and kinship patterns in the mixed-ethnic settlements of pre-Holocaust Eastern Europe. We have termed this process, the linkage of records traditionally used for family history research with GIS, “geospatial genealogy.” Our prototype is the town of Rawa Ruska, Ukraine, located on the Rata River near the Polish border. It was founded in the mid-fifteenth century and was a “mixed” town of Jews, Poles, and Ukrainians. Over time its governance shifted among Austria-Hungary, Poland, Nazi Germany, the USSR, and now Ukraine. During WWII the Jews of Rawa Ruska were murdered in various “actions” at nearby mass gravesites or gassed at the Belzec extermination camp, 14 kilometers away. Our reconstruction, based on an 1854 cadastral map, uses the house numbers listed on the map and cross-references them with their appearances elsewhere, e.g. in vital records, tax and residence rolls, and Tabula register contracts from the late 1700s to the early 1900s. These house numbers thus provide a key link in establishing spatial patterns. Mapping residence patterns permits, for example, the examination of clustering or dispersion over time by ethnic group and relative wealth, or of the degree of clustering around focal points such as the town square or places of worship.
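One of the measures mentioned, clustering around a focal point, can be sketched as follows. The sketch assumes the cadastral map has been georeferenced so that each house number has projected (x, y) coordinates in metres, and that house numbers stayed stable across the records being linked; the coordinates, groups, and years are invented for illustration.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical: house number -> (x, y) in metres on the georeferenced 1854 cadastral map.
house_xy: Dict[str, Point] = {
    "101": (0.0, 50.0),
    "102": (30.0, 40.0),
    "245": (400.0, 300.0),
    "246": (420.0, 310.0),
}

# Hypothetical entries from vital records and tax rolls, each citing a house number.
entries: List[dict] = [
    {"house": "101", "group": "Jewish", "year": 1860},
    {"house": "102", "group": "Jewish", "year": 1861},
    {"house": "245", "group": "Ukrainian", "year": 1860},
    {"house": "246", "group": "Polish", "year": 1862},
    {"house": "999", "group": "Polish", "year": 1863},  # not on the map: skipped
]

TOWN_SQUARE: Point = (0.0, 0.0)


def mean_distance_to(focal: Point, entries: List[dict], house_xy: Dict[str, Point],
                     years: Optional[Tuple[int, int]] = None) -> Dict[str, float]:
    """Mean distance (metres) from each group's recorded houses to a focal point,
    optionally restricted to an inclusive range of years."""
    dists: Dict[str, List[float]] = defaultdict(list)
    for e in entries:
        if years and not (years[0] <= e["year"] <= years[1]):
            continue
        xy = house_xy.get(e["house"])
        if xy is not None:
            dists[e["group"]].append(math.dist(xy, focal))
    return {group: mean(d) for group, d in dists.items()}


print(mean_distance_to(TOWN_SQUARE, entries, house_xy))
print(mean_distance_to(TOWN_SQUARE, entries, house_xy, years=(1860, 1860)))
```

Running the function for successive year ranges gives a simple time series of proximity to the square for each group; a fuller analysis would add a dispersion measure and weight entries by relative wealth.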

Jul 21

Sounding it out: modeling orality for large-scale text collection analysis

Representing Knowledge in the Digital Humanities (Saturday, September 24, 2011)

Clement, Tanya. Assistant Professor, School of Information, University of Texas

Title: Sounding it out: modeling orality for large-scale text collection analysis

Abstract: Many scholars and poets have written about the remarkable experience of hearing Gertrude Stein’s texts read aloud. The “Language poets,” who emerged in the 1960s and 1970s and who form important scholarly communities today, have adopted Stein as an early influence and a model. This relationship has been attributed in part to the indeterminacy and language play that Marjorie Perloff and others see in Stein’s writing, but the extent to which prosody and rhythm have also influenced these artists remains undocumented.

Further, very few scholars have had the means to investigate the speech patterns (whether African American, German, or French) that may have influenced Stein. This paper will discuss a use-case study in which I am using data mining to compare clusters of patterns in Stein’s poetry and prose with those in nonfiction narratives and oral histories, as well as in contemporary poetry. Building on existing research and development with the Mellon-funded SEASR (Software Environment for the Advancement of Scholarly Research) application, this work has included four steps: identifying the XML output of OpenMary (a text-to-speech system that uses an internal XML-based representation language called MaryXML) as a base analytic; producing a tabular representation of the data, including phonemic and syntactic elements, for clustering and predictive modeling; creating a routine in MEANDRE (a semantic-web-driven, data-intensive flow execution environment) that produces these data and allows future users to produce similar results; and developing a user interface for viewing these comparisons across collections of texts.

Access to large-scale repositories of text raises broader questions about how literary scholars can use such repositories in their research. In his seminal book on the foundations of knowledge representation, John F. Sowa writes that theories of knowledge representation are particularly useful “for anyone whose job is to analyze knowledge about the real world and map it to a computable form” (xi). Sowa also argues that knowledge representation is unproductive if the logic and ontology that shape its application in a given domain are unclear: “without logic, knowledge representation is vague,” he writes, “with no criteria for determining whether statements are redundant or contradictory,” and “without ontology, the terms and symbols are ill-defined, confused, and confusing” (xii). Knowledge representation is the work of all scholars in the digital humanities, and these scholars must help determine the logics and ontologies that shape how we access these data.

Charles Bernstein has written that “[t]he relation of sound to meaning is something like the relation of the soul (or mind) to the body. They are aspects of each other, neither prior, neither independent” (17). Yet scholars have not had the ability to analyze the features of text that correspond to orality, such as phonemes and prosodic elements, much less to compare these features with similar features across collections. If the digital humanities is to incorporate this kind of study, it is time we considered the logics and ontologies of orality in the computational environment.

Bernstein, Charles, ed. Close Listening: Poetry and the Performed Word. New York: Oxford University Press, 1998. Print.

Perloff, Marjorie. The Poetics of Indeterminacy: Rimbaud to Cage. Princeton, NJ: Princeton University Press, 1981. Print.

Sowa, John F. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA: Brooks/Cole, 2000. Print.

