Prosopography and Computer Ontologies: towards a formal representation of the ‘factoid’ model by means of CIDOC-CRM

Pasin, Michele. Research Associate, King’s College London

Title: Prosopography and Computer Ontologies: towards a formal representation of the ‘factoid’ model by means of CIDOC-CRM

Abstract: Structured Prosopography provides a formal model for representing prosopography: a branch of historical research that traditionally has focused on the identification of people that appear in historical sources. Pre-digital print prosopographies, such as Martindale 1992, presented their materials as narrative articles about the individuals they contain. Since the 1990s, KCL’s Department of Digital Humanities (DDH, formerly known as the Centre for Computing in the Humanities) has been involved in the development of structured prosopographical databases. It has had direct involvement in the Prosopographies of the Byzantine World (PBE and PBW), Anglo-Saxon England (PASE), Medieval Scotland (PoMS) and now, more generally, northern Britain (“Breaking of Britain”: BoB), and is currently in discussions about others. DDH has developed a general “factoid-oriented” model of structure that, although it downplays or eliminates narratives about people, has to a large extent served the needs of these various projects quite well.

DDH’s factoid-oriented prosopographical models are currently all expressed using the entity-attribute-relationship model of the relational database. The structure formally identifies obvious items of interest, Persons and Sources, and extends to related things like Offices or Places. In our prosopographical model the Factoid is the central idea: it represents the spot in a primary source where something is said about one or more persons. It links people to the information about them via the spots in primary sources that assert that information. By creating “factoids” that record what the source says about people, the factoid approach prioritises the sources and our historians’ reading of them. Our data about a person is not, then, so much a narrative summary written by the prosopographer (as it was in the articles about persons included in pre-digital prosopography) as a collection of information about what the sources say about him or her. It can represent the multiple, perhaps contradictory, voices of the different sources simultaneously: one saying that she was a Saxon, but another saying that she was from Northumbria.
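The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the schema of any DDH database: the class names follow the prose (Person, Source, Factoid), while the fields `attribute`, `value` and `location`, the helper `assertions_about`, and all the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Source:
    title: str


@dataclass(frozen=True)
class Factoid:
    """A spot in a source where something is asserted about a person."""
    source: Source
    person: Person
    attribute: str  # hypothetical attribute name, e.g. "origin"
    value: str
    location: str   # where in the source the assertion appears


def assertions_about(factoids, person, attribute):
    """Group the values asserted for one attribute by the asserting source."""
    by_source = defaultdict(list)
    for f in factoids:
        if f.person == person and f.attribute == attribute:
            by_source[f.source.title].append(f.value)
    return dict(by_source)


# Invented sample data: two sources disagree about the same person.
woman = Person("Unnamed woman")
source_a = Source("Source A")
source_b = Source("Source B")

factoids = [
    Factoid(source_a, woman, "origin", "Saxon", "ch. 1"),
    Factoid(source_b, woman, "origin", "Northumbria", "fol. 3"),
]

origins = assertions_about(factoids, woman, "origin")
print(origins)
```

Because every value is reached through a factoid, and every factoid names its source, the two conflicting origins are both kept rather than one overwriting the other.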

Bradley and Short (2005) give a more complete overview of the factoid model than there is room for here. Recent developments, particularly the WWW and the related technologies of the Semantic Web, have made it possible both to interconnect dispersed data and to query it semantically. Central technologies supporting this approach are ontology languages such as RDF and OWL. Modelling the work of prosopography in a framework such as OWL is in many ways similar to relational database modelling: it includes the idea of classes and slots, which correspond quite closely to entities and attributes in RDB modelling, and it handles relationships between data in a similar way. So far, the structured database approach has served our work in prosopography well, producing digital resources that are well received by the research community and the broader public.

What, then, are the advantages of rendering our prosopographical material, which is already highly structured, as an ontology instead? First, ontology systems such as OWL, being part of the Semantic Web initiative, are designed to inter-operate between independent resources. Our recently started project, The Breaking of Britain, which will produce separate prosopographical databases of people in Scotland and Northern England, will be our first experience of the need for substantial linking and searching between them. Second, ontology systems provide a mechanism for engaging first-order logic more heavily in searching. (There is something of an introduction to this issue in Zöllner-Weber 2009, although in the field of literary scholarship.) First-order logic could help with genealogical data (where an assertion that A was the son of B can also be interpreted as stating that B was the mother or father of A), and can facilitate the management of the complex date-searching needs of materials from medieval times.

Finally, we expect that a full ontological expression of our approach to prosopography will provide a richer and more transparent formal expression of the semantics attached to our approach than we have at present. As a result, such a prosopographical ontology will embody a shared conceptualisation of the field useful to both computers and people (cf. Gruber, 2000).
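The genealogical example can be made concrete with a small rule application. The sketch below is plain Python, not an OWL reasoner, and the predicate names `son_of` and `parent_of`, like the helper functions, are assumptions made for illustration. It applies the single rule son_of(A, B) → parent_of(B, A) so that a search for parents also finds relationships that were only ever recorded from the child's side.

```python
def infer(facts):
    """Add parent_of(B, A) for every stated son_of(A, B)."""
    derived = set(facts)
    for predicate, a, b in facts:
        if predicate == "son_of":
            derived.add(("parent_of", b, a))
    return derived


def children_of(facts, person):
    """Search the inferred facts for everyone whose parent is `person`."""
    return sorted(
        child
        for predicate, parent, child in infer(facts)
        if predicate == "parent_of" and parent == person
    )


# Only the son_of assertion is stored; parent_of is derived at query time.
stated = {("son_of", "A", "B")}
print(children_of(stated, "B"))
```

In an OWL setting the same effect would normally be declared once in the ontology rather than coded by hand; the function here only shows what such a declaration buys the searcher.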

One of the central approaches to modelling in the Semantic Web world is to develop a model that contains elements of other, compatible, schemas. Borrowed elements must not only match structurally: the semantics of the classes and slots in the shared model have to match conceptually as well. At present, the existing ontology known to us that best matches our interests is CIDOC-CRM (CIDOC 2006 and Doerr 2003), which has been mapped to both RDFS and OWL (Schiemann 2010). It is, however, aimed at the needs of the museum and archive community as a way of representing cultural heritage materials.

CIDOC-CRM has several advantages as the base for a prosopographical ontology: it is sympathetic to a historical view of the materials it represents, and it identifies the relevant entities, including persons, places, events and sources. Page 11 of the CIDOC-CRM specification points out that “The CRM does not propose a specific form to support reasoning about possible identity”, but we are not asking the ontology for these projects to do that. The identity assertion is the work of our historian partners. The point of the ontology is not to derive the identity of persons from the ontologically expressed data for us, but merely to express what the historians assert about their materials (including assertions about the identity of individuals) in ways that support sophisticated searching.

There has been a stream of argument about the black-and-white nature of assertions made through computer ontologies, implying that this binary nature is a significant flaw when applied to Humanities materials. Indeed, we expect that much of the discussion about computer ontologies, often centred on the still relatively simple problems within science and engineering that have been used as examples (see, for example, the discussion in Gruber 1993), has put Humanists (even Digital Humanists) off. Veltman 2004 provides this kind of argument when he claims that the preservation of culture requires dealing with meanings that change over different places and times, and that computer ontologies try to “create data structures that assume a single world-view” (p. 7). Now, this would indeed be a significant concern and, not accidentally, fits with, say, Louch’s (Louch 1969) reasoning about why narrative, with its subtlety of expression, remains for many historians the main vehicle for research output. However, the story need not be as pessimistic as Veltman seems to believe, since ontology modelling need not mean that a single view of the material is an inevitable result.

Indeed, our factoid approach can show that formal structuring, if designed correctly, need not impose a single perspective on the data it models, as Veltman implies, but is capable of accommodating a range of views from the different sources. In the factoid model statements about people are not made baldly: “Alfred the Great learned his Latin from Plegmund, Asser, Grimbald and John”. Instead, the statement records what a contemporary source says: “The source of the West-Saxon Version of Pope Gregory the Great’s ‘Pastoral Care’ (section 7.18) asserts that Alfred the Great learned his Latin from Plegmund …”. By introducing the source as an intermediary, the model can also accommodate contradictory statements from different sources.

CIDOC provides classes meant to facilitate the representation of this hiatus between a fact and the reading of that fact (expressed by the source, or by us in virtue of our editorial role). On page 8 of the CIDOC specifications, the E13 Attribute Assignment class is presented as what allows the “documentation of how the respective assignment came about, and whose opinion it was”. By using this class all the “properties assigned in such an action can also be seen as directly attached to the respective item or concept, possibly as a collection of contradictory values”. Examples of how to employ this mechanism to construct a reified description of reality can also be found in the context of the ‘exhibition problem’ discussion (Eide, 2008). An alternative solution would be to extend CIDOC with ad hoc classes; in the context of building ontologies for the philosophical domain, an approach of this kind has been previously documented by one of the authors (Pasin and Motta, 2009, p. 7). There, a purpose-built ‘Interpretation Event’ class extends CIDOC and allows different users to organize networks of philosophical ideas in a subjective manner. Despite the few early examples available, the issue of how to properly map our structured factoid model into an OWL ontology based on CIDOC-CRM remains a difficult and interesting one. In this paper we will report in some detail on our most recent attempts to address this problem, reviewing the pros and cons of the existing approaches and exploring in some depth the aspects of the mapping that required significant enrichment or extension of CIDOC.
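One possible reading of the E13 mechanism is sketched below as plain subject-predicate-object tuples in Python. It is not the mapping reported in the paper. `E13 Attribute Assignment`, `P140 assigned attribute to` and `P141 assigned` are CIDOC-CRM names; the `ex:` properties, the node identifiers and the sample values are hypothetical additions made only so the example is self-contained.

```python
from collections import defaultdict

E13 = "crm:E13_Attribute_Assignment"


def attribute_assignment(node, subject, prop, value, source):
    """Reify one assertion as its own E13 Attribute Assignment node."""
    return [
        (node, "rdf:type", E13),
        (node, "crm:P140_assigned_attribute_to", subject),
        (node, "crm:P141_assigned", value),
        (node, "ex:assigned_property", prop),  # hypothetical extension property
        (node, "ex:asserted_in", source),      # hypothetical extension property
    ]


def opinions(triples, subject, prop):
    """Return sorted (source, value) pairs asserted for one property."""
    nodes = defaultdict(dict)
    for s, p, o in triples:
        nodes[s][p] = o
    return sorted(
        (n["ex:asserted_in"], n["crm:P141_assigned"])
        for n in nodes.values()
        if n.get("rdf:type") == E13
        and n.get("crm:P140_assigned_attribute_to") == subject
        and n.get("ex:assigned_property") == prop
    )


# Two assignments about the same person carry contradictory values.
graph = (
    attribute_assignment("ex:aa1", "ex:person1", "ex:origin", "Saxon", "ex:sourceA")
    + attribute_assignment("ex:aa2", "ex:person1", "ex:origin", "Northumbria", "ex:sourceB")
)

result = opinions(graph, "ex:person1", "ex:origin")
print(result)
```

Each assignment is a node in its own right, so the graph holds a collection of contradictory values, each traceable to the source whose opinion it records.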

Bradley, John and Harold Short (2005). “Texts into databases: the Evolving Field of New-style Prosopography”. In Literary and Linguistic Computing Vol. 20 Suppl. 1:3-

CIDOC (2006). The CIDOC Conceptual Reference Model. International Council of Museums.

Doerr, Martin (2003). “The CIDOC conceptual reference module: an ontological approach to semantic interoperability of metadata”. In AI Magazine Vol. 24 No. 3. Online version available at

Eide, Øyvind (2008). “The Exhibition Problem. A Real-life Example with a Suggested Solution”. In Literary and Linguistic Computing Vol. 23 No. 1. pp. 27-37.

Gruber, Tom (1993). “Towards Principles for the Design of Ontologies Used for Knowledge Sharing”. In Nicola Guarino and Roberto Poli (eds.) Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic publishers.

Gruber, Tom (2000). “Every Ontology is a Treaty”. Interview for Semantic Web and Information Systems SIG of the Association for Information Systems. SIGSEMIS Bulletin vol. 1 (3).

Louch, Alfred R. (1969). “History as Narrative”. In History and Theory Vol. 8 No. 1. pp. 54-70.

Martindale, J.R. (1992). The Prosopography of the Later Roman Empire, 3: A.D. 527-641. Cambridge: Cambridge University Press. 1992.

Pasin, Michele and Motta, Enrico (2009). “Ontological Requirements for Annotation and Navigation of Philosophical Resources”. In Synthese. Online at

Schiemann, Bernard, Martin Oischinger and Günther Görz (2010). Erlanger CRM/OWL. Online at

Veltman, Kim H. (2004). “Towards a Semantic Web for Culture”. In Journal of Digital Information. Vol 4 No 4. Online at

Zöllner-Weber, Amélie (2009). “Ontologies and Logic Reasoning as Tools in Humanities”. In Digital Humanities Quarterly Vol 3 No 4. Online at

Representing Geographic Knowledge: Opportunities and Challenges from the Atlanta Maps Project at Emory University

Page, Michael. Geospatial Coordinator, Robert W. Woodruff Library, Emory University;
Varner, Stewart. Digital Scholarship Coordinator, Robert W. Woodruff Library, Emory University

Title: Representing Geographic Knowledge: Opportunities and Challenges from the Atlanta Maps Project at Emory University

Abstract: Printed maps have long been a means to take a survey of an area, create inventories, and provide tools for navigation or reference. Maps create superficial representations of a space but they often tacitly record the more complex social and political history of a place in the process. As a result, maps are attractive scholarly resources for a wide range of researchers. Emerging geographic technology such as GIS offers new opportunities to humanities scholars interested in understanding how meaning is created, and contested, spatially. For example, scholars are now able to represent spatially situated changes over time much more clearly than was possible before. In doing so, these maps may reveal information that was hidden in text-based scholarship. Furthermore, geolocation projects are able to grow and evolve in ways that more static resources never could. However, GIS also presents some important challenges. No matter how dynamic and interactive maps become, they will always be representations. As such, they will always highlight some aspects of a place while neglecting others. A city is a chaotic, organic space where power and resistance to power shape each other and the spaces they occupy. A map, on the other hand, organizes spaces and freezes the chaos at particular moments and from particular perspectives. The ability of a map to present the city as a controlled and orderly space is what makes it both useful and potentially deceptive. The scientific authority of GIS adds to the danger that the line between geographic data and politically situated perspectives could be blurred.

This presentation will illustrate these opportunities and challenges with experiences gained from developing a digital map of Atlanta at the Emory University Libraries. Using GIS to produce a rich digital representation of Atlanta, Georgia, the first phase of this project produced a digital map based on an atlas of the city from 1928. The map is so intricately detailed that it includes everything from roads and railways to building footprints and manhole covers. Because construction all but ceased in the city shortly after this atlas was published, owing first to the Great Depression and then to World War Two, the map provides a relatively reliable image of the city as it was for the two decades leading up to the Civil Rights Era. Using this digital map as a foundation, the second phase of the project will involve building a geo-database in which the geometric features will be given both descriptors (name, type, ownership, etc.) and linkages to other digital objects (photographs, audio, maps, etc.). For example, the library holds extensive historical records from a local African American funeral home. These records document the address, age and cause of death for thousands of individuals. By representing this information geographically on the digital map, scholars may be able to find new patterns that illuminate a relationship between utility infrastructure and public health or enhance our understanding of racial segregation. Eventually, this map could be used to expose a wide variety of library resources and enhance numerous research projects. We hope that our presentation will inspire similar projects, elicit suggestions for improvement and become part of a discussion about the proper use of GIS in humanities work.

Viral Venuses: The Potential of Digital Pedagogy in Feminist Classrooms

Hill, DaMaris. Doctoral Student, English-Creative Writing Program, University of Kansas

Title: Viral Venuses: The Potential of Digital Pedagogy in Feminist Classrooms

View presentation (Prezi)

Abstract: The legacy of Baartman’s exploitation has yet to be resolved by the reemergence of humanist values or the illusions of gender-neutral digital environments. In 2008 the National Endowment for the Humanities defined digital humanities as an umbrella term used to describe the different activities surrounding technology and humanities scholarship. A digital humanities approach to this humanities-based feminist studies course seemed imminent, particularly when one considers the influence of social media and digital media on popular culture. Teaching this course challenged me to align liberatory pedagogy, feminist instructional theory, and digital technology into a digi-feminist pedagogical practice.

This course aims to understand how the body – the female body in particular – has figured in philosophy, cultural studies, history, literature, and visual culture, including digital spaces. The course includes analysis of how standards of beauty change across time and cultural groups, and the impact of these standards on women as individuals and on social and political outcomes. The connections between social media and physical appearance are easily recognized by many students. Social media outlets feature large quantities of photographs and images that influence standards of beauty on a very personal level. Additionally, digital devices are largely considered accessories that accentuate attractiveness and act as indicators of personality in our culture. The Internet, instantly accessible via cell phone, laptop and iPad, has become the initial site of exploration and research for most students. One of the pedagogical aims of the course is to facilitate learning and generate outcomes in digital forms. In addition to class discussion and small group exploration of feminist issues, some of the digital tools used to facilitate learning include:

• Spencer Art Museum Digital Archives
• Individual online course conferences
• Blackboard
• YouTube Clips and live streams
• Final projects facilitated using webpages
• Electronic database searches
• Digital music and audio files
• Modified identity boxes, Photoshop image design
• E-readers (Nook, iPad, etc.)

The assignments are in digital formats and the final project includes developing a webpage.

Therefore, the assignments are easily shared with peer groups outside of the classroom environment using Facebook, Tumblr, Twitter, MySpace, blogs, and other social media outlets. An unarticulated pedagogical experiment is being conducted as a result. Will the students share what they learned with others using social media outlets? I suspect that the 33 students who enrolled in the course will foster interest in feminist studies and the Venus Hottentot by sharing their assignments and webpages with an audience that extends beyond our physical class. Will Venus studies go viral? Will their perspectives on feminist issues appear on webpages, email, Tumblr pages, tweets, posts, and blogs that are read and visited by many Internet users around the world?

Literature Unbound: Networks of scholarly communication and knowledge creation in digital literary magazines

Green, Harriett. English and Digital Humanities Librarian, University of Illinois at Urbana-Champaign

Title: Literature Unbound: Networks of scholarly communication and knowledge creation in digital literary magazines

Abstract: Online-only, or “born digital”, literary magazines and journals are proliferating faster than ever before. Once considered transitory upstarts and publications of last resort, they are now a numerous and thriving branch of literary publishing that promotes a rich lode of literature from both emerging and lauded writers. This paper examines a selection of digital literary journals to analyze what the publication records reveal about the role and status of digital literary journals for scholarly communications and evaluative scholarship in creative writing and writing studies. The initial study presented in the paper examines a selection of digital literary journals that have been published on a regular basis for a minimum of the past five years. These titles include Blackbird, The Cortland Review, Mudlark, Painted Bride Quarterly, 2RiverView, and Cerise Press. Collected data analyzed in the paper includes the genres of works published in the journals, formats of the works published, the frequency of publication of the different genres of works, the affiliations of authors, and the structure of the editorial processes in each journal. The author will analyze this data to explore how the dissemination of literary works has been transformed by digital literary publishing: How do digital literary journals exploit their digital platforms to publish works in new types of formats? How do a journal’s publication frequency, distribution of genres, and affiliations of authors begin to reflect its status as a journal for disseminating a scholar’s creative and critical works? As part of this examination, the author will compare the publication frequency of these digital journals to a selection of established print journals over a similar timespan, including Antioch Review, Kenyon Review, Sewanee Review, Ploughshares, and Paris Review.
The study’s analysis will then explore how these literary magazines’ editorial structures and processes establish themselves as legitimate arbiters and evaluators of creative literary scholarship. Ultimately, this study seeks to open a dialogue on how digital literary magazines are becoming established conduits in the networks of scholarly communication for creative writing and writing studies, and how their innovative publishing methods that exploit their digital media platforms challenge us to re-consider how writing is presented and consumed.

Employing Geospatial Genealogy to Reveal Residential and Kinship Patterns in a Pre-Holocaust Ukrainian Village

Egbert, Stephen. Department of Geography, University of Kansas; Roekard, Karen. Independent Scholar

Title: Employing Geospatial Genealogy to Reveal Residential and Kinship Patterns in a Pre-Holocaust Ukrainian Village

Abstract: By incorporating data from a variety of historical records into geographic information systems (GIS), we are conducting research into visualizing what can be learned about residential and kinship patterns in the mixed-ethnic settlements of pre-Holocaust Eastern Europe. We have termed this process – the linkage of records traditionally used for family history research with GIS – “geospatial genealogy.” Our prototype is the town of Rawa Ruska, Ukraine, located on the Rata River near the Polish border. It was founded in the mid-fifteenth century and was a “mixed” town of Jews, Poles, and Ukrainians. Over time its governance shifted among Austria-Hungary, Poland, Nazi Germany, the USSR, and now Ukraine. During WWII the Jews of Rawa Ruska were murdered in various “actions” at nearby mass gravesites or gassed at the Belzec extermination camp, 14 kilometers away. Our reconstruction, based on an 1854 cadastral map, takes the house numbers listed on the map and cross-references them with their use elsewhere, e.g. in vital records, tax and residence rolls, and Tabula register contracts from the late 1700s to the early 1900s. The house numbers thus provide a key link in establishing spatial patterns. Mapping residence patterns permits, for example, the examination of clustering or dispersion over time by ethnic group and relative wealth, or the degree of clustering around focal points such as the town square or places of worship.
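The house-number linkage described above can be sketched as a simple join, here in Python. Everything in the example is invented for illustration: the parcel coordinates, the records, the neutral group labels and the function names are assumptions, not project data or the authors' GIS workflow. Records are located through their house number and then summarized by mean distance to a focal point such as the town square.

```python
import math
from collections import defaultdict

# Hypothetical map coordinates keyed by house number from the cadastral map.
parcels = {12: (3.0, 4.0), 13: (6.0, 8.0), 47: (30.0, 40.0)}
town_square = (0.0, 0.0)  # hypothetical focal point

# Hypothetical archival records that cite a house number.
records = [
    {"house": 12, "year": 1820, "group": "group A"},
    {"house": 13, "year": 1821, "group": "group A"},
    {"house": 47, "year": 1820, "group": "group B"},
    {"house": 99, "year": 1822, "group": "group B"},  # number not on the map
]


def locate(records, parcels):
    """Split records into those matched to a parcel and those left unmatched."""
    located, unmatched = [], []
    for record in records:
        xy = parcels.get(record["house"])
        if xy is None:
            unmatched.append(record)
        else:
            located.append({**record, "xy": xy})
    return located, unmatched


def mean_distance_by_group(located, focal_point):
    """Average distance from the focal point for each group's residences."""
    distances = defaultdict(list)
    for record in located:
        distances[record["group"]].append(math.dist(record["xy"], focal_point))
    return {group: sum(d) / len(d) for group, d in distances.items()}


located, unmatched = locate(records, parcels)
summary = mean_distance_by_group(located, town_square)
print(summary, len(unmatched))
```

Keeping the unmatched records in a separate list matters in practice, since house numbers that cannot be found on the map need review rather than silent omission.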

Sounding it out: modeling orality for large-scale text collection analysis

Clement, Tanya. Assistant Professor, School of Information, University of Texas

Title: Sounding it out: modeling orality for large-scale text collection analysis

Abstract: Many scholars and poets have written about the remarkable experience of hearing Gertrude Stein’s texts read aloud. “Language poets” who emerged in the 1960s and 1970s and who form important scholarly communities today have adopted Stein as an early influence and a model. In part, the nature of this relationship has been ascribed to the indeterminacy and the manner of language play that Marjorie Perloff and others see evinced in Stein’s writing, but the extent to which prosody and rhythm have also influenced these artists goes undocumented.

Further, very few scholars have had the means to investigate the speech patterns (whether African American or German or French) that may have influenced Stein. This paper will discuss a use-case study in which I am using data mining to examine clusters of patterns in Stein’s poetry and prose compared to those in non-fiction narratives and oral histories as well as those present in contemporary poetry. Taking advantage of pre-existing research and development with the Mellon-funded SEASR (The Software Environment for the Advancement of Scholarly Research) application, this work has included identifying OpenMary XML (a text-to-speech system that uses an internal XML-based representation language called MaryXML) output as a base analytic, producing a tabular representation of the data for clustering and predictive modeling that includes phonemic and syntactic elements, creating a routine in MEANDRE (a semantic-web-driven data-intensive flow execution environment) that produces this data and allows future users to produce similar results, and developing a user interface for seeing these comparisons across collections of texts. Access to large-scale repositories of text opens larger questions about how literary scholars can use such repositories in their research. John F. Sowa writes in his seminal book on computational foundations that theories of knowledge representation are particularly useful “for anyone whose job is to analyze knowledge about the real world and map it to a computable form” (xi). Similarly, Sowa notes that knowledge representation is unproductive if the logic and ontology which shape its application in a certain domain are unclear: “without logic, knowledge representation is vague,” Sowa writes, “with no criteria for determining whether statements are redundant or contradictory,” and “without ontology, the terms and symbols are ill-defined, confused, and confusing” (xii).

Knowledge representation is the work of all scholars in digital humanities, and these scholars must help determine the logics and ontologies that shape how we access this data. Charles Bernstein has written that “[t]he relation of sound to meaning is something like the relation of the soul (or mind) to the body. They are aspects of each other, neither prior, neither independent” (17). Scholars have not had the ability to analyze the features of text that correspond to orality (their phonemes and prosodic elements), much less compare these features with similar features across collections. To incorporate this kind of study in digital humanities, it is time we considered the logics and ontologies of orality in the computational environment.

Bernstein, Charles. Close Listening: Poetry and the Performed Word. Oxford University Press, 1998. Print.

Perloff, Marjorie. The Poetics of Indeterminacy: Rimbaud to Cage. Princeton, N.J: Princeton University Press, 1981. Print.

Sowa, John F. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA: Brooks Cole Publishing Co., 2000. Print.

Materiality and Meaning in Digital Poetics

Buchsbaum, Julianne. Humanities Librarian, University of Kansas

Title: Materiality and Meaning in Digital Poetics

Abstract: Poetry is highly self-reflexive, even hyperverbal, in its construction. Poets pay a great deal of attention to the formal properties of language: its textures, rhythms, graphical representation, and materiality (the way words sound, the “mouthfeel” of words), down to the level of the actual syllable and phoneme. All of these nonsemantic aspects of linguistic signs (the aesthetic forms they take) inform a poem’s construction and its process of meaning-making. Therefore, a poem is always, in a sense, revealing, disclosing, or calling attention to its constructedness. Some poems try to veil their constructedness, by being “transparent,” but modernist and postmodernist works tend to be more self-conscious in calling attention to their writtenness as verbal constructs. Therefore, poetry, as a practice, is helpful for understanding the ways writing and meaning-making change in a digital medium. What does it mean to “write” in a digital medium, when the tools of one’s medium are constrained and/or liberated by bits and bytes, zeroes and ones, by the plasticity and multispatiality of cyberspace? “Materiality” is a term that has been used to write about digital texts since the 1990s by at least a few critics of new media. What exactly, though, is meant by the “materiality” of new computer media? How can digital poems even be said to be “material” at all, as opposed to analog, print-based works?

One might claim that the elements of cyberspace actually enter into and inform the production and reception of digital texts, that they change the very nature of those texts. Are the seemingly behind-the-scenes codes and web addresses of a page on which a digital poem is published actually part of that poem and inseparable from it? In the same way as the spine of a book, its binding, its page numbers, its index and table of contents, the font of its type, page format, etc., inform (to an extent) the experience of reading a poem in a print-based book of poems? In the “text-environment” of a digital poem, what is meant by materiality when there is no corresponding physical, palpable artifact that exists behind the work in the extensional world? Are we, in fact, looking at the demediation or dematerialization of physical culture? Without a material substratum, can we in fact speak of a human body’s interaction with technology? Can we translate a kind of materialist hermeneutics into the digital realm? In this presentation, I will examine the assumption that simply because we cannot reach out and touch a digital text object, electronic objects are disembodied and immaterial. I propose to take into consideration what it means to treat a digital text object from a textual-material angle, taking apart the anatomy of one or more pieces of born-digital writing and analyzing them from the perspectives of platform, interface, data standards, file formats, operating systems, versions and distributions of code, etc., keeping in mind that they are artifacts subject to embedded, historical, localized modes of understanding. I propose to speak not only of the “tiny junctures of silicon and metal,” but also of how exactly encoded data is always literally situated or embedded in a material site.

The Graphic Visualization of XML Documents

Birnbaum, David. Professor and Chair, Dept. of Slavic Languages and Literatures, University of Pittsburgh

Title: The Graphic Visualization of XML Documents

Abstract: This presentation describes the graphic visualization of XML documents in several projects in order to support philological research in the humanities. In many cases information that may not be easily accessible when the data is viewed in textual format (even with the benefit of markup) emerges strikingly when the marked-up prose is transformed, using XML tools, into a graphic representation. Furthermore, the derived graphic representations can be interwoven with more traditional textual ones in an interactive “workstation” that allows researchers to move easily among textual and graphic views as a way of researching and interrogating the content.

Fan Curation on the Internet

Baym, Nancy. Associate Professor, Communication Studies, University of Kansas

Title: Fan Curation on the Internet

Abstract: Audiences have always collected and codified information and expertise about the things they love, but the networked and persistent nature of online communication has given them new ways to do this. This talk will identify the kinds of curation fans are doing with an eye toward the complexity of understanding and preserving these sites for scholarly purposes.

Local-grammar Based approach to the recognition of variants of Loanwords

Poster Session

Frej, Mohamed. Student, Hankuk University of Foreign Studies, South Korea

Title: Local-grammar Based approach to the recognition of variants of Loanwords

Abstract: Many studies have investigated the role loanwords play in second language learning. While English loanwords can be considered an effective tool in teaching Korean to speakers of other languages, there are some problems connected with variation in the spelling of English loanwords. Even though there is an official norm imposed by the Korean government for the transliteration of loanwords into Korean, we observe that people use many variants of the standard spelling of loanwords, especially in internet documents. The variant spellings of loanwords are idiosyncratic phenomena that are problematic not only for natural language processing applications, but also for second language learners, who become confused about the right spelling of a given loanword. This hampers their second language learning process. In this paper, to address this problem, we propose a finite-state methodology named Local-Grammar Graph (LGG) to describe and recognize these various spellings of loanwords. Local grammar graphs consist of two parts: the input and the output. We describe all possible variations as input paths of the finite-state graph and map them to the standard spelling of the word in its output path. One example of those graphs is the following: [graph provided at exhibit].

This graph can be used to describe and recognize all the possible variants of the loanword 파운데이션. It accounts for exactly 32 forms of the same word. LGGs are therefore more effective and less time-consuming than describing those variations one by one in list form. The Unitex system (Paumier 2003), which was developed to transform LGGs into finite-state transducers that can be integrated into e-learning systems, offers an adequate environment for this work. Finally, the methodology we present here may be applied to other languages.
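The idea of mapping many input paths to one output spelling can be sketched in Python. This is not the Unitex graph shown at the exhibit and does not use Unitex: the alternative segments in `SLOTS` are illustrative assumptions rather than attested variants, and the small graph below yields 8 paths, not the 32 forms of the actual graph.

```python
from itertools import product

# Each slot lists alternative spellings for one segment of the word.
# The alternatives are assumptions chosen only to illustrate branching.
SLOTS = [["파", "화"], ["운"], ["데", "대"], ["이"], ["션", "숀"]]
STANDARD = "파운데이션"  # output path: the standard spelling


def expand_paths(slots, standard):
    """Enumerate every input path and pair it with the standard output."""
    return {"".join(path): standard for path in product(*slots)}


def normalize(word, table):
    """Return the standard spelling if the word is a recognized variant."""
    return table.get(word)


table = expand_paths(SLOTS, STANDARD)
print(len(table))
print(normalize("화운데이숀", table))
print(normalize("파운드", table))
```

The number of recognized forms is the product of the slot sizes, which is why a compact graph covers far more variants than its size suggests, and why listing the forms one by one scales poorly.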


Cheon, S.-M. (2008). A study of English Loanwords In Korean. Seoul: KSi. ISBN 978-89-534-7946-3

Gross, Maurice (1997). “The Construction of Local Grammars”. In E. Roche & Y. Schabès (eds.) Finite-State Language Processing. Language, Speech, and Communication. Cambridge, Mass.: MIT Press. pp. 329-354.

Nam, J. S. & Choi, K. S. (1997). Local-grammar based approach to proper noun recognition. Beijing.

