ABSTRACT
We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications.
- Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory