Although the structure of texts and the way they are produced and understood are active areas of research in a range of disciplines (linguistics, psychology, artificial intelligence and information science), this research has not yet produced convincing computer-based solutions to the problem of producing abstracts of technical papers. Genuinely knowledge-based approaches are probably furthest from producing practical results, because of the amount of background knowledge that they presuppose and because of the difficulty in principle of finding appropriate representations and processes. After reviewing the kinds of structure in abstracts that seem pertinent to the abstracting problem, we discuss a rule-based approach that requires a relatively simple knowledge base, partitioned into two sets of rules: one for recognising textual fragments that explicitly signal the document topic, and another for detecting when referring expressions (especially pronouns) are linked to objects mentioned or evoked in a different sentence. These two rule sets are used in a method, pioneered by Paice at the University of Lancaster, of extracting sentences from a text to produce a paragraph which can function as an abstract. The second stage of the method is designed to ensure that sentences extracted in the first stage do not contain any references to parts of the text that have not been extracted, thus ensuring a minimal standard of coherence. In refining these methods, definite noun phrases (those beginning with ‘the’) pose a particular problem, which is being addressed in current joint work by Paice and the author.
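The two-stage extraction method outlined above can be illustrated with a minimal sketch. It is not the rule base discussed in the paper: the indicator patterns, the pronoun test and all function names below are hypothetical, chosen only for illustration. The sketch also assumes that a dangling reference is repaired by pulling in the immediately preceding sentence, which is one simple option among several.

```python
import re

# Hypothetical indicator patterns (stage 1); not the rules from the paper.
INDICATOR_PATTERNS = [
    re.compile(r"\b(this|the present) (paper|article|study|report)\b", re.I),
    re.compile(r"\bwe (describe|discuss|present|propose|report)\b", re.I),
]

# Hypothetical test (stage 2): a sentence opening with one of these pronouns
# is assumed to refer back to something in the preceding sentence.
ANAPHOR_START = re.compile(r"^(it|they|he|she|these|those)\b", re.I)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_indicative(sentences):
    """Stage 1: indices of sentences that match any indicator pattern."""
    return {
        i
        for i, s in enumerate(sentences)
        if any(p.search(s) for p in INDICATOR_PATTERNS)
    }


def resolve_dangling(sentences, selected):
    """Stage 2: add preceding sentences until no chosen sentence opens
    with a pronoun whose assumed antecedent sentence is missing."""
    chosen = set(selected)
    for i in sorted(selected):
        j = i
        while j > 0 and ANAPHOR_START.match(sentences[j]) and (j - 1) not in chosen:
            chosen.add(j - 1)
            j -= 1
    return chosen


def extract_abstract(text):
    """Run both stages and return the chosen sentences in document order."""
    sentences = split_sentences(text)
    chosen = resolve_dangling(sentences, select_indicative(sentences))
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    sample = (
        "Automatic methods exist. "
        "They are discussed in this paper. "
        "Results are mixed."
    )
    print(extract_abstract(sample))
```

In the sample text, the second sentence is selected because it contains "this paper", and the first is then added because the selected sentence opens with "They". Definite noun phrases are deliberately left unhandled in this sketch.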
MCB UP Ltd
Copyright © 1990, MCB UP Limited