Search results

1 – 10 of over 65000
Article
Publication date: 1 February 1992

ROBERT N. ODDY, ELIZABETH DUROSS LIDDY, BHASKARAN BALAKRISHNAN, ANN BISHOP, JOSEPH ELEWONONI and EILEEN MARTIN

This paper is an exploratory study of one approach to incorporating situational information into information retrieval systems, drawing on principles and methods of discourse…

Abstract

This paper is an exploratory study of one approach to incorporating situational information into information retrieval systems, drawing on principles and methods of discourse linguistics. A tenet of discourse linguistics is that texts of a specific type possess a structure above the syntactic level, which follows conventions known to the people using such texts to communicate. In some cases, such as literature describing work done, the structure is closely related to situations, and may therefore be a useful representational vehicle for the present purpose. Abstracts of empirical research papers exhibit a well‐defined discourse‐level structure, which is revealed by lexical clues. Two methods of detecting the structure automatically are presented: (i) a Bayesian probabilistic analysis; and (ii) a neural network model. Both methods show promise in preliminary implementations. A study of users' oral problem statements indicates that they are not amenable to the same kind of processing. However, from in‐depth interviews with users and search intermediaries, the following conclusions are drawn: (i) the notion of a generic research script is meaningful to both users and intermediaries as a high‐level description of situation; (ii) a researcher's position in the script is a predictor of the relevance of documents; and (iii) currently, intermediaries can make very little use of situational information. The implications of these findings for system design are discussed, and a system structure presented to serve as a framework for future experimental work on the factors identified in this paper. The design calls for a dialogue with the user on his or her position in a research script and incorporates features permitting discourse‐level components of abstracts to be specified in search strategies.
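As an illustration of the first detection method mentioned above, the sketch below assigns discourse-level component labels to abstract sentences from lexical clues using a naive Bayes classifier. It is a minimal sketch only, not the authors' implementation: the component labels, training sentences and add-one smoothing constant are assumptions made for the example.

```python
# Minimal sketch (not the authors' system): a naive Bayes classifier that labels
# abstract sentences with hypothetical discourse-level components using lexical clues.
from collections import Counter, defaultdict
import math

TRAIN = [
    ("little is known about situational relevance", "background"),
    ("we interviewed users and search intermediaries", "method"),
    ("a bayesian analysis was applied to abstract sentences", "method"),
    ("results show that lexical clues reveal structure", "result"),
    ("these findings suggest implications for system design", "conclusion"),
]

def train(samples):
    priors, word_counts, totals = Counter(), defaultdict(Counter), Counter()
    for text, label in samples:
        priors[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            totals[label] += 1
    return priors, word_counts, totals

def classify(sentence, priors, word_counts, totals, vocab_size=1000):
    best_label, best_score = None, float("-inf")
    n = sum(priors.values())
    for label in priors:
        # log P(label) + sum of log P(word | label) with add-one smoothing
        score = math.log(priors[label] / n)
        for w in sentence.split():
            score += math.log((word_counts[label][w] + 1) /
                              (totals[label] + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("results indicate that structure predicts relevance", *model))
```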

Details

Journal of Documentation, vol. 48 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 May 1997

María J. López‐Huertas

The need for thesauri to help users in their search for information in online information systems has been discussed for several decades. Many wide‐ranging contributions have been…


Abstract

The need for thesauri to help users in their search for information in online information systems has been discussed for several decades. Many wide‐ranging contributions have been made to solve this problem. Nevertheless, investigation is needed to design a thesaurus structure based on what is relevant for users and generators of information within a specific subject domain. This paper explores the possibility of creating a thesaurus from the cognitive viewpoint. This approach is based on a system (in this case represented by a thesaurus) that organises its representation of knowledge or its classification as closely as possible to the authors’ and users’ images of the subject domain with the objective of increasing the interaction between users and texts, and thus the communication in a given information retrieval system. From this point of view, the thesaurus structure is considered as the essential foundation on which to base such an interactive thesaurus. Furthermore, this structure is conceived as representing the merging point for both the generators’ and the users’ models of the subject domain and for their information needs. This paper is dedicated mainly to the generators’ side involved in this process. It demonstrates how an author’s writings can be used to identify the generators’ model and perception of the subject domain, and how these can later be inserted in the thesaurus structure. Discourse analysis is used as a main method to identify the categories, and its relevance for building such a structure is discussed. It also outlines a general approach for the user side to set up different methods of getting the users’ information needs into the thesaurus structure.

Details

Journal of Documentation, vol. 53 no. 2
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 10 July 2019

(Mark) Feng Teng

This study aims to examine the writing outcomes of 6th-grade students learning English as a second language.


Abstract

Purpose

This study aims to examine the writing outcomes of 6th-grade students learning English as a second language.

Design/methodology/approach

In all, 45 students in a text structure instruction (TSI) group were compared with 45 students in a self-regulated strategy instruction (SRSI) group and 43 students receiving traditional writing instruction. SRSI was adapted from the self-regulated strategy development (SRSD) model (MacArthur et al., 2015). The SRSD model includes self-regulation writing strategies, text and genre knowledge and think-aloud modeling. Findings allowed for a comparison of TSI and SRSI, in which organization knowledge does not need to be taught using SRSD methods. Measures of writing outcomes, including writing quality and summarization of main ideas, were administered after a one-month intervention.

Findings

Results revealed that, compared with traditional instruction, the TSI and SRSI groups each exhibited better writing outcomes. Compared with the traditional instruction group, each technique had a unique impact: SRSI on writing quality, and TSI on main ideas included in written summaries. Linguistic and textual analyses of students’ writing revealed that the TSI and SRSI group learners both demonstrated high syntactic complexity, content organization and lexical variation in their compositions.

Research limitations/implications

The present study provides empirical evidence that explicit teaching of SRSI writing strategies or TSI can be implemented effectively and elicit gains in elementary school L2 learners’ written output. A clear division does not exist between self-regulated writing strategies and text structure knowledge; the two techniques should be complementary, as suggested in the earlier SRSD model.

Originality/value

Classroom-based research has addressed the need to enhance self-regulated capacity in writing. However, writing has become more challenging for primary school learners: it is a cognitively demanding process, and the plethora of processes involved may be one of the factors causing difficulties in writing. Thus, writing proficiency relies on the development of text structure knowledge and the fostering of self-regulation capabilities.

Details

English Teaching: Practice & Critique, vol. 18 no. 3
Type: Research Article
ISSN: 1175-8708


Article
Publication date: 1 February 1994

MARÍA PINTO MOLINA

Content analysis, restricted within the limits of written textual documents (wtdca), is a field which is greatly in need of extensive interdisciplinary research. This would…

Abstract

Content analysis, restricted within the limits of written textual documents (wtdca), is a field which is greatly in need of extensive interdisciplinary research. This would clarify certain concepts, especially those concerned with ‘text’, as a new central nucleus of semiotic research, and ‘content’, or the informative power of text. The objective reality (syntax) of the written document should be, in the cognitive process that all content analysis entails, interpreted (semantically and pragmatically) in an intersubjective manner with regard to the context, the analyst's knowledge base and the documentary objectives. The contributions of semiolinguistics (textual), logic (formal) and psychology (cognitive) are fundamental to the conduct of these activities. The criteria used to validate the results obtained complete the necessary conceptual reference panorama.

Details

Journal of Documentation, vol. 50 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 October 2000

Marie‐Francine Moens and Jos Dumortier

Browsing a database of article abstracts is one way to select and buy relevant magazine articles online. Our research contributes to the design and development of text grammars…

Abstract

Browsing a database of article abstracts is one way to select and buy relevant magazine articles online. Our research contributes to the design and development of text grammars for abstracting texts in unlimited subject domains. We developed a system that parses texts based on the text grammar of a specific text type and that extracts sentences and statements which are relevant for inclusion in the abstracts. The system employs knowledge of the discourse patterns that are typical of news stories. The results are encouraging and demonstrate the importance of discourse structures in text summarisation.
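The abstract does not reproduce the text grammar itself, so the sketch below only illustrates the general idea of extraction guided by discourse conventions of news stories: lead sentences carry the gist, and cue phrases mark attributed statements. The cue list, weights and position heuristic are invented for the example and are not the authors' grammar.

```python
# Illustrative sketch only: score sentences by position and genre cue phrases,
# then extract the top-scoring ones in original order.
import re

CUE_PHRASES = ("said", "announced", "according to", "reported")

def extract_abstract(text, max_sentences=2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for position, sentence in enumerate(sentences):
        score = 1.0 / (position + 1)                      # earlier sentences weigh more
        if any(cue in sentence.lower() for cue in CUE_PHRASES):
            score += 0.5                                  # attributed statement bonus
        scored.append((score, position, sentence))
    top = sorted(scored, reverse=True)[:max_sentences]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]  # restore document order

story = ("The city council approved the new library budget on Monday. "
         "Mayor Lee said the plan would double digital holdings. "
         "Construction details will be settled later this year.")
print(extract_abstract(story))
```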

Details

Journal of Documentation, vol. 56 no. 5
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 16 July 2021

Young Man Ko, Min Sun Song and Seung Jun Lee

This study aims to develop metadata of conceptual elements based on the text structure of research articles on Korean studies, to propose a search algorithm that reflects the…

Abstract

Purpose

This study aims to develop metadata of conceptual elements based on the text structure of research articles on Korean studies, to propose a search algorithm that reflects the combination of semantically relevant data in accordance with the search intention for research papers, and to examine whether the algorithm makes a difference in intention-based search results.

Design/methodology/approach

This study constructed a metadata database of 5,007 research articles on Korean studies arranged by conceptual elements of text structure and developed the F1(w)-score, which weights conceptual elements based on the F1-score and the number of data points from each element. This study evaluated the algorithm by comparing search results of the F1(w)-score algorithm with those of the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm and of simple keyword search.
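The exact F1(w) formula is not given in this abstract, so the sketch below shows one plausible reading under stated assumptions: an F1-style term-overlap score is computed between the query and each conceptual element of an article's metadata, and each element is weighted by the share of data points it contributes. The element names and example data are hypothetical.

```python
# Hedged sketch of an element-weighted F1-style score; not the paper's formula.
def f1(query_terms, element_terms):
    query_terms, element_terms = set(query_terms), set(element_terms)
    if not query_terms or not element_terms:
        return 0.0
    overlap = len(query_terms & element_terms)
    if overlap == 0:
        return 0.0
    precision = overlap / len(element_terms)
    recall = overlap / len(query_terms)
    return 2 * precision * recall / (precision + recall)

def f1_weighted(query_terms, article_elements):
    """article_elements: {conceptual element name: list of terms} for one article."""
    total_terms = sum(len(t) for t in article_elements.values()) or 1
    score = 0.0
    for element, terms in article_elements.items():
        weight = len(terms) / total_terms        # weight by data points per element
        score += weight * f1(query_terms, terms)
    return score

article = {
    "research_object": ["korean", "ceramics", "joseon"],
    "research_method": ["archival", "analysis"],
    "research_result": ["chronology", "revised"],
}
print(f1_weighted(["joseon", "ceramics"], article))  # overlap with research_object dominates
```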

Findings

The authors find that the higher the F1(w)-score, the closer the result is in semantic relevance to the search intention. Furthermore, F1(w)-score-generated search results were more closely related to the search intention than those of TF-IDF and simple keyword search.

Research limitations/implications

Even though the F1(w)-score was developed in this study to evaluate search results from a metadata database structured by the conceptual elements of the text structure of Korean studies articles, the algorithm can also be used as a tool for searching other databases, provided the element weights are tuned accordingly.

Practical implications

A metadata database based on text structure and a search method based on weights of metadata elements – F1(w)-score – can be useful for interdisciplinary studies, especially for semantic search in regional studies.

Originality/value

This paper presents a methodology for supporting IR using F1(w)-score—a novel model for weighting metadata elements based on text structure. The F1(w)-score-based search results show the combination of semantically relevant data, which are otherwise difficult to search for using similarity of search words.

Details

The Electronic Library, vol. 39 no. 5
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 8 August 2008

Miguel A. Martínez‐Prieto, Pablo de la Fuente, Jesús M. Vegas, Joaquín Adiego and Carlos E. Cuesta

This paper aims to present the concept of the electronic work, an e‐book that integrates concerns (logical structure, appearance and functionality), for representing literary…


Abstract

Purpose

This paper aims to present the concept of the electronic work, an e‐book that integrates concerns (logical structure, appearance and functionality), for representing literary texts available in heterogeneous electronic environments.

Design/methodology/approach

From the generic description of an e‐book and the descriptive requirements of the BiDiLiC project, the concept of electronic work is presented. These requirements involve a descriptive markup policy (based on TEI‐Lite) which defines the text's logical structure and is used for integrating the other concerns associated with the text: functionality and appearance. Finally, the article presents an example showing the integration of the previous concepts to achieve a functional implementation of the electronic work.
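As a small illustration of how TEI-Lite-style descriptive markup can expose a text's logical structure to further processing, the sketch below walks an invented chapter-level fragment with Python's standard XML parser. It is a simplified sketch, not the BiDiLiC markup policy or implementation.

```python
# Simplified sketch: read chapter-level logical structure from a TEI-like fragment.
import xml.etree.ElementTree as ET

TEI_SAMPLE = """
<text>
  <body>
    <div type="chapter" n="1">
      <head>Chapter One</head>
      <p>It was a dark and stormy night.</p>
    </div>
    <div type="chapter" n="2">
      <head>Chapter Two</head>
      <p>The rain had stopped by morning.</p>
    </div>
  </body>
</text>
"""

def logical_structure(tei_xml):
    """Return (chapter number, heading, paragraph count) for each chapter div."""
    root = ET.fromstring(tei_xml)
    outline = []
    for div in root.iter("div"):
        if div.get("type") == "chapter":
            head = div.findtext("head", default="")
            paragraphs = len(div.findall("p"))
            outline.append((div.get("n"), head, paragraphs))
    return outline

print(logical_structure(TEI_SAMPLE))
```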

Findings

The electronic work covers the requirements of classic literary texts, while still allowing other types of texts to be represented easily. For this purpose, a robust logical structure based on TEI is defined, which offers an interchange standard for information stored in electronic form. This representation, developed in XML, allows the logical structure of the text to be described generically, facilitating the integration (around it) of the service's functionality, as well as adapting its appearance for use in heterogeneous environments, such as the internet.

Originality/value

This paper proposes a new approach for interacting with electronic content. This approach is presented from conceptual basis to functional representation by way of theoretical reasoning and innovative technology.

Details

The Electronic Library, vol. 26 no. 4
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 7 January 2020

Omri Suissa, Avshalom Elmalech and Maayan Zhitomirsky-Geffet

Digitization of historical documents is a challenging task in many digital humanities projects. A popular approach for digitization is to scan the documents into images, and then…

Abstract

Purpose

Digitization of historical documents is a challenging task in many digital humanities projects. A popular approach for digitization is to scan the documents into images, and then convert images into text using optical character recognition (OCR) algorithms. However, the outcome of OCR processing of historical documents is usually inaccurate and requires post-processing error correction. The purpose of this paper is to investigate how crowdsourcing can be utilized to correct OCR errors in historical text collections, and which crowdsourcing methodology is the most effective in different scenarios and for various research objectives.

Design/methodology/approach

A series of experiments with different micro-task structures and text lengths was conducted with 753 workers on Amazon’s Mechanical Turk platform. The workers had to fix OCR errors in a selected historical text. To analyze the results, new accuracy and efficiency measures were devised.
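The paper's own accuracy and efficiency measures are not spelled out in this abstract. As a stand-in, the sketch below scores a worker's corrected text against a ground-truth transcription with a conventional character error rate (Levenshtein distance normalised by reference length); it is illustrative only and not the authors' measure.

```python
# Illustrative stand-in metric for OCR post-correction quality.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_error_rate(corrected, ground_truth):
    return levenshtein(corrected, ground_truth) / max(len(ground_truth), 1)

# One uncorrected character out of 19 -> roughly 0.053
print(character_error_rate("The qu1ck brown fox", "The quick brown fox"))
```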

Findings

The analysis suggests that, in terms of accuracy, the optimal text length is medium (paragraph-size) and the optimal structure of the experiment is two-phase with a scanned image. In terms of efficiency, the best results were obtained when using longer text in the single-stage structure with no image.

Practical implications

The study provides practical recommendations to researchers on how to build the optimal crowdsourcing task for OCR post-correction. The developed methodology can also be utilized to create golden standard historical texts for automatic OCR post-correction.

Originality/value

This is the first attempt to systematically investigate the influence of various factors on crowdsourcing-based OCR post-correction and propose an optimal strategy for this process.

Details

Aslib Journal of Information Management, vol. 72 no. 2
Type: Research Article
ISSN: 2050-3806


Article
Publication date: 31 October 2023

Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang

The number of construction dispute cases has maintained a high growth trend in recent years. The effective exploration and management of construction contract risk can directly…

Abstract

Purpose

The number of construction dispute cases has maintained a high growth trend in recent years. The effective exploration and management of construction contract risk can directly promote the overall performance of the project life cycle. Missing clauses may result in a failure to match standard contracts, and if a contract modified by the owner omits key clauses, potential disputes may lead to contractors paying substantial compensation. To date, the identification of missing clauses in construction project contracts has relied heavily on manual review, which is inefficient and highly dependent on personnel experience, while existing intelligent tools only support contract query and storage. There is thus an urgent need to raise the level of intelligence in contract clause management. Therefore, this paper aims to propose an intelligent method for detecting missing clauses in construction project contracts based on Natural Language Processing (NLP) and deep learning technology.

Design/methodology/approach

A complete classification scheme of contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured digital vector form. Following this initial categorization, a multi-label classification of long-text construction contract clauses is designed to preliminarily identify whether clause labels are missing. After the multi-label missing-clause detection, the authors implement a clause similarity algorithm that creatively integrates an image-detection-inspired model, MatchPyramid, with BERT to identify missing substantive content in the contract clauses.
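As a schematic of the two-step detection flow described above, and not the authors' BERT/MatchPyramid system, the sketch below first flags clause types absent from a classifier's predicted label set and then scores the remaining clauses against standard wording, using a simple bag-of-words cosine similarity as a stand-in for the learned similarity model. The required labels, example clauses and threshold are invented.

```python
import math
from collections import Counter

# Hypothetical checklist of clause types a standard contract must contain.
REQUIRED_LABELS = {"payment", "quality", "schedule", "dispute_resolution"}

def missing_labels(predicted_labels):
    """Step 1: clause types absent from the multi-label classifier's output."""
    return REQUIRED_LABELS - set(predicted_labels)

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity (stand-in for a learned similarity model)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values())) *
            math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def flag_deficient_clauses(contract_clauses, standard_clauses, threshold=0.5):
    """Step 2: clauses whose best match against standard wording is too weak."""
    return [label for label, text in contract_clauses.items()
            if max(cosine_similarity(text, s) for s in standard_clauses) < threshold]

contract = {"payment": "the owner shall pay the contractor monthly progress payments",
            "quality": "work shall be performed"}
standard = ["the owner shall pay the contractor monthly progress payments on approval",
            "all work shall conform to the specified quality standards and codes"]
print(missing_labels(contract))                    # e.g. {'schedule', 'dispute_resolution'}
print(flag_deficient_clauses(contract, standard))  # e.g. ['quality']
```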

Findings

In total, 1,322 construction project contracts were tested. Results showed that the accuracy of multi-label classification reached 93%, the accuracy of similarity matching reached 83%, and the recall rate and mean F1 of both exceeded 0.7. The experimental results verify, to some extent, the feasibility of intelligently detecting contract risk through the NLP-based method.

Originality/value

NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, most existing approaches that apply it to risk detection in construction contract clauses are rule-based and encounter challenges when handling intricate and lengthy engineering contracts. This paper introduces an NLP technique based on deep learning which reduces manual intervention and can autonomously identify and tag types of contractual deficiencies, aligning with the evolving complexities anticipated in future construction contracts. Moreover, this method achieves recognition of extended contract clause texts. Finally, the approach is versatile: users simply need to adjust parameters such as segmentation based on language categories to detect omissions in contract clauses of diverse languages.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988


Article
Publication date: 1 October 2003

Maria Pinto

The technological revolution is affecting the structure, form and content of documents, reducing the effectiveness of traditional abstracts that, to some extent, are inadequate to…


Abstract

The technological revolution is affecting the structure, form and content of documents, reducing the effectiveness of traditional abstracts which, to some extent, are inadequate to the new documentary conditions. Aims to show the directions in which abstracting/abstracts can evolve to achieve the necessary adequacy in the new digital environments. Three research trends are proposed: theoretical, methodological and pragmatic. Theoretically, there is a need to expand the document concept, reengineer abstracting and design interdisciplinary models. Methodologically, the trend is toward the structuring, automating and qualifying of abstracts. Pragmatically, abstract networking, combined with alternative and complementary models, opens a new and promising horizon. Automating, structuring and qualifying abstracting/abstracts offer some short‐term prospects for progress. Concludes that reengineering, networking and visualising would be fruitful medium‐term areas of research toward the full adequacy of abstracting in the new electronic age.

Details

Journal of Documentation, vol. 59 no. 5
Type: Research Article
ISSN: 0022-0418

