Search results

1 – 10 of over 79,000
Article
Publication date: 1 February 2022

Dan Reynolds

Abstract

Purpose

Researchers and teachers have noted the power of students reading text sets, or multiple texts on the same topic, and numerous articles have been published with examples of and frameworks for text set construction. This study aims to trace the theoretical assumptions of these frameworks and to explore their distinct implications and tensions for understanding disciplinary literacy in English language arts (ELA).

Design/methodology/approach

The author draws on three frameworks, using a focal article for each: cognitive (Lupo et al., 2018), critical (Lechtenberg, 2018) and disciplinary (Levine et al., 2018), and connects those articles to other research studies in each tradition. First, the author describes each of the three text set frameworks’ design principles separately. Then, across frameworks, the author analyzes the disciplinary assumptions around each framework’s centering texts, epistemological goals and trajectories.

Findings

The centering text, goals and trajectories of each framework reflect its underlying epistemological lens. Each framework includes a text that serves as its epistemological center, and the cognitive and disciplinary frameworks both rely on progressions of complexity (knowledge/linguistic and literary, respectively). The author traces additional alignments and tensions between the frameworks and offers suggestions for possible hybridities in reading modality and reading volume.

Originality/value

Many articles have been written about models of text set construction, but few have compared the assumptions behind those models. Examining these assumptions may help English teachers and curriculum designers select texts and build curriculum that leverages the strengths of each model and informs researchers’ understanding of disciplinary literacy in ELA.

Details

English Teaching: Practice & Critique, vol. 21 no. 1
Type: Research Article
ISSN: 1175-8708

Article
Publication date: 1 January 1996

PETER INGWERSEN

Abstract

The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic full‐text entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data fusion in IR. By explicitly incorporating all the cognitive structures participating in the interactive communication processes during IR, the cognitive theory provides a comprehensive view of these processes. It encompasses the ad hoc theories of text retrieval and IR techniques hitherto developed in mainstream retrieval research. It has elements in common with van Rijsbergen and Lalmas' logical uncertainty theory and may be regarded as compatible with that conception of IR. Epistemologically speaking, the theory views IR interaction as processes of cognition, potentially occurring in all the information processing components of IR, that may be applied, in particular, to the user in a situational context. The theory draws upon basic empirical results from information seeking investigations in the operational online environment, and from mainstream IR research on partial matching techniques and relevance feedback. 
By viewing users, source systems, intermediary mechanisms and information in a global context, the cognitive perspective attempts a comprehensive understanding of essential IR phenomena and concepts, such as the nature of information needs, cognitive inconsistency and retrieval overlaps, logical uncertainty, the concept of ‘document’, relevance measures and experimental settings. An inescapable consequence of this approach is to rely more on sociological and psychological investigative methods when evaluating systems and to view relevance in IR as situational, relative, partial, differentiated and non‐linear. The lack of consistency among authors, indexers, evaluators or users is of an identical cognitive nature. It is unavoidable, and indeed favourable to IR. In particular, for full‐text retrieval, alternative semantic entities, including Salton et al.'s ‘passage retrieval’, are proposed to replace the traditional document record as the basic retrieval entity. These empirically observed phenomena of inconsistency and of semantic entities and values associated with data interpretation support strongly a cognitive approach to IR and the logical use of polyrepresentation, cognitive overlaps, and both data fusion and data diffusion.
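The theory is conceptual and the paper prescribes no implementation, but the core idea of cognitive overlaps can be sketched in a few lines: documents retrieved by several functionally or cognitively different representations of the same information space are, per the theory, the least uncertain candidates. All names and result sets below are hypothetical.

```python
from collections import Counter

def cognitive_overlap(result_sets):
    """Rank documents by how many distinct representations retrieved them.

    result_sets: mapping of representation name -> set of retrieved doc ids.
    Documents found by more representations lie in a deeper 'cognitive
    overlap' and, per the theory, carry less retrieval uncertainty.
    """
    counts = Counter()
    for docs in result_sets.values():
        counts.update(docs)
    # Sort by overlap depth (descending), then by doc id for stability.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical result sets from three representations of different origin.
runs = {
    "title_index": {"d1", "d2", "d3"},
    "full_text":   {"d2", "d3", "d4"},
    "citations":   {"d3", "d5"},
}
ranking = cognitive_overlap(runs)
# d3 is retrieved by all three representations, so it ranks first.
```

This mirrors the data-fusion intuition the abstract mentions, without claiming to reproduce the paper's formal structure of causality.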

Details

Journal of Documentation, vol. 52 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 26 July 2011

Stephen Paling

Abstract

Purpose

The purpose of this paper is to describe a conceptualization and two‐stage pilot study that explores ways in which fuzzy sets can be used to measure the indexability of literary texts.

Design/methodology/approach

Participants provided a subject description for each text in a series of literary and nonliterary texts. Each participant was also randomly assigned to one of three tasks: using a visual analog scale to rate the clarity of each text, using a visual analog scale to rate the confidence each participant felt in describing the subject of each text, or sorting the texts from most to least clear without the use of a visual analog scale. Nonparametric statistics and qualitative analysis were used to analyze the data.
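The study's analysis code is not published; as a standard-library sketch of the kind of nonparametric two-sample comparison described, the Mann-Whitney U statistic below compares hypothetical clarity ratings. The ratings, names and choice of this particular test are illustrative assumptions, not the study's actual data or procedure.

```python
def mann_whitney_u(a, b):
    """U statistic for sample `a` vs `b` (no p-value), using average
    ranks for ties: a common nonparametric two-sample comparison."""
    combined = sorted([(v, "a") for v in a] + [(v, "b") for v in b])
    n = len(combined)
    rank_sum_a = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2  # average of 1-based ranks i+1 .. j+1
        rank_sum_a += avg_rank * sum(1 for k in range(i, j + 1)
                                     if combined[k][1] == "a")
        i = j + 1
    n1 = len(a)
    return rank_sum_a - n1 * (n1 + 1) / 2

# Hypothetical clarity ratings on a 0-100 visual analog scale.
literary = [35, 42, 50, 38]
nonliterary = [70, 65, 80, 75]
u = mann_whitney_u(literary, nonliterary)
# u == 0.0 here: every literary rating fell below every nonliterary one.
```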

Findings

Participants and coders used the visual analog scales successfully. The participants perceived literary texts as less clear than nonliterary texts, and expressed less confidence in their subject descriptions of literary texts than in their descriptions of nonliterary texts. The study found preliminary support for the idea that fuzzy sets can provide a useful theoretical basis for examining the indexability of texts.

Originality/value

A measure of the indexability of literary texts could help provide sound theoretical guidance for construction of tools to organize those texts. A structured comparison of literary and nonliterary texts could help to build a theoretical base from which to make practical decisions about whether and how to perform subject analysis on each type of text.

Details

Journal of Documentation, vol. 67 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 29 January 2024

Kai Wang

Abstract

Purpose

Identifying network user relationships in Fancircle communities contributes to quantifying the violence index of user texts and to mining the internal correlations among users’ network behaviors, which provides necessary data support for the construction of a knowledge graph.

Design/methodology/approach

A correlation identification method based on sentiment analysis (CRDM-SA) is put forward, extracting user semantic information and introducing a violent sentiment membership degree. Specifically, topics for topology mapping in the community are obtained from user text information based on a self-built violent sentiment dictionary (VSD). Afterward, the violence index of the user text is calculated to quantify the fuzzy sentiment representation between the user and the topic. Finally, multi-granularity violence association rule mining of user texts is realized by constructing a violence fuzzy concept lattice.
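The abstract gives no formulas, so the following is only a plausible sketch of a dictionary-based violence index: the mean violent-intensity membership of a text's tokens under a small, hypothetical VSD. The dictionary entries, intensity grades and averaging rule are all assumptions, not the paper's actual definition.

```python
def violence_index(tokens, vsd):
    """Fuzzy membership of a user text to 'violent sentiment': the mean
    intensity (0..1) of its tokens under a violent sentiment dictionary.
    Tokens outside the dictionary contribute 0. Illustrative stand-in,
    not the CRDM-SA formula."""
    if not tokens:
        return 0.0
    return sum(vsd.get(t, 0.0) for t in tokens) / len(tokens)

# Hypothetical dictionary (term -> intensity grade) and tokenised post.
vsd = {"attack": 0.9, "destroy": 0.8, "hate": 0.7}
post = ["they", "attack", "and", "hate", "everyone"]
score = violence_index(post, vsd)  # (0.9 + 0.7) / 5 = 0.32
```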

Findings

The method helps reveal the internal relationships of online violence in a complex network environment, so that the sentiment dependence of users can be characterized from a granular perspective.

Originality/value

A violent sentiment membership degree is introduced into user relationship recognition in the Fancircle community, and a text sentiment association recognition method based on the VSD is proposed. By calculating violent sentiment values in user texts, violent sentiment is annotated along the topic dimension of the text, and the partial order relation between fuzzy concepts of violence under an effective confidence threshold is used to obtain the association relations.

Details

Data Technologies and Applications, vol. 58 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 8 December 2020

Matjaž Kragelj and Mirjana Kljajić Borštnar

Abstract

Purpose

The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.

Design/methodology/approach

The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.
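The study's actual model and features are not specified in the abstract; a minimal multinomial Naive Bayes over token counts, shown below with toy documents and hypothetical UDC class labels, illustrates the general shape of such a supervised text classifier.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesUDC:
    """Minimal multinomial Naive Bayes for assigning a (hypothetical)
    UDC class from text tokens, with Laplace (+1) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for c, n_c in self.class_counts.items():
            lp = math.log(n_c / total)  # class prior
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for t in tokens:
                lp += math.log((self.word_counts[c][t] + 1) / denom)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Toy corpus: tokenised texts with made-up UDC class labels.
train_docs = [["law", "court"], ["physics", "energy"], ["court", "justice"]]
train_udc = ["34", "53", "34"]
model = NaiveBayesUDC().fit(train_docs, train_udc)
pred = model.predict(["court", "law"])  # "34" (law-like vocabulary)
```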

Findings

Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.

Research limitations/implications

The main limitations of this study were unavailability of labelled older texts and the limited availability of librarians.

Practical implications

The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.

Social implications

The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and useable.

Originality/value

These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.

Details

Journal of Documentation, vol. 77 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 11 June 2018

Mike Thelwall

Abstract

Purpose

The purpose of this paper is to investigate whether machine learning induces gender biases in the sense of results that are more accurate for male authors or for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis.

Design/methodology/approach

This paper uses ratings-balanced sets of reviews of restaurants and hotels (3 sets) to train algorithms with and without gender selection.

Findings

Accuracy is higher on female-authored reviews than on male-authored reviews for all data sets, so applications of sentiment analysis using mixed-gender data sets will over-represent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders.
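The bias finding amounts to comparing classifier accuracy across author-gender subgroups. A minimal sketch of that evaluation step, with records fabricated purely for illustration (the field names and data are assumptions):

```python
def accuracy_by_group(records):
    """Per-group accuracy from (group, gold, predicted) records:
    a quick check for subgroup bias in a sentiment classifier."""
    hits, totals = {}, {}
    for group, gold, pred in records:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + (gold == pred)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical records: (author gender, gold polarity, predicted polarity).
records = [
    ("f", "pos", "pos"), ("f", "neg", "neg"),
    ("f", "pos", "pos"), ("f", "neg", "pos"),
    ("m", "pos", "pos"), ("m", "neg", "pos"),
    ("m", "pos", "neg"), ("m", "neg", "neg"),
]
acc = accuracy_by_group(records)  # {'f': 0.75, 'm': 0.5}
```

A gap between the two figures is exactly the kind of small, systematic difference the paper suggests end users should correct for.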

Practical implications

End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it and apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with.

Originality/value

This is the first demonstration of gender bias in machine learning sentiment analysis.

Details

Online Information Review, vol. 42 no. 3
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 February 1994

DAVID ELLIS, JONATHAN FURNER‐HINES and PETER WILLETT

Abstract

An important stage in the process of retrieval of objects from a hypertext database is the creation of a set of inter‐nodal links that are intended to represent the relationships existing between objects; this operation is often undertaken manually, just as index terms are often manually assigned to documents in a conventional retrieval system. Studies of conventional systems have suggested that a degree of consistency in the terms assigned to documents by indexers is positively associated with retrieval effectiveness. It is thus of interest to investigate the consistency of assignment of links in separate hypertext versions of the same full‐text document, since a measure of agreement may be related to the subsequent utility of the resulting hypertext databases. The calculation of values indicating the degree of similarity between objects is a technique that has been widely used in the fields of textual and chemical information retrieval; in this paper, we describe the application of arithmetic coefficients and topological indices to the measurement of the degree of similarity between the sets of inter‐nodal links in hypertext databases. We publish the results of a study in which several different sets of links are inserted, by different people, between the paragraphs of each of a number of full‐text documents. Our results show little similarity between the sets of links identified by different people; this finding is comparable with those of studies of inter‐indexer consistency, where it has been found that there is generally only a low level of agreement between the sets of index terms assigned to a document by different indexers.
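The arithmetic coefficients mentioned can be illustrated directly: treating each person's inter-nodal links as a set of (source, target) paragraph pairs, the Jaccard and Dice coefficients measure inter-linker consistency. The link sets below are hypothetical; the paper also uses topological indices, which are not sketched here.

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of inter-nodal links."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dice(a, b):
    """Dice coefficient, another arithmetic similarity coefficient."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical link sets inserted by two people between paragraphs.
linker1 = {(1, 3), (2, 5), (4, 6)}
linker2 = {(1, 3), (2, 4)}
j = jaccard(linker1, linker2)  # 1 shared link / 4 distinct links = 0.25
d = dice(linker1, linker2)     # 2 * 1 / (3 + 2) = 0.4
```

Low values like these across many linker pairs would reflect the paper's finding of little similarity between different people's link sets.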

Details

Journal of Documentation, vol. 50 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 May 2007

Fuchun Peng and Xiangji Huang

Abstract

Purpose

The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese, where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task.

Design/methodology/approach

Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and were applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation‐based approach was compared with the non‐segmentation‐based approach.
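One common way to obtain features without word segmentation is character n-grams, which a language-modeling classifier can consume directly. This is a generic illustration of the segmentation-free idea, not the paper's exact feature extraction:

```python
def char_ngrams(text, n=2):
    """Character n-grams as segmentation-free features for scripts
    written without word boundaries (e.g. Chinese, Japanese)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

bigrams = char_ngrams("東京大学", 2)  # ['東京', '京大', '大学']
```

Counting such n-grams per class yields the statistics a character-level language model needs, with no segmenter in the loop.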

Findings

There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy.

Practical implications

Apply the findings to real web text classification is ongoing work.

Originality/value

The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.

Details

Journal of Documentation, vol. 63 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 June 2001

Eero Sormunen, Jaana Kekäläinen, Jussi Koivisto and Kalervo Järvelin

Abstract

The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non‐relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept‐based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept‐based structures performed better than unexpanded queries or “natural language” queries. Further, it was shown that highly relevant documents benefit essentially more from the concept‐based QE in ranking than marginally relevant documents.
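A concept-based, structured expansion of the kind described (one disjunction of expressions per facet of the request, conjoined across facets) might look like the following sketch; the thesaurus, facets and query are invented for illustration and do not reproduce the paper's expansion method:

```python
def expand_query(facets, thesaurus):
    """Concept-based query expansion sketch: each facet (aspect of the
    request) is expanded with its known synonyms. The structured query
    is a conjunction across facets of disjunctions within each facet."""
    structured = []
    for facet in facets:
        terms = [facet] + thesaurus.get(facet, [])
        structured.append(terms)  # one OR-group per facet
    return structured

# Hypothetical request facets and thesaurus entries.
thesaurus = {"car": ["automobile", "vehicle"], "pollution": ["emissions"]}
q = expand_query(["car", "pollution"], thesaurus)
# [['car', 'automobile', 'vehicle'], ['pollution', 'emissions']]
# i.e. (car OR automobile OR vehicle) AND (pollution OR emissions)
```

Keeping the facet structure, rather than flattening all terms into one bag, is what distinguishes concept-based expansion from plain synonym stuffing.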

Details

Journal of Documentation, vol. 57 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 25 October 2022

Victor Diogho Heuer de Carvalho and Ana Paula Cabral Seixas Costa

Abstract

Purpose

This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is supporting analyses, so security authorities can make appropriate decisions about their actions.

Design/methodology/approach

The corpora were obtained through web scraping from a newspaper's website and tweets from a Brazilian metropolitan region. Natural language processing was applied considering: text cleaning, lemmatization, summarization, part-of-speech and dependencies parsing, named entities recognition, and topic modeling.
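As a small illustration of the text-cleaning step in such a pipeline (the later stages, lemmatization, parsing, NER and topic modeling, normally rely on NLP libraries), the sketch below strips URLs and user mentions from a tweet-like string. The regexes and example text are assumptions, not the authors' code:

```python
import re

def clean_tweet(text):
    """Minimal cleaning pass of the kind described: remove URLs and
    @mentions, collapse whitespace, lowercase. Emoji and abbreviation
    handling, which the paper flags as open problems, are omitted."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.lower()

cleaned = clean_tweet("@policia Assalto na Rua X! https://t.co/abc")
# "assalto na rua x!"
```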

Findings

Several results were obtained with the methodology used, including: an example of a summarization using an automated process; dependency parsing; the most common topics in each corpus; and the forty named entities and most common slogans extracted, highlighting those linked to public security.

Research limitations/implications

Some critical tasks were identified for the research perspective, related to the applied methodology: the treatment of noise from obtaining news on their source websites, passing through textual elements quite present in social network posts such as abbreviations, emojis/emoticons, and even writing errors; the treatment of subjectivity, to eliminate noise from irony and sarcasm; the search for authentic news of issues within the target domain. All these tasks aim to improve the process to enable interested authorities to perform accurate analyses.

Practical implications

The corpora dedicated to the public security domain enable several analyses, such as mining public opinion on security actions in a given location; understanding criminals' behaviors reported in the news or even on social networks and drawing their attitudes timeline; detecting movements that may cause damage to public property and people welfare through texts from social networks; extracting the history and repercussions of police actions, crossing news with records on social networks; among many other possibilities.

Originality/value

The corpora reported in this text represent one of the first initiatives to create textual bases in Portuguese dedicated specifically to Brazil's public security domain.

Details

Library Hi Tech, vol. 42 no. 4
Type: Research Article
ISSN: 0737-8831
