Search results

1 – 10 of over 1000

Article

Publication date: 20 November 2009

Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC

Downloads

891

Abstract

Purpose

The purpose of this paper is to present a topic classification model using the Dewey Decimal Classification (DDC) as the target scheme. This is to be done by exploring metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations. The reason is to reduce the effort of document processing in digital libraries. Further, the paper seeks to perform feature selection and extension by means of social ontologies and related web‐based lexical resources. This is done to provide reliable topic‐related classifications while circumventing the problem of data sparseness. Finally, the paper aims to evaluate the model by means of two language‐specific corpora. The paper bridges digital libraries, on the one hand, and computational linguistics, on the other. The aim is to make accessible computational linguistic methods to provide thematic classifications in digital libraries based on closed topic models such as the DDC.

Design/methodology/approach

The approach takes the form of text classification, text‐technology, computational linguistics, computational semantics, and social semantics.

Findings

It is shown that SVM‐based classifiers perform best by exploring certain selections of OAI document metadata.

Research limitations/implications

The findings show that it is necessary to further develop SVM‐based DDC‐classifiers by using larger training sets possibly for more than two languages in order to get better F‐measure values.

Originality/value

Algorithmic and formal‐mathematical information is provided on how to build DDC‐classifiers for digital libraries.

Details

Library Hi Tech, vol. 27 no. 4

Type: Research Article

ISSN: 0737-8831

Article

Publication date: 1 May 1969

LINGUISTICS AND TRANSLATION

R.R.K. HARTMAN

Downloads

405

Abstract

As an ‘applied linguist’ I look at the process of translation as one of the most interesting and fascinating interlingual operations we know, although we really don't know enough about it.

Details

Aslib Proceedings, vol. 21 no. 5

Type: Research Article

ISSN: 0001-253X

Article

Publication date: 9 March 2015

Differences over discourse structure differences: a reply to Urquhart and Urquhart

Jennie A. Abrahamson and Victoria L. Rubin

Downloads

302

Abstract

Purpose

The purpose of this paper is to respond to Urquhart and Urquhart’s critique of the previous work entitled “Discourse structure differences in lay and professional health communication”, published in this journal in 2012 (Vol. 68 No. 6, pp. 826-851, doi: 10.1108/00220411211277064).

Design/methodology/approach

The authors examine Urquhart and Urquhart’s critique and provide responses to their concerns and cautionary remarks against cross-disciplinary contributions. The authors reiterate the central claim.

Findings

The authors argue that Mann and Thompson’s (1987, 1988) Rhetorical Structure Theory (RST) offers valuable insights into computer-mediated health communication and deserves further discussion of its methodological strengths and weaknesses for application in library and information science.

Research limitations/implications

While the authors agree that some methodological limitations pointed out by Urquhart and Urquhart are valid, the authors take this opportunity to correct certain misunderstandings and misstatements.

Originality/value

The authors argue for continued use of innovative techniques borrowed from neighbouring disciplines, in spite of objections from the researchers accustomed to a familiar strand of literature. The authors encourage researchers to consider RST and other computational linguistics-based discourse analysis annotation frameworks that could provide the basis for integrated research, and eventual applications in information behaviour and information retrieval.

Details

Journal of Documentation, vol. 71 no. 2

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 12 October 2012

Discourse structure differences in lay and professional health communication

Jennie A. Abrahamson and Victoria L. Rubin

Downloads

1042

Abstract

Purpose

In this paper the authors seek to compare lay (consumer) and professional (physician) discourse structures in answers to diabetes‐related questions in a public consumer health information website.

Design/methodology/approach

Ten consumer and ten physician question threads were aligned. They generated 26 consumer and ten physician answers, constituting a total dataset of 717 discourse units (in sentences or sentence fragments). The authors depart from previous LIS health information behaviour research by utilizing a computational linguistics‐based theoretical framework of rhetorical structure theory, which enables research at the pragmatics level of linguistics in terms of the goals and effects of human communication.

Findings

The authors reveal differences in discourse organization by identifying prevalent rhetorical relations in each type of discourse. Consumer answers included predominately (66 per cent) presentational rhetorical structure relations, those intended to motivate or otherwise help a user do something (e.g. motivation, concession, and enablement). Physician answers included mainly subject matter relations (64 per cent), intended to inform, or simply transfer information to a user (e.g. elaboration, condition, and interpretation).

Research limitations/implications

The findings suggest different communicative goals expressed in lay and professional health information sharing. Consumers appear to be more motivating, or activating, and more polite (linguistically) than physicians in how they share information with consumers online in similar topics in diabetes management. The authors consider whether one source of information encourages adherence to healthy behaviour more effectively than another.

Originality/value

Analysing discourse structure – using rhetorical structure theory – is a novel and promising approach in information behaviour research, and one that traverses the lexico‐semantic level of linguistic analysis towards pragmatics of language use.

Details

Journal of Documentation, vol. 68 no. 6

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 1 June 2000

Linguistics: A Guide to the Reference Literature, 2nd edition

H.G.A. Hughes

Details

Reference Reviews, vol. 14 no. 6

Type: Research Article

ISSN: 0950-4125

Article

Publication date: 25 January 2011

Enabling distributed communication of manual skills

Stephen Fox, Patrick Ehlen and Matthew Purver

Downloads

1473

Abstract

Purpose

The purpose of this paper is to inform the development of mixed initiative systems for distributed digital communication of manual skills. In particular, manual skills that are essential in project production paradigms such as engineer‐to‐order.

Design/methodology/approach

Findings from survey research, which included literature review and interviews with practitioners, are reported. Literature review investigated media, strategies, and computation relevant to distributed digital communication of manual skills. Interviews investigated attitudes among industry practitioners towards distributed digital communication of manual skills.

Findings

Communication media, instructional strategies, and computational semantics techniques are available which can be integrated to address the limitations of human communication of manual skills.

Research limitations/implications

Only ten organizations were involved in interviews investigating attitudes towards distributed digital communication of manual skills.

Practical implications

Manual skills will continue to be important to project businesses involved in the production, refurbishment, and/or maintenance of large engineer‐to‐order products such as public buildings and process plants. The limitations of human communication can be addressed by using a variety of media, such as augmented reality headsets, to enable new instructional strategies, such as just‐in‐time training. Further, combinations of media and strategies can be integrated with computational semantics in the development of mixed initiative systems which provide feedback as well as initial instruction.

Originality/value

The originality of the research reported in this paper is that it addresses a full range of enablers for distributed communication of manual skills. Further, an overview of computational semantics is presented which does not rely on prior specialist knowledge. The value of this paper is that it introduces a framework for enabling distributed communication of manual skills. In addition, a preliminary ontology for distributed communication of manual skills is introduced, together with recommendations for implementation.

Details

International Journal of Managing Projects in Business, vol. 4 no. 1

Type: Research Article

ISSN: 1753-8378

Article

Publication date: 6 February 2017

Multi-granularity hierarchical topic-based segmentation of structured, digital library resources

Zhongyi Wang, Jin Zhang and Jing Huang

Downloads

443

Abstract

Purpose

Current segmentation systems almost invariably focus on linear segmentation and can only divide text into linear sequences of segments. This suits cohesive text such as news feed but not coherent texts such as documents of a digital library which have hierarchical structures. To overcome the focus on linear segmentation in document segmentation and to realize the purpose of hierarchical segmentation for a digital library’s structured resources, this paper aimed to propose a new multi-granularity hierarchical topic-based segmentation system (MHTSS) to decide section breaks.

Design/methodology/approach

MHTSS adopts an up-down segmentation strategy to divide a structured, digital library document into a document segmentation tree. Specifically, it works in three stages: document parsing, coarse segmentation based on document access structures, and fine-grained segmentation based on lexical cohesion.

Findings

This paper analyzed the limitations of document segmentation methods for structured, digital library resources. The authors found that document access structures and lexical cohesion techniques should complement each other and allow for a better segmentation of structured, digital library resources. Based on this finding, this paper proposed the MHTSS for structured, digital library resources. To evaluate it, MHTSS was compared to the TT and C99 algorithms on real-world digital library corpora. The comparison showed that MHTSS achieves the top overall performance.

Practical implications

With MHTSS, digital library users can get their relevant information directly in segments instead of receiving the whole document. This will improve retrieval performance as well as dramatically reduce information overload.

Originality/value

This paper proposed MHTSS for the structured, digital library resources, which combines the document access structures and lexical cohesion techniques to decide section breaks. With this system, end-users can access a document by sections through a document structure tree.

Details

The Electronic Library, vol. 35 no. 1

Type: Research Article

ISSN: 0264-0473

Article

Publication date: 8 May 2017

Competitiveness analysis through comparative relation mining: Evidence from restaurants’ online reviews

Hongwei Wang, Song Gao, Pei Yin and James Nga-Kwok Liu

Downloads

1367

Abstract

Purpose

Comparative opinions widely exist in online reviews as a common way of expressing consumers’ ideas or preferences toward certain products. Such opinion-rich texts are key proxies for detecting product competitiveness. The purpose of this paper is to set up a model for competitiveness analysis by identifying comparative relations from online reviews for restaurants based on both pattern matching and machine learning.

Design/methodology/approach

The authors define the sub-category of comparative sentences according to Chinese linguistics. Classification rules are set up for each type of comparative relations through class sequence rule. To improve the accuracy of classification, a comparative entity dictionary is then introduced for further identifying comparative sentences. Finally, the authors collect reviews for restaurants from Dianping.com to conduct experiments for testing the proposed model.

Findings

The experiments show that the proposed method outperforms the baseline methods in terms of precision in identifying comparative sentences. On the basis of such comparison-rich sentences, product features and comparative relations are extracted for sentiment analysis, and sentimental score is assigned to each comparative relation to facilitate competitiveness analysis.

Research limitations/implications

Only explicit comparative relations are discussed; implicit ones are neglected. In addition, the study is grounded in the assumption that all features are homogeneous. In some cases, however, different aspects are not of equal importance to the market.

Practical implications

On the basis of comparative relation mining, product features and comparative opinions are extracted for competitiveness analysis, which is of interest to businesses for finding weakness or strength of products, as well as to consumers for making better purchase decisions.

Social implications

Comparative relation mining could be possibly applied in social media for identifying relations among users or products, and ranking users or products, as well as helping companies target and track competitors to enhance competitiveness.

Originality/value

The authors propose a research framework for restaurant competitiveness analysis by mining comparative relations from online consumer reviews. The results would be able to differentiate one restaurant from another in some aspects of interest to consumers, and reveal the changes in these differences over time.

Details

Industrial Management & Data Systems, vol. 117 no. 4

Type: Research Article

ISSN: 0263-5577

Article

Publication date: 3 November 2020

Hate speech detection in Twitter using hybrid embeddings and improved cuckoo search-based neural networks

Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu and Idowu Ademola Osinuga

Downloads

478

Abstract

Purpose

Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection with social media data has received special research attention in recent studies; hence the need to design a generic metadata architecture and an efficient feature extraction technique to enhance hate speech detection.

Design/methodology/approach

This study proposes hybrid embeddings enhanced with a topic inference method, together with an improved cuckoo search neural network, for hate speech detection in Twitter data. The proposed method uses a hybrid embeddings technique that combines Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction with Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The features extracted from the hybrid embeddings then serve as input to the improved cuckoo search neural network, which predicts whether a tweet is hate speech, offensive language or neither.
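As a rough illustration of the word-level TF-IDF step mentioned in this abstract, the following is a minimal sketch only. The function name `tf_idf`, the toy tweets, and the exact weighting (relative term frequency times the natural log of inverse document frequency) are assumptions made for illustration; the abstract does not state which TF-IDF variant the authors used, and the LSTM and cuckoo search components are not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: weight = (count / doc length) * ln(N / doc frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up input; real tweets would need proper tokenization.
tweets = [
    "you are awful".split(),
    "you are kind".split(),
    "what a kind day".split(),
]
vectors = tf_idf(tweets)
```

In this toy input, a term that occurs in only one tweet ("awful") receives a higher weight than a term shared by two tweets ("you"), which is the property that makes TF-IDF useful as a word-level feature.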

Findings

The proposed method showed better results when tested on the collected Twitter datasets compared to other related methods. In order to validate the performances of the proposed method, t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with other related methods for hate speech detection. Furthermore, Paired Sample t-Test was also conducted to validate the performances of the proposed method with other related methods.

Research limitations/implications

Finally, the evaluation results showed that the proposed method outperforms other related methods with mean F1-score of 91.3.

Originality/value

The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4

Type: Research Article

ISSN: 1756-378X

Article

Publication date: 2 October 2018

The dark side of news community forums: opinion manipulation trolls

Todor Mihaylov, Tsvetomila Mihaylova, Preslav Nakov, Lluís Màrquez, Georgi D. Georgiev and Ivan Kolev Koychev

Downloads

1194

Abstract

Purpose

The purpose of this paper is to explore the dark side of news community forums: the proliferation of opinion manipulation trolls. In particular, it explores the idea that a user who is called a troll by several people is likely to be one. It further demonstrates the utility of this idea for detecting accused and paid opinion manipulation trolls and their comments as well as for predicting the credibility of comments in news community forums.

Design/methodology/approach

The authors aim to build a classifier to distinguish trolls from regular users. Unfortunately, it is not easy to get reliable training data. The authors solve this issue pragmatically: they assume that a user who is called a troll by several people is likely to be one; such users are called accused trolls. Based on this assumption and on leaked reports about actual paid opinion manipulation trolls, the authors build a classifier to distinguish trolls from regular users.
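The labeling assumption described in this abstract (a user called a troll by several people is treated as an accused troll) could be sketched roughly as follows. This is a hedged illustration, not the authors' procedure: the function name `accused_trolls`, the `(accuser, accused)` pair format, and the threshold of three distinct accusers are all assumptions, since the abstract only says "several people".

```python
from collections import defaultdict


def accused_trolls(accusations, min_accusers=3):
    """Return users accused of trolling by at least `min_accusers` distinct users.

    `accusations` is an iterable of (accuser, accused) pairs. The default
    threshold is an illustrative assumption, not a value from the paper.
    """
    accusers = defaultdict(set)
    for accuser, accused in accusations:
        if accuser != accused:  # ignore self-accusations
            accusers[accused].add(accuser)
    return {user for user, who in accusers.items() if len(who) >= min_accusers}


# Made-up example data.
example = [
    ("ann", "tom"), ("bob", "tom"), ("cat", "tom"),
    ("ann", "uma"), ("ann", "uma"),
]
labels = accused_trolls(example)
```

Counting distinct accusers rather than raw accusations means that one user repeating the same accusation does not push anyone over the threshold; the resulting labels could then serve as weak training data for a classifier.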

Findings

The authors compare the profiles of paid trolls vs accused trolls vs non-trolls, and show that a classifier trained to distinguish accused trolls from non-trolls does quite well also at telling apart paid trolls from non-trolls.

Research limitations/implications

The troll detection works even for users with about 10 comments, but it achieves the best performance for users with a sizable number of comments in the forum, e.g. 100 or more. Yet, there is no such limitation for troll comment detection.

Practical implications

The approach would help forum moderators in their work, by pointing them to the most suspicious users and comments. It would be also useful to investigative journalists who want to find paid opinion manipulation trolls.

Social implications

The authors can offer a better experience to online users by filtering out opinion manipulation trolls and their comments.

Originality/value

The authors propose a novel approach for finding paid opinion manipulation trolls and their posts.

Details

Internet Research, vol. 28 no. 5

Type: Research Article

ISSN: 1066-2243
