Search results
Alexander Mehler and Ulli Waltinger
Abstract
Purpose
The purpose of this paper is to present a topic classification model using the Dewey Decimal Classification (DDC) as the target scheme. This is done by exploring metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations, which reduces the effort of document processing in digital libraries. Further, the paper seeks to perform feature selection and extension by means of social ontologies and related web‐based lexical resources. This is done to provide reliable topic‐related classifications while circumventing the problem of data sparseness. Finally, the paper aims to evaluate the model by means of two language‐specific corpora. The paper bridges digital libraries, on the one hand, and computational linguistics, on the other. The aim is to make computational linguistic methods accessible for providing thematic classifications in digital libraries based on closed topic models such as the DDC.
Design/methodology/approach
The approach draws on text classification, text‐technology, computational linguistics, computational semantics, and social semantics.
Findings
It is shown that SVM‐based classifiers perform best by exploring certain selections of OAI document metadata.
Research limitations/implications
The findings show that it is necessary to further develop SVM‐based DDC‐classifiers by using larger training sets, possibly for more than two languages, in order to get better F‐measure values.
Originality/value
Algorithmic and formal‐mathematical information is provided on how to build DDC‐classifiers for digital libraries.
AS an ‘applied linguist’ I look at the process of translation as one of the most interesting and fascinating interlingual operations we know—although we really don't know enough…
Jennie A. Abrahamson and Victoria L. Rubin
Abstract
Purpose
The purpose of this paper is to respond to Urquhart and Urquhart’s critique of the previous work entitled “Discourse structure differences in lay and professional health communication”, published in this journal in 2012 (Vol. 68 No. 6, pp. 826-851, doi: 10.1108/00220411211277064).
Design/methodology/approach
The authors examine Urquhart and Urquhart’s critique and provide responses to their concerns and cautionary remarks against cross-disciplinary contributions. The authors reiterate the central claim.
Findings
The authors argue that Mann and Thompson’s (1987, 1988) Rhetorical Structure Theory (RST) offers valuable insights into computer-mediated health communication and deserves further discussion of its methodological strengths and weaknesses for application in library and information science.
Research limitations/implications
While the authors agree that some methodological limitations pointed out by Urquhart and Urquhart are valid, the authors take this opportunity to correct certain misunderstandings and misstatements.
Originality/value
The authors argue for continued use of innovative techniques borrowed from neighbouring disciplines, in spite of objections from researchers accustomed to a familiar strand of literature. The authors encourage researchers to consider RST and other computational linguistics-based discourse analysis annotation frameworks that could provide the basis for integrated research, and eventual applications in information behaviour and information retrieval.
Jennie A. Abrahamson and Victoria L. Rubin
Abstract
Purpose
In this paper the authors seek to compare lay (consumer) and professional (physician) discourse structures in answers to diabetes‐related questions in a public consumer health information website.
Design/methodology/approach
Ten consumer and ten physician question threads were aligned. They generated 26 consumer and ten physician answers, constituting a total dataset of 717 discourse units (in sentences or sentence fragments). The authors depart from previous LIS health information behaviour research by utilizing a computational linguistics‐based theoretical framework of rhetorical structure theory, which enables research at the pragmatics level of linguistics in terms of the goals and effects of human communication.
Findings
The authors reveal differences in discourse organization by identifying prevalent rhetorical relations in each type of discourse. Consumer answers included predominantly (66 per cent) presentational rhetorical structure relations, those intended to motivate or otherwise help a user do something (e.g. motivation, concession, and enablement). Physician answers included mainly subject matter relations (64 per cent), intended to inform, or simply transfer information to a user (e.g. elaboration, condition, and interpretation).
Research limitations/implications
The findings suggest different communicative goals expressed in lay and professional health information sharing. Consumers appear to be more motivating, or activating, and more polite (linguistically) than physicians in how they share information with consumers online on similar topics in diabetes management. The authors consider whether one source of information encourages adherence to healthy behaviour more effectively than another.
Originality/value
Analysing discourse structure – using rhetorical structure theory – is a novel and promising approach in information behaviour research, and one that traverses the lexico‐semantic level of linguistic analysis towards pragmatics of language use.
Stephen Fox, Patrick Ehlen and Matthew Purver
Abstract
Purpose
The purpose of this paper is to inform the development of mixed initiative systems for distributed digital communication of manual skills. In particular, manual skills that are essential in project production paradigms such as engineer‐to‐order.
Design/methodology/approach
Findings from survey research, which included a literature review and interviews with practitioners, are reported. The literature review investigated media, strategies, and computation relevant to distributed digital communication of manual skills. The interviews investigated attitudes among industry practitioners towards distributed digital communication of manual skills.
Findings
Communication media, instructional strategies, and computational semantics techniques are available which can be integrated to address the limitations of human communication of manual skills.
Research limitations/implications
Only ten organizations were involved in interviews investigating attitudes towards distributed digital communication of manual skills.
Practical implications
Manual skills will continue to be important to project businesses involved in the production, refurbishment, and/or maintenance of large engineer‐to‐order products such as public buildings and process plants. The limitations of human communication can be addressed by using a variety of media, such as augmented reality headsets, to enable new instructional strategies, such as just‐in‐time training. Further, combinations of media and strategies can be integrated with computational semantics in the development of mixed initiative systems which provide feedback as well as initial instruction.
Originality/value
The originality of the research reported in this paper is that it addresses a full range of enablers for distributed communication of manual skills. Further, an overview of computational semantics is presented which does not rely on prior specialist knowledge. The value of this paper is that it introduces a framework for enabling distributed communication of manual skills. In addition, a preliminary ontology for distributed communication of manual skills is introduced, together with recommendations for implementation.
Zhongyi Wang, Jin Zhang and Jing Huang
Abstract
Purpose
Current segmentation systems almost invariably focus on linear segmentation and can only divide text into linear sequences of segments. This suits cohesive text such as news feeds but not coherent texts such as digital library documents, which have hierarchical structures. To move beyond linear segmentation and achieve hierarchical segmentation of a digital library’s structured resources, this paper proposes a new multi-granularity hierarchical topic-based segmentation system (MHTSS) to decide section breaks.
Design/methodology/approach
MHTSS adopts an up-down segmentation strategy to divide a structured digital library document into a document segmentation tree. Specifically, it works in three stages: document parsing, coarse segmentation based on document access structures, and fine-grained segmentation based on lexical cohesion.
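The three stages can be illustrated with a minimal sketch. It is not the authors' MHTSS implementation: the `# ` heading marker standing in for a document access structure, the bag-of-words cosine similarity standing in for lexical cohesion, the `threshold` value and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_segments(lines):
    """Stages 1 and 2: parse lines and split on headings.

    A line starting with '# ' is treated as a heading (an assumed,
    hypothetical access structure). Headings without body text are dropped.
    """
    sections, title, body = [], "front matter", []
    for line in lines:
        if line.startswith("# "):
            if body:
                sections.append((title, body))
            title, body = line[2:].strip(), []
        elif line.strip():
            body.append(line.strip())
    if body:
        sections.append((title, body))
    return sections


def fine_segments(sentences, threshold=0.1):
    """Stage 3: open a new segment where adjacent sentences share few words."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def segmentation_tree(lines, threshold=0.1):
    """Two-level tree: section title -> list of fine-grained segments.

    Sections with identical titles overwrite each other in this sketch.
    """
    return {
        title: fine_segments(body, threshold)
        for title, body in coarse_segments(lines)
    }
```

Calling `segmentation_tree` on a list of lines returns a dictionary keyed by section title, so a reader could be handed one section or one fine-grained segment instead of the whole document.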
Findings
This paper analyzed the limitations of document segmentation methods for structured digital library resources. The authors found that document access structures and lexical cohesion techniques complement each other and allow for a better segmentation of structured digital library resources. Based on this finding, this paper proposed the MHTSS for structured digital library resources. To evaluate it, MHTSS was compared to the TT and C99 algorithms on real-world digital library corpora. Through comparison, it was found that the MHTSS achieves top overall performance.
Practical implications
With MHTSS, digital library users can get their relevant information directly in segments instead of receiving the whole document. This will improve retrieval performance as well as dramatically reduce information overload.
Originality/value
This paper proposed MHTSS for structured digital library resources, which combines document access structures and lexical cohesion techniques to decide section breaks. With this system, end-users can access a document by sections through a document structure tree.
Hongwei Wang, Song Gao, Pei Yin and James Nga-Kwok Liu
Abstract
Purpose
Comparative opinions widely exist in online reviews as a common way of expressing consumers’ ideas or preferences toward certain products. Such opinion-rich texts are key proxies for detecting product competitiveness. The purpose of this paper is to set up a model for competitiveness analysis by identifying comparative relations from online reviews for restaurants based on both pattern matching and machine learning.
Design/methodology/approach
The authors define the sub-categories of comparative sentences according to Chinese linguistics. Classification rules are set up for each type of comparative relation through class sequence rules. To improve the accuracy of classification, a comparative entity dictionary is then introduced for further identifying comparative sentences. Finally, the authors collect reviews for restaurants from Dianping.com to conduct experiments for testing the proposed model.
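As a rough illustration of rule-based matching of this kind, the sketch below treats a class sequence rule as an ordered pattern paired with a class label, and applies a rule when the pattern occurs as a subsequence of a sentence's tokens. The English tokens, the rule, the labels and the function names are hypothetical; the paper's actual rules for Chinese and its entity dictionary are not reproduced here.

```python
def is_subsequence(pattern, tokens):
    """True if every pattern item appears in tokens, in order (gaps allowed)."""
    remaining = iter(tokens)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def classify(tokens, rules, default="non-comparative"):
    """Return the label of the first rule whose pattern matches the tokens."""
    for pattern, label in rules:
        if is_subsequence(pattern, tokens):
            return label
    return default


# Hypothetical rule: "more ... than" signals a comparative sentence.
RULES = [(["more", "than"], "comparative")]
```

With these assumed rules, `classify("A is more spicy than B".split(), RULES)` yields the comparative label, while a sentence in which "than" precedes "more" falls through to the default.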
Findings
The experiments show that the proposed method outperforms the baseline methods in terms of precision in identifying comparative sentences. On the basis of such comparison-rich sentences, product features and comparative relations are extracted for sentiment analysis, and a sentiment score is assigned to each comparative relation to facilitate competitiveness analysis.
Research limitations/implications
Only explicit comparative relations are discussed, neglecting implicit ones. In addition, the study is grounded in the assumption that all features are homogeneous. In some cases, however, different aspects are not of the same importance to the market.
Practical implications
On the basis of comparative relation mining, product features and comparative opinions are extracted for competitiveness analysis, which is of interest to businesses for finding weaknesses or strengths of products, as well as to consumers for making better purchase decisions.
Social implications
Comparative relation mining could possibly be applied in social media for identifying relations among users or products, and ranking users or products, as well as helping companies target and track competitors to enhance competitiveness.
Originality/value
The authors propose a research framework for restaurant competitiveness analysis by mining comparative relations from online consumer reviews. The results would be able to differentiate one restaurant from another in some aspects of interest to consumers, and reveal the changes in these differences over time.
Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu and Idowu Ademola Osinuga
Abstract
Purpose
Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection with social media data has received special research attention in recent studies. Hence, there is a need to design a generic metadata architecture and an efficient feature extraction technique to enhance hate speech detection.
Design/methodology/approach
This study proposes hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data. The proposed method uses a hybrid embedding technique that includes Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction and Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The extracted features from the hybrid embeddings then serve as input into the improved cuckoo search neural network for the prediction of a tweet as hate speech, offensive language or neither.
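The word-level TF-IDF step can be sketched in a few lines. This covers only that one component, not the LSTM features, the topic inference or the cuckoo search network. The weighting variant (length-normalized term frequency with idf = log(N / df)) and the function name `tf_idf` are assumptions, since the abstract does not state which variant the study uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other smoothing and normalization schemes exist.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant, a term that occurs in every document gets weight zero, so only terms that distinguish one tweet from another contribute to the word-level features.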
Findings
The proposed method showed better results when tested on the collected Twitter datasets compared to other related methods. In order to validate the performances of the proposed method, t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with other related methods for hate speech detection. Furthermore, Paired Sample t-Test was also conducted to validate the performances of the proposed method with other related methods.
Research limitations/implications
Finally, the evaluation results showed that the proposed method outperforms other related methods with a mean F1-score of 91.3.
Originality/value
The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.
Todor Mihaylov, Tsvetomila Mihaylova, Preslav Nakov, Lluís Màrquez, Georgi D. Georgiev and Ivan Kolev Koychev
Abstract
Purpose
The purpose of this paper is to explore the dark side of news community forums: the proliferation of opinion manipulation trolls. In particular, it explores the idea that a user who is called a troll by several people is likely to be one. It further demonstrates the utility of this idea for detecting accused and paid opinion manipulation trolls and their comments as well as for predicting the credibility of comments in news community forums.
Design/methodology/approach
The authors aim to build a classifier to distinguish trolls from regular users. Unfortunately, it is not easy to get reliable training data. The authors solve this issue pragmatically: they assume that a user who is called a troll by several people is likely to be one, and call such users accused trolls. Based on this assumption and on leaked reports about actual paid opinion manipulation trolls, the authors build a classifier to distinguish trolls from regular users.
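The labelling assumption can be sketched as a simple counting rule. The input format (pairs of accuser and accused), the function name and the default threshold of three distinct accusers are assumptions for illustration; the abstract only says "several people".

```python
from collections import defaultdict


def accused_trolls(accusations, min_accusers=3):
    """Return users called a troll by at least `min_accusers` distinct users.

    `accusations` is an iterable of (accuser, accused) pairs; the
    threshold is an assumed value, not one taken from the paper.
    """
    accusers_of = defaultdict(set)
    for accuser, accused in accusations:
        if accuser != accused:  # ignore self-accusations
            accusers_of[accused].add(accuser)
    return {
        user for user, accusers in accusers_of.items()
        if len(accusers) >= min_accusers
    }
```

Counting distinct accusers rather than raw accusations keeps a single persistent accuser from labelling someone a troll alone. The resulting set could then serve as positive training labels for a classifier.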
Findings
The authors compare the profiles of paid trolls vs accused trolls vs non-trolls, and show that a classifier trained to distinguish accused trolls from non-trolls does quite well also at telling apart paid trolls from non-trolls.
Research limitations/implications
Troll detection works even for users with about 10 comments, but it achieves the best performance for users with a sizable number of comments in the forum, e.g. 100 or more. There is no such limitation for troll comment detection.
Practical implications
The approach would help forum moderators in their work by pointing them to the most suspicious users and comments. It would also be useful to investigative journalists who want to find paid opinion manipulation trolls.
Social implications
The authors can offer a better experience to online users by filtering out opinion manipulation trolls and their comments.
Originality/value
The authors propose a novel approach for finding paid opinion manipulation trolls and their posts.