Search results
Paramita Ray and Amlan Chakrabarti
Abstract
Social networks have significantly changed communication patterns. Information available from different social networking sites can be used to analyze users' opinions. Hence, organizations would benefit from a platform that analyzes public sentiment on social media about their products and services and thereby adds value to their business processes. Over the last few years, deep learning has become very popular in areas such as image classification and speech recognition. However, research on the use of deep learning methods in sentiment analysis is limited. It has been observed that in some cases the existing machine learning methods for sentiment analysis fail to extract some implicit aspects and might not be very useful. Therefore, we propose a deep learning approach for aspect extraction from text and analysis of users' sentiment corresponding to each aspect. A seven-layer deep convolutional neural network (CNN) is used to tag each aspect in the opinionated sentences. We have combined the deep learning approach with a set of rule-based approaches to improve the performance of the aspect extraction method as well as the sentiment scoring method. We have also tried to improve the existing rule-based approach to aspect extraction by categorizing aspects into a predefined set of aspect categories using a clustering method, and we compared our proposed method with some state-of-the-art methods. The overall accuracy of our proposed method is 0.87, while that of the other state-of-the-art methods, the modified rule-based method and the CNN, is 0.75 and 0.80, respectively. The overall accuracy of our proposed method is thus 7–12 percentage points higher than that of the state-of-the-art methods.
Abstract
Purpose
For many pattern recognition problems, the relation between the sample vectors and the class labels is known during the data acquisition procedure. However, finding the useful rules or knowledge hidden in the data is both important and challenging. Rule extraction methods are very useful for mining the important and heuristic knowledge hidden in the original high-dimensional data. They can help us construct predictive models with few attributes of the data, providing valuable model interpretability and shorter training times.
Design/methodology/approach
In this paper, a novel rule extraction method with the application of biclustering algorithm is proposed.
Findings
To choose the most significant biclusters from the huge number of detected biclusters, a specially modified information entropy calculation method is also provided. It will be shown that all of the important knowledge is in practice hidden in these biclusters.
Originality/value
The novelty of the new method lies in the fact that the detected biclusters can be conveniently translated into if-then rules. It provides an intuitively explainable and comprehensive approach to extracting rules from high-dimensional data while maintaining high classification accuracy.
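As a rough illustration of the idea in this abstract, the sketch below ranks biclusters by the entropy of the class labels they cover and turns one into an if-then rule. It uses plain Shannon entropy, not the authors' specially modified calculation, which the abstract does not describe. The bicluster layout (feature ranges plus covered labels), the function names, and the sample values are all assumptions made for this example.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the class labels covered by one bicluster."""
    counts = Counter(labels)
    total = len(labels)
    # sum of p * log2(1/p); a bicluster with a single class scores 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def bicluster_to_rule(feature_ranges, labels):
    """Render a bicluster as an if-then rule that predicts its majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    conditions = " AND ".join(
        f"{lo} <= {name} <= {hi}" for name, (lo, hi) in feature_ranges.items()
    )
    return f"IF {conditions} THEN class = {majority}"


# Hypothetical biclusters: value ranges on a feature subset plus the labels
# of the samples they cover (invented data, not taken from the paper).
pure = {"ranges": {"x1": (0.2, 0.5), "x3": (1.0, 1.4)}, "labels": ["A", "A", "A", "A"]}
mixed = {"ranges": {"x2": (0.0, 0.9)}, "labels": ["A", "B", "A", "B"]}

# Lower entropy means the bicluster is more class-pure, so it is ranked first.
ranked = sorted([mixed, pure], key=lambda b: label_entropy(b["labels"]))
best_rule = bicluster_to_rule(ranked[0]["ranges"], ranked[0]["labels"])
print(best_rule)
```

Here the class-pure bicluster is ranked first and produces a rule over x1 and x3; a bicluster whose labels are evenly split scores one bit of entropy and would make a poor rule.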
Lichao Zhu, Hangzhou Yang and Zhijun Yan
Abstract
Purpose
The purpose of this paper is to develop a new method to extract medical temporal information from online health communities.
Design/methodology/approach
The authors trained a conditional random field model for the extraction of temporal expressions. Temporal relation identification is treated as a classification task, and several support vector machine classifiers are built in the proposed method. For model training, the authors extracted some high-level semantic features, including co-reference relationships of medical concepts and the semantic similarity among words.
Findings
For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is relative TIMEX such as “three days after onset”. The same difficulty arises for the normalization of absolute dates or well-formatted durations, whereas frequency is easier to normalize. For the identification of DocTimeRel, the results are fairly good, but the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept.
Originality/value
The authors proposed a new method to extract temporal information from online clinical data and evaluated the usefulness of different levels of syntactic features in this task.
Maria Indrawan-Santiago, Matthias Steinbauer and Gabriele Anderst-Kotsis
Heitor Hoffman Nakashima, Daielly Mantovani and Celso Machado Junior
Abstract
Purpose
This paper aims to investigate whether professional data analysts’ trust of black-box systems is increased by explainability artifacts.
Design/methodology/approach
The study was developed in two phases. First, a black-box prediction model was estimated using artificial neural networks, and local explainability artifacts were estimated using the local interpretable model-agnostic explanations (LIME) algorithm. In the second phase, the model and explainability outcomes were presented to a sample of data analysts from the financial market, and their trust in the models was measured. Finally, interviews were conducted to understand their perceptions regarding black-box models.
Findings
The data suggest that users’ trust of black-box systems is high and explainability artifacts do not influence this behavior. The interviews reveal that the nature and complexity of the problem a black-box model addresses influence the users’ perceptions, with trust being reduced in situations that represent a threat (e.g. autonomous cars). Concerns about the models’ ethics were also mentioned by the interviewees.
Research limitations/implications
The study considered a small sample of professional analysts from the financial market, which traditionally employs data analysis techniques for credit and risk analysis. Research with personnel in other sectors might reveal different perceptions.
Originality/value
Other studies regarding trust in black-box models and explainability artifacts have focused on ordinary users, with little or no knowledge of data analysis. The present research focuses on expert users, which provides a different perspective and shows that, for them, trust is related to the quality of data and the nature of the problem being solved, as well as the practical consequences. Explanation of the algorithm mechanics itself is not significantly relevant.
Omar Alqaryouti, Nur Siyam, Azza Abdel Monem and Khaled Shaalan
Abstract
Digital resources such as smart application reviews and online feedback are important sources of customers’ feedback and input. This paper aims to help government entities gain insights into the needs and expectations of their customers. To this end, we propose a hybrid aspect-based sentiment analysis approach that integrates domain lexicons and rules to analyse reviews of the entities' smart apps. The proposed model aims to extract the important aspects from the reviews and classify the corresponding sentiments. This approach adopts language processing techniques, rules, and lexicons to address several sentiment analysis challenges and produce summarized results. According to the reported results, the aspect extraction accuracy improves significantly when the implicit aspects are considered. Also, the integrated classification model outperforms the lexicon-based baseline and the other rule combinations by 5% in terms of accuracy on average. Also, when using the same dataset, the proposed approach outperforms machine learning approaches that use a support vector machine (SVM). However, using these lexicons and rules as input features to the SVM model achieved higher accuracy than other SVM models.
Laura Lucantoni, Sara Antomarioni, Filippo Emanuele Ciarapica and Maurizio Bevilacqua
Abstract
Purpose
The Overall Equipment Effectiveness (OEE) is considered a standard for measuring equipment productivity in terms of efficiency. Still, Artificial Intelligence solutions are rarely used for analyzing OEE results and identifying corrective actions. Therefore, the approach proposed in this paper aims to provide a new rule-based Machine Learning (ML) framework for OEE enhancement and the selection of improvement actions.
Design/methodology/approach
Association Rules (ARs) are used as a rule-based ML method for extracting knowledge from large volumes of data. First, the dominant loss class is identified and traditional methodologies are used with ARs for anomaly classification and prioritization. Once the priority anomalies have been selected, a detailed analysis is conducted to investigate their influence on the OEE loss factors using ARs and Network Analysis (NA). Then, a Deming Cycle is used as a roadmap for applying the proposed methodology, testing and implementing proactive actions by monitoring the OEE variation.
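To make the association-rule step concrete, the sketch below computes the three standard measures of a rule (support, confidence, and lift) over a handful of records. It is only a minimal illustration: the shift records, anomaly names, and function names are invented for this example and are not taken from the paper, and the authors' actual mining procedure is not described in the abstract.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    Assumes both the antecedent and the consequent occur at least once,
    otherwise the divisions below would fail.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return both, confidence, lift


# Hypothetical shift records: the anomalies and loss classes seen in each shift.
shifts = [
    {"jam", "availability_loss"},
    {"jam", "availability_loss"},
    {"jam", "quality_loss"},
    {"tool_wear", "quality_loss"},
]

sup, conf, lift = rule_metrics(shifts, {"jam"}, {"availability_loss"})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the anomaly and the loss class co-occur more often than they would if they were independent, which is the kind of signal that could be used to prioritize anomalies.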
Findings
The method proposed in this work has also been tested in an automotive company for framework validation and impact measuring. In particular, results highlighted that the rule-based ML methodology for OEE improvement addressed seven anomalies within a year through appropriate proactive actions: on average, each action has ensured an OEE gain of 5.4%.
Originality/value
The originality is related to the dual application of association rules in two different ways for extracting knowledge from the overall OEE. In particular, the co-occurrences of priority anomalies and their impact on asset Availability, Performance and Quality are investigated.
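For readers unfamiliar with the metric, OEE is conventionally computed as the product of the three factors named above. The sketch below shows that standard definition only; the function name and the sample values are assumptions for illustration and do not come from the paper.

```python
def oee(availability, performance, quality):
    """Overall Equipment Effectiveness as the product of its three factors.

    Each factor is expected as a fraction between 0 and 1.
    """
    factors = {
        "availability": availability,
        "performance": performance,
        "quality": quality,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical factors for one machine (invented values).
baseline = oee(availability=0.90, performance=0.80, quality=0.95)
print(f"OEE = {baseline:.1%}")
```

Because the factors multiply, a loss in any one of them lowers the overall figure, which is why the abstract looks at the impact of anomalies on each factor separately.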
Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang
Abstract
Purpose
This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.
Design/methodology/approach
In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query stream to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach to conduct better truth discovery for multi-valued predicates is also proposed.
Findings
Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension are also discussed.
Originality/value
To revolutionize our modern society by using the wisdom of Big Data, a considerable number of KBs have been constructed to feed the massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and differently structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research effort has been devoted to both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.
Zhishuo Liu, Qianhui Shen, Jingmiao Ma and Ziqi Dong
Abstract
Purpose
This paper aims to extract the comment targets on Chinese online shopping platforms.
Design/methodology/approach
The authors first collect the comment texts, perform word segmentation and part-of-speech (POS) tagging, and extract feature words twice. Then they cluster the evaluation sentences and find the association rules between the evaluation words and the evaluation object, establishing the association rule table at the same time. The authors can then mine the evaluation object of a comment sentence according to the evaluation word and the association rule table. Finally, they obtain comment data from Taobao and demonstrate experimentally that the method proposed in this paper is effective.
Findings
The comment target extraction method proposed in this paper is effective.
Research limitations/implications
First, the object of study for extracting implicit features is the review clause, and context information is not considered, which may affect the accuracy of feature extraction to a certain degree. Second, low-frequency feature words are not considered when extracting feature words, although some of them also contain useful information.
Practical implications
Because of the massive volume of online review data, reading every comment one by one is impossible. Therefore, research on processing product comments and presenting useful or interesting comments to customers is important.
Originality/value
The comment target extraction method proposed in this paper is effective.
Zhuoxuan Jiang, Chunyan Miao and Xiaoming Li
Abstract
Purpose
Recent years have witnessed the rapid development of massive open online courses (MOOCs). With more and more courses being produced by instructors and attended by learners all over the world, educational resources are being aggregated on an unprecedented scale. The educational resources include videos, subtitles, lecture notes, quizzes, etc., on the teaching side, and forum contents, Wiki, logs of learning behavior, logs of homework, etc., on the learning side. However, the data are both unstructured and diverse. To facilitate knowledge management and mining on MOOCs, extracting keywords from the resources is important. This paper aims to adapt the state-of-the-art techniques to MOOC settings and evaluate their effectiveness on real data. In terms of practice, this paper also tries to answer, for the first time, the questions of to what extent MOOC resources can support keyword extraction models and how much human effort is required to make the models work well.
Design/methodology/approach
Based on which side generates the data, i.e. instructors or learners, the data are classified into teaching resources and learning resources, respectively. The approach used on teaching resources is based on machine learning models with labels, while the approach used on learning resources is based on a graph model without labels.
Findings
From the teaching resources, the methods used by the authors can accurately extract keywords with only 10 per cent labeled data. The authors find that resources of various forms, e.g. subtitles and PPTs, should be considered separately because they differ in modeling ability. From the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they can reflect the topics that are actively discussed in forums, which gives instructors feedback. The authors implement two applications with the extracted keywords: generating a concept map and generating a learning path. The visual demos show that these applications have the potential to improve learning efficiency when they are integrated into a real MOOC platform.
Research limitations/implications
Conducting keyword extraction on MOOC resources is quite difficult because teaching resources are hard to obtain due to copyright restrictions. Also, getting labeled data is difficult because expertise in the corresponding domain is usually required.
Practical implications
The experimental results indicate that MOOC resources are good enough for building keyword extraction models, and that an acceptable balance between human effort and model accuracy can be achieved.
Originality/value
This paper presents a pioneering study on keyword extraction on MOOC resources and obtains some new findings.