Search results
Lei Lei, Yaochen Deng and Dilin Liu
Abstract
Purpose
Examining research topics in a specific area such as accounting is important to both novice and veteran researchers. The present study aims to identify the research topics in the area of accounting and to investigate research trends by finding hot and cold topics among those identified.
Design/methodology/approach
A new dependency-based method focusing on noun phrases, which efficiently extracts research topics from a large set of library data, was proposed. An AR(1) autoregressive model was used to identify topics that have received significantly more or less attention from the researchers. The data used in the study included a total of 4,182 abstracts published in six leading (or premier) accounting journals from 2000 to May 2019.
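The abstract does not say exactly how the AR(1) model separates hot from cold topics. Below is a minimal sketch under one plausible reading, which is an assumption and not the authors' specification. Each topic's yearly share is regressed on a time index plus its own lag-1 value. A topic is labelled hot (cold) when the time coefficient is significantly positive (negative). The function names, the trend-plus-lag model and the |t| > 2 cut-off are all illustrative choices.

```python
import math
from typing import List, Tuple


def _invert(matrix: List[List[float]]) -> List[List[float]]:
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def trend_with_ar1(series: List[float]) -> Tuple[float, float, float]:
    """OLS fit of y_t = c + beta * t + phi * y_{t-1} + e_t (assumed model).

    Returns (beta, phi, t_beta): trend coefficient, AR(1) coefficient and
    the t-statistic of the trend coefficient.
    """
    k = 3  # parameters: intercept, trend, lag-1 term
    if len(series) - 1 <= k:
        raise ValueError("need at least five yearly observations")
    rows = [[1.0, float(t), series[t - 1]] for t in range(1, len(series))]
    ys = series[1:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    inv = _invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(c * x for c, x in zip(coef, r)) for r, y in zip(rows, ys)]
    sigma2 = sum(e * e for e in resid) / (len(rows) - k)
    se_beta = math.sqrt(sigma2 * inv[1][1])
    # A perfect fit has zero standard error; treat the trend as infinitely significant.
    t_beta = coef[1] / se_beta if se_beta > 0 else math.copysign(math.inf, coef[1])
    return coef[1], coef[2], t_beta


def classify_topic(series: List[float], t_crit: float = 2.0) -> str:
    """Label a topic 'hot', 'cold' or 'neutral' from its trend t-statistic (illustrative cut-off)."""
    _, _, t_beta = trend_with_ar1(series)
    if t_beta > t_crit:
        return "hot"
    if t_beta < -t_crit:
        return "cold"
    return "neutral"
```

With real data, `series` would hold one topic's share of abstracts per year. A lagged term is included so that persistence from one year to the next is not mistaken for a trend.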
Findings
The study identified 48 important research topics across the examined period; of these, eight were hot topics and one was a cold topic.
Originality/value
The research topics identified with the dependency-based method are similar to those found with latent Dirichlet allocation (LDA) topic modelling. In addition, the method appears highly efficient, and its results are easier to interpret. Finally, the research topics and trends found in the study provide a reference for researchers in the area of accounting.
Cong-Phuoc Phan, Hong-Quang Nguyen and Tan-Tai Nguyen
Abstract
Purpose
Large collections of patent documents disclosing novel, non-obvious technologies are publicly available and beneficial to academia and industry. To exploit their potential fully, searching these patent documents has become an increasingly important topic. Although much research has processed large collections, few studies have attempted to integrate both patent classifications and specifications when analyzing user queries. Consequently, queries are often insufficiently analyzed to improve the accuracy of search results. This paper aims to address this limitation by exploiting semantic relationships between patent contents and their classification.
Design/methodology/approach
The contributions are fourfold. First, the authors enhance the measurement of similarity between two short sentences, making it 20 per cent more accurate. Second, the Graph-embedded Tree ontology is enriched by integrating both patent documents and the classification scheme. Third, the ontology does not rely on rule-based methods or text matching; instead, it applies a heuristic meaning comparison to extract semantic relationships between concepts. Finally, the patent search approach uses the ontology effectively, with results sorted by their most common order.
Findings
In an experiment searching 600 patent documents in the field of logistics, the approach improved the F-measure by 15 per cent compared with traditional approaches.
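For reference, the F-measure cited here is the weighted harmonic mean of precision and recall. Below is a minimal sketch that assumes relevant and retrieved results are given as sets of document IDs. The `beta` parameter (1.0 gives the usual F1) is a standard generalisation and is not something the abstract specifies.

```python
from typing import Hashable, Set


def f_measure(relevant: Set[Hashable], retrieved: Set[Hashable], beta: float = 1.0) -> float:
    """F-beta score of a retrieved set against the set of relevant documents."""
    true_positives = len(relevant & retrieved)
    if true_positives == 0:
        return 0.0  # also avoids division by zero for empty sets
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_measure({1, 2, 3, 4}, {2, 3, 4, 5, 6}))  # precision 0.6, recall 0.75 -> about 0.667
```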
Research limitations/implications
The research still requires improvement: the terms and phrases extracted as nouns and noun phrases make less sense in some respects and thus might not yield high accuracy. The large collection of extracted relationships could be further optimized for conciseness. In addition, parallel processing such as MapReduce could be used to improve search performance.
Practical implications
The experimental results could be used by scientists and technologists to search for novel, non-obvious technologies in patents.
Social implications
High-quality patent search results will reduce patent infringement.
Originality/value
The proposed ontology is semantically enriched by integrating both patent documents and their classification. This ontology facilitates the analysis of the user queries for enhancing the accuracy of the patent search results.
Abstract
Purpose
The purpose of this paper is to propose a hybrid ontology‐based solution to expand users' queries.
Design/methodology/approach
The solution comprises ontology development and ontology‐based query expansion. The first task is to develop an ontology (named OMP) that relates the key‐properties and key‐members of objects described by words/terms of the English vocabulary. Its training methodology is also hybrid: rule‐based, using proposed patterns, combined with a statistical solution for selecting the best candidates from the TREC English corpus. The second task proposes mechanisms that not only look up related results in the OMP ontology to complete and expand the user's entered query/noun phrase, but also extend the search process by linking the OMP ontology to the indexes of an information retrieval system. Both tasks rest on our proposal of four kinds of semantic relationship between words.
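The expansion step can be pictured as follows. This is a minimal sketch, not the OMP mechanism itself. The abstract does not list the four relationship types, so the toy below uses only the key-member and key-property links it names. The `ONTOLOGY` contents, `expand_query` and `max_terms` are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy ontology in the spirit of OMP: each object noun maps to
# key-members (parts) and key-properties (attributes). Contents are illustrative.
ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "car": {"key_members": ["engine", "wheel"], "key_properties": ["speed", "colour"]},
    "computer": {"key_members": ["processor", "memory"], "key_properties": ["performance"]},
}


def expand_query(query: str,
                 ontology: Dict[str, Dict[str, List[str]]] = ONTOLOGY,
                 max_terms: int = 3) -> List[str]:
    """Append up to max_terms related words for every query term found in the ontology."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        entry = ontology.get(term)
        if entry is None:
            continue  # unknown word: keep it, add nothing
        related = entry["key_members"] + entry["key_properties"]
        for word in related[:max_terms]:
            if word not in expanded:  # avoid duplicate terms in the expanded query
                expanded.append(word)
    return expanded


print(expand_query("fast car"))  # ['fast', 'car', 'engine', 'wheel', 'speed']
```

Capping the number of added terms is one simple guard against query drift. The paper's own precision-oriented mechanism may differ.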
Findings
Several semantic relationships among words have been introduced and are currently used in WordNet to represent a system of semantic networks. In addition, our analysis of the English vocabulary found that, in some cases, parts of a noun phrase exhibit kinds of semantic dependency that can be represented in noun phrase grammar syntax. This affects not only the proposed approach to developing the OMP ontology (identifying four kinds of semantic relationship and organizing its structure around core element types such as object, key‐member and key‐property) but also the ontology training mechanism and the query expansion solutions, which add extended corresponding words (based on those relationships) to the original query.
Research limitations/implications
In the initial iteration, the approach is applied to English queries only, with a limited-size OMP ontology and a dependence on grammar rules for creating the patterns that extract data from the corpus. For future research, the primary targets are applications to other languages (Vietnamese, Chinese …), with a sharp focus on improving ontology training quality/quantity and query expansion precision.
Practical implications
The developed OMP ontology can be shared to support other applications, such as semantic data extraction or semantic information retrieval, in other research.
Originality/value
This paper presents an approach to ontology‐based query expansion and theoretical definitions of semantic relationships among words. In particular, these kinds of relationship can be used to develop a useful semantic network system.
Brian Vickery and Alina Vickery
Abstract
The paper describes techniques developed by Tome Associates to process natural language queries into search statements suitable for transmission to online text database systems. The problems discussed include word identification, the handling of unknown words, the contents and structure of system dictionaries, the use of semantic categories and classification, disambiguation of multi‐meaning words, stemming and truncation, noun compounds and indications of relationship between search terms.
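The abstract lists stemming and truncation among the problems handled but does not describe Tome Associates' rules. As a generic illustration only, not their technique, the sketch below contrasts two approaches. Right-truncation (`comput*`) matches terms by prefix. A crude suffix-stripping stemmer conflates word variants to one stem. The suffix list and the three-character minimum stem are assumptions.

```python
from typing import Iterable, List

# Assumed suffix list, longest first so that "ations" is tried before "s".
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "s")


def truncation_match(pattern: str, vocabulary: Iterable[str]) -> List[str]:
    """Right-truncation: 'comput*' matches every term that starts with 'comput'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(prefix))
    return [w for w in vocabulary if w.lower() == pattern.lower()]


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


print(truncation_match("comput*", ["computer", "computing", "compute", "commute"]))
print([crude_stem(w) for w in ["computers", "computing", "computation"]])
```

Truncation leaves the choice of stem to the searcher. Stemming applies it automatically and can over-conflate unrelated words.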
Abstract
Purpose
With the wealth of information available on the World Wide Web, it is difficult for anyone, from general users to researchers, to fulfill their information needs easily. The main challenge is to categorize documents systematically while also taking into account more valuable data such as semantic information. The purpose of this paper is to develop a concept-based search system that leverages external knowledge resources as background knowledge to obtain accurate, efficient and meaningful search results.
Design/methodology/approach
The paper introduces an approach based on formal concept analysis (FCA) with semantic information to support document management in information retrieval (IR). To describe the semantic information of the documents, the system uses the popular knowledge resources WordNet and Wikipedia. Using FCA, the system builds a concept lattice as the concept hierarchy of the documents, and the authors propose a navigation algorithm for searching this hierarchy based on the user query.
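To make the FCA step concrete, here is a minimal sketch that enumerates the formal concepts of a toy document-term context. A formal concept is a pair (extent, intent): the extent is the set of documents sharing every term in the intent, and the intent is the set of terms common to every document in the extent. Every intent is an intersection of document intents, so closing the set of document intents under intersection finds them all. The example documents and terms are hypothetical. The WordNet/Wikipedia enrichment and the authors' navigation algorithm are not reproduced.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent: documents, intent: terms)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all formal concepts of a small document -> terms context."""
    all_terms = frozenset().union(*context.values())
    intents = {all_terms}  # the full term set is the intent of the bottom concept
    for terms in context.values():
        # Close the collection under intersection with this document's terms.
        intents |= {frozenset(terms) & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(doc for doc, terms in context.items() if intent <= terms)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first, then alphabetical by intent.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[1])))
    return concepts


docs = {
    "d1": {"retrieval", "wordnet"},
    "d2": {"retrieval", "wikipedia"},
    "d3": {"retrieval", "wordnet", "wikipedia"},
}
for extent, intent in formal_concepts(docs):
    print(sorted(extent), sorted(intent))
```

This brute-force closure is only practical for small contexts. Dedicated algorithms such as NextClosure scale better. Ordering the concepts by extent inclusion yields the lattice that a query-driven navigation would traverse.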
Findings
Because the semantic information of the documents draws on two popular external knowledge resources, the authors find that the system handles semantic mismatch between documents and user needs more efficiently.
Originality/value
The navigation algorithm proposed in this research is applied to the scientific articles of the National Science Foundation (NSF). The proposed system can enhance the integration and exploration of the scientific articles for the advancement of the Scientific and Engineering Research Community.
Chedi Bechikh Ali, Hatem Haddad and Yahya Slimani
Abstract
Purpose
A number of approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of these approaches suffer from poor precision at low recall. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond simple term indexing to propose a framework for filtering and indexing multi-word terms (MWT).
Design/methodology/approach
In this paper, the authors rank MWT in order to filter them, keeping the most effective ones for the indexing process. The proposed model filters MWT according to their ability to capture the document topic and to distinguish between documents in the same collection. The authors rely on the hypothesis that the best MWT are those with the highest degree of association. The experiments are carried out on English and French data sets.
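Association-based ranking can be sketched as follows. This is a simplified illustration, not the authors' model. It scores only two-word candidates (adjacent bigrams) with two common association measures, the Dice coefficient and pointwise mutual information (PMI), and keeps the top `top_k`. The PMI uses the usual count approximation log2(f12 · N / (f1 · f2)). The choice of measures, the bigram restriction and the function name are assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple

Bigram = Tuple[str, str]


def rank_bigrams(tokens: List[str], measure: str = "dice", top_k: int = 5) -> List[Tuple[Bigram, float]]:
    """Score adjacent word pairs by an association measure and keep the top_k."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        f1, f2 = unigrams[w1], unigrams[w2]
        if measure == "dice":
            scores[(w1, w2)] = 2 * f12 / (f1 + f2)
        elif measure == "pmi":
            scores[(w1, w2)] = math.log2(f12 * n / (f1 * f2))
        else:
            raise ValueError(f"unknown measure: {measure}")
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


tokens = "information retrieval system uses information retrieval model".split()
print(rank_bigrams(tokens, "dice")[:2])
print(rank_bigrams(tokens, "pmi")[0])
```

On this toy text, Dice ties the repeated pair "information retrieval" with the one-off pair "system uses". PMI ranks the rare pair highest. This known bias towards low-frequency pairs is one reason to compare several association measures, as the study does.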
Findings
The results indicate that this approach improved precision at low recall and performed better than more advanced models based on term dependencies.
Originality/value
The study uses and tests different association measures to select the MWT that best describe the documents, in order to enhance precision among the first retrieved documents.
Yi‐ling Lin, Peter Brusilovsky and Daqing He
Abstract
Purpose
The goal of the research is to explore whether the use of higher‐level semantic features can help us to build better self‐organising map (SOM) representation as measured from a human‐centred perspective. The authors also explore an automatic evaluation method that utilises human expert knowledge encapsulated in the structure of traditional textbooks to determine map representation quality.
Design/methodology/approach
Two types of document representations involving semantic features have been explored – i.e. using only one individual semantic feature, and mixing a semantic feature with keywords. Experiments were conducted to investigate the impact of semantic representation quality on the map. The experiments were performed on data collections from a single book corpus and a multiple book corpus.
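One simple way to mix a semantic feature with keywords under an adjustable ratio is sketched below. This is an assumption for illustration; the abstract does not give the authors' combination scheme. Each feature vector is L2-normalised, the two are weighted by `1 - ratio` and `ratio`, and the results are concatenated into one input vector for the SOM. The `kw:`/`sem:` prefixes and the `mix_features` name are hypothetical.

```python
import math
from typing import Dict


def normalise(vec: Dict[str, float]) -> Dict[str, float]:
    """Scale a sparse vector to unit L2 length (empty or zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def mix_features(keywords: Dict[str, float], semantic: Dict[str, float],
                 ratio: float = 0.5) -> Dict[str, float]:
    """Concatenate keyword and semantic features, weighting the semantic part by `ratio`."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    mixed = {f"kw:{k}": (1 - ratio) * v for k, v in normalise(keywords).items()}
    mixed.update({f"sem:{k}": ratio * v for k, v in normalise(semantic).items()})
    return mixed


print(mix_features({"map": 3.0, "som": 4.0}, {"clustering": 1.0}, ratio=0.25))
```

Normalising each part first keeps `ratio` meaningful regardless of how many keywords a document has. The findings note that changing such ratios affects map quality.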
Findings
Combining keywords with certain semantic features achieves significant improvement of representation quality over the keywords‐only approach in a relatively homogeneous single book corpus. Changing the ratios in combining different features also affects the performance. While semantic mixtures can work well in a single book corpus, they lose their advantages over keywords in the multiple book corpus. This raises a concern about whether the semantic representations in the multiple book corpus are homogeneous and coherent enough for applying semantic features. The terminology issue among textbooks affects the ability of the SOM to generate a high quality map for heterogeneous collections.
Originality/value
The authors explored the use of higher‐level document representation features for developing better-quality SOMs. In addition, the authors piloted a specific method for evaluating SOM quality based on the organisation of information content in the map.
Ankie Visschedijk and Forbes Gibb
Abstract
This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed or functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN, HSTS, MPP, TEXTRACT, TRW‐FDF and URSA.
Abstract
Purpose
Smart manufacturing can lead to disruptive changes in production technologies and business models in the manufacturing industry. This paper aims to identify technological topics in smart manufacturing by using patent data, investigating technological trends and exploring potential opportunities.
Design/methodology/approach
The latent Dirichlet allocation (LDA) topic modeling technique was used to extract latent technological topics, and the generalized linear mixed model (GLMM) was used to analyze the relative emergence levels of the topics. Topic value and topic competitive analyses were developed to evaluate each topic's potential value and identify technological positions of competing firms, respectively.
Findings
A total of 14 topics were extracted from the collected patent data and several fast growth and high-value topics were identified, such as smart connection, cyber-physical systems (CPSs), manufacturing data analytics and powder bed fusion additive manufacturing. Several leading firms apply broad R&D emphasis across a variety of technological topics, while others focus on a few technological topics.
Practical implications
The developed methodology can help firms identify important technological topics in smart manufacturing for making their R&D investment decisions. Firms can select appropriate technology strategies depending on the topic's emergence position in the topic strategy matrix.
Originality/value
Previous research studies have not analyzed the maturity levels of technological topics. The topic-based patent analytics approach can complement previous studies. In addition, this study provides a multi-valuation framework for exploring technological opportunities, thus providing valuable information that supports a more robust understanding of the technology landscape of smart manufacturing.