Search results
Abstract
Purpose
The purpose of this paper is to improve concept-based search by incorporating structural ontological information such as concepts and relations. In general, semantic information retrieval aims to identify relevant information based on the meanings of the query terms or on their context, and its performance is measured with the standard measures of precision and recall. Higher precision means that more of the retrieved documents are (meaningfully) relevant, while lower recall means poorer coverage of the concepts.
Design/methodology/approach
In this paper, the authors enhance the existing ontology-based indexing proposed by Kohler et al. by incorporating sibling information into the index; the index designed by Kohler et al. contains only the super- and sub-concepts from the ontology. In addition, the approach focuses on two tasks, query expansion and ranking of the expanded queries, to improve the efficiency of the ontology-based search. Both tasks make use of ontological concepts and the relations between them to obtain semantically more relevant search results for a given query.
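A minimal sketch of how an index entry enriched with siblings might differ from a super/sub-concept-only entry is given below. The toy tourism ontology, its concept names and the functions `children_of` and `index_entry` are hypothetical, and only direct (one-level) super- and sub-concepts are stored, which may not match what Kohler et al. or the authors actually index.

```python
from collections import defaultdict

# Hypothetical toy tourism ontology: child concept -> parent concept.
PARENT = {
    "Hotel": "Accommodation",
    "Hostel": "Accommodation",
    "Campsite": "Accommodation",
    "Beach": "NaturalAttraction",
    "Waterfall": "NaturalAttraction",
    "Accommodation": "TourismEntity",
    "NaturalAttraction": "TourismEntity",
}


def children_of(parent_map):
    """Invert the child -> parent map into parent -> set of children."""
    children = defaultdict(set)
    for child, parent in parent_map.items():
        children[parent].add(child)
    return children


def index_entry(concept, parent_map, with_siblings=True):
    """Related concepts stored with `concept` in a hypothetical ontological index.

    Without siblings this mimics a super/sub-concept-only index; with
    siblings it also stores the other children of the concept's parent.
    """
    children = children_of(parent_map)
    parent = parent_map.get(concept)
    entry = {
        "super": {parent} if parent else set(),
        "sub": set(children.get(concept, set())),
        "sibling": set(),
    }
    if with_siblings and parent:
        entry["sibling"] = children[parent] - {concept}
    return entry


print(sorted(index_entry("Hotel", PARENT)["sibling"]))  # ['Campsite', 'Hostel']
```

With `with_siblings=False` the entry for "Hotel" holds only its parent, so a document about hostels would not be reachable through it; the sibling set is what widens concept coverage in this sketch.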
Findings
The proposed ontology-based indexing technique is investigated by analysing the coverage of the concepts populated in the index. A new measure, the index enhancement measure, is introduced to estimate the coverage of the ontological concepts being indexed. The ontology-based search was evaluated in the tourism domain using tourism documents and a tourism-specific ontology. Search results obtained with the ontology, with and without query expansion, are compared to estimate the efficiency of the proposed query expansion task, and the ranking is compared with the ORank system to evaluate the performance of the ontology-based search. These analyses show that the ontology-based search achieves better recall than the other concept-based search systems: its mean average precision (MAP) is 0.79 and its recall 0.65, compared with a MAP of 0.62 and a recall of 0.51 for ORank, and a MAP of 0.56 and a recall of 0.42 for the concept-based search.
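For readers less familiar with the reported measures, the sketch below shows one common way to compute precision, recall, average precision and MAP from ranked result lists. The document identifiers and relevance judgments are made up for illustration and have nothing to do with the tourism collection used in the paper; the paper's exact evaluation protocol is not described here.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant document, divided by
    the total number of relevant documents (unretrieved ones contribute 0)."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6", "d7"}),             # AP = (1/2) / 2; d7 never retrieved
]
print(precision_recall(runs[0][0], runs[0][1]))  # (0.5, 1.0)
print(round(mean_average_precision(runs), 4))    # 0.5417
```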
Practical implications
A concept that is not present in the domain-specific ontology cannot be indexed. When a query term is not available in the ontology, term-based results are retrieved instead.
Originality/value
In addition to super- and sub-concepts, the concepts at the same level (siblings) are incorporated into the ontological index. Structural information from the ontology is used for query expansion. The ranking of the documents depends on the type of query (single-concept queries, multiple-concept queries and concept-with-relation queries) and on the ontological relations that exist in the query and the documents. With this structural information, the search results showed better coverage of the concepts with respect to the query.
E. Fersini and F. Sartori
Abstract
Purpose
Tools for content analysis, information extraction and retrieval of multimedia objects in their native form are strongly needed in the judicial domain: digital videos are a fundamental source of information about events occurring during judicial proceedings, and they should be stored, organized and retrieved quickly and at low cost. This paper seeks to address these issues.
Design/methodology/approach
In this context, the JUMAS system, stemming from the homonymous European project (www.jumasproject.eu), takes up the challenge of exploiting semantics and machine learning techniques to improve the usability of multimedia judicial folders.
Findings
This paper describes one of the most challenging issues addressed by the JUMAS project: extracting meaningful abstracts of judicial debates in order to access their salient contents efficiently. In particular, the authors present an ontology-enhanced multimedia summarization environment that derives a synthetic representation of judicial media contents with limited loss of meaningful information, while overcoming the information overload problem.
Originality/value
The adoption of ontology-based query expansion has made it possible to improve the performance of multimedia summarization algorithms compared with traditional statistics-based approaches. The effectiveness of the proposed approach has been evaluated on real media contents, showing good potential for extracting key events in the challenging area of judicial proceedings.
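To make the role of query expansion in summarization concrete, the sketch below expands query concepts with ontology-related terms and then keeps the transcript sentences that overlap most with the expanded query. It is a simple term-overlap extractive scheme chosen for illustration, not the JUMAS summarization algorithm; the ontology fragment, the transcript and the function names are all hypothetical, and real judicial media would first need speech transcription.

```python
import re

# Hypothetical fragment of a judicial ontology: concept -> related terms.
ONTOLOGY = {
    "testimony": {"witness", "statement", "deposition"},
    "evidence": {"exhibit", "document", "forensic"},
}


def expand_query(terms, ontology):
    """Add every term the ontology relates to a query concept."""
    expanded = set(terms)
    for term in terms:
        expanded |= ontology.get(term, set())
    return expanded


def tokenize(text):
    """Lower-cased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))


def summarize(sentences, query_terms, ontology, k=2):
    """Keep the k sentences sharing most words with the expanded query, in transcript order."""
    expanded = expand_query(query_terms, ontology)
    scored = [(len(tokenize(s) & expanded), i) for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]


transcript = [
    "The court is now in session.",
    "The witness gave a statement about the night.",
    "Counsel introduced exhibit twelve as forensic proof.",
    "The hearing was adjourned until Monday.",
]
print(summarize(transcript, ["testimony", "evidence"], ONTOLOGY))
# ['The witness gave a statement about the night.', 'Counsel introduced exhibit twelve as forensic proof.']
```

With an empty ontology every sentence scores zero and the sketch simply returns the first two sentences, which is one way to see what the expansion step contributes.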
D. Wollersheim and J. W. Rahayu
Abstract
This paper presents a framework which combines data and text retrieval techniques to exercise and evaluate ontology-based query expansions. We first use linguistic techniques to identify query and document concepts, locating them in an ontologically defined semantic space. Expansions originate from the identified query concepts, and their success is determined by matching in the relevant document set. We identify three orthogonal dimensions that can affect query expansion success: relationship source, success measure technique and query expansion technique. Expansion technique is further divided into six categories: simple pruning, complex probability, voting, directional, semantic propagation and multiple source concept. We describe each technique and show examples where it would be useful. The system architecture facilitates plugging in various expansion and evaluation routines and flowing results from one method to the next. The system is useful for microanalysis of query expansion, discovering which components of ontologically derived knowledge most influence query expansion success. In this work, we apply our framework to the medical domain.
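The abstract names "semantic propagation" as one expansion category without defining it. One plausible reading, assumed here, is spreading activation: weight flows outward from the query concepts through the ontology graph, shrinking by a decay factor at each hop, and concepts above a threshold become expansion terms. The medical graph, `decay` and `threshold` values below are hypothetical.

```python
from collections import deque

# Hypothetical medical ontology fragment; edges are listed in both directions.
GRAPH = {
    "myocardial infarction": ["heart disease", "chest pain"],
    "heart disease": ["myocardial infarction", "cardiovascular disease"],
    "chest pain": ["myocardial infarction"],
    "cardiovascular disease": ["heart disease", "stroke"],
    "stroke": ["cardiovascular disease"],
}


def propagate(graph, seeds, decay=0.5, threshold=0.2):
    """Spread activation from query concepts through an ontology graph.

    Each hop multiplies the weight by `decay`; a node only passes weight on
    if the decayed weight is at least `threshold`. Returns {concept: weight}.
    """
    weights = {s: 1.0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        w = weights[node] * decay
        if w < threshold:
            continue
        for nbr in graph.get(node, ()):
            if w > weights.get(nbr, 0.0):  # keep the strongest path to each concept
                weights[nbr] = w
                queue.append(nbr)
    return weights


print(propagate(GRAPH, ["myocardial infarction"]))
# {'myocardial infarction': 1.0, 'heart disease': 0.5, 'chest pain': 0.5, 'cardiovascular disease': 0.25}
```

Lowering `threshold` to 0.1 lets "stroke" in as well, which illustrates the precision/recall trade-off such a framework would be used to measure.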
Awny Sayed and Amal Al Muqrishi
Abstract
Purpose
The purpose of this paper is to present an efficient and scalable Arabic semantic search engine based on a domain-specific ontological graph for the Colleges of Applied Science, Sultanate of Oman (CASOnto). It supports factoid question answering and two types of search, keyword-based and semantics-based, in both Arabic and English. The engine is built on a variety of technologies, such as Resource Description Framework (RDF) data and an ontological graph. Two experiments are conducted: the first compares entity search with classical search within the system itself; the second compares CASOnto with well-known semantic search engines such as Kngine, Wolfram Alpha and Google to measure their performance and efficiency.
Design/methodology/approach
The design and implementation of the system comprise the following phases: inference design, storage, indexing, searching, query processing and a user-friendly interface. The system is designed for a specific domain, IBRI CAS (College of Applied Science), and highlights its academic and nonacademic departments. The ontologically inferred data are stored in the tuple database (TDB) and MySQL to handle both keyword-based and entity-based search. Indexing and searching are built on Lucene for the keyword search, while TDB is used for the entity search. Query processing is an important component of search engines: it helps improve the user's search results and makes the system efficient and scalable. CASOnto handles Arabic-specific issues such as spelling correction, query completion, stop-word removal and diacritics removal. It also supports the analysis of factoid question answering.
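Two of the Arabic query-processing steps listed above, diacritics removal and stop-word removal, can be sketched in a few lines. This is an illustrative assumption, not CASOnto's implementation: it strips the tashkeel range U+064B to U+0652 plus the tatweel character U+0640, and the three-word stop list is a hypothetical placeholder for a real, much larger list. Spelling correction and query completion are not covered.

```python
import re

# Arabic diacritics (tashkeel, U+064B..U+0652) plus tatweel/kashida (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Tiny hypothetical stop-word list; production systems use far larger ones.
STOP_WORDS = {
    "\u0641\u064a",        # "fi" (in)
    "\u0645\u0646",        # "min" (from)
    "\u0639\u0644\u0649",  # "ala" (on)
}


def normalize_query(query):
    """Strip diacritics, split on whitespace and drop stop words."""
    text = DIACRITICS.sub("", query)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


# "kataba fi al-kulliya" with short-vowel marks on the first word
print(normalize_query("\u0643\u064e\u062a\u064e\u0628\u064e \u0641\u064a \u0627\u0644\u0643\u0644\u064a\u0629"))
```

Normalizing before lookup means a vocalized query and an unvocalized index entry reduce to the same string, which is the usual motivation for diacritics removal in Arabic retrieval.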
Findings
In this paper, an efficient and scalable Arabic semantic search engine is proposed. The results show that the semantic search built on SPARQL outperforms the classical search for both simple and complex queries, with an accuracy of 100 per cent for both types of query. The comparison of CASOnto with Wolfram Alpha, Kngine and Google also shows better results for CASOnto. The proposed engine therefore appears to retrieve better results, more efficiently, than the other engines. It is built on a domain-specific ontology, offers highly scalable performance and handles complex queries well by understanding the context behind the query.
Research limitations/implications
The proposed engine is built on a specific domain (CAS Ibri, Oman). Future work will address non-factoid question answering and extend CASOnto to integrate further domains.
Originality/value
The main contribution of this paper is an efficient and scalable Arabic semantic search engine. The widespread use of search engines creates a new challenge: keeping up with the evolution of the Semantic Web. Meeting users' needs, that is, giving them access to accurate information efficiently and in as little time as possible, has become a matter of paramount importance in the light of artificial intelligence and technological development. However, this research is still in its infancy because few search engines support the Arabic language, which can be traced back to the complexity of Arabic morphology and grammar rules.
Andreas Vlachidis, Ceri Binding, Douglas Tudhope and Keith May
Abstract
Purpose
This paper discusses the use of information extraction (IE), a natural language processing (NLP) technique, to assist “rich” semantic indexing of diverse archaeological text resources. The research focuses on semantic-aware “rich” indexing of diverse natural language resources, with properties capable of supporting information retrieval from the online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project.
Design/methodology/approach
The paper proposes using the English Heritage extension (CRM-EH) of CIDOC CRM, the standard core ontology in cultural heritage, and exploiting domain thesaurus resources to drive and enhance an Ontology-Oriented Information Extraction process. Semantic indexing is based on a rule-based information extraction technique, implemented with the General Architecture for Text Engineering (GATE) toolkit and expressed as Java Annotation Patterns Engine (JAPE) rules.
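The idea of thesaurus-driven, rule-based semantic annotation can be illustrated without GATE. The sketch below is not JAPE: it is a plain Python gazetteer lookup with a single pattern rule (whole-word match, optional plural "s"), and the two term lists are hypothetical stand-ins for the domain thesauri. It tags matches with the CIDOC CRM classes E49 Time Appellation and E19 Physical Object.

```python
import re

# Hypothetical thesaurus fragments standing in for the project's domain thesauri.
PERIOD_TERMS = {"roman", "medieval", "bronze age", "iron age"}
OBJECT_TERMS = {"pottery", "coin", "brooch", "sherd"}

RULES = (
    (PERIOD_TERMS, "E49.Time Appellation"),
    (OBJECT_TERMS, "E19.Physical Object"),
)


def annotate(text):
    """Return sorted (start, end, surface, concept) tuples for thesaurus matches.

    Each term is matched as a whole word, case-insensitively, with an optional
    trailing 's' so that simple plurals are also caught.
    """
    annotations = []
    for terms, concept in RULES:
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"s?\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                annotations.append((m.start(), m.end(), m.group(0), concept))
    return sorted(annotations)


text = "Roman pottery and a medieval coin were recovered."
print([(a[2], a[3]) for a in annotate(text)])
# [('Roman', 'E49.Time Appellation'), ('pottery', 'E19.Physical Object'), ('medieval', 'E49.Time Appellation'), ('coin', 'E19.Physical Object')]
```

In GATE the lookup would typically come from a gazetteer and the combination logic from JAPE rules over the resulting annotations; the character offsets kept here play the same role as GATE annotation spans.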
Findings
Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic‐aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms.
Originality/value
The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as “Grey Literature”, from the Archaeological Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and E19.Physical Object.
Abstract
Purpose
This paper seeks to adopt FRBRoo as an ontological approach to integrating heterogeneous metadata and transforming human-understandable formats into machine-understandable formats for semantic querying.
Design/methodology/approach
Two use cases, involving museum artefacts and literary works, were used to illustrate how FRBRoo can re-contextualize the semantics of elements and the semantic relationships embedded in them. The shared ontology was then RDFized, and examples were explored to examine the feasibility of the proposed approach.
Findings
FRBRoo can act as an interlingua aligning museum and library metadata, achieving heterogeneous metadata integration and semantic querying without changing either original approach to fit the other.
Research limitations/implications
Exploration of more diverse use cases is required to further align the different approaches of museums and libraries using FRBRoo and make revisions.
Practical implications
Solid evidence is provided for the use of FRBRoo in heterogeneous metadata integration and semantic query.
Originality/value
This is the first study to elaborate how FRBRoo can play a role as a shared ontology to integrate the heterogeneous metadata generated by museums and libraries. This paper also shows how the proposed approach is distinct from the Dublin Core format crosswalk in re-contextualizing semantic meanings and their relationships, and further provides four new sub-types for mapping description language.
Dimitrios A. Koutsomitropoulos
Abstract
Purpose
Effective synthesis of learning material is a multidimensional problem, which often relies on handpicking approaches and human expertise. Sources of educational content exist in a variety of forms, each offering proprietary metadata information and search facilities. This paper aims to show that it is possible to harvest scholarly resources from various repositories of open educational resources (OERs) in a federated manner. In addition, their subject can be automatically annotated using ontology inference and standard thematic terminologies.
Design/methodology/approach
Based on a semantic interpretation of their metadata, external collections can be aligned and maintained in a shared knowledge pool known as the Learning Object Ontology Repository (LOOR). The author leverages the LOOR to show that it is possible to search through the metadata of various educational repositories and amalgamate their semantics into a common learning object (LO) ontology. The author then proceeds with automatic subject classification of LOs, using keyword expansion and standard taxonomic vocabularies for thematic classification, expressed in SKOS.
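A minimal sketch of SKOS-style keyword expansion for subject classification follows. It assumes a thesaurus already loaded into memory as a dictionary; the `ex:` concept identifiers, labels and hierarchy are hypothetical, and no RDF library is used. Each concept is expanded with its preferred and alternative labels and, recursively, those of its `skos:narrower` concepts, and an LO is assigned every concept whose expanded label set matches one of its search keywords.

```python
# Hypothetical SKOS-like thesaurus held in memory: concept -> labels and narrower concepts.
THESAURUS = {
    "ex:Mathematics": {"prefLabel": "mathematics", "altLabel": {"maths", "math"},
                       "narrower": {"ex:Algebra", "ex:Geometry"}},
    "ex:Algebra": {"prefLabel": "algebra", "altLabel": {"linear algebra"}, "narrower": set()},
    "ex:Geometry": {"prefLabel": "geometry", "altLabel": set(), "narrower": set()},
    "ex:Biology": {"prefLabel": "biology", "altLabel": {"life sciences"}, "narrower": set()},
}


def labels(concept):
    """skos:prefLabel plus skos:altLabel values of one concept."""
    entry = THESAURUS[concept]
    return {entry["prefLabel"]} | entry["altLabel"]


def expand(concept):
    """Labels of a concept plus, recursively, those of its skos:narrower concepts."""
    result = set(labels(concept))
    for child in THESAURUS[concept]["narrower"]:
        result |= expand(child)
    return result


def classify(keywords):
    """Return thesaurus concepts whose expanded label set matches any keyword."""
    kws = {k.lower() for k in keywords}
    return sorted(c for c in THESAURUS if kws & expand(c))


print(classify(["Linear Algebra", "exercises"]))  # ['ex:Algebra', 'ex:Mathematics']
```

The broader concept "ex:Mathematics" is assigned only because of the narrower-term expansion, which is the kind of recall gain the abstract reports; the matching is exact on lower-cased labels, so precision is not loosened.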
Findings
The approach to automatic subject classification takes advantage of the implicit information in the search and selection process and combines it with expert knowledge in the domain of reference (SKOS thesauri). This is shown to improve recall by a considerable factor, while precision remains unaffected.
Originality/value
To the best of the author’s knowledge, the idea of subject classification of LOs through the reuse of search query terms combined with SKOS-based matching and expansion has not been investigated before in a federated scholarly setting.
Omar El Midaoui, Btihal El Ghali, Abderrahim El Qadi and Moulay Driss Rahmani
Abstract
Purpose
Geographical query formulation is one of the key difficulties for users in search engines. The purpose of this study is to improve geographical search by proposing a novel geographical query reformulation (GQR) technique using a geographical taxonomy and word senses.
Design/methodology/approach
This work introduces an approach to GQR that combines a method for separating query components using GeoNames, a technique for reformulating these components using WordNet, and a geographic taxonomy constructed with latent semantic analysis.
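The overall shape of such a reformulation pipeline can be sketched as follows. Every resource here is a hypothetical in-memory stand-in: a set of place names instead of GeoNames, a synonym dictionary instead of WordNet, and a fixed region-to-sub-region map instead of the LSA-built taxonomy. Only single-token place names are handled, and the replacement rule (broad place replaced by its sub-regions) is a simplification of the authors' technique.

```python
# Hypothetical stand-ins for GeoNames, WordNet and the LSA-built taxonomy.
GAZETTEER = {"morocco", "rabat", "casablanca", "fes"}
SYNONYMS = {"hotels": {"lodging", "accommodation"}}
TAXONOMY = {"morocco": ["rabat", "casablanca", "fes"]}  # region -> sub-regions
STOP_WORDS = {"in", "near", "of", "the"}


def split_query(query):
    """Separate thematic words from single-token place names found in the gazetteer."""
    tokens = [t for t in query.lower().split() if t not in STOP_WORDS]
    spatial = [t for t in tokens if t in GAZETTEER]
    thematic = [t for t in tokens if t not in GAZETTEER]
    return thematic, spatial


def reformulate(query):
    """Expand thematic words with synonyms and replace a broad place by its sub-regions."""
    thematic, spatial = split_query(query)
    new_thematic = []
    for term in thematic:
        new_thematic.append(term)
        new_thematic.extend(sorted(SYNONYMS.get(term, ())))
    new_spatial = []
    for place in spatial:
        new_spatial.extend(TAXONOMY.get(place, [place]))
    return new_thematic, new_spatial


print(reformulate("Hotels in Morocco"))
# (['hotels', 'accommodation', 'lodging'], ['rabat', 'casablanca', 'fes'])
```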
Findings
The proposed approach was compared to two methods from the literature, using the mean average precision (MAP) and the precision at 20 documents (P@20). The experimental results show that it outperforms the other techniques by 15.73% to 31.21% in terms of P@20 and by 17.81% to 35.52% in terms of MAP.
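P@20 is the fraction of the first 20 retrieved documents that are relevant. The short sketch below shows the computation under the common convention (used, for example, by trec_eval) that a ranking shorter than k is padded with non-relevant documents, so the denominator is always k. The ranking and judgments are invented for illustration.

```python
def precision_at_k(ranking, relevant, k=20):
    """Fraction of the top-k positions filled by relevant documents (denominator is always k)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


ranking = [f"d{i}" for i in range(25)]      # hypothetical ranked list d0..d24
relevant = {"d0", "d3", "d7", "d21"}        # d21 falls outside the top 20
print(precision_at_k(ranking, relevant))    # 0.15
```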
Research limitations/implications
According to the experimental results, the best taxonomy, created with the geographical adjacency taxonomy builder, contains 7.67% incorrect links. The authors believe that building the taxonomy from a much larger amount of data could give better results; future work therefore intends to apply the approach in a big data context.
Originality/value
Despite this, reformulating geographical queries with the proposed approach considerably improves query precision and retrieves relevant documents that the original queries did not. The strengths of the technique lie in reformulating both thematic and spatial entities and in replacing the spatial entity of the query with terms, drawn from a geographical taxonomy, that express the intent of the query more precisely.
Chihli Hung, Chih‐Fong Tsai, Shin‐Yuan Hung and Chang‐Jiang Ku
Abstract
Purpose
A grid information retrieval model has benefits for sharing resources and processing large volumes of information, but it cannot handle conceptual heterogeneity without integrating semantic information. The purpose of this research is to propose a concept-based retrieval mechanism that captures the user's query intentions in a grid environment. The research re-ranks documents over distributed data sources and evaluates performance based on user judgment and processing time.
Design/methodology/approach
This research uses the ontology lookup service to build the concept set from the ontology and captures the user's query intentions as a means of query expansion. The Globus toolkit is used to implement the grid service, and a modified version of the collection retrieval inference (CORI) algorithm is used to re-rank documents over distributed data sources.
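The abstract does not describe how CORI was modified, so the sketch below shows only the standard CORI formulas as they are usually stated: a per-term collection belief with the default constants (50, 150, b = 0.4) and the heuristic that merges normalized document scores weighted by normalized collection scores. How per-term beliefs are combined into a per-query collection score (for example, by averaging) is left to the caller; the collections, documents and scores in the example are invented.

```python
import math


def cori_collection_belief(df, cw, avg_cw, cf, num_collections, b=0.4):
    """Standard CORI belief that one collection satisfies one query term.

    df: documents in the collection containing the term
    cw / avg_cw: collection size (in words) relative to the average collection size
    cf: number of collections containing the term
    """
    t = df / (df + 50 + 150 * cw / avg_cw)
    i = math.log((num_collections + 0.5) / cf) / math.log(num_collections + 1.0)
    return b + (1 - b) * t * i


def merge_scores(results, collection_scores):
    """CORI-style merging into one global ranking.

    results: {collection: {doc: local_score}}
    collection_scores: {collection: collection belief C}
    Uses D'' = (D' + 0.4 * D' * C') / 1.4 with min-max normalized D' and C'.
    """
    c_min, c_max = min(collection_scores.values()), max(collection_scores.values())
    merged = {}
    for coll, docs in results.items():
        c_norm = (collection_scores[coll] - c_min) / (c_max - c_min) if c_max > c_min else 1.0
        d_min, d_max = min(docs.values()), max(docs.values())
        for doc, score in docs.items():
            d_norm = (score - d_min) / (d_max - d_min) if d_max > d_min else 1.0
            merged[doc] = (d_norm + 0.4 * d_norm * c_norm) / 1.4
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


print(round(cori_collection_belief(df=10, cw=1000, avg_cw=1000, cf=2, num_collections=4), 4))  # 0.4144
results = {"A": {"a1": 10.0, "a2": 5.0}, "B": {"b1": 8.0, "b2": 2.0}}
print([doc for doc, _ in merge_scores(results, {"A": 0.50, "B": 0.45})])  # ['a1', 'b1', 'a2', 'b2']
```

The merge step is what lets scores computed independently in different grid nodes be compared on one global scale, which matches the abstract's goal of a global relevance score across distributed sources.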
Findings
The experiments demonstrate, by user judgment, that the proposed approach successfully captures the user's query intentions. In terms of processing time, a grid information retrieval model is a suitable strategy for the ontology-based retrieval model.
Originality/value
Most current semantic grid models focus on construction of the semantic grid, and do not consider re‐ranking search results from distributed data sources. The significance of evaluation from the user's viewpoint is also ignored. This research proposes a method that captures the user's query intentions and re‐ranks documents in a grid based on the CORI algorithm. This proposed ontology‐based retrieval mechanism calculates the global relevance score of all documents in a grid and displays those documents with higher relevance to users.
J. Alfredo Sánchez, María Auxilio Medina, Oleg Starostenko, Antonio Benitez and Eduardo López Domínguez
Abstract
Purpose
This paper focuses on the problems of integrating information from open, distributed scholarly collections, and on the opportunities these collections represent for research communities in developing countries. The paper aims to introduce OntOAIr, a semi‐automatic method for constructing lightweight ontologies of documents in repositories such as those provided by the Open Archives Initiative (OAI).
Design/methodology/approach
OntOAIr uses simplified document representations, a clustering algorithm, and ontological engineering techniques.
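The abstract does not name OntOAIr's clustering algorithm, so the sketch below substitutes a simple one to show the general pipeline: documents reduced to term sets (a simplified representation), a single-pass leader clustering with Jaccard similarity, and each cluster named by its most frequent term as a crude class of a lightweight ontology. The OAI identifiers, terms and `threshold` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap of two term sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def leader_cluster(docs, threshold=0.3):
    """Single-pass clustering: each document joins the first cluster whose
    leader is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (leader_terms, [doc_ids])
    for doc_id, terms in docs.items():
        for leader, members in clusters:
            if jaccard(terms, leader) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((terms, [doc_id]))
    return clusters


def label_clusters(docs, clusters):
    """Name each cluster by its most frequent term (alphabetical tie-break)."""
    ontology = {}
    for _, members in clusters:
        counts = Counter(t for m in members for t in docs[m])
        label = max(sorted(counts), key=counts.get)
        ontology.setdefault(label, []).extend(members)
    return ontology


docs = {
    "oai:1": {"ontology", "metadata", "repository"},
    "oai:2": {"ontology", "metadata", "harvesting"},
    "oai:3": {"protein", "folding", "simulation"},
    "oai:4": {"protein", "folding", "structure"},
}
print(label_clusters(docs, leader_cluster(docs)))
# {'metadata': ['oai:1', 'oai:2'], 'folding': ['oai:3', 'oai:4']}
```

Leader clustering is order-dependent and sensitive to `threshold`; it is used here only because it is short and deterministic, not because OntOAIr is known to use it.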
Findings
The paper presents experimental results showing the potential positive impact of ontologies, and specifically of OntOAIr, on the use of collections provided by OAI.
Research limitations/implications
By applying OntOAIr, scholars who frequently spend many hours organizing OAI information spaces will obtain support that will allow them to speed up the entire research cycle and, expectedly, participate more fully in global research communities.
Originality/value
The proposed method allows human and software agents to organize and retrieve groups of documents from multiple collections. Applications of OntOAIr include enhanced document retrieval. In this paper, the authors focus particularly on document retrieval applications.