Search results
1 – 10 of 270
Raj Kumar Bhardwaj, Ritesh Kumar and Mohammad Nazim
Abstract
Purpose
This paper evaluates the precision of four metasearch engines (MSEs), namely DuckDuckGo, Dogpile, Metacrawler and Startpage, to determine which one exhibits the highest level of precision and is therefore most likely to return the most relevant search results.
Design/methodology/approach
The research is conducted in two phases: the first involves four queries categorized into two segments (4-Q-2-S), while the second includes six queries divided into three segments (6-Q-3-S). The queries vary in complexity and fall into three types: simple, phrase and complex. Precision, average precision and the presence of duplicates are determined for all the evaluated metasearch engines.
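As a minimal sketch of the measures named above, assuming binary relevance judgments (1 for relevant, 0 for not relevant) and reading "average precision" as the mean of per-query precision, the calculation could look like the following. The function names and sample data are hypothetical and are not taken from the paper, which may define these measures differently.

```python
from typing import Sequence


def precision(relevance: Sequence[int]) -> float:
    """Fraction of retrieved results judged relevant (1 = relevant, 0 = not)."""
    if not relevance:
        return 0.0
    return sum(relevance) / len(relevance)


def mean_precision(per_query_relevance: Sequence[Sequence[int]]) -> float:
    """Mean of per-query precision (assumed reading of 'average precision')."""
    if not per_query_relevance:
        return 0.0
    scores = [precision(judgments) for judgments in per_query_relevance]
    return sum(scores) / len(scores)


def duplicate_count(urls: Sequence[str]) -> int:
    """Number of results whose URL already appeared earlier in the same list."""
    return len(urls) - len(set(urls))


# Hypothetical judgments for one engine over two queries.
sample_judgments = [[1, 1, 0, 1], [1, 0]]
sample_urls = ["a.example", "b.example", "a.example"]
```

Calling `mean_precision(sample_judgments)` averages the two per-query scores, and `duplicate_count(sample_urls)` counts repeated URLs within one result list.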
Findings
The study found that Startpage returned the most relevant results and achieved the highest precision (0.98) among the four MSEs. DuckDuckGo, by contrast, performed consistently across both phases of the study.
Research limitations/implications
The study only evaluated four metasearch engines, which may not be representative of all available metasearch engines. Additionally, a limited number of queries were used, which may not be sufficient to generalize the findings to all types of queries.
Practical implications
The findings of this study can be valuable for accreditation agencies in managing duplicates, improving their search capabilities and obtaining more relevant and precise results. These findings can also assist users in selecting the best metasearch engine based on precision rather than interface.
Originality/value
This study is the first of its kind to evaluate these four metasearch engines. No similar study has previously measured the performance of metasearch engines.
Abstract
Purpose
The purpose of the paper is to propose a semantic model for describing open source software (OSS) in a format that both machines and humans can understand. The model, derived through a systematic review of related documents, is intended to support source code reuse and revision, the two primary targets of OSS.
Design/methodology/approach
Through a systematic review, all the software reuse criteria are identified and introduced to the web of data through an ontology for OSS (O4OSS). The software semantic model introduced in this paper describes OSS through triple expressions in which the O4OSS properties are predicates.
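To illustrate what triple expressions with ontology properties as predicates can look like, here is a small sketch using plain Python tuples. The `o4oss:` property names, the `ex:` identifiers and the helper functions are all hypothetical and are not taken from the O4OSS ontology itself.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical software descriptions; predicate names are illustrative only.
TRIPLES: List[Triple] = [
    Triple("ex:ProjectX", "o4oss:programmingLanguage", "Python"),
    Triple("ex:ProjectX", "o4oss:license", "MIT"),
    Triple("ex:ProjectX", "o4oss:dependsOn", "ex:LibraryY"),
    Triple("ex:LibraryY", "o4oss:license", "Apache-2.0"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


def dependency_licenses(triples: List[Triple], subject: str) -> Dict[str, List[str]]:
    """Follow the (hypothetical) dependsOn links and collect each dependency's license."""
    return {dep: objects(triples, dep, "o4oss:license")
            for dep in objects(triples, subject, "o4oss:dependsOn")}
```

A question such as "under which licenses are the dependencies of ProjectX released?" then becomes a traversal of linked triples rather than a reading of several separate documents.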
Findings
This model improves the quality of web data by describing software in a structured profile that is readable by both machines and humans and is linked to related data previously published on the web. The OSS semantic model is evaluated by comparing it with previous approaches, comparing the structured software metadata with the software profile index in some well-known repositories, calculating the software retrieval rank and surveying domain experts.
Originality/value
Considering context-specific information and authority levels, the proposed software model would be applicable to any open or closed software. Using this model to publish software provides an infrastructure of connected, meaningful data and helps developers overcome some specific challenges. By navigating software data, many questions that could otherwise be answered only by reading multiple documents can be answered automatically on the web of data.
Debasish Batabyal, Nilanjan Ray, Sudin Bag and Kaustav Nag
Abstract
India is the birthplace of four major religions: Hinduism, Jainism, Buddhism and Sikhism. It is a country where people of all religions live in peace and harmony. Many tourists nevertheless experience different forms of harassment during their pilgrimage journey, for example, fleecing, extortion of money, harassment by beggars, persistence by vendors and priests, fraud, sexual harassment and other unacceptable behaviors. To appreciate the extent of harassment encountered by tourists, an in-depth study was conducted on the reviews provided by tourists on TripAdvisor's (Indian) website. This study characterizes harassment through an ethnographic research approach applied to published reviews. A total of 260 reviews of 28 top Hindu temples are considered, covering all the states and union territories (excluding Nagaland) where the top Hindu pilgrim centers are located according to TripAdvisor. The reviews are categorized and further investigated through primary data collection, in proportion to the reviews received at the respective temple sites, and through structural equation modeling (SEM). Important factors have been identified for future policy issues and recommendations in these most crowded places with unique mass tourism practices.
Abstract
Purpose
This paper aims to give an overview of the history and evolution of commercial search engines. It traces the development of search engines from their early days to their current form as complex technology-powered systems that offer a wide range of features and services.
Design/methodology/approach
In recent years, advancements in artificial intelligence (AI) technology have led to the development of AI-powered chat services. This study examines the official announcements and releases of AI-powered chat services by three major search engines: Google, Bing and Baidu.
Findings
Three major players in the search engine market, Google, Microsoft and Baidu, have started to integrate AI chat into their search results. Google released Bard, later upgraded to Gemini, a LaMDA-powered conversational AI service. Microsoft launched Bing Chat, later renamed Copilot, a search engine powered by OpenAI's GPT. The largest search engine in China, Baidu, released a similar service called Ernie. There are also new AI-based search engines, which are briefly described.
Originality/value
This paper discusses the strengths and weaknesses of traditional, algorithm-powered search engines and of modern search with generative AI support, as well as the possibilities of merging them into one service. This study stresses the types of inquiries submitted to search engines, users’ habits of using search engines and the technological advantage of search engine infrastructure.
Oussama Ayoub, Christophe Rodrigues and Nicolas Travers
Abstract
Purpose
This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. With the continuous growth of text data that modern IR systems have to manage, solutions are needed that efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Even when their meanings are close, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.
Design/methodology/approach
To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry common or domain-specific knowledge to propose a better document representation.
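The expansion idea described above can be sketched as a loop that hides one word at a time, asks a masked-word predictor for candidates and appends any candidate not already in the document. This is only an illustration under stated assumptions: `expand_document`, `toy_predictor` and the tiny lookup table are hypothetical, and the toy predictor simply looks up the hidden word, whereas a real pretrained masked language model would infer candidates from the surrounding context alone.

```python
from typing import Callable, Dict, List

# A predictor takes the token list and the index of the hidden position,
# and returns candidate words ranked from most to least likely.
Predictor = Callable[[List[str], int], List[str]]


def expand_document(tokens: List[str], predict_masked: Predictor,
                    top_k: int = 2) -> List[str]:
    """Append up to top_k new candidate words per position to the document."""
    seen = set(tokens)
    added: List[str] = []
    for position in range(len(tokens)):
        for candidate in predict_masked(tokens, position)[:top_k]:
            if candidate not in seen:  # keep only words the document lacks
                seen.add(candidate)
                added.append(candidate)
    return list(tokens) + added


# Toy stand-in for a pretrained model (hypothetical data, not a real model).
_TOY_RELATED: Dict[str, List[str]] = {
    "car": ["vehicle", "automobile"],
    "engine": ["motor"],
}


def toy_predictor(tokens: List[str], position: int) -> List[str]:
    """Return related words for the hidden token from a fixed lookup table."""
    return _TOY_RELATED.get(tokens[position], [])
```

With the expanded token list indexed instead of the original, a query that uses "vehicle" can match a document that only ever says "car".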
Findings
The authors evaluate the approach of this study on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results.
Originality/value
In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.
Debasis Majhi and Bhaskar Mukherjee
Abstract
Purpose
The purpose of this study is to identify research fronts in library and information science (LIS), where natural language processing (NLP) is being applied significantly, by analysing highly cited core papers adjusted for the age of each paper.
Design/methodology/approach
By mining international databases, 3,087 core papers that received at least 5% of the total citations have been identified. From the mean age in years of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for all 20 fronts to understand how much attention each front is receiving from peers, relative to the others, over time. One theme article has finally been identified from each of these 20 fronts.
Findings
Bidirectional encoder representations from transformers (BERT), with a CPT value of 1.608, followed by sentiment analysis, with a CPT of 1.292, received the highest attention in NLP research. The leaders in these fronts are Columbia University, New York, among universities; the Journal of the American Medical Informatics Association among journals; the USA, followed by the People's Republic of China, among countries; and Xu, H., University of Texas, among authors. The study finds that NLP applications boost the performance of digital libraries and automated library systems in the digital environment.
Practical implications
Any of the research fronts identified in the findings of this paper may be used as a base by researchers who intend to perform extensive research on NLP.
Originality/value
To the best of the authors’ knowledge, the methodology adopted in this paper is the first of its kind in which a meta-analysis approach has been used to understand the research fronts in a subfield such as NLP within a broad domain such as LIS.
Carlos Lopezosa, Dimitrios Giomelakis, Leyberson Pedrosa and Lluís Codina
Abstract
Purpose
This paper constitutes the first academic study to be made of Google Discover as applied to online journalism.
Design/methodology/approach
The study involved conducting 61 semi-structured interviews with experts who are representative of a range of different professional profiles within the fields of journalism and search engine positioning (SEO) in Brazil, Spain and Greece. Based on the data collected, the authors created five semantic categories and compared the experts' perceptions in order to detect common response patterns.
Findings
The study's results confirm the existence of different degrees of convergence and divergence in the opinions expressed in these three countries regarding the main dimensions of Google Discover, including specific strategies for using the feed, its impact on web traffic, its impact on both quality and sensationalist content, and the degree of responsibility shown by the digital media in its use. The authors are also able to propose a set of best practices that journalists and digital media in-house web visibility teams should take into account to increase their probability of appearing in Google Discover. To this end, the authors consider strategies in the following areas of application: topics, different aspects of publication, elements of user experience, strategic analysis, and diffusion and marketing.
Originality/value
Although research exists on the application of SEO to different areas, there have not, to date, been any studies examining Google Discover.
Peer review
The peer-review history for this article is available at: https://publons.com/publon/10.1108/OIR-10-2022-0574
Morteza Mohammadi Ostani, Jafar Ebadollah Amoughin and Mohadeseh Jalili Manaf
Abstract
Purpose
This study aims to adjust Thesis-type properties on Schema.org using metadata models and standards (MS) (Bibframe, electronic thesis and dissertations [ETD]-MS, Common European Research Information Format [CERIF] and Dublin Core [DC]) to enrich the Thesis-type properties for better description and processing on the Web.
Design/methodology/approach
This is an applied, descriptive-analytical study that uses content analysis as its method. The research population consisted of the elements and attributes of the metadata models and standards (Bibframe, ETD-MS, CERIF and DC) and the Thesis-type properties on Schema.org. The data collection tool was a researcher-made checklist, and the data collection method was structured observation.
Findings
The results show that 65 Thesis-type properties, together with the two levels of Thing and CreativeWork as its parents on Schema.org, correspond to the elements and attributes of the related models and standards. In addition, 12 properties are specific to the Thesis type, allowing a more comprehensive description and better processing, and 27 properties are added to the CreativeWork type.
Practical implications
Enrichment and expansion of Thesis-type properties on Schema.org is one of the practical applications of the present study. It enables more comprehensive description and processing, and increases access points and visibility for ETDs in the Web environment and in digital libraries.
Originality/value
This study has offered some new Thesis-type properties and CreativeWork levels on Schema.org. To the best of the authors’ knowledge, this is the first time this issue has been investigated.
Emine Sendurur and Sonja Gabriel
Abstract
Purpose
This study aims to discover how domain familiarity and language affect the cognitive load and the strategies applied for the evaluation of search engine results pages (SERP).
Design/methodology/approach
This study used an experimental research design based on repeated measures. Each student was given four SERPs varying in two dimensions: language and content. The repeated measures collected from each participant were the criteria the student used to select the three best links within the SERP, the reasoning behind the selection and the perceived cognitive load of the given task.
Findings
The evaluation criteria changed according to the language and task type. The cognitive load was reported higher when the content was presented in English or when the content was academic. Regarding the search strategies, a majority of students trusted familiar sources or relied on keywords they found in the short description of the links. A qualitative analysis showed that students can be grouped into different types according to the reasons they stated for their choices. Source seeker, keyword seeker and specific information seeker were the most common types observed.
Originality/value
This study has an international scope with regard to data collection. Moreover, the tasks and findings contribute to the literature on information literacy.