Search results

1 – 10 of over 14000
Article
Publication date: 1 February 2002

David Mason

Details

The Electronic Library, vol. 20 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 20 February 2007

Brett Spencer

Abstract

Purpose

The purpose of this paper is to propose a practical plan for finding free specialty databases and search engines that access the Deep Web, the hidden part of the internet that offers a greater quantity and quality of information than the regular web.

Design/methodology/approach

The author presents a self‐paced, adaptable worksheet of Deep Web search techniques. An explanation is provided for the utilization of general search tools to identify other, more specialized search tools. The techniques therein build upon searching methods suggested by previous authors.

Findings

The techniques facilitated the process of finding specialty tools.

Practical implications

The article helps librarians compile toolkits of specialty databases for use in serving their patrons. Reference librarians with collection responsibilities can expand their libraries' collections at no cost by identifying free web databases. In developing countries, librarians without access to subsidized collections of databases can use the practical advice in this article to find free databases for their patrons. In addition, virtual reference librarians can use the techniques to discover databases that they can recommend to patrons in the absence of print reference books.

Originality/value

The article illustrates an alternate, vertical strategy for web searching as opposed to the conventional, horizontal strategy of web searching. While other authors have already suggested some of these techniques, this article further develops these methods, synthesizes these ideas into a plan, and includes more techniques for Deep Web searching.

Details

Reference Services Review, vol. 35 no. 1
Type: Research Article
ISSN: 0090-7324

Article
Publication date: 6 January 2022

Hanan Alghamdi and Ali Selamat

Abstract

Purpose

With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.

Design/methodology/approach

This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.

Findings

Based on this review, it was found that feature selection and feature extraction methods can be used alongside topic modeling, content analysis and text clustering.
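
For readers unfamiliar with those terms, the sketch below shows what a basic feature-extraction-plus-clustering pipeline of this kind can look like. It is a generic illustration using scikit-learn with placeholder documents and parameters, not the specific methods or datasets evaluated in the reviewed studies.

```python
# Minimal sketch: TF-IDF feature extraction followed by k-means text clustering.
# Generic illustration only; not the pipeline evaluated in the reviewed papers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "example page text one",
    "example page text two",
    "another example document",
]  # placeholder corpus; real work would use crawled page text

# Feature extraction: turn each document into a sparse TF-IDF vector.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
features = vectorizer.fit_transform(documents)

# Text clustering: group documents with similar vocabulary.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

for doc, label in zip(documents, labels):
    print(label, doc[:40])
```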

Originality/value

At the end of the review, the current state of the art and certain open issues associated with Arabic dark web content analysis are presented.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 February 2016

Mhamed Zineddine

Abstract

Purpose

The purpose of this paper is to decrease the traffic created by search engines’ crawlers and solve the deep web problem using an innovative approach.

Design/methodology/approach

A new algorithm was formulated, based on the best existing algorithms, to optimize the traffic caused by web crawlers, which accounts for approximately 40 percent of all network traffic. The crux of this approach is that web servers monitor and log changes and communicate them as an XML file to search engines. The XML file includes the information necessary to generate refreshed pages from existing ones and to reference new pages that need to be crawled. Furthermore, the XML file is compressed to reduce its size to the minimum required.
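
As a rough illustration of the approach described above, the sketch below builds and compresses such a server-side change log. The element names (changes, changed, new, delta) and the choice of gzip are assumptions for illustration; the paper does not publish a schema.

```python
# Hypothetical sketch of a server-side change log communicated to crawlers.
# Element and attribute names are assumptions; the paper does not specify a schema.
import gzip
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

changes = ET.Element("changes", domain="example.org",
                     generated=datetime.now(timezone.utc).isoformat())

# A page whose stored copy can be refreshed from a recorded delta.
changed = ET.SubElement(changes, "changed", url="/articles/42")
ET.SubElement(changed, "delta").text = "placeholder patch data for regenerating the page"

# A page the crawler has not seen yet and still needs to fetch.
ET.SubElement(changes, "new", url="/articles/43")

# Compress the XML to keep the transfer as small as possible.
payload = ET.tostring(changes, encoding="utf-8", xml_declaration=True)
with gzip.open("changes.xml.gz", "wb") as fh:
    fh.write(payload)
```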

Findings

The results of this study have shown that the traffic caused by search engines’ crawlers might be reduced on average by 84 percent when it comes to text content. However, binary content faces many challenges and new algorithms have to be developed to overcome these issues. The proposed approach will certainly mitigate the deep web issue. The XML files for each domain used by search engines might be used by web browsers to refresh their cache and therefore help reduce the traffic generated by normal users. This reduces users’ perceived latency and improves response time to http requests.

Research limitations/implications

The study sheds light on the deficiencies and weaknesses of the algorithms monitoring changes and generating binary files. However, a substantial decrease of traffic is achieved for text-based web content.

Practical implications

The findings of this research can be adopted by web server software and browsers’ developers and search engine companies to reduce the internet traffic caused by crawlers and cut costs.

Originality/value

The exponential growth of web content and other internet-based services such as cloud computing, and social networks has been causing contention on available bandwidth of the internet network. This research provides a much needed approach to keeping traffic in check.

Details

Internet Research, vol. 26 no. 1
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 1 June 2005

Yanbo Ru and Ellis Horowitz

Abstract

Purpose

The existence and continued growth of the invisible web creates a major challenge for search engines that are attempting to organize all of the material on the web into a form that is easily retrieved by all users. The purpose of this paper is to identify the challenges and problems underlying existing work in this area.

Design/methodology/approach

A discussion based on a short survey of prior work, including automated discovery of invisible web site search interfaces, automated classification of invisible web sites, label assignment and form filling, information extraction from the resulting pages, learning the query language of the search interface, building a content summary for an invisible web site, selecting proper databases, integrating invisible web‐search interfaces, and assessing the performance of an invisible web site.
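
As a concrete illustration of the first technique surveyed above, automated discovery of search interfaces, the sketch below scans a page's HTML for forms containing a text input, a common heuristic for spotting query front ends to invisible web databases. The heuristic and class names are illustrative assumptions, not a method taken from the paper.

```python
# Illustrative heuristic for discovering search interfaces on a page:
# a <form> containing a text-like <input> is treated as a candidate query form.
from html.parser import HTMLParser

class SearchFormFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.current_action = None
        self.candidates = []  # form action URLs that look like search interfaces

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
            self.current_action = attrs.get("action", "")
        elif tag == "input" and self.in_form:
            if attrs.get("type", "text") in ("text", "search"):
                self.candidates.append(self.current_action)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

html = '<form action="/search"><input type="text" name="q"></form>'
finder = SearchFormFinder()
finder.feed(html)
print(finder.candidates)  # ['/search']
```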

Findings

Existing technologies and tools for indexing the invisible web follow one of two strategies: indexing the web site interface or examining a portion of the contents of an invisible web site and indexing the results.

Originality/value

The paper is of value to those involved with information management.

Details

Online Information Review, vol. 29 no. 3
Type: Research Article
ISSN: 1468-4527

Book part
Publication date: 21 November 2005

Mary Ellen Bates

Abstract

One of the very first information entrepreneur businesses was Information Unlimited, founded by Sue Rugge and Georgia Finnigan back in 1971. Charging $10/hour for their research, Sue and Georgia essentially created a new industry, offering on-demand research provided by skilled librarians and researchers, to anyone who was willing to pay. Sue went on to found two more independent research companies, Information on Demand and The Rugge Group. Sue was also co-founder of The Information Professionals Institute, a company that focused on seminars for the information industry (including an all-day workshop on how to become an information entrepreneur).

Details

Advances in Librarianship
Type: Book
ISBN: 978-0-12024-629-8

Article
Publication date: 19 March 2018

Hyo-Jung Oh, Dong-Hyun Won, Chonghyuck Kim, Sung-Hee Park and Yong Kim

Abstract

Purpose

The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.

Design/methodology/approach

This study proposes and develops an algorithm that collects web information, as if the crawler were gathering static webpages, by managing script commands as links. The algorithm is then tested by using the proposed web crawler to collect deep webpages.
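
The paper's own crawler is built on the web browser object provided by Microsoft Visual Studio; the sketch below illustrates the same general idea of treating script commands as links, but uses Selenium purely as an assumed stand-in. The seed URL, queueing logic and stopping rule are illustrative only.

```python
# Illustrative sketch of "script commands as links": javascript: targets found
# in anchors are queued and visited just like ordinary URLs are fetched.
# Selenium stands in here for the paper's Visual Studio web browser object.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()              # assumes a local Chrome/driver setup
queue = ["https://example.org/search"]   # hypothetical seed page
seen_pages = []

while queue:
    target = queue.pop(0)
    if target.startswith("javascript:"):
        # Treat the script command itself as a link: run it in the current page
        # so that dynamically generated (deep web) content is rendered.
        driver.execute_script(target[len("javascript:"):])
    else:
        driver.get(target)
    seen_pages.append(driver.page_source)

    # Collect both ordinary hrefs and script commands for later visits.
    for anchor in driver.find_elements(By.TAG_NAME, "a"):
        href = anchor.get_attribute("href") or ""
        if href and href not in queue:
            queue.append(href)
    break  # demo only: stop after the seed page instead of crawling the site

driver.quit()
```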

Findings

Among the findings of this study is that if the crawling process provides search results as script pages, only the first page is collected. The proposed algorithm, however, can collect the deep webpages in this case.

Research limitations/implications

To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.

Practical implications

The research results show that the deep web is estimated to hold 450 to 550 times more information than surface webpages, and that such web documents are difficult to collect. This algorithm, however, helps enable deep web collection through script runs.

Originality/value

This study presents a new method that uses script links instead of the keyword-based approaches adopted previously. The proposed algorithm handles a script as it would an ordinary URL. The conducted experiment shows that the scripts on individual websites need to be analyzed before they can be employed as links.

Details

Data Technologies and Applications, vol. 52 no. 2
Type: Research Article
ISSN: 2514-9288

Details

Cryptomarkets: A Research Companion
Type: Book
ISBN: 978-1-83867-030-6

Article
Publication date: 1 October 2004

Gerry McKiernan

Abstract

Provides a profile of Marcus Zillman, an innovative leader in the field of information access, organization and use. Zillman, a prolific author, keynote speaker and consultant, created BotSpot, a meta‐resource for robot and intelligent agent software sites and information, in 1996, and has continued to produce other innovative resources to help readers and web users.

Details

Library Hi Tech News, vol. 21 no. 9
Type: Research Article
ISSN: 0741-9058
