Search results
1–10 of over 14,000
Abstract
Purpose
The purpose of this paper is to propose a practical plan for finding free specialty databases and search engines that access the Deep Web, the hidden part of the internet that offers a greater quantity and quality of information than the regular web.
Design/methodology/approach
The author presents a self‐paced, adaptable worksheet of Deep Web search techniques. An explanation is provided for the utilization of general search tools to identify other, more specialized search tools. The techniques therein build upon searching methods suggested by previous authors.
Findings
The techniques facilitated the process of finding specialty tools.
Practical implications
The article helps librarians compile toolkits of specialty databases for use in serving their patrons. Reference librarians with collection responsibilities can expand their libraries' collections at no cost by identifying free web databases. In developing countries, librarians without access to subsidized collections of databases can use the practical advice in this article to find free databases for their patrons. In addition, virtual reference librarians can use the techniques to discover databases that they can recommend to patrons in the absence of print reference books.
Originality/value
The article illustrates an alternate, vertical strategy for web searching as opposed to the conventional, horizontal strategy of web searching. While other authors have already suggested some of these techniques, this article further develops these methods, synthesizes these ideas into a plan, and includes more techniques for Deep Web searching.
Hanan Alghamdi and Ali Selamat
Abstract
Purpose
With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.
Design/methodology/approach
This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.
Findings
Based on this review, it was found that methods of feature selection and feature extraction can be combined with topic modeling, content analysis and text clustering.
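The combination described above can be sketched concretely. The following is a minimal illustration, not the reviewed studies' implementations: bag-of-words feature extraction followed by a simple cosine-similarity text clustering pass. The toy corpus and the similarity threshold are illustrative assumptions.

```python
from collections import Counter
import math

def features(text):
    """Feature extraction: lowercase bag-of-words term frequencies."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: attach each document to the first
    cluster whose seed document is similar enough, else start a new one."""
    clusters = []  # list of (seed_features, [doc indices])
    for i, doc in enumerate(docs):
        vec = features(doc)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

docs = [
    "recruitment propaganda forum post",
    "propaganda forum recruitment message",
    "weather report for the weekend",
]
print(cluster(docs))  # the two similar forum posts share a cluster
```

Real systems would add stop-word removal, stemming (nontrivial for Arabic) and TF-IDF weighting, but the pipeline shape is the same: extract features first, then cluster over the feature vectors.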
Originality/value
At the end of the review, the current state-of-the-art and certain open issues associated with Arabic dark web content analysis are presented.
Abstract
Purpose
The purpose of this paper is to decrease the traffic created by search engines’ crawlers and solve the deep web problem using an innovative approach.
Design/methodology/approach
A new algorithm was formulated, based on the best existing algorithms, to optimize the traffic caused by web crawlers, which is approximately 40 percent of all networking traffic. The crux of this approach is that web servers monitor and log changes and communicate them as an XML file to search engines. The XML file includes the information necessary to generate refreshed pages from existing ones and to reference new pages that need to be crawled. Furthermore, the XML file is compressed to the minimum required size.
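The round trip described above can be sketched in a few lines. This is a hedged illustration only: the abstract does not publish the actual XML schema, so the element names (`changelog`, `page`) and attributes (`url`, `action`) are assumptions.

```python
import gzip
import xml.etree.ElementTree as ET

def build_changelog(changes):
    """Serialize (url, action) pairs -- e.g. refreshed or new pages --
    into a gzip-compressed XML manifest that a crawler could fetch
    instead of re-crawling the whole site."""
    root = ET.Element("changelog")
    for url, action in changes:
        ET.SubElement(root, "page", url=url, action=action)
    xml_bytes = ET.tostring(root, encoding="utf-8")
    return gzip.compress(xml_bytes)  # compressed to minimize transfer size

def read_changelog(blob):
    """Inverse operation: decompress and parse the manifest back."""
    root = ET.fromstring(gzip.decompress(blob))
    return [(p.get("url"), p.get("action")) for p in root.findall("page")]

changes = [
    ("https://example.org/a.html", "refreshed"),
    ("https://example.org/new.html", "new"),
]
blob = build_changelog(changes)
print(read_changelog(blob) == changes)  # round trip preserves the log
```

The bandwidth saving comes from the crawler fetching only this small manifest per domain rather than re-requesting every page to detect changes.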
Findings
The results of this study show that the traffic caused by search engines’ crawlers might be reduced on average by 84 percent for text content. However, binary content presents many challenges, and new algorithms have to be developed to overcome them. The proposed approach will certainly mitigate the deep web issue. The XML files that search engines use for each domain might also be used by web browsers to refresh their caches, and therefore help reduce the traffic generated by ordinary users. This reduces users’ perceived latency and improves response time to HTTP requests.
Research limitations/implications
The study sheds light on the deficiencies and weaknesses of the algorithms monitoring changes and generating binary files. However, a substantial decrease of traffic is achieved for text-based web content.
Practical implications
The findings of this research can be adopted by web server software and browsers’ developers and search engine companies to reduce the internet traffic caused by crawlers and cut costs.
Originality/value
The exponential growth of web content and of other internet-based services, such as cloud computing and social networks, has been causing contention on the available bandwidth of the internet. This research provides a much-needed approach to keeping traffic in check.
Abstract
Purpose
The existence and continued growth of the invisible web creates a major challenge for search engines that are attempting to organize all of the material on the web into a form that is easily retrieved by all users. The purpose of this paper is to identify the challenges and problems underlying existing work in this area.
Design/methodology/approach
A discussion based on a short survey of prior work, covering automated discovery of invisible web site search interfaces, automated classification of invisible web sites, label assignment and form filling, information extraction from the resulting pages, learning the query language of the search interface, building content summaries for invisible web sites, selecting proper databases, integrating invisible web search interfaces, and assessing the performance of an invisible web site.
Findings
Existing technologies and tools for indexing the invisible web follow one of two strategies: indexing the web site interface or examining a portion of the contents of an invisible web site and indexing the results.
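The first technique the survey lists, automated discovery of search interfaces, typically starts from a simple structural heuristic: a page offers a search interface if it contains a form with a free-text input. The sketch below is an illustrative assumption of that heuristic, not the surveyed systems' actual classifiers.

```python
from html.parser import HTMLParser

class FormFinder(HTMLParser):
    """Collects the action URL of every form containing a text/search
    input -- a crude signal that the form is a search interface."""
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.action = None
        self.has_text_input = False
        self.search_forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
            self.action = attrs.get("action", "")
            self.has_text_input = False
        elif tag == "input" and self.in_form:
            # type defaults to "text" when the attribute is absent
            if attrs.get("type", "text") in ("text", "search"):
                self.has_text_input = True

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            if self.has_text_input:
                self.search_forms.append(self.action)
            self.in_form = False

html = """
<form action="/search"><input type="search" name="q"></form>
<form action="/login"><input type="password" name="pw"></form>
"""
finder = FormFinder()
finder.feed(html)
print(finder.search_forms)  # ['/search'] -- the login form is rejected
```

Production discovery systems refine this with label text, field counts and learned classifiers, but the form-scanning step is the common entry point for both indexing strategies the survey identifies.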
Originality/value
The paper is of value to those involved with information management.
Abstract
One of the very first information entrepreneur businesses was Information Unlimited, founded by Sue Rugge and Georgia Finnigan back in 1971. Charging $10/hour for their research, Sue and Georgia essentially created a new industry, offering on-demand research provided by skilled librarians and researchers, to anyone who was willing to pay. Sue went on to found two more independent research companies, Information on Demand and The Rugge Group. Sue was also co-founder of The Information Professionals Institute, a company that focused on seminars for the information industry (including an all-day workshop on how to become an information entrepreneur).
Hyo-Jung Oh, Dong-Hyun Won, Chonghyuck Kim, Sung-Hee Park and Yong Kim
Abstract
Purpose
The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.
Design/methodology/approach
This study proposes and develops an algorithm that collects web information as if the crawler were gathering static webpages, by managing script commands as links. The proposed web crawler was tested by running the algorithm to collect deep webpages.
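The core idea of managing script commands as links can be sketched as a link extractor that queues `javascript:` hrefs and `onclick` handlers alongside ordinary URLs. This is a hedged illustration of the extraction step only; the study itself launches the collected scripts through a browser object, and the toy page below is an assumption.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Treats javascript: hrefs and onclick attributes as crawlable
    links, so dynamic pages can be queued like static ones."""
    def __init__(self):
        super().__init__()
        self.static_links = []   # ordinary URLs
        self.script_links = []   # script commands managed as links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is not None:
            if href.startswith("javascript:"):
                self.script_links.append(href[len("javascript:"):])
            else:
                self.static_links.append(href)
        if "onclick" in attrs:
            self.script_links.append(attrs["onclick"])

page = ('<a href="/page2.html">next</a>'
        '<a href="javascript:goPage(3)">3</a>'
        '<button onclick="loadMore()">more</button>')
ex = LinkExtractor()
ex.feed(page)
print(ex.static_links)   # ['/page2.html']
print(ex.script_links)   # ['goPage(3)', 'loadMore()']
```

A crawler built on this would execute each entry in `script_links` in a browser context and capture the resulting page, which is why the study's limitation (scripts the browser object cannot launch) applies.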
Findings
Among the findings of this study is that when the crawling process encounters search results served as script pages, a conventional crawler collects only the first page; the proposed algorithm, however, can collect the deep webpages in this case.
Research limitations/implications
To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.
Practical implications
Prior research estimates that the deep web holds 450 to 550 times more information than surface webpages, and its documents are difficult to collect. This algorithm helps enable deep web collection through script runs.
Originality/value
This study presents a new method that utilizes script links instead of the keyword-based approaches adopted in previous work. The proposed algorithm makes a script usable as an ordinary URL. The conducted experiment shows that analysis of the scripts on individual websites is needed to employ them as links.
Abstract
Provides a profile of Marcus Zillman, an innovative leader in the field of information access, organization and use. Zillman, a prolific author, keynote speaker and consultant, created BotSpot, a meta‐resource for robot and intelligent agent software sites and information, in 1996, and has continued to produce other innovative resources to help readers and web users.