Search results

1 – 10 of over 182,000
Article
Publication date: 13 December 2018

Thomas Belz, Dominik von Hagen and Christian Steffens

Using a meta-regression analysis, we quantitatively review the empirical literature on the relation between effective tax rate (ETR) and firm size. Accounting literature offers…

Abstract

Using a meta-regression analysis, we quantitatively review the empirical literature on the relation between effective tax rate (ETR) and firm size. Accounting literature offers two competing theories on this relation: The political cost theory, suggesting a positive size-ETR relation, and the political power theory, suggesting a negative size-ETR relation. Using a unique data set of 56 studies that do not show a clear tendency towards either of the two theories, we contribute to the discussion on the size-ETR relation in three ways: First, applying meta-regression analysis on a US meta-data set, we provide evidence supporting the political cost theory. Second, our analysis reveals factors that are possible sources of variation and bias in previous empirical studies; these findings can improve future empirical and analytical models. Third, we extend our analysis to a cross-country meta-data set; this extension enables us to investigate explanations for the two competing theories in more detail. We find that Hofstede’s cultural dimensions theory, a transparency index and a corruption index explain variation in the size-ETR relation. Independent of the two theories, we also find that tax planning aspects potentially affect the size-ETR relation. To our knowledge, these explanations have not yet been investigated in our research context.
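
As a purely illustrative sketch of how such a meta-regression might be set up (the effect sizes, moderators and column names below are hypothetical, not the authors' data set of 56 studies), each primary study's size-ETR coefficient can be regressed on study-level moderators with inverse-variance weights:

```python
import pandas as pd
import statsmodels.api as sm

studies = pd.DataFrame({
    "size_etr_coef": [0.012, -0.004, 0.021, 0.008],  # reported effect sizes
    "std_err":       [0.005,  0.006, 0.009, 0.004],  # their standard errors
    "us_sample":     [1, 1, 0, 1],                   # study-level moderator
    "post_reform":   [0, 1, 1, 1],                   # another moderator
})

X = sm.add_constant(studies[["us_sample", "post_reform"]])
weights = 1.0 / studies["std_err"] ** 2  # more precise studies get more influence

model = sm.WLS(studies["size_etr_coef"], X, weights=weights).fit()
print(model.summary())  # moderator terms flag sources of between-study variation
```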

Details

Journal of Accounting Literature, vol. 42 no. 1
Type: Research Article
ISSN: 0737-4607

Article
Publication date: 19 October 2015

Eugene Ch'ng

The purpose of this paper is to present a Big Data solution as a methodological approach to the automated collection, cleaning, collation, and mapping of multimodal, longitudinal…

Abstract

Purpose

The purpose of this paper is to present a Big Data solution as a methodological approach to the automated collection, cleaning, collation, and mapping of multimodal, longitudinal data sets from social media. The paper constructs social information landscapes (SIL).

Design/methodology/approach

The research presented here adopts a Big Data methodological approach to mapping user-generated content in social media. The methodology and algorithms presented are generic and can be applied to diverse types of social media or user-generated content involving user interactions, such as blogs, comments on product pages and other forms of media, so long as the formal data structure proposed here can be constructed.
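
For illustration, a minimal sketch of the kind of formal data structure and interaction mapping this describes, assuming a simple post/reply schema (the field names and example data are hypothetical, not the paper's actual structure):

```python
# Posts and their authors/repliers folded into one interaction graph.
import networkx as nx

posts = [
    {"id": "p1", "author": "alice", "reply_to": None},
    {"id": "p2", "author": "bob",   "reply_to": "p1"},
    {"id": "p3", "author": "carol", "reply_to": "p1"},
]

G = nx.DiGraph()
by_id = {p["id"]: p for p in posts}
for p in posts:
    G.add_node(p["author"], kind="user")
    if p["reply_to"] is not None:
        # edge: replier -> producer of the content replied to
        G.add_edge(p["author"], by_id[p["reply_to"]]["author"], via=p["id"])

# Once mapped, the hidden network can be analysed quantitatively,
# e.g. with centrality measures:
print(nx.in_degree_centrality(G))
```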

Findings

The sequential listing of content within social media and Web 2.0 pages, as viewed in web browsers or on mobile devices, does not necessarily reveal or make obvious a little-known property of the medium: every participant, from content producers to consumers, followers and subscribers, together with the content they produce or subscribe to, is intrinsically connected in a hidden but massive network. Such networks, when mapped, can be analysed quantitatively using social network analysis (e.g. centralities), and with appropriate analytics their semantics and sentiments can reveal equally valuable information. What remains difficult with the traditional approach is collecting, cleaning, collating and mapping such data sets into samples large enough to yield important insights into community structure and into the direction and polarity of interaction on diverse topics. This research solves that particular strand of the problem.

Research limitations/implications

The automated mapping of extremely large networks, involving hundreds of thousands to millions of nodes and encapsulating high-resolution, contextual information over long periods of time, could assist in proving or even disproving theories. The goal of this paper is to demonstrate the feasibility of using automated approaches to acquire massive, connected data sets for academic inquiry in the social sciences.

Practical implications

The methods presented in this paper, together with the Big Data architecture, give individuals and institutions with limited budgets a practical approach to constructing SIL. The integrated software-hardware architecture uses open-source software, and the SIL mapping algorithms are easy to implement.

Originality/value

The majority of research in the literature uses traditional approaches to collect social network data. Traditional approaches can be slow and tedious, and they do not yield sample sizes adequate to be of significant value for research. Whilst traditional approaches collect only a small percentage of the available data, the original methods presented here can collect and collate entire data sets in social media, owing to their automated and scalable mapping techniques.

Details

Industrial Management & Data Systems, vol. 115 no. 9
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 29 August 2019

Vivekanand Venkataraman, Syed Usmanulla, Appaiah Sonnappa, Pratiksha Sadashiv, Suhaib Soofi Mohammed and Sundaresh S. Narayanan

The purpose of this paper is to identify the environmental variables and pollutants that have a significant effect on PM2.5, through wavelet and regression analysis.

Abstract

Purpose

The purpose of this paper is to identify the environmental variables and pollutants that have a significant effect on PM2.5, through wavelet and regression analysis.

Design/methodology/approach

In order to provide a stable data set for regression analysis, multiresolution analysis using wavelets is conducted. For the sampled data, multicollinearity among the independent variables is removed using principal component analysis, and multiple linear regression analysis is conducted with PM2.5 as the dependent variable.
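
A compressed sketch of such a pipeline on synthetic data, with assumed column meanings (an illustration only, not the authors' code):

```python
# (1) wavelet multiresolution analysis for a stable approximation signal,
# (2) PCA to remove multicollinearity among predictors,
# (3) multiple linear regression with PM2.5 as the dependent variable.
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(256, 5))  # e.g. NO2, NOx, SO2, temperature, wind
pm25 = X_raw @ np.array([0.4, 0.3, 0.2, -0.1, 0.05]) + rng.normal(scale=0.1, size=256)

def wavelet_smooth(series, wavelet="db4", level=2):
    """Keep the coarse approximation; zero out the detail coefficients."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(series)]

X_smooth = np.column_stack([wavelet_smooth(X_raw[:, j]) for j in range(X_raw.shape[1])])

pcs = PCA(n_components=3).fit_transform(X_smooth)  # orthogonal components: no multicollinearity
model = LinearRegression().fit(pcs, pm25)
print("R^2:", model.score(pcs, pm25))
```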

Findings

It is found that a few pollutants (NO2, NOx, SO2 and benzene) and environmental factors (ambient temperature, solar radiation and wind direction) affect PM2.5. The regression model developed has a high R2 value of 91.9 percent, and the residuals are stationary and uncorrelated, indicating a sound model.

Research limitations/implications

The research provides a framework for extracting stationary data, and other important features such as change points in mean and variance, from sample data for regression analysis. The work needs to be extended across all areas of India; for other stationary data sets, different factors may affect PM2.5.

Practical implications

Control measures such as control charts can be implemented for significant factors.

Social implications

Rules and regulations governing these factors can be made more stringent.

Originality/value

The originality of this paper lies in the integration of wavelets with regression analysis for air pollution data.

Details

International Journal of Quality & Reliability Management, vol. 36 no. 10
Type: Research Article
ISSN: 0265-671X

Article
Publication date: 13 December 2019

Yang Li and Xuhua Hu

The purpose of this paper is to address the information privacy and security of social network users. The mobile internet and social networks are ever more deeply integrated into…

Abstract

Purpose

The purpose of this paper is to address the information privacy and security of social network users. The mobile internet and social networks are ever more deeply integrated into people’s daily lives, and with the rapid development of the Internet of Things and the interplay of diversified, personalized services, more and more of users’ private information is exposed to the network environment, whether actively or unintentionally. In addition, the large volume of social network data not only brings benefits to network application providers but also motivates malicious attackers. Research on protecting user information privacy in the social network environment therefore has great theoretical and practical significance.

Design/methodology/approach

In this study, building on social network analysis and the attribute reduction idea of rough set theory, generalized reduction concepts based on multi-level rough sets were proposed from the perspectives of the positive region, information entropy and knowledge granularity. The hierarchical compatible granularity space of the original information system was then traversed and the corresponding attribute values were coarsened. The selected test data sets were processed with the algorithm, and the experimental results were analyzed.
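
As a toy illustration of the two rough set ideas mentioned here (not the authors' algorithm; all attribute names and generalization rules are hypothetical), the positive region of a decision table and the coarsening of attribute values toward anonymity might be sketched as follows:

```python
from collections import defaultdict

table = [  # (condition attributes, decision)
    ({"age": 23, "zip": "47901"}, "yes"),
    ({"age": 25, "zip": "47902"}, "yes"),
    ({"age": 61, "zip": "47901"}, "no"),
]

def positive_region(rows, attrs):
    """Equivalence classes (under attrs) whose decisions are consistent."""
    classes = defaultdict(list)
    for cond, dec in rows:
        classes[tuple(cond[a] for a in attrs)].append(dec)
    return {k for k, decs in classes.items() if len(set(decs)) == 1}

def coarsen(rows, generalize):
    """Replace attribute values with coarser ancestors in a value hierarchy."""
    return [({a: generalize.get(a, lambda v: v)(v) for a, v in cond.items()}, dec)
            for cond, dec in rows]

coarse = coarsen(table, {"zip": lambda z: z[:3] + "**",
                         "age": lambda a: "20-30" if a < 30 else "60+"})
print(positive_region(coarse, ["age", "zip"]))  # consistency kept after coarsening
```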

Findings

The results showed that the algorithm can guarantee the anonymity requirements of data publishing and improve the effectiveness of classification modeling on anonymized data in a social network environment.

Research limitations/implications

In the testing and verification of the privacy protection algorithm and scheme, their efficiency needs to be evaluated at a larger data scale. However, the data in this study are not sufficient for that purpose; in follow-up research, more data will be used for testing and verification.

Practical implications

In the social network context, the hierarchical structure of data is introduced into rough set theory as domain knowledge, by reference to the human granulation cognitive mechanism, and rough set modeling of the complex hierarchical data in decision tables is studied. The theoretical results are applied to hierarchical decision rule mining and to k-anonymous privacy-preserving data mining, which enriches rough set theory and has important theoretical and practical significance for further promoting its application. In addition, by combining secure multi-party computation with rough set attribute reduction, a privacy-preserving feature selection algorithm for multi-source decision tables is proposed, solving the privacy protection problem of feature selection in distributed environments. This provides an effective rough set feature selection method for privacy-preserving classification mining in distributed environments, with practical value for advancing privacy-preserving data mining.

Originality/value

In this study, the proposed algorithm and scheme can effectively protect the privacy of social network data, ensure the availability of the social network graph structure and meet the need for both protection and sharing of user attributes and relational data.

Details

Library Hi Tech, vol. 40 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 9 September 2014

Josep Maria Brunetti and Roberto García

The growing volume of semantic data available on the web results in the need to handle the information overload phenomenon. The potential of this amount of data is enormous, but…

Abstract

Purpose

The growing volume of semantic data available on the web results in the need to handle the information overload phenomenon. The potential of this amount of data is enormous, but in most cases it is very difficult for users to visualize, explore and use, especially for lay-users without experience with Semantic Web technologies. The paper aims to discuss these issues.

Design/methodology/approach

The Visual Information-Seeking Mantra, “Overview first, zoom and filter, then details-on-demand,” proposed by Shneiderman, describes how data should be presented in stages to achieve effective exploration. The overview is the first user task when dealing with a data set; the objective is for the user to get an idea of the overall structure of the data set. Different information architecture (IA) components supporting the overview task have been developed so that they are generated automatically from semantic data, and they have been evaluated with end-users.
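
As an illustrative sketch only, one such overview component, a site index with per-class instance counts, could be generated from semantic data along these lines; the RDF graph here is synthetic, and a real deployment would load a data set's own ontology:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.item1, RDF.type, EX.Painting))
g.add((EX.item2, RDF.type, EX.Painting))
g.add((EX.item3, RDF.type, EX.Sculpture))
g.add((EX.Painting, RDFS.label, Literal("Painting")))
g.add((EX.Sculpture, RDFS.label, Literal("Sculpture")))

# Count instances per labelled class: one entry per site-index line.
query = """
SELECT ?label (COUNT(?item) AS ?n) WHERE {
    ?item a ?cls .
    ?cls rdfs:label ?label .
} GROUP BY ?label ORDER BY ?label
"""
for label, n in g.query(query):
    print(f"{label}: {n} items")
```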

Findings

The chosen IA components are well known to web users, as they are present in most web pages: navigation bars, site maps and site indexes. The authors complement them with Treemaps, a visualization technique for displaying hierarchical data. These components have been developed following an iterative User-Centered Design methodology. Evaluations with end-users have shown that users get accustomed to the components easily, despite their being generated automatically from structured data and without users requiring knowledge of the underlying semantic technologies, and that the different overview components complement each other, as each focuses on a different information search need.

Originality/value

Obtaining overviews of semantic data sets cannot easily be done with current semantic web browsers. Overviews become difficult to achieve with large heterogeneous data sets, which are typical in the Semantic Web, because traditional IA techniques do not easily scale to large data sets. There is little or no support for obtaining overview information quickly and easily at the beginning of the exploration of a new data set. This can be a serious limitation when exploring a data set for the first time, especially for lay-users. The proposal is to reuse and adapt existing IA components to provide this overview to users, and to show that the components can be generated automatically from the thesauri and ontologies that structure semantic data while providing a user experience comparable to that of traditional web sites.

Details

Aslib Journal of Information Management, vol. 66 no. 5
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 9 August 2021

Vyacheslav I. Zavalin and Shawne D. Miksa

This paper aims to discuss the challenges encountered in collecting, cleaning and analyzing the large data set of bibliographic metadata records in machine-readable cataloging…

Abstract

Purpose

This paper aims to discuss the challenges encountered in collecting, cleaning and analyzing the large data set of bibliographic metadata records in the machine-readable cataloging (MARC 21) format. Possible solutions are presented.

Design/methodology/approach

This mixed-methods study relied on content analysis and social network analysis. The study examined subject representation in MARC 21 metadata records created in 2020 in WorldCat, the largest international database of “big smart data.” The methodological challenges that were encountered, and the solutions to them, are examined.
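
A hedged sketch of this kind of analysis, assuming a local MARC 21 extract (the file name is hypothetical), could use pymarc to build the network of subject terms that co-occur on the same record:

```python
import itertools
import networkx as nx
from pymarc import MARCReader

G = nx.Graph()
with open("worldcat_sample.mrc", "rb") as fh:  # assumed local extract
    for record in MARCReader(fh):
        # subject headings: a few 6XX fields, formatted as flat strings
        subjects = [f.format_field() for f in record.get_fields("650", "651", "655")]
        for a, b in itertools.combinations(sorted(set(subjects)), 2):
            w = G.get_edge_data(a, b, default={}).get("weight", 0)
            G.add_edge(a, b, weight=w + 1)  # co-occurrence count

print(G.number_of_nodes(), "subject terms,", G.number_of_edges(), "links")
```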

Findings

In this general review paper with a focus on methodological issues, the discussion of challenges is followed by a discussion of solutions developed and tested as part of this study. Data collection, processing, analysis and visualization are addressed separately. Lessons learned and conclusions related to challenges and solutions for the design of a large-scale study evaluating MARC 21 bibliographic metadata from WorldCat are given. Overall recommendations for the design and implementation of future research are suggested.

Originality/value

There are no previous publications that address the challenges and solutions of data collection and analysis of WorldCat’s “big smart data” in the form of MARC 21 data. This is the first study to use a large data set to systematically examine MARC 21 library metadata records created after the most recent addition of new fields and subfields to the MARC 21 Bibliographic Format standard in 2019, based on resource description and access rules. It is also the first to focus its analyses on the networks formed by subject terms shared by MARC 21 bibliographic records in a data set extracted from the heterogeneous centralized database WorldCat.

Details

The Electronic Library, vol. 39 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 11 September 2007

Linn Marks Collins, Jeremy A.T. Hussell, Robert K. Hettinga, James E. Powell, Ketan K. Mane and Mark L.B. Martinez

To describe how information visualization can be used in the design of interface tools for large‐scale repositories.

Abstract

Purpose

To describe how information visualization can be used in the design of interface tools for large‐scale repositories.

Design/methodology/approach

One challenge for designers in the context of large-scale repositories is to create interface tools that help users find specific information of interest. To be most effective, these tools need to leverage the cognitive characteristics of the target users. At the Los Alamos National Laboratory, the authors' target users are scientists and engineers, who can be characterized as higher-order, analytical thinkers. In this paper, the authors describe a visualization tool they have created to make the laboratory's large-scale digital object repositories more usable for these users: SearchGraph, which facilitates data set analysis by displaying search results in the form of a two- or three-dimensional interactive scatter plot.

Findings

Using SearchGraph, users can view a condensed, abstract visualization of search results. They can view the same data set from multiple perspectives by manipulating several display, sort and filter options. Doing so allows them to see different patterns in the data set. For example, they can apply a logarithmic transformation in order to create more scatter in a dense cluster of data points, or they can apply filters in order to focus on a specific subset of data points.
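
A minimal illustration of the logarithmic-transformation idea (not the SearchGraph code itself), using synthetic, heavy-tailed data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
year = rng.integers(1995, 2007, size=300)
citations = rng.lognormal(mean=2.0, sigma=1.0, size=300)  # heavy-tailed metric

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(year, citations, s=8)
ax1.set_title("linear scale: dense cluster")
ax2.scatter(year, np.log10(citations), s=8)  # log transform spreads the cluster
ax2.set_title("log scale: more scatter")
plt.show()
```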

Originality/value

SearchGraph is a creative solution to the problem of how to design interface tools for large‐scale repositories. It is particularly appropriate for the authors' target users, who are scientists and engineers. It extends the work of the first two authors on ActiveGraph, a read‐write digital library visualization tool.

Details

Library Hi Tech, vol. 25 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 29 December 2023

Thanh-Nghi Do and Minh-Thu Tran-Nguyen

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD…

Abstract

Purpose

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD and FL-lSVM. These algorithms are designed to address the challenge of large-scale ImageNet classification.

Design/methodology/approach

The authors’ FL-lSGD and FL-lSVM train in a parallel and incremental manner to build an ensemble local classifier on Raspberry Pis without requiring data exchange. The algorithms sequentially load small data blocks of the local training subset stored on the Raspberry Pi to train the local classifiers. Each data block is split into k partitions using the k-means algorithm, and models are trained in parallel on each partition to enable local data classification.
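
A simplified sketch of this local training step, under assumptions (this is not the authors' implementation, and routing test samples to the nearest partition is an illustrative choice):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDClassifier

def train_local_ensemble(X_block, y_block, k=3):
    """Split one on-device data block with k-means; train one SGD model per partition."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_block)
    models = []
    for j in range(k):
        mask = km.labels_ == j
        if len(np.unique(y_block[mask])) < 2:
            continue  # skip partitions containing a single class
        clf = SGDClassifier(loss="hinge").fit(X_block[mask], y_block[mask])
        models.append((km.cluster_centers_[j], clf))
    return models

def predict(models, x):
    # Route the sample to the classifier of its nearest partition centre.
    center, clf = min(models, key=lambda m: np.linalg.norm(m[0] - x))
    return clf.predict(x.reshape(1, -1))[0]
```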

Findings

Empirical test results on the ImageNet data set show that the authors’ FL-lSGD and FL-lSVM algorithms with 4 Raspberry Pis (Quad core Cortex-A72, ARM v8, 64-bit SoC @ 1.5GHz, 4GB RAM) are faster than the state-of-the-art LIBLINEAR algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32GB RAM).

Originality/value

Efficiently addressing the challenge of large-scale ImageNet classification, the authors’ novel federated learning algorithms of local classifiers have been tailored to work on the Raspberry Pi. These algorithms can handle 1,281,167 images and 1,000 classes effectively.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 14 December 2023

Huaxiang Song, Chai Wei and Zhou Yong

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of…

Abstract

Purpose

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger-volume models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources, so lightweight algorithms with exceptional generalization capabilities are required.

Design/methodology/approach

This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
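
As a hedged sketch of a logit-based distillation loss in this spirit (the temperature, weighting and exact formulation are assumptions, not necessarily the authors' modified loss), the softened KL term retains the non-target category predictions that carry inter- and intra-class similarity:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic logit-based KD: softened KL to the teacher plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # temperature-squared scaling keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```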

Findings

This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value

This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

Details

International Journal of Web Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 28 October 2013

Robert Fox

For the past decade, as universities have increased their research commitments, the production of large data sets has become prevalent. Up to this point, the storage and curation…

Abstract

Purpose

For the past decade, as universities have increased their research commitments, the production of large data sets has become prevalent. Up to this point, the storage and curation of these data sets has been somewhat ad hoc and voluntary. Given recent mandatory stipulations coming from government funding sources regarding the handling of data sets, it is imperative that libraries step into this gap and provision data management services for their institutions. This column aims to explore two primary areas in which libraries can provision services for their parent institutions regarding data management.

Design/methodology/approach

The column is exploratory in nature.

Practical implications

As academic libraries take the lead in data management services, there are many positive implications for their parent institutions. Organizing and preserving important data sets could have a significant impact on the worldwide research community.

Originality/value

All academic libraries, no matter their size, have a level of responsibility regarding the collection and curation of data sets. This is a responsibility not only to the local institution, but also to the wider scope of researchers who may make use of those data sets. This column is an exhortation for academic libraries to take the lead in the area of data management.

Details

OCLC Systems & Services: International digital library perspectives, vol. 29 no. 4
Type: Research Article
ISSN: 1065-075X

1 – 10 of over 182,000