Search results
1 – 10 of 197
Abstract
With the rise of artificial intelligence and machine learning, competitive data science platforms like Kaggle are gaining momentum. From a host's perspective, the platforms offer access to a large crowd of data scientists who can solve their data science problems efficiently and cost-effectively. From the participant's perspective, the platforms provide the opportunity to apply their skills to real-world problems, interact with other data scientists, and win prizes. The chapter provides an overview of competitive data science platforms and assesses their potential for business and academia. A series of opportunities and challenges of data competitions are outlined, and a concrete case is illustrated. The chapter also demonstrates common pitfalls that hosts of data competitions need to be aware of by discussing the relevance of problem definition, data leakage, and metrics to evaluate different solutions.
Torsten Maier, Joanna DeFranco and Christopher McComb
Abstract
Purpose
Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design and psychology contradicts this assumption. This study aims to examine the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork between participants.
Design/methodology/approach
We specifically examine the teams participating in data science competitions hosted by Kaggle. We analyze the data provided by Kaggle to compare the effect of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis.
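The kind of comparison described above can be sketched in a few lines (a hypothetical illustration, not the authors' actual analysis; the team records and scores below are invented):

```python
# Hypothetical sketch: compare average leaderboard score by team size.
# All records below are invented for illustration.
teams = [
    {"size": 1, "score": 0.81}, {"size": 1, "score": 0.84},
    {"size": 1, "score": 0.88}, {"size": 3, "score": 0.86},
    {"size": 3, "score": 0.79}, {"size": 5, "score": 0.77},
]

def mean_score_by_size(records):
    """Group records by team size and average their scores."""
    totals = {}
    for r in records:
        s, n = totals.get(r["size"], (0.0, 0))
        totals[r["size"]] = (s + r["score"], n + 1)
    return {size: s / n for size, (s, n) in totals.items()}

print(mean_score_by_size(teams))
```

In practice the team metadata would be loaded from the competition platform's exported tables rather than hard-coded.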
Findings
This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage and pronoun usage when comparing top- and bottom-performing teams.
Research limitations/implications
These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include not factoring in team member experience level and reliance on extant data.
Originality/value
These results are potentially of use to designers of crowdsourced data science competitions as well as managers and contributors to distributed software development projects.
Hanieh Javadi Khasraghi, Isaac Vaghefi and Rudy Hirschheim
Abstract
Purpose
The research study intends to gain a better understanding of members' behaviors in the context of crowdsourcing contests. The authors examined the key factors that can motivate or discourage contributing to a team and within the community.
Design/methodology/approach
The authors conducted 21 semi-structured interviews with Kaggle.com members and analyzed the data to capture individual members' contributions and emerging determinants that play a role during this process. The authors adopted a qualitative approach and used standard thematic coding techniques to analyze the data.
Findings
The analysis revealed two processes underlying contribution to the team and community and the decision-making involved in each. Accordingly, a set of key factors affecting each process was identified. Using Holbrook's (2006) typology of value creation, these factors were classified into four types, namely extrinsic and self-oriented (economic value), extrinsic and other-oriented (social value), intrinsic and self-oriented (hedonic value), and intrinsic and other-oriented (altruistic value). Three propositions were developed, which can be tested in future research.
Research limitations/implications
The study has a few limitations, which point to areas for future research on this topic. First, the authors only assessed the behaviors of individuals who use the Kaggle platform. Second, the findings of this study may not be generalizable to other crowdsourcing platforms such as Amazon Mechanical Turk, where there is no competition and participants cannot meaningfully contribute to the community. Third, the authors collected data from a limited (yet knowledgeable) number of interviewees. It would be useful to use larger samples to assess other possible factors that did not emerge from the authors' analysis. Finally, the authors presented a set of propositions for individuals' contributory behavior in crowdsourcing contest platforms but did not empirically test them. Future research is necessary to validate these propositions, for instance, using quantitative methods (e.g. surveys or experiments).
Practical implications
The authors offer recommendations for implementing appropriate mechanisms for contribution to crowdsourcing contests and platforms. Practitioners should design architectures to minimize the effect of factors that reduce the likelihood of contributions and maximize the factors that increase contribution in order to manage the tension of simultaneously encouraging contribution and competition.
Social implications
The research study makes key theoretical contributions. First, the results of this study help explain individuals' contributory behavior in crowdsourcing contests from two aspects: joining and selecting a team, and contributing content to the community. Second, the findings of this study suggest a revised and extended model of value co-creation, one that integrates this study's findings with those of Nov et al. (2009), Lakhani and Wolf (2005), Wasko and Faraj (2000), Chen et al. (2018), Hahn et al. (2008), Dholakia et al. (2004) and Teichmann et al. (2015). Third, using direct accounts collected through first-hand interviews with crowdsourcing contest members, this study provides an in-depth understanding of individuals' contributory behavior. Methodologically, the authors' approach was distinct from common approaches in this research domain, which rely on secondary datasets (e.g. the content of forum discussions, survey data) and quantitative techniques for analyzing collaboration and contribution behavior (e.g. Lakhani and Wolf, 2005; Nov et al., 2009).
Originality/value
The authors advance the broad field of crowdsourcing by extending the literature on value creation in the online community, particularly as it relates to the individual participants. The study advances the theoretical understanding of contribution in crowdsourcing contests by focusing on the members' point of view, which reveals both the determinants and the process for joining teams during crowdsourcing contests as well as the determinants of contribution to the content distributed in the community.
Hanieh Javadi Khasraghi, Xuan Wang, Jun Sun and Bahar Javadi Khasraghi
Abstract
Purpose
To obtain optimal deliverables, more and more crowdsourcing platforms allow contest teams to submit tentative solutions and update scores/rankings on public leaderboards. Such feedback-seeking behavior for progress benchmarking pertains to the team representation activity of boundary spanning. The literature on virtual team performance primarily focuses on team characteristics, among which network closure is generally considered a positive factor. This study further examines how boundary spanning helps mitigate the negative impact of network closure.
Design/methodology/approach
This study collected data of 9,793 teams in 246 contests from Kaggle.com. Negative binomial regression modeling and linear regression modeling are employed to investigate the relationships among network closure, boundary spanning and team performance in crowdsourcing contests.
Findings
Whereas network closure turns out to be a liability for virtual teams seeking platform feedback, boundary spanning mitigates its impact on team performance. Beyond this partial mediation, boundary spanning experience and previous contest performance serve as potential moderators.
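The partial mediation reported here can be illustrated with a simple Baron–Kenny-style check on simulated data (a sketch only; the authors' actual models are negative binomial and linear regressions on the Kaggle data, and every number below is invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
closure = rng.normal(size=n)                        # stand-in for network closure
spanning = -0.8 * closure + rng.normal(size=n)      # mediator: boundary spanning
perf = 0.5 * spanning - 0.2 * closure + rng.normal(size=n)  # team performance

def ols(y, *cols):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

total = ols(perf, closure)[1]              # total effect of closure on performance
direct = ols(perf, closure, spanning)[1]   # direct effect, mediator controlled
print(total, direct)  # the direct effect is smaller in magnitude: partial mediation
```

Controlling for the mediator shrinks (but does not eliminate) the closure coefficient, which is the signature of partial mediation.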
Practical implications
The findings offer helpful implications for researchers and practitioners on how to break network closure and encourage boundary spanning with the establishment of facilitating structures in crowdsourcing contests.
Originality/value
The study advances the understanding of theoretical relationships among network closure, boundary spanning and team performance in crowdsourcing contests.
Worapan Kusakunniran, Sarattha Karnjanapreechakorn, Pitipol Choopong, Thanongchai Siriapisith, Nattaporn Tesavibul, Nopasak Phasukkijwatana, Supalert Prakhunhungsit and Sutasinee Boonsopon
Abstract
Purpose
This paper aims to propose a solution for detecting and grading diabetic retinopathy (DR) in retinal images using a convolutional neural network (CNN)-based approach. It could classify input retinal images into a normal class or an abnormal class, which would be further split into four stages of abnormalities automatically.
Design/methodology/approach
The proposed solution is developed based on a newly proposed CNN architecture, namely, DeepRoot. It consists of one main branch, which is connected by two side branches. The main branch is responsible for the primary feature extractor of both high-level and low-level features of retinal images. Then, the side branches further extract more complex and detailed features from the features outputted from the main branch. They are designed to capture details of small traces of DR in retinal images, using modified zoom-in/zoom-out and attention layers.
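The multi-branch topology can be sketched in PyTorch (a schematic guess at the structure, not the published DeepRoot code; all layer sizes, the zoom-in/zoom-out and attention stand-ins, and the class count are placeholders):

```python
import torch
import torch.nn as nn

class MultiBranchNet(nn.Module):
    """Schematic of a DeepRoot-style design: one main trunk feeding
    two side branches. All dimensions are illustrative."""
    def __init__(self, n_classes=5):
        super().__init__()
        # Main branch: primary extractor of low- and high-level features.
        self.main = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # Side branch 1: crude stand-in for zoom-in/zoom-out processing.
        self.zoom = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Side branch 2: a simple channel-attention stand-in.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(32, 32, 1), nn.Sigmoid(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        f = self.main(x)
        z = self.zoom(f).flatten(1)       # detailed small-trace features
        a = (self.attn(f) * f.mean(dim=(2, 3), keepdim=True)).flatten(1)
        return self.head(torch.cat([z, a], dim=1))

logits = MultiBranchNet()(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```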
Findings
The proposed method is trained, validated and tested on the Kaggle dataset. The generalization of the trained model is evaluated on unseen data samples, which were self-collected in a real hospital setting. It achieves a promising performance with a sensitivity of 98.18% under the two-class scenario.
Originality/value
The new CNN-based architecture (i.e. DeepRoot) is introduced with the concept of a multi-branch network. It could assist in solving the problem of an unbalanced dataset, especially when there are common characteristics across different classes (i.e. four stages of DR). Different classes could be outputted at different depths of the network.
Abstract
Purpose
The purpose of this paper is to examine how communication practices influence individuals’ team assembly and performance in open innovation contests.
Design/methodology/approach
This study analyzed behavioral trace data of 4,651 teams and 19,317 participants from a leading open innovation platform, Kaggle. The analyses applied weighted least squares regression and weighted mediation analysis.
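Weighted least squares of the kind named above can be sketched in a few lines (an illustration on simulated data, not the authors' specification; the weights here are arbitrary):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: scale each observation by the square
    root of its weight, then solve the ordinary least-squares problem."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, -1.5]) + rng.normal(size=n) / 10
w = rng.uniform(0.5, 2.0, size=n)   # e.g. observations weighted by reliability
print(wls(X, y, w))  # close to the true coefficients [2.0, -1.5]
```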
Findings
Sharing online profiles positively relates to a person’s performance and likelihood of becoming a leader in open innovation teams. Team assembly effectiveness (one’s ability to team up with high-performing teammates) mediates the relationship between online profile sharing and performance. Moreover, sharing personal websites has a stronger positive effect on performance and likelihood of becoming a team leader, compared to sharing links to professional social networking sites (e.g. LinkedIn).
Research limitations/implications
As team collaboration becomes increasingly common in open innovation, participants’ sharing of their online profiles becomes an important variable predicting their success. This study extends prior research on virtual team collaboration by highlighting the role of communication practices that occur in the team pre-assembly stage, as an antecedent of team assembly. It also addresses a long-standing debate about the credibility of information online by showing that a narrative-based online profile format (e.g. a personal website) can be more powerful than a standardized format (e.g. LinkedIn).
Practical implications
Open innovation organizers should encourage online profile sharing among participants to facilitate effective team assembly in order to improve innovation outcomes.
Originality/value
The current study highlights the importance of team assembly in open innovation, especially the role of sharing online profiles in this process. It connects two areas of research that are previously distant, one on team assembly and one on online profile sharing. It also adds new empirical evidence to the discussion about online information credibility.
Abstract
Purpose
Recently, increasing attention has been paid to the application of deep learning, owing to the widespread practicability of neural network computation. The purpose of this paper is to develop an effective algorithm that automatically discovers the optimal neural network architecture for several real applications.
Design/methodology/approach
The author proposes a novel algorithm, namely, progressive genetic-based neural architecture search (PG-NAS), as a solution to efficiently find the optimal neural network structure for given data. PG-NAS also employs several operations to effectively shrink the search space to reduce the computation cost and improve the accuracy validation.
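The genetic search loop behind such an approach can be sketched as follows (a toy illustration of the general idea, not the PG-NAS algorithm itself; the architecture encoding and the fitness function are invented stand-ins for validation accuracy):

```python
import random

random.seed(0)

# Toy encoding: an architecture is a tuple (depth, width).
def fitness(arch):
    """Invented stand-in for validation accuracy: peaks at (4, 64)."""
    depth, width = arch
    return -((depth - 4) ** 2) - ((width - 64) / 16) ** 2

def mutate(arch):
    depth, width = arch
    return (max(1, depth + random.choice([-1, 0, 1])),
            max(8, width + random.choice([-16, 0, 16])))

def crossover(a, b):
    return (a[0], b[1])

pop = [(random.randint(1, 8), random.choice([16, 32, 64, 128]))
       for _ in range(12)]
for _ in range(30):                      # generations
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:4]                  # prune non-promising structures
    pop = survivors + [mutate(crossover(random.choice(survivors),
                                        random.choice(survivors)))
                       for _ in range(8)]
best = max(pop, key=fitness)
print(best)
```

A real NAS system would replace the toy fitness with trained-model accuracy (or, as in PG-NAS, a learned predictor of it) and encode far richer architectural choices.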
Findings
The proposed PG-NAS could be utilized on several tasks for discovering the optimal network structure. The approach reduces the need for manual settings when implementing artificial intelligence (AI) models; hence, PG-NAS requires less human intervention than traditional machine learning. The average and top-1 metrics, such as error, loss and accuracy, are used to evaluate the discovered neural architectures against all baselines. The experimental results show that, on several real datasets, the proposed PG-NAS model consistently outperforms the state-of-the-art models in all metrics.
Originality/value
Generally, the size and complexity of the neural network search space dominate the computation time and resource cost. In this study, PG-NAS utilizes genetic operations to generate a compact candidate set, i.e. fewer combinations need to be generated when constructing the candidate set. Moreover, the proposed selector in PG-NAS significantly prunes away non-promising network structures. In addition, deriving the accuracy of each combination in the candidate set is also a performance bottleneck, so the author develops a predictor network to efficiently estimate the accuracy and avoid the time-consuming derivation. The learning of the prediction process is also adjusted dynamically; this adaptive learning allows the predictor to capture the pattern of the training data effectively and efficiently. Furthermore, the proposed PG-NAS algorithm is applied to several real datasets to show its practicability and scalability.
Abstract
Purpose
This paper examines an apparent contrast in organizing innovation tournaments: seekers offer contestant-agnostic incentives to elicit greater effort from a heterogeneous pool of contestants. Specifically, the study tests whether and how such incentives and the underlying heterogeneity in the contestant pool, assessed in terms of contestants' entry timing, are jointly associated with contestant effort. The study thus contributes to the prior literature on the behavioral consequences of entry timing as well as incentives in innovation tournaments.
Design/methodology/approach
For hypothesis testing, the study uses a panel dataset of submission activity of over 60,000 contestants observed in nearly 200 innovation tournaments. The estimation employs multi-way fixed effects, accounting for unobserved heterogeneity across contestants, tournaments and submission week. The findings remain stable across a range of robustness checks.
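The fixed-effects logic described here can be sketched with the within transformation on simulated data (a one-way illustration only; the actual study uses effects for contestants, tournaments and submission weeks, and all numbers below are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_groups = 600, 30
g = rng.integers(0, n_groups, size=n)         # e.g. contestant id
x = rng.normal(size=n) + 0.5 * g / n_groups   # regressor correlated with group
alpha = rng.normal(size=n_groups)             # unobserved group heterogeneity
y = 1.2 * x + alpha[g] + rng.normal(size=n) / 10

def demean_by(v, groups):
    """Subtract each group's mean (the 'within' transformation),
    which removes any group-level fixed effect."""
    sums = np.zeros(groups.max() + 1)
    np.add.at(sums, groups, v)
    counts = np.bincount(groups)
    return v - (sums / counts)[groups]

yd, xd = demean_by(y, g), demean_by(x, g)
beta = (xd @ yd) / (xd @ xd)
print(beta)  # close to the true slope 1.2 despite the omitted group effects
```

A naive regression of y on x would be biased here because x is correlated with the group effects; demeaning within groups removes that bias.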
Findings
The study finds that, on average, a late entrant tends to exert less effort than an early entrant (H1). Results further show that the effort gap widens in tournaments that offer higher incentives. In particular, the effort gap between late and early entrants is significantly wider in tournaments that have attracted superior solutions from several contestants (H2), offer a gain in status (H3, marginally significant) or offer a higher monetary reward (H4).
Originality/value
The study's findings counter conventional wisdom, which suggests that incentives have a positive effect on contestant behavior, including effort. In contrast, the study indicates that incentives may have divergent implications for contestant behavior, contingent on contestants' entry timing. As the study discusses, these findings have several implications for research and practice of managing innovation tournaments.
Abstract
Purpose
Diabetic retinopathy (DR) is one of the dangerous complications of diabetes. Its grade level must be tracked to manage its progress and to start the appropriate decision for treatment in time. Effective automated methods for the detection of DR and the classification of its severity stage are necessary to reduce the burden on ophthalmologists and diagnostic contradictions among manual readers.
Design/methodology/approach
In this research, convolutional neural network (CNN) was used based on colored retinal fundus images for the detection of DR and classification of its stages. CNN can recognize sophisticated features on the retina and provides an automatic diagnosis. The pre-trained VGG-16 CNN model was applied using a transfer learning (TL) approach to utilize the already learned parameters in the detection.
Findings
By conducting different experiments with different severity groupings, the achieved results are promising. The best-achieved accuracies for 2-class, 3-class, 4-class and 5-class classifications are 86.5%, 80.5%, 63.5% and 73.7%, respectively.
Originality/value
In this research, VGG-16 was used to detect and classify DR stages using the TL approach. Different combinations of classes were used in the classification of DR severity stages to illustrate the ability of the model to differentiate between the classes and verify the effect of these changes on the performance of the model.
Abstract
Across disciplines, researchers and practitioners employ decision tree ensembles such as random forests and XGBoost with great success. What explains their popularity? This chapter showcases how marketing scholars and decision-makers can harness the power of decision tree ensembles for academic and practical applications. The author discusses the origin of decision tree ensembles, explains their theoretical underpinnings, and illustrates them empirically using a real-world telemarketing case, with the objective of predicting customer conversions. Readers unfamiliar with decision tree ensembles will learn to appreciate them for their versatility, competitive accuracy, ease of application, and computational efficiency, and will gain a comprehensive understanding of why decision tree ensembles belong in every data scientist's methodological toolbox.
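The core idea behind such ensembles, many weak trees voting together, can be shown in miniature (a toy bagged-stump ensemble on invented data, not the chapter's telemarketing case; real work would use scikit-learn's RandomForestClassifier or XGBoost):

```python
import random

random.seed(3)

# Invented 1-D data: customers convert (1) roughly when the feature > 0.5.
X = [i / 20 for i in range(20)]
y = [1 if v > 0.5 else 0 for v in X]

def fit_stump(xs, ys):
    """Pick the threshold minimising misclassifications for rule x > t."""
    return min(set(xs), key=lambda t: sum((x > t) != bool(lab)
                                          for x, lab in zip(xs, ys)))

def bagged_stumps(xs, ys, n_trees=25):
    """Bagging: fit each stump on a bootstrap resample of the data."""
    stumps = []
    for _ in range(n_trees):
        idx = [random.randrange(len(xs)) for _ in range(len(xs))]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict(stumps, x):
    votes = sum(x > t for t in stumps)   # majority vote of the trees
    return 1 if votes * 2 > len(stumps) else 0

forest = bagged_stumps(X, y)
print([predict(forest, v) for v in (0.1, 0.9)])
```

Each resampled stump is a weak, high-variance learner; averaging their votes is what makes the ensemble stable, the same principle a random forest applies to full decision trees.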