Search results
1 – 10 of over 138,000
Panagiotis Barlas, Ivor Lanning and Cathal Heavey
Abstract
Purpose
Data science is the study of the generalizable extraction of knowledge from data. It comprises a variety of components and builds on methods and concepts from many domains, including mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization and data warehousing, with the aim of extracting value from data. The purpose of this paper is to provide an overview of open source (OS) data science tools and to propose a classification scheme that can be used to study OS data science software.
Design/methodology/approach
The proposed classification scheme is based on general characteristics, project activity, operational characteristics and data mining characteristics. The authors then use the proposed scheme to examine 70 identified open source software tools. From this examination, the authors provide insight into the current status of OS data science tools and identify the state-of-the-art tools.
Findings
The features of 70 OS tools are recorded against the criteria of four groups of characteristics: general characteristics, project activity, operational characteristics and data mining characteristics. Interesting results emerged from the analysis of these features and are reported here.
Originality/value
The contribution of this survey is the development of a new classification scheme for the examination and study of OS data science tools. In parallel, this study provides an overview of existing OS data science tools.
Sirje Virkus and Emmanouel Garoufallou
Abstract
Purpose
The purpose of this paper is to present the results of a study exploring the emerging field of data science from the library and information science (LIS) perspective.
Design/methodology/approach
A content analysis was conducted of research publications on data science indexed in the Web of Science database to identify the main themes discussed in the publications from the LIS perspective.
Findings
A content analysis of 80 publications is presented. The articles belonged to six broad categories: data science education and training; knowledge and skills of the data professional; the role of libraries and librarians in the data science movement; tools, techniques and applications of data science; data science from the knowledge management perspective; and data science from the perspective of health sciences. The category of tools, techniques and applications of data science was the one most frequently addressed, followed by data science from the perspective of health sciences, data science education and training, and knowledge and skills of the data professional. However, several publications fell into more than one category because these topics were closely related.
Research limitations/implications
Only publications recorded in the Web of Science database with the term “data science” in the topic area were analyzed. Therefore, this paper does not discuss several relevant studies that either were related to other keywords such as “e-science”, “e-research”, “data service”, “data curation”, “research data management” or “scientific data management” or were not present in the Web of Science database.
Originality/value
The paper provides the first exploration by content analysis of the field of data science from the LIS perspective.
Cleo Hughes Darden, Roni M. Ellington, Jigish Zaveri, Sanjay Bapna, Linda Akli, Stella Hargett, Prabir Bhattacharya, Ali Emdad and Asamoah Nkwanta
Muhammad Javed Ramzan, Saif Ur Rehman Khan, Inayat ur-Rehman, Muhammad Habib Ur Rehman and Ehab Nabiel Al-khannaq
Abstract
Purpose
In recent years, data science has become a high-demand profession, thereby attracting transmuters (individuals who want to change their profession due to industry trends) to this field. The primary purpose of this paper is to guide transmuters in becoming data scientists.
Design/methodology/approach
An exploratory study was conducted to uncover the challenges faced by data scientists according to their educational backgrounds. An extensive set of responses from 31 countries was received.
Findings
The results reveal that skill requirements and tool usage vary significantly with educational background. However, regardless of differences in academic background, the data scientists surveyed spend more time analyzing data than operationalizing insight.
Research limitations/implications
The collected data are available to support replication in various scenarios, for example, for use as a roadmap for those with an educational background in art-related disciplines. Additional empirical studies can also be conducted specific to geographical location.
Practical implications
The current work has categorized data scientists by their fields of study making it easier for universities and online academies to suggest required knowledge (courses) according to prospective students' educational background.
Originality/value
The conducted study suggests the required knowledge and skills for transmuters to acquire, based on their educational background, and reports a set of motivational factors attracting them to adopt the data science field.
Ibrahim Oluwajoba Adisa, Danielle Herro, Oluwadara Abimbade and Golnaz Arastoopour Irgens
Abstract
Purpose
This study is part of a participatory design research project and aims to develop and study pedagogical frameworks and tools for integrating computational thinking (CT) concepts and data science practices into elementary school classrooms.
Design/methodology/approach
This paper describes a pedagogical approach that uses a data science framework the research team developed to assist teachers in providing data science instruction to elementary-aged students. Using phenomenological case study methodology, the authors use classroom observations, student focus groups, video recordings and artifacts to detail ways learners engage in data science practices and understand how they perceive their engagement during activities and learning.
Findings
Findings suggest student engagement in data science is enhanced when data problems are contextualized and connected to students’ lived experiences; data analysis and data-based decision-making are practiced in multiple ways; and students are given choices to communicate patterns, interpret graphs and tell data stories. The authors note challenges students experienced with data practices, including conflict between inconsistencies in data patterns and lived experiences, and a focus on the appearance of data visualizations rather than on relationships between variables.
Originality/value
Data science instruction in elementary schools is an understudied, emerging and important area of data science education. Most elementary schools offer limited data science instruction; few offer a data science curriculum with embedded CT practices integrated across disciplines. This research assists elementary educators in fostering children's data science engagement and agency while developing their ability to reason, visualize and make decisions with data.
Atri Sengupta, Shashank Mittal and Kuchi Sanchita
Abstract
Purpose
Rapid advancement of data science has disrupted both businesses and employees in organizations. However, the extant literature primarily focuses on organizational-level phenomena and has almost ignored the employee/individual perspective. This study therefore intends to capture mid-level managers' experiences of these disruptions vis-à-vis their corresponding actions.
Design/methodology/approach
In a small-sample qualitative research design, Interpretative Phenomenological Analysis (IPA) was adopted to capture this individual-level phenomenon. Twelve mid-level managers from large-scale Indian organizations that have extensively adopted data science tools and techniques participated in a semi-structured and in-depth interview process.
Findings
Our findings revealed several perspectives gained from the managers' experiences, leading to two emergent person-job (mis)fit process models: (1) managers who perceived demands-abilities misfit (D-A misfit) as a growth-alignment opportunity vis-à-vis their corresponding actions, which effectively trapped them in a vicious cycle; and (2) managers who considered D-A misfit a psychological strain vis-à-vis their corresponding actions, which engaged them in a benevolent cycle.
Research limitations/implications
The present paper has major theoretical and managerial implications in the field of human resource management and business analytics.
Practical implications
The findings advise managers that the focus should be on developing an organizational learning eco-system, which would enable mid-level managers to gain their confidence and control over their job and work environment in the context of data science disruptions. Importantly, organizations should facilitate integrated workplace learning (both formal and informal) with an appropriate ecosystem to help mid-level managers to adapt to the data-science disruptions.
Originality/value
The present study offers two emergent cyclic models to the existing person–job fit literature in the context of data science disruptions. Given the scant attention earlier researchers have paid to how individual employees actually experience disruption, the present study and the IPA method it uses may add significant value to the extant literature. Further, it opens timely and relevant avenues for future research in the context of data science disruptions.
Sirje Virkus and Emmanouel Garoufallou
Abstract
Purpose
Data science is a relatively new field which has gained considerable attention in recent years. This new field requires a wide range of knowledge and skills from different disciplines including mathematics and statistics, computer science and information science. The purpose of this paper is to present the results of the study that explored the field of data science from the library and information science (LIS) perspective.
Design/methodology/approach
Analysis of research publications on data science was made on the basis of papers published in the Web of Science database. The following research questions were proposed: What are the main tendencies in publication years, document types, countries of origin, source titles, authors of publications, affiliations of the article authors and the most cited articles related to data science in the field of LIS? What are the main themes discussed in the publications from the LIS perspective?
Findings
The largest contribution to data science comes from the computer science research community. The contribution of the information science and library science community is quite small. However, there has been a continuous increase in articles since 2015. The main document types are journal articles, followed by conference proceedings and editorial material. The top three journals that publish data science papers from the LIS perspective are the Journal of the American Medical Informatics Association, the International Journal of Information Management and the Journal of the Association for Information Science and Technology. The top five publishing countries are the USA, China, England, Australia and India. The most cited article has received 112 citations. The analysis revealed that the data science field is quite interdisciplinary by nature. In addition to the field of LIS, the papers belonged to several other research areas. The reviewed articles belonged to six broad categories: data science education and training; knowledge and skills of the data professional; the role of libraries and librarians in the data science movement; tools, techniques and applications of data science; data science from the knowledge management perspective; and data science from the perspective of health sciences.
Research limitations/implications
The limitations of this research are that the study analyzed only research papers in the Web of Science database and therefore covers only a portion of the scientific papers published in the field of LIS. In addition, only publications with the term “data science” in the topic area of the Web of Science database were analyzed. Therefore, several relevant studies are not discussed in this paper, either because they are not reflected in the Web of Science database or because they were related to other keywords such as “e-science,” “e-research,” “data service,” “data curation” or “research data management.”
Originality/value
The field of data science has not previously been explored using bibliographic analysis of publications from the LIS perspective. This paper helps to better understand the field of data science and the perspectives for information professionals.
Abstract
Purpose
This paper aims to review and critically assess the role that data visualizations played as communication media tools to help society during a worldwide crisis. This paper re-creates and analyzes several visualizations, critically and ethically assesses their strengths and limitations and provides a set of best practices that are informative, accurate, ethical and engaging at each stage in a reader’s interest.
Design/methodology/approach
The paper bases its methodology on the construct of “The Network Society” (Van Dijk, 2006; Castells, 2000, 2006) by creating a series of social networked visualizations, identifying the challenges and pitfalls associated with this communication approach and suggesting best practices in information communication technology. The case study is COVID-19.
Findings
The research in this study found that visual data dashboards and interactive Web-based charts did play a significant role in helping society understand COVID-19’s impact to make better informed decisions about society’s health and safety.
Research limitations/implications
Visual expositions of data do have strengths and weaknesses depending on how they are designed, how they communicate the story and how they are ethically deployed. Best practices are provided to help mitigate these limitations.
Practical implications
Visualizations are certainly not new, but the technology for rapidly developing and sharing them is. Visual expositions provide an effective medium for communicating complex information to a networked society.
Social implications
Visual expositions provide an effective medium for communicating complex information to a networked society.
Originality/value
This paper highlights the significance of the need to understand complex data in a crisis in a visual format and to communicate the information quickly, persuasively, effectively and ethically to a networked audience.
Dan Avrahami, Dana Pessach, Gonen Singer and Hila Chalutz Ben-Gal
Abstract
Purpose
What do antecedents of turnover tell us when examined using human resources (HR) analytics and machine-learning tools, and what are the respective theoretical and practical implications? Although the turnover literature is expansive, empirical evidence on turnover antecedents studied using data science tools remains limited.
Design/methodology/approach
To help reinvigorate research in this field, the authors propose a novel examination of turnover antecedents—competencies, commitment, trust and cultural values—using big data tools to develop a granular, case-dependent measure of turnover.
Findings
Using archival data from 700,000 employees of a large organization collected over a period of ten years, the authors find that turnover is generally associated with varying levels of these antecedents. However, in more fine-grained analysis, their relation to turnover is contingent upon role, person and cultural background.
Originality/value
The authors discuss the implications for turnover and strategic HR research and the potential of artificial intelligence and machine-learning methods in the design and implementation of managerial and HR planning initiatives.
Abstract
Purpose
Data science lacks a distinctive identity and a theory-informed approach, both for its own sake and for proper joint application with the social sciences. This paper’s purposes are twofold: to provide (1) an illustration of theory adoption for data science, able to address explanation and support prediction/prescription capacities, and (2) a rationale for identifying the key phenomena and properties of data science so that the data speak through a contextual understanding of reality that is broader than has been usual.
Design/methodology/approach
A literature review and a derived conceptual research model for a push–pull approach (adapted for a data science study in the management field) are presented. A real location–allocation problem is solved through a specific algorithm and explained in the light of the adapted push–pull theory, serving as an instance for a data science theory-informed application in the management field.
Findings
This study advances knowledge on the definition of the key phenomena of data science as not just pure “data”, but interrelated data and dataset properties. It also advances the specific adaptation of the push-pull theory through its definition, dimensionality and interaction model, and illustrates how to apply the theory in theory-informed data science research. The proposed model contributes to the theoretical strengthening of data science, still an incipient area, and the solution of the location-allocation problem suggests that the proposed approach is applicable to broad data science problems, alleviating the criticism of the lack of explanation and the focus on pattern recognition in data science practice and research.
Research limitations/implications
The proposed algorithm requires the previous definition of a perimeter of interest. This aspect should be characterised as an antecedent to the model, which is a strong assumption. As for prescription, in this specific case, one has to take complementary actions, since theory, model and algorithm are not detached from in loco visits, market research or interviews with potential stakeholders.
Practical implications
This study offers a conceptual model for practical location–allocation problem analyses, based on the push–pull theoretical components. It therefore suggests a proper definition for each component (the object, the perspective, the forces, their degrees and the nature of the movement). The proposed model also has an algorithm for computational implementation, which visually describes and explains the interaction of the components, allowing further simulation (with estimated degrees of the forces) for prediction.
Originality/value
First, this study identifies an overlap of push–pull theoretical approaches, which suggests theory adoption eventually as mere common sense, weakening further theoretical development. Second, this study elaborates a definition for the push–pull theory, a dimensionality and a relationship between its components. Third, a typical location–allocation problem is analysed in the light of the refactored theory, showing its adequacy for that class of problems. Fourth, this study suggests that the essence of data science should be the study of contextual relationships among data, and that the context should be provided by the spatial, temporal, political, economic and social analytical interests.