The composition of data economy: a bibliometric approach and TCCM framework of conceptual, intellectual and social structure

Purpose – The data economy mainly relies on the surveillance capitalism business model, enabling companies to monetize their data. The surveillance allows for transforming private human experiences into behavioral data that can be harnessed in the marketing sphere. This study aims to investigate the domain of data economy through the methodological lens of quantitative bibliometric analysis of published literature.
Design/methodology/approach – The bibliometric analysis seeks to unravel trends and timelines for the emergence of the data economy, its conceptualization, scientific progression and thematic synergy that could predict the future of the field. A total of 591 documents published between 2008 and June 2021 were used in the analysis, with the Biblioshiny Web interface and VOSviewer version 1.6.16 used to analyze data from Web of Science and Scopus.
Findings – This study combined findable, accessible, interoperable and reusable (FAIR) data and data economy and contributed to the literature on big data, information discovery and delivery by shedding light on the conceptual, intellectual and social structure of data economy and demonstrating data relevance as a key strategic asset for companies and academia now and in the future.
Research limitations/implications – Findings from this study provide a steppingstone for researchers who may engage in further empirical and longitudinal studies by employing, for example, a quantitative and systematic review approach. In addition, future research could expand the scope of this study beyond FAIR data and data economy to examine aspects such as theories and show a plausible explanation of several phenomena in the emerging field.
Originality/value – This study confirmed the relevance of data to society and revealed some gaps to be addressed in future research.


Introduction
Data refers to either textual or numeric units of information presented using specific machine language systems that enable interpretation by suitable technologies (Monino, 2016). The volume of data is continuously increasing following the proliferation of digital technologies, including smartphones, web services and social networks. The advancement of diverse technologies and the diffusion of innovation have increased the generation of big data in academia, industry and society. The data economy, as an offshoot of big data, has created opportunities for data partnership and for better environment-friendly, cost-friendly and user-friendly services. This development makes hidden innovation clearly visible, as suggested by Edwards-Schachter and Wallace (2017). The data economy is at the epicenter of society, from data generation, data cleaning and data engineering to innovative products and services. Through the data economy, ecosystems of data-driven small, medium and large companies stand to allay consumers' fear of legal, privacy and security issues. Small, medium and large companies can also create a sustainable data moat for competitive advantage and centralize their data assets with the intervention of artificial intelligence and other emerging technologies. One earlier study suggested that, when prioritizing a data product or service, it is essential to identify the available opportunity, build the product or design the service, evaluate the first two stages and iterate based on data and user feedback (Glassberg, 2018). To maximize the data economy, the timely intervention of governments and societies on the political, economic and social impacts of data-driven artificial intelligence is crucial.
The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2398-6247.htm
Information Discovery and Delivery, Emerald Publishing Limited [ISSN 2398-6247] [DOI 10.1108/IDD-02-2022-0014]
This bibliometric study probed into the data economy, its value and its gaps for academia and practicing managers in a changing landscape of big data opportunity. Therefore, the following research questions guide this study:
RQ1. How has data economy been conceptualized and presented by scholars?
RQ2. What are the intellectual outputs and contributions of scholars in data economy?
RQ3. What social synergy and collaborations exist in the domain of the data economy?
This study contributes to the literature on data economy in multiple ways and presents implications for educators, academic researchers, managers and policymakers. For educators, our analysis provides current teaching materials on essential areas of data economy, providing pedagogical insight relevant to enhancing students' teaching and learning experiences. Our study provides a comprehensive overview of the current state of research on the data economy for academic researchers. We reviewed 591 articles exploring how data economy has been conceptualized and presented by scholars. Our analysis suggested that scientific output about data economy has remained on the rising curve, suggesting that the emerging field has the potential to grow significantly on an annual basis. Based on the theories, contexts, characteristics and methodology (TCCM) framework, our review finds that dynamic capability theory has been a dominant theoretical underpinning in the data economy research stream. Data managers gain insights from our comprehensive business model that harmonizes the ethical, legal, technology and societal issues. This proposed business model will proffer solutions to the existing teething problems of the data economy. For policymakers, we posited that the timely intervention of governments and societies on the political, economic and social impacts of data-driven artificial intelligence is crucial; we, therefore, provided insight into developing data-driven strategies that make the stakeholders proactive. Based on the insights gained from our literature review, we develop an agenda for future research, outlining topics and potential research questions based on the TCCM framework. We suggest future research to expand on the intersection of sharing, platform and data economy. In addition, researchers can test existing theories and show a plausible explanation for their investigation of the data economy.

Review of the literature
There are numerous sources of data: companies, financial institutions and health service providers generate large amounts of data through their interactions with suppliers, customers and employees. Data is a crucial factor in production that complements physical capital and labor (Opher et al., 2016). It is nondepletable, and its increased use increases its value. As an asset, however, its value can depreciate over time as the data becomes less relevant, and its value depends on its unique characteristics. Data is also regarded as a nonrival asset, as multiple users can use it simultaneously (Agata, 2020). However, it is not automatically labeled as a public good because data owners reserve the right to exclude individuals from using it, further increasing its value. According to Nobre and Tavares (2017), data can be produced and stored at low cost, and households, businesses and individuals constitute the major producers and consumers of data.
FAIR data
FAIR data refers to findable, accessible, interoperable and reusable data (Dunning et al., 2017). FAIR data must adhere to the FAIR principles, which are used to determine the levels of compliance. Such data is assigned a globally unique and persistent identifier, a step that is simple to execute. Suitable examples of persistent identifiers include the digital object identifier, HANDLE (a unique and persistent identifier for Internet resources) and uniform resource name systems (Dunning et al., 2017). FAIR data is also characterized by several other facets, including being indexed or registered in a searchable resource, and it must be accompanied by a description comprising different attributes.
According to Tanhua et al. (2019), FAIR data enables effective data management through the collaboration of various activities, including quality assurance and control, observations, metadata and data assembly and data publication. Effective data management aims to enhance local and interoperable data discovery and access and to secure archiving, resulting in long-term preservation. FAIR data is becoming a crucial tool for enabling digital transformation by supporting research and development (Wise et al., 2019). It capitalizes on analytics tools such as machine learning and artificial intelligence to enable automatic and scalable access to data and support continuous learning. Wise et al. (2019) established that the successful implementation of FAIR data principles would amplify the value of data sets within companies and of external public data by enhancing their discovery and accessibility for humans and machines. This advantage capitalizes on the ranking and rating capabilities of machine learning through algorithmic decision-making. According to Lahoti et al. (2019), algorithmic decision-making is continuously becoming pervasive in all aspects of life. Implementing FAIR data principles will help address the societal and ethical concerns it raises.

Open data
Open Data (OD) is scientific data that can be published and reused without any permission or price barriers (Murray-Rust, 2008). It involves publishing data in reusable formats and enhances engagement and innovativeness. Advocacy for OD mainly focuses on the need to increase cyber scholarship. Molloy (2011) supports this assertion by observing that OD enhances science, increasing transparency and societal benefits. OD can be processed and analyzed using data mining tools and automated text analysis to derive valuable findings on business innovation drivers (Molloy, 2011).
According to Huijboom and Van den Broek (2011), OD strategies increase transparency and efficiency in data management. They also foster service and product innovation. Companies can use available public data to create new businesses, especially digital services, by converting their creativity and ideas into practical solutions to daily challenges (Huijboom and Van den Broek, 2011). Reichman et al. (2011) indicate that OD is advantageous because its deployment is bound to enhance and accelerate scientific advancements. Linked data methods provide suitable ways to connect data from distributed sites via standard Web technologies (Reichman et al., 2011). This approach scales beyond human limitations by enabling new and improved types of synthetic data studies conducted on larger scales.
OD policies are meant to stimulate and control data publication to enhance its advantageous use. They are mainly implemented within government systems to increase participation, self-empowerment, social inclusion and interaction. These positive attributes will stimulate economic growth in the countries by supporting business innovation. The availability of OD has been continuously increasing because of the increased pressure on public organizations to publish their data (Janssen et al., 2012). The major motivation for this increase is that increasing access to publicly funded data will increase returns on public investments and enhance wealth generation by using this data to address complex problems. Kassen (2013) also indicates that OD provides a helpful platform for promoting civic engagement and enhancing research and hypothesis testing.

Data ecosystems
Data ecosystems entail the sociotechnical complex networks that allow actors to interact and collaborate to discover, publish, archive, consume or reuse data (Oliveira et al., 2019). These networks also enable them to create value, foster innovation and support new businesses. Oliveira et al. (2019) further established that the emergence of data ecosystems had been influenced by digital technologies that enhance OD production and consumption. The digital technologies supporting data systems include the Internet of Things (IoT), Web technologies and data analytics technologies. Data ecosystems also address the need for a feedback loop between the data providers and data users.
According to Rantanen et al. (2019), the significance of data ecosystems has been increasing because third parties can enrich, use and reuse big data sets. Various data ecosystems have been formed by groups such as governments, industries and public-private partnerships. Rantanen et al. (2019) further established that the data ecosystem has immense potential to provide sustainability in business and enhance competitive advantages. Data ecosystems are formed in different ways and contribute to creating value that individual participants could not realize on their own (Ding et al., 2011). Their key benefits revolve around the sharing of vital resources. These attributes create new business opportunities and increase access to knowledge and data.
Various data ecosystems exist, including directed data ecosystems, collaborative data ecosystems, acknowledged data ecosystems and virtual data ecosystems. Directed data ecosystems are characterized by centralized control structures and are expected within organizational settings (Curry and Sheth, 2018). Acknowledged data ecosystems comprise distributed participants, while virtual data ecosystems focus on pooling decentralized resources to meet specific goals. OD ecosystems contribute to realizing the benefits of OD and value creation. They capitalize on the original basis of an ecosystem that enhances interdependencies among partners in exchange networks.

Datafication
Datafication entails quantifying human life through digital data for economic value (Mejias and Couldry, 2019). It is applicable within the business and social sciences domains, whereby the data is put in a quantified form for tabulation and analysis. Datafication extends beyond data digitization to make digital data indexable and easily searchable. It enables large-scale processing of various aspects of human life through specific forms of automatic analysis (Ruckenstein and Schüll, 2017). The datafication concept was initially applied in business, and to date the amount of commercial data generated exceeds that obtained through the datafication of social life.
Crucial areas in the commercial sector, such as logistics, have continuously advanced to become complex business practices because of datafication. According to Mai (2016), datafication is advantageous because it allows for sophisticated data analysis across large data sets. Its use is predicted to escalate because many digital devices are increasingly becoming connected to the internet (Mai, 2016). This transformation will allow for the digitization of all activities and further extend the scope of datafication. It will result in numerous advantages to the business world because the increased possibilities for analysis enable businesses to create new forms of value. Mai (2016) further established that datafication and predictive analysis would also escalate because more organizations will appreciate the potential to collect and compute user-generated information.
Datafication is widely deployed in social media marketing, whereby metrics from individuals' use of social networks are quantified to determine market trends (Dourish and Gómez Cruz, 2018). This aspect relates to transforming social actions into online quantified data, which allows for predictive analysis and real-time tracking (Mayer-Schoenberger and Cukier, 2013). Datafication has gradually transformed into a new paradigm for comprehending social behavior (Van Dijck, 2014). This aspect has quantified various social data such as interests, friendships, information searches, emotional searches and casual conversations.

Data economy
The data economy comprises an ecosystem of organizations that use data as their business's main object or source (Opher et al., 2016). It is centered on the production, consumption and distribution of digital data. The data economy thrives on the rapid advancements in digital technologies, especially machine learning, automation and artificial intelligence (Opher et al., 2016). There are no clear distinctions between the producers and consumers in the data economy because supply and demand do not automatically determine the price (Zech, 2016). Data is regarded as a valuable economic resource and players within the data economy can share nonpersonal data to boost economic growth and enhance innovation and interoperability (Bonti et al., 2021). The data economy enables players to derive value from data by transforming it into applications, insights and services. Bonti et al. (2021) further indicate that participation in the data economy also enables organizations to attain the full potential of their data. This potential mainly arises from sharing data within the intercompany systems, developing new capabilities and deploying emerging technologies. Consumers and companies constitute the leading players in the data economy, as they contribute to the value chain associated with data production, collection and analysis (Allen, 2016).
The data economy mainly relies on the surveillance capitalism business model, enabling companies to monetize their data. In this business model, the commodities being sold constitute personal data, and this data is collected and produced through massive internet surveillance (Zuboff, 2015). The surveillance allows for transforming private human experiences into behavioral data that can be harnessed in the marketing sphere. Monetization of personal data has led to the emergence of the personal data economy, allowing individuals to share their data with businesses (Elvy, 2017). According to Lammi and Pantzar (2019), the personal data economy is supported by the rapid increase in mobile and handheld devices. The devices collect personal data such as geographical information, consumer purchase behaviors and other online metrics (Lammi and Pantzar, 2019). This personal data, comprising the consumer's digital tracks and actions, is a viable source of economic value creation through the digital economy.

Methodology
This study investigates the domain of data economy through the methodological lens of quantitative bibliometric analysis of published literature. The bibliometric analysis seeks to unravel trends and timelines for the emergence of the data economy, its conceptualization, scientific progression and thematic synergy that could predict the field's future. We followed the three-step approach to bibliometric analysis demonstrated in recent studies (Aria and Cuccurullo, 2017; Agbo et al., 2021a), which consists of: (1) the article selection and data gathering process; (2) the data extraction, loading and conversion process; and (3) the data synthesis process.
In this study, the three main software tools used were RStudio, Biblioshiny (developed by Aria and Cuccurullo) and VOSviewer version 1.6.16 (by Van Eck and Waltman). RStudio is an open-source solution for data science analysis and can be downloaded free from its official website. Biblioshiny is a Web tool that can be launched from RStudio to provide a Web interface for data visualization, and VOSviewer is open-source software that can be downloaded and used on desktop computers.
The impact of TCCM has been emphasized in the review literature. This study integrates TCCM into a bibliometric study to deepen the understanding of the dominant theories, contexts, characteristics and methods employed in data economy research over the past decade (2008-2021). The study draws on the relevance of theory in domain research as demonstrated by Paul and Feliciano-Cestero (2021), which posits that TCCM is efficient for theme-based reviews and emphasizes the impact of TCCM and of reviews that develop theories. Based on this earlier proposition, this study used TCCM to identify the globally used theories, contexts, insightful variables and methods, to strike a balance in data economy research and to propose a new direction for future research based on the gaps that TCCM reveals.

Research design
Bibliometric research designs help ascertain the alignment of the data collected and the choice of the data analysis technique (Agbo et al., 2021c). The study starts with a clear idea of the research questions that require further investigation. The study chose descriptive and correlational designs as a subset of quantitative research design. This design intends to give a clearer picture of trends, characteristics and the relationships between authors, coauthors, institutions and stakeholders of FAIR data and data economy through existing literature. Further, the study defined its focus (FAIR data and data economy) and the literature inclusion and exclusion criteria. The study adopts secondary data to expand the scope of the existing literature. In addition, the study used the search engines of two extensive databases to collect consistent, accurate and unbiased data, to ensure that the results of this study can be easily reproduced (Lai et al., 2020), to measure all the necessary concepts and to correlate with different measures of the same concept. The data used is well organized and backed up in the cloud for easy data analysis and for other researchers' validation and inputs. Olaleye (2020) summarized this process as a bibliometric systematic workflow and divided the workflow into six distinct parts: (1) research design; (2) bibliometric data source; (3) bibliometric data analysis; (4) data visualization; (5) result and interpretation; and (6) conclusion.
Biblioshiny and VOSviewer were used for the data analysis (see details below).

Article selection and data gathering process
The data used in this study were retrieved on June 12, 2021, from Elsevier's Scopus and Clarivate Analytics' Web of Science (WoS) databases. According to Saqr et al. (2021), the Scopus database contains over 70 million peer-reviewed articles, whereas, according to the Clarivate website, as of June 2021 WoS contains over 81 million records consisting of published materials from life sciences, biomedical sciences, engineering, social sciences, arts and humanities. Rather than collecting data from a single database to conduct a quantitative bibliometric analysis, as showcased in earlier studies (Agbo et al., 2021b; Aria and Cuccurullo, 2017), this study collected data from both databases to capture the most relevant articles (Agbo et al., 2021a), which can sufficiently represent the scope of the field under investigation. The main rationale behind using two databases as data sources was to minimize the tendency to leave out relevant data and to conduct an in-depth analysis. On the other hand, the risk of collecting duplicate data in this kind of approach is mitigated by the data conversion process, as shown in a subsequent section.
The search terms "fair data" OR "data economy" were used to search the two databases. Mainly, the search strings were applied to the title, keywords and abstract metadata of the documents. The structure of the query as used in the respective database search engine is shown in Table 1.
Furthermore, the search was limited to documents classified as articles and conference proceedings in both databases. The authors decided to limit the data to articles and conference proceedings to allow for analysis that could provide deeper scientific insight because documents from these sources are peer reviewed. Figure 1 presents the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) workflow of the data collection and screening process.
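As an illustration of how the search terms and document-type limits combine, the following minimal Python sketch assembles a Scopus-style query string of the form reported in Table 1. The helper names boolean_or and scopus_query are hypothetical and are not part of any database API; the sketch only builds a string and does not run a search.

```python
def boolean_or(terms):
    """Join search phrases into a quoted OR expression."""
    return " OR ".join(f'"{term}"' for term in terms)


def scopus_query(terms, doctypes):
    """Build a Scopus-style string: title/abstract/keyword search plus document-type limits.

    Hypothetical helper for illustration; it mirrors the query form shown in Table 1.
    """
    limits = " OR ".join(f'LIMIT-TO (DOCTYPE, "{code}")' for code in doctypes)
    return f"TITLE-ABS-KEY ({boolean_or(terms)}) AND ({limits})"


terms = ["fair data", "data economy"]
# "ar" = article, "cp" = conference paper, as in the Scopus query of Table 1
query = scopus_query(terms, ["ar", "cp"])
print(query)
```

Keeping the term list separate from the database-specific wrapper makes it easier to apply the same terms to a second database, which is the point of the two-database design.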

Data extraction, loading and conversion process
The data extraction, loading and conversion process is explained following the steps provided by Agbo et al. (2021a). The Bibliometrix R library (Aria and Cuccurullo, 2017) was used to combine the two data sources. The script for the conversion and combination of the data is shown in lines 1-11 of Table 2.
Line 1 in Table 2 sets the working directory in which the data downloaded from Scopus and WoS, and the combined result, are stored. For a detailed explanation of the script in each line, we refer readers to the previous study by Agbo et al. (2021a).

Table 1
Database | Search query | Documents
WoS | ("fair data" OR "data economy" OR "fair data" OR "data economy") Refined by: DOCUMENT TYPES: (ARTICLE OR PROCEEDINGS PAPER) Timespan: All years. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI. | 393
Scopus | (TITLE-ABS-KEY ("fair data" OR "data economy") OR TITLE-ABS-KEY ("fair data" OR "data economy")) AND (LIMIT-TO (DOCTYPE, "ar") OR LIMIT-TO (DOCTYPE, "cp")) | 558

Figure 1 PRISMA workflow showing the data collection and screening for this study

Table 2 Lines of instructions for converting and combining two data sources using RStudio software
Command line: Command
Line 1: setwd("C:/Users/Intel/Desktop/. . ./de2")
Line 2: getwd()
Line 3: DigitScopus2 = convert2df("scopus.bib", dbsource = "scopus", format = "bibtex")
Line 4: View(DigitScopus2)
Line 5: digitwos2 = convert2df("wos.bib", dbsource = "isi", format = "bibtex")
Line 6: View(digitwos2)
Line 7: CombinedData = mergeDbSources(DigitScopus2, digitwos2, remove.duplicated = TRUE)
Line 8: View(CombinedData)
Line 9: dim(CombinedData)
Line 10: library(openxlsx)
Line 11: write.xlsx(CombinedData, file = "XlsCombinedData.xlsx")
Note: The script assumes that the bibliometrix package, which provides convert2df and mergeDbSources, has already been loaded with library(bibliometrix).

Line 7 is the command that combines the two data sets and removes any duplicates. After executing this command in this study, 324 duplicated documents were removed, leaving 627 documents for the data analysis. In other words, assuming each removed duplicate is a record found in both databases, only 69 of the 393 WoS documents (393 minus 324) were not also indexed in Scopus. This difference implies that when a bibliometric analysis is conducted with a single database such as Scopus or WoS alone, relevant data are left out, which may significantly impact the result of the study. This finding justifies our choice of using two databases, in line with the previous study by Agbo et al. (2021a). Moreover, a closer review of the data resulting from our conversion and combination shows some irrelevant documents that can be dropped. As shown in Table 3, the data timespan is from 1970 to 2021. Because the data economy, in our opinion, is an emerging domain in the 21st century, it makes more sense to analyze relatively recent data. Therefore, we delimited the data to the period between 2008 and 2021 using the Biblioshiny filter function, and the results are presented in Table 4.
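The record counts reported for the merge can be checked with simple inclusion-exclusion bookkeeping. The Python sketch below is an illustration only: merge_counts is a hypothetical helper, and it assumes that every removed duplicate is a record present in both databases (duplicates within a single export would change the breakdown). Under that assumption, the 69 records found in only one database belong to the smaller, WoS, result set.

```python
def merge_counts(n_a, n_b, n_duplicates):
    """Inclusion-exclusion counts for two overlapping result sets.

    Assumption (not stated in the source): each duplicate is a record
    that appears once in set A and once in set B.
    """
    if n_duplicates > min(n_a, n_b):
        raise ValueError("duplicates cannot exceed the smaller result set")
    return {
        "combined": n_a + n_b - n_duplicates,
        "only_a": n_a - n_duplicates,
        "only_b": n_b - n_duplicates,
    }


# Counts from Table 1 (WoS = 393, Scopus = 558) and the 324 removed duplicates
counts = merge_counts(n_a=393, n_b=558, n_duplicates=324)
print(counts)  # {'combined': 627, 'only_a': 69, 'only_b': 234}
```

The combined total of 627 matches the number of documents reported after the merge, before the 2008-2021 filter is applied.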

Data synthesis process
A total of 591 documents published between 2008 and 2021 were used in the analysis, as shown in Table 4. These documents emerged from journals, books, conference proceedings and book chapters. In addition, the data set consists of 2,353 authors, of whom 112 are single authors. On the other hand, 1,656 authors have distinct keywords.

Data analysis
The study used the Biblioshiny app through its Web interface and VOSviewer version 1.6.16 to analyze the Scopus and WoS data. First, the study checked the descriptive values of the literature (details in Table 4). Second, the study carried out analytics and plots based on three different metrics: sources, authors and documents. Third, the study analyzed three knowledge structures: the conceptual, intellectual and social structures. Fourth, the study used VOSviewer to filter the theories and methodologies used in the FAIR data and data economy literature and later used the values to plot charts in Microsoft Excel for clarity.
Further, the study used VOSviewer for country mapping. We noticed that mapping countries with Biblioshiny can be problematic because some countries are not easy to differentiate, for example, China from Taiwan, whereas VOSviewer treats them separately. The data analysis generates literature mapping insights that are discussed later in the study, in section four.

Results and discussion
In this section, we present the results of this study and discuss them based on the research questions to aid the flow of information and understanding.

Conceptual structure of data economy
The conceptual structure of a knowledge domain deals with the representation of concepts to describe specific classification, interrelationships and even taxonomy that can enhance interpretation and understanding of the domain. Because data economy is a vast but emerging area of diverse interest, as highlighted in the background section, this section tries to present its emergence from the conceptual point of view where the underpinning theories and scientific production of articles in the field are analyzed. In addition, the thematic clusters and field evolution based on authors' keywords are examined.
RQ1. How has data economy been conceptualized and presented by scholars?
This section begins by examining how data economy has been conceptualized from the perspective of theories through the lens of the TCCM framework. The TCCM has recently gained momentum in systematic reviews and bibliometric analysis studies (Sharma et al., 2020; Olaleye et al., 2021). In this framework, scholars posit that the popularity of the TCCM is hinged on the ease with which knowledge gaps can be spotted and a logical process of recommending future research direction (Singh et al., 2020). Thus, we followed the TCCM framework in this study, as detailed in the subsequent sections.

Theory development
The stream of theoretical developments in data economy research embraces theories from different disciplines (Table 5). The integrated theories are trust, security, privacy (Meijer et al., 2014; Kobayashi et al., 2018) (Yıldırım et al., 2021) and attitudes (Tenopir et al., 2020; Baždarić et al., 2021). Figure 2 displays the prominent theories used in data economy research. Based on data dimensions, Zhao and Fan (2018) differentiated open government data into tangible, human and intangible. Their study found that culture plays a crucial role in open government data capacity. Trust has been a dominant theory in the information systems and data economy research stream (Meijer et al., 2014). Trust implies the willingness of parties (OD generators and users) to rely on each party's ability to generate and use OD transparently. This transparency, therefore, implies that the security and privacy of relevant stakeholders must be protected.
Our literature search also reveals that the dynamic capabilities theory has been a dominant theoretical underpinning in the data economy research stream. The core proposition of the dynamic capabilities theory holds that firms must be able to develop their short-term competitive positions into long-term competitive advantages to survive in the face of rapidly changing business climates (O'Connor, 2008). The use and application of OD to solve different challenges also trigger uncertainties about traditionally held norms. Lee and Yoo (2019) align the dynamic capabilities theory with open innovation and argue that survival means that firms must develop the ability to identify opportunities and threats and explore the skills necessary to detect and harness market opportunities.

Context
The application of big data transcends many contexts. As our review shows, scholars have adopted OD in industrial applications (Huang et al., 2021), entrepreneurship (Aridi et al., 2021), blockchain (Hu et al., 2021), vehicle design (Urquhart et al., 2021) and health (Geneviève et al., 2021; Ochs et al., 2021). Our findings make it quite challenging to state which domains enjoy the most research output. This lack of clarity arises because, as argued by Alencar et al. (2014), the application of OD is multisectoral, so there is seemingly a bandwagon attempt by various stakeholders to leverage the opportunities present in its adoption (Janssen et al., 2012). As the contexts of OD applications are diverse, its benefits percolate across these contexts. Janssen et al. (2012) categorized these benefits into three thematic areas: political and social, economic, and operational and technical. As political and social benefits, they argue that OD has, among other things, ushered in more transparency, trust in government and public engagement. As economic benefits, it has stimulated economic growth and competitiveness, new product and service design and open innovation. Operational and technical benefits include improvement in public policies, external policy checks by the public and the ability to reuse data.
Note: Based on the occurrences and total link strength in Table 5, there is a dearth of theories in the data economy research stream
Figure 2 Theoretical foundations of data economy research TCCM framework
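The note above relies on two network measures, occurrences and total link strength. As a minimal sketch of how such values can be derived under full counting, the Python example below counts in how many documents each theory appears and sums the co-occurrence link strengths of each theory. The four-document corpus is hypothetical toy data, not the study's data set, and the procedure is an assumption about the general technique rather than a description of VOSviewer's internal implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is tagged with the theories it uses
docs = [
    {"trust", "privacy"},
    {"trust", "security", "privacy"},
    {"dynamic capabilities"},
    {"trust", "dynamic capabilities"},
]

# Occurrences: number of documents in which each theory appears
occurrences = Counter(theory for doc in docs for theory in doc)

# Link strength: number of documents in which two theories appear together
link_strength = Counter()
for doc in docs:
    for a, b in combinations(sorted(doc), 2):
        link_strength[(a, b)] += 1

# Total link strength: sum of the strengths of all links attached to a theory
total_link_strength = Counter()
for (a, b), weight in link_strength.items():
    total_link_strength[a] += weight
    total_link_strength[b] += weight

for theory in sorted(occurrences):
    print(theory, occurrences[theory], total_link_strength[theory])
```

In this toy corpus, a theory that is used often but rarely alongside other theories has a high occurrence count and a low total link strength, which is the kind of pattern the two columns of such a table let a reader spot.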

Characteristics (C)
Our literature analysis also reveals the characteristics of the data economy structure and its composition. As shown in Figure 3, four themes generated by the Biblioshiny app emerged and are positioned along two axes: a vertical axis for development degree (density) and a horizontal axis for relevance (centrality). The four thematic areas identified in our literature are motor themes (upper right quadrant), niche themes (upper left quadrant), basic themes (lower right quadrant) and emerging or declining themes (lower left quadrant).
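The quadrant logic of such a thematic map can be illustrated with a minimal sketch. It assumes that each theme has a centrality score and a density score and that the quadrant boundaries are the medians of those scores; the cut-off rule, the function name `classify_themes` and the example scores are assumptions made for illustration and are not taken from Figure 3 or from the Biblioshiny app.

```python
from statistics import median


def classify_themes(themes):
    """Label each theme by quadrant from its (centrality, density) pair.

    The median cut-offs are an illustrative assumption, not the rule
    used by any particular bibliometric tool.
    """
    c_cut = median(c for c, _ in themes.values())
    d_cut = median(d for _, d in themes.values())
    labels = {}
    for name, (centrality, density) in themes.items():
        if centrality >= c_cut and density >= d_cut:
            labels[name] = "motor"                  # upper right quadrant
        elif centrality < c_cut and density >= d_cut:
            labels[name] = "niche"                  # upper left quadrant
        elif centrality >= c_cut and density < d_cut:
            labels[name] = "basic"                  # lower right quadrant
        else:
            labels[name] = "emerging or declining"  # lower left quadrant
    return labels


# Hypothetical (centrality, density) scores, not values from the study.
example_themes = {
    "theme_a": (0.9, 0.8),
    "theme_b": (0.2, 0.7),
    "theme_c": (0.8, 0.1),
    "theme_d": (0.1, 0.2),
}
theme_labels = classify_themes(example_themes)
```

With these made-up scores, each of the four hypothetical themes falls into a different quadrant, which mirrors the four categories described above.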
At the upper end of the significant themes are machine learning, artificial intelligence and the IoT, while data science is identified at the lower end. Several studies have identified the ever-evolving field of machine learning as critical to making sense of OD (Celi et al., 2019), just as machine learning improves itself through data sets. The clustering of machine learning and artificial intelligence confirms that artificial intelligence adopts machine learning to solve problems. In line with the basic themes, a plethora of studies have hinged the usability of OD on FAIR data principles and the ability of users to freely access these data sets (Janssen et al., 2012). Further studies would need to pay closer attention to the concepts categorized as niche and emerging or declining themes. For example, further exploration of wireless sensor networks, congestion control and FAIR data collection is necessary. Again, artificial neural networks and indoor positioning require further exploration based on their classification as niche themes.

Methodologies
Machine learning, a subfield of artificial intelligence, dominated the quantitative data analysis techniques of extant studies (Figure 4). The machine learning approaches included text mining and natural language processing (Desai, 2015). The role of deep learning in the data economy research stream has also been underscored because most of the studies adopted it (Kiarashinejad et al., 2020; Tabernik et al., 2020). Phan et al. (2017) applied deep learning in predicting human behavior with social health workers. The restricted Boltzmann machine they adopted predicted human behavior accurately and explained these behavioral leanings better than the conventional method.
Again, Musci et al. (2018) employed survival analysis in a longitudinal study to determine marijuana use among elementary school pupils in a US city. They found that genetics play a key role in first marijuana use and that differences in genetics also account for the effectiveness of classroom-based intervention in delaying drug use among pupils. del Pozo Cruz et al. (2020) also applied survival analysis to estimate mortality risks among adults. They found that poor diet quality and activity profile increased the likelihood of mortality. Predictive modeling was also used in some studies. For instance, Kaur and Kumari (2020) applied predictive modeling in an Indian study to classify diabetic and nondiabetic patients. They found that selecting features with the Boruta wrapper algorithm performs better than selecting the attributes manually, especially when medical knowledge is limited.

Scientific production of data economy
As presented in Figure 5, our analysis shows that the scientific production of articles in the data economy may have commenced over a decade ago but received a boost in 2016. The scientific output of the data economy has remained on a rising curve, suggesting that the emerging field has the potential to grow significantly on an annual basis. In 2020, there were slightly over 150 articles published. When the data were collected in the middle of 2021, over 50 articles on the data economy had already been published, which suggests that the total number may surpass that of previous years.

Thematic evolution of data economy
Furthermore, this study conducted a thematic evolution analysis of data economy and FAIR data. Thematic evolution of a field is the analysis that seeks to unravel a set of themes that have evolved across subunits over a period (Chen et al., 2019). In other words, we consider a theme to have evolved from A to B if there exist common keywords within a thematic network. This thematic analysis is based on the conetwork of authors' keywords. The intention in analyzing the thematic evolution of the field of data economy and FAIR data is to understand what constitutes the emerging field and how it has been influenced over the years. As seen in Figure 6, FAIR data dominates the field, implying that it is the foundational keyword.
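The evolution criterion stated above (a theme evolves from A to B when the two share keywords) can be sketched as a simple set intersection. The function name `evolution_links`, the theme names and the keyword sets below are hypothetical and serve only to illustrate the rule; they do not reproduce the keyword data behind Figure 6.

```python
def evolution_links(earlier, later):
    """Return (theme_a, theme_b, shared_keywords) for every pair of themes
    from two periods that have at least one keyword in common."""
    links = []
    for name_a, keywords_a in earlier.items():
        for name_b, keywords_b in later.items():
            shared = set(keywords_a) & set(keywords_b)
            if shared:
                links.append((name_a, name_b, sorted(shared)))
    return links


# Hypothetical keyword sets for two time slices.
period_1 = {"data sharing": {"data sharing", "blockchain", "privacy"}}
period_2 = {
    "data economy": {"data trading", "blockchain", "privacy"},
    "FAIR data": {"open science", "data management"},
}
links = evolution_links(period_1, period_2)
```

In this toy input, only the pair ("data sharing", "data economy") is linked, because it is the only pair with keywords in common.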
Besides, it can be seen that "FAIR data" is closely linked to data management and open science, which both, to a large extent, deal with the administration of data in terms of data sources, privacy and access control. Whereas the term "data economy" relates closely with the organization and exchange of data within a secured network, the value is derived from data use. For example, the term "data economy" seems to have evolved from "data sharing" with robust backbone technology such as blockchain, smart contracts, data security, policies and governance. A noticeable term linked to the data economy is "data trading," which delineates data commercialization within fairness and secured space.
The thematic evolution maps were divided into eight distinct clusters. In order of importance based on the size of the nodes, FAIR data has the most central node, followed by data management, open science, data sharing, machine learning and data economy. The other clusters did not show any focal concept, but the nodes relate together at the same level. Interestingly, big data nodes relate closely to machine learning. Different machine learning algorithms are preferable for big data preparation, modeling, evaluation of model performance, deployment and maintenance in real life. Also, our results confirmed the best practices in the medical field as platform economy, health policy, health data and sustainability clustered together. Further, the data economy cluster shows the interwovenness of international data spaces (development of a European standard for independent and controlled data sharing), fair exchange, data governance, personal data, data trading, algorithms, ethics, political economy and ecosystems.
This study further investigated the thematic areas of data economy to unravel specific clusters of independent fields where data economy research has been conducted to date. As presented in Figure 7, the analysis revealed how the main keywords used in our search are clustered.
For example, data economy and big data are tightly coupled to the left side, whereas fair and OD are tightly coupled to the right. Besides, it is evident from Figure 4 how state-of-the-art technologies are heavily deployed to foster the development of the field of data economy. For example, artificial intelligence, data analytics, machine learning, the IoT and data mining are shown in the clusters. Hence, the data economy could focus mainly on ethical and fair data sharing with commercial values but supported with advanced technology to guarantee security and privacy concerns. On the other hand, FAIR data broadly focuses on the core principles of findability, accessibility, interoperability and reusability that make data useful (Wilkinson et al., 2016). Figure 8 shows the conceptual structure map based on correspondence analysis (CA). The CA analysis measures the proximity of variables (in this case, authors' keywords) and their association, which provides valuable insights regarding clusters of articles that demonstrate common concepts. This clustering shows how authors' keywords in articles are treated together.
Our analysis revealed two clusters highlighted in red and blue colors. The red-colored cluster depicts big data, whereas the blue-colored cluster deals with the platforms, policy and general data administration.

Intellectual outputs and prolific scholars of data economy
Intellectual outputs of the data economy in this study refer to tangible results of activities that emerge from the domain in terms of publications in conferences, journals, book chapters and other scientific publishing outlets.

RQ2. What are the intellectual outputs and contributions of scholars in the domain of data economy?
This study investigated the intellectual outputs of scholars in the field of data economy based on the number of publications as reflected in the data set to give an academic background to the data economy. In particular, the analysis of the most relevant authors in the data economy was based on the authors' fractionalized number of articles. According to Aria and Cuccurullo (2017), fractional authorship measures an individual author's contributions to a published article. Based on this premise, our analysis revealed the top 20 authors' productivity over the years (2013 to mid-2021). As shown in Figure 9, the line delineates an author's productivity timeline; the round bubble shapes represent the author's production. The shape's size depicts the number of articles produced by the author per year. The color intensity of the bubble represents the total number of citations the author has received per year. The first bubble on the left-hand side of the line depicts the author's first production year. Consequently, Roos M. and Kaliyaperumal R. both have the most extended timeline. It can be seen from the analysis that the work of Fensel and her colleague was the only article on data economy in 2013, which perhaps was presented at the IEEE international conference on big data (Tomic and Fensel, 2013). Besides the authors' productivity timelines, the analysis based on the authors' fractionalized number of articles revealed that Schultes E. came first among the top 20 authors. Notably, Schultes began to publish articles on the data economy in 2016, with 11 articles between 2016 and 2020. However, Schultes's work received a fractionalized frequency of 3.11, with many citations in 2016.
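To make the idea of a fractionalized article count concrete, the sketch below assumes the common equal-split convention in which each of the k coauthors of a paper receives a credit of 1/k, and an author's fractionalized frequency is the sum of these credits. Whether the software used in this study applies exactly this rule is an assumption here, and the author names and paper list are hypothetical.

```python
from collections import defaultdict


def fractionalized_counts(papers):
    """Sum, for each author, a credit of 1/k over every paper with k coauthors.

    `papers` is a list of author lists. The equal split is an assumed
    convention, stated for illustration only.
    """
    totals = defaultdict(float)
    for authors in papers:
        share = 1.0 / len(authors)
        for author in authors:
            totals[author] += share
    return dict(totals)


# Hypothetical author lists, not taken from the data set.
papers = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C", "Author D"],
    ["Author A"],
]
frac = fractionalized_counts(papers)
```

Under this convention, an author with many heavily coauthored papers can have a raw article count well above the fractionalized value, which is one way to read a gap such as 11 articles against a fractionalized frequency of 3.11.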
4.3 Social structure of data economy

RQ3. What social synergy and collaborations exist in the domain of the data economy?
Social synergy is a concept proposed by Barbara Marx Hubbard after the biting global financial crisis of 2008. Social synergy indicates the mindset of a group's creative presence and productivity through knowledge sharing. It means researchers can combine efforts to attempt more considerable academic challenges with the impact of technology and offer solutions. The earlier study by Baraibar-Diez et al. (2020) established an increasing interest in social impact research.
A social structure indicates social actors with common interests linked together by connections. Studying the social structure gives more insights than studying an isolated actor. The data economy social structure graphical representation depicts the author as nodes and their relationships as edges. The author's cocitation network in Figure 10 shows three unique social network analysis centrality measures. The measures are betweenness, closeness and page rank to understand the most critical nodes and edges and how they interact in a data-economy bibliometric network. Data economy authors' cocitation network reveals how pairs of papers are cited together in the source articles. When many authors cocite pairs of papers, then research clusters are formed. Cocitation of data economy pairs of papers formed two clusters, as shown in Figure 10. One is represented by the blue color, while the red represents the other segment. The red segment is more densely clustered than the blue segment, but the blue segment is more influential than the red cluster. Wilkinson is the most influential author in the network, with the most significant node in the blue segment and the author's role is central in the network. The node size determines the total number of citations, while the thickness of the edges between nodes determines the number of times the sources are cited together.
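The cocitation counting described above can be sketched in a few lines: two references are cocited once for every source article whose reference list contains both. The function name `cocitation_counts` and the reference labels are hypothetical, and the sketch only illustrates the counting principle, not the procedure of any specific bibliometric tool.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how many source articles cite each pair of references together."""
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of three source articles.
reference_lists = [
    ["ref_w", "ref_j", "ref_s"],
    ["ref_w", "ref_j"],
    ["ref_j", "ref_s"],
]
cocit = cocitation_counts(reference_lists)
```

The resulting counts correspond to edge weights in a cocitation network: the more often a pair is cited together, the thicker the edge between the two nodes.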
Regarding betweenness in network analysis, which reflects how often a node lies on the shortest paths between other nodes, Wilkinson topped the list in the network with 219.97. Also, Wilkinson scored 0.014 in the closeness centrality measurement, which reflects the length of the paths from one node to the others in a network. Further, Wilkinson scored 0.034 in page rank, which unravels the nodes that extend their influence beyond the direct connections into the more comprehensive network. Wilkinson's influence transcends the blue segment and extends to the red segment. Aside from Wilkinson, Johnson is the second most influential author in the blue segment, with a betweenness of 40.27, a closeness of 0.013 and a page rank of 0.020. In the red segment, Sharma had the highest betweenness with 39.07, along with a closeness of 0.011 and a page rank of 0.020.
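For readers unfamiliar with the three measures, the following minimal sketch computes them for a small undirected, unweighted network. It is a generic illustration under stated assumptions, not the computation performed by the software used in this study: betweenness is unnormalized and follows Brandes' algorithm, closeness is normalized as (n − 1) divided by the sum of shortest-path distances, and PageRank uses power iteration with an assumed damping factor of 0.85. Normalization conventions differ between tools, so absolute values are not comparable with those reported above. The star-shaped example network is hypothetical.

```python
from collections import deque


def closeness(graph, node):
    """(reachable nodes - 1) / sum of shortest-path distances from `node`."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(graph):
    """Unnormalized betweenness (Brandes' algorithm, undirected, unweighted)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}


def pagerank(graph, damping=0.85, iterations=100):
    """PageRank by power iteration; the damping value is an assumption."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in graph}
        for v, neighbors in graph.items():
            targets = neighbors if neighbors else list(graph)
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


# Hypothetical star network: one hub linked to three otherwise unlinked nodes.
graph = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
btw = betweenness(graph)
cls = {v: closeness(graph, v) for v in graph}
pr = pagerank(graph)
```

In this toy network the hub lies on every shortest path between the other three nodes, so it has the highest score on all three measures, which is the pattern a dominant author shows in a cocitation network.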
Collaboration in academic paper writing is the act of working with one or more authors to produce an academic paper in the form of articles, conference proceedings or book chapters. Common goals, familiar research areas and authors' high reputations could be motivating factors for collaboration. Collaboration could also be a mentor-mentee process whereby senior and experienced authors collaborate with junior authors to bring them up. Collaboration facilitates training by observation and training by doing under the guidance of an expert. Collaboration is good, but it can turn out to be a positive or a negative experience. It is essential to research prospective collaborators and ensure they are a good fit before tying one's reputation to theirs.
Furthermore, it is vital to establish a reputation model for necessary quality assurance. Figure 11 showcases the data economy authors' collaboration. The figure shows 12 clusters of authors. The Wilkinson collaboration cluster is the biggest of the 12 clusters and the most densely connected to the other clusters. Most of the clusters only have a single collaborator with a single path, and there were no betweenness values recorded for the relationship. However, they have closeness and page rank values. For example, Van has three collaborators, but the author's collaboration is stronger with Dekker than with the two other collaborators. Van has a betweenness of 0.6, a closeness of 0.0007 and a page rank of 0.037.
Figure 10 Authors cocitation network analysis
On the other hand, Dekker has a betweenness of 0.4, a closeness of 0.0007 and a page rank of 0.035. On the contrary, Chen and Fijten did not have betweenness values. Similarly, Rantanen and Koskinen have a strong collaboration. The most significant collaboration cluster has many influential authors, and Mons had the highest betweenness of 12.68, with a closeness of 0.0009 and a page rank of 0.034. Schultes is close to Mons, with a betweenness of 12.59, a closeness of 0.009 and a page rank of 0.034. Wilkinson led the betweenness ranking of the authors' cocitation network with 219.97 but had a betweenness of 7.10, a closeness of 0.0009 and a page rank of 0.04 in the collaboration network. Collaboration is growing among the authors of the data economy, but there is room for improvement.
Collaboration between authors is essential, as is collaboration between institutions. Collaboration between different institutions across borders facilitates knowledge and resource sharing. Figure 12 shows five distinct clusters of institutions' collaborations in the data economy. The orange cluster consists of the University of Miami and the University of California, San Diego (Cluster 5). This cluster has direct collaboration and no betweenness values, but both Miami and California have a closeness of 0.004 and a page rank of 0.06. Like the Miami collaboration, Maastricht University Medical Center has a single-path collaboration with Radboud University Medical Center in purple (Cluster 4). Maastricht has no betweenness record but has a closeness of 0.009 and a page rank of 0.027.
In contrast to Maastricht, Radboud has a betweenness of 10, a closeness of 0.009 and a page rank of 0.05. Cluster 3 in green consists of Stanford University, Heriot-Watt University, the University of Oxford and Lawrence Berkeley National Laboratory. Oxford and Lawrence did not have betweenness scores, but Oxford has a closeness of 0.009 and a page rank of 0.04, while Lawrence has a closeness of 0.009 and a page rank of 0.03. Stanford and Heriot-Watt have very close betweenness values (11.66 and 11.46) and page ranks (0.08 and 0.09). They both have a closeness of 0.009. Stanford and Heriot-Watt are the focal collaborators in Cluster 3. Out of the five clusters, Cluster 2 is the biggest, with six institutions spanning the European and American continents and four institutions from The Netherlands dominating the cluster.
Leiden University Medical Center had the highest betweenness of 21.74, with a closeness of 0.0010 and a page rank of 0.11, followed by The Netherlands eScience Center with a betweenness of 3.76, a closeness of 0.0010 and a page rank of 0.08. Universidad Politecnica De Madrid had a similar result to The Netherlands eScience Center, with a betweenness of 3.74, a closeness of 0.0010 and a page rank of 0.089. There is network activity outside Clusters 2 and 3; Cluster 2 collaborates with Cluster 3 with light and strong edges, which indicate the level of collaboration. The University of Helsinki is the focal collaborator in Cluster 1, with Aalto University and the University of Exeter. The University of Helsinki had a betweenness of 1, a closeness of 0.005 and a page rank of 0.086. Aalto University and the University of Exeter did not have betweenness values, and both had a closeness of 0.005 and a page rank of 0.046. The University of Helsinki had both national and international collaborations. Figure 13 depicts the focal countries in international collaboration, while the VOSviewer visualization in Figure 14 shows the comprehensive clusters with focal countries and their networks. The result also reveals the emerging countries in the field of data economy. In Europe, these include Estonia, Poland, Denmark, Czech Republic, Slovenia, Portugal, Norway, France, Cyprus, Greece, Sweden, Finland, Ireland, Belgium and Bulgaria. They further include Brazil and Venezuela in South America, Canada in North America, and Singapore, the Philippines, India, Malaysia, Taiwan, Japan, Iran and Saudi Arabia in Asia. The international collaboration of data economy covers Europe, Asia, North America and Australia. Figure 14 shows two interconnected clusters. Cluster 1, in red, consists of the UK with a betweenness of 5.18, a closeness of 0.17 and a page rank of 0.22; China with a betweenness of 0.4, a closeness of 0.11 and a page rank of 0.11; and The Netherlands with no betweenness, a closeness of 0.1 and a page rank of 0.08.
The UK had the highest betweenness, closeness and page rank, with a strong collaboration path and collaboration across borders. In the second cluster, in blue, the USA had the highest betweenness with 2.05, a closeness of 0.15 and a page rank of 0.20. Canada and Australia had no betweenness values and the same closeness value of 0.11 but different page ranks: 0.13 for Canada and 0.010 for Australia. Switzerland had a betweenness of 0.38, a closeness of 0.13 and a page rank of 0.18. The betweenness, closeness and page rank values of Cluster 1 are higher than those of Cluster 2. There is more pronounced international collaboration in Cluster 1 than in Cluster 2.
The collaboration world map in Figure 15 reveals six countries collaborating with single and multiple countries. Argentina had a single collaboration with New Zealand. Moreover, Australia collaborates with Argentina, Brazil, France, Greece, Ireland, Japan, New Zealand, Norway, Poland, Singapore, South Africa, Sweden and Switzerland. Out of these 13 countries, Australia collaborated most with Switzerland (2).
Brazil collaborates with six countries: Argentina, Greece, New Zealand, Norway, Poland and South Africa. Also, Canada collaborates with Argentina, Australia, Brazil, France, Greece, Ireland, Japan, New Zealand, Norway, Poland, South Africa, Sweden and Switzerland. Among these 13 countries, Canada collaborated most with Switzerland. Canada's collaboration with Switzerland is similar to Australia's collaboration with Switzerland, except that Canada's count is higher by 1. China also collaborated with Argentina, Australia, Brazil, Canada, France, Greece, Ireland, Italy, Japan, The Netherlands, New Zealand, Norway, Poland, South Africa, Sweden and Switzerland. China collaborated with the largest number of countries (16), and its strongest collaboration was with The Netherlands (2). Lastly, France had a collaboration with Argentina. The bibliometric result of data economy shows that the collaborations between some countries are very low, while some countries have no collaboration at all.
Figure 13 International collaboration network
Figure 14 Emerging countries on data economy (VOSviewer visualization)

Study implications
The review contributes to education, research, managerial practice and policymaking on the data economy.
For educators, we established that research and development, human capital creation and creativity would form the new drivers of economic growth in the data-driven economy. It will also result in rapid institutional responses to economic changes through labor market adjustments, depreciation of capital investments, shortening product life cycles and extensive investment planning. The study provides educators with an example of relevant topics for inclusion in teaching and learning resources. The data economy has come to stay as a research domain. Also, this study suggests that a full-fledged program should be developed at certificate, diploma, degree and master's levels. Presently, Laurea University of Applied Sciences (UAS), Finland, has started "Introduction to Data Economy" as one online credit course as part of their UAS Master's program. These programs will create more awareness and enlightenment for the data economy. Further, it will serve as an opportunity to produce seasoned graduates of the data economy that can impact academia and industry. It will also be a means of eradicating job shortages of data economists.
For academic researchers, this study may help researchers globally to discover a suitable niche of data economy and develop it for enlightenment, economic growth, individual empowerment and broad societal benefits. Further, it may help the team leaders and research institutes to form clusters of research areas as dimensions of data economy such as business renewal and new business models for data economy, data economy legal issues, data economy ethical issues, technology integrated with data economy, innovative data economy, health data economy and many more.
This study established how FAIR data, OD, data management, data sharing, data economy, machine learning and big data are interlaced with different dimensions. The complex thematic evolution of the data economy portrays a multidisciplinary data economy. This result is consistent with the studies of Choi and Pak (2007) and Choi and Anita (2008). The authors argued that multidisciplinarity, interdisciplinarity and transdisciplinarity are different concepts. They further posit that the terms should not be used interchangeably in research. This study shows that data, as a multidisciplinary field of research, can resolve complex world problems, and this assertion is consistent with the study of Choi and Pak (2007). Also, this study shows the impact of collaboration among authors, institutions and countries in advancing the data economy research domain.
This result is consistent with the study of Choi and Pak (2007). The study proposed eight motivating strategies that can influence multidisciplinary teamwork, summarized by the acronym TEAMWORK: team, enthusiasm, accessibility, motivation, workplace, objectives, role and kinship. These strategies are proposed to strengthen the promoters of multidisciplinary collaboration and remove the barriers to research collaboration. Further, this study integrates data economy conceptual structure, intellectual structure and social structure to advance the research domain of data economy and show their integration as interactive and holistic.
Publication of data economy is not evenly distributed (Milletler, 2019). This global research gap will motivate researchers in countries with scanty literature on data economy to focus on expanding the existing contribution in the research domain of data economy. In addition, this study gives insights into metrics for making contributions to the data economy. It gives direction to how researchers can make a theoretical, methodological, contextual and conceptual impact on the data economy (Akhavan et al., 2016).
Recent studies elucidate the essential theory in evaluating the quality and contribution of research. This study established the theoretical integration of data economy. Apart from theoretical integration, there is a need for data economy theory, as Milletler (2019) mentioned that the standard economic theories are deficient in explaining the phenomenon of data economy. There is a need for advanced theoretical and conceptual frameworks for the data economy. In addition, this shows a need for data and methodological triangulation for the research process and the epistemological development of a research question.
For data managers, the increasing use of data across the main segments of the economy implies that data has become a new form of capital for the current knowledge economies. Therefore, the vast amounts of available data can be used and reused to benefit society and increase growth opportunities. The industrialization of learning in the data economy will also accelerate innovation because of the availability of data and analysis tools. The continuous growth of the data-driven economy further implies that human capital will be continuously discounted in the subsequent years. This impact follows the continuous deployment of artificial intelligence and machine learning in data collection and analysis. Its increasing use will increase the capabilities of machine intelligence to the extent that it can match human intelligence, making machines a viable substitute for human capital.
It is suggested that research should focus on theory, innovative methodologies and best practices of the data economy. It is also essential to integrate applied research on the data economy for impactful managerial implications. This study proposes a comprehensive business model that harmonizes ethical, legal, technological and societal issues. This proposed business model will proffer solutions to the existing teething problems of the data economy. To get the desired results from the proposed project, professionals in ethics, law and technology, environmentalists and experts from other fields that have a stake in the data economy should be part of the project. It should be a business model that can work across different fields of expertise with only minor modification. Organizing workshops and seminars will contribute to the proposed project's success.
For policymakers, there is a growing impact of funding for the data economy at the continental, international, national and local levels. Examples are the European Commission funding for the data economy and the Foundation for Economic Education (Liikesivistysrahasto) in Finland, which adopted data economy as one of its bold themes more than two years ago. Because of the impact of the data economy research domain, the results of our study may simplify funding decisions for the data economy in the future.

Conclusions
This study attempted to answer three research questions and combined two databases (WoS and Scopus), spanning 13 years and focusing on FAIR data and data economy. The results from the bibliometric analysis generate interesting insights for the research community, data stakeholders and society at large. The article production from 2008 to 2021 shows exponential growth, with spikes in 2016 and 2020. The growth is becoming steady, and the results identify Roos and Kaliyaperumal as the authors with the most extended productivity timeline. Interestingly, Schultes excelled within the shortest time frame with high citations.
Also, the results show FAIR data concepts with dimensions of open science, data management, machine learning, big data, data sharing and data economy. This result indicates the importance of ethical issues related to data. The FAIR principle must predominate whether an organization is thinking of OD or a data economy. The study also reveals the importance and intervention of emerging technologies in FAIR and the data economy context. Further, the social structure of the study shows that Wilkinson is an influential author and has a cluster of collaborators in their network.
Regarding institution collaboration, institutions in The Netherlands have the most significant cluster, while Stanford and Heriot-Watt top the list in their cluster and the University of Helsinki in its own. The USA plays a central role in two clusters of countries and connects other countries from Europe, Australia and Asia. International collaboration also cuts across countries and portrays intercontinental collaboration. Our results show multilateral and bilateral collaborations. All the metrics examined signaled the advancement of data economy literature.
Further, the compound annual growth rate (CAGR) that the Bibliometrix app generates for the literature on FAIR data and data economy is consistent with the exponential growth discussed earlier. The CAGR is computed as [number of articles (final year)/number of articles (initial year)]^(1/n) − 1. The computation is rooted in the study period: the number of articles in the initial year and the accumulated literature. The annual growth rate for this bibliometric study is 7.97%. This result shows the demand, value and performance of data-related literature.
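As a worked illustration of the CAGR formula, the sketch below evaluates (final/initial)^(1/n) − 1 for made-up article counts. The function name `cagr`, the interpretation of n as the number of yearly periods between the initial and final year, and the example counts are assumptions for illustration; the sketch does not reproduce the 7.97% reported for this study.

```python
def cagr(initial_count, final_count, periods):
    """Compound annual growth rate: (final / initial) ** (1 / periods) - 1.

    `periods` is assumed to be the number of yearly steps between the
    initial and the final year.
    """
    if initial_count <= 0 or periods <= 0:
        raise ValueError("initial_count and periods must be positive")
    return (final_count / initial_count) ** (1.0 / periods) - 1.0


# Hypothetical counts: 10 articles growing to 40 articles over 2 periods.
growth = cagr(10, 40, 2)
```

Because the rate compounds, a fourfold increase over two periods corresponds to a doubling in each period rather than to the arithmetic average of the total increase.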
The CAGR will help researchers, journals and other data stakeholders properly understand the data literature life cycle, either growth stage, maturity or decline that needs renewal. The CAGR percentage in this study shows that the data literature is at the growth stage, and this growth needs sustainability. Data relevance is progressing, and it is a key strategic asset for companies and academia now and in the future. This study combined FAIR data and data economy. It contributed to the literature on big data, information discovery and delivery by shedding light on the importance of the data economy's conceptual, intellectual and social structure.
This bibliometric study is a road map for future researchers, but the study is not without limitations. First, the study was limited to 2008-2021 and did not consider the earlier years of the data economy. Though it is an emerging field, this study did not account for the scanty literature before 2008. Though the literature was extracted from 1970, the study cut off 38 years of literature to sanitize the data. Because English is the more broadly accepted means of communication, the study excluded languages other than English, although some of the non-English articles could have contributed to this study. The scope of this study is also limited to FAIR data and data economy. Future researchers can work around these limitations by extending the results of this study.
Based on this study, the researchers can expand on the intersection of sharing, platform and data economy. This research shows a dearth of theory building and testing in the research domain of the data economy. Future researchers should test existing theories and show a plausible explanation of the phenomenon of their investigation. Also, future researchers need to synthesize a wide range of literature with higher-level thinking skills to build theory around the data economy. It is also essential to combine methods to strengthen the existing methodology for data economy research. This study used bibliometric methods to unravel the proper position of scholars' contribution to the data economy. Future researchers can use the insights in this study to embark on empirical and longitudinal studies. This study, without any doubt, will open further discussion on the data economy.