Information flows and topic modeling in corporate governance

Purpose – Multiple disciplines such as finance, management and economics have contributed to governance research over time. However, the full intellectual structure of the governance "field," including the exchange of knowledge across disciplines and the large variety of governance topics, remains to be uncovered. To appreciate the breadth of corporate governance research, it is necessary to understand the disciplinary sources from which the research stems. This manuscript focuses on the interdisciplinary underpinnings of corporate governance research.
Design/methodology/approach – This paper employs bibliometric analysis to trace the evolution of corporate governance using articles included in the ISI Web of Science database between 1990 and 2015. Journals included in these categories encompass a full range of business disciplines and provide evidence of the multi-disciplinary nature of corporate governance. The paper also uncovers the topics treated by disciplines under the governance umbrella using a machine learning method called latent Dirichlet allocation (LDA).
Findings – Corporate governance research deals with a number of strategy-related topics. Unlike strategy topics that reside in a single discipline, corporate governance crosses disciplinary boundaries and includes contributions from accounting, finance, economics, law and management. Our analysis shows that over 80% of corporate governance articles come from outside the field of management. Our LDA solution indicates that the major topics in governance research include corporate governance theory, control of family firms, executive compensation and audit committees.
Originality/value – The results illustrate that corporate governance is far more interdisciplinary than previously thought. This is an important insight for corporate governance academics and may lead to collaborative research. More importantly, this research illustrates the usefulness of LDA for investigating interdisciplinary fields. This method is easily transferable to other interdisciplinary fields and it provides a powerful alternative to existing bibliometric methods. We suggest a number of topic areas within library and information science where this method may be applied, including collection development, support for interdisciplinary faculty and basic research into emerging interdisciplinary areas.


Introduction
Corporate governance is the area of business that deals with the relationships between boards of directors, company management, shareholders, and other stakeholders. It describes the relationships between these groups that allow corporate entities to thrive and deals with a wide range of economic, financial, and legal topics. Interest in governance topics has grown substantially since 1990 (Turnbull, 1997; Durisin and Puzone, 2009; Kushkowski and Shrader, 2013). From a relatively modest beginning, a great deal of research-based knowledge now emanates from the sub-disciplines of this interdisciplinary field of study (Turnbull, 1997; Yoshikawa and Rasheed, 2009). Research in corporate governance exists in the business fields of accounting (Keasey et al., 2005), finance and law (Shleifer and Vishny, 1997), strategy (Pugliese et al., 2009) and behavioral theory (Van Ees et al., 2009).
Despite the contributions to the literature from disparate business disciplines, management researchers contend that corporate governance has its basis in the management literature (Chen and Chan, 2010). This paper describes the multi-disciplinary underpinnings of corporate governance research and traces the evolution of corporate governance research using articles included in the ISI Web of Science database between 1990 and 2015. The journals covered encompass a range of business disciplines. Our results show that corporate governance research has rich, multidisciplinary roots, and we hope they will spur greater interdisciplinary collaboration between governance researchers.
More importantly, for LIS researchers, the methods from this paper can be used as a supplement to existing bibliometric methods. There are a number of areas within LIS where the method we use could be applied by researchers and librarians, including collection development and management, support for interdisciplinary research, information seeking behavior, and applied and basic research into emerging and established interdisciplinary areas.
Bibliometric analyses are conducted for several reasons. They define a field and its researchers (Samiee and Chabowski, 2012), they provide a retrospective look at a given journal or set of journals (Koseoglu, 2016), they explain the evolution of new research areas (Acedo and Casillas, 2005), and they identify the intellectual structure of a field (Culnan, 1987).
The citation metrics of a field or discipline are important because they provide context for the development of a field. Disciplines are not static entities; they grow, develop, and change in response to the research done in the discipline. The heterogeneous nature of disciplinary content within corporate governance research makes determining the intellectual structure of the field more complicated.
The discovery and explanation found in intellectual structure articles are analogous to review articles in their intent to describe the present state of a field. The difference is that research on intellectual structure relies on the statistical analysis of bibliometric data rather than a non-empirical reflection on the research literature (Zupic and Cater, 2015). To appreciate the scope of corporate governance research, it is necessary to examine and understand the varied disciplinary sources in which corporate governance research is located.
This paper investigates this progression by using the bibliometric methods outlined below. This review expands on previous bibliometric efforts by considering journal articles published in the governance field from 1990 to 2015 and all the journal citations contained in these papers. This research also identifies how the citations are related to the primary sub-disciplines of governance research (accounting, economics, finance, law, management), and presents the first comprehensive analysis of the evolution of the governance field.
In addition to traditional bibliometric analysis, this research employs latent Dirichlet allocation (LDA), an objective and powerful form of analysis that allows a computer to "learn" which topics exist in a large body of text. LDA is a machine learning algorithm used to explore large amounts of textual data to discern which topics emerge and how those topics are related. The algorithm allocates words to topics based on their similarity as derived from a collection of documents called a corpus. The computer learns which words coincide with each other and uses probabilistic topic modeling to estimate the likelihood that words are grouped in similar topic areas (Zupic and Cater, 2015) and how topics are grouped in documents (here, individual articles). Our corpus consists of the set of articles retrieved in a search of topics related to corporate governance.

Citation analysis
The application of quantitative analysis to a body of academic citations is often referred to as bibliometric analysis or bibliometrics (Zupic and Cater, 2015). In bibliometric analysis, counts are performed over a period of time to establish the total number of citations received by a particular article or author from a set of documents existing in the literature. The underlying assumption is that highly-cited authors and articles have greater impact on the development of a discipline than do those cited less frequently (Culnan, 1986). Citation analysis, therefore, offers a means by which academics interested in corporate governance can measure disciplinary contributions to a field.
Bibliometric analysis can be classified into two basic types: (1) citation analysis, which involves the counting of citations, and (2) network analysis, which examines the relationships among citations and which includes co-citation analysis (Georgi et al., 2010) and related methods. Our analysis in this paper uses citation analysis. Citation counts of highly cited papers and authors are often used to indicate the scholarly influence of papers and authors and the intellectual structure of academic fields. Citation analysis can also be used to evaluate the performance of individual authors, journals, and institutions (Kushkowski and Shrader, 2013). Citation and network analyses frequently use authors, articles, or journals as source data for analyzing relationships between and within disciplines (White and McCain, 1998). For example, Ramos-Rodríguez and Ruiz-Navarro (2004) used bibliometric analysis to track the intellectual structure of strategic management research, including identifying the most influential books, articles, and authors.
At present, there is no comprehensive citation analysis describing the intellectual structure of governance research in the extant literature. Filatotchev and Boyd (2009) specifically call for interdisciplinary aspects of governance research in their guest editorial for a special review edition of Corporate Governance: An International Review. Durisin and Puzone (2009) come closest to a comprehensive analysis in their review and co-citation analysis of one thousand publications. They rank the most highly cited articles and identify seven research themes for articles published in the journal Corporate Governance: An International Review: review articles, agency theory, governance structures, board characteristics, stock ownership and performance, codes of practices, and monitoring executive compensation. Their research also looked at papers published in general management and business journals and produced a similar categorization.
Durisin and Puzone (2009) use author co-citation analysis as a method for showing relationships between topics in the field. Our paper takes a different approach by defining the essence of corporate governance as a set of topics. By searching topically rather than by journal, this research captures a broader collection of journals where corporate governance research is published. Previous citation analyses of the governance literature point to the need for enhancing both the scope of the methodology used and the theoretical perspectives of the analysis. For example, both Cheng and Chan (2010) and Saggese et al. (2016) argue for broadening the perspective of analysis to include aspects beyond the analysis of articles and the topic of financial control of organizations. Acedo et al. (2006) and Shafique (2013) mapped the intellectual structure of the internal capabilities of firms while calling for more interdisciplinary analyses. Likewise, Turnbull (1997), Yoshikawa and Rasheed (2009) and Tihanyi et al. (2014) argued for analyses to go beyond the traditional agency conflicts between shareholders and managers and include considerations of the international aspects of governance.
In response to the above, our analysis is a more comprehensive and interdisciplinary effort. It includes international and business functional areas as well as legal studies, uses a much larger and more comprehensive keyword-driven set of governance articles, and considers a longitudinal time frame. Our analysis is based on the notion that widely cited authors and papers are deemed to have exerted greater influence on a particular field than less-cited papers (Sharplin and Mabry, 1985; Culnan, 1986). As a set of ideas grows into a discipline, the rigor of related articles and the focus of journals coalesce into clear patterns and networks (Kushkowski and Shrader, 2013). Along with this, our paper presumes that highly cited articles, authors, and journals define the development of a field by indicating the key intellectual roots of a discipline (White and McCain, 1998). Our paper identifies the most highly cited journals in the corporate governance field and maps the metrics of the field for future scholarly research.
Current advances in machine learning have led to topic modeling methods that can be used as an exploratory technique to uncover hidden or latent relationships present in large data sets (Blei, 2012). Topic modeling is a method of discovery that Moro et al. (2015) claim is useful for conducting progressive and relevant research in any disciplinary field. Implementations of LDA and its variants are available from data analytics firms, academic software repositories, or open source repositories such as GitHub.
The main difference between these methods and traditional statistical methods is the absence of a priori assumptions about relationships present in the data. LDA does not require researchers to make assumptions about how the information in the raw data is sorted into silos (i.e. topics). LDA is an unsupervised machine-learning algorithm that uses probabilistic topic modeling to estimate the likelihood that words are grouped in similar topic areas (Zupic and Cater, 2015), and the likelihood that topics are grouped in documents (here, individual articles). Observations in a large set of data (called a corpus) are explained by similarities in otherwise unclassified groups (Blei et al., 2003). Because LDA extracts patterns without a priori assumptions, it is not designed to answer specific preformulated hypotheses (Schwab and Zhang, 2018). In this way, LDA opens up large data sets to unrestrained discovery.
As an unsupervised machine-learning algorithm, LDA requires a researcher to input textual data to the algorithm, which uses joint probability distributions to discover the "hidden structure" of topics within documents and words within topics. The richness of LDA is that it recognizes that there can be many probabilistic topics and that words are independently distributed among topics (Sugimoto et al., 2011). LDA is used in this study to look for the most mentioned words in a large set of articles dealing with the general topic of corporate governance.
In LDA, a "word" is the basic unit of discrete data (Blei et al., 2003). A "document" or article is a sequence of words, and a "corpus" is a collection of articles. The basic unit of modeling is the topic, which is a distribution of words over the entire set of words in a corpus (Sievert and Shirley, 2014). In effect, LDA helps discover underlying themes in a set of data by generating key words. Words are allocated to topics in the analysis.
The authors worked jointly with technical experts from Kingland Systems, a data analytics firm that provides information technology services to help firms manage compliance with regulations and risk and whose clients include some of the world's largest banks, financial services firms, and insurance companies. After receiving the project data, Kingland Systems did the data analysis, and the authors collaborated with them to interpret the LDA results. Staff from Kingland provided the technical expertise, and the authors served as the subject domain experts.
Each topic is a cluster of words specified by the machine learning algorithm. Unimportant words (e.g. a, and, are, is, the) are often ignored or reduced in the analysis. Instead, the important key words, words that are idiosyncratic and exclusive to a topic, are identified and emphasized.
LDA does not create new words or concepts as do factor analysis and content analysis; rather, topics in a set of data are represented by existing keywords (Moro et al., 2015). In a well-formed topic model, certain topics will generate words from one conceptual area more than from another. Topics are based on probabilities assigned by the LDA software. Blei (2012, p. 78) summarizes it as "reversing the generative process – what is the hidden structure that likely generated the observed collection?" where the observed collection is the words in the articles.
In LDA, the research administrator determines the number of topics, which can be decided in several ways (Chen and Wang, 2018). Too many topics dilute the meaning of each topic and too few fail to adequately separate ideas and words from each other. The appropriate number of topics for our dataset was determined by comparing the intra-topic similarity with inter-topic dissimilarity. Analyses specifying five, ten, fifteen, and twenty topic solutions were run, and the optimum number of topics was determined to be twenty as this maximized the difference between topics (Chen and Wang, 2018). Words in topics were generated by determining the mix between the probability and relevance of words belonging to topics (Sievert and Shirley, 2014).
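The comparison of candidate topic counts can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used in the study: it covers only the between-topic half of the comparison, it uses cosine similarity (the text does not name the similarity measure), and the topic-word distributions, the `candidates` dictionary, and the function names are all hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two topic-word probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_inter_topic_similarity(topics):
    """Average cosine similarity over all pairs of topics in one model."""
    pairs = list(combinations(topics, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical topic-word distributions for two candidate models
# (number of topics -> one probability vector per topic over a 4-word vocabulary).
candidates = {
    2: [[0.7, 0.2, 0.1, 0.0],
        [0.1, 0.2, 0.3, 0.4]],
    3: [[0.9, 0.1, 0.0, 0.0],
        [0.0, 0.1, 0.9, 0.0],
        [0.0, 0.0, 0.1, 0.9]],
}

# Lower average similarity means the topics are better separated.
scores = {k: mean_inter_topic_similarity(t) for k, t in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In practice the vectors would come from fitted models with five, ten, fifteen, and twenty topics, and a within-topic coherence measure would be weighed alongside the separation score.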
The LDA relevance metric sets the weight given to the probability of a word belonging to a topic (Sievert and Shirley, 2014). Our LDA model produced a list of the most relevant words for the twenty topics, where the relevance of word $w$ to topic $k$ given $\lambda$ is defined as:
$$r(w, k \mid \lambda) = \lambda \log(\phi_{kw}) + (1 - \lambda) \log\left(\frac{\phi_{kw}}{p_w}\right)$$
where $\phi_{kw}$ is the probability that word $w$ belongs to topic $k$ and $p_w$ is the probability of word $w$ being in the corpus. Lambda, which serves to define relevance, can be set to any value from 0 to 1 inclusive. A lambda of 0 equates relevance with exclusivity: words are ranked by the ratio of their probability within the topic to their probability in the corpus, so a word that is no more likely in the topic than in the corpus as a whole earns no credit. By contrast, setting lambda to 1 ranks words solely by their probability within the topic, regardless of how often they appear elsewhere in the corpus. As recommended by Sievert and Shirley (2014), the developers of the relevance measure, a lambda of 0.6 is used for these results.
Dyer et al. (2017) use LDA to ascertain exactly "what is being said" in corporate reports. Zupic and Cater (2015, p. 457) state that LDA holds tremendous potential for "expanding the scope of mapping the management and organization domain." To make LDA widely applicable in organization research, Zupic and Cater indicate that management scholars can use available software themselves or work with information scientists on research projects using LDA.
The analyses for this research were completed with technical experts from Kingland Systems, a data analytics firm headquartered in Clear Lake, Iowa, which provides information technology and financial services to help firms manage compliance with regulations and risk. For example, Kingland provides their clients full-text analysis of formal corporate reports and compliance documents. Their clients include some of the world's largest banks, financial services firms, and insurance companies.
The authors worked directly with Kingland analysts, providing them the raw data, or corpus. Kingland produced the LDA results and collaborated with the authors to interpret the results. The Kingland experts directly helped the authors ascertain the number of LDA topics to best analyze the corpus and they offered advice on the relevance metric to best identify topic words. The LDA analysis was done using LDAvis software (Sievert and Shirley, 2014), an extension of LDA that provides LDA-based data visualization. LDAvis allows us to examine how different words contribute to the meaning of each topic, how different topics relate to each other, and the prevalence of each topic. The accompanying tables were created from the LDAvis results.
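To make the effect of lambda concrete, the following Python sketch applies the Sievert and Shirley (2014) relevance score, lambda times the log of a word's within-topic probability plus (1 - lambda) times the log of the ratio of that probability to the word's corpus probability, to a toy topic. The three words, their probabilities, and the function names `relevance` and `rank_words` are hypothetical, chosen only for illustration; they are not taken from the study's corpus or from the LDAvis code.

```python
import math

def relevance(phi_kw, p_w, lam=0.6):
    """Relevance of a word to a topic for a given lambda.

    phi_kw: probability of the word within the topic
    p_w:    probability of the word in the whole corpus
    """
    return lam * math.log(phi_kw) + (1.0 - lam) * math.log(phi_kw / p_w)

def rank_words(phi, p, lam):
    """Return the topic's words sorted from most to least relevant."""
    return sorted(phi, key=lambda w: relevance(phi[w], p[w], lam), reverse=True)

# Hypothetical probabilities for one topic (not from the study).
phi = {"firm": 0.20, "board": 0.10, "audit": 0.05}   # within-topic probability
p = {"firm": 0.20, "board": 0.02, "audit": 0.005}    # corpus-wide probability

by_probability = rank_words(phi, p, lam=1.0)  # ranks by within-topic probability only
by_lift = rank_words(phi, p, lam=0.0)         # ranks by exclusivity (lift) only
blended = rank_words(phi, p, lam=0.6)         # the setting used for the reported results

print(by_probability)  # ['firm', 'board', 'audit']
print(by_lift)         # ['audit', 'board', 'firm']
print(blended)         # ['board', 'audit', 'firm']
```

In this toy case "firm" is the most probable word in the topic but is equally common across the corpus, so it drops to the bottom once exclusivity carries weight.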

Common data for citation analysis and LDA
Initial data for this study was harvested from ISI's Web of Science database. Journal articles containing the keywords "corporate governance," "agency theory," "director," "market for corporate control," "ownership structure" or "executive compensation" were selected to capture essential elements of corporate governance. A decision was made early in the project to be economical in the choice of search terms and focus on the essence of corporate governance research. A broader keyword search would expand the results set but increase the likelihood of extraneous results.
The search was performed in the Web of Science Social Science Citation Index categories of Business Finance, Business, Management, Economics, and Law to capture multidisciplinary articles on the search topics. The base bibliographic data used for the citation analysis and LDA for this study includes bibliographic information, abstracts and cited references for 10,532 articles from ISI's Web of Science database published between 1990 and 2015. The longitudinal data allows for an examination of trends in corporate governance scholarship over time. The twenty-five-year timespan for the data coincides with the growth of corporate governance as a research field and provides convenient data cohorts. The numbers of articles retrieved by our search for the years 2016 to 2018 were 1,044, 1,162, and 1,243, respectively, demonstrating that corporate governance continues to be a vigorous research area.
In addition, the dataset for LDA included 1,053 full-text articles chosen randomly from the Web of Science results. The full-text articles provided a corpus of text in addition to the bibliographic article data from Web of Science. A list of the full text sources used in this project is included as part of the supplemental data available for this research.

Citation analysis dataset
The citation analysis dataset includes the title, journal title, the year of publication, and subject classifications of the journal. These classifications are the categories assigned in Anne-Wil Harzing's Journal Quality List (Harzing, 2015) or subject classifications assigned to journals by the Library of Congress and then mapped to a Harzing discipline (for journals not included in Harzing).
Classifying articles by discipline provides insights into disciplinary contributions to the literature of corporate governance (Kushkowski and Shrader, 2013). The article subject code is used as a proxy for the discipline of the journal. For example, the Strategic Management Journal subject code is "management." Subject analysis leads to a more nuanced understanding of the development of corporate governance as a field.
The articles downloaded from Web of Science included the full bibliographic data and cited references. Over 656,000 cited references were included in the set of articles analyzed to explain trends in authorship. To focus our analysis on academic literature, the following categories of materials were excluded from the analysis: books, conference proceedings, foreign language materials, newspapers, unpublished articles and unidentifiable citations, resulting in a set of 415,304 cited references. In the analysis of the most highly cited journals, publications with fewer than 100 cited references were excluded.
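The two filtering steps described above can be sketched as follows. This is a minimal illustration that assumes each cited reference has already been parsed into a record with a `journal` name and a `kind` label; that record layout, the label strings, and the function name `journal_counts` are hypothetical and do not reflect the Web of Science export format.

```python
from collections import Counter

# Material types dropped to keep the analysis on academic journal literature.
EXCLUDED_KINDS = {
    "book", "conference proceedings", "foreign language",
    "newspaper", "unpublished", "unidentifiable",
}

def journal_counts(cited_refs, min_refs=100):
    """Drop excluded material types, then keep journals with at least min_refs citations.

    Returns the number of references kept and a {journal: count} mapping.
    """
    kept = [r for r in cited_refs if r["kind"] not in EXCLUDED_KINDS]
    counts = Counter(r["journal"] for r in kept)
    frequent = {j: n for j, n in counts.items() if n >= min_refs}
    return len(kept), frequent

# Tiny hypothetical sample; the threshold is lowered so the example produces output.
refs = [
    {"journal": "Journal of Finance", "kind": "article"},
    {"journal": "Journal of Finance", "kind": "article"},
    {"journal": "Strategic Management Journal", "kind": "article"},
    {"journal": None, "kind": "book"},
]
kept_total, frequent = journal_counts(refs, min_refs=2)
print(kept_total, frequent)  # 3 {'Journal of Finance': 2}
```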

Results
Analysis of the database of articles traces disciplinary changes in corporate governance articles by looking at which journals publish the research. Analysis of the citation data provides insight into which journals have been most frequently cited over time and focuses on three areas. The first is the subject distribution of articles in our dataset and how it has changed over time. The second is the subject distribution of the cited references, which allows us to make observations about how corporate governance literature is distributed in the academy over time. The third is the journal distribution and which journals contribute the most to corporate governance research.

Subject analysis of articles
One of the ways to describe the changes in corporate governance research is to look at the subject distribution of articles downloaded from Web of Science. The subject categories assigned to the articles provide a straightforward way of describing changes in the disciplinary makeup of corporate governance. The results report totals for the Harzing subject areas of accounting, corporate governance, economics, finance, law, management, and organizational behavior, categories which include 85% of the articles downloaded in our sample. The category of "Other" includes the remaining 15% of articles, which fall in the Harzing categories of business history, entrepreneurship, international business, innovation, marketing, management information systems, multidisciplinary, operations research, political science, public service management, psychology, sociology and tourism.
An easy way to gauge the growth of corporate governance scholarship is to look at yearly publishing output. Figure 1 shows the number of articles produced each year by subject area.
One trend to notice is that the growth of corporate governance scholarship was relatively stable between 1990 and 2004. Beginning with 2004, the total number of articles on corporate governance topics increased steadily, with finance journals publishing more than other areas. This may be the result of academic research on the impact of the Sarbanes-Oxley Act, which was passed in 2002, though the Act was not included in the search terms. Articles in journals dedicated to corporate governance, however, only began to appear in 2000, even though the main academic title, Corporate Governance: An International Review, started publication in 1993. The lack of explicit corporate governance articles prior to 2000 may be due to articles in the journal and the search terms not matching up. The total number of articles has increased each year and there is a broad distribution of articles across subject areas.
Another way to view disciplinary output is to look at the percentage of articles in each subject area over time. Figure 2 shows the percentage of the total output of governance articles that come from each field.

As expected, finance has contributed a large share since the beginning and has been able to maintain it over time. Management has had a constant but smaller share of articles, an indication that corporate governance research is appearing in other disciplines. The lack of corporate governance articles in the results is explained by the fact that there are few pure corporate governance journals. The number of articles published and the share of articles published by subject area tell part of the story behind the growth in corporate governance research. Figure 3 illustrates the growth rates of disciplines for subject areas in five-year cohorts.
We divided our data into five periods, the first of six years (1990-1995) and the following of five years each. Yearly growth is less meaningful because the variation might be due to peculiar changes in journals and to delays in the review process. Of the single disciplines, organizational behavior grew the most during the first period, at 100%. The "other" category grew by 143% (capped at 100% in the figure), mainly due to growth in corporate governance research in the myriad "other" categories. During the next period, from 1996 to 2000, overall growth was modest, with the highest rate, around 21.9%, reached by economics. [...] research on the implementation and consequences of Sarbanes-Oxley, which was enacted in 2002. In the last period, law experienced a dramatic decline of 34.1% while all the other disciplines experienced a small growth rate.
[Figure caption: Percentage of articles produced by discipline]

Subject analysis of cited references
Governance researchers, mostly management academics, assume that the bulk of governance research takes place solely in management (Chen and Chang, 2010). Our results illustrate that this is not the case. The subject breakdown shown in Table 1 for cited references in articles downloaded from the ISI Web of Science database is as follows: 25% finance, 18% management, 18% economics, 14% law, 11% accounting, 13% a collection of 12 other disciplines, and less than 2% corporate governance. The low number of cited references for corporate governance results from only two journals being categorized in this subject area by Harzing. Our analysis suggests that corporate governance is far more interdisciplinary than previously thought. Management literature, in fact, accounts for less than 20% of the cited references for the articles in our sample.
Our research also included an analysis of the journals appearing in the cited references in our journal sample. Journals are included if they account for 100 or more cited references, which yields a total of 289 journals and 273,893 cited references. Table 2 lists the top 50 journals ranked by number of cited references. The titles in this list account for 62% (169,238) of the cited references.
Information about journals that contribute cited references to corporate governance research provides clues about the disciplinary underpinnings of the field. Table 2 shows that the top two journals, Journal of Finance and Journal of Financial Economics, account for more than twice as many cited references as the next two titles, which are in management: the Academy of Management Journal and Strategic Management Journal. In the top 50 titles, 58% of the cited references come from the fields of accounting, economics, or finance, while only 23% come from the management literature. The conclusion from this analysis is that cited references in corporate governance research are predominantly from fields other than management. It is apparent from these results that corporate governance research is located in the domain of a number of disciplines.

Representing information flows
A main argument in this paper is that corporate governance research takes place in multiple disciplines. Interdisciplinary flows are illustrated with a Sankey diagram, a type of flow diagram that shows the links between cited references and their citing article. In a Sankey diagram, the width of the links is proportional to the flow quantity; hence, the wider the link the greater the flow. Information flows in the corporate governance literature by subject are represented visually in the Sankey diagram shown in Figure 4. Figure 4 shows the subject origin of cited references on the right side and the subject of the journals in which they appeared on the left. The diagram shows dispersion between subject areas and how knowledge moves between disciplines.
The value of the Sankey diagram is the ability to visualize the knowledge dispersion between subject areas. Corporate governance research does not take place in a vacuum; there is cross-fertilization of ideas between subjects that informs the way that the field develops and changes.
While a picture is worth a thousand words, a tabular version of the Sankey data provides additional insight into information flows. Table 3 enumerates both the gross numbers and percentage of citation dispersion.
The patterns of dispersion describe the extent to which research ideas are shared among disciplines that contribute to corporate governance research. Law is an example of an insular discipline: 87% of the cited references in law appear in law journals. At the other end of the spectrum, only 23% of the cited references to economics appear in economics journals. Finance, management, and accounting cite their own articles at rates of 43, 50, and 54% respectively. These results show that corporate governance research takes place in multiple disciplines and that researchers frequently cite articles outside of their "home" discipline in their research.
[Caption: Source articles and cited reference distribution]
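The own-discipline percentages discussed above are row shares of a subject-by-subject flow table. The Python sketch below shows one way to compute them; the flow counts are small made-up numbers, not the Table 3 data, and the function name `own_discipline_share` and the `(cited_subject, citing_subject)` key layout are assumptions made for illustration.

```python
def own_discipline_share(flows):
    """Share of each subject's cited references that appear in journals of the same subject.

    flows maps (cited_subject, citing_subject) -> number of cited references.
    """
    totals, own = {}, {}
    for (cited, citing), n in flows.items():
        totals[cited] = totals.get(cited, 0) + n
        if cited == citing:
            own[cited] = own.get(cited, 0) + n
    return {subject: own.get(subject, 0) / totals[subject] for subject in totals}

# Hypothetical flow counts (not the study's data).
flows = {
    ("law", "law"): 8,
    ("law", "finance"): 2,
    ("economics", "economics"): 1,
    ("economics", "finance"): 3,
}
shares = own_discipline_share(flows)
print(shares)  # {'law': 0.8, 'economics': 0.25}
```

A high share marks an insular discipline, while a low share marks one whose literature is mostly cited from outside; the same flow table is what a Sankey diagram draws as link widths.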

Latent Dirichlet allocation results
The LDA results show yet another way of describing research areas in corporate governance. The difference between citation analysis and LDA is the ability, using machine learning, to uncover aspects of corporate governance research that are not readily apparent using citation analysis. The advantage of LDA is the ability to extract from the corpus of articles information about topics that are not readily apparent by just reviewing the articles themselves. LDA requires researchers to select the number of topics desired in the results before analysis begins. We selected a twenty topic model for our LDA analysis, and the results provided twenty numbered topics without names. Labels for the topics were assigned based on the top twenty words generated by the LDAvis software. The topics in the table are arranged in descending order of articles allocated to that topic. The results for each topic, shown in Table 4, include a list of the top five subject areas where articles on that topic are found, a concentration ratio for the top five subject areas, a list of the top 10 journals associated with the topic, and the twenty terms generated by the LDA software.
The LDA results are interesting for a number of reasons. They suggest reasons why finance articles are heavily represented in the retrieved articles. There are eight topics (1, 4, 9, 10, 12, 14, 15 and 20) where finance journals range between 28 and 65% of the journals in that category. The topics include the "Effect of corporate governance in firm performance," "Executive compensation," "Boards of directors," and "Audit committees" and show the sweep of topics covered by finance journals. Among the disciplines represented in the LDA topic journal listings, finance appears in 17 topics with a median rank of 1; economics appears in 16 topics with a median rank of 2, and management appears in 15 topics with a median rank of 3.5.
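The topic counts and median ranks quoted above can be derived from the per-topic ranked lists of disciplines. The sketch below shows one way to compute them; the function name and the toy rankings are hypothetical and are not taken from Table 4.

```python
from statistics import median


def discipline_rank_summary(topic_rankings):
    """Return {discipline: (number of topics it appears in, median rank)}.

    topic_rankings maps a topic id to its disciplines ordered from the
    most to the least represented; rank 1 is the top position.
    """
    ranks = {}
    for ranking in topic_rankings.values():
        for position, discipline in enumerate(ranking, start=1):
            ranks.setdefault(discipline, []).append(position)
    return {d: (len(r), median(r)) for d, r in ranks.items()}


# Hypothetical rankings for four topics (illustration only).
toy_rankings = {
    1: ["finance", "economics", "management"],
    2: ["finance", "management"],
    3: ["economics", "finance", "law"],
    4: ["law", "management", "economics", "accounting"],
}
summary = discipline_rank_summary(toy_rankings)
```

A discipline that appears in many topics with a low median rank, as finance does in the reported results, is both widely present and usually the leading source for those topics.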
LDA analysis is useful for uncovering areas of emphasis in the corpus of articles that might not be discovered with regular bibliographic analysis. The subject distribution of the articles described using bibliometric analysis is straightforward. LDA goes beyond citation analysis to uncover topics within the corpus that may be overlooked with traditional methods of analysis.
There are similarities between the LDA results and the keywords used to search Web of Science. For example, agency theory and executive compensation are both topics in the LDA results. Corporate governance is reflected in two of the LDA topics, "Effect of corporate governance in firm performance [Topic 1]" and "Corporate governance theory [Topic 5]," reflecting a bifurcation between theoretical and practical approaches to corporate governance in the literature. LDA topics also include two topics related to family firms (Organizational theory in family firms [Topic 18] and Control of family firms [Topic 9]), two related to boards of directors (Boards of directors [Topic 10] and Fiduciary duty of corporate directors [Topic 19]), and two related to executive compensation (Executive compensation [Topic 15] and Executive compensation/CEO incentive pay [Topic 3]). Beyond the topics already mentioned, there are seven topics related to finance or legal issues [Topics 2, 4, 6, 7, 12, 14, 17]. The preponderance of topics related to financial and legal issues, and the heavy reliance on finance, economics and legal literature, demonstrate that corporate governance research has interdisciplinary breadth.
6. Discussion
6.1 Implications for corporate governance researchers
This research provides an empirically based illustration of the interdisciplinary nature of corporate governance research. To our knowledge, this is the first time citation analysis, LDA, and Sankey diagrams have been used jointly in exploring this interdisciplinary subject. This set of tools allowed us to reveal the intellectual genealogy of the field, as well as identify the current topics and specific directions the overarching field is taking. These tools are beneficial because of their power and scope, and because they make no assumptions about the underlying structure of the literature being examined. Not only did we identify top journals and articles in a traditional citation networking sense, but we were also able to trace content in terms of topics and disciplines back to their sources. This research goes beyond citation analysis by revealing the full intellectual structure of the governance field. Citation analysis is the traditional method for exploring academic networks. LDA enhances this exploration because it frees the analysis from the assumptions of particular sub-fields. By capturing the complete picture of the topical components of a field, LDA provides a means for investigating how different journals deal with specific topics in an interdisciplinary area. It uncovers the topics emblematic of various meta-disciplines. In our research, we have used it to connect journals with related topics. Being able to look at source journals for each topic gives researchers a sense of the topics that come primarily from specific disciplines. The Sankey diagram is another powerful tool for visually displaying the nature of interdisciplinary research areas.
Our results clearly show that the majority of corporate governance research is being published in the fields of accounting, economics, and finance. This runs counter to the prevailing wisdom that management researchers dominate the publication of corporate governance scholarship (Chen and Chang, 2010). Indeed, the results found that over 80% of governance research is published outside of the management literature and that governance research predominantly occurred in a number of related academic disciplines. This research may open opportunities for collaboration among corporate governance researchers who previously had little or no knowledge of one another. As the field of governance matures and expands into sub-disciplines, researchers will be inclined to track intellectual histories and indicate long-term sources. This research provides that illustration for future researchers.
This combined method is transferable and should be effective in understanding other interdisciplinary fields. It provides both a macro and a micro level view of the development of corporate governance research and how it is used by academics, and it demonstrates that citation analysis and LDA can be used together to explore interdisciplinary subjects.
Our aim with this research was to extend the work of Turnbull (1997), and Durisin and Puzone (2009), by taking their suggestions for expanding the assessment of corporate governance research into interdisciplinary and international arenas. This research shows how the data mining capabilities of LDA allowed the examination of terms in a full corpus of governance research articles, thereby providing granular levels of detail not accomplished in previous literature reviews and citation analyses.
This method succeeded because of collaboration with a data analytics firm in doing the data analysis. This follows the suggestion of Zupič and Čater (2015) to partner with information scientists in doing LDA analyses. It illustrates a method for exploring large data sets with LDA and for data-driven research such as that performed at analytics firms. For researchers with aptitude for data gathering and analysis, LDA software is readily available through software repositories such as the University of Massachusetts Amherst (http://mallet.cs.umass.edu/topics.php), the Stanford Artificial Intelligence Laboratory (Blei et al., 2003), and GitHub (http://www.github.com).

Implications for LIS researchers and librarians
Linking the methods of citation analysis and LDA to explore interdisciplinary fields provides exciting possibilities for library and information science researchers. While this paper used corporate governance as an example, the methods described in this paper are generalizable.
There are a number of ways that LIS researchers and librarians can use the methods from this paper to advance LIS scholarship.
Scholarly activity is becoming increasingly specialized. Alvesson and Sandberg (2014) describe this move toward specialization and lament that it leads to parochial thinking in the academy. They recommend strategies for "box-breaking" research that challenges established paradigms in specialized research areas. Zahra and Newey (2009) identify three modes of theory development that move disciplines forward that can succinctly be described as replication, extension, and transformation. They posit that "maximum impact is created when theory building at the intersection uncovers new phenomena that revise the boundaries of existing disciplines and fields while giving birth to new ones" (2009, p. 1059).
Using the methods described in this paper, LIS researchers and librarians have an opportunity to expand basic LIS research and applied research in other interdisciplinary areas. As experts in parsing disciplinary development, LIS researchers can advance LIS scholarship by exploring the diversity of interdisciplinary areas and viewing this exploration as what Alvesson and Sandberg call "box-breaking research." Exploration of interdisciplinary areas can result in synergies across and between disciplines. Specific areas where this research could be applied in LIS research and practice are detailed below.
6.2.1 Collection development and management. The ability to deconstruct interdisciplinary fields may lead to better collection development decisions. This research can be applied by individual libraries making collection management decisions. It may also be relevant for libraries in consortiums that are making collection management decisions across institutions. With budgets under pressure at many academic institutions, being able to pinpoint subject areas of interest to researchers may help stretch collections funds.
6.2.2 Support for interdisciplinary research. Academic librarians provide support for researchers across the disciplinary spectrum. That support includes collaboration with departmental faculty on grant projects (Brandenberg et al., 2017), and facilitating cross-disciplinary research (Williams et al., 2013; Taskin and Aydinoglu, 2015). This paper provides methods librarians can use to contribute to interdisciplinary research efforts.
6.2.3 Information seeking. There is a strand of literature in LIS research devoted to information seeking behavior. There may be merit in using methods described in this research as another way to explore information seeking. Much of the interdisciplinary information seeking literature is focused on the behavior of researchers at the point of need (Niu and Hemminger, 2012; Delserone and Dinkelman, 2017; Wellings and Casseldon, 2019). Deconstructing interdisciplinary fields into their component disciplines, then querying researchers about how they assembled those components, may provide new insights into the information seeking research process.
6.2.4 Emerging and established research areas. Exploration of emerging research areas can confirm the degree to which they are interdisciplinary. All of the above areas (collections, departmental/faculty support, and information seeking) are important research avenues for emerging research areas. For example, Mryglod et al. (2016) explore the evolution of scientific topics related to the Chernobyl nuclear disaster. Additionally, established research areas like gerontology may also benefit from a rigorous examination of their interdisciplinary components.

Concluding thoughts
Corporate governance is viewed by many management scholars as a niche area. This research demonstrates that corporate governance is rich in interdisciplinary diversity. Knowledge of that diversity may result in greater cross-fertilization among corporate governance researchers in different component disciplines.
The methods in this paper can be replicated in other interdisciplinary fields and they can be applied in a number of LIS research and practice areas.