Two decades of financial statement fraud detection literature review; combination of bibliometric analysis and topic modeling approach

Purpose – The emergence of machine learning has opened a new way for researchers. It allows them to supplement the traditional manual methods for conducting a literature review and turn it into a smart literature review. This study aims to present a framework for incorporating machine learning into financial statement fraud (FSF) literature analysis. This framework facilitates the analysis of a large amount of literature to show the trend of the field and identify the most productive authors, journals and potential areas for future research.
Design/methodology/approach – In this study, a framework was introduced that merges bibliometric analysis techniques such as word frequency, co-word analysis and coauthorship analysis with the Latent Dirichlet Allocation topic modeling approach. This framework was used to uncover subtopics from 20 years of financial fraud research articles. Furthermore, the hierarchical clustering method was used on selected subtopics to demonstrate the primary contexts in the literature on FSF.
Findings – This study has contributed to the literature in two ways. First, this study has determined the top journals, articles, countries and keywords based on various bibliometric metrics. Second, using topic modeling and then hierarchical clustering, this study demonstrates the four primary contexts in FSF detection.
Research limitations/implications – In this study, the authors tried to comprehensively view the studies related to financial fraud conducted over two decades. However, this research has limitations that can be an opportunity for future researchers. The first limitation is due to language bias. This study has focused on English language articles, so it is suggested that other researchers consider other languages as well. The second limitation is caused by citation bias. In this study, the authors tried to show the top articles based on the citation criteria. However, judging based on citation alone can

1. Introduction
1.1 Description of the research topic
Almost 20 years have passed since the financial crisis caused by Enron's financial fraud. Despite significant reforms such as the Sarbanes-Oxley Act and the creation of the Public Company Accounting Oversight Board, which oversees audit quality, news of managers cooking the books continues. Even the legendary investor, the Oracle of Omaha, has fallen victim to financial fraud (Kollewe, 2014). To combat this issue, experts suggest financial insurance (Economist, 2014) and the use of blockchain technology to revolutionize financial transactions (Aicpa.org, 2020). Given these examples, one may ask: what are the causes of the financial fraud cycle? How can investors protect their capital by examining the quality of financial statements? Committing financial statement fraud (FSF) is an unethical decision-making process, and detecting fraud is challenging because of its multifaceted nature. For example, 50% of fraud seen by SEC enforcement has been due to revenue recognition (e.g. cookie jar accounting), and 35% is because of concealed liabilities and expenses. The rest is caused by inadequate disclosure in footnotes (Association of Certified Fraud Examiners, 2017).
According to the Association of Certified Fraud Examiners in 2020, the three major categories of fraud are corruption, asset misappropriation and FSF. FSF schemes are the least common but the most damaging type of fraud. FSF is the "deliberate misrepresentation of the financial condition of the company accomplished through the intentional misstatement or omission of amount or disclosures of the financial statement" to deceive financial statement users. Therefore, in this study, we intend to review the FSF detection literature to provide a comprehensive picture of its development over 20 years. The applications of the present study can be summarized as follows: uncovering new trends in articles by analyzing the keywords over two decades; grouping the authors based on the similarity of the content of their articles; identifying the top countries' and journals' collaboration in the FSF detection literature; and explaining the primary contexts of the FSF literature and providing the high-impact articles in each context based on citation rate.
Therefore, the results of this study will appeal to those interested in comprehensively exploring the FSF detection literature, such as auditors, investors, financial managers and researchers.

Literature review
2.1 A review of previous research
In this article, we use a combination of two approaches, bibliometric analysis and topic modeling, to review the research literature on detecting financial fraud. Bibliometric analysis is a quantitative method that we use to analyze a large number of financial fraud studies and uncover emerging trends in FSF detection. This approach follows the guidelines of Donthu et al. (2021), and the software we use is VOSviewer. Table 1 shows the conditions required to use bibliometric analysis.

JFC
In the second approach, we use topic modeling, following the guidelines of Asmussen and Møller (2019). Topic modeling is an unsupervised natural language processing technique that extracts topics from a collection of papers. For topic modeling, we use Python; the essential Python libraries used include Pandas, Gensim and pyLDAvis. Table 2 summarizes the features of the topic modeling approach. Table 3 shows five studies using bibliometric analysis and topic modeling, organized by research field, methodology, required data and data size. Figure 1 shows interest over time based on Google Trends data [1] for the two approaches of bibliometric analysis and topic modeling over the past five years. As shown, despite the increasing trend for bibliometric analysis, interest in topic modeling has been more significant.
RQ8. What are the main contexts in FSF detection and which keywords have a higher frequency within these contexts?

Research innovation
RQ9. Which articles have the highest citation rate in each context related to FSF detection?

Methodology
The methodology of this study is illustrated in Figure 2. Essentially, the research process encompasses three phases. The initial phase, referred to as pre-processing, entails the provision of the relevant data from articles in an Excel format, followed by data cleansing. The next stage is topic modeling, where the LDA method is applied. The clustering method is then used to identify contexts associated with financial fraud detection. Finally, the post-processing stage highlights the most relevant articles for the selected contexts. The individual steps are as follows.
Figure 1. Trend of interest in bibliometric analysis and topic modelling based on Google Trends
Loading papers: Generally, the two primary databases for collecting scholarly publications are Web of Science (WoS) and Scopus. While WoS is one of the largest and most reliable databases for reviewing research literature, the journal coverage of Scopus is more comprehensive (Agarwal, 2016; Saba et al., 2020). Therefore, we have used the Scopus database to find relevant articles.
Selecting papers: Distinguishing between fraud detection and FSF detection is necessary to collect the relevant articles. To investigate FSF detection, we use the keywords: "anomaly detection financial*" OR "detect fraudulent financial reporting*" OR "detect accounting fraud*" OR "detect misstatements*" OR "financial ratio risk detection*" OR "detect earning manipulation*" for the years 2001 to 2021. The search returned 1,496 articles. Then, 124 articles that are not related to detecting financial fraud are removed. Also, considering that one of the goals of this study is to categorize research based on author keywords into corresponding contexts, we excluded the 296 articles that did not have author keywords. As a result, 1,076 articles remained for analysis.
Descriptive analysis: In this study, we have used VOSviewer and Python to implement three methods of bibliometric analysis: co-word analysis, bibliographic coupling and co-authorship analysis. Co-word analysis shows which keywords are related to each other and therefore belong to the same thematic map. Co-authorship analysis shows the interaction between scholars in detecting financial fraud, including their affiliated institutions and countries. Bibliographic coupling assumes that two publications sharing common references are likely to have a higher similarity.
Cleaning documents: In this step, we break down the keywords into individual units and convert them to lowercase. Additionally, we eliminate any non-alphabetic symbols (such as punctuation marks) and words with fewer than three characters, as they have minimal impact on the topics being analyzed. Subsequently, we remove stop words such as "use." Then, using a process called lemmatization, we reduce the keywords to their root forms. Finally, we eliminate author keywords that occur only once, as they offer limited value in performing topic modeling.
Vectorization: Following the document cleaning process, we compile a list of keywords together with the frequency at which each keyword appears in the training set. Subsequently, using the count vectorizer, we convert each keyword into a vector representation.
Setting parameters for LDA: The LDA method comes with default parameter values. In this study, we have changed the burn-in time, the number of iterations and the seed value to achieve better results. In addition, the number-of-folds parameter has been removed since all the papers are used to run the model.
Topic modeling: In this study, we used the LDA method, a probabilistic and unsupervised modeling approach. This method allows us to categorize texts within a corpus into specific topics.
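The cleaning and vectorization steps described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the study's code: the stop-word list, the toy keyword lists and the function names `clean_keywords` and `vectorize` are assumptions made for illustration, and lemmatization is omitted because the standard library provides no lemmatizer.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the study is not given.
STOP_WORDS = {"use", "using", "based", "the", "and", "for"}


def clean_keywords(raw_keywords):
    """Lowercase, split into single tokens, strip non-alphabetic symbols,
    and drop tokens shorter than three characters and stop words.
    Lemmatization is deliberately left out of this sketch."""
    tokens = []
    for phrase in raw_keywords:
        for token in re.split(r"[^a-z]+", phrase.lower()):
            if len(token) >= 3 and token not in STOP_WORDS:
                tokens.append(token)
    return tokens


def vectorize(documents):
    """Return a vocabulary and one count vector per document, after
    removing tokens that occur only once in the whole corpus."""
    cleaned = [clean_keywords(doc) for doc in documents]
    corpus_counts = Counter(tok for doc in cleaned for tok in doc)
    vocabulary = sorted(t for t, c in corpus_counts.items() if c > 1)
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vectors = []
    for doc in cleaned:
        row = [0] * len(vocabulary)
        for tok in doc:
            if tok in index:
                row[index[tok]] += 1
        vectors.append(row)
    return vocabulary, vectors


# Toy author-keyword lists, invented for illustration.
papers = [
    ["Fraud Detection", "Machine Learning"],
    ["Anomaly detection", "machine-learning", "use of AI"],
    ["Benford's Law", "fraud"],
]
vocabulary, vectors = vectorize(papers)
print(vocabulary)  # ['detection', 'fraud', 'learning', 'machine']
print(vectors)     # [[1, 1, 1, 1], [1, 0, 1, 1], [0, 1, 0, 0]]
```

In practice, a library count vectorizer would typically replace `vectorize`, and the resulting document-term matrix would be passed to the LDA model.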
It is vital at this stage to determine the optimal number of topics using the topic coherence metric. The coherence metric shows how well a topic is supported by the reference corpus (Joao Pedro, 2022). The optimal number of topics for the model can be chosen by plotting the coherence score, which ranges from 0 to 1, on the vertical axis against the number of topics on the horizontal axis. The highest point on the graph indicates the ideal number of topics (Yadav, 2022).
LDA model outputs: The results produced by LDA are the LDA components, which show the significance of each keyword in relation to the various topics. The LDA matrix shows the significance of articles to each topic based on the keyword list used in the research papers. To clarify the outcome, an intertopic distance map is used, which visualizes the value of keywords in different topics based on their frequency of occurrence (Yadav, 2022).
Hierarchical clustering: We use hierarchical clustering to group similar topics into a single cluster. Clusters featuring similar topics create a context that is linked to articles. Afterwards, by introducing a new column, we record the cluster to which each article belongs.
Selecting the relevant topics: Once the membership of articles within each cluster is established, it becomes imperative to assign labels to each cluster. These labels signify the context of articles that share similar themes. The determination of the labels is based on the occurrence and recognition of significant keywords. Hence, we use the word cloud visualization approach, which displays the frequency of keywords in each context.
Validating the results: The exploratory search ends with labeling the contexts, and it is necessary to check the validity of the results. Validity shows the extent to which the contexts' labeling reflects the collection of keywords in each context.
The validity of the results in this study is investigated through semantic validation, in which experts review the keywords in the word cloud to see if the naming of the contexts based on the observed keywords is justified (Asmussen and Møller, 2019). For this purpose, we ask three finance and financial fraud experts to express their agreement with our labeling. We also compute a validity percentage to check the overall level of agreement or disagreement (Neuman, 2013). A validity percentage higher than 60% indicates proper validity for the labeling of the contexts.
Introducing articles with the highest citation rate: In the last step, the top articles in each context are shown based on their citation rates. The citation rate of each article is calculated by dividing the total citations by the number of years that have passed since its publication. This analysis helps us to introduce the most important research topics within the field of study.
Figure 5 shows the collaboration between the top 20 countries in publishing scholarly papers on FSF detection based on authors' affiliation information. The size of the nodes represents the number of published articles and the width of the edges represents the cooperation between countries. The USA, the UK and
Figure 8 demonstrates the network structure of the keywords based on co-occurrence and shows how often each keyword is associated with other keywords. For this purpose, we set the minimum number of co-occurrences to 28, based on which the 50 keywords with the highest co-occurrence are shown in Figure 8. Different colors of the nodes represent the initial date of their publication.
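The co-occurrence counting behind a keyword network such as the one in Figure 8 can be sketched as follows. This is a simplified illustration, not VOSviewer's algorithm: the function names, the toy articles and the threshold of 2 are assumptions, and VOSviewer's own thresholding and normalization may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count, for every unordered keyword pair, the number of articles
    in which both keywords appear together."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.lower() for k in keywords})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


def filter_pairs(pair_counts, min_cooccurrence):
    """Keep only the pairs that meet the minimum co-occurrence threshold."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_cooccurrence}


# Toy author-keyword lists, invented for illustration.
articles = [
    ["fraud detection", "machine learning", "data mining"],
    ["fraud detection", "machine learning"],
    ["fraud detection", "anomaly detection"],
]
counts = cooccurrence_counts(articles)
strong_links = filter_pairs(counts, min_cooccurrence=2)
print(strong_links)  # {('fraud detection', 'machine learning'): 2}
```

Each surviving pair corresponds to an edge in the network, and its count can serve as the edge weight.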

Analysis of publications
The results in Figure 8 show that keywords such as big data, machine learning and decision tree are among the keywords that have recently attracted the attention of researchers. In contrast, keywords such as computer crime, administrative data processing and financial data processing have been used since 2012.
4.1.7 Keyword trend (RQ7). Table 4 shows the top 20 keywords in FSF detection. It shows that keywords such as anomaly detection, fraud detection, machine learning and data mining have the highest frequency in FSF detection.
Along the same lines, Figure 9 shows the trend of six keywords based on their frequency of usage from 2001 to 2021. As can be seen, the keywords anomaly detection, machine learning and fraud detection have an upward trend, indicating the increasing attention they have received from researchers over the years.
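A yearly keyword trend of the kind plotted in Figure 9 can be computed by counting, per year, the articles that list a given keyword. The sketch below assumes each record is a (year, author keywords) pair; the records and the function name `keyword_trend` are invented for illustration.

```python
from collections import Counter


def keyword_trend(records, tracked):
    """Return {keyword: Counter({year: number_of_articles})} for the
    tracked keywords, matching keywords case-insensitively."""
    trend = {kw: Counter() for kw in tracked}
    for year, keywords in records:
        present = {k.lower() for k in keywords}
        for kw in tracked:
            if kw in present:
                trend[kw][year] += 1
    return trend


# Toy records, invented for illustration.
records = [
    (2019, ["Machine learning", "Fraud detection"]),
    (2020, ["Machine learning"]),
    (2020, ["machine learning", "Anomaly detection"]),
    (2021, ["Anomaly detection"]),
]
trend = keyword_trend(records, tracked=["machine learning", "anomaly detection"])
for keyword, by_year in trend.items():
    print(keyword, sorted(by_year.items()))
```

Plotting each keyword's counts against the year gives one line per keyword.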

Topic modeling using Latent Dirichlet Allocation
4.2.1 Coherence scores. The coherence score is used to determine the optimal number of topics in a reference corpus and was calculated for 100 possible topics. The score reached its maximum at 0.65, indicating that 42 topics are optimal. Figure 10 represents these 42 topics in a two-dimensional graph, with the intertopic distance map used to evaluate the content of each topic based on its keyword values.
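Once a coherence score is available for each candidate number of topics, choosing the optimum reduces to picking the maximum of that curve. The sketch below assumes the scores have already been computed (for example with Gensim's `CoherenceModel`, which is not shown here); apart from the reported peak of 0.65 at 42 topics, the values are invented, and the tie-breaking rule is an assumption.

```python
def best_topic_count(coherence_by_k):
    """Return (k, score) for the highest coherence score.
    Ties are resolved in favour of the smaller number of topics
    (an assumption; the study does not state a tie-breaking rule)."""
    return max(coherence_by_k.items(), key=lambda item: (item[1], -item[0]))


# Illustrative scores; only the 0.65 peak at k = 42 echoes the text.
scores = {10: 0.48, 20: 0.55, 42: 0.65, 60: 0.61, 100: 0.52}
k, score = best_topic_count(scores)
print(k, score)  # 42 0.65
```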
4.2.2 Hierarchical clustering. At this stage, hierarchical clustering is used to reveal the primary contexts related to identifying financial fraud by grouping similar topics. In this method, each article is initially considered a cluster, and at each stage, articles closer to each other form a larger cluster. To find the optimal number of clusters, we have used dendrogram analysis, a common tool in hierarchical clustering. The optimal number of clusters is found by locating the largest vertical distance between successive merges in the dendrogram, drawing a horizontal line through it and counting the branches that the line crosses. Accordingly, the optimal number of clusters is four, and each article belongs to a specific cluster from one to four. Also, because the four clusters have been formed by merging similar topics, it can be concluded that the articles in each of the four clusters refer to different contexts of FSF detection.
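The dendrogram cut described above can be sketched with a small agglomerative clustering routine in plain Python. Several choices here are assumptions rather than the study's settings: single linkage, Euclidean distance, the toy two-dimensional vectors standing in for article-topic weights, and the function names. The sketch needs at least three points.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two closest clusters. Returns a list of
    (merge_distance, clusters_after_merge) tuples."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append((d, [sorted(c) for c in clusters]))
    return history


def cut_at_largest_gap(points):
    """Cut the dendrogram where two successive merge heights differ most."""
    history = single_linkage(points)
    heights = [d for d, _ in history]
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = gaps.index(max(gaps))  # keep merges 0..k, undo all later merges
    return sorted(history[k][1])


# Toy vectors forming four tight groups (invented for illustration).
points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (0, 11), (10, 10), (10, 11)]
contexts = cut_at_largest_gap(points)
print(contexts)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

For real data, a library implementation such as SciPy's hierarchical clustering would normally be preferred; the loop above is only meant to make the merge-and-cut logic explicit.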
The four contexts cover 10%, 80%, 6% and 4.6% of all published articles, respectively. Therefore, contexts two and four have the highest and the lowest number of articles, respectively. In context one, the journals Lecture Notes in Computer Science, Procedia Computer Science and ACM International Conference Proceeding Series have the highest number of published articles. Moreover, in context two, the journals Lecture Notes in Computer Science, ACM International Conference Proceeding Series, Audit and Journal of Financial Crime have the highest number of published articles. In context three, IEEE Access, Lecture Notes in Computer Science, Journal of Critical Reviews and Studies in Computational Intelligence have the highest number of published articles. Finally, in context four, the Managerial Auditing Journal, Issues in Accounting Education and the Journal of Financial Crime have the highest number of published articles. Therefore, Lecture Notes

Select relevant topics (RQ8)
This section analyzes the content of the various topics based on the frequency of keywords. Out of 3,406 unique keywords, the most repeated keywords in the field of FSF detection from 2001 to 2021 include anomaly detection, fraud, data mining, deep learning audit, clustering audit, Benford law, outlier detection and machine learning. These keywords are shown in the word cloud graph in Figure 11.
4.3.1 Selected relevant labels for context one. In this part, we aim to select relevant labels based on the frequency of keywords in each context. Figure 12 shows the keywords leading to the creation of context one.
The two FSF detection methods are human detection and machine detection. The results in Figure 12 show that, by applying topic modeling, the keywords related to machine detection are more similar to each other and are therefore grouped in context one. Human detection, such as a whistleblowing system, can be a useful tool in detecting FSF, as it can help to bring attention to fraudulent activity that might otherwise go undetected. However, human detection is not the only method of financial fraud detection, and it is not always reliable or effective on its own. Financial fraud detection often involves combining different techniques, such as data analysis and machine learning algorithms.
The first group of keywords consists of feature selection, principal component analysis, feature extraction and dimensionality reduction. Generally, the issues related to FSF detection include the study of extensive financial data where the identification of financial variables and financial ratios is necessary. Then, by applying data mining techniques, organizations are classified into two categories: fraudulent and non-fraudulent. However, if the data set includes many irrelevant and correlated features, the curse of dimensionality will appear, reducing the classification's performance. Therefore, it is necessary to reduce the number of irrelevant features in the data set by using dimensionality reduction techniques such as feature selection and principal component analysis (see, for example, Gupta and Mehta, 2020).
The second group of keywords consists of text mining, neural network, classification, one class classification, artificial intelligence, time series analysis, graph mining, visual analytics, random forest, regression, unsupervised learning, decision tree, k-means, fuzzy logic, supervised learning, time series prediction and correlation. Generally, there are two learning approaches in artificial intelligence and machine learning: supervised and unsupervised. In supervised learning, data sets have labels, and the methods include classification algorithms (e.g. support vector machine, decision tree and random forest) and regression algorithms (e.g. linear regression and logistic regression). The unsupervised learning approach analyzes unlabeled data sets and includes methods such as clustering and association (see, for example, Ashtiani and Raahemi, 2021). According to these explanations, it can be argued that the keywords of the first context are related to the title of "fraud detection techniques" for cluster one.
4.3.2 Selected relevant label for context two. Figure 13 shows the frequency of keywords leading to the creation of context two.
Cluster two includes keywords such as fraudulent financial reporting, FSF, earnings management, corporate governance, cooking the books, fraud prevention, fraud triangle, fraud diamond and the pentagon model. The bankruptcy of companies such as Enron and WorldCom increased attention to the quality of financial reports, and researchers began to investigate the causal factors associated with the increased probability of fraud and the consequences of financial fraud (see, for example, Rezaee, 2005). Other studies have tried
Figure 14 shows the keywords leading to the creation of context three based on the frequency of keywords.
Cluster three includes the keywords digital forensics, network security, wireless sensor network, cloud computing, data privacy, malware, DDoS, information security, cyber security, online transaction and website defacement attack. These keywords are related to transactions and frauds carried out through computers and the internet, the disclosure of which leads to the loss of the company's intellectual property and secrets and to major financial damage. Articles in this area can include a variety of topics related to digital forensics. Cyber security measures can be an important tool in preventing and detecting fraud in financial statements. For example, implementing strong access controls, monitoring financial transactions and maintaining accurate audit logs can help identify manipulation of financial data. Also, digital forensics can be used to analyze the digital evidence that has been collected and to find and recover hidden information. For example, some studies focus on methods for detecting data manipulation, which means the unauthorized modification of a system (e.g. data leakage malware, salami technique) to disrupt the normal function of the targeted system (see, for example, Nicholls et al., 2021). Other study objectives can be identifying factors related to preventing online fraud and data security, such as security auditing and data classification (see, for example, Soomro et al., 2021). Based on the provided explanations, we select the title "computer and online transaction fraud" for cluster three.
Figure 15 shows the keywords leading to the formation of context four based on the frequency criteria.
Context four includes the keywords audit risk plan, audit difference, audit analytics, audit procedures, audit software, audit planning, audit risk, audit standards, auditor experience, auditor liability, audit adjustments, audit effort, external audit, auditor independence, audit sampling, audit evidence and audit committee effectiveness. In general, studies associated with auditors' fraud-related responsibilities can be divided into two groups: internal audits and external audits. Internal auditors are better positioned to detect financial fraud due to their proximity to and understanding of the company (Association of Certified Fraud Examiners, 2017). Therefore, the first group of studies has addressed the role of internal auditors in risk management (see, for example, Tamimi, 2021). In this regard, another group of studies has examined the function of internal auditors under the influence of mediating factors such as corporate governance and gender diversity (see, for example, Pazarskis et al., 2021). The second group of studies is related to external auditors. For instance, some studies have investigated the effect of external auditor quality on the possibility of identifying financial fraud (see, for example, Qawqzeh et al., 2021). Other studies have investigated new trends in auditing financial statements (see, for example, Lim, 2021).
Figure 14. Selecting the title of "computer and online transaction fraud" for context three
Figure 15. Selecting the title of "auditors' fraud-related responsibilities" for context four

Validity test of the labeling for the contexts
At this stage, it is necessary to examine the validity of the labeling for the four contexts. To this end, we provided the keywords of each topic to the financial experts and asked them to express their agreement with the keywords related to each context. If the calculated validity percentage is more than 60%, the created context has sufficient validity (Neuman, 2013). Table 5 shows the level of agreement with the labeling, based on the keywords of each context, expressed by three financial and financial fraud experts.
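As a rough illustration of the 60% rule, the sketch below assumes that the validity percentage is the share of expert judgments that agree with a label, that is, agreements divided by all judgments, times 100. This definition is an assumption, and the expert votes are invented.

```python
def validity_percentage(judgments):
    """Share of expert judgments that agree with a label, in percent.
    Assumed definition: agreements / (agreements + disagreements) * 100."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    return 100.0 * sum(judgments) / len(judgments)


# Invented votes of three experts (True means the expert agrees).
votes = {
    "fraud detection techniques": [True, True, True],
    "computer and online transaction fraud": [True, True, False],
}
results = {label: validity_percentage(v) for label, v in votes.items()}
valid = {label: pct > 60 for label, pct in results.items()}
for label, pct in results.items():
    print(f"{label}: {pct:.1f}% valid={valid[label]}")
```

With three experts, a label passes the 60% threshold under this definition when at least two of them agree.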

Introducing top articles for main contexts in financial statement fraud detection (RQ9)
In this section, the top articles of each context are introduced in Table 6. For this purpose, the citation rate index is used, and the articles are sorted based on this index. Articles with context number one are related to fraud detection techniques. Articles with context number two are related to the causes and deterrence of FSF. Articles with context number three are related to computer and online transaction fraud. Articles with context number four are related to auditors' fraud-related responsibilities.
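The citation rate used for this ranking, total citations divided by the number of years since publication, can be sketched as follows. The article records, the reference year and the function names are invented for illustration; treating an article published in the reference year as one year old is an assumption made to avoid division by zero.

```python
from collections import defaultdict


def citation_rate(citations, publication_year, reference_year):
    """Total citations divided by the years elapsed since publication.
    Assumption: fewer than one elapsed year is counted as one year."""
    years_elapsed = max(reference_year - publication_year, 1)
    return citations / years_elapsed


def top_by_context(articles, reference_year, n=1):
    """Return the titles of the n articles with the highest citation
    rate in each context."""
    grouped = defaultdict(list)
    for article in articles:
        grouped[article["context"]].append(article)
    top = {}
    for context, items in grouped.items():
        ranked = sorted(
            items,
            key=lambda a: citation_rate(a["citations"], a["year"], reference_year),
            reverse=True,
        )
        top[context] = [a["title"] for a in ranked[:n]]
    return top


# Invented records for illustration.
articles = [
    {"title": "A", "context": 1, "year": 2011, "citations": 200},
    {"title": "B", "context": 1, "year": 2019, "citations": 60},
    {"title": "C", "context": 2, "year": 2005, "citations": 320},
    {"title": "D", "context": 2, "year": 2021, "citations": 5},
]
top = top_by_context(articles, reference_year=2021, n=1)
print(top)  # {1: ['B'], 2: ['C']}
```

Article B outranks article A despite having fewer total citations, which is the effect the per-year normalization is meant to capture.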

Conclusion
This study aims to review the research literature concerning FSF detection from 2001 to 2021. Accordingly, 1,496 articles were extracted from the Scopus database. After screening the articles, we selected 1,076 papers for analysis. To analyze the literature on FSF detection, we first used the bibliometric approach and revealed that the number of articles published during the past two decades has an upward trend. We also indicated the top 20 countries with the highest number of articles published. The results showed that the USA and China are the leading countries in content production in FSF detection. Also, coauthors from the USA and China cooperate more with other countries. Then, we reviewed the journals and identified the top journals with more than 10 articles during the past two decades. Our results also revealed that the journals Lecture Notes in Computer Science, ACM International Conference Proceeding Series, IEEE Access and Journal of Financial Crime have an increasing trend in publishing content on FSF detection. Then, we analyzed the keywords and showed that keywords such as decision trees, machine learning and big data have recently attracted researchers' attention. Then, we introduced the 20 most frequent keywords in the literature on FSF detection. The analysis of the trends of keywords such as anomaly detection, machine learning and fraud detection showed a growing trend in using these keywords in recent years. Next, using the LDA modeling method, 42 related topics were identified. Finally, we identified the contexts by applying hierarchical clustering. The examination of the four clusters revealed that the Journal of Financial Crime and Lecture Notes in Computer Science include more diverse topics because of their presence in most contexts.
Then, using word cloud, we displayed the keywords of each context and identified the four labels of fraud detection techniques, fraud prevention and deterrence, computer and online transaction fraud and auditors' fraud-related responsibilities based on the analysis. Finally, we have introduced the top articles in each context label based on citation rate.

Limitation of the study
In this study, we tried to comprehensively view the studies related to financial fraud conducted over two decades. However, this research has limitations that can be an opportunity for future researchers. The first limitation is due to language bias. We have focused on English language articles, so it is suggested that other researchers consider other languages as well. The second limitation is caused by citation bias. In this study, we tried to show the top articles based on the citation criteria. However, judging based on citation alone can be misleading. Therefore, we suggest that the researchers consider other measures to check the citation quality and assess the studies' precision by applying meta-analysis.