Interca: an R library implementing “automatic” interpretation of results of multiple correspondence analysis (MCA)

Purpose – The purpose of this paper is to develop a software library in the R programming language that implements the concepts of the interpretive coordinate, interpretive axis and interpretive plane. This allows for the automatic and reliable interpretation of results from the multiple correspondence analysis (MCA) as previously proposed and published. Consequently, the users can seamlessly apply these concepts to their data, both via R commands and a corresponding graphical interface.

Design/methodology/approach – Within the context of this study, and through extensive literature review, the advantages of developing software using the Shiny library were examined. This library allows for the development of full-stack applications for R users without the need for knowledge of the corresponding technologies required for the development of complex applications. Additionally, the structural components of a Shiny application were presented, leading ultimately to the proposed software application.

Findings – Software utilizing the Shiny library enables nonexpert developers to rapidly develop specialized applications, either to present or to assist in the understanding of objects or concepts that are scientifically intriguing and complex. Specifically, with this proposed application, the users can promptly and effectively apply the scientific concepts addressed in this study to their data. Additionally, they can dynamically generate charts and reports that are readily available for download and sharing.

Research limitations/implications – The proposed package is an implementation of the fundamental concepts of the exploratory MCA method. In the next step, discoveries from the geometric data analysis will be added as features to provide more comprehensive information to the users.

Practical implications – The practical implications of this work include the dissemination of the method's use to a broader audience. Additionally, the decision to implement it with open-source code will result in the integration of the package's functions by other third-party user packages.


Introduction
Reducing the dimension of data is an often-performed task in machine learning [1][2][3] that produces better results when creating the predictive models [4]. In addition, dimensionality reduction is used in exploratory analysis to explain interdependencies and trends between variables. The multiple correspondence analysis (MCA) method is a dimensionality reduction method [5] that is applied to categorical data. The method can be used to interpret [6] the direction and trend of the data dispersion in addition to creating a new coordinate system in which the original objects are projected. The MCA method has been used and is suitable for exploration and visualization in fields of social sciences, as presented in publications such as [7]. Additionally, it is particularly popular in questionnaire analysis in scientific areas such as marketing, human resource management, medicine (e.g. analysis of psychometric tests) and more.

Principal component analysis (PCA) is often considered a similar method suited to quantitative variables, and in PCA reading the factorial diagrams provides a clear interpretation. In MCA, by contrast, users must validate the factorial diagrams using additional indicators such as contribution (CTR) [8] and projection quality (COR). In fact, J.P. Benzecri, a distinguished scientist of the MCA method, stated that the contribution is the only thing a user needs to review in order to interpret the results of the analysis [9]. Additionally, we must highlight the significant research in Ref. [10] and Ref. [11] regarding enhancements that either aid in the visualization or ensure the stability of the method's results. The issue of the correct interpretation of the results of the MCA has been studied and addressed in a number of scientific publications [11][12][13][14][15][16].
The interpretive diagrams method, introduced in 2022, enhances MCA results interpretation by providing a straightforward approach to determine the importance of points. This method posits that points further away from the origin of the interpretive axes are more crucial for the factorial diagrams. It introduces a geometric locus for each point on interpretive planes, indicating its importance based on its distance from the axes' origin. Thus, the farther a point's square lies from the origin, the greater the importance of that point.

Utilizing R for this method's application is effective due to its open-source nature, extensive support community and ability to expand via packages, facilitating comprehensive tool development. The development of a tool as an R package, including a GUI application via the Shiny library [17], addresses the challenge of applying and accurately interpreting MCA results for users unfamiliar with the intricacies of statistical methods. This approach enhances accessibility, allowing users to interact with the statistical methods through an intuitive interface without deep programming knowledge. The Shiny-based GUI application makes the package more accessible and extends its use to a wider audience, supporting the dissemination of new methods without discouraging their use due to complexity. The Shiny library eliminates the need for extensive knowledge in full-stack software development. Furthermore, it is noteworthy that its capability to accelerate software development [18] has rendered it exceptionally useful for teaching complex concepts in various scientific disciplines, including statistics, mathematics and economics [19,20].

This paper presents R functions designed to implement interpretive coordinates, axes and planes, marking the first software or programming language application of interpretive charts [21]. The package's uniqueness is further enhanced by a web application, enabling users to upload data, generate interpretive diagrams, download these diagrams and automatically generate reports on interpretive axes and planes. Inspired by a publication on interpretive diagrams and the aim to simplify their application, this tool leverages open-source code to ensure accessibility and facilitate further development by the scientific community. This effort underscores the importance of making sophisticated methods like MCA more accessible and reproducible, enhancing their utility and adoption in research.

The paper is organized as follows: Section 2 covers the discussion and literature review, focusing on R language implementations for interpreting MCA results. Section 3 outlines the methodology, including the concepts of interpretive coordinate, axis and plane, and discusses the Shiny package's core operation. Section 4 introduces the interCa package in the application section. The conclusion in Section 5 summarizes the study and explores the package's practical and theoretical implications.

Discussion and literature review
At this point, it is crucial to examine how the literature has addressed the reliable interpretation of MCA results. Various researchers have studied the issue, and several R packages have been developed to tackle it. Alberti's package, "CAinterprTools" [13], emphasizes the inclusion of supplementary visual information in the created diagrams to aid in the interpretation of the results. However, the users are still required to perform additional numerical calculations for a comprehensive interpretation. Similarly, with the "FactoMineR" [22] package, the users are able to create traditional symmetrical factorial plots, but successful interpretation still necessitates further manipulation and calculations. The "Factoshiny" [23] package leverages the "Shiny" library and incorporates graphical applications that allow users to perform MCA and generate classic biplot diagrams. However, to interpret the results, users must consult the numerical outcomes of the contribution (CTR) and quality of representation (COR) indices to draw reliable conclusions.

The contribution of the GDAtools package, which is based on concepts contained in Ref. [16], must also be emphasized. It offers functions aimed at facilitating the interpretation of the MCA's results. For example, the functions dimcontrib() and planecontrib() return a table with the contributions of points to a specific axis or a specific factorial plane, respectively. Additionally, we note the function ggcloud_variables(), which prints the cloud of point-categories of variables, with the contribution of each point conveyed through the size of the symbol representing it. The 'ca' package [24] provides the capability to create biplot diagrams, wherein each point's coordinate is transformed into the product of its standard coordinate and its mass. As noted in Ref. [21], this is sufficient for the successful interpretation of a single principal axis, but for the interpretation of a plane, additional calculations are necessary to compare the importance of the points. The research by Ref. [21] proposes the creation of new axes, referred to as interpretive axes. Each point on these axes is assigned its pure inertia as a coordinate, taking into account the sign of the point on the factorial axis. In this way, the new axis retains the critical information necessary for proper interpretation.

Additionally, the development and use of applications with the "Shiny" library has been associated with better education and understanding of various subjects. For example, in Ref. [25], the "Shiny" library was used for educating students on confidence intervals, while the study in Ref. [26] mentions that applications were created in "Shiny" to accompany students' education and increase interactivity. In the same direction, the software "Medical and Pharmaceutical Statistics" (MEPHAS) [27], which is based on Shiny, should also be mentioned. In this software, the user answers a number of questions (like a quiz) related to the analysis they want to conduct and then, after being presented with the recommended method for the analysis, can carry it out directly through the corresponding application. It is also worth mentioning the Radiant package [28], which includes a Shiny application that allows users to perform many statistical analyses and machine learning tasks through a graphical interface. The "dplyrAssist" library [29] and the "ggplotAssist" application [30] both employ the "Shiny" library. The former provides a graphical interface for executing numerous functions of the popular data manipulation library "dplyr", whereas the latter facilitates the construction of plots via a graphical interface, using the popular data visualization library "ggplot2". As highlighted in Ref. [31], a commonality between these applications is their ability to eliminate the necessity for programming knowledge or serve as a gentle introduction to R programming.

Methodology
This section outlines methodological aspects related to the theory of interpretive coordinates, interpretive axes and interpretive planes. It also examines the fundamental operations of the "Shiny" library. For a comprehensive mathematical analysis and proofs of the information provided, readers are encouraged to consult the original publication on interpretive diagrams [21].

Definition of interpretive coordinate
The interpretive coordinate of a point $j$ on an interpretive axis $\alpha$ is given by the formula $e_\alpha(j) = \operatorname{sign}(F_\alpha(j)) \cdot \lambda_\alpha \cdot \mathrm{CTR}_\alpha(j)$ [21], where $\operatorname{sign}(F_\alpha(j))$ is the sign of the coordinate of point $j$ on the corresponding factorial axis, $\lambda_\alpha$ is the inertia of the corresponding factorial axis, $\mathrm{CTR}_\alpha(j)$ is the value of the contribution index of the point on the corresponding factorial axis and, finally, the product $\lambda_\alpha \cdot \mathrm{CTR}_\alpha(j)$ is the pure inertia of the point on the factorial axis. In this way, the critical information necessary for proper interpretation is retained.
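As a minimal sketch of this definition, the formula can be written as a one-line R function. The function name `interpretive_coord` and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the interCa package. The contributions are assumed to be expressed as proportions that sum to 1 on the axis.

```r
# Interpretive coordinate: e_a(j) = sign(F_a(j)) * lambda_a * CTR_a(j)
# (hypothetical helper, not part of the interCa package)
interpretive_coord <- function(coord, lambda, ctr) {
  stopifnot(length(coord) == length(ctr), lambda >= 0)
  sign(coord) * lambda * ctr
}

# Hypothetical values for four categories on one factorial axis
coord  <- c(A1 = -0.90, A2 = 0.15, B1 = 0.60, B2 = -0.05)  # factorial coordinates F_a(j)
ctr    <- c(A1 = 0.55, A2 = 0.05, B1 = 0.38, B2 = 0.02)    # contributions (sum to 1)
lambda <- 0.40                                              # inertia of the axis

e <- interpretive_coord(coord, lambda, ctr)
print(e)
```

Because the contributions sum to 1, the absolute interpretive coordinates add up to the inertia of the axis, which is one way to check such a computation.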

Definition of interpretive axis
The interpretive axis is formed by the interpretive coordinates of points located on the corresponding factorial axis. The farther a point is from the origin of the interpretive axis, the more important its role in shaping the corresponding factorial axis.
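To make the construction concrete, the sketch below builds an interpretive axis from scratch in base R. It assumes that MCA is carried out as a correspondence analysis of the indicator matrix through a singular value decomposition; the interCa package may rely on a different implementation, and the toy data and all object names are hypothetical.

```r
# Toy categorical data set (hypothetical, for illustration only)
toy <- data.frame(
  q1 = c("a", "a", "b", "b", "c", "c", "a", "b"),
  q2 = c("x", "y", "x", "y", "x", "y", "y", "x"),
  q3 = c("lo", "lo", "hi", "hi", "hi", "lo", "hi", "lo"),
  stringsAsFactors = TRUE
)

# Indicator matrix Z: one 0/1 column per category of each variable
Z <- do.call(cbind, lapply(names(toy), function(v) {
  lev <- levels(toy[[v]])
  m <- outer(as.character(toy[[v]]), lev, "==") + 0
  colnames(m) <- paste(v, lev, sep = "_")
  m
}))

# Correspondence analysis of Z via SVD of the standardized residuals
P     <- Z / sum(Z)
rmass <- rowSums(P)   # row masses
cmass <- colSums(P)   # category (column) masses
S     <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
dec   <- svd(S)

k      <- 2                                   # number of axes kept
lambda <- dec$d[1:k]^2                        # inertia of each factorial axis
Fcoord <- (dec$v[, 1:k, drop = FALSE] %*% diag(dec$d[1:k], nrow = k)) / sqrt(cmass)
rownames(Fcoord) <- colnames(Z)               # principal coordinates of the categories

# Contribution of category j to axis a: mass_j * F_a(j)^2 / lambda_a
ctr <- sweep(cmass * Fcoord^2, 2, lambda, "/")

# Interpretive coordinates: sign(F) * lambda * CTR
e <- sign(Fcoord) * sweep(ctr, 2, lambda, "*")

# Interpretive axis 1: categories ordered along the axis
axis1 <- sort(e[, 1])
print(round(axis1, 4))
```

Categories at either end of `axis1` are those that shape the first factorial axis the most, while categories near zero play a minor role.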

Definition of interpretive plane
The interpretive plane is created by two interpretive axes. Each point on the plane is located in a geometric locus of points that is formed by the relationship $|e_1(j)| + |e_2(j)| = c$, where $|e_1(j)|$ is the absolute value of the interpretive coordinate of a point $j$ on the first axis, $|e_2(j)|$ is the absolute value of the interpretive coordinate of the point $j$ on the second axis and $c$ is a constant. In the interpretive plane, it is established that distinct points situated on the perimeter of the same square possess equivalent interpretive importance for the associated factorial plane. Furthermore, among two points situated in different squares, the point of greater interpretive importance is the one located in the more distant square.
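The ranking implied by this locus can be sketched in a few lines of R. The interpretive coordinates below are hypothetical values, and `draw_plane` is an illustrative helper rather than a function of the interCa package. The locus $|x| + |y| = c$ is drawn as a square with its vertices on the axes.

```r
# Hypothetical interpretive coordinates of four categories on two axes
e1 <- c(A1 = -0.220, A2 = 0.020, B1 = 0.152, B2 = -0.008)
e2 <- c(A1 = 0.030, A2 = -0.150, B1 = 0.060, B2 = 0.010)

# Importance of each point for the plane: the constant c of its square
importance <- abs(e1) + abs(e2)
ranking <- sort(importance, decreasing = TRUE)
print(ranking)

# Illustrative plot: each point with the square |x| + |y| = c it lies on
draw_plane <- function(e1, e2) {
  lim <- max(abs(e1) + abs(e2)) * 1.1
  plot(e1, e2, xlim = c(-lim, lim), ylim = c(-lim, lim), asp = 1, pch = 19,
       xlab = "Interpretive axis 1", ylab = "Interpretive axis 2")
  abline(h = 0, v = 0, lty = 3)
  for (cc in abs(e1) + abs(e2)) {
    polygon(c(cc, 0, -cc, 0), c(0, cc, 0, -cc), border = "grey50")
  }
  text(e1, e2, labels = names(e1), pos = 3)
}

if (interactive()) draw_plane(e1, e2)
```

In this example A1 and B1 sit on the outermost squares, so they would be read as the most important points for the plane even though they lie along different directions.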

The Shiny library
The Shiny library provides a versatile framework for building web applications using R [32][33][34]. It enables users, particularly those with limited software development experience, to develop interactive web applications by leveraging R library functions. Shiny applications can showcase R functions, teach scientific methods or create professional software [33]. The framework handles the integration of web technologies like HTML, CSS, JavaScript and reactive programming, streamlining the development process. Typically, a Shiny app includes two essential files: ui.R for the user interface and server.R for backend logic.
Examples for using Shiny library functions are available at https://shiny.posit.co/r/gallery/, along with detailed manuals to help understand object, data and event management in Shiny applications.
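The two-part structure described above can be illustrated with a minimal sketch. For brevity, the user interface and the server logic are kept in a single script instead of separate ui.R and server.R files. The input and output identifiers (`n_axes`, `msg`) and the labels are hypothetical and unrelated to the interShiny application.

```r
library(shiny)

# User interface: a sidebar with one input and a main panel with one output
ui <- fluidPage(
  titlePanel("Minimal Shiny sketch"),
  sidebarLayout(
    sidebarPanel(
      numericInput("n_axes", "Number of axes", value = 2, min = 1, max = 10)
    ),
    mainPanel(textOutput("msg"))
  )
)

# Server logic: the output is re-evaluated whenever the input changes
server <- function(input, output, session) {
  output$msg <- renderText(paste("Axes requested:", input$n_axes))
}

app <- shinyApp(ui, server)

# Launch only in an interactive session
if (interactive()) runApp(app)
```

The reactive link between `input$n_axes` and `output$msg` is handled by the framework, so no explicit event handling code is needed.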

Application

4.1 The library InterCa
The InterCa library is detailed in this section, highlighting the interCa package and its functions. The package's full source code is available on GitHub at https://github.com/theintercapackage/interCa. Additionally, the corresponding Shiny application (interShiny) is accessible online at https://interca.shinyapps.io/interCa/.
The initial release of the interCa package includes four functions with documentation. The functions interca(), plot1d() and plot2d() come with examples in their documentation, executable using an auxiliary dataset from the 'ca' package [24]. Also note that in the initial online version only the two last data sources are active. The auxiliary dataset is based on real-world data from the annual survey by the International Social Survey Programme (ISSP), founded in 1984. The ISSP is a worldwide initiative for social science research, collecting data from individuals in various countries on science-related topics. It involves member organizations from multiple countries responsible for local data collection. The ISSP aims to facilitate international comparisons, focuses on annual themes like social inequality and health, makes data accessible globally for research and improves survey methods.
The interca() function, the foundational function of the package, accepts two arguments. The first is the dataset to be analyzed, which must be in the form of a data table with each column representing a variable. The second argument specifies the number of axes to retain after executing the method. The function employs MCA to generate a list of results, which encompasses the coordinates of the points on the selected axes, the interpretive coordinates, the contribution indices and the quality indices of the points' display. Additionally, the output includes a scree plot for users interested in understanding the variance proportion each factorial axis explains.

The plot1d() function is tasked with creating an interpretive axis. It requires two inputs: the first is the result of the interca() function and the second specifies which interpretive axis to generate. This function then constructs and presents the designated interpretive axis.

For creating an interpretive plane, the plot2d() function is employed, taking three arguments. The first input is a result from the interca() function; the second and third arguments determine the x-axis and y-axis, respectively, of the interpretive plane. Executing plot2d() visualizes the specified interpretive plane.

The interShiny() function launches the package's accompanying application, enabling users to graphically generate charts, tables and reports that explore the concepts of interpretive coordinates, axes and planes. This function does not require any arguments. Guidance for launching the application is detailed in the README.md on the package's GitHub page. The library also incorporates auxiliary functions that support the "Shiny" application. The following paragraphs, accompanied by screenshots, concisely outline the application's functionalities available to users.
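A typical session following the description above might look like the sketch below. The function names and the positional argument order are taken from the text; the exact signatures, the installation command and the choice of the first four columns of the wg93 dataset from the 'ca' package are assumptions that should be checked against the package documentation. The example is skipped when the packages are not installed.

```r
# Assumption: interCa is installed from its GitHub repository, for example with
# remotes::install_github("theintercapackage/interCa")
have_pkgs <- requireNamespace("interCa", quietly = TRUE) &&
  requireNamespace("ca", quietly = TRUE)

if (have_pkgs) {
  library(interCa)
  data("wg93", package = "ca")     # auxiliary survey dataset shipped with 'ca'

  # Assumed call pattern: data table first, number of axes to retain second
  res <- interca(wg93[, 1:4], 5)

  plot1d(res, 1)                   # interpretive axis 1
  plot2d(res, 1, 2)                # interpretive plane of axes 1 (x) and 2 (y)

  # interShiny()                   # would launch the graphical application
} else {
  message("interCa and/or ca not installed; skipping the example.")
}
```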
As shown in Figure 1, the application's design includes a sidebar on the left for users to upload their data in .CSV or .XLSX formats and choose the number of factorial axes for interpretive coordinates and plots. The sidebar features checkboxes, pre-selected by default, for whether the data's first row contains variable names and for opting to display the scree plot after execution, showing up to ten factorial axes initially. It also has a preset for using five independent variables to generate interpretive elements. The results of the user's selections appear in the central panel, under tabs as shown in Figure 2. After the user presses the "Select number of factors" button, the first tab shows tables of the coordinates of the points-categories, their interpretive coordinates, contribution index values and display quality index values, allowing for tailored analysis.

Figure 3 showcases a tab titled "Interpretive Axis," where users can select which interpretive axis to generate. The application includes input validation to prevent invalid axis number entries; if a user inputs a number less than 1 or greater than the total axes selected in the sidebar, an error message appears. Upon entering a valid axis number, the users can create and view the interpretive axis. They also have options to export the interpretive axis in .PDF format and the interpretive coordinates in .XLSX format, and to produce an automatic summary report of the results. Additionally, when an interpretive axis is chosen, the slider widget auto-adjusts to the average coordinate, enhancing user experience.

In the "Interpretive Plane" tab, as shown in Figure 4, users can select the horizontal and vertical axes for the interpretive plane and filter points by their interpretive contribution; the axis selection is again subject to input validation. The graph, point coordinates and a report can be exported to .PDF, .XLSX and .HTML files, respectively. Additionally, all tables in the application allow for sorting and searching.
Finally, in the "Data" tab, users can browse the entire set of the uploaded data.
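The axis-number validation described for the "Interpretive Axis" tab could be implemented along the following lines. This is only a sketch using Shiny's `validate()` and `need()` functions; the helper `axis_is_valid`, the input identifiers and the error message are hypothetical and do not reproduce the actual interShiny code.

```r
library(shiny)

# Hypothetical helper: TRUE only for a whole number between 1 and n_axes
axis_is_valid <- function(axis, n_axes) {
  isTRUE(
    is.numeric(axis) && length(axis) == 1 && !is.na(axis) &&
      axis == round(axis) && axis >= 1 && axis <= n_axes
  )
}

ui <- fluidPage(
  numericInput("n_axes", "Number of factorial axes", value = 5, min = 1),
  numericInput("axis", "Interpretive axis to draw", value = 1),
  textOutput("status")
)

server <- function(input, output, session) {
  output$status <- renderText({
    # validate() stops rendering and shows the message when the check fails
    validate(need(
      axis_is_valid(input$axis, input$n_axes),
      "The axis must be between 1 and the number of selected axes."
    ))
    paste("Drawing interpretive axis", input$axis)
  })
}

app <- shinyApp(ui, server)
if (interactive()) runApp(app)
```

Keeping the check in a plain function makes it possible to test the rule without starting the application.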

Comparison of results between classic factorial diagrams and interpretive diagrams
Using the WG93 dataset, we demonstrate that our package, which applies interpretive diagram concepts, improves the interpretation of results over traditional methods. For instance, as shown in Figure 5, although point A_5 appears to be highly important on the

Conclusions
This paper introduces an R package featuring functions for the implementation of interpretive coordinate, interpretive axis and interpretive plane concepts as published in 2022. This package stands as the first, as of the time of writing, across any software or programming language, to implement these concepts. It comprises functions that enable users to produce desired diagrams or outcomes from their data analysis. In addition, the package offers a web application built on the "Shiny" library, allowing users to upload their data in .CSV or .XLSX formats, conduct multiple correspondence analysis, select and filter points for visualization and download the resultant charts in .PDF format as well as the relevant tables in .XLSX format. The application also supports downloading an automatic report in .HTML format that includes the results and interpretations of the interpretive diagrams. The development of this software, as discussed in the literature review section, aims to enhance user interactivity, ease the application process and improve comprehension of the implemented methods. The primary goal of this initiative is to open access to this novel approach, especially for nonspecialist users who seek to apply and integrate the analysis results into their research. Future directions will involve enhancements to the graphical user interface (GUI) and functionality, informed by ongoing research on the interpretation of MCA.
Artificial intelligence with R library and MCA

Figure 1. Starting screen of the software