Classification and categorization of Brazilian agricultural startups (Agtechs)

Purpose – This paper aims to identify and analyze the agtech classification and categorization systems in the Brazilian context. Design/methodology/approach – The systematic literature review (SLR) was carried out according to the protocol of Kitchenham and Charters (2007). The classification systems found in the literature were evaluated using the thinking aloud protocol, as proposed by Ericsson and Simon (1993). The responses obtained were evaluated through lexicographic analysis, described by Bécue-Bertaut (2019), and content analysis, described by Bardin (2011). Findings – The SLR identified four agtech classification systems. The model proposed by Dias, Jardim, and Sakuda (2019) was the one with the highest adherence to classify Brazilian agtechs. From the analysis of the systems found in the literature, the authors proposed a new categorization model of agricultural startups (agtechs). Research limitations/implications – The study has limitations in relation to the theoretical and empirical validation of the model proposed by the authors. This limitation can be the subject of subsequent research. Practical implications – The SLR study considers the evolution of the classification systems of a new agribusiness reality, the agtechs. In addition, there is a practical contribution in proposing a new classification system that attempts to address some of the limitations found in previous studies. Originality/value – Agtechs are startups focused on developing solutions for agriculture and have shown a significant increase in recent years. However, there are few studies focused on this type of company. Even rarer are the studies that seek to classify and categorize them. The present work opens the horizon for future studies focused on this new reality.


Introduction
The agricultural sector, represented by agribusiness, is one of the main sectors supporting the Brazilian economy. While other sectors of the economy tend to experience greater impacts in times of economic recession, agribusiness is generally less impacted. Between 2000 and 2018, the Gross Domestic Product generated by agribusiness [GDP-Agribusiness] grew 320%, reaching 1.6 trillion reais (Center for Advanced Studies on Applied Economics [CEPEA], 2018), while Brazilian GDP in 2018 was 6.8 trillion reais (Brazilian Institute of Geography and Statistics [IBGE], 2018). According to Barros (2017), between 2014 and 2017, the GDP of the industrial transformation and services sectors decreased by around 12.1 and 5%, respectively, in contrast to the agricultural sector, which had an increase of 11.7%.
It is undeniable that much of GDP-Agribusiness is the result of large-scale production from large agricultural companies. However, the participation of small and medium-sized agricultural companies in GDP-Agribusiness increases every decade. According to Guilhoto, Silveira, Ichihara, and Azzoni (2006), about 10% of the GDP-Agribusiness comes from small and medium rural companies. When considering medium and small companies, from 2014 onwards, they represent 27% of all GDP (Costa & Leandro, 2016).
The adoption of new technologies, mainly tools from Agriculture 4.0, also called Digital Agriculture, such as drones, sensors, machine to machine communication (M2M) linked to the internet of things (IoT), agricultural data processing and creation of applications for management decision-making, was decisive for Brazilian agriculture to achieve high levels of production and profitability (Massruhá & Leite, 2016).
The offering of innovative (or disruptive) products and services in agribusiness is carried out largely by startups. According to Blank and Dorf (2012), a startup is a temporary organization in search of a scalable, recurring and profitable business model. In addition, a startup can be understood as a human institution designed to create products and services in situations of uncertainty (Ries, 2012).
According to Dutia (2014) and Manne and Stout (2017), startups focused on agriculture, called agtechs, are companies oriented toward technological advances in chemical, biological, administrative and mechanical processes. These advances bring greater income to agricultural crops, in addition to reducing production costs and the complexity of agricultural activity.
In recent years, billions of dollars have been invested in activities covered by agtechs. Studies indicate that this global amount ranges from US$3bn (Graff, Silva, & Zilberman, 2019) to US$6.8bn annually (AgFunder, 2019).
Although the diversity of services offered by agtechs is a differential to encourage investment and enable satisfactory gains for entrepreneurs, it hinders the classification and ordering of the business model of these startups.
Given the above, the objective of this article is to carry out a systematic literature review (SLR) to identify and analyze the systems of classification and categorization of agtechs in the Brazilian context. This SLR sought to answer the research question: "How to classify and categorize Brazilian agtechs?" In addition, it aimed to propose a new classification system that considers the possible gaps in current systems.

Literature review
Startup is a term used to designate companies newly established in the market and that are in the process of validating their business model. They are generally characterized as innovative and disruptive, in addition to presenting high risks in the product concept and relatively low operating cost (Ries, 2012).
A startup differs from a traditional company because the latter seeks growth and profitability, while the startup aims to verify whether its business model can develop into a sustainable and profitable business. When the uncertainty regarding the validity of the business model disappears, the startup moves to a new stage, since the company's objective becomes growth and profitability, like any conventional company (Blank & Dorf, 2012).

Research method
The research method of this work considered three stages: (1) systematic literature review (SLR), carried out according to the protocol of Kitchenham and Charters (2007); (2) field research, which used three techniques: thinking aloud, as proposed by Ericsson and Simon (1993); lexicographic analysis, described by Bécue-Bertaut (2019); and content analysis, described by Bardin (2011); and (3) proposition of a categorization model.

Systematic literature review
Different references (i.e. articles published in scientific journals and academic congresses, white papers and books) were examined to identify patterns and relationships about the agtech construct. The SLR technique was used to discover the main types of agtech classification cited by previous studies. The grouping of the mentioned categories and the development of a descriptive analysis of the construct were performed through conceptual affinity segmentation. According to Crossan and Apaydin (2010) and Pittaway and Cope (2007), the use of literature with a systematic approach provides a favorable context for a better understanding of the ideas and theories on the subject. Additionally, the SLR allows theoretical and empirical investigation, to map the current state of the art with a focus on future research.
The systematic literature review was conducted according to the method proposed by Kitchenham and Charters (2007), which consists of seven steps: (1) selection of the research question that will guide the study; (2) selection of terms and digital libraries; (3) identification of the inclusion and exclusion criteria of the studies; (4) identification of the quality assessment procedures of the selected studies; (5) data extraction and synthesis; (6) quantitative or qualitative analysis of the results; and (7) presentation of the summary of the documentation and availability.
Exclusion criteria adopted in this SLR were cross-references in the databases and adherence to the study area.

Field research
The classification and categorization models obtained in the SLR were analyzed considering their positive and negative aspects. This analysis was performed with the aid of the thinking aloud verbal protocol, proposed by Ericsson and Simon (1993). Five thinking aloud sessions were held with researchers in the field of agribusiness. The researchers work at the Federal University of São Carlos, Campus Lagoa do Sino. These researchers were selected for having knowledge in the area, as they carry out research in the administration course focused on agribusiness systems.
In each session, participants listened to an audio recording with a brief introduction about the agtechs and their classifications, the rules of the session and a reminder for the participant to speak throughout the process. After the audio, the participants received four sheets of A4 paper with the different types of classification found in the SLR. They also received another sheet of A4 paper with the task to be performed: to present the positive and negative aspects of each model.
The recordings were listened to, transcribed and interpreted in search of similarity patterns using free software linked to the R statistical package (IRAMUTEQ, Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires). The analyses followed the guidelines developed by Bécue-Bertaut (2019) for lexicographic analysis and Bardin (2011) for content analysis.
Lexicographic analysis consisted of two steps. The first was the preprocessing of the thinking aloud sessions. In this step, ironies and jokes in the participants' speeches were eliminated. This step is important, as this type of analysis considers literal aspects of the discourse. The second step was the lexicographic analysis itself, which initially included the elaboration of a lexical table containing the positive and negative aspects of each classification, according to each participant in the thinking aloud session. After the creation of this table, a correspondence analysis of the lexicons was performed, based on the singular value decomposition of the table (Greenacre, 2010). From this decomposition, frequency graphs were created to represent the dimensions of the statements obtained.
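As an illustration of the correspondence analysis step, the following is a minimal sketch in Python with NumPy, not the procedure implemented inside IRAMUTEQ. It applies the standard singular value decomposition formulation of correspondence analysis to a small lexical table. The words, the column labels and every count in `counts` are hypothetical and were invented only for the example; the function name `correspondence_analysis` is likewise an assumption.

```python
import numpy as np

# HYPOTHETICAL lexical table: rows are words, columns are classification models.
# The counts are invented for illustration and are not data from the study.
words = ["generic", "basic", "precision", "broad"]
models = ["Dutia", "KPMG", "Graff", "Dias"]
counts = np.array([
    [5, 2, 1, 0],
    [2, 2, 1, 0],
    [0, 1, 2, 4],
    [0, 1, 2, 3],
])


def correspondence_analysis(table):
    """Correspondence analysis of a contingency table via SVD.

    Assumes every row and every column has a non-zero total.
    Returns the principal inertias and the row/column principal coordinates.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sigma) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sigma) / np.sqrt(c)[:, None]
    return sigma ** 2, row_coords, col_coords


inertias, row_coords, col_coords = correspondence_analysis(counts)
share = inertias / inertias.sum()
for axis, pct in enumerate(share[:2], start=1):
    print(f"Dimension {axis}: {100 * pct:.1f}% of total inertia")
for word, xy in zip(words, row_coords[:, :2]):
    print(f"{word:>10}: {xy[0]: .3f}, {xy[1]: .3f}")
```

The first two columns of `row_coords` and `col_coords` are the coordinates usually plotted on a two-dimensional map, where words and models that lie close together are associated in the table.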
For content analysis, the transcribed material was segmented into units of analysis. These units contained segments of the interviews, considering the negative and positive aspects of each classification system. The units were then subjected to inferences and interpretations, supported by the theoretical framework studied.

Proposition of a new categorization model
Based on the analysis of the classification and categorization models identified in the literature and the results of the field research, the authors proposed a new categorization model for agricultural startups (agtechs). For the construction of this new model, both the positive and negative segments of the lexicographic and content analysis of the classification models obtained in the SLR were considered. These analyses, in turn, originated from notes obtained in the sessions of the thinking aloud verbal protocol.

Results of the systematic literature review
Figure 1 shows the steps and respective filters used in the SLR. From this SLR, 610 entries were initially obtained, whose study spectra were related to the filters used. Although the chosen databases were appropriate for conducting the desk research, they allow the same entry to be indexed in more than one database. Therefore, an exclusion criterion was used, eliminating 326 duplicate entries. The resulting 284 articles were analyzed by reading the abstract and, within this step, 254 entries were eliminated. These exclusions were performed because, although the entries met the filters, they did not present a model of agtech classification. The remaining 30 entries were read in full to check whether they addressed agtech classifications effectively or only tangentially. According to this criterion, only four entries presented a complete and structured model of agtech classification that could be used in the Brazilian context. Table 1 shows the analyzed articles.
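The screening counts reported for the SLR can be checked with a short calculation. The sketch below uses only the figures stated in the text (610 initial entries, 326 duplicates, 254 exclusions after reading the abstract, four retained models). The number of entries excluded after full reading is not stated in the text and is derived here as an assumption from 30 minus 4; the variable names are illustrative.

```python
# Counts taken from the text of the SLR results.
initial_entries = 610
duplicates_removed = 326
excluded_on_abstract = 254
models_retained = 4

after_deduplication = initial_entries - duplicates_removed
after_abstract_screen = after_deduplication - excluded_on_abstract
# Derived value (assumption): not reported explicitly in the text.
excluded_on_full_text = after_abstract_screen - models_retained

funnel = [
    ("Initial entries", initial_entries),
    ("After removing duplicates", after_deduplication),
    ("After abstract screening", after_abstract_screen),
    ("Complete classification models retained", models_retained),
]
for stage, count in funnel:
    print(f"{stage:<42}{count:>5}")
```

The intermediate totals agree with the 284 and 30 entries reported at each step.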
Figure 1. Steps and filters used in SLR
The first classification model cited was proposed by Graff et al. (2019), which considers the types and service offerings that each agtech can deliver for specific categories. The categories presented were Business & financial services; Online services and content; Biotech, genetics and health; Chemicals; Software, data and information technology (IT); Electronic devices and sensors; Machinery and equipment; Agricultural production; Marketing, processing and distribution; Consumer products, services and retail; Agricultural inputs, distribution and sales; and Unspecified. Figure 2 illustrates this classification model.
The second classification model was proposed by Dias, Jardim, and Sakuda (2019) and considers the segmentation of agtechs based on two dimensions. The first is linked to a classic agribusiness approach, developed at Harvard University in 1957, which correlates the production processes within the farm with upstream (activities before the production) and downstream (activities after the production) stages. The second dimension regards the operating market and technological field of these companies. This classification model offered 33 categories, divided as follows: Before the Gate (
The last classification model was proposed by Dutia (2014). According to him, this model was the precursor of the agtech classification systems. Basically, it focuses on the categorization of the agribusiness value chain, more specifically on the production chain.
The categories created by the author to classify agricultural startups were as follows: Technological inputs, with a subdivision into physical and informational inputs; Animal production; Crop production; Agricultural processing; and Manufacture and distribution. Figure 5 shows this model.
Although different, these classification models were created from empirical efforts and research analysis. According to Piedade (1977), a classification system consists of the division into groups or classes, considering their differences or similarities. This division provides the elements that allow the identification of a classification procedure: the organized and systematic formation of groups and the ordering of certain data sets based on shared characteristics (Araújo, 2006). In addition to understanding the agribusiness chain, it is important that the proposed categories have density and depth. According to Gardner (1996), the classification based on categorization depends on the degree to which the peripheral prototypes share crucial characteristics with the central one. The greater the degree of sharing, the greater the density of the categories generated and, consequently, the better the classification will represent reality.

Field research results
In each thinking aloud session, an additional task was proposed to each participant. The task consisted of preparing a list of phrases and words that would characterize the systems most and least adherent to the Brazilian context. The sentences were submitted to a lexicographic analysis, with the aid of IRAMUTEQ. Figure 6 shows the 15 most cited words in the least adherent classification to the Brazilian context, which were: "Dutia", "generic", "application", "no", "agribusiness", "low" (both male and female words in Portuguese), "potential", "basic", "chain", "categories", "classification", "poor", "reality", "you".
When asked which classification would be the least adherent to the Brazilian context, all participants mentioned the classification of Dutia (2014). The word "generic" was used by 100% of the participants; "low" (both male and female words in Portuguese) and "application" were cited by 60% of the respondents; and "poor" and "basic" by 40% of them.
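The shares above are simple proportions of the five participants. The sketch below shows one way such percentages could be computed from per-participant word lists. The assignment of words to individual participants in `responses` is hypothetical: it was constructed only so that the totals match the reported 100%, 60% and 40%, and it does not reproduce the actual transcripts. The function name `citation_share` is also an assumption.

```python
from collections import Counter

# HYPOTHETICAL word sets per participant, built to match the reported shares.
responses = {
    "E1": {"generic", "low", "application"},
    "E2": {"generic", "low", "poor"},
    "E3": {"generic", "application", "basic"},
    "E4": {"generic", "low", "application", "poor"},
    "E5": {"generic", "basic"},
}


def citation_share(responses):
    """Percentage of participants who cited each word at least once."""
    counts = Counter(word for cited in responses.values() for word in cited)
    total = len(responses)
    return {word: 100 * n / total for word, n in counts.items()}


share = citation_share(responses)
for word in sorted(share, key=share.get, reverse=True):
    print(f"{word:<12}{share[word]:>6.0f}%")
```

Using a set per participant counts each word once per person, which matches the way the shares are reported (percentage of participants, not number of occurrences).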
When asked about which classification would be the most adherent to the Brazilian context, 80% of the participants mentioned the classification proposed by Dias et al. (2019). The word "precision" was mentioned by 100% of the participants, while "broad/comprehensiveness" was mentioned by 75% of the respondents, and "effort" by 50% of them.
The four classification models identified in the SLR (Graff et al., 2019; Dias et al., 2019; KPMG, 2018; Dutia, 2014) were analyzed considering the theoretical foundations of the classification, i.e. whether these classification models were based on the separation of groups according to differences or similarities; and whether the categories presented density and depth. This analysis was based on the content analysis of the thinking aloud sessions.
Participants were identified by the letter E, followed by Arabic numerals. Table 2 shows the positive and negative aspects of each classification model.

INMR
Thus, the positive and negative points were discussed according to the following criteria: rationality of categories; depth of categories; density of categories; comprehensiveness of the model; superposition of categories; and segmentation of categories. Each of these criteria will be explained in more detail below.

Rationality of categories
According to Manhein (1942) and Simon (1997), rationality is defined as several organized measures that lead to a previously defined objective, by using the most sundry components of these measures. Such measures will have the best conditions when they coordinate the means efficiently to achieve the initial objective.
In classification systems, rationality is important because it supports the formation of systematized and organized groups. It enables the proposed categories to unveil a conscience; hence, the meanings of each description are deliberately understood by individuals.
The model proposed by Graff et al. (2019) has a positive aspect regarding the rationality of the categories. According to the participant of this research [E2], positive rationality was found when there was a concern to create categories based on technological aspects. Currently, technology is responsible for increasing the productivity rates of Brazilian agriculture:
"[. . .] in this classification system there is a rationalization of categories through a different technological bias. It was not considered just a categorization of input, processing and output data, but a thought to understand their motivations and reasons." [E2]
[E3] and [E5], in turn, believe that this model acts rationally when, in the creation of categories, it presents an effort of reflection and thinking:
"When creating a category, it's not just because something is [. . .] different that I create a category. It is necessary to seek a rational and motivational effort that brings the need for separation." [E3]
"When reflecting on the construction of this classification model, it is evident that the authors tried to follow a guiding line that permeated the entire systematization."
"Although it is not so complete, when presenting the classification considering aspects before, inside and after the farm, it creates categories with rationality, i.e., it considers the main points of agribusiness." [E2]
According to [E5], Dutia's model (2014) has a positive aspect in relation to rationality, as it seeks to develop the categories based on the synergism between agribusiness value chain and technological application:
"I understand [. . .], that all categories are related to technological application. I think this is essential, if you are talking about startup, then it is consistent with the key concept." [E5]
According to the participants of the thinking aloud sessions, none of the models lacked rationality. This shows that, although they have some deficiencies, the categories were created from an awareness of productive activities and from the needs of farmers.

Depth of categories
According to Houaiss (2001), depth is an attribute of deep. Deep refers to something far-reaching; very important. For Pozzebon, Freitas, and Petrini (1999), the depth of categories comprises the richness and magnitude of detailing the contents described by them. The greater the depth and volume of information within the categories, the more easily they are understood and the more beneficial and advantageous their use is (Pipino, Yang, & Wang, 2002).
Depth has a positive aspect in the classification models developed by Dias et al. (2019) and Graff et al. (2019). Negative aspects in relation to depth were found in the classification models proposed by KPMG (2018) and Dutia (2014). For [E2] and [E4], the KPMG model has low depth, which may be the result of a superficial view of the agribusiness chain. In his speech, [E1] highlights the low depth of the model developed by Dutia (2014), possibly due to a simplified view of the agricultural activity. Below are the respective arguments:
"The classification is interesting, but it could be a little more precise and deeper. The complete agribusiness chain still needs to be understood." [E2]
"This macro classification looks like a classification made by those who are not from the area. It seems to have been created by an administrator, external to the agribusiness area, and who tries, without knowing the particularities, to make a classification." [E4]
"The definition of the categories is arbitrary, each researcher can insert the startup in whatever category he/she wants, since the categories are simplistic." [E1]

Density of categories
According to Houaiss (2001), the word dense is defined as something that has a large mass in relation to the bounded volume. Density, on the other hand, refers to the level and amount of information in relation to the decision space.
In a model, the created category should present density in the information to allow the user an easy reading and understanding of its contents. Furthermore, density of categories properly guides the search for information or the resolution of the problems in question (Gamez, 1998).
Density has a positive aspect in the classification model developed by Dias et al. (2019). According to [E4], these authors sought to combine similar characteristics, creating a well-defined and more usual model:
"I think there has been a great advance on the issue of creating denser categories, which add similarities. This makes it smoother, helping in the startup's classification process [. . .]." [E4]
The models developed by KPMG (2018) and Dutia (2014) presented negative aspects regarding density. According to the participant [E4], the KPMG (2018) model, although rational, presents an incomplete consideration on the density of information presented in each category. This same participant indicates that Dutia's model fails to consider the density criterion because it focuses on farm activities and not on agribusiness as a whole. The following segments exemplify the opinion of [E4] on the respective models:
"Creation requires reflection to add density, [. . .] it is not just to exclude the different, but to group the singularities. It seems to me that this classification did not bring that." [E4]
"If we think about a categorization of startups for farms or production systems, perhaps it will contemplate them [. . .]. So we are not talking about startups for agribusiness or for the value chain." [E4]

Comprehensiveness of the model
According to Tristão, Fachin, and Alarcon (2004), the definition and choice of the classes that constitute a classification model are related to the comprehensiveness and needs of use for each model.
In classification models, groups are created, whose information ranges from the most comprehensive (basic classes) to the most specific concepts (focused classes) (Ranganathan, 1967).
The comprehensiveness in classification models can be understood as the amplitude that a given category or class has in relation to the total links. The more comprehensive the classification model, the greater its importance to represent the analyzed environment and the greater its usability.
According to the participant [E1], the model developed by Dias et al. (2019) has a positive aspect regarding the comprehensiveness of the model, as it can be used by several actors, who may have different domains within agribusiness:
"[. . .] you can provide a typology that can be widely used, from students [. . .] to doctors. In addition, it can be a great reference for investors to do market research, to decide whether to invest or not." [E1]
[E3] and [E4] addressed the negative aspect in relation to the comprehensiveness of Dutia's model (2014). According to these participants, when trying to be comprehensive, the model impoverishes the description of the categories, which can lead to failures in the classification.

Superposition of categories
In classification models, superposition is the intersection between two pieces of conceptual information. It occurs when substantial information, which characterizes a certain category, also appears in a different category. There is a conceptual inclusion relationship between them (Carlan & Medeiros, 2011).

According to Apostel (1963), the categories should be exhaustive, i.e. they must cover the entire extension of the domain to be classified. They should never be empty or superposed.
The superposition of categories presented a negative aspect in the models proposed by Graff et al. (2019), KPMG (2018) and Dutia (2014).
For [E4], the model of Graff et al. (2019) presents 12 major classifying groups and, in most of them, the substantial information that would determine the category is the same:
"[. . .] my criticism would be in the sense that [. . .] we still have a large superposition of things, which seem to me to be the same things when I read." [E4]
When faced with the model developed by KPMG (2018)

Segmentation of categories
According to Spiteri (1998), segmentation uses global information resources to obtain specificity of the subjects and, therefore, create classes or categories that adhere to the studied reality. The segmentation process of informational content affects the user, whether in the interpretation of information or in the use of classification systems (Azevedo, 2008).
Although it is important to segment to bring specificity, classifications with many categories, which are too segmented, can be inaccurate representations of the information in the studied area (Furgeri, 2006).
Thus, similarly to what was observed for the superposition of categories, there were no positive aspects mentioned in this topic by the participants in the thinking aloud sessions.

Proposition of a model to categorize agricultural startups (agtechs)
Considering the positive and negative aspects pointed out in the thinking aloud sessions, a new classification model for Brazilian agtechs was proposed, as shown in Figure 8.
The model presented has its centrality based on the links of the agribusiness production chain (supplies and equipment, before planting, production, post-production and consumption), on the operational production processes (plant, animal and forestry) and on the peripheral production services (support and regulation services). This centrality allows the model to absorb positive aspects of rationality.
The rationality in the proposed model means that the services offered by each agtech are linked to a specific agricultural activity routine, considering that each stage has a service with greater demand and/or greater impact.
Similarly, rationality is considered in this new model when proposing groups according to the operational production processes and peripheral production services. With the creation of these categories, the model seeks to better target the service provided by agtechs with their specific customers, shortening the search time for the best service provider.
Considering the depth of categories, the proposed model was based on the binomial "links in the production chain" and "operational production processes". This binomial allowed the description elaborated in each created category to present the necessary depth for the agtechs (insertion in the category that represented the majority of the services offered by the agtech or those with the greatest impact on its routine) and consumers (understanding of the service offered).
Regarding comprehensiveness, the model proposed in this article has an amplitude that covers the entire agricultural production chain. Its configuration allows the separation of the agtechs that provide services in the initial stage of the operational production process (supplies and equipment category) from those that provide services to the final consumer (consumption category).
The last positive aspect used to compose the proposed model was density. Density refers to the volume of information in relation to the decision space. The model developed in this article sought to create categories that provide a satisfactory amount of information for the decision process of both the agtechs and the users of the model.
Regarding the negative points of the models found in the literature, the proposed model tried to reduce the superposition of categories and the excessive segmentation. This reduction was made possible by the creation of a model whose centrality was obtained from the trinomial "links in the production chain", "operational production processes" and "peripheral production services", which allows the categorization necessary to cover the entire agricultural system.
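To make the three dimensions of the proposed model more concrete, the sketch below encodes them as a small data structure. The values of each dimension are taken from the description in the text; how the model combines them (here, one chain link and one production process per category, with an optional peripheral service) is an assumption of this sketch and not a reproduction of Figure 8. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Optional


class ChainLink(Enum):
    """Links of the agribusiness production chain named in the text."""
    SUPPLIES_AND_EQUIPMENT = "supplies and equipment"
    BEFORE_PLANTING = "before planting"
    PRODUCTION = "production"
    POST_PRODUCTION = "post-production"
    CONSUMPTION = "consumption"


class ProductionProcess(Enum):
    """Operational production processes named in the text."""
    PLANT = "plant"
    ANIMAL = "animal"
    FORESTRY = "forestry"


class PeripheralService(Enum):
    """Peripheral production services named in the text."""
    SUPPORT = "support"
    REGULATION = "regulation"


@dataclass(frozen=True)
class AgtechCategory:
    # ASSUMPTION: a category pairs one link with one process; the way the
    # published model actually combines the dimensions may differ.
    link: ChainLink
    process: ProductionProcess
    peripheral_service: Optional[PeripheralService] = None


def label(category: AgtechCategory) -> str:
    """Human-readable label for a category."""
    parts = [category.link.value, category.process.value]
    if category.peripheral_service is not None:
        parts.append(category.peripheral_service.value)
    return " / ".join(parts)


# Every combination of chain link and production process.
core_cells = [AgtechCategory(l, p) for l, p in product(ChainLink, ProductionProcess)]

# HYPOTHETICAL agtech: a crop-monitoring service placed in production / plant.
example = AgtechCategory(ChainLink.PRODUCTION, ProductionProcess.PLANT)
print(label(example))
print(len(core_cells), "link-process combinations")
```

Under this assumption, each agtech would be placed in the cell that represents the majority of its services or those with the greatest impact, as described for the depth criterion.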

Conclusions
The objective of this article was to conduct a systematic literature review (SLR) to identify the main agtech classification systems adherent to the Brazilian context, as well as to propose a new classification system. However, there was no intention to exhaust all debates about classifying Brazilian agtechs.
The contribution of this study was to point out what is currently being developed in relation to the classification models, their adherence to the Brazilian context, as well as the positive and negative points in the development of models applied to agribusiness. At the end of this contribution, it was possible to present a model that, after validation, can be another alternative to guide the classifiers inserted in agricultural systems.
The four classification models resulting from the SLR passed the scrutiny of researchers with business expertise in the agro-industrial systems, which allowed a critical view of their negative and positive aspects.
The main aspects raised by the researchers who participated in the thinking aloud sessions were rationality, density, depth, superposition and segmentation of the categories and comprehensiveness of the models.
Based on the lexicographic and content analyses of the thinking aloud sessions, the model developed by Dutia (2014) was considered the least adherent to the Brazilian context, as it presented superposed categories, low density and depth, in addition to the low comprehensiveness of the model. The model developed by Dias et al. (2019), in turn, was the most effective in classifying Brazilian agtechs. The main positive characteristics of this system were the wide comprehensiveness of the model and the high density and depth of the categories.
Regarding the practical applications, this work can support researchers in the search for greater knowledge on this subject, as it highlights the current knowledge on agtech classification models.

Future research
After completing this SLR and given the findings of this article, we can suggest some guidelines for future work. The first refers to the continuation of this study, through the submission of the models proposed by Dutia (2014), KPMG (2018), Graff et al. (2019) and Dias et al. (2019) to the scrutiny of other actors involved in the innovation ecosystem of the agricultural sector, such as angel investors, students and entrepreneurs who perceive agribusiness from another perspective. New points of view are important, as they can bring new aspects that will contribute to the improvement, application and usability of these models. As a field research method, the application of the thinking aloud verbal protocol is suggested.
The second driver would be the proposition of an academic investigation to verify which aspect, among those presented in this article (rationality, density, depth, superposition and segmentation of categories and comprehensiveness of the models), has the greatest influence, both in the process of creating the model and in the success and/or failure of the adoption of the classification model created. This type of investigation seems to be relevant because, in some situations, authors may choose an aspect that does not have the greatest impact in the context to be classified. As a field research method, it is suggested to carry out a quantitative survey, in which a weight is attributed to each aspect studied, through scales ranging from (1) slightly important to (5) extremely important.
A third driver for future work would be to conduct research to validate the model proposed in this article. A model is a simplified representation of a real system and, therefore, needs to be validated (mainly its components), so that there is no doubt about its representation and applicability. Initially, it is suggested to apply this validation to the same participants who collaborated with this article. In a second step, it is necessary to seek the opinion of other actors in agribusiness. As a field research method, the thinking aloud protocol is again suggested.