Agglomeration effects and informal firms in the internal structure of cities

Purpose – This paper aims to estimate the effect of agglomeration on the probability of being an informal firm in Cali, Colombia. Informal firms produce legal goods but do not comply with official regulations. This issue is relevant because, similar to other developing countries, the informal sector in Colombia employs more than 50 per cent of the workforce. The results of this study demonstrate that a one standard deviation increase in agglomeration reduces the probability of being informal by 52 per cent. Results are consistent with the idea that informal firms benefit less from agglomeration because of legal restrictions that block the relationship with formal firms.
Design/methodology/approach – The objective of the present paper is to estimate the effect of agglomeration on the probability that a firm, given a location, chooses to be informal. The authors deal with endogeneity issues by using soil information related to earthquake risk, which reduces the height of buildings and therefore increases the cost of agglomeration. The analysis focuses on Cali, Colombia, where the informal sector employs 60 per cent of the workforce. The registration of economic activities is used as a criterion to identify informal firms, in such a way that the percentage of informal firms is 42 per cent.
Findings – The authors find that the effect of agglomeration is strongly negative. The probability of being informal diminishes by 52 per cent when agglomeration increases by one standard deviation. Results in this paper shed light on how formal firms tend to be localized in high-density commercial and industrial areas, while informal firms are localized in low-density and peripheral areas where the land for production is cheaper and where they can avoid the control of authorities.
Originality/value – Theory argues that spatial production externalities and commuting costs are among the main forces that shape the city's internal structure. Externalities include effects that increase firms' production, and therefore workers' income, when the size of the local economy grows. The authors now have strong evidence that firms' productivity is positively related to the volume of nearby employment. Most of the empirical findings concern firms in the formal sector and, accordingly, the literature says little about the effect of agglomeration on informal firms' location. However, this effect is crucial for developing countries where informal work is the main option for less-educated workers facing unemployment.


Introduction
Cities can be studied as market responses to production and income opportunities (Mills, 1967). Accordingly, the size and growth of urban areas can be interpreted as responses to these opportunities [1]. Indeed, there is a strong positive relationship between productivity and economic density for different industries and levels of aggregation (Ciccone and Hall, 1996; Ciccone, 2002; Brulhart and Mathys, 2008; Melo and Noland, 2009; Combes et al., 2010; Morikawa, 2011).
Theory argues that spatial production externalities and commuting costs are among the main forces that shape the city's internal structure. Externalities include effects that increase firms' production, and therefore workers' income, when the size of the local economy grows [2]. We now have strong evidence that firms' productivity is positively related to the volume of nearby employment (Rosenthal and Strange, 2003; Combes and Gobillon, 2015). Most of the empirical findings concern firms in the formal sector and, accordingly, the literature says little about the effect of agglomeration on informal firms' location (Duranton, 2009). However, this effect is crucial for developing countries where informal work is the main option for less-educated workers facing unemployment. Indeed, in several countries, more than 50 per cent of employment is in the informal sector (Maloney, 2004; Perry et al., 2007).
One of the main factors that explain informality is the significant cost of formalization. A representative entrepreneur evaluates costs and benefits and may find that it is efficient to choose informality. Indeed, the literature points out that high taxes and social security contributions are costly regulations that lead entrepreneurs not to set up formal businesses and hence not to register their firms (de Soto, 2000; Maloney, 2004) [3]. Nevertheless, there is a threshold level of economic activity above which the incentives to formalize become more important.
The informal sector comprises a heterogeneous mixture of self-employed entrepreneurs and small, short-lived firms that, although they produce legal goods, do not comply with legal regulations. When firms choose to be informal they do not have access to all markets where property rights are secure and well-defined. When a firm is informal, the owner faces restrictions in the financial system: they cannot make long-term capital investments and cannot use their property as collateral to secure loans (Feige, 1990; de Soto, 2000; Sindzingre, 2006). As a result, informal firms have lower productivity levels, lower fixed assets per worker and less access to government services than formal firms (Cárdenas and Mejia, 2007; Santa Maria and Rozo, 2009). In addition, they do not comply with labour regulations, may practice smuggling and, frequently, do not keep accounts [4].
We have evidence that formal and informal firms (of similar size and belonging to the same economic sector) display different locational patterns within an urban area (Moreno-Monroy and García, 2015). Furthermore, there is evidence (for São Paulo) that informality rates decrease on average 15 per cent faster in areas with new transport infrastructure (Moreno-Monroy and Roman, 2015) in line with the idea that informality is a choice based on a cost-benefit calculation of the entrepreneur.
The objective of the present paper is to estimate the effect of agglomeration on the probability that a firm, given a location, chooses to be informal. We deal with endogeneity issues by using soil information related to earthquake risk, which reduces the height of buildings and therefore increases the cost of agglomeration. The analysis focuses on Cali, in the west of Colombia, where the informal sector employs 60 per cent of the workforce. Using the registration of economic activities as a criterion to identify formal and informal firms (Schneider, 2005), we identify informal firms when they are not registered in the Chamber of Commerce, in such a way that the percentage of informal firms is 42 per cent.
We find that the effect of agglomeration is strongly negative. The probability of being informal diminishes by 52 per cent when agglomeration increases by one standard deviation. Results in this paper shed light on how formal firms tend to be localized in high-density commercial and industrial areas, while informal firms are localized in low-density and peripheral areas where the land for production is cheaper and where they can avoid the control of authorities.
This paper proceeds as follows: Section 1 contains the theoretical framework, Section 2 presents the data and a discussion about the results and Section 3 concludes.

Theoretical framework
A linear city model where employment clustering is determined by an agglomeration externality was introduced by Fujita and Ogawa (1982)[5]. An improvement was made by Lucas and Rossi-Hansberg (2002) where, in a circular city model, firms and workers compete for land at different locations, and the external agglomeration effects lead firms to outbid residential use for land near production centres. The interaction between agglomeration effects and commuting costs is then the main determinant of urban structure: firms have an incentive to be close to each other to obtain benefits from agglomeration, whereas workers prefer proximity to the workplace to minimize commuting costs. Then, market prices (land rents and wages) give firms and households incentives for making land use decisions.
Workers consume residential land and a good which is produced using labour and land. If productivity increases with employment levels in neighbouring locations, firm production per unit of land at location s, x(s), is expressed as $x(s) = A z_s^{\gamma} n^{\alpha}$, where $z_s$ represents the agglomeration effect on production at location s, A is a productivity constant, n is the number of employees and $\alpha < 1$ [6]. The profit per unit of land at location s is represented as $q(s) = A z_s^{\gamma} n^{\alpha} - w(s)\,n$, where w(s) is the wage rate. Firms choose employment n to maximize profits. From the first-order condition we obtain $n = n(w, z)$ and $q = q(w, z)$. Therefore, given w and z, the business bid rent is determined [7]. The model implies that land use depends on the difference between bids made by households and firms.
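The first-order condition can be written out explicitly. The following is a sketch that treats $w$ and $z_s$ as given at a location and uses only the Cobb-Douglas form above. The closed-form expressions for $n^{*}$ and $q^{*}$ are derived here for illustration and are not stated in the text.

```latex
\begin{align*}
q(s) &= A z_s^{\gamma} n^{\alpha} - w(s)\, n, \qquad 0 < \alpha < 1, \\
\frac{\partial q}{\partial n} = 0
  &\;\Longrightarrow\; \alpha A z_s^{\gamma} n^{\alpha - 1} = w(s), \\
n^{*}(w, z) &= \left( \frac{\alpha A z_s^{\gamma}}{w(s)} \right)^{\frac{1}{1 - \alpha}}, \\
q^{*}(w, z) &= \frac{1 - \alpha}{\alpha}\, w(s)\, n^{*}(w, z).
\end{align*}
```

Under these assumptions, whenever $\gamma > 0$ both employment and the bid rent per unit of land increase with $z_s$, which is the channel through which agglomeration raises what firms are willing to pay for central land.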
For simplicity, let us assume that informal and formal firms are identical except that $\gamma = 0$ for informal firms, because they are not allowed to have formal business contracts. Setting $\gamma = 0$ means that informal firms cannot benefit from agglomeration effects. These informal firms have the following production function: $x(s) = A n^{\alpha}$. The model then predicts that these firms will be less productive and smaller. Now let us suppose that firms can choose to be formal or informal at a given cost c > 0. Given this assumption, firms will be less likely to choose to be informal at higher levels of agglomeration. This happens because informal firms benefit less from the positive technological and pecuniary externalities that arise in close spatial proximity, owing to legal restrictions that block the relationship with formal firms. Formal property allows assets to be identified and linked to other assets in the economy. Then, to benefit from external effects, the owner has incentives to formalize the business. As a result, we will find formal firms located in high-density areas, while informal firms will be located in low-density areas. Accordingly, we aim to estimate the causal effect of agglomeration on the probability of being informal (controlling for firm size and economic sector).
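The choice between formality and informality can be illustrated with a small numerical sketch. It assumes that c is a cost paid by firms that formalize, uses the maximized Cobb-Douglas profit per unit of land, and picks purely hypothetical parameter values. None of the function names or numbers come from the paper.

```python
def max_profit(A, z, gamma, alpha, w):
    """Maximized profit per unit of land for x = A * z**gamma * n**alpha."""
    # Optimal employment from the first-order condition alpha*A*z^gamma*n^(alpha-1) = w
    n_star = (alpha * A * z ** gamma / w) ** (1.0 / (1.0 - alpha))
    # Profit at the optimum equals (1 - alpha) / alpha times the wage bill
    return (1.0 - alpha) / alpha * w * n_star


def chooses_informal(z, A=1.0, gamma=0.1, alpha=0.5, w=1.0, c=0.05):
    """True if informality is at least as profitable (hypothetical parameters)."""
    formal = max_profit(A, z, gamma, alpha, w) - c   # pays c, benefits from z
    informal = max_profit(A, z, 0.0, alpha, w)       # gamma = 0, no benefit from z
    return informal >= formal


if __name__ == "__main__":
    for z in (1.0, 10.0, 1000.0):
        print(z, chooses_informal(z))
```

With these illustrative values the firm stays informal at very low agglomeration and formalizes once z is large enough for the agglomeration gain to exceed c, which is the qualitative prediction described above.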

Data
We focus on Cali, which is the third-largest city in Colombia by population. It was founded on 25 July 1536 and is located in the west of Colombia, in the Cauca Valley. The metropolitan area of Cali has a population of about 2,200,000 and a density of 21,295 persons per square kilometre (in 2005). The city has two natural limits: the limit to the east is the Cauca River and the limit to the west is the Western Mountain Range (Cordillera Occidental) [8]. The urban area is divided into 338 administrative neighbourhoods with an average area of 0.36 km².
Our main source of information is the Economic Census carried out by the National Institute of Statistics (Departamento Administrativo Nacional de Estadística, DANE). The database contains population data as well as establishment-level information, including employment, economic sector (two-digit level), geographical location and compliance with legal requirements. From now on, we will refer to establishments as firms.
Information about compliance with legal requirements enables us to identify informal firms which, although they produce legal goods and services (e.g. bread, shoe repair, grocery shops), do not fully comply with legal regulations [9]. This means that informal firms are known, but not registered in the Chamber of Commerce (Camara de Comercio de Cali), which certifies firm ownership. These firms evade taxes, have less rigorous bookkeeping, do not contribute to social security and face restrictions on formal financial credit (Cárdenas and Rozo, 2009) [10]. Furthermore, informal firms are not allowed to have commercial or financial relations with formal or public firms. In Cali, there are 22,208 informal firms, which is about 43 per cent of the 51,457 firms.
We measure agglomeration, $z_s$, as a weighted average of the number of jobs at locations h, with weights that are a decreasing function of distance between s and h (see also Koster and Rouwendal, 2013) [11]. To be specific, $z_s$ is defined as follows: $z_s = \sum_h e^{-\delta d_{sh}} J_h$, where $J_h$ denotes the number of jobs at location h, $d_{sh}$ denotes the distance between locations s and h, and $\delta > 0$ is a given decay parameter. The external effect is more localized for higher values of $\delta$, which implies that the value to a firm of locating near other producers is also higher. Figure 1 shows the weights as a decreasing function of distance for different values of the decay parameter $\delta$. The horizontal axis measures the distance between locations. For instance, if $\delta = 5$, $z_s$ will be close to the number of jobs at location s itself; if $\delta = 0.5$, the surrounding areas are very important.
Panel (a) in Figure 2 shows the agglomeration variable for $\delta = 2$. We can observe that employment density is high near the city centre; as we move away from the centre we find areas of residential use. Panel (b) shows that the proportion of informal firms is higher in areas away from the city centre; and Panel (c) shows that the east of the city is affected by a liquefaction risk, which is present in areas where the soil is saturated with water and then acts like a liquid when shaken by an earthquake [12]. Of the 338 neighbourhoods, 158 are affected by liquefaction risk in the east of the city, which represents 47 per cent of the neighbourhoods.
Table I presents the descriptive statistics of the main variables by neighbourhood. On average, the share of informal firms in a neighbourhood is 0.42, with a standard deviation of 0.31. On average, a firm has five workers and the city has around 727 jobs per neighbourhood, of which 551 are formal jobs and 176 are informal. The area of a neighbourhood is, on average, 0.36 km². The Central Business District (CBD) is localized in the city centre of Cali, around the Plaza de Caicedo; it is identified by the orange circle in the first map of Figure 1 and is the neighbourhood with the highest employment density. The average distance between a neighbourhood's centroid and the CBD is approximately 3.8 km, and the average distance between a centroid and the main corridors of the city is 560 m. The next section describes the methodology.
Notes (Table I): 338 neighbourhoods. In levels, the agglomeration variable has a mean equal to 3,648, and the standard deviation is 4,020. We use these data to calculate the effect of a one standard deviation change.
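The agglomeration measure can be sketched in a few lines. The sketch assumes exponential weights exp(-δ·d) applied to jobs at every location, including s itself (following the exponential decay mentioned in note 6), and straight-line distances between centroids in kilometres. The function name, the toy coordinates and the job counts are hypothetical.

```python
import math


def agglomeration(jobs, coords, delta):
    """Distance-weighted jobs around each location (assumed exponential decay).

    jobs   -- number of jobs J_h at each location
    coords -- (x, y) centroid of each location, in km
    delta  -- decay parameter; larger values make the effect more local
    """
    z = []
    for xs, ys in coords:
        total = 0.0
        for (xh, yh), j_h in zip(coords, jobs):
            d_sh = math.hypot(xs - xh, ys - yh)    # distance between s and h
            total += math.exp(-delta * d_sh) * j_h  # own location has weight 1
        z.append(total)
    return z


if __name__ == "__main__":
    jobs = [100, 50]
    coords = [(0.0, 0.0), (1.0, 0.0)]  # two locations 1 km apart
    for delta in (0.5, 2.0, 5.0):
        print(delta, agglomeration(jobs, coords, delta))
```

As δ grows, the value for each location approaches its own job count, while for small δ the neighbouring location contributes substantially, in line with the discussion of Figure 1.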

Methodology
We estimate a probit model where the dependent variable, $y_{is}$, is equal to 1 for an informal firm i in a neighbourhood s, and equal to 0 otherwise. We are interested in the effect of agglomeration $z_s$, hence we assume: $\Pr(y_{is} = 1 \mid X) = \Phi(X\beta)$, where $\Phi$ is the standard normal distribution function and X contains the agglomeration variable $z_s$ and variables which represent firms' characteristics such as industrial sector (two-digit-level ISIC) and the number of workers. Moreover, it includes spatial variables such as distance from the city centre (in discrete categories to capture non-monocentric effects), distance from the main corridors (also in discrete categories), and X and Y coordinates, which are included to control for unobserved factors that vary smoothly over space.
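To make the probit specification and the notion of a marginal effect at the means concrete, the following is a minimal sketch using only the standard library. The coefficient vector and covariate values are hypothetical placeholders, not estimates from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def probit_probability(x, beta):
    """Pr(y = 1 | x) = Phi(x'beta) for one observation."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return normal_cdf(index)


def marginal_effect_at(x_bar, beta, k):
    """Marginal effect of regressor k evaluated at x_bar: phi(x_bar'beta) * beta_k."""
    index = sum(xk * bk for xk, bk in zip(x_bar, beta))
    return normal_pdf(index) * beta[k]


if __name__ == "__main__":
    beta = [0.0, -0.5]   # hypothetical: constant, log agglomeration
    x_bar = [1.0, 0.0]   # hypothetical covariate means (demeaned log agglomeration)
    print(probit_probability(x_bar, beta))
    print(marginal_effect_at(x_bar, beta, 1))
```

The marginal effect depends on where the index is evaluated, which is why the table notes specify that effects are computed at the means of the independent variables.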
The agglomeration variable, $z_s$, depends on the value of $\delta$, which is unknown [13]. We have estimated the model for different values of $\delta$ (from 0.1 to 5). The maximum fit occurs when $\delta$ is 2 (see Figure A5 in the Appendix: R² and log-pseudo likelihood using different values of $\delta$). We report results using this value, which implies that the agglomeration effect disappears within 2 km. Panel (a) in Figure 2 shows the geographical distribution of agglomeration and Table I shows that the logarithm of this variable has a mean of 7.81 with a standard deviation of 0.87; we use values in levels to interpret the effect of the variable on the probability of being informal [14].
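The search over the decay parameter amounts to a grid search on a fit criterion. The sketch below assumes a user-supplied function that rebuilds the agglomeration variable for a given δ, re-estimates the model and returns its log-likelihood. Here that function is replaced by a toy stand-in, so the names and the grid are illustrative only.

```python
def select_delta(candidates, fit_loglik):
    """Return the candidate decay parameter with the highest log-likelihood."""
    best_delta = None
    best_value = float("-inf")
    for delta in candidates:
        value = fit_loglik(delta)  # in practice: rebuild z_s, re-estimate the probit
        if value > best_value:
            best_delta, best_value = delta, value
    return best_delta


def toy_loglik(delta):
    """Toy stand-in for a model fit; peaks at delta = 2 by construction."""
    return -(delta - 2.0) ** 2


if __name__ == "__main__":
    grid = [0.1, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
    print(select_delta(grid, toy_loglik))
```

This treats δ as a tuning parameter chosen by fit instead of estimating it jointly with the other coefficients, which note 13 describes as cumbersome.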

Main results
The estimated marginal effects using information from 51,454 firms are presented in Table II (standard errors are clustered by neighbourhood). We control for size and industrial sector of the firm (to control for productivity), distance from the CBD and distance from main corridors. In the probit estimation the marginal effect of agglomeration is negative (−0.1051), which means that a one standard deviation increase in agglomeration reduces the probability of being informal by 7.8 per cent [15]. The estimated effects of the probit model are likely to be biased because of the presence of omitted variables which are correlated with agglomeration and the probability of informality (Ellison and Glaeser, 1999; Bayer and Timmins, 2007). These omitted variables may be related to the educational level of employees and fixed assets of the firm.
Notes (Table II): Marginal effects at the means of the independent variables. Clustered standard errors by neighbourhood in parentheses; *** p < 0.01; a The F-statistic for weak instruments is higher than 10, which implies that the instrument has a strong effect on agglomeration (Stock and Yogo, 2005).
Using an instrumental variable that is correlated with agglomeration but uncorrelated with any unobserved locational advantage may correct the bias. Geological variables such as soil composition, rock depth, water capacity, soil erodibility and seismic and landslide hazard have been used for coping with endogeneity (Rosenthal and Strange, 2008;Combes et al., 2010;Combes and Gobillon, 2015). Characteristics of soil were important to localize original settlements, and agglomeration processes have then developed in those areas. In that case, the instrument is relevant. Our instrument is based on liquefaction risk, which refers to the strength and stiffness of the soil when it is affected in the case of earthquakes. Liquefaction risk is present in areas where the soil is saturated with water and then acts like a liquid when shaken by an earthquake. Earthquake waves cause water pressure to increase in the sediment, so sand grains lose contact with each other and the soil loses its ability to support high buildings. It can be argued that the instrument is exogenous because there is no reason to say that the probability of being informal affects the liquefaction risk of one area.
Panel (c) in Figure 2 shows areas with liquefaction risk in Cali. This risk is present in the east of the city because of the Cauca River [16]. For most neighbourhoods the risk is either zero or one. When only part of the neighbourhood is affected we use the share of the area where the risk is present. Clearly, the instrument does not vary randomly over space [17]. In the IV-probit estimation we also control for firm size and industrial sector, and the reduction in the probability of being informal is 52 per cent [18]. Table AI-col 1 in the Appendix shows the results of the first-stage estimation. The F-statistic for weak instruments is higher than 10, which implies that the instrument has a strong negative effect on agglomeration. We confirm that firm size is negatively related to the probability of being informal, as mentioned in the literature. In our results the IV estimated coefficient is more negative than the probit estimated coefficient. Hence, addressing the endogeneity problem yields a stronger estimated relationship between agglomeration and informality.
Table III shows that when formal agglomeration increases by one standard deviation the probability of being informal diminishes by 9.3 per cent [19] according to the probit model and by 27.4 per cent [20] according to the IV-probit model. Table AI-col 2 in the Appendix shows the results of the first-stage estimation. These results allow us to conclude that the urban structure is determined by formal agglomeration and that informal firms occupy spaces that formal firms do not occupy. This means that informal firms have to make decisions under a different set of constraints, including those that link them to the formal sector, and supports the hypothesis that informal firms face restrictions that do not allow them to benefit from agglomeration externalities.
These firms are marginalized from accessing the same set of external effects or participating in the same economic transactions as their formal counterparts (Moreno-Monroy, 2012). In short, formalization is fundamental to reap all the benefits associated with property rights.
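The reported one standard deviation effects appear to follow from the marginal effects in Table II combined with the level statistics in the notes to Table I. The sketch below assumes that the marginal effect applies to the logarithm of agglomeration and that the change is evaluated from the mean (3,648) to the mean plus one standard deviation (4,020) in levels. This reading is inferred from the numbers, since the corresponding footnotes are not available here.

```python
import math


def one_sd_effect(marginal_effect, mean, sd):
    """Change in probability when the level moves from mean to mean + sd.

    Assumes the marginal effect is with respect to log agglomeration.
    """
    change_in_log = math.log(mean + sd) - math.log(mean)
    return marginal_effect * change_in_log


if __name__ == "__main__":
    mean, sd = 3648.0, 4020.0                 # agglomeration in levels (Table I notes)
    print(one_sd_effect(-0.1051, mean, sd))   # probit marginal effect (Table II)
    print(one_sd_effect(-0.6953, mean, sd))   # IV-probit marginal effect (Table II)
```

Under this assumption the two calls give roughly −0.078 and −0.52, which match the 7.8 per cent and 52 per cent reductions reported for the probit and IV-probit models.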

Sensitivity analysis
We re-estimate the effect of agglomeration on the probability of being informal using different specifications. It is important to bear in mind that we have 338 neighbourhoods. There is extreme variation in the level of agglomeration (in logs, from 4.3 to 10.3), so the result may be sensitive to extreme outliers. We therefore estimate the model excluding 20 neighbourhoods with the lowest and highest agglomeration levels (the variation in logs is now from 6.4 to 9.2). The results are similar: when the agglomeration variable increases by one standard deviation the probability of being informal diminishes by 7 per cent [21] in the probit model and by 39 per cent [22] in the IV specification. Table IV shows descriptive statistics for the agglomeration variable without outliers, and Table V reports the model estimations for the probit and IV-probit specifications (the first-stage estimation of the IV model is reported in the Appendix: Table AI).
The validity of our instrument can be questioned as it is non-random over space. We therefore re-estimate the model for observations within 1 km of the liquefaction limit. Panel (c) in Figure 2 shows the liquefaction limit as the division between areas affected and not affected by the liquefaction risk. The 1 km restriction is intended to include in the regression analysis similar neighbourhoods where the most important difference is the condition of liquefaction risk. The results are shown in Table VI. The estimated marginal effect is −0.1103 for the probit model and −0.4197 for the IV model. These results are in the same direction as the marginal effects in Table II, −0.1051 and −0.6953, respectively (the first-stage estimation of the IV model is reported in the Appendix: Table AI-col. 4).

Conclusion
The literature provides evidence about the relationship between spatial density and aggregate increasing returns. Moreover, the structure of a city is determined by a production externality under which employment at any site is more productive when the employment at neighbouring sites is higher. As profit increases with productivity, firms should locate where their expected profit is highest. Nevertheless, when the property-rights system fails, the market cannot work efficiently, because the owner has de facto rights to their property but does not have a legally enforceable title. The literature says little about the structure of cities where the share of the informal sector is significant. In particular, we analyse the case of a Colombian city where 42 per cent of firms are informal. We shed light on how formal firms tend to be localized in high-density commercial areas, while informal firms are localized in low-density and peripheral areas where the land is cheaper and where they can avoid the control of authorities.
We aim to estimate the effect of formal agglomeration on the probability of being informal, which could be interpreted as a local share of informal firms. The main result is that when agglomeration increases by one standard deviation the probability of being informal diminishes by around 52 per cent. This may happen because informal firms have less opportunity to benefit from agglomeration effects because of legal restrictions that block their relationship with formal firms. The result explains why formal and informal firms display different locational patterns in the urban structure. We conduct an IV analysis to tackle the potential endogeneity problem.
Notes: Marginal effects at the means of the independent variables. Clustered standard errors by neighbourhood in parentheses; *** p < 0.01; ** p < 0.05; * p < 0.10

Notes
1. In developing countries, however, migration from rural to urban areas has been associated with push rather than pull factors, because the population is expelled from rural areas rather than attracted to urban areas by the prospects of better living standards (Bairoch, 1988; Barrios et al., 2006).
2. As is well known, the literature focuses on technological spillovers, labour pooling and intermediate input linkages (Marshall, 1890;Ellison et al., 2010), and sharing, matching and learning effects, as in Duranton and Puga (2004).
3. There are different perspectives on analysing informality. Dualism assumes that informal firms do not have linkages with formal firms. Structuralism assumes that formal and informal firms are intrinsically linked (formal firms aim to reduce their input costs by promoting informal activities). Legalism focuses on the regulatory environment of the relationship between formal and informal firms (Chen, 2006; Perry et al., 2007).
5. The Alonso-Mills-Muth model has served as the most important base to analyse urban spatial structures but assumes employment clustering in a city centre (Alonso, 1969; Mills, 1967; Muth, 1969).
6. The agglomeration effect, $z_s$, is calculated using the employment at neighbouring locations. The function is assumed linear and decays exponentially at a rate $\delta$ with the distance from s.
7. The bid rent is defined as the rent per unit of land that a firm will be willing to pay.
8. Figure A1 in the Appendix shows the geographical location of the city.
9. At the moment of collecting information, business owners are informed that if they declare that the business fails to comply with legal regulations, they would not experience any negative legal consequences (the information provided is confidential). Informality is not persecuted, which leads business owners to provide truthful information.
10. Business informality is closely related to labour informality. Informal workers are characterized by lower levels of education and wages.
11. Fujita and Ogawa (1982) define the externality effect as a potential of employment.
12. Figure A2 in the Appendix shows the map for job and population density (per km²) and Figure A3 shows cumulative population and employment (formal and informal) as a function of distance from the city centre in kilometres.
13. The parameter $\delta$ can be estimated with non-linear regression estimation procedures, but this is cumbersome.
14. Figure A4 in the Appendix shows that the logarithm of agglomeration is approximately normally distributed.
16. It is one of the main rivers of Colombia, measuring some 1,350 km in length.
Notes (Table AI): Standard errors clustered by neighbourhood in parentheses; *** p < 0.01. (+) controlling the distance 1 km from liquefaction limit