Abstract
Purpose
Failed deliveries (i.e. deliveries not accomplished due to the absence of customers) represent a critical issue in B2C (business-to-consumer) e-commerce last-mile deliveries, implying high costs for e-commerce players and negatively affecting customer satisfaction. A promising option to reduce them would be scheduling deliveries based on the probability of finding customers at home. This work proposes a solution based on presence data (gathered through Internet of Things [IoT] devices) to organise the delivery tours, which aims to both minimise the travelled distance and maximise the probability of finding customers at home.
Design/methodology/approach
The adopted methodology is a multi-method approach, based on interviews with practitioners. A model is developed and applied to Milan (Italy) to compare the performance of the proposed innovative solution with traditional home deliveries (both in terms of cost and delivery success rate).
Findings
The proposed solution implies a significant reduction of missed deliveries if compared to the traditional operating mode. Accordingly, even if allocating the customers to time windows based on their availability profiles (APs) entails an increase in the total travel time, the average delivery cost per parcel decreases.
Originality/value
On the academic side, this work proposes and evaluates an innovative last-mile delivery (LMD) solution that exploits new AI (Artificial Intelligence)-based technological trends. On the managerial side, it proposes an efficient and effective novel option for scheduling last-mile deliveries based on the use of smart home devices, which has a significant impact in reducing costs and increasing the service level.
Keywords
Citation
Seghezzi, A. and Mangiaracina, R. (2023), "Smart home devices and B2C e-commerce: a way to reduce failed deliveries", Industrial Management & Data Systems, Vol. 123 No. 5, pp. 1624-1645. https://doi.org/10.1108/IMDS1020220651
Publisher
Emerald Publishing Limited
Copyright © 2023, Arianna Seghezzi and Riccardo Mangiaracina
License
Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
1. Introduction
B2C e-commerce is the online sale of products and services directly to the final consumer. In spite of the burst of the “dot com bubble” in the early 2000s, many countries have since experienced first-hand the dramatic rise of electronic commerce. Online sales have been steadily growing during the last decade, and the number of online shoppers has been increasing in different industries. This widespread trend is expected to continue in the future, also due to the changing shopping behaviour of customers (Kandula et al., 2021).
Despite the intangible nature of online transactions, the management of logistics plays a crucial role in determining the success of companies selling products online (Mangiaracina et al., 2019). Moreover, the logistics service offered by e-tailers has emerged as one of the key factors influencing the customers' decision to shop with them (Ma et al., 2022). Many works in the literature address the different logistics issues that B2C e-commerce opens for companies compared to traditional commerce. Some authors focus on the design of the distribution network, to find the right number, type and location of infrastructures needed to deliver products to the final consumers (Arnold et al., 2018). Other works analyse instead the activities that are performed within the warehouses to deal with the fulfilment of B2C orders, e.g. picking (Hübner et al., 2016). Within the logistics field, however, it is last-mile delivery (LMD) that has most captured the attention of both academics (whose contributions on the topic have been flourishing) and practitioners (who have been striving to find strategies to manage it efficiently and effectively).
LMD represents the “last stretch” of the order fulfilment, aimed at delivering the products ordered online to the final consumers, either at their home or at a collection point (Lim et al., 2018). It has a significant impact on both efficiency – since it is very expensive – and effectiveness – since it constitutes the interface with the final customers, who directly perceive the associated service level performance (Pan et al., 2017). In all the major markets, the dominant B2C delivery mode is represented by the so-called “attended home delivery”, which requires the customers to be at home to collect the parcel and sign a delivery receipt before the courier leaves for the next destination (Han et al., 2017). In this context, the possible absence of the customer prevents the couriers from accomplishing the delivery. This phenomenon – referred to as “failed deliveries” – is addressed by both academic and managerial efforts, since it has strong negative effects on LMD performance. On the one hand, it implies high costs for e-commerce players, which need to reschedule the missed deliveries in subsequent tours; on the other hand, it significantly affects the satisfaction of customers, who are typically bothered not to receive their parcel. The failure rate of deliveries may reach 25% according to different authors (Edwards et al., 2010; Song et al., 2009; Van Duin et al., 2016). As a result, it often happens that parcels need to be moved two or even three times before being successfully delivered.
A possible way to reduce the occurrence of this problem could be scheduling the deliveries so as to maximise the probability of finding the customers at home when parcels arrive, thus defining the delivery tours based on the probability profiles of the customers being at home. In order to build these probability profiles, data about their presence at home should be collected, aggregated and processed. A promising solution for gathering this type of data is represented by Internet of Things (IoT) smart home devices, whose diffusion has been significantly growing in recent years, also in less mature markets. The extensive use of these devices opens a fruitful area of work, since they allow for the development of innovative and sustainable logistics solutions in the urban freight logistics context (Al-Turjman et al., 2022; Pan et al., 2021).
Driven by the continuous growth of e-commerce, the significance of the failed delivery issue and the ability of smart home objects to easily collect customer data, this research aims to exploit the potential of such devices in improving LMD performance (in terms of both efficiency and effectiveness). More specifically, it proposes a solution to schedule delivery tours based on customers' presence data (gathered through IoT devices), which aims to concurrently minimise the travelled distance and maximise the probability of finding customers at home. A model is developed and applied to Milan (Italy), to compare the performance of the proposed innovative solution with traditional home deliveries (in terms of both costs and delivery success rate). On the academic side, this research proposes an innovative data-driven LMD solution that exploits new IoT-based technological trends and introduces advancements in literature concerning routing for B2C e-commerce. On the managerial side, this solution represents an efficient and effective novel option for scheduling last-mile deliveries relying on the use of smart home devices, which has a significant impact in reducing costs and increasing the service level.
The remainder of this paper is organised as follows: the second section presents the results of the literature review; the third section defines the research questions and the adopted methodologies; the fourth and fifth sections illustrate the model development and its application (including the sensitivity analysis); the sixth and last section summarises the conclusions stemming from the work.
2. Literature review
There is strong agreement in recognising how failed deliveries hugely impact both efficiency – increasing the operative costs faced by companies – and effectiveness – reducing the service level perceived by the customer (Wang et al., 2018) – of LMD. As a matter of fact, missed deliveries must be assigned to a subsequent delivery tour, and sometimes two or even three attempts may be necessary to succeed (Van Duin et al., 2016). Hence, scientific literature shows many attempts to develop strategies aimed at maximising the so-called “hit rate”, i.e. the rate of successful deliveries in LMD.
Multiple authors propose unattended delivery modes, which do not require the presence of the customer, such as parcel lockers (Tsai and Tiwasing, 2021), pick-up points (Bjerkan et al., 2020) or the more innovative in-car delivery (with parcels being delivered to the trunk of the customer's car while it is parked in public places) (Reyes et al., 2017). Nonetheless, home delivery still represents the preferred option for the majority of e-customers, who typically see this service as one of the main advantages of online shopping (Kedia et al., 2017). Accordingly, both practitioners and academics are focussed on optimising such a solution. An emerging and promising trend in this direction is referred to as “data-driven last-mile delivery innovation”, and it applies data mining and data analytics tools to collect and process data aimed at improving the performance of LMD. Table 1 showcases the selected literature in this direction, classifying the major contributions along relevant axes, thus displaying the main differences compared to the present work.
The two earliest – and major – works are those by Pan et al. (2017) and Florio et al. (2018). Pan et al. (2017) propose exploiting data about electrical energy consumption to build customer home attendance profiles. More specifically, the model detects the presence or absence of the customer at home through a binary function, based on the combination of peaks or significant variations in electricity consumption. However, the increasing diffusion of more innovative households, often integrated with automated devices, may make this option not very reliable. Moreover, the considered number of customers to be visited is limited (only 15). In the paper by Florio et al. (2018), customer home attendance is instead estimated based on historical data about past deliveries and preferential time windows. Both these methods introduce unreliability issues, since the time needed to gather the required amount of data may be significant, and during such a long period, people may change their habits. Furthermore, cases with no information availability are associated with a 100% attendance probability, and this introduces biases in the results. Finally, the objective function only aims to maximise hit rate, while the travel time is introduced as a constraint, thus not considering the trade-off between the two dimensions. In addition, both the contributions consider home deliveries only, without contemplating the practice, common in some areas, of asking for collection at a workplace or at addresses different from the customer's house.
This issue is overcome in the following work by Praet and Martens (2020), who instead propose analysing GPS (Global Positioning System) data about the position of the customer throughout the day, collected through their mobile phone. In this solution, all the potential positions where the customers spend more than 5 min are registered and used to predict future locations, with the ultimate goal of proposing to them a convenient location-time combination for the subsequent deliveries. Therefore, the considered problem is different from the traditional home deliveries, since it requires on-appointment deliveries, which drastically increase the LMD cost. As a matter of fact, when customers choose the preferred delivery time option, they influence the sequence of destinations to be reached in the tour and, as a consequence, companies are not able to optimise the delivery route (Boyer et al., 2009). Moreover, the paper simply introduces this location-prediction solution, but does not directly apply it to the routing, i.e. no VRP (vehicle routing problem) models are developed or applied. Finally, a weekly time horizon is set to plan the schedule, which does not suit most e-commerce deliveries well: online players are increasingly moving towards super-fast deliveries (usually no more than 72 h from the order), adhering to the standards launched by the top players in order to stay competitive on the market.
The research by Kandula et al. (2021) makes significant advancements in multiple directions compared to the previous works, especially considering the complexity of the numerical experiments (i.e. from a computational viewpoint). The model is in fact applied to a vast number of customers and includes a wide variety of candidate delivery locations (not only customers' houses). Nonetheless, the data considered to estimate the presence probability are still aligned with less innovative and earlier works, since the proposed solution develops the VRP based on historical order delivery and location data. In addition, as recognised by the authors themselves, the obtained outcome results in an unequal distribution of orders among the different delivery agents, and this is an issue online players need to avoid in order to saturate – and thus reduce – the number of used vehicles (Reyes et al., 2017). Furthermore, according to the proposed model, the orders need to be assigned a priority scale a priori in order to allow the algorithm to schedule them. This procedure is also not applicable to most real deliveries, which are deemed to be “equally important” (Mangiaracina et al., 2019).
Making an additional step forward, the recent contribution by Chu et al. (2021) recommends collecting multiple data from multiple sources (including the distance among different customers, weather conditions, the season, the profile and expected behaviour of drivers, as well as real-time traffic data) to predict the travel time between two subsequent customers. While different cost components are included in the objective function (not limited to those associated with the travel time), the major discrepancy with the other works – including the present research – is that it focusses on on-demand food delivery (i.e. the delivery of freshly prepared meals from restaurants). The specificities of this platform-based industry generate an LMD problem that is completely different from the traditional one (pickup and delivery problem vs VRP). Among them, there are rigid precedence constraints to be respected for pickups and deliveries (i.e. the order pickup at the restaurant must happen before the delivery to the customer, so the restaurant needs to be reached before getting to the associated customer) (Yildiz and Savelsbergh, 2019). Moreover, the delivery lead times for on-demand food delivery are very short, as the meal must typically be delivered very quickly from the moment it has been cooked (often within 15 min). Hence, the solution and outcomes of this work are not applicable to traditional LMD.
In line with these premises, despite the advancements made by the presented papers in the data-driven LMD field, much remains to be done in this respect; accordingly, there are still some major gaps, which the present research aims to overcome along the identified directions. More in detail, there seems to be a lack of works concurrently:
proposing solutions that go well beyond the analysis of traditionally collected-and-used data about past deliveries and envisaging more innovative data-driven options;
developing VRP variants building on these data;
applying the experiment to a high number of real customer positions, that are
not only referred to houses but also alternative locations such as the workplace and
which are not targeting a specific food-related sector, but may be applied to the generic B2C parcel delivery industry.
3. Objectives and methodologies
Based on the above, this work proposes a solution aimed at reducing missed deliveries through the analysis of the customer presence profiles, based on data collection performed by smart home devices (e.g. smart speakers), which are technically able to detect people's presence at home. The research has two main goals: first, understanding how the probability distribution of finding customers at home may be integrated into the VRP, overcoming the limits of the models currently proposed in literature and second, evaluating the effect on both effectiveness (hit rate) and efficiency (LMD cost) of implementing such a solution with respect to traditional VRPs. In other words, the following research questions are addressed:
How can customers' home attendance profiles be integrated into the VRP for LMD?
What is the impact of this innovative VRPAP on LMD performance?
In order to answer these research questions, two main steps were performed. First, an innovative VRPAP (vehicle routing problem with availability profile) – based on both the travelled distance and the expected probability of finding the customer at home – was developed. It aims at maximising successful deliveries and computing the associated delivery cost and hit rate. Second, the developed model was applied to a realistic scenario in the Milan area (Italy), where costs and hit rates are estimated both for the proposed innovative VRPAP and for the traditional VRP cases, to compare the performance of the two options. The two-step methodology used (development of an analytical model and its application to a realistic context) is widely adopted in literature dealing with LMD innovations (e.g. Qi et al., 2018). Similarly to the previous works in this direction (e.g. Pan et al., 2017), the application of the model does consider the optimal solutions for both the VRP and VRPAP (and no solving algorithms are applied).
The VRP formulated in this work may be considered as halfway between a traditional VRP and a VRPTW: it schedules deliveries in specific time-windows, which are not imposed by the customers, but are found by the model itself based on the maximisation of the probability of finding the customer at home in that period of time. Further details follow in the Model section. For literature about the VRPTW, the interested reader is referred to Baldacci et al. (2012) for a review of exact methods, to Bräysy and Gendreau (2005a) for route constructions and local search algorithms and to Bräysy and Gendreau (2005b) for metaheuristic methods.
Three additional methodologies were employed to support the development and application of the model:
Literature review, with a twofold objective. On the one side, to get a deep understanding of the LMD process, the associated failed delivery issue and the main solutions proposed and analysed by the academic community, thus grounding the research in the extant knowledge and setting the right research objectives. On the other side, to provide methodological support in the model development phase, since it allowed understanding how the methodologies and models used in literature could be integrated into the present work (review about the VRP).
Interviews with practitioners operating in the business (e-commerce retailers and logistics service providers, such as express couriers), in three different moments. (i) First, during the model development: qualitative one-to-one interviews were conducted to gain insights about the considered innovative solution, to build a solid base for the cost modelling phase. These interviews were semi-structured, as this format allows new ideas to emerge and parameters and variables not previously recognised by the authors to be identified. (ii) Second, during the model application: structured interviews were performed to gather quantitative data to feed the model. These data collection interviews were supported by checklists reporting all the main variables and parameters for which numerical values were needed (Nutting et al., 2002). (iii) Third, to validate the results: once the results (both for the base case and the sensitivity analyses) were found, group interviews – in which all the practitioners discussed together guided by a moderator – made it possible to both validate the outcomes and better read and interpret them. The group interview is more effective than single interviews, as interviewing the participants simultaneously makes it possible to combine and stimulate their mutual contribution (Urciuoli and Hintsa, 2017).
Analysis of secondary sources (e.g. case studies performed by other researchers, websites of logistics service providers, journals for logistics practitioners, reports) to triangulate information coming from the literature and the interviews (Jick, 1979).
A consideration should be made about the nature of this work. Both the model itself and the application (based on actual optimisation and not on the use of solving algorithms) may be considered as not advanced compared to other works addressing VRPs. This choice is grounded on both academic and managerial bases. On the academic side, the reason is twofold. First, this work is aligned with literature addressing the use of AI solutions to innovate LMD, whose analytical component is typically rather simple. As a matter of fact, the main aim of these works is to present a first evaluation of AI innovations for LMD and not to make an advancement in the operational research applied to distribution. An example is the work by Pan et al. (2017) – with respect to which the present research moves several steps forward – where the solution is found by optimally solving a VRP on only 15 customers. Second, as anticipated, the target academic audience of the model and of the subsequent analysis pertains to the operations management domain and not to the operational research one. Accordingly, the value of the research does not lie in the analytical component of the model itself, but in the idea behind it and in the conclusions and implications that may be drawn from the outcomes. Considering the managerial perspective, the work was developed in close collaboration with (and is addressed to) practitioners from the sector, for whom a simple tool is more user-friendly and more easily understandable. As reported in section 6, this could be a starting point to be further developed.
4. Model development
In line with prior contributions addressing the implementation of data-driven tools for LMD, this work first briefly proposes a solution to collect and process data about the presence of customers (section 4.1) and then focusses on the definition of the new VRPAP to schedule deliveries based on those data and on the analysis of the performance of such a solution compared to the traditional VRP in terms of both delivery cost and hit rates (section 4.2).
4.1 The solution
The proposed IoT-based solution relies on five different ways through which smart home devices may detect the presence of customers at home:
Home Assistant interaction with customers or detection of any conversation in the house;
smart appliances interaction with customers or detection of customer presence (e.g. thermostat);
smartphone pairing with the Home Assistant via Bluetooth connection;
smartphone pairing with the Home Assistant via Wi-Fi (“Wireless Fidelity”, i.e. wireless high-speed Internet) connection and
smartphone localisation detection via GPS (Global Positioning System).
These interactions can be monitored in a discrete way (suggested unit of analysis: one minute), and the customer home attendance is marked as positive if at least one of the five conditions is verified. Based on the collected data, customer home availability profiles (APs) may be built (associating to each moment of the day the probability of the customer being at home) and periodically updated. This process is shown in Figure 1. The APs of those customers to be visited in a delivery tour should be provided as an input to the VRP, to let the algorithm select the optimal sequence of deliveries (i.e. the sequence maximising the hit rate).
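As an illustration of how an AP could be derived from such records, the following minimal Python sketch merges the detection channels with a logical OR and then estimates, for each time unit, the share of observed days on which the customer was at home. The empirical-frequency estimator, the function names and the toy data are assumptions made for illustration only; the paper does not specify how the probabilities are computed.

```python
from typing import List, Sequence


def presence_per_minute(signals: Sequence[Sequence[bool]]) -> List[bool]:
    """Merge detection channels: a minute counts as 'at home' if any channel fired."""
    return [any(channels) for channels in zip(*signals)]


def availability_profile(days: Sequence[Sequence[bool]]) -> List[float]:
    """Per-minute share of observed days on which the customer was at home.

    Assumption: the probability is estimated as a plain empirical frequency.
    """
    if not days:
        raise ValueError("at least one observed day is required")
    n_days = len(days)
    return [sum(minute) / n_days for minute in zip(*days)]


# Toy example with 4 time units per day instead of 1,440 minutes (hypothetical data).
day_1 = presence_per_minute([
    [True, False, False, False],   # e.g. voice interaction with the assistant
    [False, False, True, False],   # e.g. smartphone paired via Bluetooth
])
day_2 = presence_per_minute([
    [False, False, True, True],    # e.g. smartphone connected to the home Wi-Fi
])
profile = availability_profile([day_1, day_2])
print(profile)
```

Periodic updating, as mentioned above, would simply amount to recomputing the profile as new days of observations are appended.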
4.2 The model – VRPAP (vehicle routing problem with availability profile)
This work proposes a VRPAP model, to integrate the APs when organising the delivery tours. This model is composed of two sub-stages (please refer to Figure 2).
First, the model performs a pre-allocation of customers to different time-windows (into which the day is divided) based on their APs, maximising the probabilities of finding the customers at home.
Second, it finds the overall optimal sequence of customers to be visited during the different time-windows, thus defining the overall daily routing.
Details about both steps may be found in the following.
4.2.1 Step 1 – pre-allocation
The pre-allocation consists in creating clusters of customers to be served in specific time-windows: it aims to maximise the probability of finding the customers at home in the time-window in which the delivery is performed. The output of this stage is a set of NC clusters of customers, each of which is associated with a specific time-window when the deliveries have to be performed.
The choice to include the pre-allocation step has both academic and managerial reasons. On the academic side, the pre-allocation step is commonly applied in papers addressing data-driven last-mile delivery innovations. In the recent – and representative – work by Kandula et al. (2021), the routing is preceded by a pre-allocation of customers to time windows, based on a defined delivery success probability threshold. From the managerial perspective, including this step was deemed valuable by the interviewed practitioners for a twofold reason: on the one hand, it dramatically reduces the required computational complexity, thus allowing for a much easier and faster real-world implementation. On the other hand, online players explicitly stated their strong interest in granting good effectiveness levels to the customers, thus suggesting the maximisation of the delivery success as a priority.
Two considerations about the way the pre-allocation is executed are needed.
First, about the total considered planning horizon. The horizon is 2 days with 8 working hours each (from 9 a.m. to 1 p.m. and from 2 p.m. to 6 p.m., considering a 1-h break between 1 and 2 p.m.). This is in line with fast e-commerce deliveries, which are typically accomplished within two days from the moment the order is issued.
Second, about the proxy selected to describe the presence of customers within a time window, starting from the probability values associated with the single time interval τ. The considered alternatives were three:
The average among all the probability values associated with the different intervals within the time window; it could result in a biased measure in case of (positive or negative) peaks within the time window.
The maximum among those probability values; it would overestimate the probability.
The maximum among the average probability values, each computed on a specific number (Tb) of time intervals included in the time window c. For instance, if Tb is 5, the average probability associated with a time interval τ is computed considering not only the probability value at τ, but also the probabilities associated with the two previous and the two subsequent moments (from τ−2 to τ+2). This approach is the one best able to detect longer intervals of high probability. Details follow in the next section.
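To make the difference among the three alternatives concrete, the sketch below computes the third proxy as the maximum of the centred moving averages of length Tb within a time window. It is only a sketch under stated assumptions: Tb is taken to be odd (as in the Tb = 5 example), only averages whose Tb intervals fall entirely inside the window are considered, and the function name and the sample probabilities are hypothetical.

```python
from typing import List, Sequence


def window_proxy(probabilities: Sequence[float], tb: int) -> float:
    """Maximum of the centred moving averages of length tb inside one time window."""
    if tb < 1 or tb % 2 == 0:
        raise ValueError("tb is assumed to be a positive odd number")
    if len(probabilities) < tb:
        raise ValueError("the time window must contain at least tb intervals")
    half = tb // 2
    # One average per interval that has 'half' neighbours on each side.
    averages: List[float] = [
        sum(probabilities[t - half:t + half + 1]) / tb
        for t in range(half, len(probabilities) - half)
    ]
    return max(averages)


# Hypothetical window: one isolated peak, then a sustained stretch of high values.
window = [0.1, 0.9, 0.1, 0.6, 0.7, 0.8]
print(sum(window) / len(window))    # alternative 1: plain average
print(max(window))                  # alternative 2: plain maximum (the isolated peak)
print(window_proxy(window, tb=3))   # alternative 3: rewards the sustained stretch
```

In this toy window the plain maximum is driven by the isolated peak in the second interval, whereas the moving-average proxy is determined by the last three intervals, where the probability stays high for longer.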
4.2.1.1 Main variables and parameters
The indexes used – which are all discrete integer values – are the following: i departure node; j arrival node; τ time bucket; t time; c cluster.
4.2.1.2 Objective function
The objective function is the following:
The objective is to create clusters that have the maximum probabilities of meeting the customer. Thus, the sum of the probability proxies of the customers, each evaluated in the assigned time window (computed as anticipated), has to be maximised.
4.2.1.3 Constraints
Different constraints are set.
Equation (1) ensures that each customer is visited once. Equation (2) forces each cluster c to contain exactly NC customers. The set of equations (3) ensures that each customer is visited within two days from the order.
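The logic of this step can be sketched as a small assignment problem. The Python snippet below enumerates all allocations of customers to time-windows, keeps only those in which every cluster has the same size, and returns the one with the highest total proxy. It is an illustrative brute-force sketch, not the formulation used in the paper: the function name, the proxy matrix and the assumption that all time-windows lie within the two-day horizon (so that the third set of constraints is satisfied by construction) are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def preallocate(proxy: Sequence[Sequence[float]],
                cluster_size: int) -> Tuple[List[int], float]:
    """Assign each customer to exactly one time-window, with equal-sized clusters.

    proxy[c][w] is the presence proxy of customer c in time-window w.
    Exhaustive enumeration: only suitable for very small instances.
    """
    n_customers = len(proxy)
    n_windows = len(proxy[0])
    if n_customers != n_windows * cluster_size:
        raise ValueError("customers must fill every cluster exactly")
    best_score = float("-inf")
    best_assignment: Tuple[int, ...] = ()
    for assignment in product(range(n_windows), repeat=n_customers):
        # Each cluster must contain exactly cluster_size customers.
        if any(assignment.count(w) != cluster_size for w in range(n_windows)):
            continue
        score = sum(proxy[c][w] for c, w in enumerate(assignment))
        if score > best_score:
            best_score, best_assignment = score, assignment
    return list(best_assignment), best_score


# Hypothetical instance: 4 customers, 2 time-windows, 2 customers per cluster.
proxy = [
    [0.9, 0.2],
    [0.8, 0.3],
    [0.7, 0.6],
    [0.1, 0.5],
]
assignment, total = preallocate(proxy, cluster_size=2)
print(assignment, total)
```

Note that the third customer is allocated to the second time-window even though its proxy is higher in the first one: the cluster-size constraint forces a trade-off across customers, which is why the allocation is solved jointly rather than customer by customer.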
4.2.2 Step 2 – routing
The routing step consists in the definition of the route to be followed during the delivery tour, i.e. the sequence of customers to be visited. As for step 1, the routing phase is composed of two sub-stages. More in detail: (1) first, the clusters of customers associated with the different time-windows – i.e. the output of the pre-allocation phase – are separately considered, and the optimal sub-routing is defined for each of them. (2) Second, the different sub-routings are combined to define the overall delivery tour.
The sub-routings are the output of the developed core model, which solves a minimisation problem on a multi-objective function that includes both the maximisation of the probability of finding the customer and the minimisation of the travel time, under specific constraints. Details about the main variables and parameters, the objective function and the constraints are included in the following.
4.2.2.1 Main variables and parameters
The main variables and parameters of the core model are shown in Table 3.
4.2.2.2 Objective function
The objective function is the following:
The first element of formula (1) computes the overall variable cost of the delivery tour for each used van k. The overall variable cost depending on the travel time is multiplied by (1 + φ_{jTw}); this expression works as a penalty that deters the scheduling of deliveries in moments in which they would most likely fail. The higher the failed delivery probability, the higher the associated distance to be travelled in order to successfully deliver the order to the ith customer. The second element (2) represents the fixed cost for each activated vehicle.
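The effect of the penalty can be illustrated with a short Python sketch that evaluates the cost of a given route for one van: each travelled leg is weighted by one plus the failure probability of the customer it reaches, and the fixed cost of the van is added. The function signature, the unit cost and the toy data are assumptions; in particular, φ is interpreted here as the probability of not finding the destination customer in the considered time-window, and no penalty is applied on the leg returning to the depot.

```python
from typing import Dict, Sequence


def penalised_route_cost(route: Sequence[int],
                         travel_time: Sequence[Sequence[float]],
                         failure_prob: Dict[int, float],
                         cost_per_minute: float,
                         fixed_cost: float) -> float:
    """Fixed van cost plus travel cost, each leg weighted by (1 + failure probability)."""
    variable_cost = 0.0
    for origin, destination in zip(route, route[1:]):
        # Nodes without an entry (e.g. the depot) carry no penalty.
        penalty = 1.0 + failure_prob.get(destination, 0.0)
        variable_cost += cost_per_minute * travel_time[origin][destination] * penalty
    return fixed_cost + variable_cost


# Hypothetical data: node 0 is the depot, nodes 1 and 2 are customers.
travel_time = [
    [0.0, 10.0, 20.0],
    [10.0, 0.0, 5.0],
    [20.0, 5.0, 0.0],
]
failure_prob = {1: 0.5, 2: 0.0}
cost_a = penalised_route_cost([0, 1, 2, 0], travel_time, failure_prob, 1.0, 50.0)
cost_b = penalised_route_cost([0, 2, 1, 0], travel_time, failure_prob, 1.0, 50.0)
print(cost_a, cost_b)
```

The two routes cover the same unpenalised travel time (35 min), but the second one reaches the customer with the higher failure probability through a shorter leg, so its penalised cost is lower. This is the mechanism through which the penalty discourages long trips towards customers who are unlikely to be at home.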
4.2.2.3 Constraints
Different constraints are set.
Equation (4) ensures that each node has at most one outgoing arc activated by just one vehicle. This consideration is valid for each node except for the first one (i.e. the depot), which has as many outgoing and ingoing active links as there are vans in use. Equation (5) ensures that, if a node has an entering arc, it also has an outgoing one (i.e. if the van visits one customer, it also has to start again and move from that customer to the next one). According to equation (6), if a node does not have an entering arc, it must not have an outgoing arc. This rule is valid for every node, including the depot. In fact, if in the morning the van leaves the warehouse, at the end of the tour it must come back to the depot. Equation (7) excludes the possibility of creating loop connections: an arc that has the same node as incoming and outgoing node cannot exist. Equation (8) ensures that each van, if activated, has a connection from the first node to a customer. If instead the vehicle is not active, the connection cannot exist. On the other side, equation (9) ensures that each van, if activated, must have a connection between a customer and the break node (j = NN), meaning that all the vans have to go back to the depot. Also in this case, if the vehicle is not active, the connection cannot exist. By equation (10), each node – besides the depot and the break ones – is forced to have one and only one ingoing arc activated by just one vehicle. It means that each customer has to be visited. Equation (11) links the arc variables to the activation of vans. A van is active only if it travels at least one arc. Equation (12) introduces capacity constraints, stating that the load of each vehicle cannot exceed the maximum allowed value. Equations (13) introduce time constraints, since they ensure that each order is delivered within two days. This consideration is valid for each node except for the depot (j = 1) and the break node (i = NN).
Equation (14) sets the precedence rules among orders (each order has to be delivered exactly after the previous one), taking into account both the time needed to perform the delivery (Service Time) and the time needed to travel from the previous customer to the next one. Equation (15) bounds the overall time needed to perform all the deliveries up to the total duration of the time slot. Finally, equations (16), (17) and (18) link the probabilities associated with a time window
Once the sub-routings that minimise the cost function have been defined for all the clusters of customers, the overall daily routing is derived based on the combination of the sub-routings in subsequent time-windows (the ending point – i.e. the last customer to be visited – for cluster C_{i} is set as the departing point for cluster C_{i+1}).
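The chaining of sub-routings can be sketched as follows. The helper names, the coordinates and the use of straight-line distances are assumptions made for illustration only (the model itself relies on a travel time matrix), and the sub-routes are taken as already optimised.

```python
from math import dist


def chain_subroutes(depot, subroutes):
    """Concatenate the per-time-window sub-routes into one daily tour.

    The last customer of one cluster is the departing point for the next,
    so the sub-routes are simply visited in time-window order.
    """
    tour = [depot]
    for route in subroutes:
        tour.extend(route)
    tour.append(depot)  # the van goes back to the depot at the end
    return tour


def tour_length(tour, coords):
    """Sum of straight-line distances between consecutive stops."""
    return sum(dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))


# Hypothetical coordinates (km) and two time-window clusters.
coords = {"D": (0, 0), "a": (0, 3), "b": (4, 3), "c": (4, 0)}
tour = chain_subroutes("D", [["a", "b"], ["c"]])
length = tour_length(tour, coords)
print(tour, length)
```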
After the LMD problem has been solved, the associated performance may be computed in terms of both hit rate and cost (to compare the VRPAP with the traditional VRP, i.e. to answer RQ2). The cost per delivery is estimated by dividing the overall cost (fixed cost of the van + variable cost depending on the travelled time) by the number of successful deliveries. The reason behind the choice of these two performance measures is twofold. On the one hand, they are widely used in the literature addressing LMD solutions aimed at reducing failed deliveries, especially in the earliest, seminal papers in the field (for instance Pan et al. (2017)). On the other hand, each measure has its own rationale. Considering the delivery costs, both academics and practitioners in the LMD domain recognise that they are the main performance measure logisticians are interested in when evaluating an innovative solution. As a matter of fact, online players usually consider service level targets as constraints they necessarily have to meet to stay competitive on the market, and they adhere to the standards set by the top players (e.g. delivery time lower than or equal to 72 h). As a result, they are “pushed (…) to cut their operational costs to the minimum” (Arnold et al., 2018). Considering the hit rate, since the major goal of the proposed solution is to reduce failed deliveries (in other words, to increase the hit rate), it is a useful complementary measure that gives a complete view of the overall performance. Furthermore, as Mangiaracina et al. (2019) state in their review of innovative LMD solutions aimed at reducing costs, the probability of occurrence of failed deliveries is one of the main drivers of the delivery cost.
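Written out as a formula, the cost measure described above can be sketched as follows. C_{ve} and C_{var} are the activation and variable costs of the core model; n_{van}, t_{tot} and N_{succ} are names introduced here for illustration only (number of van activations, total travelled time in minutes and number of successful deliveries). Expressing N_{succ} as the average hit rate times the number of orders NN is an assumption of this sketch.

```latex
\[
  c_{\text{parcel}} \;=\; \frac{C_{ve}\, n_{van} + C_{var}\, t_{tot}}{N_{succ}},
  \qquad
  N_{succ} \;=\; \text{hit rate} \times NN
\]
```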
5. Model application
5.1 Base case application
The main goal of the model application is to evaluate the effect of the proposed VRPAP on LMD performance, in terms of both efficiency (delivery cost) and effectiveness (hit rate), and to compare it with a traditional VRP (only based on distances). The model has been applied to a realistic context in Milan (Italy). The context of application has been defined based on the following assumptions and data (which – as anticipated in section 3 – were mainly derived from interviews with logistics service providers, combined with the analysis of literature and secondary sources).
Each of the two delivery days (9 a.m.–1 p.m.; 2 p.m.–6 p.m.) is divided into four 2-h slots (corresponding to the clusters for the preallocation), associated with 16 customers each. The time bucket considered for the optimisation (and thus for the computation of the customers' attendance probability), i.e. Tw, is 30 min (thus having a daily total number of buckets τ_{MAX} = 98), while the number of consecutive probability values used to compute the average is Tb = 7.
The delivery area is a 5 km^{2} area in Milan (Italy), in which the total number of customers to be visited in two days is 128. These destinations are served by one transit point, located outside the delivery area, 3.3 km from its perimeter, at the site of a real e-commerce depot. The distances between the depot and the customers and those among the different customers, needed to build the travel time matrix DistD_{cij}, are the real ones, estimated through the Google Maps API (unlike several previous works in the field, e.g. Pan et al., 2017).
Customers, who are associated with specific APs/presence probability values, have been clustered according to three main AP profiles: (α) people receiving products in a place where the delivery is almost always successful, e.g. a house with a concierge or an office (average attendance probability 99%); (β) people who issue orders based on when they are expected – barring unforeseen circumstances – to be at home to collect the parcel (average attendance probability 83%); (γ) people who place orders regardless of the probability of incurring failed deliveries (average attendance probability 53%). The split of the customers into these three groups was the following: 50% α, 30% β, 20% γ. Examples of APs of customers belonging to the three classes for two days are reported in Figure 3.
The performance and characteristics of the van are the following: fuel consumption 7 litres/100 km; fuel cost 1.52 €/litre; average speed 23.6 km/h; fixed daily “activation” cost per van 150€ (thus resulting in a variable 0.05€ cost per minute).
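As a rough cross-check of these figures, the fuel-only cost per minute of travel can be derived from the consumption, fuel price and average speed. This is a sketch under the assumption that only fuel is counted; the function name is hypothetical. The result (about 0.042 €/min) is of the same order as the 0.05 €/min stated above, and this section does not say how the remaining difference arises.

```python
def fuel_cost_per_minute(litres_per_100km, euro_per_litre, speed_kmh):
    """Fuel cost of one minute of driving at the given average speed."""
    euro_per_km = litres_per_100km / 100 * euro_per_litre
    km_per_minute = speed_kmh / 60
    return euro_per_km * km_per_minute


# Van data from the base case: 7 l/100 km, 1.52 EUR/l, 23.6 km/h.
fuel_only = fuel_cost_per_minute(7, 1.52, 23.6)
print(round(fuel_only, 3))  # 0.042
```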
Based on these data assumptions, the LMD problem has been solved both through a traditional VRP (only aimed at minimising the travelling distances/time) and through the innovative VRPAP (considering both the minimisation of distances and the maximisation of the probability to find the customer at home). In both cases, as previously stated, the optimal solution was found.
Table 4 shows the results of the application to the base case scenario, which leads to two main considerations. First, the total travelled time per tour is higher in the innovative VRPAP case than in the traditional VRP. In fact, the objective function of the traditional VRP minimises the overall travelled distance – and consequently the associated time – for a specific delivery tour. The VRPAP instead combines the distance minimisation with the hit rate maximisation. Accordingly, if a customer is associated with a very low probability to be at home in a specific moment of the day, the delivery is moved to a previous/subsequent time-window (where this probability is higher), even if this results in a higher travelled distance. Second, the innovative VRPAP significantly reduces missed deliveries compared to the traditional operating mode (97.9% successful deliveries vs 82%), thus dramatically improving effectiveness.
The positive effect stemming from the reduction of missed deliveries outweighs the disadvantage linked to the higher travelled time, and implementing the VRPAP is thus beneficial. As a matter of fact, the delivery cost per parcel is lower in the innovative VRPAP case than in the traditional VRP (about −16%).
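The trade-off can be checked numerically with the figures reported for the base case. The sketch below assumes that one van is activated on each of the two days (i.e. two daily activation costs of 150 €) and that the number of successful deliveries equals the average hit rate times the 128 orders; both are assumptions of this example, as are the function and variable names. Under these assumptions the computed values agree with the reported costs per parcel and with the saving of about 16%.

```python
def cost_per_parcel(fixed_per_van_day, van_days, cost_per_min,
                    travel_min, n_orders, hit_rate):
    """Overall cost divided by the number of successful deliveries."""
    total_cost = fixed_per_van_day * van_days + cost_per_min * travel_min
    successful = n_orders * hit_rate
    return total_cost / successful


# Base case data: 150 EUR per van-day, 0.05 EUR/min, 128 orders.
# van_days=2 is an assumption (one van on each of the two days).
vrp = cost_per_parcel(150, 2, 0.05, travel_min=408, n_orders=128, hit_rate=0.82)
vrpap = cost_per_parcel(150, 2, 0.05, travel_min=446, n_orders=128, hit_rate=0.979)
saving = vrpap / vrp - 1

print(round(vrp, 2), round(vrpap, 2), round(saving * 100, 1))
```

The extra 38 minutes of travel add less than 2 € to the overall cost, while the higher hit rate adds about 20 successful deliveries to the denominator, which is why the cost per parcel falls.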
5.2 Sensitivity analyses
After the application to the base case, in order to evaluate the robustness of the outcomes, additional sensitivity analyses were performed in two main directions.
First, two further criteria to preallocate customers to time-windows were tested, in order to compare the resulting performance. More in detail – differently from the base case, in which customers are allocated to clusters/time-windows only based on their probability to be at home (i.e. policy A, probability-based) – the two tested criteria integrate the evaluation of the probability to be at home with that of the travelled distance even during the preallocation phase. Both criteria rely on two substeps. Policy B (distance first, probability second) first creates clusters of customers in order to minimise the overall distance and then creates subgroups of NC nodes each, so that their attendance probability is maximised. The first clustering process defines G groups, and the second one splits each group into lists of nodes to be visited in the same time-window. Policy C (probability first, distance second) is very similar to policy B, but the sequence of the two optimisation processes is reversed. First, customers are grouped in order to maximise their attendance probability; second, subgroups of NC nodes are created, in which the distance is minimised. This analysis was also useful to evaluate whether preallocating customers based on the expected hit rate implies great disadvantages in terms of distance when customers do not live near each other.
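To make the probability-based logic of policy A concrete, the sketch below allocates customers to time-windows with a simple greedy rule: the highest customer/time-window probabilities are served first, with at most NC customers per time-window. This greedy heuristic is a stand-in chosen for brevity, not the optimisation used in the preallocation model; the function name and the probability values are hypothetical.

```python
def greedy_preallocation(prob, capacity):
    """Assign each customer to one time-window, favouring high attendance.

    prob[i][c]: average attendance probability of customer i in window c.
    capacity:   maximum number of customers per window (NC).
    Returns a dict {customer index: window index}.
    """
    n_customers = len(prob)
    n_windows = len(prob[0])
    # All (probability, customer, window) triples, best probability first.
    pairs = sorted(
        ((prob[i][c], i, c) for i in range(n_customers) for c in range(n_windows)),
        key=lambda triple: -triple[0],
    )
    assignment = {}
    load = [0] * n_windows
    for _, i, c in pairs:
        if i not in assignment and load[c] < capacity:
            assignment[i] = c
            load[c] += 1
    return assignment


# Hypothetical attendance probabilities: 4 customers, 2 time-windows.
prob = [
    [0.90, 0.20],
    [0.80, 0.70],
    [0.30, 0.60],
    [0.95, 0.90],
]
assignment = greedy_preallocation(prob, capacity=2)
print(assignment)
```

Customer 1 would prefer the first time-window (0.80) but is moved to the second (0.70) because the first is already full, which mirrors the capacity of NC customers per time-window.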
Second, the percentages of customers belonging to the different clusters were varied, in order to analyse how different distributions of customer types impact the performance of the developed solution. Six additional scenarios were considered with respect to the base case, which is referred to as scenario 0 (scenario 1: α 45%, β 30%, γ 25%; scenario 2: α 45%, β 35%, γ 20%; scenario 3: α 50%, β 40%, γ 10%; scenario 4: α 60%, β 25%, γ 15%; scenario 5: α 60%, β 30%, γ 10%; scenario 6: α 70%, β 20%, γ 10%).
The sensitivity analysis was performed in a combined manner, i.e. all the customer attendance probability scenarios have been tested on all the three preallocation policies.
Figure 4 and Figure 5 show the trends of the three different preallocation policies in terms of average hit rate and average delivery cost per parcel respectively, across all the explored customer distribution scenarios. These results lead to several conclusions.
First, policy B (distance first, probability second) is the one associated with the best performance in terms of delivery cost. On the one hand, it reduces travel time with respect to policy A, since policy A only creates clusters based on the probability to find the customers at home: two very close customers with different APs would be allocated to different time-windows, thus forcing the driver to travel to the same area at two different moments. On the other hand, the probability of incurring failed deliveries is lower than with policy C, which favours the minimisation of distances over the maximisation of attendance probability. Still, with all three preallocation policies the VRPAP performs better than the traditional VRP in terms of delivery cost per parcel. The outcomes of the analysis, especially for policy B (distance first, probability second), also proved that the preallocation step as selected in the developed model is effective, and that preallocating customers based on the expected hit rate does not imply great disadvantages even for customers who do not live near each other. The interpretation of this outcome in the group interviews revealed that the negative effect of the higher travel time is partly mitigated by the fact that, in generic e-commerce deliveries, the density is typically so high that vans end up being assigned delivery areas that are not too vast.
Second, independently of the specific preallocation policy and of the customer base, the VRPAP outperforms the traditional VRP. The average hit rate is always higher, and the saving associated with the delivery is always positive. Moreover, the relative performance of the three preallocation policies is not significantly affected by the distribution of customers in the three attendance clusters. Besides small differences, policy A and policy B present similar results (in terms of both average hit rate and delivery cost), and they both perform better than policy C, independently of the considered scenario. The reason behind this lies in the preallocation logic of policy C: the second phase, which ultimately creates the clusters of customers starting from the first grouping step, is based on the minimisation of travelled distances.
Third, the lowest delivery cost for the VRPAP corresponds to scenarios 6, 5 and 3. However, the most significant savings with respect to the traditional VRP are found in scenarios 0, 1 and 2. In fact, in these three cases there is the lowest number of class α customers (i.e. those with the lowest probability of failed deliveries). These are the customers for which the traditional VRP is already associated with “good” hit rates; therefore, when they are numerous, the VRPAP – which mainly acts on this parameter – has less room for improvement. As expected, the benefits of the VRPAP increase in contexts in which the probability of a delivery failing is high.
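The room for improvement in each scenario can be approximated by the mix-weighted average of the class attendance probabilities (99%, 83% and 53% for α, β and γ). This is only a rough proxy introduced here for illustration, not the hit rate produced by the routing model; the names in the sketch are hypothetical. Scenarios 0, 1 and 2 show the lowest averages, consistently with the larger savings observed for them.

```python
# Average attendance probability of each customer class (base case data).
CLASS_PROB = {"alpha": 0.99, "beta": 0.83, "gamma": 0.53}

# Shares of alpha, beta and gamma customers in each scenario.
SCENARIOS = {
    0: (0.50, 0.30, 0.20),
    1: (0.45, 0.30, 0.25),
    2: (0.45, 0.35, 0.20),
    3: (0.50, 0.40, 0.10),
    4: (0.60, 0.25, 0.15),
    5: (0.60, 0.30, 0.10),
    6: (0.70, 0.20, 0.10),
}


def mean_attendance(shares):
    """Mix-weighted average attendance probability of a customer base."""
    return sum(share * p for share, p in zip(shares, CLASS_PROB.values()))


baseline = {s: mean_attendance(shares) for s, shares in SCENARIOS.items()}
lowest_three = sorted(baseline, key=baseline.get)[:3]

for s, value in baseline.items():
    print(f"scenario {s}: {value:.3f}")
print("lowest averages:", sorted(lowest_three))
```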
Besides the specific considerations about the differences among the three policies and the various analysed scenarios, the sensitivity analysis shows how, independently of the selected preallocation policy and of the distribution of customers in the different clusters, the innovative VRPAP performs better than the traditional one. This is true in terms of both hit rate (and thus of effectiveness improvements) and delivery cost per parcel (efficiency improvement).
6. Conclusions
LMD for B2C e-commerce is one of the most expensive processes within the whole supply chain. Among the main issues, a very significant one is represented by the so-called “failed deliveries”, i.e. the deliveries not accomplished due to the absence of the customers. They imply both high costs – since missed deliveries need to be rescheduled – and a decrease in service level – since customers are bothered if they do not receive their parcels. As a result, both e-commerce retailers and logistics service providers have been striving to find ways to increase the rate of successful deliveries, in order to improve both efficiency and effectiveness.
This paper has reached the set objectives, while answering the formulated research questions. It proposes and evaluates the performance of an innovative solution that collects data about customers' presence at home, to integrate them in scheduling last-mile deliveries. An innovative VRPAP was designed that first clusters customers based on the probability to find them at home and then defines the optimal sub-routes – i.e. the sequences of customers to be visited – for each cluster. Based on these results, the model then finds the overall optimal routing to serve the whole set of customers (RQ1). The application of the model to realistic cases in Milan (Italy) shows that the proposed solution implies a significant reduction of missed deliveries with respect to the traditional operating mode, in which the probability of finding the customer at home is not considered while scheduling the deliveries, and that it reduces the average delivery cost per parcel (RQ2). Besides the base case application, sensitivity analyses were performed on two significant elements. First, two additional preallocation policies (combining the maximisation of the hit rate and the minimisation of travelled distance) were evaluated. Second, different distributions of customers associated with different APs were analysed. Both these analyses show that the proposed VRPAP performs better than the traditional one in all the considered scenarios.
This work has both academic and managerial implications. On the academic side, it contributes to the literature by developing an innovative probability-based VRP that, differently from other existing works, exploits new technological trends (i.e. the diffusion of smart home devices) and that overcomes some limits of prior papers in this direction (e.g. higher number of customers to be visited, real distances among the locations, integration of probability to be at home and travelled distance when defining the sequence of destinations to be visited). On the managerial side, it proposes a novel solution for scheduling B2C last-mile deliveries with a significant impact in both reducing operating costs and increasing the service level.
This work has some limits, which could be overcome through further developments. First, the model was applied by finding the optimal solution for both the VRP and the VRPAP, and no solving algorithms were applied. As a result, the solving time for the VRP case was very significant. It could be interesting to employ some commonly-used VRP solving algorithms to perform a higher number of simulations. Second, the clusters in the preallocation step are created aiming to maximise the probability to find the customers at home in the delivery time-window, thus mainly focussing on the delivery density. Further works could apply hierarchical clustering methods (e.g. Ward's method), in order to implement a multi-criteria approach that would allow multiple objectives to be considered concurrently. Third, collecting a vast amount of data about the presence of customers at home (through smart home devices) is not easy and could also raise privacy concerns. On the one hand, this suggests reading the proposed urban delivery solutions in the light of the smart cities paradigm, where different data sources should be available for a responsible citywide resource planning, based on the contribution of all the stakeholders (Pan et al., 2021). A conscious and informed decision-making process should be made by all the parties involved, especially municipalities, in promoting efficient, effective and sustainable city logistics solutions while granting customers – and thus citizens – adequate privacy levels (Al-Turjman et al., 2022). On the other hand, the data collection and processing solution should be analysed and designed in order to be compliant with the active regulations in terms of privacy and data protection, based on the specific country of interest (e.g. the General Data Protection Regulation in the European Union).
This would include basic principles tied to transparency (clear statement of the objective), correctness (in terms of both rightness of the procedures and accuracy of the data) and privacy (concerning confidentiality), to grant the rights of the customers. Such a suggestion is aligned with what is recommended as a further development in other works proposing data-driven solutions in the logistics field (see for instance Konstantakopoulos et al. (2021)).
Figures
Selected literature on data-driven last-mile delivery innovative solutions
Paper  Data considered  Objective: data-based routing generation  Objective: VRP objective function  Number of customers in experiment  Candidate delivery locations  Industry  Main (additional) limits
Pan et al. (2017)  Electrical consumption  Yes  Max (hit rate); constraint (travel time)  15 (general coordinates)  Home delivery  Grocery
Florio et al. (2018)  Past deliveries and preferential time windows  Yes  Max (hit rate); constraint (travel time)  100 (general coordinates)  Home delivery  General
Praet and Martens (2020)  Mobile GPS data of customer position  No  No VRP, measure of accuracy of predicted locations  30 (not reported)  Every position where the customer stays for longer than 5 min  General
Kandula et al. (2021)  Historical order delivery data and location data  Yes  Min (total travel time); constraint (hit rate-based time windows)  >2000 (real position)  Multiple (including stores)  General
Chu et al. (2021)  Multiple data (distance, weather, season, driver's profile, and real-time traffic data from mobile applications)  Yes  Min (total travel cost + operating cost)  15 (not reported)  Home delivery  On-demand food delivery
Present research  Multiple smart-home devices collected data  Yes  Min (cost) AND Max (hit rate)  128 (real position)  Home delivery + Delivery at workplace  General  To be discussed
Source(s): Author's work
Main variables and parameters – preallocation
Main parameters  
NN  Overall number of customers to be visited 
NC  Number of customers to be visited in one timewindow 
NV  Number of available vehicles 
τ_{MAX}  Number of time buckets in one day 
Tb  Number of time buckets in each time interval over which the average probability is computed 
C  Maximum number of clusters considered for the planning horizon 
D  Maximum number of days considered in the probability grid 
DD  Duration of the day 
Number of time intervals considered as proxy of the maximum probability in a time window  
Average presence probability of the ith customer in time window Tw  
Array (  
Array (  
to_{j}  Order time of jth customer 
Matrix (NT,  
Main variables  
Z_{i,c}  Matrix (NT,C) of Boolean variables. Z_{i,c} = 1 denotes that the ith customer is grouped in the cluster c; otherwise, Z_{i,c} = 0 denotes that the ith customer is not grouped in the cluster c 
Source(s): Author's work
Main variables and parameters – core model
Main parameters  
Q  Capacity of the vehicles 
ST  Service Time, i.e. time required to park the vehicle, reach the customer's home, ring the bell, wait for the client, deliver the parcel, come back to the vehicle 
T_{w}  Unit of time considered for the optimisation (i.e. unit of time in which the daily delivery time is divided, and for which the probability of the customers to be at home are estimated) 
T_{wMAX}  Number of time intervals considered as proxy of the attendance probability in a time interval per each cluster 
Array (Tw_{MAX}) reporting the lower bound of the Tw time interval  
Array (Tw_{MAX}) reporting the upper bound of the Tw time interval  
T_{ijk}  Matrix (NN, NN, NV) of integer variables indicating the time at which vehicle k has accomplished the delivery to node j (which is reached after having left node i); it considers both travel time and service time 
DistD_{cij}  Travel time between each pair of customers i and j. Since the routing is defined for each cluster, these travel times are reported in matrices that are different for each cluster (i.e. there is one matrix (NN, NN) for each time-window, associated with the customers to be visited in that time-window) 
φ_{jTw}  Estimated probability of unsuccessful home delivery for the jth customer visited during T_{w} (i.e. estimated probability that the delivery has to be rescheduled in a subsequent delivery tour) 
q_{j}  Volume of products ordered by customer j 
Tinf_{j}, Tsup_{j}  Lower and upper boundaries of the delivery time window associated to customer j 
Cve  “Activation” cost per van 
Cvar  Variable cost per minute for each travelling vehicle k (including both fuel and the driver) 
Main variables  
x_{ijk}  Matrix (NN, NN, NV) of Boolean variables indicating whether an arch connecting customer i and customer j is travelled by vehicle k (i.e. if the vehicle moves from the ith customer to the jth customer) 
δ_{1,ijTwk}  Matrix (NN,NN,T_{wMAX}) of Boolean variables. The case δ_{1ijTw} = 1 means that the arch (i,j) is traversed in t < b_{Tw}. This is a necessary condition to guarantee that t < T_{w}. Otherwise, x_{ijk} = 0 denotes the arch is not traversed in T_{w} 
δ_{2,ijTwk}  Matrix (NN,NN,T_{wMAX}) of Boolean variables. The case δ_{2ijTw} = 1 means that the arch (i,j) is traversed in t > a_{τ}. This is a necessary condition to guarantee that t > τ. Otherwise, x_{ijk} = 0 denotes the arch is not traversed in T_{w} 
δ_{3,ijTwk}  Matrix (NN,NN,T_{wMAX}) of Boolean variables. The case δ_{3ijTw} = 1 means that both the Boolean variables δ_{1ijTw} and δ_{2ijTw} are activated, i.e. that the arch ij (connecting two subsequent customer locations i and j) is travelled by van k within the time interval T_{w} 
g_{k}  Array (NV) of Boolean variables; the value 1 is assigned if van k is used during the tour, 0 otherwise 
Source(s): Author's work
Table 4. Results of the model application
Performance measure  Traditional VRP  Innovative VRPAP
Total travelled time [min]  408  446
Average hit rate [%]  82  97.9
Delivery cost [€/parcel]  3.05  2.57
Source(s): Author's work
References
Al‐Turjman, F., Zahmatkesh, H. and Shahroze, R. (2022), “An overview of security and privacy in smart cities' IoT communications”, Transactions on Emerging Telecommunications Technologies, Vol. 33 No. 3, p. 3677.
Arnold, F., Cardenas, I., Sörensen, K. and Dewulf, W. (2018), “Simulation of B2C e-commerce distribution in Antwerp using cargo bikes and delivery points”, European Transport Research Review, Vol. 10 No. 1, pp. 1-13.
Baldacci, R., Mingozzi, A. and Roberti, R. (2012), “Recent exact algorithms for solving the vehicle routing problem under capacity and time window constraints”, European Journal of Operational Research, Vol. 218 No. 1, pp. 1-6.
Bjerkan, K.Y., Bjørgen, A. and Hjelkrem, O.A. (2020), “E-commerce and prevalence of last mile practices”, Transportation Research Procedia, Vol. 46, pp. 293-300.
Boyer, K.K., Prud’homme, A.M. and Chung, W. (2009), “The last mile challenge: evaluating the effects of customer density and delivery window patterns”, Journal of Business Logistics, Vol. 30 No. 1, pp. 185-201.
Bräysy, O. and Gendreau, M. (2005a), “Vehicle routing problem with time windows, Part I: route construction and local search algorithms”, Transportation Science, Vol. 39 No. 1, pp. 104-118.
Bräysy, O. and Gendreau, M. (2005b), “Vehicle routing problem with time windows, Part II: Metaheuristics”, Transportation Science, Vol. 39 No. 1, pp. 119-139.
Chu, H., Zhang, W., Bai, P. and Chen, Y. (2021), “Data-driven optimization for last-mile delivery”, Complex and Intelligent Systems, pp. 1-14.
Edwards, J., McKinnon, A., Cherrett, T., McLeod, F. and Song, L. (2010), “Carbon dioxide benefits of using collection-delivery points for failed home deliveries”, Transportation Research Record, Vol. 10 No. 1901, pp. 136-143.
Florio, A.M., Feillet, D. and Hartl, R.F. (2018), “The delivery problem: optimizing hit rates in e-commerce deliveries”, Transportation Research Part B: Methodological, Vol. 117, pp. 455-472.
Han, S., Zhao, L., Chen, K., Luo, Z.W. and Mishra, D. (2017), “Appointment scheduling and routing optimization of attended home delivery system with random customer behaviour”, European Journal of Operational Research, Vol. 262 No. 3, pp. 966-980.
Hübner, A., Kuhn, H. and Wollenburg, J. (2016), “Last mile fulfilment and distribution in omni-channel grocery retailing: a strategic planning framework”, International Journal of Retail and Distribution Management, Vol. 44 No. 3, pp. 228-247.
Jick, T.D. (1979), “Mixing qualitative and quantitative methods: triangulation in action”, Administrative Science Quarterly, Vol. 24 No. 4, pp. 602-611.
Kandula, S., Krishnamoorthy, S. and Roy, D. (2021), “A prescriptive analytics framework for efficient E-commerce order delivery”, Decision Support Systems, Vol. 147, 113584.
Kedia, A., Kusumastuti, D. and Nicholson, A. (2017), “Acceptability of collection and delivery points from consumers' perspective: a qualitative case study of Christchurch city”, Case Studies on Transport Policy, Vol. 5 No. 4, pp. 587-595.
Konstantakopoulos, G.D., Gayialis, S.P., Kechagias, E.P., Papadopoulos, G.A. and Tatsiopoulos, I.P. (2021), “An algorithmic approach for sustainable and collaborative logistics: a case study in Greece”, International Journal of Information Management Data Insights, Vol. 1 No. 1, 100010.
Lim, S.F.W., Jin, X. and Srai, J.S. (2018), “Consumer-driven e-commerce: a literature review, design framework, and research agenda on last-mile logistics models”, International Journal of Physical Distribution and Logistics Management, Vol. 48 No. 3, pp. 308-332.
Ma, B., Teo, C.C. and Wong, Y.D. (2022), “Consumers' preference for urban last-mile delivery: effects of value perception and long-term COVID-initiated contextual shifts”, International Journal of Logistics Research and Applications, pp. 1-22.
Mangiaracina, R., Perego, A., Seghezzi, A. and Tumino, A. (2019), “Innovative solutions to increase last-mile delivery efficiency in B2C e-commerce: a literature review”, International Journal of Physical Distribution and Logistics Management, Vol. 49 No. 9, pp. 901-920.
Nutting, P.A., Rost, K., Dickinson, M., Werner, J.J., Dickinson, P., Smith, J.L. and Gallovic, B. (2002), “Barriers to initiating depression treatment in primary care practice”, Journal of General Internal Medicine, Vol. 17 No. 2, pp. 103-111.
Pan, S., Giannikas, V., Han, Y., Grover-Silva, E. and Qiao, B. (2017), “Using customer-related data to enhance e-grocery home delivery”, Industrial Management and Data Systems, Vol. 117 No. 9, pp. 1917-1933.
Pan, S., Zhou, W., Piramuthu, S., Giannikas, V. and Chen, C. (2021), “Smart city for sustainable urban freight logistics”, International Journal of Production Research, Vol. 59 No. 7, pp. 2079-2089.
Praet, S. and Martens, D. (2020), “Efficient parcel delivery by predicting customers' locations”, Decision Sciences, Vol. 51 No. 5, pp. 1202-1231.
Qi, W., Li, L., Liu, S. and Shen, Z.J.M. (2018), “Shared mobility for last-mile delivery: design, operational prescriptions, and environmental impact”, Manufacturing and Service Operations Management, Vol. 20 No. 4, pp. 737-751.
Reyes, D., Savelsbergh, M. and Toriello, A. (2017), “Vehicle routing with roaming delivery locations”, Transportation Research Part C: Emerging Technologies, Vol. 80, pp. 71-91.
Song, L., Cherrett, T., McLeod, F. and Guan, W. (2009), “Addressing the last mile problem: transport impacts of collection and delivery points”, Transportation Research Record, Vol. 2097 No. 1, pp. 9-18.
Tsai, Y.T. and Tiwasing, P. (2021), “Customers' intention to adopt smart lockers in last-mile delivery service: a multi-theory perspective”, Journal of Retailing and Consumer Services, Vol. 61, 102514.
Urciuoli, L. and Hintsa, J. (2017), “Adapting supply chain management strategies to security–an analysis of existing gaps and recommendations for improvement”, International Journal of Logistics Research and Applications, Vol. 20 No. 3, pp. 276-295.
Van Duin, J.H.R., De Goffau, W., Wiegmans, B., Tavasszy, L.A. and Saes, M. (2016), “Improving home delivery efficiency by using principles of address intelligence for B2C deliveries”, Transportation Research Procedia, Vol. 12, pp. 14-25.
Wang, X., Yuen, K.F., Wong, Y.D. and Teo, C.C. (2018), “An innovation diffusion perspective of e-consumers’ initial adoption of self-collection service via automated parcel station”, The International Journal of Logistics Management, Vol. 29 No. 1, pp. 237-260.
Yildiz, B. and Savelsbergh, M. (2019), “Service and capacity planning in crowd-sourced delivery”, Transportation Research Part C: Emerging Technologies, Vol. 100, pp. 177-199.