A Comparative Study of Reinforcement Learning and Metaheuristic Algorithms for Real-Time Last-Mile Delivery Scheduling

Jialu Wang

Abstract

The rapid growth of e-commerce has intensified the need for real-time delivery-scheduling algorithms that can cope with dynamic urban logistics environments. This study presents a comparative analysis of reinforcement learning and metaheuristic algorithms for last-mile delivery optimization under time-sensitive constraints. We implement and evaluate Adaptive Large Neighborhood Search (ALNS) and Deep Q-Network (DQN) approaches on benchmark instances that represent realistic urban delivery scenarios with 50 to 200 customer nodes. The experimental results show that ALNS achieves the better solution quality, with an average optimality gap of 3.2% under an operational time budget of approximately 5 seconds (at most 6 s), while DQN exhibits better runtime scalability on networks exceeding 150 customers in the stress-test setting. A sensitivity analysis reveals trade-offs between computational efficiency and solution robustness under varying traffic conditions. These results give logistics practitioners empirical guidelines for selecting an algorithm based on operational constraints, thereby contributing to sustainable urban transportation systems.
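For readers unfamiliar with the metric, an optimality gap is conventionally the relative excess of a heuristic's cost over a reference cost (an optimal or best-known solution). The abstract does not say which reference value the study uses, so the following is only a minimal sketch under that conventional definition. The function name `optimality_gap` and the numbers are hypothetical and are not taken from the study.

```python
def optimality_gap(cost: float, reference_cost: float) -> float:
    """Relative gap, in percent, between a heuristic cost and a reference cost.

    Assumption: the reference is an optimal or best-known route cost.
    """
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return 100.0 * (cost - reference_cost) / reference_cost


# Hypothetical example values, not results from the study.
gap = optimality_gap(cost=1032.0, reference_cost=1000.0)
print(f"{gap:.1f}%")
```

Under this definition, a smaller gap means the heuristic's routes cost closer to the reference solution.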

How to Cite

Wang, J. (2026). A Comparative Study of Reinforcement Learning and Metaheuristic Algorithms for Real-Time Last-Mile Delivery Scheduling. Journal of Science, Innovation & Social Impact, 2(1), 264-275. https://sagespress.com/index.php/JSISI/article/view/106
