dc.contributor.author | Djupesland, Elias Eriksen | |
dc.date.accessioned | 2023-06-07T06:22:23Z | |
dc.date.issued | 2023-05-25 | |
dc.date.submitted | 2023-06-06T22:00:03Z | |
dc.identifier.uri | https://hdl.handle.net/11250/3070253 | |
dc.description | Postponed access: the file will be accessible after 2026-05-01 | |
dc.description.abstract | AutoStore (AS) uses a cubic system for warehouse automation, utilizing robots to retrieve and organize objects in a three-dimensional grid of bins in a Manhattan-geometry environment. Lifelong Multi-Agent Pathfinding (LMAPF) is a crucial element of the system’s effectiveness, where robots are assigned a new goal upon completion of a given goal. Solving LMAPF in the cubic system requires coordination and agile decision-making, which is challenging for a system with a large set of agents. PRIMAL2 is a distributed Reinforcement Learning (RL) framework for LMAPF that allows agents to learn decentralized policies for planning paths online in partially observable environments. This thesis extends PRIMAL2 specifically for the AS cube environment. The main differences in the AS environment are: a z-axis of stacked bins, a low density of unreachable cells, sparse agent occupancy, and the absence of corridors. Agents spend time changing velocity, changing x/y driving direction, and delivering or retrieving bins at their assigned goals. We modify the PRIMAL2 architecture by altering the observation and action spaces to improve agent coordination and local behavior for the specific AS environment. Furthermore, we devise a data transformer to convert real-world historical warehouse operations, sourced from multiple AS clients, from a continuous environment to a discrete representation suitable for training simulations. Our simulated implementation of PRIMAL2 on AS environments indicates that Machine Learning (ML) constitutes a viable approach to LMAPF in AS systems, converging to a model adept at efficiently solving and accomplishing tasks. The model exhibits generalizability, performing well across environments with varying agent densities and configurations. | |
dc.language.iso | eng | |
dc.publisher | The University of Bergen | |
dc.rights | Copyright the Author. All rights reserved | |
dc.subject | Reinforcement Learning | |
dc.subject | Convolutional Neural Network | |
dc.subject | Lifelong Multi-Agent Pathfinding | |
dc.subject | AutoStore | |
dc.subject | Imitation Learning | |
dc.subject | Deep Neural Network | |
dc.subject | Safe Reinforcement Learning | |
dc.subject | ODrM* | |
dc.subject | Artificial Intelligence | |
dc.subject | Behavioral Cloning | |
dc.subject | Markov Decision Process | |
dc.subject | Automated storage and retrieval systems | |
dc.subject | Incremental Search | |
dc.subject | Deep Learning | |
dc.subject | PRIMAL | |
dc.subject | Machine Learning | |
dc.subject | Deep Reinforcement Learning | |
dc.subject | Multi-agent Reinforcement Learning | |
dc.subject | Rolling-Horizon Collision Resolution | |
dc.subject | Multi-Agent Pathfinding | |
dc.subject | Asynchronous Advantage Actor-Critic Algorithm | |
dc.subject | Priority Based Search | |
dc.subject | Stock Keeping Units | |
dc.subject | Warehouse Management System | |
dc.subject | Visual Geometry Group | |
dc.subject | Large Neighborhood Search | |
dc.subject | Long Short-Term Memory | |
dc.title | Reinforcement Learning for Lifelong Multi-Agent Pathfinding in AutoStore system | |
dc.type | Master thesis | |
dc.date.updated | 2023-06-06T22:00:03Z | |
dc.rights.holder | Copyright the Author. All rights reserved | |
dc.description.degree | Master's thesis in Informatics | |
dc.description.localcode | INF399 | |
dc.description.localcode | MAMN-INF | |
dc.description.localcode | MAMN-PROG | |
dc.subject.nus | 754199 | |
fs.subjectcode | INF399 | |
fs.unitcode | 12-12-0 | |
dc.date.embargoenddate | 2026-05-01 | |