dc.contributor.author | Djupesland, Elias Eriksen | |
dc.date.accessioned | 2023-06-07T06:22:23Z | |
dc.date.issued | 2023-05-25 | |
dc.date.submitted | 2023-06-06T22:00:03Z | |
dc.identifier.uri | https://hdl.handle.net/11250/3070253 | |
dc.description | Postponed access: the file will be accessible after 2026-05-01 | |
dc.description.abstract | AutoStore (AS) uses a cubic system for warehouse automation, utilizing robots to retrieve and organize objects in a three-dimensional grid of bins in a Manhattan-geometry environment. Lifelong Multi-Agent Pathfinding (LMAPF) is a crucial element of the system’s effectiveness, where robots are assigned a new goal upon completion of a given goal. Solving LMAPF in the cubic system requires coordination and agile decision-making, which is challenging for a system with a large set of agents. PRIMAL2 is a distributed Reinforcement Learning (RL) framework for LMAPF that allows agents to learn decentralized policies for planning paths online in partially observable environments. This thesis extends PRIMAL2 specifically for the AS cube environment. The main differences in the AS environment are: a z-axis of stacked bins, a low density of unreachable cells, sparse agent occupancy, and the absence of corridors. Agents spend time changing velocity, changing x/y driving direction, and delivering or retrieving bins at their assigned goals. We modify the PRIMAL2 architecture by altering the observation and action spaces to improve agent coordination and local behavior for the specific AS environment. Furthermore, we devise a data transformer to convert real-world historical warehouse operations, sourced from multiple AS clients, from a continuous environment to a discrete representation suitable for training simulations. Our simulated implementation of PRIMAL2 on AS environments indicates that Machine Learning (ML) constitutes a viable approach to LMAPF in AS systems, converging to a model adept at efficiently solving and accomplishing tasks. The model exhibits generalizability, performing well across environments with varying agent densities and configurations. | |
dc.language.iso | eng | |
dc.publisher | The University of Bergen | |
dc.rights | Copyright the Author. All rights reserved | |
dc.subject | Reinforcement Learning | |
dc.subject | Convolutional Neural Network | |
dc.subject | Lifelong Multi-Agent Pathfinding | |
dc.subject | AutoStore | |
dc.subject | Imitation Learning | |
dc.subject | Deep Neural Network | |
dc.subject | Safe Reinforcement Learning | |
dc.subject | ODrM* | |
dc.subject | Artificial Intelligence | |
dc.subject | Behavioral Cloning | |
dc.subject | Markov Decision Process | |
dc.subject | Automated storage and retrieval systems | |
dc.subject | Incremental Search | |
dc.subject | Deep Learning | |
dc.subject | PRIMAL | |
dc.subject | Machine Learning | |
dc.subject | Deep Reinforcement Learning | |
dc.subject | Multi-agent Reinforcement Learning | |
dc.subject | Rolling-Horizon Collision Resolution | |
dc.subject | Multi-Agent Pathfinding | |
dc.subject | Asynchronous Advantage Actor-Critic Algorithm | |
dc.subject | Priority Based Search | |
dc.subject | Stock Keeping Units | |
dc.subject | Warehouse Management System | |
dc.subject | Visual Geometry Group | |
dc.subject | Large Neighborhood Search | |
dc.subject | Long Short-Term Memory | |
dc.title | Reinforcement Learning for Lifelong Multi-Agent Pathfinding in AutoStore system | |
dc.type | Master thesis | |
dc.date.updated | 2023-06-06T22:00:03Z | |
dc.rights.holder | Copyright the Author. All rights reserved | |
dc.description.degree | Master's thesis in Informatics | |
dc.description.localcode | INF399 | |
dc.description.localcode | MAMN-INF | |
dc.description.localcode | MAMN-PROG | |
dc.subject.nus | 754199 | |
fs.subjectcode | INF399 | |
fs.unitcode | 12-12-0 | |
dc.date.embargoenddate | 2026-05-01 | |