
dc.contributor.author	Djupesland, Elias Eriksen
dc.date.accessioned	2023-06-07T06:22:23Z
dc.date.issued	2023-05-25
dc.date.submitted	2023-06-06T22:00:03Z
dc.identifier.uri	https://hdl.handle.net/11250/3070253
dc.description	Postponed access: the file will be accessible after 2026-05-01
dc.description.abstract	AutoStore (AS) uses a cubic system for warehouse automation, employing robots to retrieve and organize objects stored in a three-dimensional grid of bins within a Manhattan-geometry environment. Lifelong Multi-Agent Pathfinding (LMAPF) is crucial to the system’s effectiveness: each robot is assigned a new goal as soon as it completes its current one. Solving LMAPF in the cubic system requires coordination and agile decision-making, which is challenging for a system with a large set of agents. PRIMAL2 is a distributed Reinforcement Learning (RL) framework for LMAPF that allows agents to learn decentralized policies for planning paths online in partially observable environments. This thesis extends PRIMAL2 specifically for the AS cube environment. The main differences in the AS environment are: the z-axis of bins, a low density of unreachable cells, sparse agent occupancy, and the absence of corridors. Agents spend time changing velocity, changing x/y driving direction, and delivering or retrieving bins at their assigned goal. We modify the PRIMAL2 architecture by altering the observation and action spaces to improve agent coordination and local behavior for the specific AS environment. Furthermore, we devise a data transformer that converts real-world historical warehouse operations, sourced from multiple AS clients, from a continuous environment to a discrete representation suitable for training simulations. Our simulated implementation of PRIMAL2 on AS environments indicates that Machine Learning (ML) constitutes a viable approach to LMAPF in AS systems, converging to a model adept at efficiently solving and accomplishing tasks. The model also generalizes well, performing strongly across environments with varying agent densities and configurations.
dc.language.iso	eng
dc.publisher	The University of Bergen
dc.rights	Copyright the Author. All rights reserved
dc.subject	Reinforcement Learning
dc.subject	Convolutional Neural Network
dc.subject	Lifelong Multi-Agent Pathfinding
dc.subject	AutoStore
dc.subject	Imitation Learning
dc.subject	Deep Neural Network
dc.subject	Safe Reinforcement Learning
dc.subject	ODrM*
dc.subject	Artificial Intelligence
dc.subject	Behavioral Cloning
dc.subject	Markov Decision Process
dc.subject	Automated storage and retrieval systems
dc.subject	Incremental Search
dc.subject	Deep Learning
dc.subject	PRIMAL
dc.subject	Machine Learning
dc.subject	Deep Reinforcement Learning
dc.subject	Multi-agent Reinforcement Learning
dc.subject	Rolling-Horizon Collision Resolution
dc.subject	Multi-Agent Pathfinding
dc.subject	Asynchronous Advantage Actor-Critic Algorithm
dc.subject	Priority Based Search
dc.subject	Stock Keeping Units
dc.subject	Warehouse Management System
dc.subject	Visual Geometry Group
dc.subject	Large Neighborhood Search
dc.subject	Long Short-Term Memory
dc.title	Reinforcement Learning for Lifelong Multi-Agent Pathfinding in AutoStore system
dc.type	Master thesis
dc.date.updated	2023-06-06T22:00:03Z
dc.rights.holder	Copyright the Author. All rights reserved
dc.description.degree	Masteroppgave i informatikk
dc.description.localcode	INF399
dc.description.localcode	MAMN-INF
dc.description.localcode	MAMN-PROG
dc.subject.nus	754199
fs.subjectcode	INF399
fs.unitcode	12-12-0
dc.date.embargoenddate	2026-05-01

