Reinforcement Learning for Lifelong Multi-Agent Pathfinding in AutoStore system

Djupesland, Elias Eriksen

Master thesis

Open

master thesis (Locked)

Permanent link

https://hdl.handle.net/11250/3070253

Publication date

2023-05-25

Collections

Department of Informatics [917]

Abstract

AutoStore (AS) uses a cubic system for warehouse automation, in which robots retrieve and organize objects in a three-dimensional grid of bins in a Manhattan-geometry environment. Lifelong Multi-Agent Pathfinding (LMAPF), in which each robot is assigned a new goal as soon as it completes its current one, is crucial to the system’s effectiveness. Solving LMAPF in the cubic system requires coordination and agile decision-making, which is challenging for a system with a large number of agents. PRIMAL2 is a distributed Reinforcement Learning (RL) framework for LMAPF that allows agents to learn decentralized policies for planning paths online in partially observable environments. This thesis extends PRIMAL2 specifically for the AS cube environment. The main differences for the AS environment are the z-axis of bins, a low density of unreachable cells, sparse occupancy by agents, and the absence of corridors. Agents spend time changing velocity, changing x/y driving direction, and delivering or retrieving bins at their assigned goal. We modify the PRIMAL2 architecture by altering the observation and action spaces to improve agent coordination and local behavior for the specific AS environment. Furthermore, we devise a data transformer that converts real-world historical warehouse operations, sourced from multiple AS clients, from a continuous environment to a discrete representation suitable for training simulations. Our simulated implementation of PRIMAL2 on AS environments indicates that Machine Learning (ML) is a viable approach to LMAPF in AS systems, converging to a model that solves its assigned tasks efficiently. The model generalizes, performing well across environments with varying agent densities and configurations.

Description

Postponed access: the file will be accessible after 2026-05-01

Publisher

The University of Bergen