LOTUS: Learning Universal Task Representations for Reinforcement Learning with Temporal Logic Guidance
Task-guided agents demonstrate strong performance across a wide range of complex tasks. However, most existing task representation algorithms are tailored to specific contexts and struggle to generalize across diverse scenarios. Moreover, they typically depend on gradient signals from reinforcement learning controllers to update their weights, which can degrade both representation quality and learning efficiency.
To overcome these limitations, we propose LOTUS, a temporal-logic-inspired universal task representation framework that can be seamlessly integrated into any RL algorithm to enhance agent performance across diverse task settings. Specifically, we design a novel task representation architecture capable of modeling temporal relationships and extracting task semantics from LTL formulas. We further introduce a more effective update mechanism that treats the LTL encoder as a policy, thereby improving representation capacity. To enhance stability and robustness, LOTUS leverages the bisimulation metric, which provides theoretical guarantees for the LTL representation, including behavioral equivalence, optimality fidelity, and trajectory robustness.
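For reference, the bisimulation metric in its standard form (the fixed point over reward and transition differences; the exact variant used by LOTUS may differ) is

$$
d(s_i, s_j) \;=\; \max_{a \in \mathcal{A}} \Big[ (1-c)\,\big| R(s_i, a) - R(s_j, a) \big| \;+\; c \cdot W_1(d)\big( P(\cdot \mid s_i, a),\, P(\cdot \mid s_j, a) \big) \Big],
$$

where $c \in [0, 1)$ is a discount factor and $W_1(d)$ is the 1-Wasserstein distance between next-state distributions under metric $d$. States that are close under $d$ have provably close optimal values, which underlies the behavioral-equivalence and optimality-fidelity guarantees mentioned above.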
Experimental results show that LOTUS outperforms most existing methods in learning efficiency, generalization capability, and representation quality.
Overview. The framework of LOTUS.
In the ZoneEnv and LetterWorld scenarios, we sample a different LTL task from the same multi-task distribution in every episode. For example, the sampled task can be $\lozenge(\mathrm{Black}\wedge\lozenge(\mathrm{White}\wedge\lozenge\mathrm{Red}))$ when the agent is performing "Eventually_1_3_1_2", where "Eventually" denotes the task kind, "1_3" is the interval from which the subgoal depth is sampled, and "1_2" is the interval from which the number of task conjuncts is sampled.
I. Definition of the label format: In "Eventually_1_3_1_2", the first field is the task kind, "1_3" is the sampling interval for the subgoal depth, and "1_2" is the sampling interval for the number of task conjuncts.
II. Explanation of task setting labels in the tables:
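As a concrete illustration of the label format above, the sampling procedure could be sketched as follows. This is a minimal reconstruction, not the paper's implementation: the proposition set, function name, and formula syntax (here, `F` for "eventually") are all illustrative assumptions.

```python
import random

# Illustrative proposition set; ZoneEnv uses colored zones such as these.
PROPS = ["Black", "White", "Red", "Green", "Blue", "Yellow"]


def sample_eventually_task(min_depth, max_depth, min_conj, max_conj,
                           rng=random):
    """Sample one LTL task from an Eventually_{min_depth}_{max_depth}_{min_conj}_{max_conj} distribution.

    Each conjunct is a nested 'eventually' chain: depth 3 over propositions
    (Black, White, Red) yields F(Black & F(White & F Red)), i.e. the formula
    "eventually Black, then eventually White, then eventually Red".
    """
    def chain(props):
        # Build F(p1 & F(p2 & ... & F(pk))) from the innermost subgoal out.
        formula = f"F {props[-1]}"
        for p in reversed(props[:-1]):
            formula = f"F ({p} & {formula})"
        return formula

    n_conj = rng.randint(min_conj, max_conj)          # number of conjuncts
    conjuncts = []
    for _ in range(n_conj):
        depth = rng.randint(min_depth, max_depth)     # subgoal chain depth
        conjuncts.append(chain(rng.sample(PROPS, depth)))
    if n_conj == 1:
        return conjuncts[0]
    return " & ".join(f"({c})" for c in conjuncts)
```

For instance, a draw from `Eventually_1_3_1_2` might return `F (Black & F (White & F Red))`, matching the example formula above.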
The performance (Our LOTUS) in the multi-task scenario with Eventually_1_3_1_2 task distribution.
The performance (Our LOTUS) in the multi-task scenario with Until_1_2_1_1 task distribution.
The performance (Our LOTUS) in the multi-task scenario with Eventually_4_8_1_2 task distribution.
The performance (Our LOTUS) in the multi-task scenario with Until_2_2_1_1 task distribution.
The performance (Our LOTUS) in the multi-task scenario with Eventually_3_3_4_4 task distribution.
The performance (Our LOTUS) in the multi-task scenario with Until_1_1_2_2 task distribution.
Successful demo of the Stack task (Our LOTUS) with corresponding snapshots of key task progression.
Task completion demos under different initial distributions of objects in the Stack task (Our LOTUS).
Successful demo of the Nut Assembly task (Our LOTUS) with corresponding snapshots of key task progression.
Task completion demos under different initial distributions of objects in the Nut Assembly task (Our LOTUS).
Successful demo of the Cleanup task (Our LOTUS) with corresponding snapshots of key task progression.
Task completion demos under different initial distributions of objects in the Cleanup task (Our LOTUS).
Successful demo of the Peg Insertion task (Our LOTUS) with corresponding snapshots of key task progression.
Task completion demos under different initial distributions of objects in the Peg Insertion task (Our LOTUS).
Manipulation performance (Our LOTUS) in the Toilet task on the seen task set.
Manipulation performance (Our LOTUS) in the Toilet task on the unseen task set.
Snapshots of the real-world Stack task (Our LOTUS) with key task progression.
Task completion demos under different initial distributions of objects in the real-world Stack task (Our LOTUS).
Snapshots of the real-world Cleanup task (Our LOTUS) with key task progression.
Task completion demos under different initial distributions of objects in the real-world Cleanup task (Our LOTUS).
This website template is borrowed from Nerfies.