Towards Zero-Shot Cross-Agent Transfer Learning via Aligned Latent-Space Task-Solving.
Video published with a submitted paper (coming soon). Abstract:
Despite numerous improvements regarding the sample-efficiency of Reinforcement Learning (RL) methods, learning from scratch still requires millions (even dozens of millions) of interactions with the environment to converge to a high-reward policy. This is usually because the agent has no prior information about the task and its own physical embodiment. One way to address and mitigate this data hunger is to use Transfer Learning (TL). In this paper, we explore TL in the context of RL with the specific purpose of transferring policies from one agent to another, even in the presence of morphology discrepancies or different state-action spaces. We propose a process to leverage past knowledge from one agent (source) to speed up or even bypass the learning phase for a different agent (target) tackling the same task. Our proposed method first leverages Variational Auto-Encoders (VAE) to learn an agent-agnostic latent space from paired, time-aligned trajectories collected on a set of agents. Then, we train a policy embedded inside the created agent-invariant latent space to solve a given task, yielding a task-module reusable by any of the agents sharing this common feature space. Through several robotic tasks and heterogeneous hardware platforms, both in simulation and on physical robots, we show the benefits of our approach in terms of improved sample-efficiency. More specifically, we report zero-shot generalization in some instances, where performance is recovered instantly after transfer. In the worst cases, performance is recovered after fine-tuning on the target robot for a fraction of the training cost required to train a policy with similar performance from scratch.
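As a rough illustration of the pipeline described in the abstract, here is a minimal sketch in pure NumPy; the linear encoders/decoders, dimensions, and policy weights are all made up, standing in for the trained VAEs and the latent task-module:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: each agent has its own state/action sizes,
# but both map into a shared latent space of size 8.
LATENT = 8
DIMS = {"source": {"state": 12, "action": 4},
        "target": {"state": 20, "action": 7}}

# Agent-specific encoders/decoders (stand-ins for the trained VAEs):
# encoder maps an agent state into the latent space, decoder maps a
# latent action back to the agent's own action space.
enc = {a: rng.normal(size=(LATENT, d["state"])) for a, d in DIMS.items()}
dec = {a: rng.normal(size=(d["action"], LATENT)) for a, d in DIMS.items()}

# Task policy trained once, entirely inside the agent-invariant latent space.
W_pi = rng.normal(size=(LATENT, LATENT))

def act(agent, state):
    """Encode the agent's state, apply the shared latent policy, decode."""
    z = enc[agent] @ state          # agent -> latent
    z_action = np.tanh(W_pi @ z)    # latent task policy
    return dec[agent] @ z_action    # latent -> agent action

# The same latent policy drives both agents despite their different
# state and action dimensions.
a_src = act("source", rng.normal(size=DIMS["source"]["state"]))
a_tgt = act("target", rng.normal(size=DIMS["target"]["state"]))
print(a_src.shape, a_tgt.shape)  # (4,) (7,)
```

The point of the sketch is the modularity: swapping `enc`/`dec` pairs changes the embodiment while `W_pi` (the task-module) is reused untouched.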
Universal Notice Networks : Transferring learned skills through a broad panel of applications
Video published with the paper. Abstract:
Despite the great achievements of reinforcement learning based works, those methods are known for their poor sample efficiency. This particular drawback usually means training agents in simulated environments is the only viable option, given time constraints. Furthermore, reinforcement learning agents have a strong tendency to overfit on their environment, showing a drastic loss of performance at test time. As a result, tying the agent logic to its current body may very well make transfer inefficient. To tackle that issue, we propose the Universal Notice Network (UNN) method to enforce separation of the neural network layers holding the information to solve the task from those related to robot properties, hence enabling easier transfer of knowledge between entities. We demonstrate the efficiency of this method on a broad panel of applications: we consider different kinds of robots, with different morphological structures, performing kinematic, dynamic, single- and multi-robot tasks. We prove that our method produces zero-shot (without additional learning) transfers that may yield better performance than state-of-the-art approaches, and show that a fast tuning further enhances those performances.
2022
Robotic Control Of The Deformation Of Soft Linear Objects Using Deep Reinforcement Learning.
Video published with the paper: Abstract:
This paper proposes a new control framework for manipulating soft objects. A Deep Reinforcement Learning (DRL) approach is used to make the shape of a deformable object reach a set of desired points by controlling a robotic arm which manipulates it. Our framework is more easily generalizable than existing ones: it can work directly with different initial and desired final shapes without the need for relearning. We achieve this by using learning parallelization, i.e., executing multiple agents in parallel on various environment instances. We focus our study on deformable linear objects. These objects are interesting in industrial and agricultural domains, yet their manipulation with robots, especially in 3D workspaces, remains challenging. We simulate the entire environment, i.e., the soft object and the robot, for the training and the testing using PyBullet and OpenAI Gym. We use a combination of state-of-the-art DRL techniques, the main ingredient being a training approach for the learning agent (i.e., the robot) based on Deep Deterministic Policy Gradient (DDPG). Our simulation results support the usefulness and enhanced generality of the proposed approach.
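The learning-parallelization idea can be sketched minimally as follows; the environment class, its dynamics, and the proportional policy below are toy stand-ins (not the actual PyBullet simulation or the DDPG agent), used only to show multiple environment instances with different initial and target shapes being stepped side by side:

```python
import numpy as np

class ToyDeformableEnv:
    """Stand-in for a Gym environment: the 'shape' is a single point that
    must be driven towards a randomly drawn target (hypothetical dynamics)."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.shape = self.rng.uniform(-1, 1, size=3)
        self.target = self.rng.uniform(-1, 1, size=3)
        return np.concatenate([self.shape, self.target])

    def step(self, action):
        self.shape += 0.1 * action                       # simplistic dynamics
        reward = -np.linalg.norm(self.shape - self.target)
        return np.concatenate([self.shape, self.target]), reward

# Learning parallelization: several instances, each with its own initial
# and desired shape, feed one shared policy (here a proportional placeholder).
envs = [ToyDeformableEnv(seed=i) for i in range(4)]
obs = [env.reset() for env in envs]

for _ in range(50):
    for i, env in enumerate(envs):
        shape, target = obs[i][:3], obs[i][3:]
        action = np.clip(target - shape, -1, 1)  # policy placeholder
        obs[i], reward = env.step(action)

# Every instance converges towards its own target.
errors = [np.linalg.norm(o[:3] - o[3:]) for o in obs]
print(max(errors))
```

Training on all instances at once is what lets a single policy handle different initial and desired final shapes without relearning.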
2013
First motion of the R.HEX robot
Here are some motions of the R.HEX (Robot HEXapod) robot, designed at LIRMM between 2012 and 2013.
2012
Retrieving Contact Points Without Environment Knowledge
This paper paves the way for retrieving the contact points of human motions without any knowledge of the environment.
The goal is to find the minimal set of contacting links of the human body required to perform a recorded motion.
First, we fit the captured motion to a unified representation of the human body: the Master Motor Map.
Then, we examine the Minimal Oriented Bounding Box of the velocity and acceleration of every link to determine whether one part of the link is not moving, which provides an initial guess of the contacting links. Next, we find the minimal set of contacting links that ensures the balance, i.e. satisfies the dynamic equations, of the model.
Eventually, we test this method on several motions, with actual and pretended contacts. We show that it is efficient for motions such as walking, and that it needs to be improved for more complex motions with many contact points.
Accurate Evaluation of a Distance Function for Optimization-based Motion Planning
We propose three novel methods to evaluate a distance function for robotic motion planning based on a semi-infinite programming (SIP) framework; these methods include golden section search (GSS), conservative advancement (CA) and a hybrid of GSS and CA. The distance function can take a positive or a negative value, corresponding to the Euclidean distance and the penetration depth, respectively. In our approach, each robot link is approximated and bounded by a capsule shape, and the distance between selected link pairs is continuously evaluated along the joint trajectory provided by the SIP solver, to find the global minimum distance. This distance is fed back into the SIP solver, which subsequently suggests a new trajectory. This process is iterated until no negative distance is found anywhere on the robot's links. We have implemented the three distance evaluation methods, and experimentally validated that the proposed methods effectively and accurately find the global minimum distances to generate a self-collision-free motion for the HRP-2 humanoid robot. Moreover, we demonstrate that the hybrid method outperforms the other two methods in terms of computational speed and reliability.
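Of the three methods, golden section search is the easiest to illustrate. Below is a minimal sketch, with a made-up quadratic standing in for the capsule-pair distance along the trajectory parameter t:

```python
import math

PHI = (math.sqrt(5) - 1) / 2  # inverse golden ratio, ~0.618

def golden_section_min(f, a, b, tol=1e-6):
    """Locate the minimizer of a unimodal function f on [a, b]."""
    c, d = b - PHI * (b - a), a + PHI * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            # Minimum lies in [a, d]: shrink the bracket from the right.
            b, d = d, c
            c = b - PHI * (b - a)
        else:
            # Minimum lies in [c, b]: shrink the bracket from the left.
            a, c = c, d
            d = a + PHI * (b - a)
    return (a + b) / 2

# Stand-in for the link-pair distance along the trajectory parameter
# t in [0, 1]: a negative value means penetration.
def distance(t):
    return (t - 0.3) ** 2 - 0.05

t_min = golden_section_min(distance, 0.0, 1.0)
print(t_min, distance(t_min))  # minimum near t = 0.3, depth near -0.05
```

A negative global minimum like this one is what gets fed back to the SIP solver so the next trajectory iterate can remove the penetration.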
Generation of a motion database for the HOAP-3 Robot
This video shows some motions of the database developed for the HOAP-3 robot.
In future work, these motions will be used to perform the locomotion of the robot in its environment.
Robotic embodiment
These motions from the database were used in an experiment on robotic embodiment using an fMRI interface:
The patient, in the fMRI machine, watches the environment through the eyes of the robot.
He can go forward, turn left or turn right just by thinking.
The robot is located in France (IUT Béziers) and the fMRI machine is in Israel (Weizmann Institute).
Vere Project www.vere-project.eu
This work was published on the websites of the BBC and New Scientist.
2011
HRP-2 puts a ball behind an obstacle
In this video, the robot has to put a ball in a box behind an obstacle.
The left hand has to contact the obstacle.
The motion and the left hand contact position are optimized.
How to hurt humanoid robots
This video shows the motions we generated with our method.
We use a full-body dynamic motion optimization in order to reproduce and study human walking with a leg disease.
The first walking motion is without any leg disease.
For the second one, we locked the left knee joint to reproduce the walking motion of someone wearing a splint.
The last motion simulates a broken leg (or foot): the left foot of the robot cannot support more than 85% of the total robot weight.
HRP-2 puts away a ball
In this video, the HRP-2 robot puts a ball in a box under a desk.
To perform this task, it leans on the desk as a human would.
In this case, we do not use any kind of balance stabilizer,
which is why there are some oscillations at the end of the motion.
Walking on a fifteen-centimeter platform with HRP-2
This video presents the latest results obtained with our motion generation method.
The 15 cm-high platform is used during the motion instead of being avoided.
The motion is generated considering the full-body rigid model of the robot.
We add some constraints to limit the impacts with the ground.
2010
Generation of Dynamic Multi-contact motion
This video shows the results of our work on multi-contact motion generation. This work has been submitted for publication.
Part of this video was published with a New Scientist article, available here.
Generation of Dynamic Multi-Contact Motions: 2D case studies
This video shows the effectiveness of the motion planning process, presented in Humanoids'10.
We can generate a dynamic multi-contact motion that ensures the balance of the robot and respects its physical limits.
This work was the first step of a 3D multi-contact motion planner.
Generation of Dynamic Motions Under Continuous Constraints: Efficient Computation Using B-Splines and Taylor polynomials
This video shows the effectiveness of the motion planning process, presented in IROS'10.
We can generate a dynamic motion under a set of continuous inequality and equality constraints.
We enforce the right hand to remain at a given position (continuous equality) while kicking a ball and keeping balance (continuous inequality).
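A classical property that makes B-splines convenient for enforcing constraints continuously is the convex-hull property: a B-spline curve stays inside the convex hull of its control points, so a bound verified on the finitely many control points holds for the whole continuous trajectory. A tiny sketch (the control values and joint limit below are made up, not the paper's actual constraints):

```python
import numpy as np

# Control points of a hypothetical single-joint B-spline trajectory q(t).
control_points = np.array([0.1, 0.4, 0.35, 0.2, 0.45])

# Continuous inequality constraint: q(t) <= q_max for all t.
q_max = 0.5

# Convex-hull property: if every control point satisfies the bound,
# then q(t) satisfies it for every t, with no time discretization at all.
continuously_satisfied = bool(np.all(control_points <= q_max))
print(continuously_satisfied)  # True
```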
2009
Fast and Safe Motion Re-planning
This video shows the effectiveness of the re-planning process, presented in IROS'09 and published in IEEE T-RO.
On the left is an optimal motion, computed knowing the position of the ball.
In the middle is the same optimal motion, but with a different position of the ball.
On the right is the re-planned motion, computed from the previous optimal one and knowing the actual position of the ball.
We can see that the re-planned motion is better than the optimal one when the ball position is wrong.
This motion is produced from the optimal one in less than one second of CPU time.
Safe Motion Planning
This video shows the results of the safe motion planning,
presented in ICRA'09 and published in IEEE T-RO.
The motions are computed offline and run in open-loop mode (without any balance control).
The ball is located, and a simple heuristic chooses the next step to perform in order to track the ball.
We can see that the motions are safe, since they ensure the balance of the robot.
2008
Guaranteed Discretization
These videos show the beginning of my Ph.D. work. I used a simple 2-D model of the HOAP-3 robot to emphasize the fact that a classical (time-grid) discretization can be hazardous for the robot.
On the left, there is a motion planned with a classical discretization: the (balance) constraint is checked only at some instants.
The robot falls because the balance constraint is violated between two instants.
On the right, there is a motion planned with the guaranteed (time-interval) discretization: the constraints are computed by taking into account their extrema over each time interval.
The robot does not fall, because there is no constraint violation.
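The contrast between the two checks can be sketched numerically. Everything below is made up for illustration: a toy constraint with a narrow violation spike between grid instants, and a crude Lipschitz-based interval bound standing in for the actual extrema computation over time intervals:

```python
import numpy as np

# Toy balance constraint g(t) <= 0 along the motion; the violation is a
# narrow spike hidden between two grid instants.
def g(t):
    return 0.2 - 100.0 * (t - 0.55) ** 2

L_BOUND = 110.0  # a bound on |g'| over [0, 1]  (here |g'(t)| = 200|t - 0.55|)

grid = np.linspace(0.0, 1.0, 11)   # classical time-grid discretization (step 0.1)

# Classical check: evaluate g only at the grid instants.
grid_ok = all(g(t) <= 0 for t in grid)

# Guaranteed check: on each interval, bound the maximum of g from the
# endpoint values and the Lipschitz constant, so nothing can hide in between.
def interval_ok(a, b):
    upper = max(g(a), g(b)) + L_BOUND * (b - a) / 2
    return upper <= 0

guaranteed_ok = all(interval_ok(a, b) for a, b in zip(grid[:-1], grid[1:]))

print(grid_ok, guaranteed_ok)  # True False
```

The classical check passes even though the constraint is actually violated at t = 0.55, while the (conservative) interval check refuses the motion, which is exactly the failure mode the left/right videos illustrate.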