

Why Facebook Is Training Robots To Think


On the rooftop of the building that houses the Facebook AI Research (FAIR) lab in Mountain View, California, there is a bootcamp for robots where the sun beams down on Daisy, a hexapod who is learning how to walk on a dirt jogging path. Her foot has become stuck in mulch as she struggles to wrestle free. A team of Facebook AI researchers eagerly looks on, watching to see what she will do next as she moves forward with the curiosity and experimentation of a toddler. One flight down, Daisy’s counterpart Pluto, a red arm robot, is learning how to reach for an object in its playpen. Toys are strewn everywhere.

What’s going on?

Facebook is leading an effort to teach robots how to think for themselves and develop human-like intuition that will enable them to navigate unknown circumstances. The approach is called embodied AI, which means giving software a physical body to explore with, and it is a more flexible way of learning than pre-programmed AI, which is limited by deterministic algorithms and canned data sets.

However, with all of the recent scrutiny by the G7 and U.S. government, the mere mention of Facebook training robots might strike fear in the hearts of those already concerned about the social network. Yet, what I witnessed during my visit to the Facebook Robotics Lab made me think we’re quite far from the sci-fi AI apocalypse often tweeted about by Elon Musk.

What follows is an edited transcript of my discussion with Facebook AI researchers Roberto Calandra and Franziska Meier, who are working with physical robots in physical environments, and Dhruv Batra, who gave me an exclusive interview on milestones achieved by Habitat, an embodied agent platform that trains virtual robots in virtual environments.

Many people would be surprised to hear that Facebook has a robotics lab as most think of it primarily as a communications platform. What is Facebook's interest in robots?

Franziska Meier: Advances in AI have been pushed by structured data sets, but if you think about how intelligence manifests, it’s about how we interact in the real world. Robots also learn by interacting in the real world and so we are working with them to build better models faster.

Pluto, our red arm robot, has been learning how to reach for certain things but has no idea what commands to send to its motors to move to a certain location. The robot needs to learn about itself, and curiosity plays a key role. “With the hand?” it seems to consider (Meier shows me a demo). In self-supervised learning, the robot does something, observes the changes and tries to make sense of that data.

With robots, you don't have structured data sets. Instead, you start with a bad model and use it to predict the consequences of your actions. The robot tries action sequences that it thinks might get it to the location. When it ends up somewhere else, it gets new observation data, which it uses to update the model.
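To make Meier's loop concrete, here is a minimal sketch of the predict-act-observe-update cycle, with a toy linear dynamics model standing in for the robot's self-model. Every name and number here is an illustrative assumption, not FAIR's actual code.

```python
import numpy as np

# Toy predict-act-observe-update loop: a linear dynamics model stands in
# for the robot's self-model, which starts out bad and improves from data.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [-0.2, 0.7]])  # "real" dynamics, unknown to the robot
A_model = np.zeros((2, 2))                    # start with a bad model

def predict(state, action, A):
    """Predict the consequence of an action under the current model."""
    return A @ state + action

state = np.zeros(2)
goal = np.array([1.0, -1.0])

for step in range(500):
    # Choose the action the (possibly wrong) model thinks reaches the goal.
    action = goal - A_model @ state
    next_state = A_true @ state + action + rng.normal(scale=0.05, size=2)

    # The robot ends up somewhere else; the surprise updates the model.
    error = next_state - predict(state, action, A_model)
    A_model += 0.05 * np.outer(error, state)  # simple gradient-style update
    state = next_state

print("final one-step prediction error:", np.linalg.norm(error))
```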

What are the goals of your team?

Roberto Calandra: FAIR aims to advance AI by doing open research. We care about publishing papers, open sourcing code and creating data sets for the overall benefit of the scientific community. The main challenge we are trying to solve is understanding the algorithmic ways humans learn and how we can reproduce the same level of intelligence in machines to make society better.

You’re a public company with shareholder obligations, how does your work benefit Facebook specifically?

Meier: We don’t work directly on any specific products, but by advancing algorithms that can learn from self-supervised data, our research could appear in machine translation or mapping between different languages.

Facebook PR added that FAIR research can also be found woven throughout the Facebook experience including ranking recommendations in the news feed, dealing with objectionable content, and powering new experiences like the AI camera in Facebook’s home device, Portal, which uses computer vision to pan and focus on movements during video calls.

Facebook is training robots in both simulated and real world environments. Why?

Dhruv Batra: Building embodied AI systems for robots is the next grand challenge for AI. Once you can train systems to navigate these spaces and interact with them, it's groundbreaking, because the robots are using their own intelligence, not a rule set. Everything the robot encounters is new, and it is calculating every decision and making mistakes, exactly like a human would.

But if you try to learn exclusively with robots on a hardware platform, there are going to be mishaps. If you ask the robot to pick something up and navigate to another room, chances are it's going to break itself or collide with people, and that robot is expensive. It can cause damage, and heavier robots can cause serious harm.

Machine learning tends to require massive amounts of experience and data to learn. Gathering that experience on real robotic platforms tends to be slow, expensive and difficult. Some agents have been trained for two billion frames of experience in simulation, which roughly corresponds to 18 years of human time. Not practical to do on an actual robot. That's why we think the way to go is to first learn in simulation as a test ground.

You can have 10,000 simulators running really fast, which means you can try out all sorts of actions in those simulations and know which ones are working, which ones are not. Once your agents are safe and effective in simulation, you can deploy robots. 

Training on two billion frames is only feasible because Habitat runs at roughly 10,000 frames per second. If it were 30 to 60 frames per second, the speed at which the Unity and Unreal engines run, our experiments would be orders of magnitude slower. Those engines were not made for machine learning, which is why we built the Habitat simulator ourselves.
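As a back-of-envelope check, a few lines of Python restate Batra's figures as wall-clock time per simulator:

```python
# Restating Batra's frame counts and frame rates as wall-clock time.
FRAMES = 2_000_000_000        # two billion frames of experience
SECONDS_PER_DAY = 86_400

def days_to_collect(fps: float) -> float:
    """Days of wall-clock time one simulator needs at a given frame rate."""
    return FRAMES / fps / SECONDS_PER_DAY

for fps in (30, 60, 10_000):
    print(f"{fps:>6,} fps -> {days_to_collect(fps):>8,.1f} days")
# 30 fps: ~771.6 days; 60 fps: ~385.8 days; 10,000 fps: ~2.3 days,
# before even running thousands of simulators in parallel.
```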

The Habitat virtual environment is 3D-reconstructed and photorealistic, so when we train agents in this space, we hope what they learn generalizes to the real thing.

How does the points-based rewards system work?

Batra: The embodied agent is initialized at a location and told to go to its mark. It figures out how to get there by trial and error. The only way it knows it completed a task is if it gets a reward. If it makes it, it receives points as a positive reward. If it doesn't make it, it gets penalized by having points deducted. If it makes it part of the way there, it gets partial credit.
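For illustration, a toy reward function with that success/penalty/partial-credit shape might look like the sketch below; the specific point values are assumptions, not Habitat's actual reward.

```python
# A toy navigation reward with the shape Batra describes: a success bonus,
# a penalty otherwise, and partial credit for progress toward the mark.
def step_reward(prev_dist: float, dist: float, reached: bool) -> float:
    if reached:
        return 10.0                # points for hitting the mark
    reward = -0.01                 # small deduction for not being done yet
    reward += prev_dist - dist     # partial credit for getting closer
    return reward

print(step_reward(5.0, 4.2, reached=False))  # 0.79: moved 0.8m closer
```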

Do robots have memory?

Batra: In a way. The end product of learning is a model that can be tested on new environments. Everything it has seen before gets compressed into numbers that go into the parameters of that model. So in some sense that is memory, but not in the sense that you can ask the model, “Have you seen this house before? Have you been here before? Please don't make the same mistake.” It doesn't have those aspects.

Can robots multitask?

Batra: A lot of models are good at tracking single incidents, but they're not good at tracking the combination. The most important piece of intelligence is abstraction, which finds commonalities across experiences: what matters and what's irrelevant to the task. We're working on this in Habitat.

How do the embodied agents navigate their worlds?

Batra: Computer vision handles perception, sensing and control. Camera sensors capture images as height by width by RGB. From this we know what objects are in front of the agent, where they are, and what their depth and distance from the camera are. That also feeds prediction: what objects might be used for and, if there is a person, what they might do two seconds, 10 seconds or 20 seconds down the line.
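As a rough sketch of what such an observation looks like in code (the sensor names and resolution are assumptions, not Habitat's real configuration):

```python
import numpy as np

# Sketch of what an embodied agent "sees" at each step: a color image
# plus a per-pixel depth map. Names and shapes are illustrative only.
H, W = 256, 256
observation = {
    "rgb":   np.zeros((H, W, 3), dtype=np.uint8),    # height x width x RGB
    "depth": np.zeros((H, W, 1), dtype=np.float32),  # per-pixel distance in meters
}
# From "rgb" the agent infers which objects are in view; from "depth",
# how far away they are. Together these drive navigation and prediction.
```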

When we see amazing feats of mastery by Boston Dynamics robots, like a backflip off a podium, what are we looking at?

Batra: Boston Dynamics’ robots have perfect knowledge and complete sensing to navigate their environment. They don't need cameras because everything is hard-coded. They know exactly what objects are where, how high the ground is, and whether there's a pebble that's going to make it slip. What they're demonstrating is the ability to control.

When are we going to pass the Turing Test, that point when robots can trick humans into believing they’re human? 15 to 20 years?

Calandra: I’d be surprised if it happens in my lifetime; 15 to 20 years is very optimistic. We don’t even understand what intelligence means, not from an algorithmic perspective, and we have absolutely no idea how to reproduce it.

Batra: Also, the Turing Test is not the best measure of intelligence, because it relies on deception. It’s easier to put on a fake display of intelligence and do rule-based things than to solve tasks that require real intelligence. A robot will always do what you command; it just needs to know that you’re issuing command 11, and it will do this regardless of what you said before or after, regardless of the nuances in your voice. The words don’t really matter. This is something that has plagued the AI community for decades. A more accurate test of intelligence is the ability to do increasingly sophisticated and useful tasks independently.

Where will we be in 10 years?

Batra: That's much easier to answer. In 10 years, I think we will have made significant progress on navigation agents in indoor environments; we will start seeing things that can navigate very well in new surroundings. Simulation will be the dominant paradigm, and simulation-to-real transfer will be the thing that people work on.

What is your next big breakthrough?

Batra: The current limitation of Habitat is that the world is static. It's a 3D-reconstructed mesh, which means the agent can move but can't touch, pick up or push objects, not yet. We are working on integrating a physics engine to enable gravity, objects falling, colliding and so on. It's still under development, but once it's done we'll be able to import objects like chairs and tables into indoor spaces where the agent can practice skills like pushing, poking and picking them up.

At what point do you shut down the lab? If you found the toys stacked the next morning?

Calandra: I'd be concerned because it means someone else has access to this room.

The conversation has been edited and condensed for clarity.
