Facebook Debuts New Computer Vision Tools In Push To Interpret Images Pixel By Pixel


Facebook unveiled several new computer vision tools on Thursday to detect, delineate and label objects in an image, technology that could be used to improve image search on the social network, build experiences for vision-impaired users and interpret live videos in real time.

Facebook AI Research (FAIR), a group of about 60 artificial intelligence scientists and researchers, is open-sourcing its two new algorithms. The first, “SharpMask,” is built on top of an existing Facebook tool called “DeepMask” to detect and delineate every object in an image using object “layers.” The second, “MultiPathNet,” labels individual objects in an image. The social media giant said the tools are part of its broader goal to enable computers to understand images and objects at the pixel level, much like the human eye. The Menlo Park, Calif.-based company said that while the technology isn't perfect, it significantly advances its computer vision capabilities. Facebook said it expects the technology will also be applied more broadly to commerce, health and augmented reality.
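To make the division of labor between the tools concrete, here is a minimal sketch of the three-stage pipeline described above. The function names and placeholder logic are hypothetical illustrations of the roles the article assigns to each tool, not Facebook's actual DeepMask, SharpMask or MultiPathNet code (which was released as separate research repositories).

```python
# Hypothetical sketch of the three-stage pipeline described above.
# Function names and shapes are illustrative only, not Facebook's APIs.
import numpy as np

def propose_masks(image: np.ndarray) -> list[np.ndarray]:
    """DeepMask-style stage: propose a coarse binary mask per candidate object."""
    # Placeholder: a real model would scan the image and return one boolean
    # mask (same height/width as the image) for each object it finds.
    h, w, _ = image.shape
    return [np.zeros((h, w), dtype=bool)]

def refine_masks(image: np.ndarray, masks: list[np.ndarray]) -> list[np.ndarray]:
    """SharpMask-style stage: sharpen coarse masks toward pixel-accurate boundaries."""
    return masks  # placeholder: a real refiner would use lower-level image features

def label_masks(image: np.ndarray, masks: list[np.ndarray]) -> list[tuple[np.ndarray, str]]:
    """MultiPathNet-style stage: assign a category label to each segmented object."""
    return [(m, "object") for m in masks]  # placeholder labels

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real photo
labeled = label_masks(image, refine_masks(image, propose_masks(image)))
print(f"{len(labeled)} object(s) segmented and labeled")
```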

Researchers have seen major advancements in image classification (naming the objects in an image) and detection (locating the objects), thanks to the evolution of deep neural networks, which can be trained to automatically learn new patterns. In its computer vision system, Facebook is focusing on a newer technique known as “segmentation,” which uses algorithms to precisely outline every object in an image. Facebook said it is open-sourcing its new technology, making it available to any researcher or academic in the world, in hopes that it will help the field of machine vision advance more quickly.
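The distinction between classification, detection and segmentation is easiest to see in code. The sketch below uses an off-the-shelf Mask R-CNN from the torchvision library, a later instance-segmentation model rather than the tools announced here, chosen only because it is publicly available: the predicted labels correspond to classification, the boxes to detection, and the per-pixel masks to segmentation. The file name "photo.jpg" is a placeholder for any local image.

```python
# Classification (labels), detection (boxes) and segmentation (per-pixel masks)
# from a single pretrained instance-segmentation model.
import torch
from torchvision.io import read_image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

model = maskrcnn_resnet50_fpn(weights="DEFAULT")  # pretrained on COCO
model.eval()

image = convert_image_dtype(read_image("photo.jpg"), torch.float)
with torch.no_grad():
    output = model([image])[0]

for label, box, mask, score in zip(
    output["labels"], output["boxes"], output["masks"], output["scores"]
):
    if score < 0.8:  # keep only confident predictions
        continue
    pixels = int((mask > 0.5).sum())
    print(f"class id {label.item()}: box {box.tolist()}, {pixels} pixels in its mask")
```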

“We can have a lot more impact when we open source our work,” Facebook researcher Piotr Dollár said in a phone interview. “We’re developing core technologies, but the uses are very broad. The community can make improvements to our underlying techniques and think of applications we haven’t yet.”

Facebook said the technology will make it easier for its users to search for images that don't have explicit tags and could be used for tasks such as determining the nutritional value of a bagel by uploading an image. Facebook said the tools could also enable vision-impaired users to understand what is in a photo without relying on a photo caption. (Facebook demoed this use case at its F8 developer conference in April; one can imagine, for example, “swiping” an image to hear a description.)

Facebook said its next challenge is applying its computer vision technology to video and supporting use cases such as classifying live videos in real time. Better interpretation of video at scale could lead to big improvements in video curation, content monitoring and live narration.
