Facebook wants machines to see the world through our eyes
For the past two years, Facebook AI Research (FAIR) has worked with 13 universities around the world to assemble a large dataset of first-person video, specifically to train image-recognition models. AIs trained on the dataset may be better at controlling robots that interact with humans, or at interpreting images from smart glasses. “For machines to assist us in everyday life, they really need to understand the world through our eyes,” said Kristen Grauman of FAIR, who led the project.
Such tech could support people who need assistance around the home, or guide people through tasks they are learning to complete. “The video in this dataset is much closer to how people observe the world,” said Michael Ryoo, a computer vision researcher at Google Brain and Stony Brook University in New York, who is not involved with Ego4D.
But the potential for abuse is clear and worrying. The research was funded by Facebook, a social media giant recently accused in the Senate of putting profits before people’s well-being, a sentiment confirmed by MIT Technology Review’s own investigations.
The business model of Facebook, and of other Big Tech companies, is to extract as much data as possible from people’s online behavior and sell it to advertisers. The AI outlined in the project could extend that reach to people’s everyday offline behavior, revealing what objects are around a person’s home, what activities they enjoy, whom they spend time with, and even where their gaze lingers: an unprecedented level of personal information.
“There’s work on privacy that needs to be done as you take this out of the world of exploratory research and into something that’s a product,” Grauman said. “That work could even be inspired by this project.”
Ego4D is a step change. Most previous first-person video datasets consist of around 100 hours of footage of people in the kitchen. The Ego4D set consists of 3,025 hours of video recorded by 855 people in 73 different locations across nine countries (the US, UK, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda).
Participants spanned a range of ages and backgrounds; some were recruited for their visually interesting occupations, such as bakers, mechanics, carpenters, and landscapers.
Earlier datasets were usually composed of semi-scripted video clips only a few seconds long. For Ego4D, participants wore head-mounted cameras for up to 10 hours at a time, capturing first-person video of unscripted daily activity, including walking along a street, reading, washing, shopping, playing with pets, playing board games, and interacting with other people. Some of the footage also includes accompanying audio, data about where participants’ gaze was focused, and multiple perspectives on the same scene. It is the first dataset of its kind, according to Ryoo.