Fitness trackers may be a trendy way to monitor every step we take, but these gadgets are actually pretty bad at keeping tabs on how much energy we burn, a new study suggests.

Scientists pitted 12 devices like the Fitbit Flex and Jawbone UP24 against two proven methods of monitoring energy expenditure - locking people in a room to assess every calorie consumed and burned, and asking people at home to drink specially treated water that makes it possible to detect energy output with a urine test.

In the first experiment, the fitness trackers' readings for a typical day ranged from 278 calories below the lab measurements to 204 calories above them. In the second experiment, every device came in lower than the urine tests, by 69 to 590 calories.

The results are troubling because when fitness trackers overestimate exercise, people who need more exercise to maintain or lose weight might get too little activity, increasing their risk for obesity and other chronic health problems, said senior study author Motohiko Miyachi of the National Institute of Health and Nutrition in Tokyo, in an email.

Underestimating exercise might be just as dangerous for some people, said Dr. Adam Schoenfeld, a researcher at the University of California, San Francisco and author of an editorial accompanying the study in JAMA Internal Medicine.

"For example, it could be quite dangerous if someone with heart disease had inaccurate recordings of their activity and exercise that was being used to make medical decisions," Schoenfeld said by email.

"In healthy persons, use of fitness trackers may not be as risky, especially if the information collected is not used for medical decision-making," Schoenfeld added. "Still, even for healthy users, it may be difficult to promote health and wellness if these devices are providing inaccurate or variable feedback."

To test the accuracy of fitness trackers for monitoring energy expenditure, Miyachi and colleagues asked nine men and 10 women ages 21 to 50 to wear 12 different devices while participating in the two experiments.

Eight devices used in the experiments are popular with consumers in Japan - Fitbit Flex, Jawbone UP24, Misfit Shine, Epson Pulsense PS-100, Garmin Vivofit, Tanita AM-160, Omron CalorieScan HJA-403C, and Withings Pulse O2.

The other four gadgets have been validated in previous research - Panasonic Actimarker EW 4800, Suzuken Lifecorder EX, Omron Active style Pro HJA-350IT, and ActiGraph GT3X.

For the first experiment, participants went into what's known as a metabolic chamber, a room specially designed to monitor calories consumed and burned, for 24 hours. They got three meals, and they could work at a desk, exercise on a treadmill, watch television, do housework, and sleep while they were in the room.

In this airtight chamber, scientists can use a technique known as indirect calorimetry to assess energy expenditure by measuring carbon dioxide production and oxygen consumption.
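As a rough illustration of how gas measurements become a calorie figure, the sketch below uses the abbreviated Weir equation, a standard formula in indirect calorimetry. The article does not say which formula the researchers applied, so treating it as Weir is an assumption, and the function name and the sample oxygen and carbon dioxide values are hypothetical.

```python
def weir_kcal_per_day(vo2_l_min, vco2_l_min):
    """Estimate daily energy expenditure with the abbreviated Weir equation.

    vo2_l_min:  oxygen consumption in liters per minute
    vco2_l_min: carbon dioxide production in liters per minute
    Assumption: the study's exact formula is not given in the article.
    """
    kcal_per_min = 3.941 * vo2_l_min + 1.106 * vco2_l_min
    return kcal_per_min * 1440  # 1,440 minutes in a day


# Hypothetical resting values, not data from the study.
estimate = weir_kcal_per_day(0.25, 0.20)
print(round(estimate))  # about 1737 kcal per day
```

A tracker reading 278 calories below a chamber figure like this one would be off by roughly 16 percent for that day.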

Compared with these measurements, half of the fitness trackers underestimated energy expenditure and the rest overestimated it.

For the second experiment, each participant wore the devices for 15 days and collected urine samples on eight days. Every fitness tracker underestimated energy expenditure, the study found.

It's possible some of the underestimation might be due to people removing the devices to bathe or to charge batteries, the authors note.

In addition to the study's small size, other limitations include its reliance on participants who weren't obese and who didn't have health problems that would limit their ability to exercise, the authors note.

Still, the findings suggest that consumers may not have an easy time finding a reliable fitness tracker to monitor exercise, Schoenfeld said.

"It is currently quite challenging to tell which fitness trackers are accurate and reliable and which are not since there aren't much data available," Schoenfeld added. "These studies demonstrate that even the most popular applications and devices may be inaccurate or highly variable."