What Ghostbusters And Trump Teach Us About The Perils Of Data Analysis

This article is more than 7 years old.

Earlier this week the new Ghostbusters movie reboot made history by apparently becoming the most disliked movie trailer in YouTube history. This same week Donald Trump made worldwide headlines by becoming the presumptive Republican nominee. Both have powerful lessons to teach us about the power and limitations of the world of data analysis.

The new Ghostbusters reboot trailer offers a simple but sad lesson on the increasingly toxic world of social media. As The Hollywood Reporter notes, earlier this week the trailer had accrued 29 million views and been downvoted over 620,000 times (these numbers are now even higher), suggesting that roughly one out of every 47 people who viewed the trailer clicked on the thumbs down arrow. At first glance, this would suggest the movie is immensely disliked, apparently more than any other movie trailer in YouTube’s history. Yet, as the Reporter notes, plenty of other movies that flopped at the box office have had dislike ratios just a fraction of that of Ghostbusters.
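For readers who want to check the arithmetic, the ratio is simple division on the two rounded figures quoted above. The short Python sketch below is only an illustration: the function names are made up for this example, and it treats every view as a distinct person, which is a simplifying assumption since one person can watch a trailer more than once.

```python
def views_per_dislike(views: int, dislikes: int) -> float:
    """Return how many views there were for each dislike."""
    if dislikes <= 0:
        raise ValueError("dislikes must be positive")
    return views / dislikes


def dislike_share(views: int, dislikes: int) -> float:
    """Return dislikes as a fraction of all views."""
    if views <= 0:
        raise ValueError("views must be positive")
    return dislikes / views


# Rounded figures quoted in the text; the real counts were still rising.
VIEWS = 29_000_000
DISLIKES = 620_000

if __name__ == "__main__":
    print(f"one dislike per {views_per_dislike(VIEWS, DISLIKES):.1f} views")
    print(f"dislikes are {dislike_share(VIEWS, DISLIKES):.1%} of views")
```

On these rounded inputs the result is a little under one dislike per 47 views, or about 2% of views. Because the dislike count is given as "over 620,000," the true ratio at the time would have been slightly lower still.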

How well social media reflects genuine societal trends is increasingly coming into question. In the case of the Ghostbusters movie, the Reporter documents two fascinating trends. The first is that sentiment towards the trailer is almost precisely the opposite on Facebook, where reaction is reported to be 90% positive and where women comprise a far greater percentage of posters. The second, and perhaps most troubling, is that more than half the videos on YouTube’s most disliked list feature female leads, and in fact a large number of comments on the YouTube version of the trailer refer to creating multiple user accounts and recruiting large numbers of others to downvote the trailer in an active campaign.

Regardless of how well Ghostbusters ultimately fares at the box office, a critical takeaway from the reaction to its trailer is that online social commentary does not always reflect the genuine unvarnished view of society as a whole. Different social platforms may yield very different sentiment around a video due to differing demographics (which is itself a fascinating concept) and, most critically, certain demographic groups may dominate the online conversation and/or rally inorganic campaigns to try to skew or otherwise bias the sentiment around a particular entity. Indeed, Bloomberg reports such inorganic campaigns are becoming standard practice in electoral politics.

Speaking of politics, the rise of Trump to become the likely Republican standard bearer brings with it two very different lessons in the perils of data analysis. The Obama campaign’s reinvigoration of how data is used in political campaigning ushered in the era of digital campaigning in which massive data mining of the American public is used to identify everything from on-the-fence voters to potential donors using micro modeling that attempts to predict the behavior of individual voters. His two decisive victories were billed in countless headlines, in-depth biographies and Washington and data mining circles as being the result of his campaign’s mythical use of data mining, rather than being the simple result of a unique candidate appearing at just the right point in American history.

In January I wrote about how Campaign 2016 has continued to push the boundaries of modern data modeling. Cruz’s campaign in particular built what was widely claimed to be one of the more sophisticated data modeling efforts of the 2016 season, using “psychographic targeting” to build the equivalent of Myers-Briggs personality tests for American voters, which were merged with more than “50,000 data points gathered from voting records, popular websites and consumer information such as magazine subscriptions, car ownership and preferences for food and clothing.” As Cruz rallied in the polls, his campaign frequently cited this massive data mining effort as the secret to his success.

Trump’s campaign, on the other hand, has been described in media reports as having only a very minimal data program, centering its efforts around the candidate himself rather than data mining of the electorate. In the end, the massive data mining efforts of campaigns like Cruz’s failed to make inroads against Trump’s personality-driven campaign, offering a cautionary tale in the very real limits of data.

Perhaps most intriguingly, it calls into question just how decisive data mining really is in modern campaigning and whether Obama’s spectacular successes, which have been widely chalked up to his superior data mining efforts, were in fact due more to the candidate himself than the analytic prowess of his campaign.

Data mining today is frequently touted as a magic wand that can sway consumers to purchase any product. The marketing brochures of countless consumer data mining companies are filled with claims that with just a few waves of their magic data wand they can instantly fix the sales of even the worst product and get consumers to do whatever a company wants. In this way data mining is increasingly regressing to the world of 1930s behaviorism, in which communications scholars believed audiences could be made to believe anything by simply assaulting them with the right amount of advertising. This so-called Magic Bullet Theory (also called “Hypodermic Needle Theory”) has long since fallen out of favor as empirical research revealed a far more complex set of processes through which people form opinions, yet modern data mining practitioners seem increasingly to be gravitating back to this ideal of the heady days when the “uneducated masses” could be swayed against their will by a single advertisement.

There is a second lesson in Trump’s victory that is perhaps even more important to understand: the role human interpretation and bias play in how we interpret and utilize the data available to us. As the mea culpas poured in this week from the myriad pundits, forecasters and political data analysts who had all but dismissed Trump’s campaign, a common thread emerged: just how strongly personal beliefs had colored the way analysts had been seeing the data emerging from the 2016 campaign trail. Nate Silver succinctly summarized his failure as “we basically got the Republican race wrong” and chalked it up to Trump simply being a unique candidate in unique times.

Yet, in Business Insider’s chronicle of Silver’s forecasts over the past year a far more interesting thread emerges, which is also seen in those of other political forecasters: the dismissal of data that challenged common beliefs of what made for a viable candidate in favor of findings that showed Trump could never succeed. As the available data emerging from the campaign trail strengthened around the possibility of Trump being the eventual nominee, Silver and others tended to discount the hardening numbers as not reflective of political reality. Polls and numbers which showed Trump losing tended to be favored over the growing body of data suggesting something unique was happening. While the data was suggesting that campaign 2016 was breaking the mold of the traditional rules of American politics, the political analysts examining those data points simply shrugged them off and assumed that nothing could ever change how American politics functioned. In short, the early warning provided by data was rendered moot by the human analysts who ignored it in favor of data that made more sense to their own beliefs.

Putting this all together, this week has given us a powerful set of reminders of the very real limitations of data analytics that are all too often forgotten. Data can be inorganically skewed or reflect demographic biases. Even when data does reflect reality, its power is often muted by biased human analysis. And, perhaps most damagingly to the campaign world, the power of data mining may have been far oversold when it comes to politics. Which is to say, data can’t solve all the world’s problems by itself: it is merely a tool that is all too often misused and oversold.