Sometimes Data Science Is Not About Giving Answers, It's About Asking Better Questions

This article is more than 4 years old.


Data science has become inextricably associated with the notion of uncovering indisputable truth from reams of unquestionable data. Data has come to be equated with “truth,” and the data scientist’s job is seen as merely coalescing myriad unrelated numbers into quantitative certainty. In reality, not all questions have a single indisputable answer. Sometimes a data scientist’s job is not to give an answer or lend certainty to a decision, but rather to help explain questions more clearly, or even to help others ask better questions that are more closely aligned with the data that is available.

Perhaps the most important lesson for new data scientists to learn is that not every question has a definitive answer. So many new analysts are led astray by misguided data science training programs in which every exercise has a “right” answer that students are merely guided along to find. The notion that some questions are far too nuanced, or depend on too many unknowns, to have such an answer is largely absent from most training curriculums, which are unfortunately steeped in Silicon Valley’s ethos that code reigns supreme and that from algorithms and data come truth.

In reality, so many of the decisions that business and governmental leaders must make do not have obvious answers. If they did, algorithms could replace management today.

In fact, the questions that are readily answerable have already largely been automated at many companies, and those automated systems demonstrate the limitations of relying on data-driven decision making.

More and more companies are turning to algorithmic resume screening to help with hiring decisions. Yet a growing body of evidence suggests these algorithms can inadvertently be highly discriminatory, penalizing members of specific demographics by encoding certain physical traits, experiences and linguistic cues into their filters. Even a perfectly unbiased algorithm, if such a piece of code were possible, would struggle to truly capture the totality of each candidate and understand whether they would ultimately be the perfect fit for a given position.

In some cases, perhaps a better use of algorithms is not to filter candidates, but to examine incoming resumes to better understand which skill sets and experiences applicants believe make them the best fit for the position, and which of those hiring managers appear to be prioritizing. If a position for a data analyst primarily yields candidates claiming Python, R and SQL experience on their resumes, while the company is hoping to see advanced TensorFlow expertise and a deep mathematical background in machine learning, it could suggest to hiring managers that the position is not correctly matched to the company’s needs and help them refine the question of what a perfect candidate looks like.
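
As a rough illustration of that kind of exploratory analysis, the sketch below tallies how often each skill appears across a set of resumes and compares that with the skills a hiring manager hopes to see. Everything in it is hypothetical: the function name `skill_gap`, the sample resumes and the assumption that skills have already been extracted into simple lists. It is meant only to show the shape of the comparison, not a real screening system.

```python
from collections import Counter


def skill_gap(resumes, desired_skills):
    """Count skill mentions across resumes and report how common each desired skill is.

    resumes: list of skill lists, one per applicant (assumed already extracted).
    desired_skills: skills the hiring manager hopes to see.
    Returns (counts, share), where share maps each desired skill to the
    fraction of resumes that mention it.
    """
    counts = Counter()
    for skills in resumes:
        # Use a set so a skill repeated within one resume is counted once.
        counts.update({skill.lower() for skill in skills})

    total = len(resumes)
    share = {
        skill: (counts[skill.lower()] / total if total else 0.0)
        for skill in desired_skills
    }
    return counts, share


# Hypothetical applicant pool for a data analyst posting.
resumes = [
    ["Python", "SQL"],
    ["R", "SQL", "Python"],
    ["Python", "TensorFlow"],
    ["SQL", "R"],
]
desired = ["TensorFlow", "Python"]

counts, share = skill_gap(resumes, desired)
for skill in desired:
    print(f"{skill}: mentioned on {share[skill]:.0%} of resumes")
```

A low share for a skill the company considers essential does not say which applicant to hire. It suggests that the posting and the company's expectations may be out of step, which is a question for the hiring manager rather than an answer from the data.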

Similarly, if a company is debating where to open a new store or factory or a government is deciding what kind of diplomatic action to pursue, there may not always be a clear-cut answer, meaning the best a data analyst can do is to provide additional insight that can help decision makers better understand the context and potential outcomes of available options and ask more refined and targeted questions to narrow their choices.

Most importantly, organizations rarely possess the totality of information necessary to answer their most important questions. They often have only bits and pieces of the information puzzle relating to their critical decisions and the quality of each of those pieces is often unknown.

Rather than providing answers in such cases, perhaps data scientists would do better to leverage the data they have to help their customers ask their questions more clearly, guiding them away from their original inquiries for which available data and algorithms may be unsuitable and towards questions which can be more directly addressed.

In other words, not all questions posed to data scientists are ones they can directly answer in a robust and accurate way with the data that is available to them. Rather than cobbling together a dangerously misleading answer by massaging the data into some untenable result, data scientists need to be better about pushing back and making it known that the data won't support a conclusion, instead guiding the requester to ask an alternative question that is better aligned with the data.

If a business manager asks, "What do our customers like about our product?" a data analyst shouldn't immediately turn to a few keyword searches of Twitter. They should explain that answering this question accurately will likely require the collection of new data, from focus groups to surveys to in-store engagement. If the manager demands an answer within the next few hours, the analyst should help them refine their question to "What are the top adjectives used to describe our product on Twitter?" with the understanding that this new question can be directly and definitively answered even if it is not entirely what the manager needs. It may be the case that, in coming to understand these limitations, the manager ultimately determines that analyzing tweets would be insufficient in their case and instead allocates the funding and additional time to collect new data.
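
To make the narrower question concrete, here is a minimal sketch of counting the most frequent adjectives in a batch of tweets. It assumes the tweets have already been collected as plain strings, and it uses a small hand-written adjective list (`ADJECTIVES`, purely hypothetical) in place of a real part-of-speech tagger, so it should be read as an illustration of the refined question rather than a production method.

```python
import re
from collections import Counter

# Hypothetical adjective lexicon; a real analysis would more likely use a
# part-of-speech tagger than a hand-written list like this one.
ADJECTIVES = {"great", "slow", "cheap", "reliable", "buggy", "expensive"}


def top_adjectives(tweets, lexicon=ADJECTIVES, n=3):
    """Return the n most frequent lexicon words found in the tweets."""
    counts = Counter()
    for text in tweets:
        # Lowercase and split into simple word tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in lexicon:
                counts[word] += 1
    return counts.most_common(n)


# Hypothetical sample tweets mentioning the product.
tweets = [
    "Great product but a bit slow",
    "So slow today, and buggy",
    "Reliable and great value",
    "slow slow slow",
]

print(top_adjectives(tweets, n=2))
```

Even this small example shows why the refined question is not the original one: a single repetitive tweet can dominate the counts, and a list of frequent adjectives says nothing about why customers used them.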

In short, instead of translating unanswerable questions into answerable ones on their own, data scientists should engage with their customers in an iterative process that helps those customers understand the ramifications of each potential option. Data scientists should not make those choices for them, since they may not understand the context behind the customer's question or the implications of the answer.

Putting this all together, rather than acting as false evangelists perpetuating misleading stereotypes, perhaps data scientists would do better to clearly communicate the risk and uncertainty around their results and accept that their field is not always about answering questions. Sometimes it is about helping others ask better questions.