Data Science Automation For Big Data and IoT Environments Blog

Data Science Automation For Big Data and IoT Environments

by 7wData
September 24, 2016

data science sits at the core of any analytical exercise conducted on a big data or Internet of Things (IoT) environment. data science involves a wide array of technologies, business, and machine-learning algorithms. The purpose of data science is not only to do machine learning or statistical analysis, but also to derive insights out of the data that a user with no statistics knowledge can understand.

In a fast-paced environment such as big data and IoT, where the type of data might vary over the course of time, it becomes difficult to maintain and re-create the models each and every time. This gap calls for an automated way to manage the data-science algorithms in those environments. The rise of data science was intended to move us away from a rules-based system to a system in which a machine learns rules for its automation by itself. Machine learning makes data science inherently partially automated. The half of data science that requires manual intervention is still to be automated. However, those are areas that involve the experience and wisdom of a people: a data scientist, a business expert, a software developer, a data integrator, everyone who currently contributes to making a data-science project operational. This makes it difficult to automate every aspect of data science. However, we can think of data science automation as a two level architecture, wherein:

– All the individual automated components are interconnected to form a coherent data-science system

We can think of a data-science system as automated when it’s capable enough to solve our problem whenever we throw a data set at it. Also, it should be intelligent enough to provide us with all possible solutions in a language that we can understand.

Data preparation, machine learning, domain knowledge, and result interpretation are four major tasks required to execute a data-science project successfully. All these tasks have to be converted to automated modules to create an automated data-science system (Figure 1).

Data preparation is a repetitive task that has to be done every time when creating models. Data extraction, data cleaning, and data transformations such as imputing null values and algorithm-specific transformations are some tasks that fall into this category. Many organizations automate these tasks and have branded the engine as a data science automation tool. However, most of these tools use rule-based logic for automating data-preprocessing tasks. Is this the right approach? Do we need rule-based systems to automate data science, which was born to end rule-based systems? Well, No. We need data preprocessing automated by machine learning itself. For example, the decision regarding what preprocessing function has to be applied on the data for a problem is to be made by machines themselves.

Feature engineering is another area of data preparation that requires automation. Feature engineering is a technique to convert raw data into attributes/predictors that improve the accuracy of a machine-learning project. Feature-engineering automation is still at a nascent stage and an active area of research. Data scientists from MIT are making incredible progress toward developing a “deep feature synthesis” algorithm capable of generating features from raw data.

This is an area of data-science automation where statistical routines are automated. The system executes the best algorithm based on the provided data set. It hides the intricacies and mathematical complexity of algorithms from the user, making it available to the masses. The user needs to provide the automated statistician with data.

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Data Science Automation For Big Data and IoT Environments

Leave a Reply Cancel reply

Upcoming Events

World Wide Data Vault Consortium 2024

Shift Difficult Problems Left with Graph Analysis on Streaming Data

MarkLogic World | Amsterdam

Categories

Tags

You Might Be Interested In

Why Your AIOps Deployments Could Fail

Oracle Wants Its Cloud to Grow Inside Your Data Centers

How Artificial Intelligence Can Improve Automated Customer Care

Recent Jobs

D365 Business Analyst

Judiciary Research Manager (Court Executive 2B)

Associate Director for Impact and Analytics

Data Scientist: Support NYS Attorney General Investigations

Do You Want to Share Your Story?

Join our community

Our Services

Company

Work With Us

Follow Us

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.

Data Science Automation For Big Data and IoT Environments

Leave a Reply Cancel reply

Upcoming Events

Categories

Tags

You Might Be Interested In

Recent Jobs

Do You Want to Share Your Story?

Join our community

Our Services

Company

Work With Us

Follow Us

Get the 3 STEPS

To Drive Analytics Adoption And manage change

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.

To Drive Analytics Adoption
And manage change