Data Science Automation For Big Data and IoT Environments
- by 7wData
data science sits at the core of any analytical exercise conducted on a big data or Internet of Things (IoT) environment. data science involves a wide array of technologies, business, and machine-learning algorithms. The purpose of data science is not only to do machine learning or statistical analysis, but also to derive insights out of the data that a user with no statistics knowledge can understand.
In a fast-paced environment such as big data and IoT, where the type of data might vary over the course of time, it becomes difficult to maintain and re-create the models each and every time. This gap calls for an automated way to manage the data-science algorithms in those environments. The rise of data science was intended to move us away from a rules-based system to a system in which a machine learns rules for its automation by itself. Machine learning makes data science inherently partially automated. The half of data science that requires manual intervention is still to be automated. However, those are areas that involve the experience and wisdom of a people: a data scientist, a business expert, a software developer, a data integrator, everyone who currently contributes to making a data-science project operational. This makes it difficult to automate every aspect of data science. However, we can think of data science automation as a two level architecture, wherein:
– All the individual automated components are interconnected to form a coherent data-science system
We can think of a data-science system as automated when it’s capable enough to solve our problem whenever we throw a data set at it. Also, it should be intelligent enough to provide us with all possible solutions in a language that we can understand.
Data preparation, machine learning, domain knowledge, and result interpretation are four major tasks required to execute a data-science project successfully. All these tasks have to be converted to automated modules to create an automated data-science system (Figure 1).
Data preparation is a repetitive task that has to be done every time when creating models. Data extraction, data cleaning, and data transformations such as imputing null values and algorithm-specific transformations are some tasks that fall into this category. Many organizations automate these tasks and have branded the engine as a data science automation tool. However, most of these tools use rule-based logic for automating data-preprocessing tasks. Is this the right approach? Do we need rule-based systems to automate data science, which was born to end rule-based systems? Well, No. We need data preprocessing automated by machine learning itself. For example, the decision regarding what preprocessing function has to be applied on the data for a problem is to be made by machines themselves.
Feature engineering is another area of data preparation that requires automation. Feature engineering is a technique to convert raw data into attributes/predictors that improve the accuracy of a machine-learning project. Feature-engineering automation is still at a nascent stage and an active area of research. Data scientists from MIT are making incredible progress toward developing a “deep feature synthesis” algorithm capable of generating features from raw data.
This is an area of data-science automation where statistical routines are automated. The system executes the best algorithm based on the provided data set. It hides the intricacies and mathematical complexity of algorithms from the user, making it available to the masses. The user needs to provide the automated statistician with data.
[Social9_Share class=”s9-widget-wrapper”]
Upcoming Events
Shift Difficult Problems Left with Graph Analysis on Streaming Data
29 April 2024
12 PM ET – 1 PM ET
Read MoreYou Might Be Interested In
Why Your AIOps Deployments Could Fail
8 Jul, 2021One thing that has become apparent this past year is that digital transformation is a business imperative needed to thrive …
Oracle Wants Its Cloud to Grow Inside Your Data Centers
27 Jul, 2017Taking a different approach to hybrid cloud than some of its biggest competitors, Oracle has substantially beefed up the capabilities of its …
How Artificial Intelligence Can Improve Automated Customer Care
13 Aug, 2016There are notable exceptions but, in general, people are dissatisfied with the level of customer service they receive today. The …
Recent Jobs
Do You Want to Share Your Story?
Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.