SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

Can We Automate Data Mining?

SandroSaitta
Last updated: April 15, 2013 4:07 pm

That’s a big question! Back in 2006, we started the discussion on Data Mining Research with a post about the book Java Data Mining. We were fortunate to get opinions from experts and from one of the book’s authors. In 2010, we continued the discussion by looking at specific aspects of data mining that could be automated.

Recently, I relaunched the debate on the Swiss Association for Analytics. However, I think the question deserves a dedicated blog post. To answer it, we need to analyze the different phases of data mining and estimate which ones can be automated. For this purpose, I have chosen the CRISP-DM methodology (I expect any other data mining process would lead to similar conclusions).

Business understanding

In this critical step, we transform a business problem into a data mining problem. We need to understand what should be solved and why, and the answers drive all the following steps. This step clearly cannot be automated for a new project: the data miner has to interact with experts to define the data mining problem to solve.

Data understanding

This step consists of understanding the data: how they were collected, their particularities, and so on. Again, the data miner works with field experts to derive knowledge that is useful for preparing the data (the next step). This is a manual task that cannot be automated.

Data preparation

In this step, we transform raw data into meaningful information to mine. One example is outlier detection (and removal). Some companies argue that their tools can automate this step. This is true to a certain extent, but there are limitations. Here is a simple example: above what threshold should a value of the variable “age” be considered an outlier? 100, 110, 150 years? The answer is problem dependent. The same issue arises with missing values: detecting them is often straightforward, but deciding what to do with them requires manual intervention.
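The split between what a tool can detect and what a person must decide can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's tool: the function names, the sample `ages` list and the thresholds are hypothetical. Detection is mechanical, but `max_value` and the missing-value `strategy` are parameters someone has to choose for the problem at hand.

```python
from statistics import median
from typing import List, Optional

# Hypothetical sample data: None marks a missing value.
ages: List[Optional[float]] = [34, None, 27, 112, 45, None, 151]


def find_outliers(values: List[Optional[float]], max_value: float) -> List[int]:
    """Indices of values above max_value. The threshold is a business decision."""
    return [i for i, v in enumerate(values) if v is not None and v > max_value]


def find_missing(values: List[Optional[float]]) -> List[int]:
    """Indices of missing values. Detection is the easy, automatable part."""
    return [i for i, v in enumerate(values) if v is None]


def handle_missing(values: List[Optional[float]], strategy: str) -> List[float]:
    """Apply a missing-value strategy that a person still has to pick."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        return present
    if strategy == "median":
        fill = median(present)
        return [fill if v is None else v for v in values]
    raise ValueError(f"unknown strategy: {strategy}")


# The same data yields different "outliers" depending on the chosen threshold.
print(find_outliers(ages, max_value=110))  # [3, 6]
print(find_outliers(ages, max_value=150))  # [6]
print(find_missing(ages))                  # [1, 5]
print(handle_missing(ages, "median"))
```

Changing a single threshold changes which records are flagged, which is exactly the decision a generic tool cannot make on its own.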

Another important aspect of data preparation is feature selection and extraction. While selection can be automated, extraction (for example, through aggregation) requires an understanding of the data. Finally, data mining tools can automate the detection of candidate target variables. However, the final choice is left to the data miner, who knows the business problem to solve.
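As a rough illustration of that asymmetry, here is a Python sketch. It assumes a simple correlation filter as the automated selection method (one of many possible criteria) and a hypothetical transaction log for extraction. The selection function needs no domain knowledge; the aggregation function encodes a human choice about which summary of the raw data is meaningful.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(features: Dict[str, List[float]], target: List[float], k: int) -> List[str]:
    """Automatable selection: keep the k features most correlated
    (in absolute value) with the target."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)), reverse=True)
    return ranked[:k]


def total_spend(transactions: List[Tuple[str, float]]) -> Dict[str, float]:
    """Hand-designed extraction: aggregate raw transactions into one value per
    customer. Choosing a sum rather than a count, mean or recency is a domain decision."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in transactions:
        totals[customer] += amount
    return dict(totals)


# Hypothetical data.
features = {"f1": [1, 2, 3, 4], "f2": [4, 1, 3, 2], "f3": [2, 4, 6, 8.5]}
target = [10, 20, 30, 40]
print(select_features(features, target, k=2))               # ['f1', 'f3']
print(total_spend([("a", 10.0), ("b", 5.0), ("a", 2.5)]))  # {'a': 12.5, 'b': 5.0}
```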

Modeling

This step is where we apply modeling algorithms to the prepared data. Among other things, it involves selecting a data mining algorithm and tuning its parameters. This is certainly the task that can be automated most easily, and some vendors claim that their tools can automate the model-building process. Testing several algorithms with different sets of parameters (tuning) can indeed be automated to a certain extent. However, this assumes that there is enough data, that the choice of algorithm is not business dependent (which is usually not the case), and that the evaluation criterion is known (see below).
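The idea of trying several algorithms with several parameter sets can be sketched as a small grid search in Python. This is a minimal illustration under stated assumptions: the three forecasting methods, their parameter grids, the holdout size and the default MAE criterion are all hypothetical choices, not what any particular vendor tool does. The search loop itself is generic, while the candidate list and the `criterion` argument still have to be supplied by someone who knows the problem.

```python
from typing import Callable, Dict, List, Tuple


def naive(history: List[float]) -> float:
    """Forecast the next value as the last observed value."""
    return history[-1]


def moving_average(history: List[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def exp_smoothing(history: List[float], alpha: float) -> float:
    """Simple exponential smoothing: the final level is the next forecast."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical search space: each algorithm with its grid of parameter settings.
CANDIDATES: Dict[str, Tuple[Callable[..., float], List[dict]]] = {
    "naive": (naive, [{}]),
    "moving_average": (moving_average, [{"window": w} for w in (2, 3, 5)]),
    "exp_smoothing": (exp_smoothing, [{"alpha": a} for a in (0.2, 0.5, 0.8)]),
}


def mae(actual: List[float], predicted: List[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def auto_select(series: List[float], n_test: int, criterion=mae):
    """Try every (algorithm, parameters) pair with one-step-ahead forecasts on
    the last n_test points and return (score, name, params) of the best one.
    The criterion has to be given: the search cannot choose it."""
    start = len(series) - n_test
    actual = series[start:]
    best = None
    for name, (forecast, grid) in CANDIDATES.items():
        for params in grid:
            predicted = [forecast(series[:t], **params) for t in range(start, len(series))]
            score = criterion(actual, predicted)
            if best is None or score < best[0]:
                best = (score, name, params)
    return best


series = [10, 12, 14, 16, 18, 20, 22, 24]
print(auto_select(series, n_test=3))  # (2.0, 'naive', {})
```

With only eight points the holdout is tiny, which illustrates the "enough data" caveat: the winner here depends heavily on a handful of observations.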

Cross Industry Standard Process for Data Mining (CRISP-DM)

Evaluation

To validate our data mining results, we need evaluation criteria. Although applying a criterion can be automated, and different modeling algorithms can be compared with it, the choice of the criterion itself may be business dependent. In forecasting, for example, several evaluation criteria exist, such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Scaled Error (MASE). If we compare different forecasting algorithms on the same time series, we can use RMSE. If the goal is to compare errors across different time series, MASE is more appropriate: RMSE is expressed in the units of each series, whereas MASE divides the forecast error by the in-sample error of a naive forecast and is therefore scale-free. This choice is business dependent and thus difficult to automate.
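For reference, the three criteria fit in a few lines of Python. This sketch uses the usual definition of MASE (forecast MAE divided by the MAE of the one-step naive forecast on the training data); the data values and the factor of 1000 are made up purely to show why only MASE stays comparable when series have different scales.

```python
from math import sqrt
from typing import List


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root Mean Square Error, in the units of the series."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean Absolute Error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual: List[float], predicted: List[float], train: List[float]) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample MAE of
    the one-step naive forecast on the training data (unit-free)."""
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae(actual, predicted) / naive_mae


# Hypothetical example: the same forecasting situation recorded in two units.
train, actual, predicted = [10, 12, 11, 13], [14, 15], [13, 13]
k = 1000  # hypothetical unit change, e.g. a second series recorded 1000 times larger
train_k, actual_k, predicted_k = ([k * v for v in xs] for xs in (train, actual, predicted))

print(rmse(actual, predicted), rmse(actual_k, predicted_k))  # second value is 1000 times the first
print(mase(actual, predicted, train), mase(actual_k, predicted_k, train_k))  # both about 0.9
```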

Deployment

In this phase, the goal is to turn our proof of concept or prototype into an industrialized solution. This step involves transforming our “one-shot” project into a solution that can run with as few manual interventions as possible. Although standards such as the Predictive Model Markup Language (PMML) are emerging, this step still requires manual intervention. Questions such as where and how to integrate our data mining process within an overall solution or tool need to be explored.
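To give a feel for what a standard such as PMML provides, here is a minimal Python sketch that writes a linear regression as a PMML `RegressionModel` document and then scores a record from that document alone, using only the standard library. It assumes PMML version 4.1 and its namespace, uses hypothetical field names, and has not been validated against the official schema. The exchange format is the part that can be automated; deciding where the scoring runs and how it plugs into existing systems is not.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PMML_NS = "http://www.dmg.org/PMML-4_1"  # assumed PMML 4.1 namespace
ET.register_namespace("", PMML_NS)       # serialize it as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def linear_model_to_pmml(intercept: float, coefficients: Dict[str, float], target: str = "y") -> str:
    """Export y = intercept + sum(coef * x) as a PMML RegressionModel (sketch)."""
    root = ET.Element(q("PMML"), version="4.1")
    ET.SubElement(root, q("Header"), description="hypothetical exported model")
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(root, q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, q("DataField"), name=name, optype="continuous", dataType="double")
    model = ET.SubElement(root, q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, q("MiningField"), name=name)
    ET.SubElement(schema, q("MiningField"), name=target, usageType="predicted")
    table = ET.SubElement(model, q("RegressionTable"), intercept=repr(float(intercept)))
    for name, coef in coefficients.items():
        ET.SubElement(table, q("NumericPredictor"), name=name, coefficient=repr(float(coef)))
    return ET.tostring(root, encoding="unicode")


def score(pmml_text: str, record: Dict[str, float]) -> float:
    """Score one record using only the exported document (no training code)."""
    root = ET.fromstring(pmml_text)
    table = root.find(f"{q('RegressionModel')}/{q('RegressionTable')}")
    result = float(table.get("intercept"))
    for predictor in table.findall(q("NumericPredictor")):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result


pmml = linear_model_to_pmml(1.0, {"x": 2.0})
print(score(pmml, {"x": 3.0}))  # 7.0
```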

In conclusion, we have seen that most data mining steps of the CRISP-DM methodology cannot be automated and need manual intervention. Only data preparation and modeling could be automated, and even then only to a certain extent. However, as data mining professionals know, most of the effort in a data mining project goes into business and data understanding. Here is an excellent metaphor from Berry and Linoff (re-explained by David S. Coppock):

“The camera can relieve the photographer from having to set the shutter speed, aperture and other settings every time a picture is taken. This makes the process easier for expert photographers and makes better photography accessible to people who are not experts. But this is still automating only a small part of the process of producing a photograph. Choosing the subject, perspective and lighting, getting to the right place at the right time, printing and mounting, and many other aspects are all important in producing a good photograph.”

What about you? Do you think we can automate data mining?
