Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    data analytics in ecommerce
    Analytics Technology Drives Conversions for Your eCommerce Site
    5 Min Read
    CRM Analytics
    CRM Analytics Helps Content Creators Develop an Edge in a Saturated Market
    5 Min Read
    data analytics and commerce media
    Leveraging Commerce Media & Data Analytics in Ecommerce
    8 Min Read
    big data in healthcare
    Leveraging Big Data and Analytics to Enhance Patient-Centered Care
    5 Min Read
    instagram visibility
    Data Analytics Plays a Key Role in Improving Instagram Visibility
    7 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: Better than Brute Force: Big Data Analytics Tips
Share
Notification Show More
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Mining > Better than Brute Force: Big Data Analytics Tips
AnalyticsData MiningPredictive AnalyticsStatistics

Better than Brute Force: Big Data Analytics Tips

metabrown
Last updated: February 21, 2012 2:15 pm
metabrown
10 Min Read
SHARE

Long before I encountered the term “Big Data,” questions about dealing with large datasets were a routine part of my work. This situation was by no means unique. Government agencies and many businesses have been amassing repositories of detailed data for a long time.

Long before I encountered the term “Big Data,” questions about dealing with large datasets were a routine part of my work. This situation was by no means unique. Government agencies and many businesses have been amassing repositories of detailed data for a long time.

Consider, for example, the transaction history of a large retailer. A single transaction might include one item, or dozens. For each item, there may be several descriptors, such as a product ID and price. Besides the items purchased, there is the time at which the transaction took place, the register, the cashier, customer payment information and more. Each item corresponds to a lot, distributor, manufacturer, shipper and so on. The customer corresponds to another stream of information, covering previous purchases, loyalty program status and marketing history. This is adding up to a lot of data, and we’re still describing just one transaction.  A department store chain, grocer or big box retailer may handle a thousand transactions or more each day, year round, at each of hundreds, even thousands, of outlets.

More Read

Location Intelligence and Mobile BI: Advancing Customer Relations in the Finance and Banking Sector

Big Data: The Retailer’s Tool for Keeping Consumers On-Side and Happy
7 Great Benefits of Big Data in Marketing
TechAmerica and Big Data in the Public Sector
Is love for Twitter blind?

Lately, I have heard “Big Data” used to describe everything from baseball statistics, which are far smaller in scale than the retailer example, to online retailing and social media, where the data resources can become enormous. Big is relative, yet when people ask, “Can you handle our data?” they all have certain shared concerns.

The prospect who asks if you can handle a large volume of data is looking for reassurance, and proof that you have something of value to offer. Here are some thoughts behind the question…

It’s hard for us to store this much data, let alone do anything useful with it.

We’ve tried some things that didn’t work.

Some of the things that didn’t work crashed the system.

We spent a lot of money on the last thing that didn’t work. Spending a lot of money on another thing that doesn’t work could be a career-ending move.

Of course, your answer will boil down to “Of course we can handle your data!” but if you say it that way, most people won’t believe you, and for good reasons. They want an answer that addresses those unspoken concerns.

So what’s the right way to respond? Answer the question with questions. At least, that is the way to begin. (This is true whether you are a vendor, outside consultant, or an internal resource such as a business analyst.) First, ask about goals. What kind of questions does your prospect expect to answer? Why? How will the information be of use to the business? What’s the vision for integrating analytics into decision-making and business processes?

Why begin with these questions? Primarily to learn the answers, because the answers will guide you in everything else you ask and do. But there are other reasons as well. Asking questions about business goals helps your clients to become aware of gaps in their own reasoning and other challenges of meeting expectations. The process of describing goals can easily lead someone to a realization that the data doesn’t support their wants, and some redirection is necessary. You’ll be saving yourself and your client a lot of trouble by getting the big questions on the table from the start.

Your clients are, most likely, a smart bunch of people. They are experts in their own professions, not yours. They may be unsure of how to evaluate you and the services you offer. Asking questions in a respectful manner is a way to show that you care about them and what they do. If you have put serious thought into the questions you ask and listen to the answers, you’re very likely to uncover some issues of importance, including some your prospect had not previously considered. Your client develops an appreciation for what you know, what you can do for them. As you develop an understanding of needs, you prospect develops respect and trust for you.

While you ask questions about Big Data analysis goals, keep asking yourself how much data is required to address the client’s goals. Just because an organization has endless heaps of data doesn’t mean there is a reason to touch and feel every bit of it.

Is the client asking for information about every single person in the data – individually? Very often, that’s not the case. And if that’s not the case, handling the data becomes simpler. Focusing on particular segments? OK, then you need data relevant to those segments, not everybody and her mother. Is every field in the database relevant to your research? Are there open-ended text fields, and if so, do you need them? Narrowing the data to just the relevant cases and fields can easily reduce the volume of data by a factor of ten, or one-hundred, one-thousand… you get the picture.

If the goals center on understanding behaviors of large numbers of people (or transactions, or some other things) as a group, then you have discovered another fine opportunity to reduce scale. Because you don’t need to use every single individual to do a good job of describing the group. What you need is a sample. There are statistical approaches to sampling, and there are data mining approaches, use what suits your client’s needs and your own work processes.

If you don’t trust or use sampling, you don’t know drivel about data analysis; go back to school and stop wasting clients’ time and money.

What if your client really does need to be able to address every single case in a huge repository? This is certainly possible. Perhaps your client wants to rate the profitability or purchasing potential of every customer. Or maybe you’re dealing with insurance claims – which ones are potentially fraudulent? Tax payment information – who’s cheating? Situations like these do call for handling lots and lots of data.

Still, there are opportunities to minimize the resources required. Break big questions into small pieces, use sampling early and often, and educate yourself about the resource requirements for the things you do. Scoring a million cases using your predictive model, for example, may require less computing power than building the model itself on a sample of a few thousand cases. This information isn’t always easy to find; making nice with vendor tech support and you client’s IT staff may lead you to invaluable tricks of the trade.

Make it a point to minimize your demands on the data at every step of the analytic process. Exploring the data? If you’re looking to understand what’s typical, use a sample, not the whole dataset. Seeking the extreme and unusual? You’ll need more data to work with, but still may be better off with a subset of the data, at least to begin. You’re ready to build models? Often, you’ll get the best results by modeling segments one at a time. Throwing everybody into one giant equation is asking for a weak model. Take a sample of training data from a single segment and you can work faster, often producing more accurate models.

When you tell a prospect that you can handle the data, it shouldn’t mean you’ll do it by brute force. A little finesse yields stronger results with less strain on resources.

TAGGED:big data
Share This Article
Facebook Twitter Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

trusted data management
The Future of Trusted Data Management: Striking a Balance between AI and Human Collaboration
Artificial Intelligence Big Data Data Management
data analytics in ecommerce
Analytics Technology Drives Conversions for Your eCommerce Site
Analytics Exclusive
data grids in big data apps
Best Practices for Integrating Data Grids into Data-Intensive Apps
Big Data Exclusive
AI helps create discord server bots
AI-Driven Discord Bots Can Track Server Stats
Artificial Intelligence Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

analyst,women,looking,at,kpi,data,on,computer,screen
Analytics

What to Know Before Recruiting an Analyst to Handle Company Data

6 Min Read
virtual assistant and big data
AnalyticsBig Data

How Virtual Assistants Use Data Analytics To Save Clients Money

8 Min Read
big data for social media
AnalyticsBig DataExclusiveSocial Data

5 Tools That Use Big Data For Social Media Optimization

7 Min Read
nature of proxy servers
Best PracticesBig DataData ManagementExclusiveITPrivacySecurity

Understanding The Nature Of Proxy Servers In The Big Data Age

7 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

data-driven web design
5 Great Tips for Using Data Analytics for Website UX
Big Data
giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-24 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?