
In December 2014 I wrote a post titled “Finding the elusive big wisdom in big data.” We have made big advances since then in storing, processing and generally handling large volumes of data, but humans still have trouble reliably finding the relevant nuggets that can improve business outcomes. I’m revisiting the idea to see how far we’ve come and how far we still have to go.
Baseball is one of the best examples of how data can impact a business. The book and movie “Moneyball” showed how Oakland Athletics’ GM Billy Beane changed the sport by introducing advanced statistical analysis instead of simply relying on reports from human scouts. Today, baseball is ruled by analysts as much as by experienced baseball professionals. But could there be too much data?
Alex Speier, writing in the Boston Globe’s Sunday Baseball Notes column, recently pointed out that Boston Red Sox Manager Alex Cora will have 11 coaches on staff this year for 26 players. Compare that with Terry Francona in 2011, who had six for 25, and you can see the number has nearly doubled.
Speier attributes this in part to the growing amount of data that teams are collecting, which requires more people to observe it, interpret it and turn it into a plan players can use. As Speier wrote, “The result? Several teams now feature three hitting coaches, and staffs keep growing in an effort to distill the mountains of information into a digestible form for the 26 players on a roster.”
Baseball is like a laboratory for advanced statistical analysis, and businesses could learn a lot from watching how the sport deals with expanding datasets.
Companies shovel data into data lakes, tinker with machine learning models, and use this information to make decisions, but they still face fundamental decisions every day that involve their ever-growing mountain of information. Sure, machines and software can help sort it out — indeed, they are getting better at it — but anyone who has dealt with poorly targeted advertising or email marketing knows there’s much work to be done.
Deepak Jeevankumar, managing director at Dell Technologies Capital, says the difference compared to 2014 is that companies are now focused on getting insights faster to the people who need them. “Big Data has become less relevant than ‘fast data.’ E-commerce shoppers, streaming media consumers, gamers, stock/crypto traders and enterprise marketeers are demanding fast insights and fast knowledge,” Jeevankumar told me.
He believes data needs to be analyzed at its source while it is streaming, and in real time once a query is created, and that startups whose products let companies (and baseball teams) do this will be more successful in the long run.
What’s more, companies that can achieve real-time data synthesis are going to attract more attention from investors. “Startups that analyze and extract insights from fast data have garnered extremely high public and private market investor interest,” he said, pointing to companies like Confluent and Databricks as prime examples of this approach.
Baseball coaches need to collect data and distill it for their players, and business analysts and managers need to do the same so their employees can make business decisions quickly and efficiently, based on accurate information.
Pam Baker, a journalist who has written two books on data (the latest, “Decision Intelligence For Dummies,” coming out soon), says there is good news and bad news for today’s world of ever-expanding datasets. “It is harder, much harder, in fact. That doesn’t mean we need less data, it means we need new and better approaches to finding answers. That’s why now there is movement away from data mining and toward decision intelligence,” Baker told me.
She says that this approach allows you to search based on the answers you are seeking, rather than sifting through mounds of data hoping to find a nugget that’s useful to you. That means querying the data with good questions and coming up with meaningful results.
You’d be right to think that asking targeted questions could create a risk of confirmation bias, but Baker is careful to distinguish between bias and solid analysis. “We’re not talking about gathering data to support a cognitive bias, but gathering data that is relevant to the research or question. For companies, it is the difference between being a data-driven organization and a decision-driven organization,” she said.
Baker believes that while we have made great strides with tooling, there are areas that haven’t progressed very far since my 2014 article — like predictive analytics that posits that you may be interested in an engagement ring because you bought one last spring, which probably isn’t the case.
“Predictive analytics project a future outcome based on past outcomes and assumes nothing has changed. The problem is that human behaviors just aren’t that clean and neat,” Baker explained. “We still have work to do in building nuance — more context — into our analytics. But that also can be more intrusive: Do you really want marketers to know who you bought that ring for and why? Everything is a trade-off.”
Big data analysis has indeed advanced in the seven years since I wrote that article. The underlying infrastructure, the technology and the data analysis tools have all improved dramatically, but it’s by no means a solved problem. As baseball has shown us, it’s easy to get caught up in the minutiae of data and to forget that in the end, in spite of all that tech, it’s about humans executing on the data. We can’t lose sight of that.