Open Source Junkies: How Much Analytic Power Do You Need?

Data scientists need to justify the incremental risk we assume when we use more complicated methods to solve a problem, rather than be open source junkies

by Scott Zoldi

Chief Analytics Officer

September 2, 2021

I’m a big fan of the Ford Bronco. In addition to the trusty Bronco I take off-roading, I’m near the top of the 125,000-person waiting list for the new model. In my years of negotiating impossible inclines and boulder-strewn roads I’ve realized that driving a Bronco is a lot like solving analytics challenges — it’s counterproductive to use more horsepower than you need.

How Much Predictive Power Is Enough?

A wide variety of open source analytics tools are freely available to data scientists and students, all of whom can get carried away with contests on Kaggle. This well-known competition platform for predictive modeling and analytics is owned by Google, and its prevalence in the zeitgeist of the analytics community is itself a topic of concern. My particular issue is with Kaggle’s tacit encouragement to throw as much analytic horsepower as possible at its puzzles, whether or not such an approach would be appropriate in the real world.

An example of how this kind of analytic overkill leads to tainted results is the data dumping trope: pouring as many data sources as possible through a model to gain a tiny improvement in its predictive power, without understanding what new (and possibly meaningless) relationships are being learned, or considering how much complexity those sources add to the model.
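One way to guard against data dumping is to ask whether the lift from a new data source is larger than the noise in the evaluation itself. The sketch below is only an illustration under stated assumptions, not a method taken from this article: the function name gain_is_meaningful, the 0.005 minimum gain, and the two-standard-error rule are all hypothetical choices, and the per-fold scores are made-up numbers standing in for cross-validation results.

```python
from math import sqrt
from statistics import mean, stdev


def gain_is_meaningful(base_scores, new_scores, min_gain=0.005, z=2.0):
    """Decide whether a new data source earns its place in the model.

    base_scores / new_scores are validation scores (higher is better) from
    the SAME folds, without and with the candidate data source.
    min_gain and z are hypothetical thresholds chosen for illustration.
    """
    if len(base_scores) != len(new_scores) or len(base_scores) < 2:
        raise ValueError("need paired scores from at least two folds")

    # Paired differences: how much each fold improved with the new source.
    diffs = [new - base for base, new in zip(base_scores, new_scores)]
    avg_gain = mean(diffs)

    # Standard error of the mean difference, a rough measure of noise.
    std_err = stdev(diffs) / sqrt(len(diffs))

    # Keep the source only if the gain is both material and beyond noise.
    return avg_gain >= min_gain and avg_gain > z * std_err


# Made-up per-fold scores for illustration only.
baseline = [0.80, 0.81, 0.79, 0.80, 0.82]
tiny_lift = [0.801, 0.809, 0.792, 0.80, 0.821]
real_lift = [0.83, 0.84, 0.825, 0.83, 0.85]

keep_tiny = gain_is_meaningful(baseline, tiny_lift)
keep_real = gain_is_meaningful(baseline, real_lift)
print(keep_tiny, keep_real)
```

A check like this does not explain what the model learned from the new source. It only flags gains too small to justify the extra complexity, so the harder question of whether the new relationships make sense still has to be answered by a person.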

Analytic overkill is a winner on Kaggle, but not in the real world. Here’s my thinking, as I put forward in my article for IOT Agenda:

I have a belief that’s unorthodox in the data science world: explainability first, predictive power second, a notion that is more important than ever for companies implementing AI.

AI that is explainable should make it easy for humans to find the answers to important questions including:

Was the model built properly?
What are the risks of using the model?
When does the model degrade?

Rehab for Open Source Junkies

“Open source junkies” is the term I have for data scientists who are addicted to using excessive analytic power to solve any problem. The good news is there is a straightforward path to rehab. As expressed by AI industry luminary Andrew Ng, the idea is, “Always start with the simplest technology and then justify why you have to get more complex.” Along those lines, the model design questions we need to ask ourselves are:

How well do we understand the problem we are solving? Should we be speaking with the business to get key insights to design the model?
What are the appropriate data sources to include? What key variables / features would we derive from those sources?
How performant is our simplest model, say a regression? Does it meet the business requirements? What are the drivers of this model?
As we add complexity to the model, what do we gain in prediction, and what do we lose in explainability? Robustness? Ethics?
Should we leap to interpretable machine learning models?
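The questions above can be read as a simple decision procedure: start with the simplest model, stop as soon as it meets the business requirement, and move to something more complex only when the gain justifies it and the model remains explainable. The following is a minimal sketch of that reading. The Candidate class, the choose_model function, the complexity ranks, the scores and the 0.01 gain-per-step threshold are all hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    complexity: int    # hypothetical ordinal rank, 1 = simplest
    score: float       # validation metric, higher is better
    explainable: bool  # can we state the model's drivers to the business?


def choose_model(candidates, required_score, min_gain_per_step=0.01):
    """Walk from simplest to most complex, escalating only when justified.

    Returns the chosen Candidate. The caller should still check whether
    its score meets required_score; if no explainable model does, that is
    a finding to take back to the business rather than a reason to give
    up on explainability.
    """
    ordered = sorted(candidates, key=lambda c: c.complexity)
    chosen = ordered[0]
    for cand in ordered[1:]:
        if chosen.score >= required_score:
            break  # the simplest adequate model wins
        steps = cand.complexity - chosen.complexity
        gain = cand.score - chosen.score
        # Explainability first, predictive power second.
        if cand.explainable and gain >= min_gain_per_step * steps:
            chosen = cand
    return chosen


# Made-up candidates and scores for illustration only.
models = [
    Candidate("regression", 1, 0.74, True),
    Candidate("interpretable ML model", 2, 0.78, True),
    Candidate("black-box ensemble", 3, 0.785, False),
]

pick_low_bar = choose_model(models, required_score=0.70)
pick_mid_bar = choose_model(models, required_score=0.77)
pick_high_bar = choose_model(models, required_score=0.80)
print(pick_low_bar.name, pick_mid_bar.name, pick_high_bar.name)
```

In this sketch the black-box ensemble is never selected, even when the bar is set above what the explainable models reach. That is a deliberate design choice reflecting the explainability-first stance, and a team with different constraints might weigh the trade-off differently.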

Essentially, we need to justify the incremental risk we assume when using more complicated methods. As data scientists we need to ask: What are we trying to achieve, what are the right technologies to get us there, and what are the trade-offs? Unacceptable trade-offs include GDPR violations and AI that is not ethical.

Education Is Key

Back to my Bronco analogy: if I see a hill full of boulders and want to drive up it, I know I can. But what line will I choose? I will go slow and steady to make my way up the hill and over the boulders, cool as a cucumber, and not gun the engine. The hot-doggers, the drivers maxing out their Bronco’s horsepower on challenging terrain, are the ones who flip over, wipe out and otherwise wreck their vehicles. In these conditions, slow is fast, and smart! When it comes to building proper artificial intelligence and machine learning technologies, slow is fast, too.

That brings us to the importance of training. Data scientists need a broader perspective, not just on data science but also on the business and social context in which their work will be used. In my role on the Executive Board of the Jacobs School of Engineering at UC San Diego, I do my very best to connect the theoretical world with the real world. Won’t you join me on this journey?

Follow me on Twitter @ScottZoldi and on LinkedIn to keep up with my latest thoughts on delivering analytic innovation in the real world.

Scott Zoldi

Dr. Scott Zoldi is chief analytics officer at FICO, responsible for artificial intelligence (AI) and analytic innovation across FICO's product and technology solutions. Dr. Scott Zoldi has been listed as an inventor on 122 AI & software patents, in collaboration with other data and analytic scientists, and he is also named on an additional 40 patent applications in process. Scott is an industry leader in the responsible use of AI, Generative AI (GenAI), and Agentic AI, as well as an outspoken proponent of AI governance and regulation. His groundbreaking work in focused language models (FLMs) for GenAI and a patented use of blockchain technology for AI model development governance has helped propel Scott to AI visionary status. His recent awards include Constellation Research Award AI150, Tech Leadership Award from Banking Tech Awards, Tech Influencer Highly Commendable Award from DataIQ Data & AI Awards, San Diego Business Journal - Leaders of Influence in Technology (2025); Tech Leadership - Software & Services Provider from Fintech Futures, MachineCon AI100 Award, Innovator Award from American Banker (2024); Global Finance Innovator Award (FICO) (2023); and Corinium Future Thinking Award (2022). An enthusiastic member of the southern California tech community, Scott serves on the Boards of Directors of Software San Diego and the San Diego Cyber Center of Excellence. He received his Ph.D. degree in theoretical and computational physics from Duke University, and his work has been published in The Harvard Business Review and numerous scientific journals.

When not at his office or on a plane, Scott can often be found in his Ford Bronco, exploring the desert around San Diego with his family. To hear more of his views follow Scott on LinkedIn and BlueSky @ScottZoldi.
