In my previous post (Intro to ML #3), I covered confusion matrices, AUC, and why accuracy alone is a misleading metric.
This time: what happens when you take a model and actually try to use it. This is where machine learning stopped feeling like a technical exercise and started feeling like a business problem.
A Classification Model Outputs Probabilities, Not Answers
To calculate a confusion matrix, you first need to set a threshold.
Here’s something that surprised me: a classification model doesn’t directly output “positive” or “negative.” It outputs a probability — a number between 0 and 1. Something like 0.65 or 0.12. Then you apply a threshold to convert that probability into a category.
Most tools default to 0.5 — above that, it’s classified as positive. Below that, negative. But that threshold is a choice, not a given. And where you set it changes everything:
- Higher threshold (e.g., 0.8): Only flag something as positive when very confident → Precision goes up, but you miss more cases
- Lower threshold (e.g., 0.2): Flag almost anything as positive → Recall goes up, but false alarms increase
Neither is universally better. The right threshold depends entirely on the business context and what kind of error you can afford.
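A minimal sketch of what moving the threshold does, using toy probabilities and labels I made up (not the lending data discussed later in this post):

```python
# The same model scores yield different precision/recall at each threshold.

def classify(probs, threshold):
    """Convert probabilities into 0/1 predictions at a given threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def precision_recall(preds, labels):
    """Compute precision and recall from binary predictions."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

probs  = [0.95, 0.85, 0.70, 0.55, 0.40, 0.30, 0.15, 0.05]  # model outputs
labels = [1,    1,    0,    1,    1,    0,    0,    0]      # ground truth

for t in (0.2, 0.5, 0.8):
    p, r = precision_recall(classify(probs, t), labels)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

With these toy numbers, the 0.8 threshold gives perfect precision but misses half the positives, while 0.2 catches every positive at the cost of more false alarms. The model's scores never changed; only the human-chosen threshold did.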
Recall vs. Precision: Three Business Examples
Working through concrete scenarios made the recall-precision tradeoff much more intuitive.
① Smartphone iris authentication
The catastrophic error here is authenticating a stranger: letting the wrong person unlock your phone. Everything the model flags as "authenticated" needs to actually be the right person. → Prioritize precision.
② Home security sensor
The catastrophic error is missing an actual intruder. A false alarm is annoying, but an undetected break-in is far worse. → Prioritize recall.
③ Criminal investigation
This one has no clean answer. If the goal is solving cases and catching perpetrators → recall. If the goal is preventing wrongful conviction → precision. The “right” metric depends on which failure you consider more serious — and that’s a values question, not a technical one.
What stuck with me: the number doesn’t tell you which error matters more. That judgment belongs to people, not models.
Putting It Into Practice: What the Model Actually Did
With this in mind, I ran the same exercise from the previous session using peer-to-peer lending data.
Without any filtering: roughly 15% of loans default, resulting in a net loss of $220,000.
Using the model to select only the top 25% of loans with the lowest predicted default probability, the default rate dropped to 6.74%.
The math:
$40M × (1 − 0.0674) × 1.17 ≈ $43.65M → A swing from −$220K to +$3.65M per year.
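The arithmetic checks out in a few lines. The interpretation of the factors is my own reading of the post's numbers, not something stated explicitly: $40M in loans issued, defaulted loans treated as a total loss, and 1.17 as the gross return multiple on repaid loans.

```python
# Reproducing the lending calculation from the post.
loans_issued = 40_000_000   # assumed: total principal issued
default_rate = 0.0674       # after filtering to the top 25% of loans
return_multiple = 1.17      # assumed: gross return on loans that repay

# Money returned: only non-defaulting loans repay, at 1.17x.
repaid = loans_issued * (1 - default_rate) * return_multiple
profit = repaid - loans_issued
print(f"returned: ${repaid:,.0f}, profit: ${profit:,.0f}")
```

That lands on roughly $43.65M returned and about $3.65M in profit, matching the swing described above.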
But here’s the important caveat: the model didn’t make that decision. A human decided to filter to the top 25%, set the threshold, and choose to accept the tradeoff of issuing fewer loans. The model provided the inputs; the judgment call was human.
A Model Isn’t “Done” When It’s Built
One more thing that stuck: a model degrades over time.
Once you deploy a model and start using it in the real world, its accuracy will eventually decline. Reasons include:
- Data formats change
- Business conditions or market behavior shift
- The historical data it was trained on becomes less representative
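What monitoring can look like in its simplest form. This is a hypothetical sketch with made-up numbers and a made-up tolerance; real pipelines track more than one metric, but the shape of the check is the same: compare recent performance against the baseline measured at deployment, and flag the model when the gap grows too large.

```python
def needs_retraining(baseline_acc, recent_acc, tolerance=0.05):
    """Flag the model when accuracy has degraded beyond the tolerance."""
    return (baseline_acc - recent_acc) > tolerance

print(needs_retraining(0.91, 0.89))  # small dip: keep monitoring
print(needs_retraining(0.91, 0.82))  # large drop: time to retrain
```

Even the retraining decision involves judgment: someone has to decide what counts as an acceptable drop and which metric matters for this particular business.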
Coming from an IT infrastructure background, this was a genuine mindset shift. Systems you build are expected to keep working. Machine learning models require ongoing monitoring and periodic retraining. It’s less like “deploying a server” and more like “maintaining a garden.”
What I Learned: ML Raises the Quality of Human Judgment
Thresholds, recall, precision, model maintenance — working through all of this brought a single idea into focus: machine learning doesn’t automate decisions. It improves the quality of the inputs that humans use to make decisions.
Where you set the threshold. Which errors you’re willing to accept. Whether to prioritize revenue or risk reduction. These are judgment calls that require knowing the business context — and no model can make them for you.
AI isn’t building a world where we think less. It’s creating a world where we need to think more carefully about the right questions to ask.
→ [Intro to ML #5 — coming soon]
Books to Go Deeper
① For Understanding Business Decisions with Data
Data Science for Business — Foster Provost & Tom Fawcett (O’Reilly)
Written specifically for business professionals who want to understand how data science models work and how to use them in decision-making. Covers classification, regression, evaluation metrics, and the business framing of ML problems. The chapter on expected value and decision-making with classifiers is directly relevant to everything covered in this post.
② For the Business Strategy Side of AI Deployment
Competing on Analytics — Thomas H. Davenport & Jeanne G. Harris (Harvard Business Review Press)
A classic on how organizations actually build competitive advantage through data and analytics. Useful context for anyone thinking about the “so what” after you have a working model — how do you embed it into a business process, and what does it take to act on the output systematically?