Getting More Out of Generative AI — Prompts, RAG, and the Age of Agents [Intro to ML #9]

In my previous post (Intro to ML #8), I covered how generative AI actually works — the next-word prediction mechanism behind LLMs, how RLHF turned raw language models into useful conversational tools, and how diffusion models generate images.

This post is about what to do with that understanding. My MBA data science course organized the ways to get more out of generative AI into four axes: ① prompting, ② RAG, ③ external tool integration (MCP), and ④ agents. The session closed with a strategic question: if generative AI is available to everyone, how do you build a competitive advantage with it?

The Butterfly Effect of Prompts

The first thing that surprised me was research showing what the course called the “butterfly effect” of prompting.

Small changes in how you phrase a prompt — adding a tip offer, adjusting the output format, rewording a single sentence — measurably affect the quality of the model’s response. Like a butterfly flap triggering a storm, tiny input differences produce large output differences.

The implication: two people using the same model can get very different results depending on how they prompt. The tool is commoditized; the skill of using it is not.

The RICE Framework: Structuring Your Prompts

The course introduced RICE as a practical framework for writing effective prompts:

  • R (Role): Assign the AI a role. “You are a world-class podcast producer.”
  • I (Instruction): Describe the task clearly. “Extract the most engaging and insightful content for a podcast discussion.”
  • C (Context): Provide relevant background. “The input may be unstructured text extracted from PDFs or web pages.”
  • E (Examples): Show what good output looks like. “Each line of dialogue should be under 100 characters.”

The distinction between zero-shot (no examples, just instructions) and few-shot (2–5 input/output examples) prompting also matters. When the task is straightforward, zero-shot works fine. When the output format or reasoning style needs to be precise, few-shot examples guide the model more reliably.
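The four RICE components, plus the zero-shot/few-shot distinction, can be sketched as a small prompt-builder. The function name, field labels, and layout here are my own illustration, not from the course materials:

```python
# Sketch of a RICE-structured prompt builder. Field names and layout
# are illustrative assumptions, not a standard API.

def build_rice_prompt(role, instruction, context, examples=None, query=""):
    """Assemble a prompt from the four RICE components.

    `examples` is an optional list of (input, output) pairs; passing
    2-5 of them turns a zero-shot prompt into a few-shot one.
    """
    parts = [
        f"Role: {role}",
        f"Instruction: {instruction}",
        f"Context: {context}",
    ]
    for ex_in, ex_out in (examples or []):
        parts.append(f"Example input: {ex_in}\nExample output: {ex_out}")
    parts.append(f"Input: {query}")
    return "\n\n".join(parts)

prompt = build_rice_prompt(
    role="You are a world-class podcast producer.",
    instruction="Extract the most engaging content for a podcast discussion.",
    context="The input may be unstructured text extracted from PDFs or web pages.",
    examples=[("<long article>", "HOST: Welcome back!\nGUEST: Glad to be here.")],
    query="<article text>",
)
```

The point of the structure isn't the exact labels; it's that each RICE component occupies a predictable slot, which makes prompts easy to review and iterate on.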

Chain-of-Thought: Why “Think Step by Step” Works

One of the most counterintuitive findings in the prompting literature is Chain-of-Thought (CoT): simply adding “Let’s think step by step” to a prompt measurably improves accuracy on reasoning tasks.

The reason connects directly to how LLMs work. Since the model generates text by predicting the next token, prompting it to produce intermediate reasoning steps means those steps become part of the context for subsequent predictions. The model “reasons through” the problem rather than jumping to an answer — and gets it right more often.

Few-shot CoT takes this further: provide examples that include the reasoning process, not just the final answer. Showing the model how to think about a type of problem improves generalization to similar problems. “Here’s the answer” is less powerful than “here’s how to arrive at the answer.”
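The two CoT variants can be sketched concretely. The trigger phrase is the one from the literature; the arithmetic example and formatting are illustrative assumptions:

```python
# Illustrative only: exact phrasing and example formatting vary in practice.

ZERO_SHOT_COT = "Let's think step by step."

def with_cot(question):
    """Zero-shot CoT: append the reasoning trigger to a plain question."""
    return f"{question}\n{ZERO_SHOT_COT}"

# Few-shot CoT: the example answer demonstrates the reasoning process,
# not just the final result, so the model imitates the "how".
FEW_SHOT_COT_EXAMPLE = (
    "Q: A cafe sold 23 coffees in the morning and 18 in the afternoon. "
    "How many in total?\n"
    "A: Morning sales are 23. Afternoon sales are 18. "
    "23 + 18 = 41. The answer is 41."
)

def few_shot_cot(question):
    """Few-shot CoT: prepend a worked example, then ask the new question."""
    return f"{FEW_SHOT_COT_EXAMPLE}\n\nQ: {question}\nA:"
```

Either way, the generated reasoning tokens land in the context window, which is exactly why the technique works given next-token prediction.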

RAG: Giving the Model Knowledge It Doesn’t Have

Even the best LLM can’t answer questions about information it was never trained on — your company’s internal policies, a product released last month, a confidential market report. For everything outside the training data, the model either guesses or says it doesn’t know.

RAG (Retrieval Augmented Generation) addresses this by adding a retrieval step. When a user asks a question, the system first searches an external knowledge base for relevant content, then appends that content to the prompt before sending it to the LLM. The model answers based on retrieved information rather than training-data recall alone.
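The retrieve-then-augment flow can be shown in miniature. Real systems use embedding vectors and a vector database; this toy sketch uses word-overlap scoring so it stays self-contained, and the prompt wording is my own assumption:

```python
# Toy sketch of the RAG flow: retrieve relevant passages, then prepend
# them to the prompt before calling the LLM. Word-overlap scoring stands
# in for real embedding similarity.

def retrieve(query, documents, top_k=2):
    """Rank documents by how many words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query, documents):
    """Append retrieved passages to the prompt the LLM will receive."""
    passages = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{passages}\n\n"
        f"Question: {query}"
    )

docs = [
    "Assignment 3 is due on June 12 and covers RAG pipelines.",
    "The cafeteria opens at 8am.",
]
print(build_rag_prompt("When is assignment 3 due?", docs))
```

Swap the overlap scorer for embeddings and the list for a vector store and this is, structurally, the syllabus demo from class: the model never changes, only the context it receives.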

In class, a live demo illustrated this clearly. A chatbot was asked about an upcoming assignment in the course. Without RAG: “I don’t have that information.” With the course syllabus loaded as a knowledge base: a precise, accurate answer. The same model, very different output.

McKinsey’s 2024 report argued that “the real value from LLMs comes from their ability to work with unstructured data” — the PowerPoint decks, meeting transcripts, and policy documents that most organizations have never been able to search systematically. RAG is how that value gets unlocked.

MCP: The USB-C of AI Tool Integration

Beyond knowledge retrieval, generative AI can be connected to external tools — search engines, code interpreters, databases, calendar systems. Traditionally, each connection required a custom API integration: separate authentication, separate error handling, separate maintenance.

MCP (Model Context Protocol) is designed to standardize this. The course described it as “USB-C for AI” — a single protocol that lets an AI system connect to diverse tools and data sources in a unified way, with dynamic tool discovery and bidirectional real-time communication.
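MCP is built on JSON-RPC 2.0, and its `tools/list` and `tools/call` methods are where the "dynamic tool discovery" happens. The sketch below shows the message shapes only; the tool name is hypothetical and this is not a working client:

```python
import json

# Sketch of MCP's message shapes. MCP runs over JSON-RPC 2.0;
# tools/list and tools/call are real method names from the spec,
# but "search_web" and its arguments are hypothetical.

def jsonrpc_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request envelope."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Step 1: ask the server which tools it offers (dynamic discovery).
discover = jsonrpc_request(1, "tools/list")

# Step 2: invoke a discovered tool by name with structured arguments.
call = jsonrpc_request(2, "tools/call", {
    "name": "search_web",                  # hypothetical tool name
    "arguments": {"query": "MCP spec"},
})
```

Because every tool speaks this same envelope, the client needs no per-tool glue code; that uniformity is the "USB-C" part of the analogy.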

Major LLMs including Claude are now MCP-compatible. The practical effect: a single AI interface can search the web, execute code, read files, and update a calendar, all without custom glue code for each integration. I use MCP in my own workflow for writing these posts, and this session clarified why the protocol is designed the way it is.

Agents: From Answering to Doing

Prompting, RAG, and tool integration all point toward the same destination: AI agents.

The shift being described is from “AI that answers questions” to “AI that gets things done.” Give an agent a goal rather than a specific instruction, and it plans the steps, uses tools, evaluates intermediate results, and adjusts — autonomously.
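The plan-act-observe loop described above can be sketched in a few lines. Everything here is a stub: in a real agent the `decide` policy is an LLM call, and the tools are MCP-style integrations rather than a dictionary of lambdas:

```python
# Minimal sketch of an agentic loop: a policy picks an action, the
# system executes it, and the observation feeds the next decision.
# The policy, tools, and stop condition are all illustrative stubs.

def run_agent(goal, tools, decide, max_steps=5):
    """Loop: decide -> act -> observe, until the policy says 'finish'."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        action, arg = decide(history)      # an LLM call in a real agent
        if action == "finish":
            return arg
        observation = tools[action](arg)   # execute the chosen tool
        history.append(f"{action}({arg}) -> {observation}")
    return None  # gave up within the step budget

# Stub policy: look something up once, then report what was found.
def decide(history):
    if len(history) == 1:
        return "lookup", "deadline"
    return "finish", history[-1].split("-> ")[-1]

tools = {"lookup": lambda key: {"deadline": "June 12"}[key]}
result = run_agent("find the deadline", tools, decide)
# result == "June 12"
```

The `max_steps` budget is one concrete answer to Andrew Ng's autonomy question: how much freedom the loop gets is a design parameter, not a yes/no property.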

Masayoshi Son’s projection — “one billion AI agents by end of year” — signals where the industry is heading. Multi-agent systems, where specialized agents collaborate on complex problems, are already operational in some domains.

Andrew Ng’s framing at LangChain Interrupt 2025 cut through the definitional noise: “Rather than arguing about whether something is ‘truly’ an agent, let’s just say these are agentic systems with different degrees of autonomy.” The important question isn’t taxonomy — it’s how much autonomy to design in for a given use case.

The Strategic Question: Competitive Advantage with Generative AI

The session closed with a thought experiment: generative AI tools are available to everyone, including your competitors. So where does the advantage come from?

The inverse framing was equally useful: how could a competitor use generative AI to put you at a disadvantage?

My own answer: the tool itself isn’t the differentiator. What differs between organizations is proprietary data (the raw material for RAG), the quality of use case design (which problems to apply AI to and how), and the speed of the iteration cycle. The organizations that figure out the right applications fast and improve them continuously will pull ahead — not the ones that simply have access to the same frontier model.

What It Takes to Use Generative AI Well

The course instructor offered a closing thought on what humans need to bring to the table:

“It comes down to what you put in. This isn’t about one-shot answers — it requires judgment and dialogue.”

An LLM is a massive probability engine. The quality of what it produces depends on the quality of what you give it, which is a function of domain knowledge × thinking ability × communication skill. As models get more capable, the ceiling on output quality rises. But the gap between a well-constructed prompt and a lazy one widens with it.

Eric Schmidt said in a TED talk: “This is a marathon, not a sprint. Ride the wave every day. If you’re not using this technology, you will fall behind the people who are.”

I’ve been in the IT industry since 1997, and that framing resonates. The question has never been whether to adopt; it’s whether you’re building the judgment to use the technology well. That’s what this whole course series has been trying to develop.

Intro to ML #10 — What I Learned from a Six-Session MBA Data Science Course
