AI Foundation Series: The Day My Model Learned to Say “Yes” or “No”

Last week, my model predicted a house price of $253,587 and I felt like a genius. I gave it an income level, it drew a line, and boom, a number came out the other side. Beautiful. Satisfying. Done.

Then I asked myself a question that broke everything.

What if I don’t want a number? What if I just want to know: is this house worth more than the median, or not?

That one question pulled me into a completely different corner of Machine Learning called Classification. And the first tool I picked up there was something called Logistic Regression.

Yes, I know. It has the word “regression” in it. It confused me too.

The Problem with Drawing Lines

Linear Regression is brilliant at one thing: predicting a value on a continuous scale. Price. Temperature. Age. Things that sit on a number line.

But a lot of the world doesn’t work that way. Sometimes the answer is binary. Spam or not spam. Fraud or not fraud. Will this patient need surgery or not. Approved or rejected.

When I tried to use my old Linear Regression model to answer “is this house worth more than the median?” something ugly happened. The line it drew kept spitting out values like 1.4 or −0.2. Those aren’t valid answers for a yes/no question, and they aren’t valid probabilities either. The model was technically doing its job; it just had no idea it was being asked the wrong question.
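A minimal sketch of that effect, using ten made-up points rather than the housing data. The arrays `x` and `y` are assumptions for illustration, and the straight line is fitted with NumPy's `polyfit` instead of the scikit-learn model from earlier in the series:

```python
import numpy as np

# Toy data (made up for illustration): one feature, one 0/1 label
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)
line_preds = slope * x + intercept

# The two ends of the line fall outside the 0-to-1 range
print(f"lowest prediction:  {line_preds.min():.2f}")
print(f"highest prediction: {line_preds.max():.2f}")
```

On this toy data the line dips below 0 at one end and climbs above 1 at the other, which is the same kind of out-of-range answer described above.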

This is where Logistic Regression steps in.

The “S-Curve” That Changed Everything

Instead of drawing a straight line through the data, Logistic Regression draws an S-curve. Mathematicians call it a Sigmoid function, but honestly, just picture a lazy S stretching from 0 to 1.

The genius of this curve is that no matter what number you throw at it, the output always lands between 0 and 1. And that range? That’s a probability.

Output closer to 1 → “Yes, it will sell above median”
Output closer to 0 → “No, it probably won’t”

Somewhere in the middle, you pick a threshold, usually 0.5, and that becomes your decision boundary. Above 0.5, the model says yes. Below 0.5, it says no.
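The curve and the threshold can be sketched in plain Python. The helper names `sigmoid` and `decide` are hypothetical, chosen for this illustration, and the input values are arbitrary:

```python
import math

def sigmoid(z):
    """Squash a real number into the range between 0 and 1."""
    # Fine for moderate z; very large negative z would overflow math.exp
    return 1.0 / (1.0 + math.exp(-z))

def decide(probability, threshold=0.5):
    """Turn a probability into a label: 1 for yes, 0 for no."""
    return 1 if probability > threshold else 0

for z in [-6, -1, 0, 1, 6]:
    p = sigmoid(z)
    print(f"z = {z:>2}  probability = {p:.3f}  decision = {decide(p)}")
```

An input of exactly 0 lands on a probability of exactly 0.5. This sketch sends that tie to "no", which is a design choice rather than a rule.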

It sounds almost too simple. But I opened up my Google Colab notebook, loaded the same California housing dataset I’ve been working with for weeks, and started building.

What I Actually Built

I flipped the problem. Instead of predicting a house price, I created a new column in my dataset: above_median. If a house’s value was above the state median, I marked it 1. Below? 0.

Now I had a target that was binary. Clean. Yes or no.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Create binary target (assumes df already holds the California housing data loaded earlier in the series)
median_price = df['median_house_value'].median()
df['above_median'] = (df['median_house_value'] > median_price).astype(int)

# Features and target
X = df[['median_income', 'total_rooms', 'housing_median_age']]
y = df['above_median']

# Same train-test split as before
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")

The model came back with an accuracy of around 0.77, meaning it correctly classified about 77% of the houses it had never seen before.

That felt like a moment.

The Confusion Matrix (Which Confused Me at First)

Here’s the thing about accuracy: it lies sometimes. Imagine a dataset where 95% of the examples are labelled “no”. (My median split is close to 50/50 by construction, so this isn’t my dataset, but fraud data often looks like this.) A model that just always says “no” would be 95% accurate without learning anything at all. That’s cheating, and it happens more than you’d think.
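The always-“no” scenario can be sketched in a few lines of plain Python. The labels here are hypothetical, made up only to match the 95% figure:

```python
# Hypothetical imbalanced labels: 95 "no" (0) and 5 "yes" (1)
y_true = [0] * 95 + [1] * 5

# A "model" that has learned nothing and always answers "no"
y_always_no = [0] * len(y_true)

# Accuracy: share of predictions that match the true label
correct = sum(t == p for t, p in zip(y_true, y_always_no))
baseline_accuracy = correct / len(y_true)

# How many of the real "yes" cases did it catch?
caught = sum(1 for t, p in zip(y_true, y_always_no) if t == 1 and p == 1)

print(f"accuracy: {baseline_accuracy:.2f}")
print(f"'yes' cases caught: {caught} of {sum(y_true)}")
```

The accuracy looks impressive, yet not a single “yes” case is caught.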

So I dug into something called the Confusion Matrix.

[[True Negatives,  False Positives]
 [False Negatives, True Positives]]

When I printed mine, I could see exactly where my model was getting it wrong. It was pretty good at identifying “no” (below median) but occasionally let above-median houses slip through as negatives: False Negatives.

In real life, that kind of mistake matters differently depending on context. A false negative in a spam filter means one piece of junk mail in your inbox. A false negative in a cancer screening means something far worse. This was the moment I understood that accuracy alone is not the whole story. You need to think about what types of errors your model is making.
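A small sketch of how the four cells can be pulled apart with scikit-learn's `confusion_matrix`. The ten labels are made up for illustration and are not results from the housing model:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for ten houses: 1 = above median, 0 = below
y_true_demo = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

# For labels 0 and 1, scikit-learn lays the matrix out as
# [[TN, FP], [FN, TP]]; ravel() flattens it in that order
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Share of the real "yes" houses the predictions caught
# (often called recall); false negatives pull this down
caught_share = tp / (tp + fn)
print(f"share of real 'yes' cases caught: {caught_share:.2f}")
```

Counting false negatives separately from false positives makes it easier to ask which kind of mistake is more costly for the problem at hand.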

What’s Actually Happening Under the Hood

Here’s the intuition I kept coming back to: Logistic Regression is asking, for every house, “how far is this from the boundary that separates cheap from expensive?”

It’s still doing geometry. It’s still using those weighted features (income, rooms, age) to calculate something. But instead of projecting that calculation onto a straight line, it’s running it through the sigmoid function to squash the answer into a probability.
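Those two steps, a weighted sum followed by the sigmoid, can be checked against scikit-learn directly. This is a sketch on synthetic stand-in data, not the housing dataset. The names `X_demo`, `y_demo`, and `clf`, and the rule used to generate the labels, are all assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (NOT the housing dataset): 200 rows, 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
# Hypothetical rule, used only to generate 0/1 labels for this sketch
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] + noise > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# Step 1: weighted sum of the features plus an intercept (a linear score)
z = X_demo @ clf.coef_[0] + clf.intercept_[0]

# Step 2: squash that score through the sigmoid to get P(class = 1)
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Compare with scikit-learn's own probabilities for class 1
sklearn_proba = clf.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

For a two-class problem like this one, the hand-computed probabilities should agree with `predict_proba`, which suggests there is nothing more exotic inside the fitted model than weights, an intercept, and the S-curve.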

The model is essentially saying: “Based on what I’ve seen in 80% of your dataset, a neighborhood with a median income of $80k, 2,000 rooms, and houses around 20 years old has roughly a 0.88 probability of being above median value. I’ll call that a yes.”

That’s not magic. That’s just math that’s been learned from patterns.

The Bigger Realisation

What hit me after this session wasn’t about the code. It was about framing.

Linear Regression answers “how much?” Logistic Regression answers “which one?”

And once you understand that split, you start seeing classification problems everywhere. Your email inbox. The bank flagging your card overseas. A Netflix algorithm deciding whether you’re a thriller person or a romcom person. A hospital system triage tool.

All of them are asking “which one?” questions. Many of them are drawing something like that S-curve somewhere.

Next Week

My model can now make decisions, not just predictions. But right now it only draws one straight boundary line between classes. What happens when the boundary isn’t a line at all — when the pattern in the data is more like a curved or jagged shape?

That question is going to take me into Decision Trees, where instead of fitting a single equation, the AI starts asking a series of “if this, then that” questions, like a flowchart that built itself. And honestly, it might be the most intuitive thing I’ve learned so far.

This is part of my AI Foundation Series: an honest, 8-hours-a-week account of learning AI from scratch. No background in computer science. Just curiosity and Google Colab.

GitHub Repo: https://github.com/ankitsrivastava/ai-foundation-series/blob/main/AI_foundation_Series.ipynb?short_path=5388780