I moved from being a data observer to a data architect. I built my first Artificial Intelligence model using a concept called Linear Regression. It sounds complex, but it’s actually beautiful in its simplicity: I’m teaching the computer to draw a ‘Line of Best Fit’ through 17,000 data points.
I focused on the relationship between median income and house value. The most critical lesson? The ‘Train-Test Split.’ I learned that you must keep some data secret from the AI (a ‘final exam’ of sorts) to prove it has actually learned the patterns rather than just memorizing the spreadsheet. When I plugged in a random income level and watched the AI spit out a price prediction almost instantly, the ‘magic’ of AI finally felt like math I could control. I’m no longer just looking at a map; I’m building a tool that can see into the future of the market.
In my last blog, I looked at a heatmap of California and saw the “heat” of high prices along the coast. This week, I decided to stop observing and start predicting. I built my first actual Artificial Intelligence model: Linear Regression.
The Core Concept: The “Line of Best Fit”
Linear Regression sounds like a mouthful, but it’s actually a concept we all learned in high school: y=mx+c.
In the world of real estate, y is the price we want to know, and x is a feature we already have, like median income. The slope m says how much the price rises for each extra dollar of income, and the intercept c is where the line crosses the y-axis. The AI’s job is to look at 17,000 houses and find the “line” that passes as close to all of them as possible. Once it finds that line, you can give it any income level, and it can “guess” the house price instantly.
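To make the idea concrete, here is a minimal sketch of how a line of best fit can be found with ordinary least squares in plain Python. The function name fit_line and the five data points are invented for illustration; they are not rows from the California dataset, and a library would normally do this work for you.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    # The best-fit line always passes through the point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Invented toy data: income and house value, both in thousands of dollars.
incomes = [20, 40, 60, 80, 100]
values = [105, 175, 265, 335, 420]

m, c = fit_line(incomes, values)
predicted = m * 50 + c  # "guess" the value for an income of 50
print(f"slope={m:.2f}, intercept={c:.2f}, prediction={predicted:.2f}")
```

The points do not sit exactly on a line, so no line hits them all; least squares picks the one with the smallest total squared miss.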
The “Final Exam” (Train-Test Split)
The most important lesson I learned this week wasn’t about math; it was about trust. You can’t just show an AI a dataset and then ask it to predict those same houses; that’s just memorization.
In my Google Colab notebook, I used a technique called the Train-Test Split. I hid 20% of the houses in a “locked drawer” (the Test set) and let the AI study the other 80% (the Training set).
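The “locked drawer” can be sketched in a few lines of plain Python. This is an illustration under assumptions, not necessarily the exact code from the notebook: the helper split_rows, the seed of 42, and the stand-in row numbers are all hypothetical. In practice, scikit-learn’s train_test_split does the same job.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows, then return (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows go into the locked drawer.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(17000))  # stand-ins for the 17,000 rows of the dataset
train, test = split_rows(rows)
print(len(train), len(test))  # 13600 3400
```

Shuffling before cutting matters: if the file happens to be sorted (by location, say), slicing off the last 20% without shuffling would hide a whole region from the training set.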

The Moment of Truth
After the “training” was finished, I gave the AI a median income of $50,000 for a neighborhood it had never seen. Within milliseconds, it predicted a house value of $253,587.
Comparing its guesses to the actual “hidden” prices in my test set felt like magic, but it’s actually just high-speed geometry.
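One simple way to turn that comparison into a single score is the mean absolute error: the average distance, in dollars, between each guess and the real price. The sketch below uses three invented pairs of prices purely to show the arithmetic; they are not results from my model.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all pairs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented numbers for illustration only.
actual_values = [200_000, 310_000, 150_000]
predicted_values = [215_000, 290_000, 160_000]

mae = mean_absolute_error(actual_values, predicted_values)
print(f"On average the guesses are off by ${mae:,.0f}")
```

The lower the score on the hidden test set, the more the model has learned real patterns rather than memorized the training rows.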

Multiple Linear Regression
By adding total_rooms and housing_median_age, you give the AI more “context.” Think of it like this: if I told you a neighborhood’s income was $100,000, you’d guess the houses are expensive. But if I then told you the house was 100 years old and had only one room, you’d probably lower your estimate.
By adding more columns, you are teaching the AI to weigh multiple factors at once.
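Here is a rough sketch of what “weighing multiple factors” means in code, written in plain Python so the math stays visible. Everything in it is an assumption for illustration: the helper names, the six toy rows, and the rule used to generate the toy prices (20 + 4 × income + 10 × rooms − 1 × age, in thousands of dollars). A library such as scikit-learn or NumPy would normally handle the solving step.

```python
def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back-substitution, from the last unknown to the first.
    w = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - known) / aug[i][i]
    return w


def fit_multiple(features, targets):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in features]  # leading 1.0 = intercept term
    cols = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(cols)] for i in range(cols)]
    xty = [sum(r[i] * y for r, y in zip(X, targets)) for i in range(cols)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Intercept plus the weighted sum of the feature values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Invented toy rows: (income in $1,000s, rooms, age in years).
features = [(30, 3, 40), (50, 4, 10), (70, 5, 25),
            (90, 4, 5), (100, 1, 100), (60, 6, 15)]
# Toy prices in $1,000s, generated from the made-up rule in the lead-in.
targets = [130, 250, 325, 415, 330, 305]

weights = fit_multiple(features, targets)
newer_home = predict(weights, (100, 5, 10))
old_one_room = predict(weights, (100, 1, 100))
print(f"{newer_home:.1f} vs {old_one_room:.1f}")
```

Both homes have the same income, but the old one-room house gets the lower estimate, because the model has learned a separate weight for each column.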
GitHub Repo: https://github.com/ankitsrivastava/ai-foundation-series/blob/main/AI_foundation_Series.ipynb?short_path=5388780


