Three Low-Hanging Computer-Vision Features Zillow Should Ship Tomorrow

11 June 2025
4 min read

Zillow quietly hosts the biggest labeled photo dataset in U.S. real-estate history: 100+ M listings × ~30 photos each ≈ 3 B images, all GPS-tagged, time-stamped, and paired with MLS text. Yet the core UI still looks like 2012 Craigslist. With a sprinkle of computer vision1, Zillow could unlock new filters (no pool, no carpet, cul-de-sac) and a per-listing similarity score, greatly increasing user engagement for just cents per image.

Increased Filtering Capability

For a quick win, Zillow should add the following filters:

  • No pool: There’s a “Must have pool” filter, but no “No pool” filter. It seems like an obvious oversight! With a young child, I don’t want a pool.
  • Floor types (e.g., exclude any carpet): Flooring taxonomies get complex, but simply negating a “carpet/tile/wood” label would be an easy win.
  • On a cul-de-sac: Why can’t I filter out houses on through-roads? Zillow has the address and the map data. What if I don’t want to live on a through road?

Technical Implementation

At current pricing, GPT-4o-mini costs $0.15 per million input tokens and $0.60 per million output tokens. Each photo would need roughly 1,000-1,500 input tokens (image encoding plus prompt) and 50-100 output tokens for a JSON response flagging features like pools or carpet, i.e. about $0.00018-$0.000285 per image. Processing Zillow’s entire 3-4 billion image2 corpus would therefore cost roughly $540K-$855K as a one-time expense, with ongoing costs of about $0.00018 per new image.3
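
For concreteness, here’s roughly what that per-image call could look like with the OpenAI Python SDK. This is a minimal sketch: the prompt, the JSON keys, and `tag_listing_photo` are my own illustrative choices, not anything Zillow actually runs.

```python
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def tag_listing_photo(photo_url: str) -> dict:
    """Ask GPT-4o-mini for boolean feature flags on one listing photo."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # force parseable JSON out
        max_tokens=100,  # matches the 50-100 output tokens assumed above
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Return JSON with boolean keys has_pool and has_carpet, "
                    "plus a string key flooring (carpet/tile/wood/other), "
                    "describing this real-estate listing photo."
                )},
                {"type": "image_url", "image_url": {"url": photo_url}},
            ],
        }],
    )
    return json.loads(resp.choices[0].message.content)

# A listing-level flag is then just a reduce over its photos, e.g.
# has_pool = any(tag_listing_photo(u)["has_pool"] for u in photo_urls)
```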

However, a hybrid approach would be far more cost-effective: use GPT-4o-mini to label 100K training images for just $18, then train lightweight ResNet50 binary classifiers that run inference at under $0.00002 per image. That’s 9X cheaper than the LLM approach! It cuts the one-time processing cost to around $60K plus training compute, and daily costs to roughly $60 (at that rate, about 3 million new images a day). At Zillow’s scale, this hybrid strategy should deliver comparable accuracy while saving hundreds of thousands of dollars, making it the obvious choice for production.
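
Here’s a minimal sketch of that distillation step in PyTorch/torchvision, assuming the 100K LLM-labeled photos have been sorted into `no_pool/` and `pool/` folders. The directory layout, hyperparameters, and single “pool” head are all assumptions; production would want augmentation, a validation split, and one head per filter.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for a pretrained ResNet50.
tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Assumed layout: labeled_photos/{no_pool,pool}/*.jpg from the GPT-4o-mini pass.
# ImageFolder maps folders to labels alphabetically: no_pool -> 0, pool -> 1.
ds = datasets.ImageFolder("labeled_photos", transform=tf)
loader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=4)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 1)  # single "has pool" logit

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(3):  # a few epochs is plenty for a binary head
    for images, labels in loader:
        images, labels = images.to(device), labels.float().to(device)
        opt.zero_grad()
        loss = loss_fn(model(images).squeeze(1), labels)
        loss.backward()
        opt.step()
```

Batched inference with a model like this on commodity GPUs is where the sub-$0.00002 per-image figure comes from.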

House Similarity Score

Every house listing contains 20-30 photos, structured data (beds, baths, sq ft), and unstructured text. Zillow processes this data but never computes similarity between listings. This is a massive missed opportunity.
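
One cheap way to build this, sketched below: embed each listing’s photos with an off-the-shelf CLIP model, average them into a single vector, and blend the image similarity with similarity over the structured fields. The model choice, the beds/baths/sqft scaling, and the 60/40 weighting are all assumptions to make the idea concrete.

```python
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip = SentenceTransformer("clip-ViT-B-32")  # off-the-shelf image encoder

def listing_embedding(photo_paths: list[str]) -> np.ndarray:
    """Average the CLIP embeddings of a listing's photos into one vector."""
    images = [Image.open(p) for p in photo_paths]
    return clip.encode(images, convert_to_numpy=True).mean(axis=0)

def structured_similarity(a: dict, b: dict) -> float:
    """Crude similarity over beds/baths/sqft; the scaling is a guess."""
    scale = {"beds": 5, "baths": 4, "sqft": 3000}
    diffs = [abs(a[k] - b[k]) / s for k, s in scale.items()]
    return max(0.0, 1.0 - sum(diffs) / len(diffs))

def similarity_score(photos_a, photos_b, facts_a, facts_b, w=0.6) -> float:
    """Blend photo similarity with structured similarity; w=0.6 is arbitrary."""
    img_sim = util.cos_sim(listing_embedding(photos_a),
                           listing_embedding(photos_b)).item()
    return w * img_sim + (1 - w) * structured_similarity(facts_a, facts_b)
```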

A house with a 95% similarity score but a 20% higher price immediately tells you one of two things:

  • It’s overpriced (negotiation leverage)
  • It has hidden value (better schools, recent renovations not visible in photos)

In Phoenix, where the median home price is $450K, identifying even a 5% pricing inefficiency saves a buyer $22,500. Multiply that across Zillow’s 4 million active shoppers and you get a meaningful increase in market efficiency.

Hypothetical example: two 3-bed colonials on Maple Street. Same year, same square footage, similar lot sizes, but one recently replaced its AC and roof and is asking $30K more. On today’s Zillow, you’d never know they’re otherwise identical. With similarity scores, you get instant pricing context and can ask a realtor what’s actually driving the gap.
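
The flagging rule on top of those scores can be dead simple. A toy version, with the 95% similarity cutoff and 20% price gap just mirroring the numbers above:

```python
def price_context(similarity: float, price: float, comp_price: float) -> str | None:
    """Flag listings that look nearly identical but are priced far apart."""
    if similarity < 0.95:
        return None  # not similar enough to be a useful comp
    gap = (price - comp_price) / comp_price
    if gap > 0.20:
        return "possibly overpriced, or hiding value (schools, renovations)"
    if gap < -0.20:
        return "possibly underpriced relative to a near-identical comp"
    return None

# Example: price_context(0.97, 550_000, 450_000) flags a ~22% premium.
```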

What’s Stopping Them (or what could go wrong?)

Two things could plausibly stop this from happening:

  1. Institutional Barriers: Zillow is a large company with a lot of inertia. It has a strong data science team (and has done some similar work before!), but it has also been burned by AI before (see the Zillow Offers shutdown), so why take a risk that could kill the golden goose?
  2. MLS Policies: Zillow gets its photos through MLS agreements that may restrict this kind of new computational analysis.

Of note: Zillow’s competition is already moving on this. Redfin is experimenting with visual search, and Airbnb has had ‘similar listings’ since 2017.

A Better House Hunting Experience

Opening Zillow and immediately seeing only homes that match your exact lifestyle needs would be a game-changer: no more scrolling past pools you’ll never use or carpet that triggers your allergies.

Overall, this is a strong framework for using data Zillow already owns to keep building its moat. Enhancing the product this way deepens user ties and makes the housing search easier for end users.


  1. By way of GPT-4o-mini or an open-source ViT ↩︎

  2. 160M homes * 20-30 photos on average is around 3.2B-4.8B images ↩︎

  3. Not sure how many photos are updated daily; I couldn’t find any good statistics on it. I also winged the math via an LLM, so feel free to tell me I’m wrong ↩︎

