I Built an AI Shopping Assistant That Sees Your Walls

Published on February 16, 2026

My photography store sells fine art prints of the Hamptons — aerial shots of beaches, lighthouses, harbors, coastline. The kind of work that ends up on walls in beach houses and Manhattan apartments. The store needed more than a product grid. It needed a way for customers to find photos by feeling, not by filename. And it needed to answer the question every art buyer asks: "How will this actually look in my space?"

So I built an AI shopping assistant with 14 tools, semantic search powered by pgvector, and a computer vision pipeline that detects walls in customer room photos and composites prints at correct scale. Here's how it works.

The AI Pipeline: From Photo to Searchable

Before the shopping assistant can help anyone, every photo in the catalog needs to be understood by a machine. Raw image files don't know they contain "a serene aerial view of a beach at sunset." So I built a two-stage pipeline that turns photographs into searchable, describable products.

Stage 1: Claude Vision generates metadata. A Django management command (generate_photo_descriptions) sends each photo to Claude's vision model. Claude analyzes the image and returns structured JSON: a natural language description, dominant colors, mood tags, subject identification, and room suggestions. A photo of Montauk Lighthouse comes back with colors like "steel gray" and "ocean blue," mood tags like "dramatic" and "solitary," subjects like "lighthouse, waves, rocky coast," and room suggestions like "living room, office, coastal home."

This metadata gets stored directly on the Photo model — ai_description, ai_colors, ai_mood, ai_subjects, ai_room_suggestions. It's not hidden in some separate AI table. It's part of the product, because it IS the product data. Claude sees the photograph the way a gallery curator would describe it to a customer.
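
Here's a minimal sketch of what the Stage 1 loop might look like, assuming the Anthropic Python SDK, base64-encoded JPEGs, and a prompt that asks Claude to reply with strict JSON; the model string, JSON keys, and app paths are illustrative rather than the production values:

```python
import base64
import json

import anthropic
from django.core.management.base import BaseCommand

from photos.models import Photo  # hypothetical app path

PROMPT = (
    "Describe this photograph for a fine art store. Reply with JSON containing: "
    "description, colors, mood, subjects, room_suggestions."
)


class Command(BaseCommand):
    help = "Generate AI metadata for photos that don't have a description yet"

    def handle(self, *args, **options):
        client = anthropic.Anthropic()
        for photo in Photo.objects.filter(ai_description=""):
            image_b64 = base64.b64encode(photo.image.read()).decode()
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",  # model choice is illustrative
                max_tokens=1024,
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "image", "source": {
                            "type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_b64,
                        }},
                        {"type": "text", "text": PROMPT},
                    ],
                }],
            )
            data = json.loads(response.content[0].text)
            # Field names come from the Photo model; the JSON keys are assumed.
            photo.ai_description = data["description"]
            photo.ai_colors = data["colors"]
            photo.ai_mood = data["mood"]
            photo.ai_subjects = data["subjects"]
            photo.ai_room_suggestions = data["room_suggestions"]
            photo.save()
```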

Stage 2: OpenAI generates embeddings. A second management command (generate_photo_embeddings) takes the AI-generated description and metadata for each photo and creates a vector embedding using OpenAI's text-embedding-ada-002 model. The text that gets embedded is a composite: the description, colors, mood, subjects, location, and collection name — all concatenated into a single string that captures the full semantic meaning of the photograph.

These embeddings are stored in PostgreSQL using pgvector, a Postgres extension that adds vector similarity search. Each photo's embedding is a 1536-dimensional vector sitting right next to its title, price, and inventory in the same database. No separate vector store, no external service, no synchronization headaches.
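
A sketch of the embedding step, assuming the openai v1 SDK and a pgvector VectorField named embedding on the Photo model; the location and collection field names are also assumptions:

```python
from openai import OpenAI

from photos.models import Photo  # hypothetical app path

client = OpenAI()


def embed_photo(photo: Photo) -> None:
    # Concatenate the AI metadata into one string that captures the photo's meaning.
    parts = [
        photo.ai_description,
        photo.ai_colors,
        photo.ai_mood,
        photo.ai_subjects,
        photo.location,
        photo.collection.name,
    ]
    text = " ".join(str(part) for part in parts if part)
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    photo.embedding = response.data[0].embedding  # 1536-dimensional list of floats
    photo.save(update_fields=["embedding"])
```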

The result: when a customer says "something calm and blue for my bedroom," the system generates an embedding for that query and finds the photos whose embeddings are closest in vector space. Not keyword matching — semantic matching. "Calm and blue" surfaces aerial ocean shots, peaceful harbor scenes, and twilight beach photos, even if none of those words appear in the titles.
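
One way to express that query, assuming pgvector's Django integration (pgvector.django.CosineDistance) and the embedding field from the previous sketch:

```python
from openai import OpenAI
from pgvector.django import CosineDistance

from photos.models import Photo  # hypothetical app path

client = OpenAI()


def search_photos_semantic(query: str, limit: int = 6):
    query_embedding = client.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding
    return (
        Photo.objects.filter(is_active=True)
        .exclude(embedding__isnull=True)
        .annotate(distance=CosineDistance("embedding", query_embedding))
        .order_by("distance")[:limit]  # smallest cosine distance = closest match
    )
```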

The Shopping Assistant: 14 Tools

The assistant is built with LangChain and Claude. It's not a retrieval chatbot that spits back FAQ answers — it's an agent with tools that take actions on the store. Fourteen of them.

Search and browse tools: search_photos_semantic does the pgvector similarity search. search_photos_filter handles structured queries — filter by collection, orientation, price range. get_photo_details returns everything about a specific photo including all available sizes and prices. get_collections lists what's in the store.

Cart management tools: add_to_cart, get_cart, remove_from_cart, update_cart_item. The agent can build a customer's cart through conversation. "Add that one in 30x40 aluminum" becomes a tool call that creates a real cart item in the database. start_checkout initiates a Stripe Checkout session and returns the payment URL.

Room visualization tools: analyze_room_image triggers the ML wall detection pipeline on a customer's uploaded room photo. generate_mockup composites a specific print onto the detected wall at the correct physical scale. The customer uploads a photo of their living room, and the agent shows them exactly how a 40x60 aluminum print would look above their couch.

Utility tools: get_sizing_info returns recommendations based on wall and furniture dimensions. track_order looks up order status. check_gift_card validates balance.
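
To give a feel for the shape of these tools, here's a sketch of how two of them might be declared and bound, assuming LangChain's @tool decorator and langchain-anthropic's ChatAnthropic; the cart helpers and model string are hypothetical:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool


@tool
def add_to_cart(cart_id: str, photo_id: int, size: str, material: str) -> str:
    """Add a print in a given size and material to the customer's cart."""
    item = create_cart_item(cart_id, photo_id, size, material)  # hypothetical helper
    return f"Added {item.photo.title} ({size}, {material}) to the cart."


@tool
def get_cart(cart_id: str) -> str:
    """Return the current contents of the customer's cart."""
    return serialize_cart(cart_id)  # hypothetical helper


llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # model choice is illustrative
llm_with_tools = llm.bind_tools([add_to_cart, get_cart])  # plus the other twelve
```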

The agent doesn't just answer questions — it moves customers through the funnel. Browse, discover, visualize, add to cart, checkout. Every step happens in conversation.

The "See In Room" Pipeline: MiDaS + RANSAC

This is the piece that took the most engineering. A customer uploads a photo of their wall. The system needs to figure out where the wall is, how big it is in real-world units, and then composite a print at physically accurate scale. That's three separate ML/geometry problems.

Step 1: MiDaS depth estimation. MiDaS is a monocular depth estimation model — it takes a single 2D image and predicts the relative distance of every pixel from the camera. The output is a depth map where walls (flat, consistent distance) look different from furniture, floors, and ceilings. I run MiDaS as a Celery task because inference takes 30-60 seconds, and you don't want that blocking a web request.
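
A sketch of that task, assuming MiDaS is loaded through torch.hub as documented by intel-isl/MiDaS; the model variant and the load_room_image / detect_wall_plane helpers are assumptions:

```python
import torch
from celery import shared_task


@shared_task
def analyze_room_image_task(analysis_id: int) -> None:
    # Load the model and its matching transform inside the worker; inference is
    # slow, which is exactly why this never runs in the request cycle.
    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")  # variant is illustrative
    transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
    midas.eval()

    img = load_room_image(analysis_id)  # hypothetical: fetch the S3 upload as an RGB array
    batch = transform(img)
    with torch.no_grad():
        prediction = midas(batch)
        depth = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze().cpu().numpy()

    detect_wall_plane(analysis_id, depth)  # hypothetical: the RANSAC step, sketched below
```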

Step 2: RANSAC wall plane detection. Given the depth map, I need to find the dominant flat surface — the wall. RANSAC (Random Sample Consensus) is a classic algorithm for fitting a model to noisy data. It repeatedly samples random points from the depth map, fits a plane to them, and checks how many other points are consistent with that plane. The plane with the most inliers wins. The output is wall bounds — the region of the image that represents a flat wall surface, with a confidence score.
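
A minimal numpy sketch of the plane fit; the inlier threshold, iteration count, and point parameterization are illustrative rather than the production values:

```python
import numpy as np


def fit_wall_plane(depth: np.ndarray, iters: int = 500, threshold: float = 0.02):
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Treat every pixel as a 3D point (x, y, depth), normalized so the inlier
    # threshold means roughly the same thing across images. A real implementation
    # would downsample the depth map before this loop.
    pts = np.stack([xs.ravel() / w, ys.ravel() / h, depth.ravel() / depth.max()], axis=1)

    best_inliers, best_plane = None, None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        sample = pts[rng.choice(len(pts), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-8:
            continue  # degenerate sample: the three points are collinear
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(pts @ normal + d) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)

    mask = best_inliers.reshape(h, w)   # pixels belonging to the dominant plane
    confidence = best_inliers.mean()    # fraction of the image on that plane
    return mask, best_plane, confidence
```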

Step 3: Scale calculation. The wall bounds give us pixel coordinates, but we need real-world inches to place a print correctly. The system calculates a pixels-per-inch ratio using the wall bounds and a configurable ceiling height (default 8 feet, adjustable via a slider). A 30x40 inch print should take up the right proportion of the wall regardless of camera distance or angle.
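
A sketch of that calculation, assuming the detected wall region spans floor to ceiling in the photo; how the production code maps wall bounds to real-world height may differ:

```python
def pixels_per_inch(wall_mask, ceiling_height_feet: float = 8.0) -> float:
    rows = wall_mask.any(axis=1).nonzero()[0]            # rows that contain wall pixels
    wall_height_px = rows.max() - rows.min()             # vertical extent of the wall
    return wall_height_px / (ceiling_height_feet * 12)   # pixels per real-world inch


# A 40-inch-tall print then occupies 40 * ppi pixels in the mockup, no matter how
# far from the wall the customer stood when taking the photo.
```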

The frontend then composites the print image onto the wall photo using a canvas element. The customer can drag the print to reposition it, switch between different photos and sizes, and see immediately how each option looks. If the wall detection confidence is low (under 0.3), the system falls back to manual placement — the customer draws the wall bounds themselves.

The whole pipeline runs as: upload triggers Celery task, task runs MiDaS + RANSAC, results stored in WallAnalysis model, frontend polls for completion, then renders the interactive mockup editor. Old analysis records are cleaned up after 24 hours to manage S3 storage.
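
A sketch of the status endpoint the frontend polls, assuming a WallAnalysis model with status, wall_bounds, and confidence fields and a session-scoped lookup; all of these names are assumptions:

```python
from rest_framework.response import Response
from rest_framework.views import APIView

from mockups.models import WallAnalysis  # hypothetical app path


class WallAnalysisStatusView(APIView):
    def get(self, request, analysis_id):
        analysis = WallAnalysis.objects.get(
            id=analysis_id, session_key=request.session.session_key
        )
        return Response({
            "status": analysis.status,            # e.g. "processing", "complete", "failed"
            "wall_bounds": analysis.wall_bounds,  # pixel region of the detected wall
            "confidence": analysis.confidence,    # below 0.3 falls back to manual placement
        })
```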

Why pgvector Over a Vector Database

I could have used Pinecone, Weaviate, Qdrant, or any of the purpose-built vector databases. I used pgvector instead because the catalog has dozens of photos, not millions. PostgreSQL with pgvector means the vector embeddings live in the same database as orders, carts, customers, and product variants. One connection pool, one backup strategy, one operational concern.

The query is a single SQL call with pgvector's cosine distance operator. It joins against the Photo table to filter by is_active=True and returns results with similarity scores. The embedding dimension (1536 from ada-002) is well within pgvector's capabilities at this scale.

If the catalog grew to hundreds of thousands of images, I'd revisit this. But for a luxury art store with curated collections? pgvector is the right tool. Adding Pinecone would mean another service to deploy, another API key to manage, another failure mode to handle, and synchronization logic to keep vectors in sync with the product database. For zero performance benefit at this scale.

The Conversation Model

The agent stores conversations in the database — a Conversation model with related Message records. Each message has a role (user, assistant, tool, system), content, and optional fields for tool calls and image URLs. This means conversation history persists across page reloads. A customer can ask about a photo, leave, come back, and the agent remembers the context.
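
A sketch of those models; the role, content, tool-call, and image-URL fields come from the description above, while the field types and session scoping are assumptions:

```python
from django.db import models


class Conversation(models.Model):
    session_key = models.CharField(max_length=64, db_index=True)
    created_at = models.DateTimeField(auto_now_add=True)


class Message(models.Model):
    ROLE_CHOICES = [("user", "user"), ("assistant", "assistant"),
                    ("tool", "tool"), ("system", "system")]

    conversation = models.ForeignKey(
        Conversation, related_name="messages", on_delete=models.CASCADE
    )
    role = models.CharField(max_length=16, choices=ROLE_CHOICES)
    content = models.TextField(blank=True)
    tool_calls = models.JSONField(null=True, blank=True)
    image_url = models.URLField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
```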

The agent runs in a loop: build the message chain from conversation history, call Claude with all 14 tools bound, check if the response contains tool calls, execute them, append the results, and loop until Claude returns a text response. Tool results are streamed back to the frontend as SSE events so the customer sees real-time activity — "Searching photos...", "Adding to cart...", "Generating mockup..."
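
Stripped of the SSE streaming, that loop might look like the following sketch, reusing llm_with_tools and the tool functions from the earlier sketch:

```python
from langchain_core.messages import ToolMessage

TOOLS_BY_NAME = {t.name: t for t in [add_to_cart, get_cart]}  # all 14 in practice


def run_agent(messages: list) -> str:
    while True:
        response = llm_with_tools.invoke(messages)
        messages.append(response)
        if not response.tool_calls:
            return response.content  # Claude answered in plain text: we're done
        for call in response.tool_calls:
            result = TOOLS_BY_NAME[call["name"]].invoke(call["args"])
            messages.append(ToolMessage(content=str(result), tool_call_id=call["id"]))
```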

One design decision worth noting: the agent's system prompt includes all pricing, sizing guides, and store policies directly. The prompt knows that paper prints ship in 5-7 days and aluminum prints in 14-21 days. It knows the sizing recommendations for different wall widths. It knows the return policy. This is the same pattern I used for ToteTaxi — when the knowledge base is small enough to fit in the context window, a system prompt beats RAG every time.

Security Considerations

The mockup tool accepts image URLs from customers. That's an SSRF (Server-Side Request Forgery) vector — an attacker could try to get the server to fetch internal resources. The analyze_room_image tool validates that URLs come from our own S3 bucket domain before processing. Anything else gets rejected.
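
The check itself is small. A sketch, with an illustrative bucket hostname:

```python
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"store-uploads.s3.amazonaws.com"}  # hypothetical bucket domain


def is_allowed_image_url(url: str) -> bool:
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_IMAGE_HOSTS
```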

Gift card balance checks return the same generic error for invalid codes, expired codes, and nonexistent codes. No enumeration possible — an attacker can't tell whether a code exists but is expired versus never existed.

Cart operations are scoped by cart ID, which is tied to the session. No cross-session cart access.

The Stack

Backend: Django 5, Django REST Framework, PostgreSQL 16 with pgvector, Redis, Celery
Frontend: Next.js 15, React, TypeScript, Tailwind CSS
AI: Claude API (vision + chat agent via LangChain), OpenAI (text-embedding-ada-002 for embeddings)
ML: MiDaS (depth estimation), RANSAC (plane fitting)
Payments: Stripe Checkout
Storage: AWS S3 (photos, room uploads, mockup renders)
Email: Resend (transactional), MailerLite (newsletter)
Hosting: Railway (backend + Celery worker + Redis), Netlify (frontend)

What Makes This Different

Most e-commerce AI is a chatbot that searches products by keyword and links to product pages. This agent operates the store. It searches semantically, manages the cart, visualizes purchases in the customer's actual room, and initiates checkout — all through conversation. The customer never needs to navigate the site if they don't want to.

The Claude Vision metadata pipeline means every photo is described the way a human would describe it, and searchable by meaning rather than tags someone manually entered. The MiDaS + RANSAC visualization pipeline means customers can answer "how will this look?" before they spend $2,000 on an aluminum print. Those two capabilities together reduce the friction that kills luxury e-commerce conversions — uncertainty about what you're getting and how it'll work in your space.

The store is live at store.matthewraynor.com. Try the assistant — ask it for "something dramatic for a large living room wall" and see what it finds.


I'm a software engineer building production AI systems. This is one of five live applications in my portfolio, including a logistics platform with its own AI agent (ToteTaxi), an enterprise computer vision system (IDP EasyCapture), and more at matthewraynor.com.
