Jun 18, 2026·8 min read

VibeSell: The Capstone We Couldn't Stop Over-Engineering

Story · AI Agents · RAG · System Design

For our final-year capstone, my team and I built VibeSell, an AI sales agent that plugs into Facebook Pages and Messenger and handles a shop's customer conversations on its own. It answers product questions, searches the catalog from a photo, and takes orders from a rambling message: the whole loop.

The twist that made this more than a class project: it was graded on production standards. CI/CD, Docker, monitoring, real test coverage — the rubric wanted a product, not a demo. So we built it like one. What follows is the honest version of that journey: the wins, the hacks, and the one blunder that cost us "best project."

🎥 Watch the system-design walkthrough: VibeSell architecture explained

Where we started

A bit of context on the three of us. By final year we could all build solid CRUD apps, we knew our way around basic deployment, and we'd seen CI/CD pipelines before. Jaber was our strongest with AI; Jakaria was sharp on the backend and frontend. Nobody on the team had shipped a tool-calling agent or a Facebook integration before, though, so most of this was learned in the deep end, on a deadline.

Challenge #1 — Getting Facebook to talk to us

The first wall was just connecting to Facebook: creating an app, getting it live with the terms-and-conditions hoops, and figuring out which Graph API calls read messages, send messages, and post comments.

The API docs themselves were straightforward. The tokens were the nightmare. You open a developer app, then a business account, and then there are two or three different kinds of access token depending on what you're doing — and the "right" path through all of it is genuinely hard to find. I dug for hours. Eventually a random YouTube tutorial became my saviour and the whole thing clicked into place.

With auth sorted, I wired up a webhook to listen for incoming messages. For a quick proof of concept I prototyped that flow in n8n just to see events landing in real time before committing it to our own backend.
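
For readers who haven't wired this up, here is a minimal sketch of the two things a Messenger webhook endpoint has to do: answer Meta's one-time verification request, and pull text messages out of the event payload. It is not the handler VibeSell shipped. The payload shape is reduced to the fields used here, and `IncomingText`, `verifySubscription`, and `extractTextMessages` are hypothetical names for illustration.

```typescript
// Messenger webhook body, reduced to the fields this sketch reads.
interface MessagingEvent {
  sender: { id: string };
  timestamp: number;
  message?: { mid: string; text?: string };
}

interface WebhookBody {
  object: string;
  entry: { id: string; messaging?: MessagingEvent[] }[];
}

// Hypothetical internal shape handed to the rest of the backend.
interface IncomingText {
  pageId: string;
  senderId: string;
  text: string;
  timestamp: number;
}

// GET verification: echo hub.challenge only when mode and token match.
function verifySubscription(
  query: Record<string, string | undefined>,
  expectedToken: string,
): string | null {
  const ok =
    query["hub.mode"] === "subscribe" &&
    query["hub.verify_token"] === expectedToken;
  return ok ? (query["hub.challenge"] ?? null) : null;
}

// POST events: flatten the body into text messages, skipping everything else.
function extractTextMessages(body: WebhookBody): IncomingText[] {
  if (body.object !== "page") return [];
  const out: IncomingText[] = [];
  for (const entry of body.entry) {
    for (const event of entry.messaging ?? []) {
      const text = event.message?.text;
      if (text === undefined) continue; // receipts, attachment-only events, etc.
      out.push({
        pageId: entry.id,
        senderId: event.sender.id,
        text,
        timestamp: event.timestamp,
      });
    }
  }
  return out;
}
```

Keeping both functions pure (no HTTP framework, no network) makes them easy to unit-test before they go behind a real route.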

Challenge #2 — The 15-minute CI/CD "hack"

Then came our second course checkpoint: deployment. And here's the embarrassing-but-honest part. By this point everyone on the team had about a year of professional experience, and we still fell into the classic trap — we kept piling on features and left deployment, CI/CD, and Docker for "later."

"Later" arrived in the lab, where I had roughly 15 minutes to produce a working pipeline. I started writing the GitHub Actions config as fast as I could type, and of course it kept failing. So I reached for a move I'm not entirely proud of — but I'd call it engineering your way out of a corner more than cheating:

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]
  workflow_dispatch:

jobs:
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    continue-on-error: true  # the hack: a failing job no longer fails the workflow run

That one line, continue-on-error: true, tells GitHub Actions not to fail the workflow run when that job fails. The Actions dashboard lit up green across the board, every stage a satisfying checkmark, while underneath the tests weren't actually gating anything yet. We got full marks on the CI/CD checkpoint.

I want to be clear that this isn't where the story ends. The point of a green pipeline is that it means something, and a few weeks later (checkpoint three) we came back and made it real — proper test jobs that genuinely fail the build. But in that moment, with the clock running, it was the right hack to survive the lab. Sometimes shipping the checkmark buys you the time to earn it.

Challenge #3 — My first real agent

Next was something I'd never done: building an agent that calls tools and runs in a loop. The first version had a small toolbox — database lookups plus the send/read-message calls I'd built on top of the Facebook webhook.

Then came the RAG layer. We loaded the shop's products into an inventory, vectorized them, and suddenly a customer could ask for something in plain language and the agent would actually surface the right items. The first time a natural-language product query came back with the correct products, it genuinely felt like magic.
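
The retrieval step at the heart of that is nearest-neighbour search over embeddings. A minimal in-memory sketch, assuming the product and query embeddings have already been computed by some model; the real system stores vectors in Qdrant, and `Product`, `cosineSimilarity`, and `topK` are illustrative names rather than VibeSell's code:

```typescript
interface Product {
  id: string;
  name: string;
  embedding: number[]; // assumed precomputed by an embedding model
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  const denom = Math.sqrt(dot(a, a) * dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Return the k products whose embeddings are closest to the query embedding.
function topK(query: number[], products: Product[], k: number): Product[] {
  return products
    .map((product) => ({ product, score: cosineSimilarity(query, product.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.product);
}
```

The retrieved products are then placed into the agent's context so the model answers from the shop's actual inventory. A vector database does the same ranking with an index instead of a full scan.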

The part nobody warns you about — real customer chat is messy

Here's where theory met reality. Onboarding a real shop helped: we partnered with an e-commerce page and ran VibeSell against their actual customer traffic for testing. Real users do not behave like your test cases.

  • They message all at once, so you're handling many conversations concurrently.

  • They break a single thought into three separate sends:

    I need this book.

    Do you have it?

    What is this project?

  • They write in Bangla, in Banglish (Bangla typed in English letters), and they ask genuinely difficult, ambiguous questions.

Taming this took a lot of prompt iteration to get the agent's behavior right, including making it pause and wait for a person to finish their thought instead of firing off three half-answers. Then, to handle the concurrency, I put a queue in front of everything: BullMQ on Redis. Every webhook message lands in the queue, and a pool of agent worker instances pulls from it and works through the conversations in parallel. After that, the messaging side was basically solved.
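
The "wait for them to finish" behavior can also be enforced in code, ahead of the agent. Below is a minimal sketch of one way to do it, not VibeSell's implementation: buffer each sender's messages and release them as a single turn only after a quiet period. `MessageBuffer` and `quietMs` are hypothetical, and the clock is passed in explicitly so the logic stays deterministic and testable.

```typescript
interface PendingTurn {
  parts: string[];
  lastSeenMs: number;
}

interface CoalescedTurn {
  senderId: string;
  text: string;
}

class MessageBuffer {
  private pending: Record<string, PendingTurn> = {};
  private quietMs: number;

  constructor(quietMs: number) {
    this.quietMs = quietMs;
  }

  // Record a message and restart that sender's quiet-period clock.
  add(senderId: string, text: string, nowMs: number): void {
    const turn = this.pending[senderId] ?? { parts: [], lastSeenMs: nowMs };
    turn.parts.push(text);
    turn.lastSeenMs = nowMs;
    this.pending[senderId] = turn;
  }

  // Release every sender who has been silent for at least quietMs.
  flushReady(nowMs: number): CoalescedTurn[] {
    const ready: CoalescedTurn[] = [];
    for (const senderId of Object.keys(this.pending)) {
      const turn = this.pending[senderId];
      if (turn !== undefined && nowMs - turn.lastSeenMs >= this.quietMs) {
        ready.push({ senderId, text: turn.parts.join(" ") });
        delete this.pending[senderId];
      }
    }
    return ready;
  }
}
```

In a queue-based setup, a similar effect can come from delaying the job and resetting the delay whenever the same sender writes again; the in-memory version above only illustrates the idea and would not survive a restart or span multiple workers.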

Then users started sending photos

Just as we caught our breath, customers began sending images — "do you have something like this?" with a picture attached. Jakaria jumped on this and brought in a CLIP model to embed the incoming image and search for visually similar products in our inventory, then suggest the closest matches. Image search done.

The last piece — orders from natural language

The final big hurdle was order processing. Customers give you everything — name, address, phone number — sometimes in one big paragraph, sometimes dribbled across several messages. Jaber spun up a dedicated order agent that parses all of that free-text mess into a clean, structured order. With that working, we had a genuinely great demo.
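
The structured side of that problem is easy to sketch even though the extraction itself is done by the model. Assuming each message has already been turned into a partial order (that LLM step is not shown), the remaining work is to merge the partials and track what is still missing. The `Order` fields and both function names below are assumptions for illustration, not the schema VibeSell used.

```typescript
interface Order {
  name: string;
  phone: string;
  address: string;
  items: string[];
}

// What one message contributes: any subset of the order fields.
type OrderDraft = Partial<Order>;

// Later messages overwrite scalar fields; items accumulate.
function mergeDrafts(drafts: OrderDraft[]): OrderDraft {
  const merged: OrderDraft = {};
  for (const draft of drafts) {
    if (draft.name) merged.name = draft.name;
    if (draft.phone) merged.phone = draft.phone;
    if (draft.address) merged.address = draft.address;
    if (draft.items && draft.items.length > 0) {
      merged.items = (merged.items ?? []).concat(draft.items);
    }
  }
  return merged;
}

// Fields the agent still has to ask the customer for.
function missingFields(draft: OrderDraft): (keyof Order)[] {
  const missing: (keyof Order)[] = [];
  if (!draft.name) missing.push("name");
  if (!draft.phone) missing.push("phone");
  if (!draft.address) missing.push("address");
  if (!draft.items || draft.items.length === 0) missing.push("items");
  return missing;
}
```

An order is only confirmed once `missingFields` returns an empty list; until then its output tells the agent exactly which follow-up question to ask.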

The lab "final" — three of us, three live challenges

Then the assessment turned into a pressure test. Each of us was given a different real-time feature to add in one hour, live in the lab, like a final exam:

| Teammate | Live task | How it went |
|----------|-----------|-------------|
| Me | Sorting, searching, filtering, and revenue views in the dashboard | Comfortable territory, done well within the hour |
| Jakaria | The order tab: wiring up the order APIs | Tough. He'd lived in frontend and CLIP and hadn't touched the DB much, so this one fought him |
| Jaber | User tagging (lead, potential customer, customer, repeat customer, unimportant…) | Some tags were plain DB queries, others needed AI sentiment analysis, and he landed it |

That tagging task was good enough that we kept it as a real product feature afterward.
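
To give a feel for the "plain DB query" half of tagging, here is a small sketch that derives a tag from counts a query could return. The rules and thresholds are invented for illustration and are not Jaber's actual logic; the sentiment-driven tags (such as "unimportant") would come from a model and are left out.

```typescript
type Tag = "lead" | "potential customer" | "customer" | "repeat customer";

// Hypothetical per-user aggregates, e.g. the result of a GROUP BY query.
interface CustomerStats {
  orderCount: number;
  askedAboutProduct: boolean;
}

// Assumed rules: orders decide first, then expressed interest.
function ruleBasedTag(stats: CustomerStats): Tag {
  if (stats.orderCount >= 2) return "repeat customer";
  if (stats.orderCount === 1) return "customer";
  return stats.askedAboutProduct ? "potential customer" : "lead";
}
```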

Production polish

For the final submission we leaned all the way into the "make it real" rubric. We stood up a Prometheus + Grafana monitoring stack, and we backfilled Jest tests and Playwright end-to-end tests for proper coverage (that was checkpoint three, the moment our CI/CD checkmarks finally started meaning something).

Here's the full system we ended up with:

VibeSell system architecture — GitHub Actions CI/CD into a PM2 + Docker deployment on AWS EC2, with Caddy reverse proxy, Next.js front and back end, BullMQ/Redis queue and AI workers, Postgres on RDS, Qdrant vector DB, CLIP and Gemini/OpenAI models, Meta Graph API + webhooks, and a Prometheus/Grafana monitoring stack.

A quick tour: Cloudflare handles DNS, Caddy reverse-proxies into the Next.js front and back ends, and everything runs in Docker on AWS EC2, managed by PM2. The agents and cron jobs (automated tagging, scheduled posts) run as Process-AI workers that pull from BullMQ/Redis. State lives in Postgres (AWS RDS) for relational data and Qdrant for vectors; CLIP handles image embeddings and Gemini/OpenAI power the language side. Messages flow in through the Meta Graph API and webhooks, Clerk handles auth, and Prometheus + Grafana watch the whole thing. GitHub Actions ships it, with Jest + Playwright guarding the gate.

Presentation day, and the blunder

And then, the way these stories always seem to go — on presentation day, the server fell over. It had quietly drowned in Docker build caches until it ran out of room and the deployment failed at the worst possible moment.

I scrambled: restarted things, spun up a fresh server, and redeployed the entire stack from scratch. I got it back up — but I walked into the presentation late.

We scored 98% in the lab. But that last-moment server blunder is, I'm fairly sure, what cost us best project. So close.

What I took away

This was, without question, one of the most satisfying and gloriously over-engineered things I've built. A capstone that asked for production standards turned into a genuine product crash course — Facebook's auth maze, my first agent loop, RAG, a real message queue, CLIP image search, monitoring, tests, and one humbling lesson about babysitting your disk space before a demo.

We didn't win best project. But I'd do the whole messy thing again.
