Pablo AI Assistant
Automotive Parts Expert • Always Online
👋 Hi! I'm Pablo, your AI automotive parts expert. I'm trained on 577+ million part fitment records!
I can help you find the perfect parts for your vehicle, verify fitment, and answer any automotive questions. What are you working on today?
I need an oil filler cap for my 1986 Hyundai Excel with the 1.5L engine
🎯 Great! I found 4 compatible oil filler caps for your 1986 Hyundai Excel 1.5L:
Engine Oil Filler Cap • OE Exact Grade
Engine Oil Filler Cap • Twist Lock
💡 Pro Tip: The Beck/Arnley 016-0050 has the exact OE specifications (1.41" neck diameter, twist-lock type) for your Excel's 1.5L SOHC engine.
Pablo's Brain: How It Works
Understanding the LLM Architecture in Simple Terms
🍽️ Think of Pablo Like a Super-Smart Restaurant Waiter
Imagine a restaurant waiter who has memorized every dish from every restaurant in the world, knows every ingredient, and can instantly tell you which dishes match your dietary needs. That's Pablo, but for auto parts.
When you ask "I need an oil cap for my 1986 Hyundai Excel," Pablo doesn't just randomly guess. It uses a sophisticated system with three main parts working together, just like how a great restaurant has a host, waiter, and chef working as a team.
🧩 The Three Parts of Pablo's Brain
Part 1: The Brain (Llama 3.1 70B Model)
This is the core "thinking" part of Pablo. It's a Large Language Model (LLM) created by Meta called Llama 3.1, with 70 billion parameters. Think of parameters like brain cells: more parameters mean more capacity to understand and generate complex responses.
What Does "70 Billion Parameters" Mean?
Imagine a massive spreadsheet with 70 billion numbers. Each number has been carefully adjusted during training so that when you feed text in, the model produces intelligent responses. These numbers encode everything the model has learned about language, logic, and knowledge.
🎯 Why Llama 3.1 70B? It's the sweet spot between being smart enough to understand complex automotive questions and being small enough to run on our hardware. Larger models (like 405B) are smarter but require massive data centers. Smaller models (8B) are faster but less accurate.
Part 2: The Library (RAG Vector Database)
Even with 70 billion parameters, the brain can't memorize ALL 577 million parts. So we give it a "reference library" it can search instantly. This technique is called RAG (Retrieval-Augmented Generation).
How RAG Works โ A Simple Analogy:
Imagine you're taking an open-book exam. You're smart and know the concepts, but you're allowed to look up specific facts in your textbook. RAG works the same way:
1. User asks: "Oil cap for 1986 Hyundai Excel 1.5L"
2. The system searches 577M records and finds the 10 most relevant parts
3. These 10 parts are given to the brain along with the question
4. The brain crafts a helpful response using this specific information
We use Pinecone, a specialized database that can search through 577 million records in under 50 milliseconds by comparing mathematical "fingerprints" of text.
Each part record is converted into a 1024-number "fingerprint" using the E5-large-v2 model. Similar parts have similar fingerprints, making search fast and accurate.
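To make the retrieval step concrete, here is a minimal sketch of the query side, assuming the sentence-transformers and Pinecone client libraries; the API key placeholder and the index name "pablo-parts" are illustrative assumptions, not production values.

from sentence_transformers import SentenceTransformer
from pinecone import Pinecone

embedder = SentenceTransformer("intfloat/e5-large-v2")   # 1024-dim fingerprints
index = Pinecone(api_key="YOUR_API_KEY").Index("pablo-parts")

# E5 models expect a "query: " prefix on search queries
question = "query: oil filler cap for 1986 Hyundai Excel 1.5L"
vector = embedder.encode(question).tolist()

# Fetch the 10 most similar fitment records by vector similarity
results = index.query(vector=vector, top_k=10, include_metadata=True)
for match in results.matches:
    print(match.score, match.metadata)

The matched records (and their metadata) are what gets handed to the brain in step 3 above.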
Part 3: The Specialized Training (QLoRA Fine-tuning)
The base Llama model is smart but doesn't know anything about auto parts. Fine-tuning is how we teach it automotive expertise by showing it 150,000 example conversations about parts, fitment, and vehicle specifications.
The "Sticky Notes" Technique (QLoRA):
Retraining all 70 billion parameters would take weeks and cost hundreds of thousands of dollars. Instead, we use a clever technique called QLoRA:
Imagine the base model as a massive filing cabinet with 70 billion folders. Instead of rewriting all folders, we attach small "sticky notes" (about 400 million of them) to key folders. When the AI looks something up, it reads both the main folder AND the sticky note.
This means we train only the sticky notes, about 400 million parameters out of 70 billion (roughly 0.6% of the model), making it possible to train on 4 consumer GPUs instead of a massive data center!
🔄 How All Three Parts Work Together
When a customer asks: "I need an oil filler cap for my 1986 Hyundai Excel 1.5L"
🎯 The Result:
In about 0.3 seconds, Pablo combines its specialized automotive training with a search over 577 million parts and generates a helpful, accurate response that recommends the right oil filler cap (Beck/Arnley 016-0050) with detailed specifications, alternative options, and pro tips. It's like having an expert parts specialist available 24/7.
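To show how the pieces connect, here is an illustrative sketch of how the retrieved records and the customer's question might be assembled into a single prompt for the fine-tuned model; the record fields and prompt wording are assumptions for illustration, not our production prompt.

def build_prompt(question: str, records: list[dict]) -> str:
    # Format each retrieved fitment record as one line of context
    context = "\n".join(
        f"- {r['brand']} {r['part_number']}: {r['product']} for "
        f"{r['year']} {r['make']} {r['model']} ({r['engine']})"
        for r in records
    )
    return ("You are Pablo, an automotive parts expert.\n"
            f"Relevant catalog records:\n{context}\n\n"
            f"Customer question: {question}\nAnswer:")

records = [{"brand": "Beck/Arnley", "part_number": "016-0050",
            "product": "Engine Oil Filler Cap", "year": 1986,
            "make": "Hyundai", "model": "Excel", "engine": "1.5L L4 SOHC"}]
print(build_prompt("I need an oil filler cap for my 1986 Hyundai Excel 1.5L",
                   records))

The resulting prompt string is what the brain (the fine-tuned Llama model) actually sees.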
⚙️ Technical Specifications Summary
Base Model: Llama 3.1 70B (Meta)
Fine-tuning: QLoRA adapters (~400 million trainable parameters)
RAG System: Pinecone vector database with E5-large-v2 embeddings (1024 dimensions)
How We Train Pablo AI
A Simple Guide to Understanding LLM Training
🎓 Think of It Like Training a New Employee
Imagine you hired a brilliant new employee who has read millions of books and websites. They're incredibly smart and can write, code, and answer questions about almost anything. But they know nothing about auto parts.
That's exactly what the base Llama 3.1 model is like. It's incredibly intelligent, but it doesn't know that a 016-0050 oil filler cap fits a 1986 Hyundai Excel.
Training is how we teach this brilliant employee everything about automotive parts โ which parts fit which vehicles, what the part numbers mean, how to help customers find the right parts.
📚 The Complete Training Journey
📥 Gathering the Knowledge (Raw Data Collection)
First, we collect all the automotive fitment knowledge the industry publishes. This comes from ACES files, the industry-standard format that every auto parts manufacturer uses to share their catalog data.
Real ACES Data Example:
When Beck/Arnley says "our part 016-0050 fits the 1986 Hyundai Excel", it looks like this in their catalog file:
<App id="5223">
  <BaseVehicle id="139"/>   → This number means "1986 Hyundai Excel"
  <EngineBase id="1062"/>   → This means "1.5L 4-cylinder SOHC engine"
  <PartType id="5269"/>     → This means "Oil Filler Cap"
  <Part>016-0050</Part>     → The actual part number
</App>
We have 577 million of these records from 850+ brands!
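For anyone curious what reading one of these records looks like in practice, here is a minimal sketch using Python's standard library; the tiny lookup dictionaries are illustrative stand-ins for the real reference databases used to decode the numeric IDs.

import xml.etree.ElementTree as ET

ACES_SNIPPET = """
<App id="5223">
  <BaseVehicle id="139"/>
  <EngineBase id="1062"/>
  <PartType id="5269"/>
  <Part>016-0050</Part>
</App>
"""

# Illustrative stand-ins for the full industry reference databases
BASE_VEHICLES = {"139": "1986 Hyundai Excel"}
ENGINES = {"1062": "1.5L 4-cylinder SOHC"}
PART_TYPES = {"5269": "Oil Filler Cap"}

app = ET.fromstring(ACES_SNIPPET)
record = {
    "vehicle": BASE_VEHICLES[app.find("BaseVehicle").get("id")],
    "engine": ENGINES[app.find("EngineBase").get("id")],
    "part_type": PART_TYPES[app.find("PartType").get("id")],
    "part_number": app.find("Part").text,
}
print(record)  # {'vehicle': '1986 Hyundai Excel', ...}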
🧹 Making the Data Human-Readable (Data Processing)
Those cryptic codes like BaseVehicle id="139" mean nothing to a human or AI. We need to translate them into actual vehicle and part information by joining with reference databases.
Before → After Translation:
Before (raw ACES codes):
  BaseVehicle: 139
  EngineBase: 1062
  PartType: 5269
  Part: 016-0050
After (human-readable record):
  Make: HYUNDAI
  Model: EXCEL
  Year: 1986
  Engine: 1.5L L4 SOHC
  Product: Engine Oil Filler Cap
  Part Number: 016-0050
  Brand: Beck/Arnley
This processed data goes into our euro table โ a massive database with 50+ columns of vehicle and part attributes for every fitment record.
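The decoding itself is a set of joins against reference tables. Here is a hedged sketch of how it could look with pandas; the table layouts and column names are illustrative assumptions about how the reference data is organized, not the actual euro table schema.

import pandas as pd

fitments = pd.DataFrame([{"base_vehicle_id": 139, "engine_id": 1062,
                          "part_type_id": 5269, "part": "016-0050",
                          "brand": "Beck/Arnley"}])
vehicles = pd.DataFrame([{"base_vehicle_id": 139, "make": "HYUNDAI",
                          "model": "EXCEL", "year": 1986}])
engines = pd.DataFrame([{"engine_id": 1062, "engine": "1.5L L4 SOHC"}])
part_types = pd.DataFrame([{"part_type_id": 5269,
                            "product": "Engine Oil Filler Cap"}])

# Each join swaps a numeric ID for its human-readable attributes
euro = (fitments
        .merge(vehicles, on="base_vehicle_id")
        .merge(engines, on="engine_id")
        .merge(part_types, on="part_type_id"))
print(euro[["year", "make", "model", "engine", "product", "part", "brand"]])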
✍️ Creating Lesson Plans (Q&A Dataset Generation)
Now here's the key insight: you can't just show an AI raw data and expect it to learn. You need to create example conversations โ showing the AI "when a customer asks THIS, you should respond like THIS."
Training Example (Q&A Pair):
Q: "I need an oil filler cap for my 1986 Hyundai Excel with a 1.5 liter engine"
A: "For your 1986 Hyundai Excel 1.5L SOHC, I recommend the Beck/Arnley 016-0050 Engine Oil Filler Cap. This is an OE-exact replacement with a twist-lock design and a 1.41" neck diameter. It fits all Excel trims (Base, GL, GLS) for the USA and Canadian markets. The Gates 31085 is also compatible if you prefer an alternative brand."
We generate 150,000+ of these training examples programmatically from our 577M records!
💡 Analogy: This is like creating a training manual with 150,000 example customer conversations. "When a customer says X, the perfect response is Y."
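Here is a minimal sketch of what programmatic generation from one decoded record might look like; the templates and field names are illustrative assumptions, not our actual generation logic.

import random

QUESTION_TEMPLATES = [
    "I need a {product} for my {year} {make} {model} with a {engine} engine",
    "What {product} fits a {year} {make} {model}?",
]

def make_qa_pair(rec: dict) -> dict:
    # Pick a question phrasing and fill in the record's attributes
    question = random.choice(QUESTION_TEMPLATES).format(**rec)
    answer = (f"For your {rec['year']} {rec['make']} {rec['model']} "
              f"{rec['engine']}, I recommend the {rec['brand']} "
              f"{rec['part']} {rec['product']}.")
    return {"question": question, "answer": answer}

rec = {"year": 1986, "make": "Hyundai", "model": "Excel",
       "engine": "1.5L SOHC", "product": "Engine Oil Filler Cap",
       "part": "016-0050", "brand": "Beck/Arnley"}
print(make_qa_pair(rec))

Varying the templates and sampling different records is how a 577M-row table becomes 150,000+ distinct conversations.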
🔢 Converting Text to Numbers (Tokenization)
Computers don't understand words; they only understand numbers. We need to convert every word into tokens (numbers) that the AI can process.
Tokenization Example:
"oil filler cap for Hyundai Excel"
[8435, 1254, 6623, 369, 19478, 35621]
The Llama tokenizer has a vocabulary of 128,000 tokens, covering every word and subword it might encounter.
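A quick sketch of tokenization with the Hugging Face transformers library (the Llama 3.1 repository is gated, so access requires approval; the printed IDs may differ from the illustrative numbers above):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B")
ids = tok.encode("oil filler cap for Hyundai Excel",
                 add_special_tokens=False)
print(ids)              # a list of integer token IDs
print(tok.decode(ids))  # decodes back to the original text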
🧠 The Actual Training (Fine-tuning with QLoRA)
This is where the magic happens. We load the base Llama 3.1 70B model and teach it using our 150,000 Q&A examples. But here's the clever part: we don't modify the entire 70 billion parameters (that would require massive computing power). Instead, we use a technique called QLoRA.
💡 The QLoRA Technique Explained Simply:
Imagine the base model as a massive filing cabinet with 70 billion folders (parameters). Rewriting all those folders would take forever.
Instead, QLoRA adds a small "sticky note system": about 400 million small notes attached to key folders. When the AI looks something up, it checks the main folder AND reads the sticky note. The sticky notes contain all our automotive knowledge!
This means we only need to train about 0.6% of the model (the sticky notes), making it possible to train on consumer GPUs instead of requiring a massive data center.
🔧 Training Configuration:
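A minimal sketch of what such a QLoRA setup could look like with the Hugging Face transformers, peft, and bitsandbytes libraries; every hyperparameter below is an illustrative assumption, not our exact production configuration.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(              # the "Q" in QLoRA:
    load_in_4bit=True,                 # base weights quantized to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",
    quantization_config=bnb, device_map="auto")

lora = LoraConfig(                     # the "sticky notes"
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()     # shows the small trainable fraction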
⚙️ What Happens Inside During Training
During training, the computer repeats this cycle hundreds of thousands of times (a toy version in code follows the list):
Feed one Q&A pair: "User asks about Excel oil cap โ Expected response about 016-0050"
The AI generates its own response based on current knowledge
Compare AI's response to expected response โ how different are they?
Slightly update the LoRA adapter weights to reduce the error
Do this for all 150,000 examples, 3 times (epochs) = ~450,000 update cycles
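Here is a toy illustration of that cycle in PyTorch. A tiny linear model stands in for the 70B LLM, and random tensors stand in for a Q&A batch; the mechanics (forward pass, loss, backward pass, optimizer step) are the same ones a QLoRA trainer runs at scale.

import torch

model = torch.nn.Linear(8, 8)                        # stand-in for the LLM
opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
x, target = torch.randn(4, 8), torch.randn(4, 8)     # stand-in Q&A batch

for step in range(3):
    pred = model(x)                                  # generate a response
    loss = torch.nn.functional.mse_loss(pred, target)  # compare to expected
    loss.backward()                                  # compute weight updates
    opt.step()                                       # apply the small update
    opt.zero_grad()
    print(step, loss.item())                         # watch the loss change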
🎯 Result: After 72 hours of training, the LoRA adapters have been tuned so precisely that the AI now understands automotive parts as well as your best employee, but it can answer instantly and never forgets!
📚 Adding the Reference Library (RAG Vector Database)
Even after training, the AI can't memorize ALL 577 million parts. So we give it a "reference library" it can search instantly. This is called RAG (Retrieval-Augmented Generation).
How RAG Works:
1. Convert each fitment record into a numerical "fingerprint" (embedding vector)
2. Store all 577M fingerprints in a vector database (Pinecone)
3. When a user asks a question, find the 10 most relevant records instantly
4. Feed those records to the AI along with the question
This way, Pablo can access any of the 577 million records in under 50 milliseconds, without needing to memorize them all!
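The query side of this pipeline is sketched earlier; here is the matching indexing side, embedding one record and upserting it into Pinecone. The index name, record ID, and metadata fields are illustrative assumptions.

from sentence_transformers import SentenceTransformer
from pinecone import Pinecone

embedder = SentenceTransformer("intfloat/e5-large-v2")
index = Pinecone(api_key="YOUR_API_KEY").Index("pablo-parts")

record_id = "beck-016-0050-1986-excel"
text = ("Engine Oil Filler Cap Beck/Arnley 016-0050 "
        "1986 Hyundai Excel 1.5L L4 SOHC")

# E5 models expect a "passage: " prefix on indexed documents
vec = embedder.encode("passage: " + text).tolist()
index.upsert(vectors=[{"id": record_id, "values": vec,
                       "metadata": {"text": text}}])

Repeating this for every row of the processed fitment table is what builds the 577M-vector index.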
🎯 The Complete Training Pipeline
Local GPU Training Setup
Train on Your Own Hardware → Deploy to Cloud
Best Performance Setup
Cost-Effective Alternative
📊 GPU Memory Usage
⚙️ Training Config
🛠️ Software Stack
Training
CUDA
Inference
AWS Inference (Cost-Optimized)
Train Locally → Deploy to AWS for Inference Only
Cost Optimization Strategy
By training locally on your own hardware, you eliminate expensive GPU training costs on AWS; the cloud is used purely for scalable inference.
🏗️ Minimum Viable Infrastructure
💰 Monthly Cost (On-Demand)
💡 Cost Reduction Options
Comprehensive Development Plan
10-Week Sprint Plan with Detailed Deliverables
📅 Project Timeline Overview
Phase 1: Research & Planning
Foundation Setup & Architecture Design
🔍 Week 1: Data Assessment
🏗️ Week 2: Architecture Design
📦 Phase 1 Deliverables:
Phase 2: Data Pipeline & Vector Database
Process 577M Records & Build RAG Infrastructure
🔄 Week 3: Data Processing
📚 Week 4: Vector Database
📦 Phase 2 Deliverables:
Phase 3: Model Fine-tuning (LOCAL)
Train Pablo AI on 4× RTX 4090 GPUs
✍️ Week 5: Dataset Preparation
🧠 Week 6: Training & Export
📦 Phase 3 Deliverables:
Phase 4: API Development & Integration
Build Production-Ready Backend Services
⚡ Week 7: Inference Server
🌐 Week 8: API & UI
📦 Phase 4 Deliverables:
Phase 5: AWS Deployment & Launch
Production Deployment & Go-Live