DR / / drodriguez.site
case study / / 03 / /

Self-hosted LLM

Same RAG, no external AI

status: live
stack: Ollama, Llama 3.1 8B, nomic-embed-text
updated: 2026
tags: Ollama / Llama 3 / Self-hosted / pgvector
01 / / the problem

Healthcare, legal, and EU clients with GDPR requirements often cannot send data to OpenAI or Anthropic. They need on-premise AI with comparable functionality, and few engineers can demonstrate that it actually works.

02 / / what i built
Same UI as case study 02
Toggle between Cloud (Anthropic) and Local (Llama) backends
Local embedding model — no external API at any step
Side-by-side response time and cost comparison
Live resource utilization shown during inference
Tradeoff documentation: when each backend makes sense
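A minimal sketch of how a cloud/local toggle like the one above might be wired up. The names (`BackendConfig`, the registry, the model IDs) are illustrative assumptions, not the demo's actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackendConfig:
    """Where chat and embedding requests go for one toggle position."""
    name: str
    chat_url: str
    chat_model: str
    embed_url: str
    embed_model: str

# Hypothetical registry: one entry per toggle position.
BACKENDS = {
    "cloud": BackendConfig(
        name="cloud",
        chat_url="https://api.anthropic.com/v1/messages",
        chat_model="claude-3-5-sonnet-latest",
        # Even in cloud mode, embeddings can stay local so the vector
        # store is identical across both backends.
        embed_url="http://localhost:11434/api/embeddings",
        embed_model="nomic-embed-text",
    ),
    "local": BackendConfig(
        name="local",
        chat_url="http://localhost:11434/api/chat",
        chat_model="llama3.1:8b",
        embed_url="http://localhost:11434/api/embeddings",
        embed_model="nomic-embed-text",
    ),
}

def select_backend(toggle: str) -> BackendConfig:
    """Resolve the UI toggle ('cloud' or 'local') to a backend config."""
    try:
        return BACKENDS[toggle]
    except KeyError:
        raise ValueError(f"unknown backend: {toggle!r}")
```

Keeping the embedding model fixed across both positions is what makes the side-by-side comparison fair: only the generation step changes.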
03 / / how i built it
Ollama
Drop-in local LLM serving, simple deployment
Llama 3.1 8B
Strong quality at a size that fits on a 32GB VPS
nomic-embed-text
Local embeddings, no external API needed
pgvector
Reused from case study 02
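The fully local query path these pieces add up to can be sketched as: embed the query with nomic-embed-text via Ollama's `/api/embeddings` endpoint, then rank chunks in pgvector by cosine distance (`<=>`). The table and column names here are illustrative assumptions; the request shapes follow Ollama's and pgvector's documented interfaces:

```python
import json

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def embedding_request(text: str) -> tuple[str, bytes]:
    """Build the POST for Ollama's /api/embeddings endpoint.

    A real client would send this with urllib or httpx and read
    response["embedding"] from the JSON reply.
    """
    url = f"{OLLAMA_URL}/api/embeddings"
    body = json.dumps({"model": "nomic-embed-text", "prompt": text}).encode()
    return url, body

def similarity_sql(top_k: int = 5) -> str:
    """pgvector nearest-neighbor query using the cosine-distance
    operator <=>. Table/column names (chunks, embedding, content)
    are placeholders for whatever case study 02's schema uses.
    """
    return (
        "SELECT content, embedding <=> %s::vector AS distance "
        "FROM chunks ORDER BY distance LIMIT "
        f"{int(top_k)}"
    )
```

At no point does a query or a document chunk leave the box, which is the whole GDPR argument in section 01.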
04 / / live demo
→ open live demo at https://local-ai.drodriguez.site
Loom walkthrough — 90 seconds

Demo credentials shown on the demo's landing page.

05 / / production extensions

Things deliberately out of scope for the demo that I'd add for production:

Larger models (70B) for higher quality on dedicated hardware
Quantization variations for inference speed
GPU acceleration with CUDA
Multi-model orchestration based on query type