Self-hosted LLM
Same RAG, no external AI
status: live
stack: Ollama, Llama 3.1 8B, nomic-embed-text
updated: 2026
Ollama · Llama 3 · Self-hosted · pgvector
01 / / the problem
Healthcare, legal, and EU clients with GDPR requirements often cannot send data to OpenAI or Anthropic. They need on-premise AI with comparable functionality, and few engineers can demonstrate that a fully self-hosted stack actually delivers it.
02 / / what i built
→ Same UI as case study 02
→ Toggle between Cloud (Anthropic) and Local (Llama) backends; see the sketch after this list
→ Local embedding model: no external API at any step
→ Side-by-side response time and cost comparison
→ Live resource utilization shown during inference
→ Tradeoff documentation: when each backend makes sense
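The toggle is the heart of the demo, so here is a minimal sketch of how both providers can sit behind one interface. The class names, the Anthropic model id, and the Ollama endpoint are my illustration, not the project's code:

```python
# Hedged sketch: one interface, two backends. Names (LLMBackend,
# CloudBackend, LocalBackend) and model ids are illustrative.
from typing import Protocol

import anthropic
import requests


class LLMBackend(Protocol):
    def complete(self, prompt: str) -> str: ...


class CloudBackend:
    """Anthropic-hosted model: data leaves the premises."""

    def __init__(self) -> None:
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    def complete(self, prompt: str) -> str:
        msg = self.client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text


class LocalBackend:
    """Ollama-served Llama 3.1 8B: nothing leaves the VPS."""

    def complete(self, prompt: str) -> str:
        resp = requests.post(
            "http://localhost:11434/api/chat",  # Ollama's default port
            json={
                "model": "llama3.1:8b",
                "messages": [{"role": "user", "content": prompt}],
                "stream": False,
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]


def get_backend(name: str) -> LLMBackend:
    """The UI toggle maps directly onto this choice."""
    return CloudBackend() if name == "cloud" else LocalBackend()
```

The point of the shared protocol is that the rest of the RAG pipeline never knows which backend answered, which is what makes the side-by-side comparison honest.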
03 / / how i built it
Ollama
Drop-in local LLM serving, simple deployment; streaming sketch below
Llama 3.1 8B
Strong quality at a size that fits on a 32GB VPS
nomic-embed-text
Local embeddings, no external API needed
pgvector
Reused from case study 02; retrieval sketch below
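To make "no external API at any step" concrete, here is a hedged sketch of the retrieval path: embed the query locally with nomic-embed-text through Ollama, then rank chunks in the pgvector table carried over from case study 02. The table and column names (`chunks`, `content`, `embedding`) are assumptions:

```python
# Hedged sketch: fully local retrieval. Table/column names are assumed.
import requests
import psycopg


def embed(text: str) -> list[float]:
    """768-dim embedding from nomic-embed-text; never leaves the box."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


def top_chunks(conn: psycopg.Connection, question: str, k: int = 5):
    """Nearest chunks by cosine distance (pgvector's <=> operator)."""
    vec = "[" + ",".join(map(str, embed(question))) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content, embedding <=> %s::vector AS distance "
            "FROM chunks ORDER BY distance LIMIT %s",
            (vec, k),
        )
        return cur.fetchall()
```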
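On the generation side, Ollama streams newline-delimited JSON from `/api/generate`, which is what lets the demo surface tokens and timing as they arrive. A minimal sketch, with the prompt template omitted:

```python
# Hedged sketch: streaming tokens from a local Ollama server.
import json

import requests


def stream_generate(prompt: str, model: str = "llama3.1:8b"):
    """Yield response chunks as Ollama produces them (NDJSON stream)."""
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["response"]


# usage: print("".join(stream_generate("What does GDPR say about erasure?")))
```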
04 / / live demo
→ open live demo at https://local-ai.drodriguez.site
Loom walkthrough — 90 seconds
Demo credentials shown on the demo's landing page.
05 / / production extensions
These are deliberately out of scope for the demo, but I'd add them for production:
→ Larger models (70B) for higher quality on dedicated hardware
→ Quantization variants to trade quality for inference speed
→ GPU acceleration with CUDA
→ Multi-model orchestration based on query type (routing sketch below)
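As a flavor of that last item, a hypothetical heuristic router; the cue words and model choices are invented for illustration, and production routing would use a trained classifier:

```python
# Hypothetical sketch of query-type routing. Cue words and model
# choices are invented; production would use a trained classifier.
def pick_model(query: str) -> str:
    reasoning_cues = ("why", "compare", "summarize", "explain", "tradeoff")
    if any(cue in query.lower() for cue in reasoning_cues):
        return "llama3.1:70b"  # heavier model for synthesis questions
    return "llama3.1:8b"  # fast default for lookups and extraction
```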