Self-hosted LLM
Same RAG, no external AI
status: live
stack: Ollama, Llama 3.1 8B, nomic-embed-text
updated: 2026
Ollama · Llama 3 · Self-hosted · pgvector
01 / / the problem
Healthcare, legal, and EU clients with GDPR requirements often cannot send data to OpenAI or Anthropic. They need on-premise AI with comparable functionality, and few engineers can demonstrate that a fully self-hosted stack actually delivers it.
02 / / what i built
→ Same UI as case study 02
→ Toggle between Cloud (Anthropic) and Local (Llama) backends; see the sketch after this list
→ Local embedding model: no external API at any step
→ Side-by-side response time and cost comparison
→ Live resource utilization shown during inference
→ Tradeoff documentation: when each backend makes sense
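The toggle is the heart of the demo, so here is a minimal sketch of how both providers can sit behind one interface. The class names, the Anthropic model id, and the Ollama endpoint are my illustration, not the project's code:

```python
# Hedged sketch: one interface, two backends. Names (LLMBackend,
# CloudBackend, LocalBackend) and model ids are illustrative.
from typing import Protocol

import anthropic
import requests


class LLMBackend(Protocol):
    def complete(self, prompt: str) -> str: ...


class CloudBackend:
    """Anthropic-hosted model: data leaves the premises."""

    def __init__(self) -> None:
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    def complete(self, prompt: str) -> str:
        msg = self.client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text


class LocalBackend:
    """Ollama-served Llama 3.1 8B: nothing leaves the VPS."""

    def complete(self, prompt: str) -> str:
        resp = requests.post(
            "http://localhost:11434/api/chat",  # Ollama's default port
            json={
                "model": "llama3.1:8b",
                "messages": [{"role": "user", "content": prompt}],
                "stream": False,
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]


def get_backend(name: str) -> LLMBackend:
    """The UI toggle maps directly onto this choice."""
    return CloudBackend() if name == "cloud" else LocalBackend()
```

The point of the shared protocol is that the rest of the RAG pipeline never knows which backend answered, which is what makes the side-by-side comparison honest.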
03 / / how i built it
Ollama
Drop-in local LLM serving, simple deployment; streaming sketch below
Llama 3.1 8B
Strong quality at a size that fits on a 32GB VPS
nomic-embed-text
Local embeddings, no external API needed
pgvector
Reused from case study 02; retrieval sketch below
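To make "no external API at any step" concrete, here is a hedged sketch of the retrieval path: embed the query locally with nomic-embed-text through Ollama, then rank chunks in the pgvector table carried over from case study 02. The table and column names (`chunks`, `content`, `embedding`) are assumptions:

```python
# Hedged sketch: fully local retrieval. Table/column names are assumed.
import requests
import psycopg


def embed(text: str) -> list[float]:
    """768-dim embedding from nomic-embed-text; never leaves the box."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


def top_chunks(conn: psycopg.Connection, question: str, k: int = 5):
    """Nearest chunks by cosine distance (pgvector's <=> operator)."""
    vec = "[" + ",".join(map(str, embed(question))) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content, embedding <=> %s::vector AS distance "
            "FROM chunks ORDER BY distance LIMIT %s",
            (vec, k),
        )
        return cur.fetchall()
```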
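On the generation side, Ollama streams newline-delimited JSON from `/api/generate`, which is what lets the demo surface tokens and timing as they arrive. A minimal sketch, with the prompt template omitted:

```python
# Hedged sketch: streaming tokens from a local Ollama server.
import json

import requests


def stream_generate(prompt: str, model: str = "llama3.1:8b"):
    """Yield response chunks as Ollama produces them (NDJSON stream)."""
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["response"]


# usage: print("".join(stream_generate("What does GDPR say about erasure?")))
```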
04 / / live demo
→ open live demo at https://local-ai.drodriguez.site
Loom walkthrough — 90 seconds
Demo credentials shown on the demo's landing page.
05 / / production extensions
These are deliberately out of scope for the demo, but I'd add them for production:
→ Larger models (70B) for higher quality on dedicated hardware
→ Quantization variants to trade quality for inference speed
→ GPU acceleration with CUDA
→ Multi-model orchestration based on query type (routing sketch below)
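As a flavor of that last item, a hypothetical heuristic router; the cue words and model choices are invented for illustration, and production routing would use a trained classifier:

```python
# Hypothetical sketch of query-type routing. Cue words and model
# choices are invented; production would use a trained classifier.
def pick_model(query: str) -> str:
    reasoning_cues = ("why", "compare", "summarize", "explain", "tradeoff")
    if any(cue in query.lower() for cue in reasoning_cues):
        return "llama3.1:70b"  # heavier model for synthesis questions
    return "llama3.1:8b"  # fast default for lookups and extraction
```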