
πŸ“š From PDF Overload to AI Clarity: Building an AI RAG Assistant

Introduction

If you’ve ever tried to dig a single obscure fact out of a massive technical manual, you’ll know the frustration 😩: you know it’s in there somewhere, you just can’t remember the exact wording, cmdlet name, or property that will get you there.

For me, this pain point came from Office 365 for IT Pros β€” a constantly updated, encyclopaedic PDF covering Microsoft cloud administration. It’s a superb resource… but not exactly quick to search when you can’t remember the magic keyword.
Often I know exactly what I want to achieve β€” say, add copies of sent emails to the sender’s mailbox when using a shared mailbox β€” but I can’t quite recall the right cmdlet or property to Ctrl+F my way to the answer.

That’s when I thought: what if I could take this PDF (and others in my archive), drop them into a centralised app, and use AI as the conductor and translator πŸŽΌπŸ€– to retrieve the exact piece of information I need β€” just by asking naturally in plain English?

This project also doubled as a test bed for Claude Code, which I’d been using since recently completing a GenAI Bootcamp πŸš€.
I wanted to see how it fared when building something from scratch in an IDE, rather than in a chat window.

πŸ‘‰ In this post, I’ll give a very high-level overview of the four iterations (v1–v4): what worked, what failed, and what I learned along the way.


Version Comparison at a Glance πŸ—‚οΈ

| Version | Stack / Interface | Vector DB(s) | Outcome |
|---------|-------------------|--------------|---------|
| v1 | Python + Gradio UI | Pinecone | Uploaded PDFs fine, but no usable retrieval. Abandoned. |
| v2 | FastAPI + React (Dockerised) | Pinecone + Qdrant | Cleaner setup, partial functionality. Containers failed often. |
| v3 | Python CLI (dual pipeline: PDF + Markdown) | ChromaDB | More stable retrieval, dropped UI, faster iteration. Still config headaches. |
| v4 | Enterprise-style CLI (Azure OpenAI + Ollama) | Chroma | Usable tool: caching, reranking, analytics, model switching. I actually use this daily. |

Architecture Evolution (v1 β†’ v4)

Architecture Flowchart

This single diagram captures the arc of the project πŸ› οΈ.


Version 1 – Pinecone RAG Test Application

Version 1 Screenshot

The first attempt was… short lived πŸͺ¦.

I gave Claude clear instructions, and to its credit, it produced a functional backend and frontend pretty quickly. I uploaded a PDF into Pinecone, successfully chunked it, and then… nothing.

This first attempt was a non-starter 🚫.
Despite uploading the PDF successfully to Pinecone, the app couldn’t retrieve any usable results, no matter what I tried. I spent a day troubleshooting before cutting my losses and moving on.
I had kind of been following a YouTube tutorial for this project, but even though the tutorial was less than a year old, much of the content didn't map to what I was seeing - especially in the Pinecone UI.
Evidence of how quickly the AI landscape and products are changing, I guess. 😲

πŸ’‘ Lesson learned: I should've known the steps I was following in the tutorial were likely to have changed. I do, after all, work with Microsoft Cloud on a daily basis, where product interfaces seem to change between browser refreshes!😎

πŸ‘‰ Which led me to v2: if I was going to try again, I wanted a cleaner, containerised architecture from the start.


Version 2 – IT Assistant (FastAPI + React)

Version 2 Screenshot

For round two, I decided to start cleaner 🧹.

The first attempt had been a sprawl of Python files, with Claude spinning up new scripts left, right, and centre. So I thought: let’s containerise from the start 🐳.

  • Stack: FastAPI backend, Next.js frontend, Dockerised deployment
  • Vector Stores: Pinecone and Qdrant
  • Features: Modular vector store interface, PDF + Markdown parsing, a React chat UI with source display

On paper, it looked solid. In practice: the containers refused to start, health checks failed β€” meaning the services never even got to the point of talking to each other β€” and ports (3030, 8000) were dead πŸ’€.

In short, the project got a bit further in terms of useful results and functionality, but ultimately I parked it and went back to the drawing board.

πŸ’‘ Lesson learned: Dockerising from day one helps with clean deployments, but only if the containers actually run.

By this point, I genuinely wondered if I was wasting my time and missing some huge piece of fundamental knowledge that had grounded the project before it even got started 🫠.
Still, I knew I wanted to strip things back and simplify.
So, before ordering a copy of "The Big Book of AI: Seniors Edition" from Amazon, I thought I would try a different tack.

πŸ‘‰ Which led directly to v3: drop the UI, keep it lean, focus on retrieval.


Version 3 – RAG Agent (Dual-Pipeline, CLI)

Version 3 Screenshot

By this point, I realised the frontend was becoming a distraction 🎭. I’d spent too long wrestling with UX issues, which were getting in the way of the real meat and potatoes of the project β€” so I ditched the UI and went full CLI.

  • Stack: Python CLI, dual pipelines for PDF + Markdown
  • Vector Store: ChromaDB
  • Features: PDF-to-Markdown converter, deduplication, metadata enrichment, batch processing, incremental updates, output logging, rich terminal formatting

Chroma proved more successful than Pinecone, and the CLI gave me a faster dev loop ⚑.
But misaligned environment variables and Azure credential mismatches caused repeated headaches 🀯.

πŸ’‘ Lesson learned: simplifying the interface let me focus on the retrieval logic β€” but configuration discipline was just as important. While debugging, Claude would spin up numerous different Python files to fix the issue(s) at hand. I had to remember to get Claude to roll those fixes into each new build, to keep the project structure clean and tidy.

At this stage, I had a functioning app, but the retrieval results were pretty poor, and the functionality was lacking.

πŸ‘‰ Which led naturally into v4: keep the CLI, tune the retrieval process, and add the features that would make the app usable.


Version 4 – PDF RAG Assistant v2.0 Enterprise

Version 4 Screenshot

After three rewrites, I finally had something that looked and felt like a usable tool πŸŽ‰.

This is the version I still use today πŸŽ‰. It wasn’t a quick win: v4 took a long time to fettle into shape with many hours of trying different things to improve the results, testing, re-testing, and testing again πŸ”„.

The app is in pretty good shape now, with some good features added along the way. Most importantly, the results returned via query are good enough for me to use βœ…. Don’t get me wrong, the β€œfinal” version of the app (for now) is pretty usable β€” but I don’t think I’ll be troubling any AI startup finance backers any time soon πŸ’ΈπŸ™ƒ.


The Guided Tour (v4 Screenshots)

1. Startup & Menu

v4 Menu
Finally felt like a tool instead of just another Python script πŸ› οΈ.

2. Query Processing Pipeline

v4 Pipeline
For the first time, everything was working together instead of fighting me βš”οΈ.

3a. Model Switching – Azure

v4 Model Switch 1
Azure OpenAI was quicker ⚑ and free with my Dev subscription.

3b. Model Switching – Ollama

v4 Model Switch 2
Ollama gave me a safety net offline 🌐, even if slower.

4. Start Page & Status

v4 Start Page
Reassuring after so many broken starts β€” just seeing a healthy status page felt like progress. πŸ˜…

5a. Query Results – Simple

v4 Simple Response

5b. Query Results – Detailed

v4 Detailed Response
Detailed mode felt like the first time the assistant could teach me back, not just parrot text. πŸ“–

6. Response Quality Reports

v4 Quality Report
Handy both as a sanity check βœ… and as a reminder that it’s not perfect β€” but at least it knows it. 🀷

7. Query History

v4 History
At this point, it wasn’t just answering questions β€” it was helping me build knowledge over time. πŸ“š


What Made v4 Different

Here’s what finally tipped the balance from β€œprototype” to β€œusable assistant”:

  • Semantic caching 🧠 – the assistant remembers previous queries and responses, so it doesn’t waste time (or tokens) re-answering the same thing.
  • ColBERT reranking 🎯 – instead of trusting the first vector search result, ColBERT reorders the results by semantic similarity, surfacing the most relevant chunks.
  • Analytics πŸ“Š – lightweight stats on query quality and hit rates. Not a dashboard, more like reassurance that the retrieval pipeline is behaving.
  • Dynamic model control πŸ”€ – being able to switch between Azure OpenAI (fast, cloud-based) and Ollama (slow, local fallback) directly in the CLI.

πŸ’‘ Lesson learned: retrieval accuracy isn’t just about the database β€” caching, reranking, and model flexibility all compound to make the experience better.
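
To make the caching idea concrete, here is roughly how a semantic cache behaves (a minimal sketch under my own naming, not the app's actual implementation): embed the incoming query, compare it against previously answered queries, and reuse the stored answer when the similarity clears a threshold.

```python
# Minimal semantic-cache sketch (illustrative, not the app's real code):
# reuse a previous answer when a new query is "close enough" in embedding space.
import numpy as np


class SemanticCache:
    def __init__(self, embed_fn, threshold: float = 0.92):
        self.embed_fn = embed_fn          # any function: str -> 1-D numpy array
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []  # (query embedding, answer)

    def lookup(self, query: str) -> str | None:
        q = self.embed_fn(query)
        for emb, answer in self.entries:
            sim = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
            if sim >= self.threshold:
                return answer             # cache hit: skip retrieval + generation
        return None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((self.embed_fn(query), answer))
```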


Losing my RAG 🧡 (Pun Intended)

There were definitely points where frustration levels were high enough to make me question why I’d even started β€” four rewrites will do that to you. Pinecone that wouldn’t retrieve, Docker containers that wouldn’t start, environment variables that wouldn’t line up.

Each dead end was frustrating in the moment, but in hindsight, as we all know, the failures are where the learning is. Every wrong turn taught me something that made the next version a little better.


Experimentation & Debugging

  • Pinecone: I created two or three different DBs and successfully chunked the data each time. But v1 and v2 couldn’t pull anything useful back out πŸͺ«.
  • Azure: The only real issue was needing a fairly low chunk size (256) to avoid breaching my usage quota βš–οΈ.
  • Iteration habit: If I hit a roadblock with Claude Code that seemed to be taking me further away from the goal, I’d pause ⏸️, step away 🚢, then revisit πŸ”„. Sometimes it was worth troubleshooting; other times it was better to start fresh.

Lessons Learned

πŸ’‘ Start with a CLI before adding a UI β€” it keeps you focused on retrieval.
πŸ’‘ Always check embedding/vector dimensions for compatibility.
πŸ’‘ Dockerising helps with clean deployments, but rebuilds can be brittle.
πŸ’‘ Small chunk sizes often work better with Azure OpenAI quotas.
πŸ’‘ RAG accuracy depends on multiple layers β€” not just the vector DB.


If I Were Starting Again

With hindsight, I’d probably:

  • Begin directly with ChromaDB instead of Pinecone.
  • Skip the frontend until retrieval was nailed down.
  • Spend more time upfront on embedding/vector compatibility.
  • Put more time into researching retrievability improvements.

What’s Next (v5?)

Future directions might include:

  • πŸ§ͺ Testing new embedding models and vector DBs – different models could improve retrieval precision, especially for domain-specific PDFs.
  • 🎯 Improving pinpoint retrieval accuracy – because even in v4, it sometimes still β€œgets close” rather than β€œspot on.”
  • πŸ’¬ MCP Server integration – so the app can query multiple data sources, not just local files.
  • πŸ“Š Adding Guardrails – edging it closer toward an enterprise-ready assistant.

If I can improve the results in v4 or with v5, then that would be a real win πŸ†.


Things I Liked About Claude Code πŸ–₯️

One of the constants across the project was working inside Claude Code, and there were some things I really liked about the experience:

  • βœ… Automatic chat compaction – no endless scrolling or need to copy/paste old snippets
  • πŸ—‚οΈ Chat history – the ability to pick up where I left off in a previous session
  • πŸ”’ On-screen token counter – knowing exactly how much context I was burning through
  • πŸ‘€ Realtime query view – watching Claude process step-by-step, with expand/collapse options for analysis

Compared to a browser-based UI, these felt like small but meaningful quality-of-life upgrades. For a coding-heavy project, those workflow improvements really mattered.


Final Thoughts

This project started with the frustration of not being able to remember which cmdlet to search for in a 1,000-page PDF 😀. Four rewrites later, I have a tool that can answer those questions directly.

It’s far from perfect. There are limitations to how well the data can be processed and subsequently how accurately it can be retrieved β€” at least with the models and resources I used βš–οΈ. But it’s functional enough that I actually use it β€” which is more than I could say for versions one through three.

Overall, though, this wasn’t just about the app. It was about getting hands-on with a code editor in the terminal and IDE, instead of being stuck in a chat-based UI πŸ’». In that regard, the project goal was achieved. Using Claude Code (other CLI-based AI assistants are available 😎) was a much better experience for a coding-heavy project.

I did briefly try OpenAI’s Codex at the very start, just to see which editor I preferred. It didn’t take long to see that Codex didn’t really have the chops ❌. Claude felt sharper, more capable ✨, and it became clear why it has the reputation as the current CLI editor sweetheart πŸ’– β€” while Codex has barely made a ripple 🌊.


Reader Takeaway πŸ“¦

If you’re thinking about building your own RAG assistant:

  • Expect dead ends β€” each failed attempt will teach you something.
  • Keep it simple early (CLI + local DB) before adding shiny extras.
  • Focus on retrieval quality, not just the vector DB.
  • Treat AI assistants as copilots, not magicians.

At the end of the day, my assistant works well enough for me (for now) β€” and that was the whole point.


πŸ€– First Steps into AI Automation: My Journey from Trial to Self-Hosted Chaos

What started as 'let me just automate some emails' somehow turned into a comprehensive exploration of every AI automation platform and deployment method known to mankind...

After months of reading about AI automation tools and watching everyone else's productivity skyrocket with clever workflows, I finally decided to stop being a spectator and dive in myself. What started as a simple "let's automate job alert emails" experiment quickly became a week-long journey through cloud trials, self-hosted deployments, OAuth authentication battles, and enough Docker containers to power a small data centre.

In this post, you'll discover:

  • Real costs of AI automation experimentation ($10-50 range)
  • Why self-hosted OAuth2 is significantly harder than cloud versions
  • Performance differences: Pi 5 vs. desktop hardware for local AI
  • When to choose local vs. cloud AI models
  • Time investment reality: ~10 hours over 1 week for this project

Here's how my first real foray into AI automation unfolded β€” spoiler alert: it involved more container migrations than I initially planned.

Hardware baseline for this project:

πŸ’» Development Environment

  • Primary machine: AMD Ryzen 7 5800X, 32GB DDR4, NVMe SSD
  • Pi 5 setup: 8GB RAM, microSD storage
  • Network: Standard home broadband (important for cloud API performance)

🎯 The Mission: Taming Job Alert Email Chaos

Let's set the scene. If you're drowning in recruitment emails like I was, spending 30+ minutes daily parsing through multiple job listings scattered across different emails, you'll understand the frustration. Each recruitment platform has its own format, some emails contain 5-10 different opportunities, and manually extracting the relevant URLs was becoming a productivity killer.

The vision: Create an automated workflow that would:

  • Scrape job-related emails from my Outlook.com inbox
  • Extract and clean the job data using AI
  • Generate a neat summary email with all the job URLs in one place
  • Send it back to me in a digestible format

Simple enough, right? Famous last words.


πŸ”„ Phase 1: The n8n Cloud Trial Adventure

My research pointed to n8n as the go-to tool for this kind of automation workflow. Being sensible, I started with their 14-day cloud trial rather than jumping straight into self-hosting complexities.

βš™οΈ Initial Setup & First Success

The n8n cloud interface is genuinely impressive β€” drag-and-drop workflow building with a proper visual editor that actually makes sense. Within a couple of hours, I had:

βœ… Connected to Outlook.com via their built-in connector
βœ… Set up email filtering to grab job-related messages
βœ… Configured basic data processing to extract text content
βœ… Integrated OpenAI API for intelligent job URL extraction

n8n jobs workflow with OpenAI API integration

πŸ€– The AI Integration Challenge

This is where things got interesting. Initially, I connected the workflow to my OpenAI API account, using GPT-4 to parse email content and extract job URLs. The AI component worked brilliantly β€” almost too well, since I managed to burn through my $10 worth of token credits in just two days of testing.

The cost reality: Those "just testing a few prompts" sessions add up fast. A single complex email with multiple job listings processed through GPT-4 was costing around $0.15-0.30 per API call. When you're iterating on prompts and testing edge cases, those costs compound quickly.
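
A quick back-of-the-envelope calculation shows why (my own illustrative numbers; GPT-4's list pricing at the time was roughly $0.03 per 1K input tokens and $0.06 per 1K output tokens):

```python
# Rough cost estimate for one email pass through GPT-4, using assumed list
# prices of ~$0.03 / 1K input tokens and ~$0.06 / 1K output tokens.
input_tokens = 6_000    # a long recruitment email plus the extraction prompt
output_tokens = 1_500   # structured list of jobs and URLs

cost = (input_tokens / 1000) * 0.03 + (output_tokens / 1000) * 0.06
print(f"~${cost:.2f} per email")   # ~$0.27, so roughly 40 test runs burns $10
```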

Lesson learned: Test with smaller models first, then scale up. GPT-4 is excellent but not cheap for experimental workflows.

🎯 Partial Success (The Classic IT Story)

The workflow was partially successful β€” and in true IT fashion, "partially" is doing some heavy lifting here. While the automation successfully processed emails and generated summaries, it had one glaring limitation: it only extracted one job URL per email, when most recruitment emails contain multiple opportunities.

What this actually meant: A typical recruitment email might contain 5-7 job listings with individual URLs, but my workflow would only capture the first one it encountered. This wasn't a parsing issue β€” the AI was correctly identifying all the URLs in its response, but the n8n workflow was only processing the first result from the AI output.

Why this limitation exists: The issue stemmed from how I'd configured the data processing nodes in n8n. The workflow was treating the AI response as a single data item rather than an array of multiple URLs. This is a common beginner mistake when working with structured data outputs.
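
Conceptually, the fix is to have the model return a structured list and then fan that list out into one item per URL before the downstream nodes run. In plain Python it looks something like this (n8n's Code node phrases it slightly differently, and the field names here are my own):

```python
# Conceptual fix (illustrative field names): parse the model's response as a
# JSON array and emit one item per job URL instead of a single blob.
import json

ai_response = '''
[
  {"title": "Cloud Engineer", "url": "https://example.com/jobs/123"},
  {"title": "M365 Administrator", "url": "https://example.com/jobs/456"}
]
'''

jobs = json.loads(ai_response)

# One output item per job, which is what the summary-email step needs.
items = [{"json": {"title": job["title"], "url": job["url"]}} for job in jobs]

for item in items:
    print(item["json"]["url"])
```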

This became the recurring theme of my experimentation week: everything works, just not quite how you want it to.

πŸ’‘ Enter Azure OpenAI

Rather than continue burning through OpenAI credits, I pivoted to Azure OpenAI. This turned out to be a smart move for several reasons:

  • Cost control: Better integration with my existing Azure credits
  • Familiar environment: Already comfortable with Azure resource management
  • Testing flexibility: My Visual Studio Developer subscription gives me Β£120 monthly credits

I deployed a GPT-4 Mini model in my test lab Azure tenant β€” perfect for experimentation without breaking the bank.

Azure OpenAI GPT-4 Mini deployment configuration

The Azure OpenAI integration worked seamlessly with n8n, and I successfully redirected my workflow to use the new endpoint. Finally, something that worked first time.

n8n jobs workflow with Azure OpenAI integration


🐳 Phase 2: Self-Hosting Ambitions (Container Edition #1)

With the n8n cloud trial clock ticking down, I faced the classic build-vs-buy decision. The cloud version was excellent, but I wanted full control and the ability to experiment without subscription constraints. The monthly $20 cost wasn't prohibitive, but the learning opportunity of self-hosting was too appealing to pass up.

Enter self-hosting with Docker containers β€” specifically, targeting my Raspberry Pi 5 setup.

🏠 The OpenMediaVault Experiment

My first attempt involved deploying n8n as a self-hosted Docker container on my OpenMediaVault (OMV) setup. For those unfamiliar, OMV is a network-attached storage (NAS) solution built on Debian, perfect for home lab environments where you want proper storage management with container capabilities.

Why the Pi 5 + OMV route:

  • Always-on availability: Unlike my main PC, the Pi runs 24/7
  • Low power consumption: Perfect for continuous automation workflows
  • Storage integration: OMV provides excellent Docker volume management
  • Learning opportunity: Understanding self-hosted deployment challenges

The setup:

  • Host: Raspberry Pi 5 running OpenMediaVault
  • Backend storage: NAS device for persistent data
  • Database: PostgreSQL container for n8n's backend
  • Edition: n8n Community Edition (self-hosted)

OpenMediaVault Docker container management interface

😀 The Great OAuth Authentication Battle

This is where my self-hosting dreams met reality with a resounding thud.

I quickly discovered that replicating my cloud workflow wasn't going to be straightforward. The self-hosted community edition has functionality restrictions compared to the cloud version, but more frustratingly, I couldn't get OAuth2 authentication working properly.

Why OAuth2 is trickier with self-hosted setups:

  • Redirect URI complexity: Cloud services handle callback URLs automatically, but self-hosted instances need manually configured redirect URIs
  • App registration headaches: Azure app registrations expect specific callback patterns that don't align well with dynamic self-hosted URLs
  • Token management: Cloud versions handle OAuth token refresh automatically; self-hosted requires manual configuration
  • Security certificate requirements: Many OAuth providers now require HTTPS callbacks, adding SSL certificate management complexity

The specific challenges I hit:

  • Outlook.com authentication: Couldn't configure OAuth2 credentials using an app registration from my test lab Azure tenant
  • Exchange Online integration: Also failed to connect via app registration β€” kept getting "invalid redirect URI" errors
  • Documentation gaps: Self-hosting authentication setup felt less polished than the cloud version

After several hours over two days debugging OAuth flows and Azure app registrations, I admitted defeat on the email integration front. Sometimes retreat is the better part of valour.

🌀️ Simple Success: Weather API Workflow

Rather than abandon the entire self-hosting experiment, I pivoted to a simpler proof-of-concept. I created a basic workflow using:

  • OpenWeatherMap API for weather data
  • Gmail integration with app passwords (much simpler than OAuth2)
  • Basic data processing and email generation

This worked perfectly and proved that the self-hosted n8n environment was functional β€” the issue was specifically with the more complex authentication requirements of my original workflow.
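
Stripped of the n8n wrapper, that workflow amounts to something like the following (a sketch: the OpenWeatherMap endpoint and Gmail SMTP details are standard, but the city, variable names, and message format are my own):

```python
# What the weather workflow boils down to without n8n: fetch current weather,
# format it, and send it via Gmail using an app password (no OAuth2 needed).
import os
import smtplib
import requests
from email.message import EmailMessage

weather = requests.get(
    "https://api.openweathermap.org/data/2.5/weather",
    params={"q": "Manchester,GB", "units": "metric", "appid": os.environ["OWM_API_KEY"]},
    timeout=10,
).json()

body = (f"Current weather in {weather['name']}: "
        f"{weather['weather'][0]['description']}, {weather['main']['temp']:.1f}Β°C")

msg = EmailMessage()
msg["Subject"] = "Daily weather update"
msg["From"] = os.environ["GMAIL_ADDRESS"]
msg["To"] = os.environ["GMAIL_ADDRESS"]
msg.set_content(body)

with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
    smtp.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
    smtp.send_message(msg)
```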

Simple n8n weather workflow using OpenAI API


🐳 Phase 3: The WSL Migration (Container Migration #2)

While the Pi 5 setup was working fine for simple workflows, I started feeling the hardware limitations when testing more complex operations. Loading even smaller AI models was painfully slow, and memory constraints meant I couldn't experiment with anything approaching production-scale workflows.

Time for Container Migration #2.

πŸ–₯️ Moving to WSL + Docker Desktop

With the Pi 5 hitting performance limits, I decided to experiment with local AI models using Ollama (a local LLM hosting platform) and OpenWebUI (a web interface for interacting with AI models). This required more computational resources than the Pi could provide, so I deployed these tools using Docker Compose inside Ubuntu running on Windows WSL (Windows Subsystem for Linux).

This setup offered several advantages:

Why WSL over the Pi 5:

  • Better hardware resources: Access to my Windows PC's 32GB RAM and 8-core CPU vs. Pi 5's 8GB RAM limitation
  • Docker Desktop integration: Visual container management through familiar interface
  • Development flexibility: Easier to iterate and debug workflows with full IDE access
  • Performance reality: Local LLM model loading went from 1+ minutes on Pi 5 to under 30 seconds

My development machine specs:

  • CPU: AMD Ryzen 7 5800H with Radeon Graphics
  • RAM: 32GB DDR4
  • Storage: NVMe SSD for fast model loading
  • GPU: None (pure CPU inference)

Time Investment Reality:

  • n8n cloud setup: 2-3 hours (including initial workflow creation)
  • OAuth2 debugging: 3+ hours over 2 days (ongoing challenge)
  • Pi 5 container setup: 2+ hours
  • Docker Desktop container set up: 2+ hours
  • Total project time: ~10 hours over 1 week

The new stack:

  • Host: Ubuntu in WSL2 on Windows
  • Container orchestration: Docker Compose
  • Management: Docker Desktop for Windows
  • Models: Ollama for local LLM hosting
  • Interface: OpenWebUI for model interaction

Docker Desktop showing Ollama containers running

🧠 Local LLM Experimentation

This is where the project took an interesting turn. Rather than continuing to rely on cloud APIs, I started experimenting with local language models through Ollama.

Why local LLMs?

  • Cost control: No per-token charges for experimentation
  • Privacy: Sensitive data stays on local infrastructure
  • Learning opportunity: Understanding how different models perform

The Docker Compose setup made it trivial to spin up different model combinations and test their performance on my email processing use case.
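
Part of what made this trivial is that Ollama exposes a simple local HTTP API, so comparing models on the same prompt is only a few lines of Python (a sketch; the model tags are just examples of what might be pulled locally):

```python
# Compare a couple of locally pulled models on the same extraction prompt via
# Ollama's local HTTP API (http://localhost:11434). Model tags are examples.
import time
import requests

PROMPT = "Extract every job URL from this email:\n\n<email text here>"

for model in ["llama3:8b", "mistral:7b"]:
    start = time.time()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    elapsed = time.time() - start
    print(f"{model}: {elapsed:.1f}s\n{resp.json()['response'][:200]}\n")
```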

⚠️ Reality Check: Local vs. Cloud Performance

Let's be honest here β€” using an LLM locally is never going to be a fully featured replacement for the likes of ChatGPT or Claude. This became apparent pretty quickly during my testing.

Performance realities:

  • Speed: Unless you're running some serious hardware, the performance will be a lot slower than the online AI counterparts
  • Model capabilities: Local models (especially smaller ones that run on consumer hardware) lack the sophisticated reasoning of GPT-4 or Claude
  • Resource constraints: My standard PC setup meant I was limited to smaller model variants
  • Response quality: Noticeably less nuanced and accurate responses compared to cloud services

Where local LLMs do shine:

  • Privacy-sensitive tasks: When you can't send data to external APIs
  • Development and testing: Iterating on prompts without burning through API credits
  • Learning and experimentation: Understanding how different model architectures behave
  • Offline scenarios: When internet connectivity is unreliable

The key insight: local LLMs are a complement to cloud services, not a replacement. Use them when privacy, cost, or learning are the primary concerns, but stick with cloud APIs when you need reliable, high-quality results.

πŸ”— Hybrid Approach: Best of Both Worlds

The final configuration became a hybrid approach:

  • OpenWebUI connected to Azure OpenAI for production-quality responses
  • Local Ollama models for development and privacy-sensitive testing
  • Docker containers exposed through Docker Desktop for easy management

This gave me the flexibility to choose the right tool for each task β€” cloud APIs when I need reliability and performance, local models when I want to experiment or maintain privacy.
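
In code terms, that "right tool for the task" decision is just a small routing function, roughly like this (a sketch: the deployment name, model tags, and environment variables are placeholders, not my actual configuration):

```python
# Sketch of the hybrid routing idea: privacy-sensitive or experimental prompts
# go to local Ollama, everything else to Azure OpenAI. Names are placeholders.
import os
import requests
from openai import AzureOpenAI

azure = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01",
)


def ask(prompt: str, private: bool = False) -> str:
    if private:
        # Local model: slower and less capable, but nothing leaves the machine.
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3:8b", "prompt": prompt, "stream": False},
            timeout=300,
        )
        return resp.json()["response"]
    # Cloud model: faster and sharper, billed against Azure credits.
    resp = azure.chat.completions.create(
        model="gpt-4o-mini",  # the Azure *deployment* name (placeholder)
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```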

OpenWebUI local interface with model selection

πŸ’° Cost Reality Check

After a week of experimentation, here's how the costs actually broke down:

| Service | Trial Period | Monthly Cost | My Usage | Notes |
|---------|--------------|--------------|----------|-------|
| n8n Cloud | 14 days free | €20/month | 2 weeks testing | Full OAuth2 features |
| OpenAI API | Pay-per-use | Variable | $10 in 2 days | Expensive for testing |
| Azure OpenAI | Free credits | Β£120/month budget | ~Β£15 used | Better for experimentation |
| Self-hosted | Free | Hardware + time | 2 days setup | OAuth2 complexity |

Key insight: The "free" self-hosted option came with a significant time cost β€” debugging authentication issues for hours vs. having things work immediately in the cloud version.


πŸ“Š Current State: Lessons Learned & Next Steps

After a week of container deployments, OAuth battles, and API integrations, here's where I've landed:

βœ… What's Working Well

Technical Stack:

  • n8n self-hosted: Currently running 2 active workflows (weather alerts, basic data processing)
  • Azure OpenAI integration: Reliable and cost-effective for AI processing β€” saving ~Β£25/month vs. direct OpenAI API
  • Docker containerisation: Easy deployment and management across different environments
  • WSL environment: 10x performance improvement over Pi 5 for local AI model loading

Process Improvements:

  • Iterative approach: Start simple, add complexity gradually β€” this saved significant debugging time
  • Hybrid cloud/local strategy: Use the right tool for each requirement rather than forcing one solution
  • Container flexibility: Easy to migrate and scale across different hosts when hardware constraints appear

Daily productivity impact: While the original job email automation isn't fully solved, the weather automation saves ~10 minutes daily, and the learning has already paid dividends in other automation projects.

⚠️ Ongoing Challenges (The Work-in-Progress List)

Authentication Issues:

  • OAuth2 integration with Outlook.com/Exchange Online still unresolved
  • Need to explore alternative authentication methods or different email providers
  • May require diving deeper into Azure app registration configurations

Workflow Limitations:

  • Original job email processing goal partially achieved but needs refinement
  • Multiple job URL extraction per email still needs work
  • Error handling and retry logic need improvement

Infrastructure Decisions:

  • Balancing local vs. cloud resources for different use cases
  • Determining optimal Docker deployment strategy for production workflows
  • Managing costs across multiple AI service providers

Decision-making process during failures: When something doesn't work, I typically: (1) Troubleshoot the exact error using ChatGPT or Anthropic Claude, (2) Search for similar issues in community forums, (3) Try a simpler alternative approach, (4) If still stuck after 2-3 hours, pivot to a different method rather than continuing to debug indefinitely.

πŸš€ Next Steps & Future Experiments

Short-term goals (next 2-4 weeks):

  1. Resolve OAuth2 authentication for proper email integration
  2. Improve job URL extraction accuracy β€” tackle the multiple URLs per email challenge
  3. Add error handling and logging to existing workflows
  4. Explore alternative email providers if Outlook.com integration remains problematic

Medium-term exploration (next 2-3 months):

  1. Local LLM performance tuning for specific use cases
  2. Workflow templates for common automation patterns
  3. Integration with other productivity tools (calendar, task management)
  4. Monitoring and alerting for automated workflows

πŸ› οΈ Quick Wins for Beginners

If you're just starting your AI automation journey, here are the lessons learned that could save you time:

🎯 Start Simple First

  • Begin with n8n cloud trial to understand the platform without authentication headaches
  • Use simple APIs (weather, RSS feeds) before tackling complex ones (email OAuth2)
  • Test with smaller AI models before jumping to GPT-4

πŸ’‘ Budget for Experimentation

  • Set aside $20-50 for API testing β€” it goes faster than you think
  • Azure OpenAI credits can be more cost-effective than direct OpenAI API for learning
  • Factor in time costs when choosing self-hosted vs. cloud solutions

πŸ”§ Have Fallback Options Ready

  • Plan alternative authentication methods (app passwords vs. OAuth2)
  • Keep both cloud and local AI options available
  • Document what works and what doesn't for future reference

πŸ”§ Technical Resources & Documentation

For anyone inspired to start their own AI automation journey, here are the key resources that proved invaluable:

πŸ› οΈ Core Tools & Platforms

  • n8n β€” Visual workflow automation platform
  • Docker β€” Containerisation platform
  • Docker Compose β€” Multi-container orchestration tool
  • OpenMediaVault β€” NAS/storage management solution

πŸ€– AI & LLM Resources

πŸ“š Setup Guides & Documentation

πŸ”§ Troubleshooting Common Issues

Based on my week of trial and error, here are the most common problems you'll likely encounter:

πŸ” OAuth2 Authentication Failures

Symptoms: "Invalid redirect URI" or "Authentication failed" errors when connecting to email services.

Likely causes:

  • Redirect URI mismatch between app registration and n8n configuration
  • Self-hosted instance not using HTTPS for callbacks
  • App registration missing required API permissions

Solutions to try:

  • Use app passwords instead of OAuth2 where possible (Gmail, Outlook.com) β€” Note: App passwords are simpler username/password credentials that bypass OAuth2 complexity but offer less security
  • Ensure your n8n instance is accessible via HTTPS with valid SSL certificate
  • Double-check app registration redirect URIs match exactly (including trailing slashes)
  • Start with cloud trial to verify workflow logic before self-hosting

🐳 Container Performance Issues

Symptoms: Slow model loading, container crashes, high memory usage.

Likely causes:

  • Insufficient RAM allocation to Docker
  • CPU-intensive models running on inadequate hardware
  • Competing containers for limited resources

Solutions to try:

  • Increase Docker memory limits in Docker Desktop settings
  • Use smaller model variants (7B instead of 13B+ parameters)
  • Monitor resource usage with docker stats command
  • Consider migrating from Pi to x86 hardware for better performance

πŸ’Έ API Rate Limiting and Costs

Symptoms: API calls failing, unexpected high costs, token limits exceeded.

Likely causes:

  • Testing with expensive models (GPT-4) instead of cheaper alternatives
  • No rate limiting in workflow configurations
  • Inefficient prompt design causing high token usage

Solutions to try:

  • Start testing with GPT-3.5-turbo or GPT-4-mini models
  • Implement workflow rate limiting and retry logic
  • Optimize prompts to reduce token consumption
  • Set API spending alerts in provider dashboards

πŸ’» Resource Requirements Summary

Minimum Requirements for Recreation:

  • Cloud approach: n8n trial account + $20-50 API experimentation budget
  • Self-hosted approach: 8GB+ RAM, Docker knowledge, 2-3 days setup time
  • Local AI experimentation: 16GB+ RAM recommended, considerable patience, NVMe storage preferred
  • Network: Stable broadband connection for cloud API performance

πŸ’­ Final Thoughts: The Joy of Controlled Chaos

What started as a simple email automation project became a comprehensive exploration of modern AI automation tools. While I didn't achieve my original goal completely (yet), the journey provided invaluable hands-on experience with:

  • Container orchestration across different environments
  • AI service integration patterns and best practices
  • Authentication complexity in self-hosted vs. cloud environments
  • Hybrid deployment strategies for flexibility and cost control

The beauty of this approach is that each "failed" experiment taught me something valuable about the tools and processes involved. The OAuth2 authentication issues, while frustrating, highlighted the importance of proper authentication design. The container migrations demonstrated the flexibility of modern deployment approaches.

Most importantly: I now have a functional foundation for AI automation experiments, with both cloud and local capabilities at my disposal.

Is it overengineered for a simple email processing task? Absolutely. Was it worth the learning experience? Without question.

Have you tackled similar AI automation projects? I'd particularly love to hear from anyone who's solved the OAuth2 self-hosting puzzle or found creative workarounds for email processing limitations. Drop me a line if you've found better approaches to any of these challenges.


πŸ“Έ Screenshots Referenced in This Post

For anyone recreating this setup, here are the key screenshots included in this post:

  1. n8n-jobs-workflow-openai.png β€” Original workflow using direct OpenAI API (the expensive version that burned through $10 in 2 days)
  2. azure-openai-deployment.png β€” Azure OpenAI Studio showing GPT-4 Mini deployment configuration
  3. n8n-jobs-workflow-azure.png β€” Improved workflow using Azure OpenAI integration (the cost-effective version)
  4. omv-docker-n8n-containers.png β€” OpenMediaVault interface showing Docker container management on Pi 5
  5. n8n-weather-workflow.png β€” Simple weather API to Gmail workflow demonstrating successful self-hosted setup
  6. docker-desktop-ollama.png β€” Docker Desktop showing Ollama and OpenWebUI containers running on WSL
  7. openwebui-local.png β€” OpenWebUI interface showing both Azure OpenAI and local model selection options

Each image demonstrates the practical implementation rather than theoretical concepts, helping readers visualize the actual tools and interfaces involved in the automation journey.
