A fully local AI demo environment running on the Intel Core Ultra NPU — no cloud, no API keys, no data leaving the device. When complete you will have: Microsoft Foundry Local serving AI models on-device, a professional SMB tool demo app (Email Rewriter, PII Detector, Summarizer, Ticket Triage, Tone Fixer), Open WebUI for free-form chat, and one-click batch files to start and stop everything. This guide is the exact process validated on a Dell Pro 16 with Intel Core Ultra 7 and 40+ TOPS NPU.
The ZIP package contains all files referenced in this guide: local_ai_demo.py (the Flask demo app), keepalive.py (prevents model idle timeout), start_demo.bat (one-click launcher), stop_demo.bat (clean shutdown), and requirements.txt. Download and extract the ZIP into your project folder — you do not need to manually copy any of the code blocks in Phases 4 and 5 unless you want to.
The code blocks are included in this guide for transparency so you can see exactly what each file does before running it.
| Requirement | What to Check | Minimum |
|---|---|---|
| Operating System | Settings → System → About → OS Build | Windows 11 24H2 (Build 26100+). NPU acceleration via Foundry Local requires 24H2 — earlier versions will not work. |
| Processor | Settings → System → About → Processor | Intel Core Ultra (any series). Confirmed working: Core Ultra 7 Lunar Lake (40+ TOPS NPU). Meteor Lake (10 TOPS) works at reduced performance. |
| RAM | Settings → System → About → Installed RAM | 16 GB minimum. 32 GB recommended for running multiple models or simultaneous services. |
| Free Disk Space | File Explorer → This PC → C: drive free space | 20 GB minimum (3–5 GB per AI model + ~10 GB for Open WebUI dependencies). |
| Intel NPU Driver | Device Manager → Neural Processors → Intel® AI Boost | Must be present with no yellow warning icon. Download latest from intel.com NPU Driver page. |
Two Windows settings must be corrected before installing anything. Skipping these causes silent failures that are difficult to diagnose.
python command and redirects it to the Microsoft Store. Every Python install and pip command will fail until this is disabled.Python is the runtime for the demo app and keepalive script. Version 3.11 is required.
python --version — you should see Python 3.11.x.Git is used to clone repositories and is required for some dependencies.
Close and reopen PowerShell as Administrator after this completes.
Foundry Local is Microsoft's official on-device AI runtime. It automatically detects your NPU, downloads hardware-optimized model variants, and serves them through an OpenAI-compatible API.
Verify: foundry --version — should return a version number. If you see an error, close and reopen PowerShell.
C:\LocalAI. If you choose a different path, update it consistently in all batch files and future commands. Ken's original setup uses C:\windows\system32\surface-npu-demo — the batch files in the ZIP are pre-configured for that path.A virtual environment keeps all demo packages isolated. This prevents version conflicts with other Python projects on your system.
(venv) at the beginning. The venv must be activated every time you open a new PowerShell window. If (venv) is not showing, run the activate command again before running any other commands.Install the core packages first (fast), then Open WebUI separately (large — allow 5–10 minutes).
Extract the ZIP package you downloaded and copy all four files into C:\LocalAI (or your chosen project folder):
C:\windows\system32\surface-npu-demo by default. If you're using a different folder, open each .bat file in Notepad and replace that path with your actual folder path (e.g., C:\LocalAI) on every line it appears.This is the Flask-based professional demo application. It is included in the ZIP — this section shows the full code for transparency. If you downloaded the ZIP, skip to Section 6.
A single-file Flask web application that serves a professional dark-themed UI at localhost:8501. It connects to Foundry Local's OpenAI-compatible API at localhost:57055/v1 and exposes 6 tools: Email Rewriter, PII Detector, Summarizer, Ticket Triage, Tone Fixer, and Free Chat. Each tool uses a purpose-built system prompt designed for short, focused outputs that stay within the on-device model's context window.
To create manually: open Notepad, paste the code below, save as local_ai_demo.py in your project folder (select "All Files" as the file type when saving so Notepad doesn't add .txt).
The full source (763 lines including complete UI) is in local_ai_demo.py in the ZIP. The snippet above shows the core configuration — use the ZIP file, not this snippet.
Microsoft Foundry Local has a 10-minute idle timeout — if no requests are made, it unloads the model from memory. The next request then requires a 20–30 second model reload. During a demo, this causes visible slowness and occasional errors. The keepalive script sends a tiny invisible ping every 4 minutes to reset the timer and keep the model hot and ready.
These are plain text files saved with a .bat extension. To create them manually: open Notepad, paste the code, click File → Save As, select All Files as file type, and name the file with the .bat extension. Or use the files from the ZIP.
C:\windows\system32\surface-npu-demo — replace this with your actual project folder path on every line it appears. Use Find & Replace in Notepad (Ctrl+H) to do this quickly.By default Foundry picks a random port on each start, which breaks the Open WebUI connection. Pinning it to a fixed port means the connection setup in Section 10 only needs to be done once.
Foundry downloads the NPU-optimized variant of Phi-4 Mini (~3 GB, one time only) and opens an interactive chat. Type anything and press Enter to confirm it's working. Type /exit to quit. Subsequent launches load from cache in seconds.
Run these to pre-download all recommended demo models. Downloads happen once and are cached locally. Allow 5–15 minutes per model depending on your connection speed.
This shows all models in the catalog with their available device variants (NPU/GPU/CPU), file sizes, and licenses. Foundry automatically selects the best variant for your hardware when you load a model by alias.
The full Foundry Local model catalog is at foundrylocal.ai/models. This is the Microsoft-curated catalog of models optimized for on-device use — every model has been quantized and tested across consumer hardware. Use the CLI (foundry model list) to see what's available for your specific hardware configuration.
| Alias (load command) | Device | Size | Best For | Demo Value |
|---|---|---|---|---|
| phi-4-mini | NPU | ~3 GB | Default demo model. Microsoft's own SLM — strong credibility with Microsoft partners. Reliable, consistent outputs across all 6 tool categories. | ⭐⭐⭐⭐⭐ Best all-around |
| deepseek-r1-7b | NPU | 4.2 GB | Reasoning showcase. DeepSeek R1 shows its thinking chain before answering — partners can watch the AI reason through a problem step by step, entirely on the NPU. | ⭐⭐⭐⭐⭐ Wow factor |
| mistral-7b-v0.2 | NPU | 3.6 GB | Strong analytical outputs. Mistral 7B produces longer, more detailed responses than Phi-4 Mini. Better for complex licensing analysis and ROI business case prompts. | ⭐⭐⭐⭐ Analysis depth |
| deepseek-r1-1.5b | iGPU | 1.3 GB | Fastest responses. Lightweight DeepSeek variant. Good for live typing demos where speed matters more than depth. Runs on iGPU (not NPU). | ⭐⭐⭐ Speed demo |
| qwen3-0.6b | CPU | 0.6 GB | Ultra-lightweight fallback. Runs on CPU, tiny footprint. Good for demonstrating that even without NPU or GPU, local AI works — just slower. Always available regardless of hardware. | ⭐⭐ Fallback only |
On-device NPU models use aggressively compressed variants with smaller context windows than their cloud counterparts. This is the key tradeoff of local AI inference:
| Model | Approx. Context Window | Practical Impact |
|---|---|---|
| Phi-4 Mini (NPU) | ~4,200 tokens total | 4–6 back-and-forth exchanges before context errors. Start a new chat for each demo scenario. |
| DeepSeek R1 7B (NPU) | ~4,000 tokens total | Burns tokens fast due to verbose reasoning chain. Use as a one-shot single-prompt tool, not a conversation. |
| Mistral 7B (NPU) | ~4,000 tokens total | Similar to Phi-4 Mini. One topic per chat session. |
Open WebUI needs to be told where Foundry Local is running. Once configured, this connection persists across restarts. You only need to do this the first time Open WebUI is launched, or if it ever shows "No models available."
start_demo.bat or manually start the services. Navigate to localhost:3000 in your browser. Create a local admin account on first launch (any username/password — nothing goes to the cloud).http://localhost:57055/v1
(click to copy)
local
(click to copy — any value works, Foundry doesn't check it)
foundry service status shows service on port 57055foundry service startfoundry model list --loaded shows phi-4-mini activefoundry model load phi-4-minipython keepalive.pypython local_ai_demo.py is running and no errors in that windowOption A (Recommended): Double-click start_demo.bat. It handles everything and opens both browser tabs automatically. Wait ~45 seconds for Open WebUI to finish starting before presenting localhost:3000.
Option B (Manual): Open three PowerShell windows as Administrator and run the following in each:
After switching, update the model selector in the demo app sidebar. In Open WebUI, click the model name at the top of the chat and select the new model from the dropdown.
Double-click stop_demo.bat — it stops Foundry, kills all Python processes, and frees all memory. Then close any remaining PowerShell windows.
| Problem | Cause | Fix |
|---|---|---|
| "python was not found" | App Execution Alias not disabled | Go to App execution aliases → toggle OFF both python.exe entries. Close and reopen PowerShell. |
| (venv) not showing in prompt | Virtual environment not activated | Run venv\Scripts\activate in your project folder. Every new PowerShell window needs this. |
| "No models available" in Open WebUI | Connection to Foundry not configured | Admin Panel → Settings → Connections → add http://localhost:57055/v1 with key local |
| Model very slow / timing out | Idle timeout triggered — model was unloaded | Ensure keepalive.py is running. Run foundry service restart then reload the model. |
| TransferEncodingError / Infer Request busy | Context window full or model overloaded | Start a new chat. Keep one topic per session. Restart Foundry if repeated: foundry service restart |
| Port conflict on 57055 | Previous session left service running | Run foundry service stop then foundry service start |
| Open WebUI blank at localhost:3000 | Not fully started yet | Wait for "Uvicorn running on 0.0.0.0:3000" in the PowerShell window. Takes ~45 seconds. |
| Batch file shows garbage characters | File saved with wrong encoding (UTF-16) | Open in Notepad, delete everything, paste from the code blocks above, save as ANSI encoding. Or use the files from the ZIP. |
| NPU not showing in Task Manager | NPU driver not installed or not on 24H2 | Confirm OS is Windows 11 24H2+. Install latest Intel NPU driver. Check Device Manager for Intel AI Boost. |