TD SYNNEX  Local AI on Copilot+ PC
Complete Setup Guide — Microsoft Foundry Local + Intel Core Ultra NPU
TD SYNNEX Cloud Engineering · Copilot+ PC Demo Program · May 2026

A fully local AI demo environment running on the Intel Core Ultra NPU — no cloud, no API keys, no data leaving the device. When complete you will have: Microsoft Foundry Local serving AI models on-device, a professional SMB tool demo app (Email Rewriter, PII Detector, Summarizer, Ticket Triage, Tone Fixer), Open WebUI for free-form chat, and one-click batch files to start and stop everything. This guide is the exact process validated on a Dell Pro 16 with Intel Core Ultra 7 and 40+ TOPS NPU.

📦 Files Included in the Download ZIP

The ZIP package contains all files referenced in this guide: local_ai_demo.py (the Flask demo app), keepalive.py (prevents model idle timeout), start_demo.bat (one-click launcher), stop_demo.bat (clean shutdown), and requirements.txt. Download and extract the ZIP into your project folder. You do not need to manually copy any of the code blocks in Sections 5 through 7 unless you want to.

The code blocks are included in this guide for transparency so you can see exactly what each file does before running it.

Contents
  1. System Requirements
  2. Pre-Installation Fixes
  3. Core Software Installs
  4. Project Folder Setup
  5. Python Files — Demo App
  6. Python Files — Keepalive
  7. Batch Files — Start & Stop
  8. Foundry Configuration & Models
  9. Model Catalog
  10. Open WebUI One-Time Setup
  11. First Run & Verification
  12. Running Your Demo
  13. Troubleshooting

1 · System Requirements

Verify Your Hardware & OS Before Starting
All five requirements must be met, with no exceptions
Requirement | What to Check | Minimum
Operating System | Settings → System → About → OS Build | Windows 11 24H2 (Build 26100+). NPU acceleration via Foundry Local requires 24H2; earlier versions will not work.
Processor | Settings → System → About → Processor | Intel Core Ultra (any series). Confirmed working: Core Ultra 7 Lunar Lake (40+ TOPS NPU). Meteor Lake (10 TOPS) works at reduced performance.
RAM | Settings → System → About → Installed RAM | 16 GB minimum. 32 GB recommended for running multiple models or simultaneous services.
Free Disk Space | File Explorer → This PC → C: drive free space | 20 GB minimum (3–5 GB per AI model + ~10 GB for Open WebUI dependencies).
Intel NPU Driver | Device Manager → Neural Processors → Intel® AI Boost | Must be present with no yellow warning icon. Download latest from intel.com NPU Driver page.
Copilot+ PC Designation: Intel Core Ultra 200V (Lunar Lake) processors deliver 48 TOPS from the NPU alone, meeting Microsoft's 40+ TOPS Copilot+ requirement. This is the validated hardware platform for this demo. The full platform delivers 120 TOPS across NPU + iGPU + CPU combined.
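The numeric minimums in the table can be checked with a short script once Python is installed. This is a minimal sketch, not part of the ZIP: the function name unmet_requirements is hypothetical, the RAM figure is entered by hand, and the processor and NPU driver rows still need the manual checks described above.

```python
import shutil
import sys

MIN_BUILD = 26100    # Windows 11 24H2, from the requirements table
MIN_RAM_GB = 16
MIN_FREE_GB = 20


def unmet_requirements(build, ram_gb, free_gb):
    """Return a list of problems; an empty list means every numeric check passed."""
    problems = []
    if build < MIN_BUILD:
        problems.append(f"OS build {build} is below {MIN_BUILD} (24H2)")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"{ram_gb} GB RAM is below the {MIN_RAM_GB} GB minimum")
    if free_gb < MIN_FREE_GB:
        problems.append(f"{free_gb:.0f} GB free disk is below {MIN_FREE_GB} GB")
    return problems


if __name__ == "__main__" and sys.platform == "win32":
    build = sys.getwindowsversion().build
    free_gb = shutil.disk_usage("C:\\").free / 1024**3
    # RAM is typed in here; read the real value from Settings → System → About.
    print(unmet_requirements(build, ram_gb=16, free_gb=free_gb) or "All checks passed")
```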

2 · Pre-Installation Fixes

Two Windows settings must be corrected before installing anything. Skipping these causes silent failures that are difficult to diagnose.

A · Disable Python App Execution Aliases
Critical: without this, all Python commands fail silently
⚠ Do this first or nothing else works. Windows ships with a "Python" App Execution Alias that intercepts the python command and redirects it to the Microsoft Store. Every Python install and pip command will fail until this is disabled.
1. Open App Execution Aliases
Press Windows key, type App execution aliases, press Enter.
2. Disable both Python entries
Find App Installer - python.exe and App Installer - python3.exe. Toggle both OFF.
3. Close Settings
No restart required. The fix takes effect immediately.
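After Python is installed in Section 3, you can confirm that the alias is no longer intercepting the python command. The sketch below is an illustration only: the helper is_store_alias is hypothetical, and it assumes the alias stubs live in a folder named WindowsApps, which is where Windows places App Execution Aliases. A Python installed from the Microsoft Store also resolves there, so treat a positive result as a prompt to look closer rather than proof of a problem.

```python
import shutil
from pathlib import PureWindowsPath


def is_store_alias(python_path):
    """True if the path sits inside a WindowsApps folder (the alias stub location)."""
    if not python_path:
        return False
    parts = [part.lower() for part in PureWindowsPath(python_path).parts]
    return "windowsapps" in parts


if __name__ == "__main__":
    resolved = shutil.which("python")
    verdict = "Store alias still active" if is_store_alias(resolved) else "OK or not found"
    print(resolved, "->", verdict)
```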
B · Verify Intel NPU Driver
Required for NPU-accelerated model inference
1. Open Device Manager
Right-click the Start button → Device Manager
2. Expand Neural Processors
Look for Intel® AI Boost. No yellow warning icon means the driver is installed correctly. If it's missing or shows a warning, download the latest driver from intel.com, install it, and reboot.

3 · Core Software Installations

Open PowerShell as Administrator for all commands in this section. Press Windows key, type PowerShell, right-click, select "Run as administrator", click Yes. Keep this window open for the entire installation process.
Install Python 3.11, Git, and Microsoft Foundry Local
All three are installed via winget, the built-in Windows package manager

Step 1 — Install Python 3.11

Python is the runtime for the demo app and keepalive script. Version 3.11 is required.

PowerShell — Run as Administrator
winget install Python.Python.3.11
After install completes, close PowerShell completely and reopen it as Administrator. Verify with: python --version — you should see Python 3.11.x.

Step 2 — Install Git

Git is used to clone repositories and is required for some dependencies.

PowerShell — Run as Administrator
winget install Git.Git

Close and reopen PowerShell as Administrator after this completes.

Step 3 — Install Microsoft Foundry Local

Foundry Local is Microsoft's official on-device AI runtime. It automatically detects your NPU, downloads hardware-optimized model variants, and serves them through an OpenAI-compatible API.

PowerShell — Run as Administrator
winget install Microsoft.FoundryLocal

Verify: foundry --version — should return a version number. If you see an error, close and reopen PowerShell.
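The three verification steps above can be rolled into one PATH check. This is a minimal sketch under the assumption that each installer adds its command (python, git, foundry) to PATH; the function name missing_tools is hypothetical. It only confirms the commands are discoverable, not that the versions are correct.

```python
import shutil

REQUIRED_TOOLS = ("python", "git", "foundry")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing from PATH:", ", ".join(missing) if missing else "none")
```

If a tool is reported missing right after its install, close and reopen PowerShell first, since PATH changes only apply to new windows.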


4 · Project Folder Setup

Create Project Directory and Python Virtual Environment
Isolates all demo dependencies from the rest of your system

Step 1 — Create the project folder and navigate to it

PowerShell — Run as Administrator
mkdir C:\LocalAI
cd C:\LocalAI
You can use any folder path you prefer. This guide uses C:\LocalAI. If you choose a different path, update it consistently in all batch files and future commands. Ken's original setup uses C:\windows\system32\surface-npu-demo — the batch files in the ZIP are pre-configured for that path.

Step 2 — Create a Python virtual environment

A virtual environment keeps all demo packages isolated. This prevents version conflicts with other Python projects on your system.

PowerShell
python -m venv venv

Step 3 — Activate the virtual environment

PowerShell
venv\Scripts\activate
Your prompt should now show (venv) at the beginning. The venv must be activated every time you open a new PowerShell window. If (venv) is not showing, run the activate command again before running any other commands.
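If you are unsure whether the venv is really active, Python can tell you directly. This sketch relies on the standard behavior that sys.prefix differs from sys.base_prefix inside an environment created with python -m venv; the function name in_virtualenv is just an illustrative choice.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("venv active:", sys.prefix)
    else:
        print("venv NOT active. Run venv\\Scripts\\activate first.")
```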

Step 4 — Install Python packages

Install the core packages first (fast), then Open WebUI separately (large — allow 5–10 minutes).

PowerShell — Core packages (fast)
pip install flask openai requests
PowerShell — Open WebUI (~500 MB, takes 5-10 minutes)
pip install open-webui
Open WebUI is a large install. You will see hundreds of packages downloading — this is normal. The install is complete when you see "Successfully installed open-webui..." and the prompt returns. Do not close the window during installation. You may also see some packages being uninstalled and reinstalled to resolve version conflicts — this is also normal.

Step 5 — Copy the demo files into your project folder

Extract the ZIP package you downloaded and copy its contents into C:\LocalAI (or your chosen project folder). The four files below are the ones the demo runs; requirements.txt is also included in the ZIP:

Files to place in your project folder

local_ai_demo.py Flask
The professional demo app — Email Rewriter, PII Detector, Summarizer, Ticket Triage, Tone Fixer, Free Chat
keepalive.py Python
Pings the model every 4 minutes to prevent Foundry's 10-minute idle timeout from unloading it
start_demo.bat Batch
One-click launcher — starts all services and opens both browser tabs automatically
stop_demo.bat Batch
Clean shutdown — stops all services and frees all memory
Update the path in both batch files. The batch files reference C:\windows\system32\surface-npu-demo by default. If you're using a different folder, open each .bat file in Notepad and replace that path with your actual folder path (e.g., C:\LocalAI) on every line it appears.

5 · Demo App — local_ai_demo.py

This is the Flask-based professional demo application. It is included in the ZIP — this section shows the full code for transparency. If you downloaded the ZIP, skip to Section 6.

A single-file Flask web application that serves a professional dark-themed UI at localhost:8501. It connects to Foundry Local's OpenAI-compatible API at localhost:57055/v1 and exposes 6 tools: Email Rewriter, PII Detector, Summarizer, Ticket Triage, Tone Fixer, and Free Chat. Each tool uses a purpose-built system prompt designed for short, focused outputs that stay within the on-device model's context window.

To create manually: open Notepad, paste the code below, save as local_ai_demo.py in your project folder (select "All Files" as the file type when saving so Notepad doesn't add .txt).

local_ai_demo.py — Flask Demo Application
# local_ai_demo.py — TD SYNNEX Local AI Demo App
# Flask web app serving professional SMB tool UI at localhost:8501
# Connects to Microsoft Foundry Local at localhost:57055
# Run: python local_ai_demo.py

from flask import Flask, request, jsonify, render_template_string
from openai import OpenAI

app = Flask(__name__)

FOUNDRY_URL = "http://localhost:57055/v1"
FOUNDRY_KEY = "local"
MAX_TOKENS = 380

# See full source in downloaded local_ai_demo.py
# This abbreviated view shows the key configuration.
# The full file is ~760 lines including the complete HTML/CSS/JS UI.

def call_foundry(model, system, user):
    client = OpenAI(base_url=FOUNDRY_URL, api_key=FOUNDRY_KEY)
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        max_tokens=MAX_TOKENS,
        temperature=0.3,
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8501, debug=False)

The full source (763 lines including complete UI) is in local_ai_demo.py in the ZIP. The snippet above shows the core configuration — use the ZIP file, not this snippet.
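To illustrate the "purpose-built system prompt per tool" idea described in this section, here is a minimal sketch of how a tool name might be mapped to the messages passed to call_foundry. The prompt wording, the TOOL_PROMPTS table, and the build_messages helper are all hypothetical; the actual prompts are in local_ai_demo.py in the ZIP.

```python
# Hypothetical prompts for illustration only; the real ones live in local_ai_demo.py.
TOOL_PROMPTS = {
    "email_rewriter": "Rewrite the email to be clear and professional. Reply with the email only.",
    "pii_detector": "List every piece of personal data in the text, one item per line.",
    "summarizer": "Summarize the text in three short bullet points.",
}


def build_messages(tool, user_text):
    """Pair a tool's system prompt with the user's text in chat-completions format."""
    if tool not in TOOL_PROMPTS:
        raise ValueError(f"Unknown tool: {tool}")
    return [
        {"role": "system", "content": TOOL_PROMPTS[tool]},
        {"role": "user", "content": user_text},
    ]
```

Short, single-purpose prompts like these keep each request small, which matters given the limited context window of the on-device models.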

6 · Keepalive Script — keepalive.py

Microsoft Foundry Local has a 10-minute idle timeout — if no requests are made, it unloads the model from memory. The next request then requires a 20–30 second model reload. During a demo, this causes visible slowness and occasional errors. The keepalive script sends a tiny invisible ping every 4 minutes to reset the timer and keep the model hot and ready.

keepalive.py — Complete Source (copy this entire block)
"""
Foundry Local Keepalive
Sends a lightweight ping to the model every 4 minutes
to prevent the 10-minute idle timeout from unloading it.

Run: python keepalive.py
Leave running in the background during demos.
"""
import time
import datetime
from openai import OpenAI

FOUNDRY_URL = "http://localhost:57055/v1"
FOUNDRY_KEY = "local"
MODEL = "phi-4-mini-instruct-openvino-npu:3"
INTERVAL = 240  # seconds (4 min — under the 10-min timeout)

client = OpenAI(base_url=FOUNDRY_URL, api_key=FOUNDRY_KEY)

def ping():
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "ok"}],
        max_tokens=5,
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print("=" * 48)
print(" Foundry Local Keepalive - Model stays hot")
print(f" Pinging every {INTERVAL//60} minutes. Ctrl+C to stop.")
print("=" * 48)

while True:
    try:
        result = ping()
        ts = datetime.datetime.now().strftime("%H:%M:%S")
        print(f"[{ts}] Model alive - response: '{result}'")
    except Exception as e:
        ts = datetime.datetime.now().strftime("%H:%M:%S")
        print(f"[{ts}] Ping failed: {e}")
        print(" Check: foundry service status")
    time.sleep(INTERVAL)

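For readers who want to see what the keepalive ping looks like on the wire, the sketch below builds the equivalent request with only the standard library. It assumes Foundry Local accepts the standard OpenAI-style POST to /chat/completions under the base URL used in this guide; the helper names build_ping_request and ping_once are hypothetical and not part of the ZIP.

```python
import json
import urllib.request

FOUNDRY_URL = "http://localhost:57055/v1"       # pinned port used throughout this guide
MODEL = "phi-4-mini-instruct-openvino-npu:3"    # same model ID as keepalive.py


def build_ping_request(base_url=FOUNDRY_URL, model=MODEL):
    """Build the tiny chat request that keeps the model loaded, without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "ok"}],
        "max_tokens": 5,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer local"},
        method="POST",
    )


def ping_once(timeout=60):
    """Send one ping and return the model's short reply."""
    with urllib.request.urlopen(build_ping_request(), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"].strip()
```

The request is deliberately tiny (one short message, five reply tokens), so it resets the idle timer without noticeably competing with demo traffic.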
7 · Batch Files — Start and Stop

These are plain text files saved with a .bat extension. To create them manually: open Notepad, paste the code, click File → Save As, select All Files as file type, and name the file with the .bat extension. Or use the files from the ZIP.

Update the path before using. Both batch files contain C:\windows\system32\surface-npu-demo — replace this with your actual project folder path on every line it appears. Use Find & Replace in Notepad (Ctrl+H) to do this quickly.
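If you prefer to script the path update instead of using Notepad, a plain string replacement does the job. This is a sketch only: the helper names retarget and retarget_file are hypothetical, the match is exact (including letter case), and cp1252 is used as a stand-in for Notepad's ANSI encoding.

```python
from pathlib import Path

OLD_PATH = r"C:\windows\system32\surface-npu-demo"   # default in the ZIP's batch files
NEW_PATH = r"C:\LocalAI"                              # folder used in this guide


def retarget(text, old=OLD_PATH, new=NEW_PATH):
    """Replace every occurrence of the old path; also return how many were found."""
    return text.replace(old, new), text.count(old)


def retarget_file(path, old=OLD_PATH, new=NEW_PATH):
    """Rewrite one batch file in place and return the number of replacements."""
    bat = Path(path)
    updated, count = retarget(bat.read_text(encoding="cp1252"), old, new)
    bat.write_text(updated, encoding="cp1252")
    return count
```

A count of zero from retarget_file means the file did not contain the default path, for example because it was already edited.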

start_demo.bat — Launch Everything

start_demo.bat — Double-click to start the full demo stack
@echo off
echo Starting Local AI Demo Stack...
echo.
cd C:\windows\system32\surface-npu-demo
call venv\Scripts\activate
echo [1/4] Starting Foundry Local engine...
foundry service start
echo [2/4] Loading Phi-4 Mini onto NPU...
foundry model load phi-4-mini
echo [3/4] Starting keepalive (prevents idle timeout)...
start "Keepalive" cmd /k "cd C:\windows\system32\surface-npu-demo && venv\Scripts\activate && python keepalive.py"
echo [4/4] Launching demo app and Open WebUI...
start "Demo App 8501" cmd /k "cd C:\windows\system32\surface-npu-demo && venv\Scripts\activate && python local_ai_demo.py"
start "Open WebUI 3000" cmd /k "cd C:\windows\system32\surface-npu-demo && venv\Scripts\activate && open-webui serve --port 3000"
echo.
echo Waiting for services to start (8 seconds)...
timeout /t 8 /nobreak >nul
echo Opening browser tabs...
start "" "http://localhost:8501"
timeout /t 3 /nobreak >nul
start "" "http://localhost:3000"
echo.
echo =============================================
echo Demo stack is LIVE
echo localhost:8501 - SMB Tool Demo App
echo localhost:3000 - Open WebUI Chat
echo =============================================
echo.
echo TO SHUT DOWN: double-click stop_demo.bat
echo.
pause

stop_demo.bat — Clean Shutdown

stop_demo.bat — Double-click to stop all services and free memory
@echo off
echo Shutting down Local AI Demo Stack...
echo.
echo Stopping Foundry Local service...
foundry service stop
echo Killing any remaining Python processes (demo app + keepalive)...
taskkill /F /IM python.exe >nul 2>&1
echo Killing Open WebUI if still running...
taskkill /F /FI "WINDOWTITLE eq Open WebUI 3000" >nul 2>&1
echo.
echo =============================================
echo All services stopped. Resources freed.
echo =============================================
echo.
pause
Why stop_demo.bat uses taskkill: Simply closing PowerShell windows does not always stop Foundry Local — the background service continues running and consuming memory. The stop script explicitly stops the service and kills any remaining Python processes, ensuring a completely clean state after the demo.

8 · Foundry Local Configuration & Model Downloads

Pin Foundry to a Fixed Port and Download Models
One-time configuration: Foundry remembers these settings

Step 1 — Pin Foundry to port 57055

By default Foundry picks a random port on each start, which breaks the Open WebUI connection. Pinning it to a fixed port means the connection setup in Section 10 only needs to be done once.

PowerShell — Run once, permanent setting
foundry service set --port 57055
foundry service restart

Step 2 — Test Foundry and verify NPU is active

PowerShell
foundry service start
foundry model run phi-4-mini

Foundry downloads the NPU-optimized variant of Phi-4 Mini (~3 GB, one time only) and opens an interactive chat. Type anything and press Enter to confirm it's working. Type /exit to quit. Subsequent launches load from cache in seconds.

What Foundry does automatically: It detects your Intel Core Ultra NPU, selects the OpenVINO-optimized model variant for your hardware, and routes inference to the dedicated AI chip. You don't configure any of this manually.
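Once the port is pinned, you can confirm from Python that the endpoint answers on 57055. The sketch below assumes Foundry Local serves the standard OpenAI-style model list at /models under the base URL, which is the same endpoint an OpenAI-compatible client such as Open WebUI reads to populate its model dropdown. The function names are hypothetical.

```python
import json
import urllib.request


def model_ids(payload):
    """Extract model IDs from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url="http://localhost:57055/v1", timeout=10):
    """Ask the local endpoint which models it is currently offering."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(fetch_model_ids())
    except OSError as exc:
        print("Foundry not reachable on port 57055:", exc)
```

A connection error here usually means the service is stopped or still running on a different port.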

Step 3 — Download your model library

Run these to pre-download all recommended demo models. Downloads happen once and are cached locally. Allow 5–15 minutes per model depending on your connection speed.

PowerShell — Download all recommended models
# Primary demo model — Microsoft's Phi-4 Mini on NPU
foundry model download phi-4-mini
# DeepSeek R1 7B on NPU — shows visible reasoning chain
foundry model download deepseek-r1-7b
# Mistral 7B on NPU — strong analytical outputs
foundry model download mistral-7b-v0.2
# DeepSeek R1 1.5B on iGPU — lightweight, fastest responses
foundry model download deepseek-r1-1.5b
# Qwen3 0.6B on CPU — ultra-lightweight, always available
foundry model download qwen3-0.6b

Step 4 — List all available models at any time

PowerShell
foundry model list

This shows all models in the catalog with their available device variants (NPU/GPU/CPU), file sizes, and licenses. Foundry automatically selects the best variant for your hardware when you load a model by alias.


9 · Model Catalog

The full Foundry Local model catalog is at foundrylocal.ai/models. This is the Microsoft-curated catalog of models optimized for on-device use — every model has been quantized and tested across consumer hardware. Use the CLI (foundry model list) to see what's available for your specific hardware configuration.

Recommended Models for SMB Partner Demos

Alias (load command) | Device | Size | Best For | Demo Value
phi-4-mini | NPU | ~3 GB | Default demo model. Microsoft's own SLM, with strong credibility among Microsoft partners. Reliable, consistent outputs across all 6 tool categories. | ⭐⭐⭐⭐⭐ Best all-around
deepseek-r1-7b | NPU | 4.2 GB | Reasoning showcase. DeepSeek R1 shows its thinking chain before answering, so partners can watch the AI reason through a problem step by step, entirely on the NPU. | ⭐⭐⭐⭐⭐ Wow factor
mistral-7b-v0.2 | NPU | 3.6 GB | Strong analytical outputs. Mistral 7B produces longer, more detailed responses than Phi-4 Mini. Better for complex licensing analysis and ROI business case prompts. | ⭐⭐⭐⭐ Analysis depth
deepseek-r1-1.5b | iGPU | 1.3 GB | Fastest responses. Lightweight DeepSeek variant. Good for live typing demos where speed matters more than depth. Runs on iGPU (not NPU). | ⭐⭐⭐ Speed demo
qwen3-0.6b | CPU | 0.6 GB | Ultra-lightweight fallback. Runs on CPU, tiny footprint. Good for demonstrating that even without NPU or GPU, local AI works, just slower. Always available regardless of hardware. | ⭐⭐ Fallback only
Skip deepseek-r1-14b for demos. This model has no NPU variant, requires 8+ GB of GPU memory, and is too slow for live demos. The 7B NPU variant (deepseek-r1-7b) tells the same story better.
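Before downloading the whole library, it helps to add up the disk space involved. The sketch below uses the approximate sizes from the table and the ~10 GB Open WebUI estimate from Section 1; the dictionary and function names are hypothetical. With all five models the total comes to about 22.7 GB, which is above the 20 GB minimum, so plan for extra free space if you want every model.

```python
# Approximate download sizes in GB, copied from the model table.
MODEL_SIZES_GB = {
    "phi-4-mini": 3.0,
    "deepseek-r1-7b": 4.2,
    "mistral-7b-v0.2": 3.6,
    "deepseek-r1-1.5b": 1.3,
    "qwen3-0.6b": 0.6,
}
OPEN_WEBUI_GB = 10.0  # dependency estimate from Section 1


def disk_needed_gb(models, extras_gb=OPEN_WEBUI_GB):
    """Sum the chosen models' sizes plus fixed overhead, rounded to 0.1 GB."""
    return round(sum(MODEL_SIZES_GB[name] for name in models) + extras_gb, 1)


if __name__ == "__main__":
    print("All five models:", disk_needed_gb(MODEL_SIZES_GB), "GB")
    print("Phi-4 Mini only:", disk_needed_gb(["phi-4-mini"]), "GB")
```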

Context Window Limitations — Important Demo Guidance

On-device NPU models use aggressively compressed variants with smaller context windows than their cloud counterparts. This is the key tradeoff of local AI inference:

Model | Approx. Context Window | Practical Impact
Phi-4 Mini (NPU) | ~4,200 tokens total | 4–6 back-and-forth exchanges before context errors. Start a new chat for each demo scenario.
DeepSeek R1 7B (NPU) | ~4,000 tokens total | Burns tokens fast due to verbose reasoning chain. Use as a one-shot single-prompt tool, not a conversation.
Mistral 7B (NPU) | ~4,000 tokens total | Similar to Phi-4 Mini. One topic per chat session.
Demo best practice: One topic = one fresh chat. Click the pencil icon (✎) in Open WebUI or refresh localhost:8501 between scenarios. This is not a limitation to hide — it's a natural talking point: "These models are optimized for focused tasks, not endless threads. For the SMB use cases we're talking about — email rewriting, PII scanning, document summarization — each runs as a discrete task anyway."
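To get a feel for how quickly a conversation approaches the limit, here is a rough budgeting sketch. It assumes about four characters per token, which is a common rule of thumb for English text and not the model's real tokenizer, and it reserves the 380 reply tokens the demo app requests. All names are hypothetical.

```python
CONTEXT_WINDOW = 4200   # approximate total for Phi-4 Mini (NPU), from the table
REPLY_RESERVE = 380     # MAX_TOKENS used by the demo app
CHARS_PER_TOKEN = 4     # rough heuristic, not a real tokenizer


def estimate_tokens(text):
    """Crude token estimate: one token per four characters, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(messages, window=CONTEXT_WINDOW, reserve=REPLY_RESERVE):
    """True if the estimated prompt size still leaves room for the reply."""
    used = sum(estimate_tokens(m["content"]) for m in messages)
    return used + reserve <= window
```

Under these assumptions, roughly 15,000 characters of accumulated chat history is enough to fill the window, which is why a fresh chat per scenario is the safer habit.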

10 · Open WebUI — One-Time Connection Setup

Open WebUI needs to be told where Foundry Local is running. Once configured, this connection persists across restarts. You only need to do this the first time Open WebUI is launched, or if it ever shows "No models available."

Connect Open WebUI to Foundry Local
localhost:3000 → Admin Panel → Settings → Connections
1. Start the services and open Open WebUI
Run start_demo.bat or manually start the services. Navigate to localhost:3000 in your browser. Create a local admin account on first launch (any username/password; nothing goes to the cloud).
2. Open Admin Panel → Settings → Connections
Click your profile icon (bottom left corner) → Admin Panel → Settings tab → Connections in the left sidebar.
3. Add the Foundry Local connection
Under OpenAI API, click the + button next to "Manage OpenAI API Connections". Enter exactly:
URL: http://localhost:57055/v1
API Key: local (any value works, Foundry doesn't check it)
4. Click Save, then return to chat
Go back to the main chat. Click Select a model at the top. Your loaded models will now appear in the dropdown. Select phi-4-mini-instruct-openvino-npu to start.

11 · First Run & Verification

Verify the Full Stack Is Working
Run through this checklist on first setup and before any important demo

Pre-Demo Verification Checklist

Foundry service running: foundry service status shows service on port 57055
If not: run foundry service start
Model loaded: foundry model list --loaded shows phi-4-mini active
If not: run foundry model load phi-4-mini
Keepalive running: Keepalive window shows "Model alive" messages
If not: open a new PS window, activate venv, run python keepalive.py
Demo app accessible: localhost:8501 loads the full dark-themed UI
If not: check that python local_ai_demo.py is running and no errors in that window
Open WebUI accessible: localhost:3000 shows the chat interface
If not: Open WebUI takes ~45 seconds to fully start — wait and refresh
Tool test: Run the PII Detector in the demo app with the sample patient data
Confirms end-to-end NPU inference is working before the demo starts
NPU visualization ready: Task Manager → Performance → NPU tab is open in the background
Run a prompt and confirm the NPU graph spikes to 80–100% during inference
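The three "is it running" items in the checklist can be checked in one pass by probing the ports used in this guide. This sketch only tests whether something accepts a TCP connection on each port; it does not confirm that a model is loaded or that inference works, so the PII Detector test and the NPU graph check still apply. The names SERVICES and port_open are hypothetical.

```python
import socket

SERVICES = {57055: "Foundry Local", 8501: "Demo app", 3000: "Open WebUI"}


def port_open(port, host="127.0.0.1", timeout=1.0):
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port, name in SERVICES.items():
        state = "UP" if port_open(port) else "DOWN"
        print(f"{name:14} localhost:{port}  {state}")
```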

12 · Running Your Demo

Starting Up

Option A (Recommended): Double-click start_demo.bat. It handles everything and opens both browser tabs automatically. Wait ~45 seconds for Open WebUI to finish starting before presenting localhost:3000.

Option B (Manual): Open three PowerShell windows as Administrator and run the following in each:

Window 1 — Engine + Keepalive
cd C:\LocalAI
venv\Scripts\activate
foundry service start
foundry model load phi-4-mini
python keepalive.py
Window 2 — Demo App (localhost:8501)
cd C:\LocalAI
venv\Scripts\activate
python local_ai_demo.py
Window 3 — Open WebUI (localhost:3000)
cd C:\LocalAI
venv\Scripts\activate
open-webui serve --port 3000

Switching Models Mid-Demo

PowerShell — Run in Window 1
# Unload current model first, then load new one
foundry model unload phi-4-mini
foundry model load deepseek-r1-7b

After switching, update the model selector in the demo app sidebar. In Open WebUI, click the model name at the top of the chat and select the new model from the dropdown.
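If you switch models often, the unload-then-load sequence can be wrapped in a small helper. This is a sketch that simply shells out to the two foundry commands shown above; the function names are hypothetical, and the run parameter exists only so the sequence can be inspected without invoking the CLI.

```python
import subprocess


def switch_commands(current, new):
    """Return the two CLI invocations from this guide, in the order they must run."""
    return [
        ["foundry", "model", "unload", current],
        ["foundry", "model", "load", new],
    ]


def switch_model(current, new, run=subprocess.run):
    """Unload the current model, then load the new one; stop at the first failure."""
    for cmd in switch_commands(current, new):
        run(cmd, check=True)


if __name__ == "__main__":
    for cmd in switch_commands("phi-4-mini", "deepseek-r1-7b"):
        print(" ".join(cmd))
```

Note that keepalive.py pings a fixed model ID, so after a switch it will keep targeting the original model unless you also update its MODEL value.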

Shutting Down

Double-click stop_demo.bat — it stops Foundry, kills all Python processes, and frees all memory. Then close any remaining PowerShell windows.


13 · Troubleshooting

Problem | Cause | Fix
"python was not found" | App Execution Alias not disabled | Go to App execution aliases → toggle OFF both python.exe entries. Close and reopen PowerShell.
(venv) not showing in prompt | Virtual environment not activated | Run venv\Scripts\activate in your project folder. Every new PowerShell window needs this.
"No models available" in Open WebUI | Connection to Foundry not configured | Admin Panel → Settings → Connections → add http://localhost:57055/v1 with key local
Model very slow / timing out | Idle timeout triggered, model was unloaded | Ensure keepalive.py is running. Run foundry service restart then reload the model.
TransferEncodingError / Infer Request busy | Context window full or model overloaded | Start a new chat. Keep one topic per session. Restart Foundry if repeated: foundry service restart
Port conflict on 57055 | Previous session left service running | Run foundry service stop then foundry service start
Open WebUI blank at localhost:3000 | Not fully started yet | Wait for "Uvicorn running on 0.0.0.0:3000" in the PowerShell window. Takes ~45 seconds.
Batch file shows garbage characters | File saved with wrong encoding (UTF-16) | Open in Notepad, delete everything, paste from the code blocks above, save as ANSI encoding. Or use the files from the ZIP.
NPU not showing in Task Manager | NPU driver not installed or not on 24H2 | Confirm OS is Windows 11 24H2+. Install latest Intel NPU driver. Check Device Manager for Intel AI Boost.
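For the batch file encoding problem, you can check a file for a UTF-16 byte-order mark and convert it rather than retyping it. This is a sketch under the assumption that the file was saved as UTF-16 with a BOM, which is what Notepad writes for that encoding; cp1252 stands in for ANSI, and the function names are hypothetical.

```python
import codecs


def looks_utf16(data):
    """True if the bytes start with a UTF-16 byte-order mark."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


def to_ansi(data):
    """Re-encode UTF-16 batch file bytes as cp1252; leave other files untouched."""
    if not looks_utf16(data):
        return data
    return data.decode("utf-16").encode("cp1252")
```

To apply it, read the file with open(path, "rb"), pass the bytes through to_ansi, and write the result back with open(path, "wb").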

Key Commands Reference

Foundry Local — Quick Reference Commands
# Service management
foundry service start              # Start the AI engine
foundry service stop               # Stop and free memory
foundry service restart            # Reset if something is stuck
foundry service status             # Check port and running state
foundry service set --port 57055   # Pin to fixed port (run once)

# Model management
foundry model list                 # Show full catalog with device variants
foundry model load phi-4-mini      # Load a model (auto-selects best variant)
foundry model unload phi-4-mini    # Unload to free memory for another model
foundry model download phi-4-mini  # Pre-download without loading

# Cache management
foundry cache remove               # Delete cached models to free disk space