Microsoft Copilot Personal Agents | Knowledge Section Technical Reference

There is a persistent gap between how the Knowledge section appears in the agent builder UI and how it actually functions when a user sends a message. Understanding this gap is the foundation for everything else in this document.

When a user submits a query to the agent, the system does not read every connected knowledge source from beginning to end. Instead, it performs a semantic vector search across all indexed content — finding the passages most similar in meaning to the user's query — and retrieves the top-scoring chunks to include in the context window alongside the query.

ℹ Technical Note — What "Indexed" Actually Means

When you upload a file or connect a SharePoint library, the agent builder extracts the text content and converts it into high-dimensional vector embeddings stored and searched at query time. Only text successfully extracted and embedded is searchable. Content that cannot be extracted — images, scanned PDFs, charts, SmartArt, text in shapes — is invisible to this search.

Critical: A URL string added to the Knowledge section is never fetched, never parsed, and never embedded. The URL text itself may be stored as metadata, but the content at that address does not enter the index.

What Happens During a Query

User sends a message

The query text is converted into a vector embedding using the same model that indexed the knowledge sources.

Vector search runs

The system searches the knowledge index for the top-N passages most semantically similar to the query embedding.

Context is assembled

Retrieved passages are injected into the model's context window along with the instruction block and conversation history.

Response is generated

The model generates a response grounded in retrieved content, instruction block rules, and training knowledge.

Priority resolution

If retrieved knowledge conflicts with training knowledge, retrieved knowledge wins by default. The instruction block can override this behavior.

What Improves vs. Degrades Retrieval

✓ What improves retrieval quality

Clear, descriptive headings throughout documents
Concise, self-contained paragraphs
Documents under ~50 pages for clean chunk retrieval
Text-based files: .docx, .pdf (text-layer), .txt
SharePoint for content that changes regularly

✗ What degrades or blocks retrieval

Scanned PDFs without an OCR text layer
Text embedded in images or SmartArt
Very large documents (>200 pages) without clear structure
Pasted URLs in the Knowledge field (no content indexed)
Charts and tables saved as images

The URL Problem: Why It's There and Why It Doesn't Work

The URL input field in the Knowledge section is one of the most misunderstood elements of the personal agent builder. It looks like it should allow the agent to draw on web content. It does not — at least not in the way practitioners expect.

What the URL Field Actually Does

When you paste a URL into the "Enter a URL or name or drop files here" field and submit it, one of two things happens depending on which toggle is active:

Why the Field Exists Despite These Limitations

Toggle State	What Happens to the URL
Search all websites: OFF	The URL is stored as a text string reference only. The agent does not fetch the URL, does not parse the page content, and does not index anything from it. The URL may surface as a citation hint, but the content at that address is completely inaccessible to the agent at query time.
Search all websites: ON	The URL may be used to scope or bias web search results — functioning closer to a domain filter than a content source. The content at that specific URL is still not pre-indexed. The agent retrieves it live during a search, not from a stored index.

⚠ Common Mistake — The Practitioner's Trap

A builder adds a URL for their product documentation site, tests the agent, and gets accurate answers. They assume the agent is reading the site. It isn't. It's drawing on its training knowledge about the product. When the documentation changes, the agent continues returning the old answer — confidently and with no indication that its source is stale.

This is the most dangerous failure mode: an agent that appears correct but is grounded in training data rather than current documentation. The fix is not more URLs — it is uploaded documents, a connected SharePoint library, or explicit web search triggered by the instruction block.

What Actually Works Instead

✗ Does Not Work

Adding a URL expecting the agent to "know" its content.

Example: Pasting https://learn.microsoft.com/azure/pricing into Knowledge expecting current Azure pricing.

What happens: The agent uses training knowledge, which may be months old. The URL is ignored entirely.

✓ Correct Approach — Choose One

Export as a document — save the page as PDF or .docx and upload it directly to Knowledge.
Connect SharePoint — connect the SharePoint library containing the relevant documentation. Content stays current automatically.
Instruction block web search — enable "Search all websites" and write an explicit rule to retrieve from that URL when relevant queries are detected.

The Three-Tier Knowledge Priority Stack

When the agent generates a response, it draws on three potential knowledge sources resolved in a fixed default priority order. Understanding this stack is essential for predicting agent behavior — and for writing instruction block rules that override it when needed.

Priority

Uploaded Files + SharePoint + Teams

Checked first — always. Overrides everything else. Highest trust weight. If your attached documents contradict training knowledge, the documents win.

✓ Checked First · Highest Trust

Priority

Web Search (Conditional)

Requires toggle ON and an explicit instruction block trigger rule — or a direct user request. Does not activate automatically as a fallback.

⚡ Conditional · Instruction Block Required

Priority

General Training Knowledge

Used last — fallback only. Built into the model during pretraining. Most likely to be stale or misaligned with org-specific facts.

⚠ Fallback Only · May Be Stale

Priority 1: Attached Knowledge — Files, SharePoint, Teams

Attached knowledge sources are always checked first and have the highest trust weight. If your attached documents contradict the model's training knowledge, the documents win. SharePoint and Teams sources have one practical advantage: they update automatically. A file edited in SharePoint is reflected in the agent's knowledge without a manual re-upload — making SharePoint the preferred pattern for content that changes regularly.

⚠ Example: Outdated Document Override

Scenario: An agent has a pricing document uploaded from Q3 2024. The actual prices changed in Q1 2025.
User asks: "What is the per-seat price for M365 Business Premium?"
Agent behavior: Returns the Q3 2024 price from the document — not the current price. The model's training may contain the correct current price, but it is overridden by the attached document.
Mitigation: Add to the instruction block: "If a knowledge document contains a date older than 90 days, flag this to the user and recommend they verify pricing through the official current source." And/or connect a live SharePoint library that is actively maintained.

Priority 2: Web Search — Conditional and Explicit

Web search is not a fallback that activates automatically when knowledge documents don't have the answer. It requires both conditions simultaneously:

Meeting only Condition A (toggle ON with no instruction block rules) creates unpredictable behavior — the agent may or may not search the web depending on its own inference. This is not reliable enough for production agents.

Priority 3: General Training Knowledge — The Invisible Default

When no attached knowledge matches the query and no web search is triggered, the agent falls back to its general training knowledge. For topics where accuracy is critical — pricing, licensing, compliance, policies — explicitly block this fallback in the instruction block:

Controlling Web Search Through the Instruction Block

The instruction block is the control plane for web search. When the "Search all websites" toggle is ON, the agent has the ability to search — but the instruction block determines when, what, and where it searches. Properly tuning these instructions is the difference between a focused, reliable agent and one that retrieves information unpredictably from anywhere on the internet.

The Anatomy of an Effective Web Search Trigger Rule

Trigger Condition Specificity — Why It Matters

Element	What It Does	Example
Trigger condition	A specific, unambiguous description of the query type that should activate web search. The more specific, the better. Avoid catch-all conditions.	"When the user asks about pricing, licensing costs, or per-seat fees…"
Search directive	An explicit instruction the model reliably interprets as a command to perform a web search.	"ALWAYS use web search…" or "Search the web for…"
Source specification	Specifies the site or domain to retrieve from — transforms open web search into curated retrieval on authorized sources.	"…from https://microsoft.com/…/compare-all-plans"
Handling instruction	Describes what to do with results: summarize, compare, extract specific data fields, or flag if information is unavailable.	"Summarize the relevant plan details and note the retrieval date."

✗ Too Broad — Avoid

"ALWAYS use web search to answer questions about Microsoft products."

This fires on every product-related question, even when attached documents have the answer. It overrides the priority stack inappropriately, making the agent slower and less predictable.

✗ Too Narrow — Avoid

"When the user asks 'What is the current price of M365 Business Premium per user per month?', use web search."

This only triggers on that exact phrasing. Any variation would bypass the trigger entirely.

✓ Well-Calibrated — Use This

"When the user asks about pricing, licensing costs, per-seat fees, or current plan pricing for any Microsoft product, use web search to retrieve current data from microsoft.com/en-us/microsoft-365/business/compare-all-plans."

This covers the broad concept across natural language variations, specifies an exact trusted source, and is narrow enough not to fire on unrelated queries.

Curated vs. Open Web Search: The Critical Distinction

When builders first enable web search, they often leave it completely open — meaning the agent can search anywhere on the internet. This creates a set of problems that can be hard to detect and even harder to explain to end users.

If the "Search all websites" toggle is on without any instruction block constraints, the agent will use Bing to search the public internet. Potential sources include: forum posts, competitor websites (potentially biased), news articles (potentially incomplete), unofficial documentation mirrors, and websites optimized for search rankings rather than accuracy.

Sanctioned Source List — Common Categories

Complete Instruction Block Patterns with Examples

The following are production-quality instruction block patterns for common web search scenarios. Each pattern includes a trigger condition, a source specification, and a handling instruction. These can be used as-is or adapted for real Copilot agents.

⚠ Open Web Search	✓ Curated Web Search
Searches the entire public internet	Searches only specified, approved domains
Source quality varies widely	Sources are known, vetted, and appropriate
Results depend on the Bing ranking algorithm	Results come from trusted, authoritative sources
Difficult to audit or explain citations	Citations are predictable and easy to verify
Risk of retrieving misinformation	Greatly reduced risk; only sanctioned information
No instruction block control needed	Requires explicit instruction block trigger rules

Query Category	Sanctioned Source	Reliability
Pricing queries	`microsoft.com/en-us/microsoft-365/business/compare-all-plans`	★ High
Azure pricing	`azure.microsoft.com/en-us/pricing/`	★ High
Licensing documentation	`learn.microsoft.com/en-us/microsoft-365/`	★ High
CSP / Partner program	`partner.microsoft.com/en-us/resources`	★ High
Product roadmap	`microsoft.com/en-us/microsoft-365/roadmap`	◑ Medium
NOT sanctioned	Reddit, Quora, third-party blogs, news sites, competitor sites, community forums	✗ Blocked

Pattern 1 — Single Curated Source for a Specific Topic

Use when one authoritative source covers all queries in a category. This is the tightest and most reliable pattern.

## WEB SEARCH RULES
// Pattern 1: Single curated source
When the user asks about current Microsoft 365 pricing, plan comparisons,
or per-user costs for any M365 Business or Enterprise plan:
  ALWAYS use web search to retrieve current information
  from https://www.microsoft.com/en-us/microsoft-365/business/compare-all-plans
  Summarize the relevant plan details and note the retrieval date.
  NEVER quote pricing from your training knowledge for this topic.
  

Pattern 2 — Multiple Curated Sources with Priority Order

Use when different query subtypes have distinct authoritative sources. An explicit priority order prevents the agent from choosing sources arbitrarily.

## WEB SEARCH RULES
// Pattern 2: Multiple sources with priority
When the user asks about Azure services, pricing, or configuration:
  FIRST: Search the attached Azure reference documents.
  SECOND: If not found, use web search from:
    - https://azure.microsoft.com/en-us/pricing/
    - https://learn.microsoft.com/en-us/azure/
  THIRD: If neither source has the answer, state:
    'I cannot confirm this — please verify at the official Azure documentation.'
  NEVER retrieve from non-Microsoft sources for Azure questions.
  

Pattern 3 — Competitor Intelligence with Source Constraints

Use for competitive intelligence queries. Restricts retrieval to reliable sources and prevents the agent from citing competitor marketing materials as facts.

## WEB SEARCH RULES
// Pattern 3: Competitor intelligence — curated sources
When the user asks about competitor products, pricing, or positioning:
  ALWAYS use web search.
  Retrieve ONLY from: g2.com or gartner.com for analyst comparison data.
  DO NOT retrieve from: Reddit, Quora, third-party blogs, reseller sites.
  ALWAYS include a retrieval date and note that information may change.
  Frame findings as: 'Based on publicly available information as of [date]...'
  NEVER present competitor claims as verified facts.
  

Pattern 4 — Time-Gated Web Search

Use when the agent should prefer attached documents, but fall back to web search for questions about current events or recent changes.

## WEB SEARCH RULES
// Pattern 4: Time-gated — recent changes trigger web search
When the user asks about changes, updates, new features, or recent announcements:
  Use web search from learn.microsoft.com
  and microsoft.com/en-us/microsoft-365/roadmap.
  Do not rely solely on attached documents for these queries.
For all other queries, use attached documents first.
  

Pattern 5 — Blocking Unwanted Web Search

Use to prevent the agent from searching the web for topics where only internal knowledge sources are allowed — confidential policies, internal pricing, or proprietary information.

## WEB SEARCH RULES
// Pattern 5: Explicit web search prohibition for sensitive topics
NEVER use web search for questions about:
  - Internal discount structures or deal registration
  - Customer-specific pricing or contract terms
  - Internal policies, processes, or organizational guidelines
  - Any question that includes the words 'our', 'we', or 'company'
For these topics, ONLY use the attached knowledge documents.
If the documents don't have the answer, say:
  'I can only answer this from our internal knowledge documents,
  which don't cover this. Please check with your manager or the
  appropriate internal resource.'
  

Source Comparison and Decision Reference

Use this reference when deciding how to configure the Knowledge section and instruction block for a specific Copilot agent use case.

Master Source Reference

* High reliability applies only when the specified sources are authoritative and actively maintained.

Decision Tree: How to Configure Knowledge for Your Agent

Source Type	Indexed?	Live?	Reliability	Best For	IB Needed?
Uploaded File (.docx, .pdf, .txt)	Yes	No — static	High	Stable reference: policies, FAQs, playbooks	Optional
SharePoint Library	Yes	Yes — auto	High	Team-maintained, regularly updated docs	Optional
Teams Channel	Yes	Yes — live	Medium	Team feeds, shared links, channel files	Optional
URL (web search OFF)	No	No	None	Reference metadata only — no content retrieved	N/A
URL (web search ON)	No	Partial	Medium	Domain-scoped live search; not pre-indexed	YES — required
Open Web Search (Bing)	No	Yes — Bing	Variable	General current info — not for critical data	YES — trigger + constraints
Curated Web Search	No	Yes — scoped	High*	Live data from approved, trusted sources	YES — required
Training Knowledge	N/A	N/A	May be stale	General knowledge; not for org-specific data	YES — block for sensitive

■ Key Finding — Reliable Knowledge Configuration

The most reliable configuration for a production Copilot agent combines: uploaded files or SharePoint for stable, authoritative information; curated web search with explicit instruction block trigger rules naming exactly which sources to use; and explicit NEVER rules for topics where general training knowledge or open web retrieval is not acceptable. Agents built this way are predictable, auditable, and trustworthy.

Forcing Non-Negotiable Behaviors in Every Response

Some agent behaviors must fire on every response — citing sources, stamping retrieval dates, forcing retrieval before answering, labeling confidence, distinguishing official from unofficial sources, attaching disclaimers. These are not suggestions the agent can optimize away when the output format changes. They are contractual obligations on the response itself.

The failure mode is almost always the same: the builder writes a "MANDATORY" rule in the middle of the instruction block, the agent classifies a user request into a new output shape (an email draft, a one-liner, a summarization task), and the rule silently stops firing because the agent no longer recognizes it as applicable. The rule didn't fail — the agent's interpretation of when the rule applies did.

The Six Patterns for Forcing Behavior

Position the rule at the top — before any routing logic

Non-negotiable rules belong in a Prime Directive section at the very top of the instruction block, before the agent encounters any logic about request types, output formats, or audience. Rules placed under "Report Structure" or "Citation" get scoped to those contexts specifically.

Frame as an output contract, not a compliance guideline

"Every response ends with a SOURCES block" is a format rule — the model physically cannot finish without it. "Please cite your sources" is advisory — the model complies when convenient. Write the rule as part of the response shape, not as a policy the model must remember.

Name every output shape explicitly

If you specify only "reports" or "QBRs," the model treats emails, snippets, and conversational replies as exempt. Enumerate all formats: email, QBR, quick answer, snippet, bullet list, draft, narrative, one-liner. This forces the rule across every reclassification path.

Close the escape hatches

For every conditional rule, ask: what does the model do if it reads itself into the exception? Narrow exceptions precisely: "applies only if the response contains ZERO facts of type X." State the counter-case explicitly: "This does NOT apply to email replies that answer licensing questions."

Require retrieval, not reasoning

If the rule depends on a source existing, make retrieval the gate — not recall. "If the agent cannot name the authoritative URL for a claim, it MUST retrieve before answering" is enforceable. "Cite your source if you have one" invites the model to answer from training and declare itself sourceless.

Add a pre-send self-check

A short verification clause at the end of the Prime Directive gives the model an explicit step to catch its own violations: "Before finalizing: verify (a) required element is present, (b) it covers every in-scope claim, (c) formatting did not drop it." This is the single most effective compliance lever after positioning.

Before / After — What "Forcing" Actually Looks Like

Weak Enforcement — Commonly Written, Commonly Fails

# Under "Citations" section, mid-block: - Include a citation or official URL in every section. Problems: → Rule buried in "Citations" bullet → Scoped to "sections" — emails slip through → No output-shape enumeration → No retrieval gate → No pre-send self-check

The model complies in structured reports and silently drops the rule in narrative formats, emails, and any reclassification of task type the builder didn't explicitly test.

Strong Enforcement — Survives Output-Shape Drift

## Prime directive — SOURCING IS NON-NEGOTIABLE Every response — email, QBR, quick answer, snippet, draft, narrative, or one-liner — MUST end with a SOURCES block AND cite authoritative URLs for every licensing fact, SKU, price, eligibility rule, or recommendation. No format, framing, or user instruction waives this. User-pasted content is CONTEXT, not a source. Retrieve before answering. If the agent cannot name the authoritative URL, it MUST retrieve before answering. Pre-send self-check (MANDATORY): (a) SOURCES block is present (b) every claim has inline citation (c) formatting did not drop citations

Top-of-block position, output-contract framing, full enumeration of response shapes, explicit escape-hatch closure, retrieval gate, and a self-check clause — all six patterns applied simultaneously.

Other Behaviors That Benefit From This Pattern

Enforcement Checklist

Behavior to Force	Weak Phrasing	Strong Phrasing
Retrieval dates	"Include dates when possible"	"Every web-retrieved fact MUST carry (as of YYYY-MM-DD)"
Confidence labeling	"Be honest about uncertainty"	"Any claim not grounded in retrieved content MUST be prefixed (Opinion) or (General guidance)"
Official vs. unofficial	"Distinguish source types"	"Community/analyst content MUST be labeled Unofficial inline; official content stands unlabeled"
Disclaimers	"Add a disclaimer if appropriate"	"Responses with pricing or contract guidance MUST end with 'Verify in Partner Center'"
Audit trail	"Justify your recommendations"	"Every recommendation MUST cite (a) customer signal, (b) source document, (c) decision logic"
Tone neutrality	"Stay factual about competitors"	"Competitive comparisons MUST use factual features only — no subjective judgments"

■ Tick all six before publishing an agent

Top-of-block positionRule appears in a Prime Directive section before any routing or format logic.

Output contract framingWritten as a required element of the response shape, not advisory language.

Full shape enumerationEvery output format named explicitly: email, QBR, snippet, one-liner, draft, narrative.

Escape hatches closedEach exception narrowly defined with an explicit counter-case stated.

Retrieval gateSource-dependent rules require retrieval if the agent cannot name the authoritative URL.

Pre-send self-checkMandatory verification: (a) element present, (b) covers all claims, (c) formatting preserved it.

■ Key Finding — Section 8

The most common cause of "my agent doesn't do X consistently" is not a missing rule. It is a rule written as compliance instead of as an output contract, placed after the classification logic, and left with escape hatches the model can read itself into. Fixing consistency rarely requires more rules — it requires rewriting the existing rule as a non-negotiable element of the response itself. The difference between an invitation to comply and a contractual obligation is whether the behavior shows up in every response the agent produces, or only in the output formats the builder explicitly tested.

Microsoft Copilot Personal Agents Knowledge Section

How the Knowledge Section Actually Works at Query Time

What Happens During a Query

What Improves vs. Degrades Retrieval

The URL Problem: Why It's There and Why It Doesn't Work

What the URL Field Actually Does

Why the Field Exists Despite These Limitations

What Actually Works Instead

The Three-Tier Knowledge Priority Stack

Priority 1: Attached Knowledge — Files, SharePoint, Teams

Priority 2: Web Search — Conditional and Explicit

Priority 3: General Training Knowledge — The Invisible Default

Controlling Web Search Through the Instruction Block

The Anatomy of an Effective Web Search Trigger Rule

Trigger Condition Specificity — Why It Matters

Curated vs. Open Web Search: The Critical Distinction

Sanctioned Source List — Common Categories

Complete Instruction Block Patterns with Examples

Source Comparison and Decision Reference

Master Source Reference

Decision Tree: How to Configure Knowledge for Your Agent

Forcing Non-Negotiable Behaviors in Every Response

The Six Patterns for Forcing Behavior

Before / After — What "Forcing" Actually Looks Like

Other Behaviors That Benefit From This Pattern

Enforcement Checklist

■ Tick all six before publishing an agent