Section 1

How the Knowledge Section Actually Works at Query Time

There is a persistent gap between how the Knowledge section appears in the agent builder UI and how it actually functions when a user sends a message. Understanding this gap is the foundation for everything else in this document.

When a user submits a query to the agent, the system does not read every connected knowledge source from beginning to end. Instead, it performs a semantic vector search across all indexed content — finding the passages most similar in meaning to the user's query — and retrieves the top-scoring chunks to include in the context window alongside the query.

ℹ Technical Note — What "Indexed" Actually Means
When you upload a file or connect a SharePoint library, the agent builder extracts the text content and converts it into high-dimensional vector embeddings stored and searched at query time. Only text successfully extracted and embedded is searchable. Content that cannot be extracted — images, scanned PDFs, charts, SmartArt, text in shapes — is invisible to this search.

Critical: A URL string added to the Knowledge section is never fetched, never parsed, and never embedded. The URL text itself may be stored as metadata, but the content at that address does not enter the index.
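For intuition, here is what that indexing step looks like in miniature. This is an illustrative Python sketch, not the agent builder's actual pipeline (which is not exposed): the sentence-transformers model stands in for whatever embedding model the product uses, and fixed-size chunking is an assumption.

from sentence_transformers import SentenceTransformer

# Stand-in embedding model — the real model and chunking strategy are internal.
model = SentenceTransformer("all-MiniLM-L6-v2")

def index_document(extracted_text: str, chunk_size: int = 500) -> list[dict]:
    """Split extracted text into chunks and embed each one.
    Only text that reaches this function is searchable: images, scanned
    pages, charts, and SmartArt never yield extracted_text, so they
    never enter the index."""
    chunks = [extracted_text[i:i + chunk_size]
              for i in range(0, len(extracted_text), chunk_size)]
    return [{"text": c, "embedding": model.encode(c)} for c in chunks]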

What Happens During a Query

1. User sends a message
The query text is converted into a vector embedding using the same model that indexed the knowledge sources.

2. Vector search runs
The system searches the knowledge index for the top-N passages most semantically similar to the query embedding.

3. Context is assembled
Retrieved passages are injected into the model's context window along with the instruction block and conversation history.

4. Response is generated
The model generates a response grounded in retrieved content, instruction block rules, and training knowledge.

5. Priority resolution
If retrieved knowledge conflicts with training knowledge, retrieved knowledge wins by default. The instruction block can override this behavior.
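Steps 1–3 reduce to a few lines of code. The following Python sketch shows the mechanics under simplifying assumptions: a brute-force cosine-similarity scan rather than the approximate nearest-neighbor index a production system would use, and the same stand-in embedding model as above.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the model used at indexing time

def retrieve_top_n(query: str, index: list[dict], top_n: int = 5) -> list[str]:
    """Steps 1-2: embed the query, then rank indexed chunks by cosine similarity."""
    q = model.encode(query)
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(index, key=lambda c: cosine(q, c["embedding"]), reverse=True)
    return [c["text"] for c in ranked[:top_n]]

def assemble_context(instruction_block: str, passages: list[str], history: str) -> str:
    """Step 3: retrieved passages join the instructions and conversation history."""
    return "\n\n".join([instruction_block, *passages, history])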

What Improves vs. Degrades Retrieval

✓ What improves retrieval quality
  • Clear, descriptive headings throughout documents
  • Concise, self-contained paragraphs
  • Documents under ~50 pages for clean chunk retrieval
  • Text-based files: .docx, .pdf (text-layer), .txt
  • SharePoint for content that changes regularly
✗ What degrades or blocks retrieval
  • Scanned PDFs without an OCR text layer
  • Text embedded in images or SmartArt
  • Very large documents (>200 pages) without clear structure
  • Pasted URLs in the Knowledge field (no content indexed)
  • Charts and tables saved as images

Section 2

The URL Problem: Why It's There and Why It Doesn't Work

The URL input field in the Knowledge section is one of the most misunderstood elements of the personal agent builder. It looks like it should allow the agent to draw on web content. It does not — at least not in the way practitioners expect.

What the URL Field Actually Does

When you paste a URL into the "Enter a URL or name or drop files here" field and submit it, one of two things happens depending on which toggle is active:

• "Search all websites" OFF: The URL is stored as a text string reference only. The agent does not fetch the URL, does not parse the page content, and does not index anything from it. The URL may surface as a citation hint, but the content at that address is completely inaccessible to the agent at query time.

• "Search all websites" ON: The URL may be used to scope or bias web search results — functioning closer to a domain filter than a content source. The content at that specific URL is still not pre-indexed. The agent retrieves it live during a search, not from a stored index.

Why the Field Exists Despite These Limitations

⚠ Common Mistake — The Practitioner's Trap
A builder adds a URL for their product documentation site, tests the agent, and gets accurate answers. They assume the agent is reading the site. It isn't. It's drawing on its training knowledge about the product. When the documentation changes, the agent continues returning the old answer — confidently and with no indication that its source is stale.

This is the most dangerous failure mode: an agent that appears correct but is grounded in training data rather than current documentation. The fix is not more URLs — it is uploaded documents, a connected SharePoint library, or explicit web search triggered by the instruction block.

What Actually Works Instead

✗ Does Not Work
Adding a URL expecting the agent to "know" its content.

Example: Pasting https://learn.microsoft.com/azure/pricing into Knowledge expecting current Azure pricing.

What happens: The agent uses training knowledge, which may be months old. The URL is ignored entirely.
✓ Correct Approach — Choose One
  1. Export as a document — save the page as PDF or .docx and upload it directly to Knowledge.
  2. Connect SharePoint — connect the SharePoint library containing the relevant documentation. Content stays current automatically.
  3. Instruction block web search — enable "Search all websites" and write an explicit rule to retrieve from that URL when relevant queries are detected.

Section 3

The Three-Tier Knowledge Priority Stack

When the agent generates a response, it draws on three potential knowledge sources resolved in a fixed default priority order. Understanding this stack is essential for predicting agent behavior — and for writing instruction block rules that override it when needed.

Priority 1: Uploaded Files + SharePoint + Teams
Checked first — always. Overrides everything else. Highest trust weight. If your attached documents contradict training knowledge, the documents win.
✓ Checked First · Highest Trust

Priority 2: Web Search (Conditional)
Requires toggle ON and an explicit instruction block trigger rule — or a direct user request. Does not activate automatically as a fallback.
⚡ Conditional · Instruction Block Required

Priority 3: General Training Knowledge
Used last — fallback only. Built into the model during pretraining. Most likely to be stale or misaligned with org-specific facts.
⚠ Fallback Only · May Be Stale
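Expressed as control flow, the default stack behaves roughly like the sketch below. This is hypothetical Python: the boolean flags are assumptions standing in for the toggle and trigger state, since the product exposes no such API.

def resolve_source(attached_hits: list[str],
                   web_toggle_on: bool,
                   trigger_rule_fired: bool) -> str:
    """Default priority resolution; the instruction block can override it."""
    if attached_hits:                          # Priority 1: always checked first
        return "attached_knowledge"
    if web_toggle_on and trigger_rule_fired:   # Priority 2: BOTH conditions required
        return "web_search"
    return "training_knowledge"                # Priority 3: silent fallback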

Priority 1: Attached Knowledge — Files, SharePoint, Teams

Attached knowledge sources are always checked first and have the highest trust weight. If your attached documents contradict the model's training knowledge, the documents win. SharePoint and Teams sources have one practical advantage: they update automatically. A file edited in SharePoint is reflected in the agent's knowledge without a manual re-upload — making SharePoint the preferred pattern for content that changes regularly.

⚠ Example: Outdated Document Override
Scenario: An agent has a pricing document uploaded from Q3 2024. The actual prices changed in Q1 2025.
User asks: "What is the per-seat price for M365 Business Premium?"
Agent behavior: Returns the Q3 2024 price from the document — not the current price. The model's training may contain the correct current price, but it is overridden by the attached document.
Mitigation: Add to the instruction block: "If a knowledge document contains a date older than 90 days, flag this to the user and recommend they verify pricing through the official current source." And/or connect a live SharePoint library that is actively maintained.

Priority 2: Web Search — Conditional and Explicit

Web search is not a fallback that activates automatically when knowledge documents don't have the answer. It requires both conditions simultaneously:

  • Condition A: the "Search all websites" toggle is ON.
  • Condition B: an explicit instruction block trigger rule fires, or the user directly requests a web search.

Meeting only Condition A (toggle ON with no instruction block rules) creates unpredictable behavior — the agent may or may not search the web depending on its own inference. This is not reliable enough for production agents.

■ Key Finding
The toggle enables the capability. The instruction block controls when and how it fires. For any agent where web search behavior matters, both must be properly configured.

Priority 3: General Training Knowledge — The Invisible Default

When no attached knowledge matches the query and no web search is triggered, the agent falls back to its general training knowledge. For topics where accuracy is critical — pricing, licensing, compliance, policies — explicitly block this fallback in the instruction block:

// Prevent training knowledge fallback for sensitive topics
NEVER answer questions about pricing, licensing, or discount structures using general knowledge.
If the attached knowledge documents do not contain the answer, respond:
'I don't have that in my current knowledge documents — please verify with the latest official source.'

Section 4

Controlling Web Search Through the Instruction Block

The instruction block is the control plane for web search. When the "Search all websites" toggle is ON, the agent has the ability to search — but the instruction block determines when, what, and where it searches. Properly tuning these instructions is the difference between a focused, reliable agent and one that retrieves information unpredictably from anywhere on the internet.

The Anatomy of an Effective Web Search Trigger Rule

• Trigger condition
  What it does: A specific, unambiguous description of the query type that should activate web search. The more specific, the better; avoid catch-all conditions.
  Example: "When the user asks about pricing, licensing costs, or per-seat fees…"

• Search directive
  What it does: An explicit instruction the model reliably interprets as a command to perform a web search.
  Example: "ALWAYS use web search…" or "Search the web for…"

• Source specification
  What it does: Specifies the site or domain to retrieve from — transforms open web search into curated retrieval on authorized sources.
  Example: "…from https://microsoft.com/…/compare-all-plans"

• Handling instruction
  What it does: Describes what to do with results: summarize, compare, extract specific data fields, or flag if information is unavailable.
  Example: "Summarize the relevant plan details and note the retrieval date."
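The four elements compose into a single rule. The small template helper below makes the composition explicit; it is purely illustrative, and the function and its parameters are hypothetical rather than part of any Copilot API.

def build_trigger_rule(condition: str, directive: str,
                       source: str, handling: str) -> str:
    """Compose the four elements into one instruction block rule."""
    return f"{condition} {directive} from {source}. {handling}"

rule = build_trigger_rule(
    condition="When the user asks about pricing, licensing costs, or per-seat fees,",
    directive="ALWAYS use web search to retrieve current data",
    source="https://www.microsoft.com/en-us/microsoft-365/business/compare-all-plans",
    handling="Summarize the relevant plan details and note the retrieval date.",
)
print(rule)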

Trigger Condition Specificity — Why It Matters

✗ Too Broad — Avoid
"ALWAYS use web search to answer questions about Microsoft products."

This fires on every product-related question, even when attached documents have the answer. It overrides the priority stack inappropriately, making the agent slower and less predictable.
✗ Too Narrow — Avoid
"When the user asks 'What is the current price of M365 Business Premium per user per month?', use web search."

This only triggers on that exact phrasing. Any variation would bypass the trigger entirely.
✓ Well-Calibrated — Use This
"When the user asks about pricing, licensing costs, per-seat fees, or current plan pricing for any Microsoft product, use web search to retrieve current data from microsoft.com/en-us/microsoft-365/business/compare-all-plans."

This covers the broad concept across natural language variations, specifies an exact trusted source, and is narrow enough not to fire on unrelated queries.

Section 5

Curated vs. Open Web Search: The Critical Distinction

When builders first enable web search, they often leave it completely open — meaning the agent can search anywhere on the internet. This creates a set of problems that can be hard to detect and even harder to explain to end users.

If the "Search all websites" toggle is on without any instruction block constraints, the agent will use Bing to search the public internet. Potential sources include: forum posts, competitor websites (potentially biased), news articles (potentially incomplete), unofficial documentation mirrors, and websites optimized for search rankings rather than accuracy.

⚠ Open Web Search
  • Searches the entire public internet
  • Source quality varies widely
  • Results depend on the Bing ranking algorithm
  • Difficult to audit or explain citations
  • Risk of retrieving misinformation
  • No instruction block control needed

✓ Curated Web Search
  • Searches only specified, approved domains
  • Sources are known, vetted, and appropriate
  • Results come from trusted, authoritative sources
  • Citations are predictable and easy to verify
  • Greatly reduced risk; only sanctioned information
  • Requires explicit instruction block trigger rules

Sanctioned Source List — Common Categories

• Pricing queries: microsoft.com/en-us/microsoft-365/business/compare-all-plans (★ High)
• Azure pricing: azure.microsoft.com/en-us/pricing/ (★ High)
• Licensing documentation: learn.microsoft.com/en-us/microsoft-365/ (★ High)
• CSP / Partner program: partner.microsoft.com/en-us/resources (★ High)
• Product roadmap: microsoft.com/en-us/microsoft-365/roadmap (◑ Medium)
• NOT sanctioned: Reddit, Quora, third-party blogs, news sites, competitor sites, community forums (✗ Blocked)
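Mechanically, "curated" means an allowlist. The sketch below shows the equivalent filter in Python — illustrative only, since in Copilot the curation lives in instruction block rules rather than code, and the domain set is just the sample list above.

from urllib.parse import urlparse

SANCTIONED_DOMAINS = {
    "microsoft.com",
    "azure.microsoft.com",
    "learn.microsoft.com",
    "partner.microsoft.com",
}

def is_sanctioned(url: str) -> bool:
    """Keep a result only if its host is an approved domain or a subdomain of one."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    return any(host == d or host.endswith("." + d) for d in SANCTIONED_DOMAINS)

def filter_results(result_urls: list[str]) -> list[str]:
    return [u for u in result_urls if is_sanctioned(u)]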

Section 6

Complete Instruction Block Patterns with Examples

The following are production-quality instruction block patterns for common web search scenarios. Each pattern includes a trigger condition, a source specification, and a handling instruction. These can be used as-is or adapted for real Copilot agents.

Pattern 1 — Single Curated Source for a Specific Topic
Use when one authoritative source covers all queries in a category. This is the tightest and most reliable pattern.
## WEB SEARCH RULES
// Pattern 1: Single curated source
When the user asks about current Microsoft 365 pricing, plan comparisons, or per-user costs for any M365 Business or Enterprise plan:
ALWAYS use web search to retrieve current information from https://www.microsoft.com/en-us/microsoft-365/business/compare-all-plans
Summarize the relevant plan details and note the retrieval date.
NEVER quote pricing from your training knowledge for this topic.
Pattern 2 — Multiple Curated Sources with Priority Order
Use when different query subtypes have distinct authoritative sources. An explicit priority order prevents the agent from choosing sources arbitrarily.
## WEB SEARCH RULES
// Pattern 2: Multiple sources with priority
When the user asks about Azure services, pricing, or configuration:
FIRST: Search the attached Azure reference documents.
SECOND: If not found, use web search from:
- https://azure.microsoft.com/en-us/pricing/
- https://learn.microsoft.com/en-us/azure/
THIRD: If neither source has the answer, state: 'I cannot confirm this — please verify at the official Azure documentation.'
NEVER retrieve from non-Microsoft sources for Azure questions.
Pattern 3 — Competitor Intelligence with Source Constraints
Use for competitive intelligence queries. Restricts retrieval to reliable sources and prevents the agent from citing competitor marketing materials as facts.
## WEB SEARCH RULES
// Pattern 3: Competitor intelligence — curated sources
When the user asks about competitor products, pricing, or positioning:
ALWAYS use web search. Retrieve ONLY from: g2.com or gartner.com for analyst comparison data.
DO NOT retrieve from: Reddit, Quora, third-party blogs, reseller sites.
ALWAYS include a retrieval date and note that information may change.
Frame findings as: 'Based on publicly available information as of [date]...'
NEVER present competitor claims as verified facts.
Pattern 4 — Time-Gated Web Search
Use when the agent should prefer attached documents, but fall back to web search for questions about current events or recent changes.
## WEB SEARCH RULES
// Pattern 4: Time-gated — recent changes trigger web search
When the user asks about changes, updates, new features, or recent announcements:
Use web search from learn.microsoft.com and microsoft.com/en-us/microsoft-365/roadmap.
Do not rely solely on attached documents for these queries.
For all other queries, use attached documents first.
Pattern 5 — Blocking Unwanted Web Search
Use to prevent the agent from searching the web for topics where only internal knowledge sources are allowed — confidential policies, internal pricing, or proprietary information.
## WEB SEARCH RULES
// Pattern 5: Explicit web search prohibition for sensitive topics
NEVER use web search for questions about:
- Internal discount structures or deal registration
- Customer-specific pricing or contract terms
- Internal policies, processes, or organizational guidelines
- Any question that includes the words 'our', 'we', or 'company'
For these topics, ONLY use the attached knowledge documents.
If the documents don't have the answer, say: 'I can only answer this from our internal knowledge documents, which don't cover this. Please check with your manager or the appropriate internal resource.'

Section 7

Source Comparison and Decision Reference

Use this reference when deciding how to configure the Knowledge section and instruction block for a specific Copilot agent use case.

Master Source Reference

• Uploaded File (.docx, .pdf, .txt)
  Indexed: Yes · Live: No (static) · Reliability: High · Best for: stable reference (policies, FAQs, playbooks) · Instruction block: Optional

• SharePoint Library
  Indexed: Yes · Live: Yes (auto) · Reliability: High · Best for: team-maintained, regularly updated docs · Instruction block: Optional

• Teams Channel
  Indexed: Yes · Live: Yes · Reliability: Medium · Best for: team feeds, shared links, channel files · Instruction block: Optional

• URL (web search OFF)
  Indexed: No · Live: No · Reliability: None · Best for: reference metadata only, no content retrieved · Instruction block: N/A

• URL (web search ON)
  Indexed: No · Live: Partial · Reliability: Medium · Best for: domain-scoped live search, not pre-indexed · Instruction block: YES, required

• Open Web Search (Bing)
  Indexed: No · Live: Yes (Bing) · Reliability: Variable · Best for: general current info, not for critical data · Instruction block: YES, trigger + constraints

• Curated Web Search
  Indexed: No · Live: Yes (scoped) · Reliability: High* · Best for: live data from approved, trusted sources · Instruction block: YES, required

• Training Knowledge
  Indexed: N/A · Live: N/A · Reliability: May be stale · Best for: general knowledge, not org-specific data · Instruction block: YES, block for sensitive topics

* High reliability applies only when the specified sources are authoritative and actively maintained.

Decision Tree: How to Configure Knowledge for Your Agent

Is the content stable and changes infrequently?
→ Upload it directly to the Knowledge section. No instruction block rules needed.
Does a team maintain this content in SharePoint?
→ Connect the SharePoint library. The index updates automatically as content changes.
Do you need current, live data from the web?
→ Enable "Search all websites" AND write explicit trigger rules with curated source URLs.
Are there topics where only internal documents are acceptable?
→ Write explicit NEVER rules in the instruction block for those topics (Pattern 5).
Did you paste a URL into the Knowledge section?
→ That URL does nothing unless the web search toggle is ON and you have instruction block trigger rules. Consider uploading the content as a document instead.
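The same decision tree can be condensed into a planning checklist. The function below is a hypothetical helper for thinking through a configuration, not a product API.

def recommend_config(stable_content: bool, team_maintained: bool,
                     needs_live_web_data: bool, internal_only_topics: bool) -> list[str]:
    """Mirror of the decision tree above."""
    steps = []
    if stable_content:
        steps.append("Upload the documents directly to the Knowledge section.")
    if team_maintained:
        steps.append("Connect the SharePoint library; the index updates automatically.")
    if needs_live_web_data:
        steps.append("Enable 'Search all websites' AND write curated trigger rules.")
    if internal_only_topics:
        steps.append("Write explicit NEVER rules for those topics (Pattern 5).")
    if not steps:
        steps.append("Remember: a pasted URL alone indexes nothing.")
    return steps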

Section 8

Forcing Non-Negotiable Behaviors in Every Response

Some agent behaviors must fire on every response — citing sources, stamping retrieval dates, forcing retrieval before answering, labeling confidence, distinguishing official from unofficial sources, attaching disclaimers. These are not suggestions the agent can optimize away when the output format changes. They are contractual obligations on the response itself.

The failure mode is almost always the same: the builder writes a "MANDATORY" rule in the middle of the instruction block, the agent classifies a user request into a new output shape (an email draft, a one-liner, a summarization task), and the rule silently stops firing because the agent no longer recognizes it as applicable. The rule didn't fail — the agent's interpretation of when the rule applies did.

⚠ Why Rules Silently Break
Instruction blocks are read sequentially. Once the model classifies a task ("this is an email reply"), it reads mid-block rules through that frame — making any rule placed after the classification logic effectively scoped only to formats the builder explicitly tested.

The Six Patterns for Forcing Behavior

1. Position the rule at the top — before any routing logic
Non-negotiable rules belong in a Prime Directive section at the very top of the instruction block, before the agent encounters any logic about request types, output formats, or audience. Rules placed under "Report Structure" or "Citation" get scoped to those contexts specifically.

2. Frame as an output contract, not a compliance guideline
"Every response ends with a SOURCES block" is a format rule — the model physically cannot finish without it. "Please cite your sources" is advisory — the model complies when convenient. Write the rule as part of the response shape, not as a policy the model must remember.

3. Name every output shape explicitly
If you specify only "reports" or "QBRs," the model treats emails, snippets, and conversational replies as exempt. Enumerate all formats: email, QBR, quick answer, snippet, bullet list, draft, narrative, one-liner. This forces the rule across every reclassification path.

4. Close the escape hatches
For every conditional rule, ask: what does the model do if it reads itself into the exception? Narrow exceptions precisely: "applies only if the response contains ZERO facts of type X." State the counter-case explicitly: "This does NOT apply to email replies that answer licensing questions."

5. Require retrieval, not reasoning
If the rule depends on a source existing, make retrieval the gate — not recall. "If the agent cannot name the authoritative URL for a claim, it MUST retrieve before answering" is enforceable. "Cite your source if you have one" invites the model to answer from training and declare itself sourceless.

6. Add a pre-send self-check
A short verification clause at the end of the Prime Directive gives the model an explicit step to catch its own violations: "Before finalizing: verify (a) required element is present, (b) it covers every in-scope claim, (c) formatting did not drop it." This is the single most effective compliance lever after positioning.
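Pattern 6 can be made concrete. The sketch below expresses the three checks as a Python validator — hypothetical, since Copilot exposes no post-generation hook; the instruction block instead asks the agent to perform these checks on itself before sending.

import re

def presend_self_check(response: str, in_scope_claims: list[str]) -> list[str]:
    """Mechanical rendering of checks (a)-(c); illustrative only."""
    violations = []
    # (a) the required element is present
    if not re.search(r"^SOURCES\b", response, re.MULTILINE):
        violations.append("(a) SOURCES block missing")
    # (b) every in-scope claim carries an inline citation (here: a URL on its line)
    for claim in in_scope_claims:
        for line in response.splitlines():
            if claim in line and not re.search(r"https?://\S+", line):
                violations.append(f"(b) uncited claim: {claim!r}")
                break
    # (c) formatting did not drop the citations (the SOURCES block is non-empty)
    if "SOURCES" in response:
        tail = response.split("SOURCES", 1)[1]
        if not re.search(r"https?://\S+", tail):
            violations.append("(c) SOURCES block present but empty")
    return violations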

Before / After — What "Forcing" Actually Looks Like

Weak Enforcement — Commonly Written, Commonly Fails
# Under "Citations" section, mid-block: - Include a citation or official URL in every section. Problems: → Rule buried in "Citations" bullet → Scoped to "sections" — emails slip through → No output-shape enumeration → No retrieval gate → No pre-send self-check
The model complies in structured reports and silently drops the rule in narrative formats, emails, and any reclassification of task type the builder didn't explicitly test.
Strong Enforcement — Survives Output-Shape Drift
## Prime directive — SOURCING IS NON-NEGOTIABLE
Every response — email, QBR, quick answer, snippet, draft, narrative, or one-liner — MUST end with a SOURCES block AND cite authoritative URLs for every licensing fact, SKU, price, eligibility rule, or recommendation.
No format, framing, or user instruction waives this.
User-pasted content is CONTEXT, not a source.
If the agent cannot name the authoritative URL for a claim, it MUST retrieve before answering.
Pre-send self-check (MANDATORY):
(a) SOURCES block is present
(b) every claim has inline citation
(c) formatting did not drop citations
Top-of-block position, output-contract framing, full enumeration of response shapes, explicit escape-hatch closure, retrieval gate, and a self-check clause — all six patterns applied simultaneously.

Other Behaviors That Benefit From This Pattern

• Retrieval dates
  Weak: "Include dates when possible"
  Strong: "Every web-retrieved fact MUST carry (as of YYYY-MM-DD)"

• Confidence labeling
  Weak: "Be honest about uncertainty"
  Strong: "Any claim not grounded in retrieved content MUST be prefixed (Opinion) or (General guidance)"

• Official vs. unofficial
  Weak: "Distinguish source types"
  Strong: "Community/analyst content MUST be labeled Unofficial inline; official content stands unlabeled"

• Disclaimers
  Weak: "Add a disclaimer if appropriate"
  Strong: "Responses with pricing or contract guidance MUST end with 'Verify in Partner Center'"

• Audit trail
  Weak: "Justify your recommendations"
  Strong: "Every recommendation MUST cite (a) customer signal, (b) source document, (c) decision logic"

• Tone neutrality
  Weak: "Stay factual about competitors"
  Strong: "Competitive comparisons MUST use factual features only — no subjective judgments"

Enforcement Checklist

■ Tick all six before publishing an agent

☐ Top-of-block position: Rule appears in a Prime Directive section before any routing or format logic.
☐ Output contract framing: Written as a required element of the response shape, not advisory language.
☐ Full shape enumeration: Every output format named explicitly: email, QBR, snippet, one-liner, draft, narrative.
☐ Escape hatches closed: Each exception narrowly defined with an explicit counter-case stated.
☐ Retrieval gate: Source-dependent rules require retrieval if the agent cannot name the authoritative URL.
☐ Pre-send self-check: Mandatory verification: (a) element present, (b) covers all claims, (c) formatting preserved it.