Microsoft Copilot Personal Agents Knowledge Section
Technical Reference — How the agent actually uses knowledge, and why URLs behave the way they do
Classification: TD SYNNEX Internal — Confidential · Audience: Copilot practitioners who build and configure personal agents
Section 1
How the Knowledge Section Actually Works at Query Time
There is a persistent gap between how the Knowledge section appears in the agent builder UI and how it actually functions when a user sends a message. Understanding this gap is the foundation for everything else in this document.
When a user submits a query to the agent, the system does not read every connected knowledge source from beginning to end. Instead, it performs a semantic vector search across all indexed content — finding the passages most similar in meaning to the user's query — and retrieves the top-scoring chunks to include in the context window alongside the query.
ℹ Technical Note — What "Indexed" Actually Means
When you upload a file or connect a SharePoint library, the agent builder extracts the text content and converts it into high-dimensional vector embeddings, which are stored and searched at query time. Only text that is successfully extracted and embedded is searchable. Content that cannot be extracted — images, scanned PDFs, charts, SmartArt, text in shapes — is invisible to this search.
Critical: A URL string added to the Knowledge section is never fetched, never parsed, and never embedded. The URL text itself may be stored as metadata, but the content at that address does not enter the index.
What Happens During a Query
1. User sends a message: The query text is converted into a vector embedding using the same model that indexed the knowledge sources.
2. Vector search runs: The system searches the knowledge index for the top-N passages most semantically similar to the query embedding.
3. Context is assembled: Retrieved passages are injected into the model's context window along with the instruction block and conversation history.
4. Response is generated: The model generates a response grounded in retrieved content, instruction block rules, and training knowledge.
5. Priority resolution: If retrieved knowledge conflicts with training knowledge, retrieved knowledge wins by default. The instruction block can override this behavior.
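As an illustration, the five steps above can be sketched in Python. This is a toy model of retrieval-augmented generation, not the actual Copilot pipeline: the bag-of-words `embed` function stands in for a real embedding model, and the chunk list stands in for the knowledge index.

```python
import math

def embed(text):
    # Stand-in for a real embedding model: a crude bag-of-words vector.
    tokens = text.lower().split()
    return {t: tokens.count(t) for t in set(tokens)}

def cosine(a, b):
    # Similarity between two sparse vectors represented as dicts.
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, top_n=2):
    # Steps 1-2: embed the query, rank indexed chunks by similarity.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_n]

def build_context(query, chunks, instructions, history):
    # Step 3: retrieved passages join the instruction block and history.
    return {"instructions": instructions, "history": history,
            "knowledge": retrieve(query, chunks), "query": query}

chunks = [
    "M365 Business Premium costs 22 USD per user per month.",
    "Our refund policy allows returns within 30 days.",
    "SharePoint libraries update the knowledge index automatically.",
]
ctx = build_context("What is the per user price of Business Premium?",
                    chunks, instructions="...", history=[])
```

Note that the model never sees the whole knowledge base — only the top-ranked chunks, which is why passages that are not semantically close to the query might as well not exist.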
■ Key Finding
The agent does not "read" its knowledge — it searches it. This distinction has major practical implications: document structure matters enormously (clear headings improve retrieval), very long documents may return lower-quality results because relevant passages compete with noise, and content that isn't text is simply not there as far as the agent is concerned.
What Improves vs. Degrades Retrieval
✓ What improves retrieval quality
Clear, descriptive headings throughout documents
Concise, self-contained paragraphs
Documents under ~50 pages for clean chunk retrieval
Text-based files: .docx, .pdf (text-layer), .txt
SharePoint for content that changes regularly
✗ What degrades or blocks retrieval
Scanned PDFs without an OCR text layer
Text embedded in images or SmartArt
Very large documents (>200 pages) without clear structure
Pasted URLs in the Knowledge field (no content indexed)
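A toy chunker makes the structure point above concrete. This is not how the builder actually splits documents — its chunking strategy is not publicly specified — it only illustrates why self-contained sections under clear headings become self-contained, retrievable chunks.

```python
def chunk_by_headings(text):
    # Toy chunker: start a new chunk at every heading line (here, lines
    # beginning with '#'). Clear headings keep each chunk on one topic;
    # a wall of text would land in one undifferentiated chunk instead.
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Pricing\nPremium costs 22 USD.\n# Refunds\n30-day window."
```

With headings, the pricing and refund facts land in separate chunks, so a pricing query retrieves only pricing text rather than a mixed passage.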
The URL Problem: Why It's There and Why It Doesn't Work
The URL input field in the Knowledge section is one of the most misunderstood elements of the personal agent builder. It looks like it should allow the agent to draw on web content. It does not — at least not in the way practitioners expect.
What the URL Field Actually Does
When you paste a URL into the "Enter a URL or name or drop files here" field and submit it, one of two things happens depending on which toggle is active:
"Search all websites" OFF: The URL is stored as a text string reference only. The agent does not fetch the URL, does not parse the page content, and does not index anything from it. The URL may surface as a citation hint, but the content at that address is completely inaccessible to the agent at query time.
"Search all websites" ON: The URL may be used to scope or bias web search results, functioning closer to a domain filter than a content source. The content at that specific URL is still not pre-indexed. The agent retrieves it live during a search, not from a stored index.
Why the Field Exists Despite These Limitations
Domain scoping for web search: When "Search all websites" is enabled, a URL entered here tells the search to prefer results from that domain. It functions as a hint, not a guarantee.
Explicit source limiting: The "Only use specified sources" toggle, combined with URLs and the web search toggle, attempts to restrict web search to those specific sites — but requires both toggles active and is less reliable than instruction block control.
Future capability surface: The field exists in anticipation of deeper URL crawling features. The UI component is there; the full indexing pipeline for arbitrary URLs is not yet generally available in the personal agent builder.
Reference metadata: For audit and transparency purposes, connected URLs appear in the knowledge panel so builders can see what sources are referenced — even if the content isn't indexed.
⚠ Common Mistake — The Practitioner's Trap
A builder adds a URL for their product documentation site, tests the agent, and gets accurate answers. They assume the agent is reading the site. It isn't. It's drawing on its training knowledge about the product. When the documentation changes, the agent continues returning the old answer — confidently and with no indication that its source is stale.
This is the most dangerous failure mode: an agent that appears correct but is grounded in training data rather than current documentation. The fix is not more URLs — it is uploaded documents, a connected SharePoint library, or explicit web search triggered by the instruction block.
What Actually Works Instead
✗ Does Not Work
Adding a URL expecting the agent to "know" its content.
Example: Pasting https://learn.microsoft.com/azure/pricing into Knowledge expecting current Azure pricing.
What happens: The agent uses training knowledge, which may be months old. The URL is ignored entirely.
✓ Correct Approach — Choose One
Export as a document — save the page as PDF or .docx and upload it directly to Knowledge.
Connect SharePoint — connect the SharePoint library containing the relevant documentation. Content stays current automatically.
Instruction block web search — enable "Search all websites" and write an explicit rule to retrieve from that URL when relevant queries are detected.
The Knowledge Priority Stack
When the agent generates a response, it draws on three potential knowledge sources, resolved in a fixed default priority order. Understanding this stack is essential for predicting agent behavior — and for writing instruction block rules that override it when needed.
Priority 1: Uploaded Files + SharePoint + Teams
Checked first — always. Overrides everything else. Highest trust weight. If your attached documents contradict training knowledge, the documents win.
✓ Checked First · Highest Trust
Priority 2: Web Search (Conditional)
Requires the toggle ON and an explicit instruction block trigger rule — or a direct user request. Does not activate automatically as a fallback.
⚡ Conditional · Instruction Block Required
Priority 3: General Training Knowledge
Used last — fallback only. Built into the model during pretraining. Most likely to be stale or misaligned with org-specific facts.
⚠ Fallback Only · May Be Stale
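The stack can be summarized in a short sketch. The function and its boolean inputs are illustrative simplifications for reasoning about behavior, not an actual Copilot API:

```python
def resolve_source(attached_hits, web_toggle_on, rule_triggered, user_asked_web):
    """Illustrative model of the default priority stack.
    Returns which knowledge tier answers the query."""
    # Priority 1: attached files / SharePoint / Teams always win.
    if attached_hits:
        return "attached"
    # Priority 2: web search needs the toggle ON *and* either an
    # instruction block trigger rule or an explicit user request.
    if web_toggle_on and (rule_triggered or user_asked_web):
        return "web"
    # Priority 3: fall back to general training knowledge.
    return "training"

# The toggle alone is not enough: with no rule and no user request,
# the stack falls through to training knowledge.
examples = [
    resolve_source(["pricing.docx chunk"], True, True, False),
    resolve_source([], True, False, False),
    resolve_source([], True, True, False),
]
```

The second example is the trap described below: the web search toggle is ON, but with no trigger the agent quietly answers from training knowledge.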
Priority 1: Attached Knowledge — Files, SharePoint, Teams
Attached knowledge sources are always checked first and have the highest trust weight. If your attached documents contradict the model's training knowledge, the documents win. SharePoint and Teams sources have one practical advantage: they update automatically. A file edited in SharePoint is reflected in the agent's knowledge without a manual re-upload — making SharePoint the preferred pattern for content that changes regularly.
⚠ Example: Outdated Document Override
Scenario: An agent has a pricing document uploaded from Q3 2024. The actual prices changed in Q1 2025. User asks: "What is the per-seat price for M365 Business Premium?"
Agent behavior: Returns the Q3 2024 price from the document — not the current price. The model's training may contain the correct current price, but it is overridden by the attached document.
Mitigation: Add to the instruction block: "If a knowledge document contains a date older than 90 days, flag this to the user and recommend they verify pricing through the official current source." And/or connect a live SharePoint library that is actively maintained.
Priority 2: Web Search — Conditional and Explicit
Web search is not a fallback that activates automatically when knowledge documents don't have the answer. It requires both conditions simultaneously:
Condition A: The "Search all websites" toggle in the Knowledge section must be turned ON.
Condition B: Either the user explicitly requests a web search, or the instruction block contains a rule that triggers web search for a specific class of query.
Meeting only Condition A (toggle ON with no instruction block rules) creates unpredictable behavior — the agent may or may not search the web depending on its own inference. This is not reliable enough for production agents.
■ Key Finding
The toggle enables the capability. The instruction block controls when and how it fires. For any agent where web search behavior matters, both must be properly configured.
Priority 3: General Training Knowledge — The Invisible Default
When no attached knowledge matches the query and no web search is triggered, the agent falls back to its general training knowledge. For topics where accuracy is critical — pricing, licensing, compliance, policies — explicitly block this fallback in the instruction block:
// Prevent training knowledge fallback for sensitive topics
NEVER answer questions about pricing, licensing, or discount structures
using general knowledge. If the attached knowledge documents do not
contain the answer, respond: 'I don't have that in my current knowledge
documents — please verify with the latest official source.'
Controlling Web Search Through the Instruction Block
The instruction block is the control plane for web search. When the "Search all websites" toggle is ON, the agent has the ability to search — but the instruction block determines when, what, and where it searches. Properly tuning these instructions is the difference between a focused, reliable agent and one that retrieves information unpredictably from anywhere on the internet.
The Anatomy of an Effective Web Search Trigger Rule
Trigger condition: A specific, unambiguous description of the query type that should activate web search. The more specific, the better; avoid catch-all conditions. Example: "When the user asks about pricing, licensing costs, or per-seat fees…"
Search directive: An explicit instruction the model reliably interprets as a command to perform a web search. Example: "ALWAYS use web search…" or "Search the web for…"
Source specification: Names the site or domain to retrieve from, transforming open web search into curated retrieval on authorized sources. Example: "…from https://microsoft.com/…/compare-all-plans"
Handling instruction: Describes what to do with results: summarize, compare, extract specific data fields, or flag if information is unavailable. Example: "Summarize the relevant plan details and note the retrieval date."
Trigger Condition Specificity — Why It Matters
✗ Too Broad — Avoid
"ALWAYS use web search to answer questions about Microsoft products."
This fires on every product-related question, even when attached documents have the answer. It overrides the priority stack inappropriately, making the agent slower and less predictable.
✗ Too Narrow — Avoid
"When the user asks 'What is the current price of M365 Business Premium per user per month?', use web search."
This only triggers on that exact phrasing. Any variation would bypass the trigger entirely.
✓ Well-Calibrated — Use This
"When the user asks about pricing, licensing costs, per-seat fees, or current plan pricing for any Microsoft product, use web search to retrieve current data from microsoft.com/en-us/microsoft-365/business/compare-all-plans."
This covers the broad concept across natural language variations, specifies an exact trusted source, and is narrow enough not to fire on unrelated queries.
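A toy keyword matcher illustrates the calibration trade-off. The real agent interprets triggers semantically, not by substring match; this sketch, with invented trigger lists, only shows why breadth matters:

```python
def fires(trigger_terms, query):
    # Toy model: a trigger "fires" when any of its terms appears in
    # the query text. Real trigger matching is semantic, so treat this
    # as an illustration of calibration, not the actual mechanism.
    q = query.lower()
    return any(term in q for term in trigger_terms)

too_broad = ["microsoft"]
too_narrow = ["what is the current price of m365 business premium "
              "per user per month?"]
calibrated = ["pricing", "licensing cost", "per-seat", "per seat fee",
              "plan pricing"]

pricing_query = "How much is the per-seat fee for Business Premium?"
unrelated_query = "How do I share a file in Microsoft Teams?"
```

The calibrated trigger fires on the pricing paraphrase and stays silent on the unrelated question; the broad trigger fires on both, and the narrow one fires on neither.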
Curated vs. Open Web Search: The Critical Distinction
When builders first enable web search, they often leave it completely open — meaning the agent can search anywhere on the internet. This creates a set of problems that can be hard to detect and even harder to explain to end users.
If the "Search all websites" toggle is on without any instruction block constraints, the agent will use Bing to search the public internet. Potential sources include: forum posts, competitor websites (potentially biased), news articles (potentially incomplete), unofficial documentation mirrors, and websites optimized for search rankings rather than accuracy.
Production Patterns for Web Search Rules
The following are production-quality instruction block patterns for common web search scenarios. Each pattern includes a trigger condition, a source specification, and a handling instruction. They can be used as-is or adapted for real Copilot agents.
Pattern 1 — Single Curated Source for a Specific Topic
Use when one authoritative source covers all queries in a category. This is the tightest and most reliable pattern.
## WEB SEARCH RULES
// Pattern 1: Single curated source
When the user asks about current Microsoft 365 pricing, plan comparisons,
or per-user costs for any M365 Business or Enterprise plan:
ALWAYS use web search to retrieve current information
from https://www.microsoft.com/en-us/microsoft-365/business/compare-all-plans
Summarize the relevant plan details and note the retrieval date.
NEVER quote pricing from your training knowledge for this topic.
Pattern 2 — Multiple Curated Sources with Priority Order
Use when different query subtypes have distinct authoritative sources. An explicit priority order prevents the agent from choosing sources arbitrarily.
## WEB SEARCH RULES
// Pattern 2: Multiple sources with priority
When the user asks about Azure services, pricing, or configuration:
FIRST: Search the attached Azure reference documents.
SECOND: If not found, use web search from:
- https://azure.microsoft.com/en-us/pricing/
- https://learn.microsoft.com/en-us/azure/
THIRD: If neither source has the answer, state:
'I cannot confirm this — please verify at the official Azure documentation.'
NEVER retrieve from non-Microsoft sources for Azure questions.
Pattern 3 — Competitor Intelligence with Source Constraints
Use for competitive intelligence queries. Restricts retrieval to reliable sources and prevents the agent from citing competitor marketing materials as facts.
## WEB SEARCH RULES
// Pattern 3: Competitor intelligence — curated sources
When the user asks about competitor products, pricing, or positioning:
ALWAYS use web search.
Retrieve ONLY from: g2.com or gartner.com for analyst comparison data.
DO NOT retrieve from: Reddit, Quora, third-party blogs, reseller sites.
ALWAYS include a retrieval date and note that information may change.
Frame findings as: 'Based on publicly available information as of [date]...'
NEVER present competitor claims as verified facts.
Pattern 4 — Time-Gated Web Search
Use when the agent should prefer attached documents, but fall back to web search for questions about current events or recent changes.
## WEB SEARCH RULES
// Pattern 4: Time-gated — recent changes trigger web search
When the user asks about changes, updates, new features, or recent announcements:
Use web search from learn.microsoft.com
and microsoft.com/en-us/microsoft-365/roadmap.
Do not rely solely on attached documents for these queries.
For all other queries, use attached documents first.
Pattern 5 — Blocking Unwanted Web Search
Use to prevent the agent from searching the web for topics where only internal knowledge sources are allowed — confidential policies, internal pricing, or proprietary information.
## WEB SEARCH RULES
// Pattern 5: Explicit web search prohibition for sensitive topics
NEVER use web search for questions about:
- Internal discount structures or deal registration
- Customer-specific pricing or contract terms
- Internal policies, processes, or organizational guidelines
- Any question that includes the words 'our', 'we', or 'company'
For these topics, ONLY use the attached knowledge documents.
If the documents don't have the answer, say:
'I can only answer this from our internal knowledge documents,
which don't cover this. Please check with your manager or the
appropriate internal resource.'
Master Source Reference
Use this reference when deciding how to configure the Knowledge section and instruction block for a specific Copilot agent use case.
Uploaded File (.docx, .pdf, .txt)
Indexed: Yes · Live: No (static) · Reliability: High · IB rules: Optional
Best for: Stable reference content (policies, FAQs, playbooks)
SharePoint Library
Indexed: Yes · Live: Yes (auto-refresh) · Reliability: High · IB rules: Optional
Best for: Team-maintained, regularly updated docs
Teams Channel
Indexed: Yes · Live: Yes · Reliability: Medium · IB rules: Optional
Best for: Team feeds, shared links, channel files
URL (web search OFF)
Indexed: No · Live: No · Reliability: None · IB rules: N/A
Best for: Reference metadata only; no content retrieved
URL (web search ON)
Indexed: No · Live: Partial · Reliability: Medium · IB rules: YES, required
Best for: Domain-scoped live search; not pre-indexed
Open Web Search (Bing)
Indexed: No · Live: Yes (Bing) · Reliability: Variable · IB rules: YES, trigger + constraints
Best for: General current info; not for critical data
Curated Web Search
Indexed: No · Live: Yes (scoped) · Reliability: High* · IB rules: YES, required
Best for: Live data from approved, trusted sources
Training Knowledge
Indexed: N/A · Live: N/A · Reliability: May be stale · IB rules: YES, block for sensitive topics
Best for: General knowledge; not for org-specific data
* High reliability applies only when the specified sources are authoritative and actively maintained.
Decision Tree: How to Configure Knowledge for Your Agent
Is the content stable and changes infrequently? → Upload it directly to the Knowledge section. No instruction block rules needed.
Does a team maintain this content in SharePoint? → Connect the SharePoint library. The index updates automatically as content changes.
Do you need current, live data from the web? → Enable "Search all websites" AND write explicit trigger rules with curated source URLs.
Are there topics where only internal documents are acceptable? → Write explicit NEVER rules in the instruction block for those topics (Pattern 5).
Did you paste a URL into the Knowledge section? → That URL does nothing unless the web search toggle is ON and you have instruction block trigger rules. Consider uploading the content as a document instead.
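The decision tree above can be encoded as a small planning helper. The function name and step strings are illustrative, not part of any Copilot API; the point is that each yes/no answer maps to exactly one configuration action.

```python
def knowledge_config(stable_content, team_sharepoint, needs_live_web,
                     internal_only_topics):
    """Encodes the decision tree as a checklist builder: each yes/no
    answer contributes one configuration step, in priority order."""
    steps = []
    if stable_content:
        steps.append("upload files directly to Knowledge")
    if team_sharepoint:
        steps.append("connect the SharePoint library")
    if needs_live_web:
        steps.append("enable 'Search all websites' and write curated trigger rules")
    if internal_only_topics:
        steps.append("add explicit NEVER rules (Pattern 5)")
    if not steps:
        steps.append("review: no knowledge source configured")
    return steps

plan = knowledge_config(stable_content=True, team_sharepoint=False,
                        needs_live_web=True, internal_only_topics=True)
```

Running it for a typical agent (stable docs, live pricing data, confidential topics) yields three steps: upload the files, enable curated web search, and block web retrieval for the sensitive topics.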
■ Key Finding — Reliable Knowledge Configuration
The most reliable configuration for a production Copilot agent combines: uploaded files or SharePoint for stable, authoritative information; curated web search with explicit instruction block trigger rules naming exactly which sources to use; and explicit NEVER rules for topics where general training knowledge or open web retrieval is not acceptable. Agents built this way are predictable, auditable, and trustworthy.
Forcing Non-Negotiable Behaviors in Every Response
Some agent behaviors must fire on every response — citing sources, stamping retrieval dates, forcing retrieval before answering, labeling confidence, distinguishing official from unofficial sources, attaching disclaimers. These are not suggestions the agent can optimize away when the output format changes. They are contractual obligations on the response itself.
The failure mode is almost always the same: the builder writes a "MANDATORY" rule in the middle of the instruction block, the agent classifies a user request into a new output shape (an email draft, a one-liner, a summarization task), and the rule silently stops firing because the agent no longer recognizes it as applicable. The rule didn't fail — the agent's interpretation of when the rule applies did.
⚠ Why Rules Silently Break
Instruction blocks are read sequentially. Once the model classifies a task ("this is an email reply"), it reads mid-block rules through that frame — making any rule placed after the classification logic effectively scoped only to formats the builder explicitly tested.
The Six Patterns for Forcing Behavior
1. Position the rule at the top, before any routing logic. Non-negotiable rules belong in a Prime Directive section at the very top of the instruction block, before the agent encounters any logic about request types, output formats, or audience. Rules placed under "Report Structure" or "Citation" get scoped to those contexts specifically.
2. Frame as an output contract, not a compliance guideline. "Every response ends with a SOURCES block" is a format rule — the model physically cannot finish without it. "Please cite your sources" is advisory — the model complies when convenient. Write the rule as part of the response shape, not as a policy the model must remember.
3. Name every output shape explicitly. If you specify only "reports" or "QBRs," the model treats emails, snippets, and conversational replies as exempt. Enumerate all formats: email, QBR, quick answer, snippet, bullet list, draft, narrative, one-liner. This forces the rule across every reclassification path.
4. Close the escape hatches. For every conditional rule, ask: what does the model do if it reads itself into the exception? Narrow exceptions precisely: "applies only if the response contains ZERO facts of type X." State the counter-case explicitly: "This does NOT apply to email replies that answer licensing questions."
5. Require retrieval, not reasoning. If the rule depends on a source existing, make retrieval the gate, not recall. "If the agent cannot name the authoritative URL for a claim, it MUST retrieve before answering" is enforceable. "Cite your source if you have one" invites the model to answer from training and declare itself sourceless.
6. Add a pre-send self-check. A short verification clause at the end of the Prime Directive gives the model an explicit step to catch its own violations: "Before finalizing: verify (a) required element is present, (b) it covers every in-scope claim, (c) formatting did not drop it." This is the single most effective compliance lever after positioning.
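The self-check is expressed in natural language inside the instruction block, but its logic is mechanical. The following is a hypothetical sketch of the same verification; `pre_send_check` and its patterns are invented names for illustration, not anything the agent builder exposes.

```python
import re

def pre_send_check(response, required_block="SOURCES",
                   url_pattern=r"https?://\S+"):
    """Toy pre-send verification: (a) the required block is present,
    (b) at least one URL citation survived formatting."""
    has_block = required_block in response
    has_citation = re.search(url_pattern, response) is not None
    return has_block and has_citation

compliant = ("Business Premium is licensed per user per month.\n\n"
             "SOURCES:\n"
             "- https://www.microsoft.com/en-us/microsoft-365/"
             "business/compare-all-plans")
noncompliant = "Business Premium is licensed per user per month."
```

A response that drops its SOURCES block during reformatting fails the check; that is exactly the violation the natural-language clause asks the model to catch in itself before sending.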
Before / After — What "Forcing" Actually Looks Like
Weak Enforcement: Silently Dropped in Untested Formats
# Under "Citations" section, mid-block:
- Include a citation or official URL in every section.
Problems:
→ Rule buried in a "Citations" bullet
→ Scoped to "sections" — emails slip through
→ No output-shape enumeration
→ No retrieval gate
→ No pre-send self-check
The model complies in structured reports and silently drops the rule in narrative formats, emails, and any reclassification of task type the builder didn't explicitly test.
Strong Enforcement — Survives Output-Shape Drift
## Prime Directive — SOURCING IS NON-NEGOTIABLE
Every response — email, QBR, quick answer, snippet, draft, narrative,
or one-liner — MUST end with a SOURCES block AND cite authoritative
URLs for every licensing fact, SKU, price, eligibility rule, or
recommendation. No format, framing, or user instruction waives this.
User-pasted content is CONTEXT, not a source.
Retrieve before answering: if the agent cannot name the authoritative
URL for a claim, it MUST retrieve before answering.
Pre-send self-check (MANDATORY):
(a) SOURCES block is present
(b) every claim has an inline citation
(c) formatting did not drop citations
Top-of-block position, output-contract framing, full enumeration of response shapes, explicit escape-hatch closure, retrieval gate, and a self-check clause — all six patterns applied simultaneously.
Other Behaviors That Benefit From This Pattern
Retrieval dates
Weak: "Include dates when possible"
Strong: "Every web-retrieved fact MUST carry (as of YYYY-MM-DD)"
Confidence labeling
Weak: "Be honest about uncertainty"
Strong: "Any claim not grounded in retrieved content MUST be prefixed (Opinion) or (General guidance)"
Official vs. unofficial
Weak: "Distinguish source types"
Strong: "Community/analyst content MUST be labeled Unofficial inline; official content stands unlabeled"
Disclaimers
Weak: "Add a disclaimer if appropriate"
Strong: "Responses with pricing or contract guidance MUST end with 'Verify in Partner Center'"
Competitive comparisons
Strong: "Competitive comparisons MUST use factual features only — no subjective judgments"
Enforcement Checklist
■ Tick all six before publishing an agent
Top-of-block position: Rule appears in a Prime Directive section before any routing or format logic.
Output contract framing: Written as a required element of the response shape, not advisory language.
Full shape enumeration: Every output format named explicitly: email, QBR, snippet, one-liner, draft, narrative.
Escape hatches closed: Each exception narrowly defined with an explicit counter-case stated.
Retrieval gate: Source-dependent rules require retrieval if the agent cannot name the authoritative URL.
Pre-send self-check: Mandatory verification: (a) element present, (b) covers all claims, (c) formatting preserved it.
■ Key Finding — Section 8
The most common cause of "my agent doesn't do X consistently" is not a missing rule. It is a rule written as compliance instead of as an output contract, placed after the classification logic, and left with escape hatches the model can read itself into. Fixing consistency rarely requires more rules — it requires rewriting the existing rule as a non-negotiable element of the response itself. The difference between an invitation to comply and a contractual obligation is whether the behavior shows up in every response the agent produces, or only in the output formats the builder explicitly tested.