Python: [Generated by SRE Agent] docs: clarify checkpoint storage security model and deserialization trust boundaries (#6295)

* docs: clarify checkpoint storage security model and deserialization trust boundaries

Add Security Model documentation sections to the checkpoint encoding and
Azure Functions serialization modules explaining:
- Checkpoint storage is a trusted data source requiring access controls
- The RestrictedUnpickler allowlist is defense-in-depth, not a security boundary
- Developer responsibilities for securing storage backends
- Guidance on using allowed_types and strip_pickle_markers

Co-authored-by: Azure SRE Agent <noreply@microsoft.com>

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Azure SRE Agent <noreply@microsoft.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
chetantoshniwal
2026-06-09 09:53:48 -07:00
committed by GitHub
parent 5e6eb6f121
commit 632f67b92e
2 changed files with 47 additions and 0 deletions
@@ -14,6 +14,24 @@ This module adds:
- reconstruct_to_type: for HITL responses where external data (without type markers)
needs to be reconstructed to a known type
- resolve_type: resolves 'module:class' type keys to Python types
Security Model
--------------
The underlying Azure Durable Functions storage (Azure Storage account) is the
trusted persistence layer for serialized checkpoint data. The
``RestrictedUnpickler`` in the core encoding module provides defense-in-depth
type filtering, but checkpoint storage itself must be properly access-controlled:
- Ensure the Azure Storage account used by Durable Functions is not publicly
writable and uses appropriate RBAC / shared-access policies.
- Never pass untrusted user input to ``deserialize_value`` without first
calling :func:`strip_pickle_markers`, which neutralizes pickle markers
injected into the data path.
- Configure your checkpoint storage with ``allowed_checkpoint_types`` (or call
``decode_checkpoint_value(..., allowed_types=...)`` directly) to restrict the
set of types that can be deserialized.
See :mod:`agent_framework._workflows._checkpoint_encoding` for the full
security model documentation.
"""
from __future__ import annotations
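The marker-neutralizing step described in the docstring above can be sketched roughly as follows. This is an illustration only, not the framework's `strip_pickle_markers`: the function name `neutralize_markers` and the marker key names in `MARKER_KEYS` are assumptions made for the example, so check the module source for the real keys and behavior.

```python
from typing import Any

# ASSUMPTION: placeholder marker key names, chosen for illustration only.
MARKER_KEYS = frozenset({"__pickled__", "__type__"})


def neutralize_markers(value: Any) -> Any:
    """Return a copy of parsed JSON with marker-bearing dicts replaced by None.

    Hypothetical helper; it only mirrors the idea of sanitizing external
    input before it can reach a pickle-based decoder.
    """
    if isinstance(value, dict):
        # A dict carrying any marker key is dropped wholesale rather than repaired.
        if not MARKER_KEYS.isdisjoint(value):
            return None
        return {key: neutralize_markers(item) for key, item in value.items()}
    if isinstance(value, list):
        return [neutralize_markers(item) for item in value]
    # Scalars (str, int, float, bool, None) cannot carry markers.
    return value
```

Replacing the whole dict, instead of deleting only the marker keys, avoids leaving a half-formed structure that a later decoding step might still interpret.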
@@ -13,6 +13,35 @@ during deserialization. The default built-in safe set covers common Python
value types (primitives, datetime, uuid, ...), all ``agent_framework`` internal
types, and all ``openai.types`` types. Callers can extend the set by passing
additional ``"module:qualname"`` strings.
Security Model
--------------
Checkpoint storage is treated as a **trusted data source**. The serialization
format uses Python's ``pickle`` module, which can execute arbitrary code during
deserialization. The ``RestrictedUnpickler`` provides a defense-in-depth
allowlist that limits instantiable classes, but it is **not** a security
boundary: certain allowlisted builtins (e.g. ``getattr``) are required for
legitimate object reconstruction (enums, named tuples) and cannot be removed
without breaking compatibility.
Developers **must** ensure that:
1. The checkpoint storage backend (file system, Cosmos DB, Azure Blob, Durable
Functions storage) is access-controlled and not writable by untrusted
parties.
2. Data flowing into ``decode_checkpoint_value`` originates exclusively from
the application's own checkpoint storage, never from user-supplied HTTP
requests, message payloads, or other untrusted sources.
3. The ``allowed_types`` parameter is specified whenever possible to restrict
the set of reconstructible types to the minimum required by the application.
4. External JSON that might contain checkpoint markers is sanitized before it
reaches ``decode_checkpoint_value`` (for example, with
:func:`agent_framework_azurefunctions._serialization.strip_pickle_markers`).
The allowlist is a mitigation that reduces attack surface but does not
eliminate the inherent risks of deserializing untrusted pickle data. Treat
your checkpoint storage with the same access controls you would apply to
application secrets or database credentials.
"""
from __future__ import annotations
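The allowlist idea in the Security Model section can be sketched with the standard library's `pickle.Unpickler.find_class` hook. This is a minimal illustration, not the framework's `RestrictedUnpickler`: the names `AllowlistUnpickler`, `restricted_loads`, `ALLOWED`, and `Probe` are hypothetical, and the allowlist contents are made up for the example. Only the `"module:qualname"` key format is taken from the docstring.

```python
import io
import os
import pickle
from datetime import datetime

# ASSUMPTION: example allowlist in the "module:qualname" form; not the real default set.
ALLOWED = frozenset({"datetime:datetime", "uuid:UUID"})


class AllowlistUnpickler(pickle.Unpickler):
    """Illustrative unpickler that rejects globals outside an allowlist."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = allowed

    def find_class(self, module, name):
        # Called whenever the pickle stream references a global by name.
        key = f"{module}:{name}"
        if key not in self._allowed:
            raise pickle.UnpicklingError(f"type {key!r} is not allowlisted")
        return super().find_class(module, name)


def restricted_loads(data, allowed=ALLOWED):
    """Deserialize bytes, resolving only allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


class Probe:
    """Harmless stand-in for a hostile payload: unpickling would call os.getcwd()."""

    def __reduce__(self):
        return (os.getcwd, ())
```

With this sketch, `restricted_loads(pickle.dumps(datetime(2024, 1, 1)))` round-trips, while `restricted_loads(pickle.dumps(Probe()))` raises `pickle.UnpicklingError`. As the docstring stresses, a filter like this narrows what a tampered checkpoint can reach but does not make untrusted pickle data safe, so access control on the storage backend remains the actual boundary.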