
SDK Feedback: Memory Observability Tools

Summary

Add built-in observability and debugging tools to the Memori SDK that let developers understand why specific memories were recalled, inspect similarity scores, trace memory lineage, and diagnose recall failures. This transforms memory from a black box into a transparent, debuggable system.

The Problem

Memory-powered AI applications are difficult to debug because the retrieval process is opaque. When an agent gives an unexpected response, developers face a frustrating investigation: there is no way to see which memories were considered, what similarity scores they received, or why a given memory was or wasn't returned.

Real-world impact: A support agent that "forgets" a user's previous issue mid-conversation is a critical bug, but without observability tools, debugging it requires manually querying the database and reverse-engineering the recall logic.

Proposed Solution

Introduce a debug=True mode and a companion MemoryInspector class that provides rich introspection into every memory operation. The debug mode should be zero-config to enable, with structured output that integrates with existing logging and observability tools.

Core Features

Recall Explainer

See every memory considered during recall, with similarity scores, ranking factors, and the final selection reason.

Query Analysis

Inspect how queries are parsed, embedded, and matched against the memory index.
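The similarity scores surfaced by query analysis are typically cosine similarities between the query embedding and each stored memory embedding. A minimal standalone illustration of that computation (not the SDK's internal code, which debug mode would merely surface):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions
query_vec = [0.2, 0.7, 0.1]
memory_vec = [0.25, 0.65, 0.05]
print(f"{cosine_similarity(query_vec, memory_vec):.3f}")
```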

Lineage Tracing

Track memory provenance from creation through every access, with full attribution chain.
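One possible shape for a lineage record is an ordered chain of events from creation through each access. The field names below are illustrative assumptions, not the SDK's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LineageEvent:
    event: str        # e.g. "created", "recalled", "updated"
    timestamp: str    # ISO 8601
    attribution: str  # who or what triggered the event

def format_lineage(chain: list[LineageEvent]) -> str:
    """Render a provenance chain as a single readable line."""
    return " -> ".join(f"{e.event}@{e.timestamp} ({e.attribution})" for e in chain)

chain = [
    LineageEvent("created", "2024-06-12T09:30:00Z", "user:456"),
    LineageEvent("recalled", "2024-08-03T14:02:11Z", "agent:support"),
]
print(format_lineage(chain))
```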

Performance Profiling

Timing breakdowns for embedding, search, reranking, and total latency per operation.

Memory Health Metrics

Aggregate stats on memory freshness, access patterns, and potential staleness.

Export & Replay

Capture debug sessions for offline analysis or bug report attachments.
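A sketch of what "export for bug report attachment" could look like: dump the captured debug data to a JSON file. The structure mirrors the debug fields shown elsewhere in this proposal but is an assumption, not a shipped format:

```python
import json

# Hypothetical exported debug session (fields mirror this proposal's examples)
debug_session = {
    "query": "What is the user's preferred language?",
    "attribution": "user:456",
    "candidates_evaluated": 12,
    "memories_returned": 3,
    "timing_ms": {"embedding": 12, "search": 8, "rerank": 23, "total": 43},
}

# Write the session to a file suitable for attaching to a bug report
with open("memori-debug-session.json", "w") as f:
    json.dump(debug_session, f, indent=2)
```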

Developer Experience

```python
from memori import MemoriClient

# Enable debug mode
client = MemoriClient(api_key="...", debug=True)

# Perform a recall - debug info is captured automatically
result = client.memory.recall(
    query="What is the user's preferred language?",
    attribution="user:456"
)

# Access the debug report
debug = result.debug

# See all memories that were considered
for candidate in debug.candidates:
    print(f"Memory: {candidate.content[:50]}...")
    print(f"  Score: {candidate.similarity_score:.3f}")
    print(f"  Recency boost: {candidate.recency_factor:.2f}")
    print(f"  Final rank: {candidate.rank}")
    print(f"  Selected: {candidate.selected}")

# Understand why certain memories weren't selected
print(f"Rejection reasons: {debug.rejection_summary}")

# Performance breakdown
print(f"Embedding time: {debug.timing.embedding_ms}ms")
print(f"Search time: {debug.timing.search_ms}ms")
print(f"Total: {debug.timing.total_ms}ms")
```

Example Debug Output

```
MEMORY RECALL DEBUG - query: "What is the user's preferred language?"
Attribution: user:456
Candidates evaluated: 12
Memories returned: 3

TOP CANDIDATES:
  "User set language preference to Spanish"                          0.923
  "User asked about Spanish documentation"                           0.847
  "Conversation was in Spanish"                                      0.812
  "User mentioned visiting Spain" (rejected: below threshold)        0.534

TIMING:
  Embed query:    12ms
  Vector search:   8ms
  Rerank:         23ms
  Total:          43ms
```
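The output above shows both raw similarity scores and ranking factors. How the SDK combines them internally is not specified in this proposal; the sketch below assumes a simple multiplicative recency boost and a hypothetical similarity cutoff, purely to illustrate how a final ordering could be derived from the debug fields:

```python
# Candidates mirroring the debug output above; recency factors are invented
candidates = [
    {"content": "User set language preference to Spanish", "similarity": 0.923, "recency": 1.10},
    {"content": "User asked about Spanish documentation", "similarity": 0.847, "recency": 1.00},
    {"content": "Conversation was in Spanish", "similarity": 0.812, "recency": 1.00},
    {"content": "User mentioned visiting Spain", "similarity": 0.534, "recency": 1.00},
]

THRESHOLD = 0.6  # hypothetical similarity cutoff

def final_score(c: dict) -> float:
    """Assumed combination rule: similarity scaled by a recency boost."""
    return c["similarity"] * c["recency"]

# Drop below-threshold candidates, then rank the rest by combined score
selected = sorted(
    (c for c in candidates if c["similarity"] >= THRESHOLD),
    key=final_score,
    reverse=True,
)
for rank, c in enumerate(selected, start=1):
    print(rank, c["content"], round(final_score(c), 3))
```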

Memory Inspector CLI

For interactive debugging, provide a CLI tool that can inspect memory state:

```shell
# Inspect memories for a specific attribution
$ memori inspect --attribution user:456

Found 47 memories for user:456

Type breakdown:
  preference: 12
  fact:       23
  summary:     8
  rule:        4

Oldest: 2024-01-15 (342 days ago)
Newest: 2025-01-28 (1 day ago)
Average access frequency: 2.3/week

Potential issues:
  - 3 memories have low access (>90 days since last recall)
  - 2 memories have conflicting content (preference type)

# Simulate a recall without executing
$ memori recall --dry-run --query "user's favorite color" --attribution user:456

DRY RUN - No memories will be modified

Would return 2 memories:
  1. "User's favorite color is blue" (score: 0.94, stored: 2024-06-12)
  2. "User mentioned liking blue themes" (score: 0.78, stored: 2024-08-03)
```

Trade-offs Considered

Benefits

  • Dramatically faster debugging of memory-related issues
  • Builds developer confidence in the memory system
  • Enables data-driven tuning of recall thresholds
  • Reduces support burden with self-service diagnostics
  • Creates foundation for memory analytics features
  • CLI tool enables ops teams to investigate production issues
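The "data-driven tuning" benefit can be made concrete: given similarity scores exported from debug sessions and labeled by whether the recall was judged relevant, a simple sweep can pick the cutoff that best separates the two groups. A minimal sketch under those assumptions (not an SDK feature):

```python
def best_threshold(relevant: list[float], irrelevant: list[float]) -> float:
    """Sweep candidate cutoffs; return the one classifying the most scores correctly."""
    best, best_correct = 0.0, -1
    for cut in sorted(set(relevant + irrelevant)):
        # Correct = relevant scores at/above the cut + irrelevant scores below it
        correct = sum(s >= cut for s in relevant) + sum(s < cut for s in irrelevant)
        if correct > best_correct:
            best, best_correct = cut, correct
    return best

# Scores collected from exported debug sessions (illustrative data)
relevant = [0.92, 0.85, 0.81, 0.78]
irrelevant = [0.53, 0.49, 0.61]
print(best_threshold(relevant, irrelevant))
```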

Drawbacks

  • Debug mode adds latency (collecting and structuring metadata)
  • Increased memory usage when debug info is retained
  • Risk of exposing sensitive memory content in logs
  • API surface area increases significantly
  • Debug output format becomes a compatibility concern
  • May encourage over-reliance on debugging vs. proper testing

Mitigations
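One possible mitigation for the sensitive-content risk listed under Drawbacks is a redaction hook applied to memory content before it reaches debug output or logs: replace the text with a stable hash so entries remain correlatable without exposing their contents. A hypothetical helper, not part of the SDK:

```python
import hashlib

def redact(content: str, preview_chars: int = 0) -> str:
    """Replace memory content with a stable short hash, plus an optional preview."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
    preview = content[:preview_chars] + "..." if preview_chars else ""
    return f"{preview}<redacted:{digest}>"

# Fully redacted, and redacted with a short non-sensitive preview
print(redact("User's favorite color is blue"))
print(redact("User's favorite color is blue", preview_chars=6))
```

Because the hash is deterministic, the same memory produces the same token across log lines, which preserves debuggability while keeping raw content out of aggregators.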

Integration with Observability Stacks

The debug output should integrate seamlessly with common observability tools:

```python
# OpenTelemetry integration
from memori import MemoriClient
from memori.telemetry import OTelExporter

client = MemoriClient(
    api_key="...",
    telemetry_exporter=OTelExporter()  # Sends spans to configured collector
)

# Every memory operation creates a span with debug attributes
result = client.memory.recall(query="...", attribution="...")

# Spans include:
# - memori.recall.candidates_evaluated: 12
# - memori.recall.memories_returned: 3
# - memori.recall.top_score: 0.923
# - memori.recall.embedding_ms: 12
# - memori.recall.search_ms: 8
```
```python
# JSON logging for structured log aggregators
import logging

from memori import MemoriClient
from memori.logging import JSONDebugHandler

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("memori")
logger.addHandler(JSONDebugHandler())

client = MemoriClient(api_key="...", debug=True)

# Debug output goes to structured logs automatically
# {"event": "memory.recall", "candidates": 12, "returned": 3, ...}
```

Alternatives Considered

1. External APM Integration Only

Rely on Datadog, New Relic, etc. for all observability. Rejected because it requires additional infrastructure, doesn't provide memory-specific insights, and creates vendor lock-in.

2. Verbose Logging Mode

Add a LOG_LEVEL=DEBUG setting that prints detailed logs. Partially adopted as a fallback, but structured debug objects are more useful for programmatic analysis than log parsing.

3. Separate Debug SDK

Ship a memori-debug package with enhanced tooling. Rejected because it fragments the ecosystem and makes debugging feel like an afterthought rather than a first-class feature.

Success Metrics

Recommendation

Ship memory observability tools as a core SDK feature. The ability to understand why memory behaves the way it does is essential for building reliable AI applications. Without these tools, developers are forced to treat memory as a black box, leading to frustration and reduced trust in the platform.

Proposed rollout:

  1. Phase 1 (4 weeks): Basic debug mode with candidate list and timing in Python SDK
  2. Phase 2 (6 weeks): CLI inspector tool and JSON export
  3. Phase 3 (8 weeks): OpenTelemetry integration and TypeScript SDK parity

Related Feedback

Local Development Mode

Have feedback on this proposal? Open an issue on GitHub, explore the Memori Cookbook, or check the official docs.
