SDK Feedback: Memory Observability Tools
Summary
Add built-in observability and debugging tools to the Memori SDK that let developers understand why specific memories were recalled, inspect similarity scores, trace memory lineage, and diagnose recall failures. This transforms memory from a black box into a transparent, debuggable system.
The Problem
Memory-powered AI applications are difficult to debug because the retrieval process is opaque. When an agent gives an unexpected response, developers face a frustrating investigation:
- No visibility into recall decisions. Developers can't see which memories were considered, their similarity scores, or why certain memories were ranked higher than others.
- Silent failures are common. If no memories match a query, the SDK returns an empty result with no explanation of what was searched or why nothing matched.
- Attribution tracing is manual. Understanding where a memory originated (which conversation, which user action) requires custom logging infrastructure.
- Performance bottlenecks are invisible. Slow recalls provide no breakdown of time spent on embedding generation vs. database queries vs. reranking.
- Memory drift is hard to detect. As memories accumulate, relevance can degrade, but there's no tooling to surface this pattern.
Proposed Solution
Introduce a debug=True mode and a companion MemoryInspector class that provides rich introspection into every memory operation. The debug mode should be zero-config to enable, with structured output that integrates with existing logging and observability tools.
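As a rough sketch of the intended developer experience (the import path, constructor signature, and method names below are assumptions for illustration; only debug=True and the MemoryInspector name come from this proposal):

```python
# Hedged sketch only: Memori's actual constructor and recall API may differ.
from memori import Memori, MemoryInspector  # import path assumed

memori = Memori(debug=True)          # proposed flag; zero-config to enable
inspector = MemoryInspector(memori)  # companion class proposed above

result = memori.recall("What did the user say about deadlines?")  # method name illustrative

# Inspect why each candidate memory was (or was not) selected.
explanation = inspector.explain_last_recall()  # hypothetical helper
for candidate in explanation.candidates:
    print(candidate.memory_id, candidate.similarity, candidate.selected)
```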
Core Features
Recall Explainer
See every memory considered during recall, with similarity scores, ranking factors, and the final selection reason.
Query Analysis
Inspect how queries are parsed, embedded, and matched against the memory index.
Lineage Tracing
Track memory provenance from creation through every access, with full attribution chain.
Performance Profiling
Timing breakdowns for embedding, search, reranking, and total latency per operation.
Memory Health Metrics
Aggregate stats on memory freshness, access patterns, and potential staleness.
Export & Replay
Capture debug sessions for offline analysis or bug report attachments.
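A hedged sketch of how the Lineage Tracing and Memory Health Metrics features above might surface through MemoryInspector (method and field names are illustrative, not an existing API):

```python
# Illustrative only: trace_lineage and health_report are hypothetical method names.
inspector = MemoryInspector(memori)

# Lineage Tracing: full attribution chain for a single memory.
lineage = inspector.trace_lineage(memory_id="mem_4f2a")  # ID made up for the example
for event in lineage.events:
    print(event.timestamp, event.kind, event.conversation_id)  # e.g. created / accessed / updated

# Memory Health Metrics: aggregate freshness and staleness signals.
health = inspector.health_report()
print(health.total_memories, health.stale_count, health.median_age_days)
```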
Developer Experience
Example Debug Output
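The exact payload is open for discussion; the following is a hedged sketch of what a debug-enabled recall might attach to its result (field names are illustrative, not a committed schema):

```python
result = memori.recall("What does the user prefer for breakfast?", debug=True)

print(result.debug)
# Illustrative output shape, not a committed schema:
# {
#   "query": {"text": "...", "embedding_model": "text-embedding-3-small", "tokens": 9},
#   "candidates": [
#     {"memory_id": "mem_4f2a", "similarity": 0.83, "recency_boost": 0.05, "selected": true},
#     {"memory_id": "mem_9c11", "similarity": 0.41, "recency_boost": 0.00, "selected": false,
#      "reason": "below threshold 0.60"}
#   ],
#   "timings_ms": {"embedding": 12.4, "search": 31.7, "rerank": 8.2, "total": 54.1}
# }
```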
Memory Inspector CLI
For interactive debugging, provide a CLI tool that can inspect memory state:
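The memori inspect command referenced later under Success Metrics could look roughly like this; every flag shown here is hypothetical:

```
# Illustrative CLI session; these flags are not an existing interface.
$ memori inspect --memory-id mem_4f2a        # show one memory's content, metadata, and lineage
$ memori inspect --query "diet preferences"  # dry-run a recall and print candidate scores
$ memori inspect --stats                     # aggregate health: freshness, access counts, staleness
```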
Trade-offs Considered
Benefits
- Dramatically faster debugging of memory-related issues
- Builds developer confidence in the memory system
- Enables data-driven tuning of recall thresholds
- Reduces support burden with self-service diagnostics
- Creates foundation for memory analytics features
- CLI tool enables ops teams to investigate production issues
Drawbacks
- Debug mode adds latency (collecting and structuring metadata)
- Increased memory usage when debug info is retained
- Risk of exposing sensitive memory content in logs
- API surface area increases significantly
- Debug output format becomes a compatibility concern
- May encourage over-reliance on debugging vs. proper testing
Mitigations
- Opt-in with zero overhead: Debug mode is disabled by default. When disabled, no debug data is collected, ensuring zero performance impact in production.
- Content redaction: Provide a redact_content=True option that shows memory metadata and scores without exposing actual content, safe for logging.
- Sampling mode: For production observability, support debug_sample_rate=0.01 to collect debug info for 1% of requests (see the sketch after this list).
- Structured output: Debug data is available as typed objects, JSON, or OpenTelemetry spans, integrating with existing observability stacks.
- Versioned schema: Debug output schema is versioned, with deprecation warnings for breaking changes.
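Taken together, the mitigations might combine into a production-safe configuration like this (redact_content and debug_sample_rate are the options named above; the constructor shape is an assumption):

```python
# Hedged sketch of production-safe debug collection.
memori = Memori(
    debug=True,
    redact_content=True,      # emit scores and metadata only, never raw memory text
    debug_sample_rate=0.01,   # attach debug payloads to roughly 1% of requests
)
```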
Integration with Observability Stacks
The debug output should integrate seamlessly with common observability tools:
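For example, the debug payload could be attached to an OpenTelemetry span. The OpenTelemetry calls below are the library's real API; result.debug and its fields are hypothetical, matching the earlier output sketch:

```python
from opentelemetry import trace

tracer = trace.get_tracer("memori.debug")

with tracer.start_as_current_span("memori.recall") as span:
    result = memori.recall("user travel preferences", debug=True)
    debug = result.debug  # proposed debug payload
    # Record memory-specific attributes alongside the usual trace data.
    span.set_attribute("memori.candidates_considered", len(debug.candidates))
    span.set_attribute("memori.embedding_ms", debug.timings_ms["embedding"])
    span.set_attribute("memori.search_ms", debug.timings_ms["search"])
```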
Alternatives Considered
1. External APM Integration Only
Rely on Datadog, New Relic, etc. for all observability. Rejected because it requires additional infrastructure, doesn't provide memory-specific insights, and creates vendor lock-in.
2. Verbose Logging Mode
Add a LOG_LEVEL=DEBUG that prints detailed logs. Partially adopted as a fallback, but structured debug objects are more useful for programmatic analysis than log parsing.
3. Separate Debug SDK
Ship a memori-debug package with enhanced tooling. Rejected because it fragments the ecosystem and makes debugging feel like an afterthought rather than a first-class feature.
Success Metrics
- Debug mode adoption: Track percentage of API calls with debug enabled (target: 15% in development environments)
- Mean time to resolution: Survey developers on debugging time before/after feature launch
- Support ticket reduction: Measure decrease in "memory not working" support requests
- CLI usage: Track memori inspect command invocations
- Documentation engagement: Monitor traffic to debugging guide pages
Recommendation
Ship memory observability tools as a core SDK feature. The ability to understand why memory behaves the way it does is essential for building reliable AI applications. Without these tools, developers are forced to treat memory as a black box, leading to frustration and reduced trust in the platform.
Proposed rollout:
- Phase 1 (4 weeks): Basic debug mode with candidate list and timing in Python SDK
- Phase 2 (6 weeks): CLI inspector tool and JSON export
- Phase 3 (8 weeks): OpenTelemetry integration and TypeScript SDK parity
Related Feedback
Have feedback on this proposal? Open an issue on GitHub Issues, explore the Memori Cookbook, or check the official docs.