Hippo MVP Design Document
AI-Generated Salient Insights - Minimal Viable Prototype
Core Hypothesis
Can AI-generated insights + reinforcement learning + embracing messiness actually surface more valuable knowledge than traditional structured memory systems?
The key insights:
- Generate insights cheaply and frequently - let AI create many insights without perfect organization
- Let natural selection through reinforcement determine what survives - user feedback shapes what becomes prominent
- Embrace the mess - don't try to create highly structured taxonomies or perfect categorization
- Trust temporal scoring - let time, usage patterns, and reinforcement naturally organize knowledge
This approach contrasts with traditional knowledge management that emphasizes upfront structure, careful categorization, and manual curation. Instead, Hippo bets that organic emergence through usage patterns can be more effective than imposed structure.
MVP Scope
What It Does
- Automatic Insight Generation: AI generates insights continuously during conversation at natural moments (consolidation, "make it so", "ah-ha!" moments, pattern recognition)
- Simple Storage: Single JSON file with configurable path
- Natural Decay: Insights lose relevance over time unless reinforced
- Reinforcement: During consolidation moments, user can upvote/downvote insights
- Context-Aware Search: Retrieval considers both content and situational context with fuzzy matching
What It Doesn't Do (Yet)
- Graph connections between insights
- Complex reinforcement algorithms
- Cross-session learning
- Memory hierarchy (generic vs project-specific)
- Automatic insight detection triggers
Temporal Scoring System
Core Concept
Insights are ranked using a composite relevance score that combines four factors (recency, frequency, importance, and context), an approach based on research in information retrieval systems. This ensures that recently accessed, frequently used, and important insights surface first while contextual relevance is preserved.
Composite Relevance Formula
relevance = 0.30 × recency + 0.20 × frequency + 0.35 × importance + 0.15 × context
Weighting Rationale:
- Importance (35%): Highest weight - user feedback through reinforcement learning
- Recency (30%): Second highest - recently accessed insights are more likely relevant
- Frequency (20%): Regular usage indicates ongoing value
- Context (15%): Situational matching for query relevance
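The weighted sum above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Factors` container and the `composite_relevance` function name are hypothetical, and each factor is assumed to already be normalized to the 0-1 range.

```python
from dataclasses import dataclass

# Weights taken from the formula above; they sum to 1.0.
WEIGHTS = {"recency": 0.30, "frequency": 0.20, "importance": 0.35, "context": 0.15}


@dataclass
class Factors:
    """Hypothetical container for the four factor scores, each assumed to be in [0, 1]."""
    recency: float
    frequency: float
    importance: float
    context: float


def composite_relevance(f: Factors) -> float:
    """Weighted sum of the four factors, following the formula above."""
    return (
        WEIGHTS["recency"] * f.recency
        + WEIGHTS["frequency"] * f.frequency
        + WEIGHTS["importance"] * f.importance
        + WEIGHTS["context"] * f.context
    )
```

Because the weights sum to 1.0, an insight that scores 1.0 on every factor has relevance 1.0, and an insight whose only non-zero factor is a perfect importance score has relevance 0.35.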
Temporal Factors
Recency Score
Exponential decay based on days since last access (counted in active days, see Active Day System below):
recency = exp(-0.05 × days_since_last_access)
- Recent access (day 0): score ≈ 1.0
- One week old: score ≈ 0.7
- One month old: score ≈ 0.2
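The decay curve can be checked with a few lines of Python. The function and constant names here are illustrative only; the rate of 0.05 comes from the formula above.

```python
import math

RECENCY_DECAY_RATE = 0.05  # per day, from the formula above


def recency_score(days_since_last_access: float) -> float:
    """Exponential decay: 1.0 at day 0, approaching 0 as the gap grows."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    return math.exp(-RECENCY_DECAY_RATE * days_since_last_access)


for days in (0, 7, 30):
    print(days, round(recency_score(days), 3))
# prints:
# 0 1.0
# 7 0.705
# 30 0.223
```

With this rate the score halves roughly every 14 days (ln 2 / 0.05 ≈ 13.9).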
Frequency Score
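Uses a 30-day sliding window to prevent dilution from ancient history:
frequency = total_accesses_in_last_30_days / 30
- The resulting accesses-per-day rate is normalized to the 0-1 range using a maximum reasonable frequency cap (10 accesses per day, see System Constants)
- Avoids the "funny frequency behavior" of a lifetime average, where a long gap in usage keeps dragging the score down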
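One way to read the windowed frequency calculation is sketched below. The document does not spell out exactly how the cap is applied, so dividing the per-day rate by the cap and clamping at 1.0 is an assumption, as are the function name and the choice to count the window in active days. The `[active_day, count]` pair layout follows the Data Model section.

```python
FREQUENCY_WINDOW_DAYS = 30       # sliding window length
MAX_REASONABLE_FREQUENCY = 10.0  # accesses per day used for normalization


def frequency_score(daily_access_counts, current_day):
    """Score access frequency over the most recent window.

    daily_access_counts: list of [active_day, count] pairs.
    current_day: the current active day number.
    """
    window_start = current_day - FREQUENCY_WINDOW_DAYS + 1
    total = sum(
        count
        for day, count in daily_access_counts
        if window_start <= day <= current_day
    )
    per_day = total / FREQUENCY_WINDOW_DAYS
    # Assumed normalization: scale by the cap and clamp to [0, 1].
    return min(1.0, per_day / MAX_REASONABLE_FREQUENCY)
```

For the access history `[[1, 3], [5, 2], [15, 1]]` on active day 15, all six accesses fall inside the window, giving 6 / 30 = 0.2 accesses per day and a score of 0.02. On active day 40 only the day-15 access remains in the window.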
Active Day System
Time advances only when the system is actively used, making scoring "vacation-proof":
- Calendar days without usage don't advance temporal calculations
- Ensures insights don't decay during periods of non-use
- Maintains relevance relationships based on actual usage patterns
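A minimal sketch of how an active-day counter could be advanced is shown below. It assumes the counter increases by exactly one the first time the system is used on a new calendar date, which the document implies but does not state; the function name is hypothetical, while the two state fields match the Data Model section.

```python
from datetime import date


def advance_active_day(state: dict, today: date) -> dict:
    """Bump the active-day counter at most once per calendar date of use.

    Calendar days on which the system is never used do not touch the counter,
    so a long break adds only a single active day when usage resumes.
    """
    today_str = today.isoformat()
    if state.get("last_calendar_date_used") != today_str:
        state["active_day_counter"] = state.get("active_day_counter", 0) + 1
        state["last_calendar_date_used"] = today_str
    return state
```

Under this assumption, a store last used on 2025-07-26 with counter 15 moves to counter 16 when next used on 2025-08-20, regardless of the 25 idle calendar days in between.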
Reinforcement Learning
Importance Modification
- Upvote: new_importance = min(1.0, current_importance × 1.5)
- Downvote: new_importance = current_importance × 0.5
- Decay: current_importance = base_importance × 0.9^days_since_reinforcement
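The three importance rules translate directly into code. The function and constant names below are illustrative; how a vote is written back to the stored `base_importance` is not specified here, so that step is left out.

```python
UPVOTE_MULTIPLIER = 1.5
DOWNVOTE_MULTIPLIER = 0.5
IMPORTANCE_DECAY_BASE = 0.9  # applied once per day since the last reinforcement


def current_importance(base_importance: float, days_since_reinforcement: float) -> float:
    """Decayed importance: base_importance * 0.9 ** days_since_reinforcement."""
    return base_importance * IMPORTANCE_DECAY_BASE ** days_since_reinforcement


def upvote(importance: float) -> float:
    """Boost importance by 1.5x, capped at 1.0."""
    return min(1.0, importance * UPVOTE_MULTIPLIER)


def downvote(importance: float) -> float:
    """Halve importance; there is no lower cap other than approaching 0."""
    return importance * DOWNVOTE_MULTIPLIER
```

For example, an insight with base importance 0.8 decays to 0.8 × 0.81 = 0.648 after two days without reinforcement, and an upvote applied to 0.8 saturates at the 1.0 cap.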
Learning Principle
User feedback (upvotes/downvotes) directly modifies importance, which has the highest weight in relevance calculation. This creates a feedback loop where valuable insights become more prominent over time.
Search Architecture
Two-Phase Process
- Scoring Phase: Compute relevance for all insights with minimal filtering
- Filtering Phase: Apply user-specified relevance ranges and pagination
Distribution Metadata
Search returns relevance distribution across all insights for the given query/situation, helping clients understand what additional data exists beyond filtered results.
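The two phases and the distribution metadata might fit together roughly as follows. This is a sketch under stated assumptions: the function name, parameter names, result keys, and the choice of ten equal-width buckets for the distribution are all hypothetical and not taken from the actual tool interface.

```python
from collections import Counter


def search(scored, min_relevance=0.0, max_relevance=1.0, offset=0, limit=10):
    """Filter and paginate already-scored insights.

    scored: list of (insight_id, relevance) pairs produced by the scoring phase.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)

    # Distribution is computed over ALL scored insights, before filtering.
    # Bucket index 0 covers [0.0, 0.1), ..., index 9 covers [0.9, 1.0].
    distribution = Counter(min(int(rel * 10), 9) for _, rel in ranked)

    # Filtering phase: relevance range first, then pagination.
    in_range = [pair for pair in ranked if min_relevance <= pair[1] <= max_relevance]
    page = in_range[offset:offset + limit]

    return {
        "results": page,
        "total_matching": len(in_range),
        "distribution": dict(distribution),
    }
```

Because the distribution is built before the range filter is applied, a client that asks for only the top result still sees how many lower-scoring insights exist and can widen its range if needed.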
Semantic Matching
- Content: Uses sentence transformers for semantic similarity with substring boost
- Situation: Combines exact matching (high score) with semantic similarity fallback
- Thresholds: Content and situation relevance must exceed 0.4 to be considered matches
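The situation-matching rule (exact match first, semantic fallback, then a 0.4 threshold) can be sketched as below. To keep the example self-contained, a toy token-overlap function stands in for the sentence-transformer similarity; it is not what the system uses. The value 1.0 for an exact match is also an assumption, since the document only says "high score".

```python
from typing import Callable, Sequence

MATCH_THRESHOLD = 0.4  # scores must exceed this to count as a match


def token_overlap(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def situation_score(
    query: str,
    situations: Sequence[str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> float:
    """Exact match scores 1.0 (assumed); otherwise use the best similarity."""
    q = query.strip().lower()
    if any(q == s.strip().lower() for s in situations):
        return 1.0
    return max((similarity(query, s) for s in situations), default=0.0)


def is_match(score: float) -> bool:
    """Apply the match threshold (strictly greater than 0.4)."""
    return score > MATCH_THRESHOLD
```

A real embedding-based `similarity` callable can be passed in place of `token_overlap` without changing the rest of the logic.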
Data Model
{
  "active_day_counter": 15,
  "last_calendar_date_used": "2025-07-26",
  "insights": [
    {
      "uuid": "abc123-def456-789",
      "content": "User prefers dialogue format over instruction lists",
      "situation": ["design discussion", "collaboration patterns"],
      "base_importance": 0.8,
      "created_at": "2025-07-23T17:00:00Z",
      "importance_last_modified_at": "2025-07-25T10:30:00Z",
      "daily_access_counts": [
        [1, 3],  // Active day 1: 3 accesses
        [5, 2],  // Active day 5: 2 accesses
        [15, 1]  // Active day 15: 1 access
      ]
    }
  ]
}
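For illustration, the structure above maps onto two small dataclasses. The class and function names are hypothetical; the field names are copied from the example. Note that the `//` annotations in the example are explanatory only: standard JSON does not allow comments, so a file parsed with Python's `json` module must not contain them.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Insight:
    uuid: str
    content: str
    situation: List[str]
    base_importance: float
    created_at: str
    importance_last_modified_at: str
    # Each entry is an [active_day, access_count] pair.
    daily_access_counts: List[List[int]] = field(default_factory=list)


@dataclass
class Store:
    active_day_counter: int
    last_calendar_date_used: str
    insights: List[Insight] = field(default_factory=list)


def load_store(text: str) -> Store:
    """Parse the JSON document into typed objects (no schema validation)."""
    raw = json.loads(text)
    return Store(
        active_day_counter=raw["active_day_counter"],
        last_calendar_date_used=raw["last_calendar_date_used"],
        insights=[Insight(**item) for item in raw.get("insights", [])],
    )
```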
Key Design Principles
Active Day System: Time advances only when the system is used, preventing decay during vacations or periods of non-use.
Bounded Storage: Access history limited to recent entries (typically 90) to prevent unbounded growth while maintaining sufficient data for frequency calculations.
Reinforcement Decay: Importance modifications decay over time, requiring ongoing reinforcement to maintain high relevance.
Situational Context: Multi-element situation arrays enable flexible matching against various contextual filters.
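The bounded-storage principle can be sketched as a record-then-prune step on the `daily_access_counts` list. The function name is hypothetical, the limit of 90 entries is the "typical" figure quoted above, and the sketch assumes entries are always appended in ascending active-day order.

```python
MAX_ACCESS_ENTRIES = 90  # "typically 90" per the principle above


def record_access(daily_access_counts, active_day, max_entries=MAX_ACCESS_ENTRIES):
    """Count one access on active_day, then drop the oldest entries beyond the limit.

    daily_access_counts: list of [active_day, count] pairs, oldest first.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    if daily_access_counts and daily_access_counts[-1][0] == active_day:
        daily_access_counts[-1][1] += 1
    else:
        daily_access_counts.append([active_day, 1])
    # Keep only the most recent max_entries pairs (no-op when the list is short).
    del daily_access_counts[:-max_entries]
    return daily_access_counts
```

Since the frequency window covers only 30 active days, keeping 90 entries leaves comfortable headroom for the frequency calculation.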
System Constants
Core parameters that tune the temporal scoring behavior:
- Recency decay rate: 0.05 per active day
- Frequency window: 30 active days
- Upvote multiplier: 1.5×
- Downvote multiplier: 0.5×
- Importance decay base: 0.9 per day since last reinforcement
- Relevance weights: 30% recency, 20% frequency, 35% importance, 15% context
- Match thresholds: 0.4 for content and situation relevance
- Maximum reasonable frequency: 10 accesses per day (for normalization)
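These constants could be grouped into a single immutable configuration object, which also makes it easy to check that the relevance weights sum to 1. The class and field names are hypothetical. The importance decay base of 0.9 is included from the Reinforcement Learning section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    recency_decay_rate: float = 0.05       # per active day
    frequency_window_days: int = 30        # active days
    upvote_multiplier: float = 1.5
    downvote_multiplier: float = 0.5
    importance_decay_base: float = 0.9     # from the Reinforcement Learning section
    weight_recency: float = 0.30
    weight_frequency: float = 0.20
    weight_importance: float = 0.35
    weight_context: float = 0.15
    match_threshold: float = 0.4
    max_reasonable_frequency: float = 10.0  # accesses per day

    def __post_init__(self):
        total = (
            self.weight_recency
            + self.weight_frequency
            + self.weight_importance
            + self.weight_context
        )
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"relevance weights must sum to 1.0, got {total}")
```

Constructing `ScoringConfig()` with the defaults succeeds, while changing one weight without rebalancing the others raises a `ValueError`.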
Philosophy: Embracing Messiness
Traditional knowledge management systems emphasize structure: taxonomies, categories, tags, hierarchies. Hippo takes the opposite approach - embrace the mess and let value emerge organically.
Why Embrace Messiness:
- Cognitive overhead: Structured systems require constant categorization decisions
- Premature optimization: We often don't know what will be valuable until later
- Natural emergence: Usage patterns reveal value better than upfront planning
- Reduced friction: No need to "file" insights perfectly before storing them
How Messiness Works in Hippo:
- Situational context instead of rigid categories - insights tagged with when/where they occurred
- Fuzzy matching - "debugging React" can surface "debugging authentication" insights
- Temporal scoring - let time and usage naturally separate wheat from chaff
- Reinforcement learning - user feedback shapes what becomes prominent over time
The bet: A messy system with good search and temporal scoring will outperform a perfectly organized system that's too expensive to maintain.
Implementation Architecture
MCP Server Interface
Hippo implements a Model Context Protocol (MCP) server that provides the following tools:
- record_insight: Create new insights with content, situation, and importance
- search_insights: Query insights with semantic and situational filters
- modify_insight: Update content or apply reinforcement (upvote/downvote)
Storage Layer
- JSON file storage: Single configurable file for persistence
- In-memory operations: All temporal calculations performed in memory
- Bounded growth: Access history automatically pruned to prevent unbounded storage
Search Engine
- Semantic similarity: Uses sentence transformers for content matching
- Situational matching: Combines exact and semantic matching for context
- Composite scoring: Real-time relevance calculation using temporal factors
- Distribution metadata: Reports the relevance distribution across all insights so clients can see what exists beyond the filtered results
Testing Strategy
Integration Testing Philosophy
Tests validate behavior through stable MCP interfaces rather than internal implementation details:
- Temporal scenarios: Create insights, advance time, verify scoring changes
- Controllable time: Test time controller allows arbitrary day advancement
- In-memory storage: Tests run without disk I/O for speed and isolation
- Realistic workflows: Tests mirror actual usage patterns
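A controllable clock of the kind described above might look like the following. This is a sketch only: `FakeClock`, `recency`, and the test function are hypothetical names, and the real test suite exercises the MCP tools rather than calling a scoring function directly.

```python
import math


class FakeClock:
    """Hypothetical test clock that tracks the current active day."""

    def __init__(self, start_day: int = 1):
        self.active_day = start_day

    def advance(self, days: int) -> int:
        """Move time forward by a number of active days."""
        if days < 0:
            raise ValueError("cannot move time backwards")
        self.active_day += days
        return self.active_day


def recency(last_access_day: int, clock: FakeClock, decay_rate: float = 0.05) -> float:
    """Recency score relative to the clock's current active day."""
    return math.exp(-decay_rate * (clock.active_day - last_access_day))


def test_recency_decays_as_time_advances():
    clock = FakeClock(start_day=1)
    fresh = recency(1, clock)      # accessed today
    clock.advance(7)               # one week of active days passes
    week_old = recency(1, clock)
    assert fresh == 1.0
    assert week_old < fresh
    assert abs(week_old - math.exp(-0.35)) < 1e-12
```

Injecting the clock keeps the scenario deterministic: the test never sleeps or reads the wall clock, and the same pattern extends to frequency-window and reinforcement-decay scenarios.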
Key Test Coverage
- Recency decay: Validates exponential decay over time
- Frequency windows: Confirms 30-day sliding window prevents dilution
- Reinforcement learning: Verifies upvote/downvote effects on importance
- Search distribution: Ensures metadata accurately reflects available data
Future Considerations
Potential Enhancements
- Graph connections: Link related insights for enhanced discovery
- Automatic triggers: Detect natural insight generation moments
- Cross-session learning: Adapt scoring based on usage patterns
- Memory hierarchy: Separate generic vs project-specific insights
Key Design Decisions
Active Day System
Time advances only when the system is actively used, making all temporal calculations "vacation-proof". This ensures insights don't decay during periods of non-use while maintaining meaningful temporal relationships.
Composite Relevance Scoring
Rather than simple recency or frequency ranking, Hippo uses a research-based weighted formula combining multiple factors. This provides more nuanced ranking that reflects actual insight value.
Reinforcement Learning Integration
User feedback directly modifies importance scores, which carry the highest weight in relevance calculation. This creates a feedback loop where valuable insights become more prominent over time.
Situational Context Matching
Insights include multi-element situation arrays enabling flexible contextual search. This allows matching against various aspects of when/where insights occurred.
Bounded Storage Growth
Access history is automatically pruned to prevent unbounded growth while maintaining sufficient data for accurate frequency calculations.
Research Foundation
The temporal scoring system is based on established research in information retrieval systems, specifically the principle that relevance should combine:
- Temporal factors: Recency and frequency of access
- Content factors: Semantic similarity and importance
- Context factors: Situational relevance to current query
The specific weighting (30/20/35/15%) reflects the relative importance of these factors for knowledge management systems where user feedback (importance) should dominate over purely temporal factors.
Validation Approach
The system includes comprehensive integration tests that validate temporal behavior through realistic scenarios:
- Create insights with known characteristics
- Advance time using controllable test infrastructure
- Verify that relevance scores change as expected
- Confirm that reinforcement learning affects ranking appropriately
This testing approach ensures the temporal scoring system behaves correctly over time, which is a prerequisite for evaluating the core hypothesis that AI-generated insights + user reinforcement can surface valuable knowledge effectively.
For detailed API specifications and implementation details, consult the source code and test suite.