Implementation Notes

Technical design for the git-centric journal server

Core Concept

The journal server uses git as both storage engine and identifier system. Each journal section is a single file containing the current overview/synthesis, with incremental journal entries stored as git commit messages. This creates an elegant inversion where:

File contents: Always the current understanding (overview)
Commit messages: The incremental journey (journal entries)
Git history: The complete collaborative record
Git merges: Natural collaboration mechanism

File Structure

Each journal section is simply a markdown file:

journal-data/
├── project-alpha.md           # Current overview of project-alpha
├── project-beta/
│   ├── api-design.md         # Current overview of project-beta/api-design
│   └── error-handling.md     # Current overview of project-beta/error-handling
└── .git/                     # Git repository containing all history

Identifier Scheme

Journal identifiers use the format path#hash where #hash is optional:

Current overview: project-alpha/api-design
Specific journal entry: project-alpha/api-design#abc123def

The hash refers to the git commit SHA that contains the journal entry in its commit message.
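As a rough illustration of the identifier scheme, the sketch below splits an identifier into its path and optional hash and maps the path to a file. The names parse_journal_id, JournalId, and overview_file are hypothetical, and the "<path>.md under journal-data/" mapping is an assumption read off the file structure above.

```python
from typing import NamedTuple, Optional


class JournalId(NamedTuple):
    path: str               # section path, e.g. "project-alpha/api-design"
    commit: Optional[str]   # commit SHA (possibly abbreviated); None means the overview


def parse_journal_id(identifier: str) -> JournalId:
    """Split 'path#hash' into its parts; the '#hash' suffix is optional."""
    path, sep, commit = identifier.partition("#")
    path = path.strip("/")
    if not path:
        raise ValueError(f"journal identifier has no path: {identifier!r}")
    if sep and not commit:
        raise ValueError(f"journal identifier has an empty hash: {identifier!r}")
    return JournalId(path=path, commit=commit or None)


def overview_file(journal_id: JournalId, root: str = "journal-data") -> str:
    """Map a section path to its markdown file (assumed layout: <root>/<path>.md)."""
    return f"{root}/{journal_id.path}.md"
```

An identifier without a hash parses to commit=None, which a caller can treat as "return the current overview".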

MCP Server Tools

journal_search

Search journal entries by work context and content across git commit history:

Tool(
    name="journal_search",
    description="Search journal entries by work context and content",
    inputSchema={
        "type": "object",
        "properties": {
            "work_context": {"type": "string", "description": "The broader kind of work being done"},
            "content": {"type": "string", "description": "Specific content being sought"},
            "salience_threshold": {"type": "number", "default": 0.5}
        },
        "required": ["work_context", "content"]
    }
)

Returns: List of journal entries with scores and metadata:

[
    {
        "id": "project-alpha/api-design#abc123def",
        "content": "work_context: debugging memory retrieval\n\n# Today's Session...",
        "work_context_score": 0.85,
        "content_score": 0.72,
        "combined_score": 0.785,
        "timestamp": "2024-07-21T18:00:00Z"
    }
]

journal_read

Read a journal overview or specific entry:

Tool(
    name="journal_read",
    description="Read a journal overview or specific entry",
    inputSchema={
        "type": "object",
        "properties": {
            "id": {"type": "string", "description": "Journal identifier (e.g., 'project-alpha/api-design' or 'project-alpha/api-design#abc123')"}
        },
        "required": ["id"]
    }
)

Behavior:

project-alpha/api-design → Returns current file contents (overview)
project-alpha/api-design#abc123 → Returns commit message from that SHA (journal entry)
Server remembers what was read for conflict detection
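The behavior above might be dispatched roughly as follows. This is a sketch, not the server's actual handler: the three callables are hypothetical stand-ins for the file and git access, and recording a read only for overview requests (not for single entries) is an assumption.

```python
from typing import Callable, Dict


def journal_read(
    identifier: str,
    session_id: str,
    session_reads: Dict[str, Dict[str, str]],
    read_overview: Callable[[str], str],
    read_commit_message: Callable[[str], str],
    current_head: Callable[[], str],
) -> str:
    """Return the overview for 'path', or the commit message for 'path#hash'."""
    path, _, commit = identifier.partition("#")
    if commit:
        # A specific journal entry: its text is the commit message.
        return read_commit_message(commit)
    # The current overview: remember which commit this session saw,
    # so a later write can detect that HEAD has moved.
    session_reads.setdefault(session_id, {})[path] = current_head()
    return read_overview(path)
```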

journal_toc

Get the hierarchical structure of journal sections:

Tool(
    name="journal_toc", 
    description="Get the table of contents showing journal sections and subsections",
    inputSchema={
        "type": "object",
        "properties": {
            "id": {"type": "string", "description": "Starting point for TOC query (empty string for root)", "default": ""},
            "depth": {"type": "number", "description": "How many levels deep to descend", "default": 1}
        }
    }
)

Returns: Hierarchical structure with basic metadata:

{
    "id": "project-alpha",
    "type": "section",
    "last_updated": "2024-07-21T18:00:00Z",
    "entry_count": 47,  # git rev-list --count
    "subsections": [
        {
            "id": "project-alpha/api-design",
            "type": "section", 
            "last_updated": "2024-07-20T15:30:00Z",
            "entry_count": 12
        }
    ]  # if depth > 1
}
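One way the section hierarchy could be derived is by walking the data directory. The sketch below assumes a section is either a name.md file or a name/ directory, and follows the "if depth > 1" annotation above by including subsections only when depth exceeds 1. build_toc is a hypothetical name, and last_updated and entry_count are omitted because they would need git queries.

```python
from pathlib import Path
from typing import Any, Dict


def build_toc(root: Path, section_id: str = "", depth: int = 1) -> Dict[str, Any]:
    """Return a nested section node for section_id, descending while depth > 1."""
    node: Dict[str, Any] = {"id": section_id, "type": "section"}
    directory = root / section_id if section_id else root
    if depth <= 1 or not directory.is_dir():
        return node
    names = set()
    for child in directory.iterdir():
        if child.name.startswith("."):
            continue  # skip .git and other hidden entries
        if child.is_dir():
            names.add(child.name)
        elif child.suffix == ".md":
            names.add(child.stem)  # "api-design.md" -> "api-design"
    prefix = f"{section_id}/" if section_id else ""
    node["subsections"] = [
        build_toc(root, prefix + name, depth - 1) for name in sorted(names)
    ]
    return node
```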

journal_list_entries

List entries for a specific journal section with chronological paging:

Tool(
    name="journal_list_entries",
    description="List entries for a specific journal section",
    inputSchema={
        "type": "object", 
        "properties": {
            "path": {"type": "string", "description": "Journal section path"},
            "start": {"type": "number", "description": "Starting index (0 = most recent)", "default": 0},
            "length": {"type": "number", "description": "Number of entries to return", "default": 10}
        },
        "required": ["path"]
    }
)

Returns: Chronological list of entries:

[
    {"id": "project-alpha#abc123", "timestamp": "2024-07-21T18:00:00Z", "summary": "debugging session"},
    {"id": "project-alpha#def456", "timestamp": "2024-07-20T15:30:00Z", "summary": "api design work"}
]
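The start and length parameters map naturally onto git log's --skip and --max-count options. The following sketch builds such a command and parses its output. The function names are hypothetical, and it assumes each section lives at <path>.md and that the entry summary is the commit's subject line.

```python
from typing import Dict, List

# Tab-separated: full SHA, committer date (strict ISO 8601), subject line.
LOG_FORMAT = "%H%x09%cI%x09%s"


def list_entries_command(path: str, start: int = 0, length: int = 10) -> List[str]:
    """Build a `git log` invocation that pages through one section's commits."""
    return [
        "git", "log",
        f"--skip={start}",         # 0 = start at the most recent commit
        f"--max-count={length}",
        f"--format={LOG_FORMAT}",
        "--", f"{path}.md",        # limit history to this section's file
    ]


def parse_entries(path: str, log_output: str) -> List[Dict[str, str]]:
    """Turn the tab-separated log output into entry records."""
    entries = []
    for line in log_output.splitlines():
        if not line:
            continue
        sha, timestamp, summary = line.split("\t", 2)
        entries.append(
            {"id": f"{path}#{sha[:7]}", "timestamp": timestamp, "summary": summary}
        )
    return entries
```

The command list can be passed to subprocess.run from inside the journal-data directory; git log already returns commits newest first, which matches "0 = most recent".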

journal_write

Add a new journal entry and optionally update the overview synthesis:

Tool(
    name="journal_write",
    description="Add a new journal entry and optionally update the overview synthesis",
    inputSchema={
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Journal section path (no #hash)"},
            "entry": {"type": "string", "description": "Journal entry that covers what has changed, been learned, etc. (becomes commit message)"},
            "overview": {"type": "string", "description": "Optional updated overview content when the entry represents a shift in overall understanding or strategy"},
            "summary": {"type": "string", "description": "Optional brief summary for the commit"}
        },
        "required": ["path", "entry"]
    }
)

Write Protection:

Writing only permitted after reading the journal section
Server tracks {session_id: {path: last_read_commit_hash}}
If HEAD has moved since read, returns merge error
Client must re-read and retry
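From the client's side, the re-read-and-retry rule could look like the loop below. This is only a sketch: JournalConflict is a hypothetical exception standing in for the merge error, and the client object is assumed to expose journal_read and journal_write methods mirroring the tools.

```python
class JournalConflict(Exception):
    """Hypothetical error: HEAD moved since this session's last read."""


def write_with_retry(client, path: str, entry: str, max_attempts: int = 3) -> str:
    """Read, then write; on a conflict, re-read and try again."""
    for _ in range(max_attempts):
        client.journal_read(path)  # server records the commit this session saw
        try:
            return client.journal_write(path, entry)
        except JournalConflict:
            continue  # someone else committed in between; re-read and retry
    raise JournalConflict(f"gave up on {path} after {max_attempts} attempts")
```

In practice an LLM client would likely revise the entry after re-reading rather than resubmit it unchanged; the loop only shows the protocol.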

Git Workflow

Adding Journal Entries

Each journal update creates a git commit, following one of two patterns:

Entry-only commits (most common):

Read current state: journal_read("project-alpha/api-design") (server remembers HEAD)
Add journal entry: Call journal_write with just entry parameter
File modification: System increments entry count comment based on current git history
Git commit: Full entry goes in commit message, minimal file change enables git tracking
Conflict resolution: If entry count conflicts, resolve by counting actual commits in git history for this path

Entry + overview commits (consolidation moments):

Read current state: Same as above
Update understanding: Call journal_write with both entry and overview parameters
File replacement: New overview content replaces file, entry count comment preserved
Git commit: Entry in commit message, substantial file change captures new synthesis

File Structure with Entry Count

Journal files maintain a clean overview section plus an entry count for conflict avoidance:

# Current Understanding of API Design

Our current approach focuses on REST endpoints with...

[Main overview content here]

<!-- entry count: 47 -->

When journal_read loads overview content, it strips the entry count comment before returning it to the LLM. The count is the number of journal entries (git commits) for this section. Because an entry-only commit still needs a file change for git to associate it with this path, bumping the count supplies a minimal change that can be merged automatically.
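Handling the entry count comment comes down to two small text operations, sketched here with hypothetical helper names. The sketch assumes the comment is always the last thing in the file, as in the example above. The count itself would come from git (for example via git rev-list --count restricted to the section's file).

```python
import re

# Matches the trailing comment plus any blank lines around it.
ENTRY_COUNT = re.compile(r"\n*<!-- entry count: (\d+) -->\s*$")


def strip_entry_count(text: str) -> str:
    """Return the overview without the trailing entry-count comment."""
    return ENTRY_COUNT.sub("\n", text)


def set_entry_count(text: str, count: int) -> str:
    """Replace (or append) the entry-count comment with the given count."""
    body = ENTRY_COUNT.sub("", text).rstrip("\n")
    return f"{body}\n\n<!-- entry count: {count} -->\n"
```

set_entry_count also works on a freshly supplied overview that has no comment yet, which covers the entry + overview case.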

Commit Message Format

Commit messages contain the journal entry with structured metadata:

work_context: debugging memory retrieval issues

# Today's Debugging Session

We discovered that the async retrieval pattern was failing because...

Key insights:
- Pattern X works better than Y when dealing with temporal data  
- The salience threshold needs to be context-dependent

This led us to update our understanding of error handling patterns...
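A minimal sketch of writing and reading this format follows, assuming the work_context line is always the first line of the message. format_entry and split_entry are hypothetical names.

```python
from typing import Optional, Tuple


def format_entry(work_context: str, body: str) -> str:
    """Build a commit message: metadata line, blank line, then the entry."""
    return f"work_context: {work_context}\n\n{body.strip()}\n"


def split_entry(message: str) -> Tuple[Optional[str], str]:
    """Split a commit message into (work_context, entry body).

    Messages without a leading work_context line are returned unchanged.
    """
    first, _, rest = message.partition("\n")
    key, sep, value = first.partition(":")
    if sep and key.strip() == "work_context":
        return value.strip(), rest.lstrip("\n")
    return None, message
```

One caveat: entry bodies contain markdown headings that start with #, which git treats as comment lines and strips when a message is edited interactively. Supplying the message non-interactively (as a server would) avoids this under git's default cleanup behavior.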

Conflict Resolution

For file conflicts: Auto-rebase and merge, since journal entries are typically independent

For overview conflicts: LLM synthesis tool merges conflicting understandings:

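Tool(
    name="journal_synthesize_conflict",
    description="Synthesize conflicting journal overviews using LLM",
    inputSchema={
        "type": "object",
        "properties": {
            "section": {"type": "string", "description": "Journal section path (e.g., 'project-alpha')"},
            "version_a": {"type": "string", "description": "Overview from one session (e.g., '# Understanding from session 1...')"},
            "version_b": {"type": "string", "description": "Overview from the other session (e.g., '# Understanding from session 2...')"},
            "work_context": {"type": "string", "description": "What kind of work led to this conflict"}
        },
        "required": ["section", "version_a", "version_b", "work_context"]
    }
)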

Search Implementation

Dual-Dimension Matching

Search operates on git commit messages using semantic embeddings:

class JournalSearch:
    def __init__(self, git_repo, embeddings_model):
        self.repo = git_repo
        self.embeddings = embeddings_model
    
    async def search(self, work_context: str, content: str, salience_threshold: float = 0.5):
        # Get all commits across all journal files
        commits = self.repo.iter_commits(all=True)
        
        # Extract commit messages and metadata
        candidates = []
        for commit in commits:
            if self.is_journal_commit(commit):
                candidates.append({
                    'id': f"{self.get_journal_path(commit)}#{commit.hexsha[:7]}",
                    'content': commit.message,
                    'timestamp': commit.committed_datetime,
                    'salience': calculate_temporal_salience(commit.committed_datetime)  # module-level function, not a method
                })
        
        # Filter by temporal salience
        candidates = [c for c in candidates if c['salience'] >= salience_threshold]
        
        # Score both dimensions
        results = []
        for candidate in candidates:
            work_context_score = await self.semantic_similarity(work_context, candidate['content'])
            content_score = await self.semantic_similarity(content, candidate['content'])
            combined_score = (work_context_score + content_score) / 2
            
            if combined_score > salience_threshold:
                results.append({
                    **candidate,
                    'work_context_score': work_context_score,
                    'content_score': content_score,
                    'combined_score': combined_score
                })
        
        return sorted(results, key=lambda x: x['combined_score'], reverse=True)
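The search method leaves semantic_similarity undefined. A common choice, assumed here rather than taken from the design, is cosine similarity between embedding vectors. The sketch also shows how the context_weight and content_weight settings from the configuration could replace the plain average. Both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combine_scores(
    work_context_score: float,
    content_score: float,
    context_weight: float = 0.5,
    content_weight: float = 0.5,
) -> float:
    """Weighted average of the two dimensions; equal weights give the plain mean."""
    total = context_weight + content_weight
    return (context_weight * work_context_score + content_weight * content_score) / total
```

With the default equal weights, scores of 0.85 and 0.72 combine to 0.785, matching the journal_search example output.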

Temporal Salience

Recent commits are more easily accessible; older commits require higher relevance:

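from datetime import datetime, timezone

def calculate_temporal_salience(commit_timestamp: datetime) -> float:
    # GitPython's committed_datetime is timezone-aware, so "now" must be aware too;
    # subtracting an aware datetime from a naive one raises TypeError.
    age_days = (datetime.now(timezone.utc) - commit_timestamp).days
    half_life_days = 30  # Configurable
    decay_factor = 0.5 ** (age_days / half_life_days)
    return decay_factor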
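To make the decay concrete, the sketch below works on an age in days instead of a timestamp and adds an optional floor. Treating minimum_salience from the configuration as a lower bound is an assumption, and both function names are hypothetical.

```python
import math


def salience_for_age(
    age_days: float,
    half_life_days: float = 30.0,
    minimum_salience: float = 0.0,
) -> float:
    """Exponential decay: salience halves every half_life_days."""
    decay = 0.5 ** (max(age_days, 0.0) / half_life_days)
    return max(decay, minimum_salience)


def oldest_age_passing(threshold: float, half_life_days: float = 30.0) -> float:
    """Age in days at which the (unfloored) decay drops to `threshold`."""
    return half_life_days * math.log2(1.0 / threshold)
```

With a 30-day half-life the salience is 1.0 on day 0, 0.5 on day 30, and 0.25 on day 60. A consequence worth noting: with the default threshold of 0.5, a hard filter on temporal salience excludes every commit older than 30 days, however relevant it is.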

Session Management

The server maintains session state for conflict detection:

class SessionManager:
    def __init__(self):
        self.session_reads = {}  # {session_id: {path: commit_hash}}
    
    def record_read(self, session_id: str, path: str, commit_hash: str):
        if session_id not in self.session_reads:
            self.session_reads[session_id] = {}
        self.session_reads[session_id][path] = commit_hash
    
    def check_conflicts(self, session_id: str, path: str, current_head: str) -> bool:
        if session_id not in self.session_reads:
            return True  # No read recorded, conflict
        if path not in self.session_reads[session_id]:
            return True  # Path not read, conflict
        return self.session_reads[session_id][path] != current_head

Configuration

{
    "journal_data_path": "./journal-data",
    "git_config": {
        "auto_gc": true,
        "commit_author": "Journal Server <journal@localhost>"
    },
    "temporal_decay": {
        "half_life_days": 30,
        "minimum_salience": 0.1
    },
    "search": {
        "default_salience_threshold": 0.5,
        "max_results": 20,
        "context_weight": 0.5,
        "content_weight": 0.5
    },
    "embeddings": {
        "model": "sentence-transformers/all-MiniLM-L6-v2",
        "cache_path": "./embeddings-cache"
    }
}
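A configuration file like this could be loaded by overlaying it on built-in defaults, as in the sketch below. load_config and DEFAULTS are hypothetical, only two sections are shown, and the check that the two search weights sum to 1 is an assumption about how they are meant to be used.

```python
import json
from typing import Any, Dict

DEFAULTS: Dict[str, Dict[str, Any]] = {
    "temporal_decay": {"half_life_days": 30, "minimum_salience": 0.1},
    "search": {
        "default_salience_threshold": 0.5,
        "max_results": 20,
        "context_weight": 0.5,
        "content_weight": 0.5,
    },
}


def load_config(text: str) -> Dict[str, Any]:
    """Overlay user settings on the defaults, one section at a time."""
    user = json.loads(text)
    config: Dict[str, Any] = {name: dict(values) for name, values in DEFAULTS.items()}
    for name, values in user.items():
        if isinstance(values, dict):
            config.setdefault(name, {}).update(values)
        else:
            config[name] = values
    search = config["search"]
    # Assumption: the two weights are meant to form a weighted average.
    if abs(search["context_weight"] + search["content_weight"] - 1.0) > 1e-9:
        raise ValueError("search.context_weight and search.content_weight should sum to 1")
    return config
```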

Future Enhancements

Git synchronization: Pull/push for multi-user collaboration
Branch support: Explore different understanding paths
Merge strategies: Advanced conflict resolution patterns
Performance optimization: Incremental search indexing
Rich commit metadata: Structured frontmatter in commit messages

Why This Design Works

This git-centric approach elegantly solves several problems:

Natural collaboration: Git's merge machinery handles multiple sessions
Simple storage: Just markdown files + git, no complex databases
Rich history: Full journey preserved in commit messages
Familiar tooling: Standard git commands work for exploration
Conflict resolution: Leverages both git automation and LLM synthesis
Temporal relevance: Git commit timestamps supply the age used for salience decay

The journal becomes a living document where the current understanding is always visible in the file, while the collaborative journey lives in the git history.

This design transforms git from a version control system into a collaborative memory engine.
