# Memlayer Modes
Memlayer supports three operating modes, each optimized for different use cases.
**Key difference:** these modes control both salience filtering *and* storage architecture.
## LOCAL Mode (Default)

**Best for:** high-volume applications, offline usage, no ongoing costs
Uses local sentence-transformers models for both salience filtering and vector embeddings.
```python
from memlayer.wrappers.openai import OpenAI

client = OpenAI(
    storage_path="./memories",
    user_id="user123",
    salience_mode="local",  # default
)
```
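
Once constructed, the client is used like a normal chat client, and memory capture happens behind the scenes. Continuing from the snippet above, a minimal sketch, assuming the wrapper mirrors the official `openai` client's `chat.completions.create` interface (the model name below is illustrative, not a documented default):

```python
# Chat as usual; salient facts are filtered and persisted automatically.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Remember that my dog is named Biscuit."}],
)
print(response.choices[0].message.content)
```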
**Characteristics:**

- ✅ High accuracy with semantic understanding
- ✅ No API costs after initial setup
- ✅ Works completely offline
- ✅ Shared model across components (optimized)
- ✅ Full semantic vector search
- ❌ Slow startup (~7-8s model loading)
- ❌ Requires ~500MB of disk space for the model

- **Storage:** Vector (ChromaDB) + Graph (NetworkX)
- **Startup time:** ~8 seconds (first use)
- **Per-check cost:** $0 (free)
- **Search quality:** High (semantic similarity)
## ONLINE Mode

**Best for:** production apps, serverless functions, fast cold starts
Uses OpenAI's embeddings API for both salience filtering and vector embeddings.
```python
import os

from memlayer.wrappers.openai import OpenAI

client = OpenAI(
    storage_path="./memories",
    user_id="user123",
    salience_mode="online",
    api_key=os.getenv("OPENAI_API_KEY"),  # required
)
```
**Characteristics:**

- ✅ Fast startup (~2-3s, no model loading)
- ✅ No local model storage needed
- ✅ Always up-to-date embeddings
- ✅ Scales to serverless/edge environments
- ✅ Full semantic vector search
- ❌ API cost per operation (~$0.0001-0.0002)
- ❌ Requires an internet connection
- ❌ Depends on OpenAI API availability

- **Storage:** Vector (ChromaDB) + Graph (NetworkX)
- **Startup time:** ~2 seconds
- **Per-check cost:** ~$0.0001 salience + ~$0.0001 storage (~$0.0002, or 0.02¢, total)
- **Search quality:** High (semantic similarity)
**Cost estimate:**

- 10,000 operations/month ≈ $2.00
- 100,000 operations/month ≈ $20.00
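
These estimates are just the ~$0.0002 per-operation cost multiplied out; a quick sanity check:

```python
# ONLINE mode: ~$0.0001 salience check + ~$0.0001 storage embedding per operation
COST_PER_OP = 0.0001 + 0.0001  # ~$0.0002

for ops in (10_000, 100_000):
    print(f"{ops:>7,} ops/month ≈ ${ops * COST_PER_OP:,.2f}")
# 10,000 ops/month ≈ $2.00
# 100,000 ops/month ≈ $20.00
```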
## LIGHTWEIGHT Mode

**Best for:** prototyping, resource-constrained environments, maximum speed
Uses keyword matching for salience and graph-only storage (no embeddings at all).
```python
from memlayer.wrappers.openai import OpenAI

client = OpenAI(
    storage_path="./memories",
    user_id="user123",
    salience_mode="lightweight",
)
```
**Characteristics:**

- ✅ Instant startup (< 1s)
- ✅ No dependencies (no ML models)
- ✅ No API costs
- ✅ Minimal memory footprint
- ✅ Perfect for rapid prototyping
- ✅ Graph-based memory retrieval
- ❌ No semantic search (keyword/graph only)
- ❌ Lower accuracy (rule-based salience)
- ❌ May miss nuanced content

- **Storage:** Graph-only (NetworkX), no vector storage
- **Startup time:** < 1 second
- **Per-check cost:** $0 (free)
- **Search quality:** Medium (graph traversal + keywords)
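
To make "graph-only" concrete, here is an illustrative sketch of keyword-driven graph retrieval with NetworkX. The node and edge schema is invented for illustration and is not Memlayer's actual storage format:

```python
import networkx as nx

# Hypothetical memory graph: entities as nodes, relations as edge labels.
G = nx.Graph()
G.add_edge("user123", "Biscuit", relation="owns_dog")
G.add_edge("user123", "Berlin", relation="lives_in")
G.add_edge("Biscuit", "golden retriever", relation="is_a")

def retrieve(graph: nx.Graph, query: str) -> list[str]:
    """Keyword-match node names, then walk one hop to collect related facts."""
    keywords = set(query.lower().split())
    facts = []
    for node in graph.nodes:
        if node.lower() in keywords:
            for neighbor in graph.neighbors(node):
                relation = graph.edges[node, neighbor]["relation"]
                facts.append(f"({node}) -[{relation}]- ({neighbor})")
    return facts

print(retrieve(G, "tell me about biscuit"))
# ['(Biscuit) -[owns_dog]- (user123)', '(Biscuit) -[is_a]- (golden retriever)']
```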
## Comparison Table

| Feature | LOCAL | ONLINE | LIGHTWEIGHT |
|---|---|---|---|
| Startup Time | ~8s | ~2s | <1s |
| Per-Operation Cost | $0 | ~$0.0002 | $0 |
| Salience Method | Semantic (local) | Semantic (API) | Keywords |
| Storage Type | Vector + Graph | Vector + Graph | Graph only |
| Search Quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Offline Support | ✅ Yes | ❌ No | ✅ Yes |
| Disk Space | ~500MB | ~0MB | ~0MB |
| Dependencies | sentence-transformers | openai | None |
| Best For | High-volume | Production | Prototyping |
## When to Use Each Mode

**Use LOCAL when:**

- Running long-lived applications (servers, desktop apps)
- Processing high volumes (>100k checks/month)
- Need offline operation
- Startup time doesn't matter
- Want zero ongoing costs

**Use ONLINE when:**

- Deploying to serverless (Lambda, Cloud Functions)
- Need fast cold starts
- Running on edge/mobile environments
- Volume is moderate (<100k checks/month)
- API cost is acceptable

**Use LIGHTWEIGHT when:**

- Rapid prototyping and testing
- Extremely resource-constrained environments
- Maximum speed is critical
- Accuracy requirements are relaxed
- No internet connectivity
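
These decision rules are simple enough to encode directly. The helper below is just this page's guidance written as code, not part of the Memlayer API:

```python
def pick_salience_mode(
    offline_required: bool = False,
    serverless: bool = False,
    monthly_ops: int = 0,
    prototyping: bool = False,
) -> str:
    """Suggest a salience_mode from the guidance above (hypothetical helper)."""
    if prototyping:
        return "lightweight"
    if offline_required:
        return "local"  # ONLINE needs network access
    if serverless or monthly_ops < 100_000:
        return "online"  # fast cold starts, moderate volume
    return "local"  # high volume: zero marginal cost wins

print(pick_salience_mode(serverless=True, monthly_ops=50_000))  # -> online
```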

## Benchmarking

Run the comparison script to see performance on your hardware:

```bash
python examples/compare_salience_modes.py
```
Example output:

```text
Mode          Init Time    First Chat    Total First Use
--------------------------------------------------------
LIGHTWEIGHT   0.234s       2.156s        2.390s
ONLINE        1.892s       2.301s        4.193s
LOCAL         11.234s      2.189s        13.423s
```
## Advanced Configuration

### Combining with Custom Thresholds
```python
from memlayer.wrappers.openai import OpenAI

# Strict LIGHTWEIGHT (only obvious facts)
client = OpenAI(
    salience_mode="lightweight",
    salience_threshold=0.2,  # higher = stricter
)

# Permissive ONLINE (save most content)
client = OpenAI(
    salience_mode="online",
    salience_threshold=-0.05,  # lower = more permissive
)
```
### Mode-Specific Tips

**LOCAL mode:**

- Share the `embedding_model` between clients for faster multi-client initialization (see the sketch below)
- Model caching saves ~11s when creating multiple clients in the same process
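
A sketch of the shared-model pattern, assuming the wrapper accepts a preloaded model via the `embedding_model` parameter mentioned above (the model name is an assumption, not Memlayer's documented default):

```python
from sentence_transformers import SentenceTransformer
from memlayer.wrappers.openai import OpenAI

# Pay the model-loading cost once...
shared_model = SentenceTransformer("all-MiniLM-L6-v2")  # model name is an assumption

# ...then reuse the instance; later clients skip the ~8s load.
alice = OpenAI(storage_path="./alice_memories", user_id="alice",
               salience_mode="local", embedding_model=shared_model)
bob = OpenAI(storage_path="./bob_memories", user_id="bob",
             salience_mode="local", embedding_model=shared_model)
```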

**ONLINE mode:**

- Prototype embeddings are cached at init time (~2s one-time cost)
- Each salience check makes one API call (~$0.0001)

**LIGHTWEIGHT mode:**

- Customize keywords by editing `SALIENT_KEYWORDS` and `NON_SALIENT_KEYWORDS` in `ml_gate.py` (illustrated below)
- Adjust the threshold to control sensitivity
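
For intuition, here is a stripped-down version of keyword-based salience scoring. The keyword sets and scoring rule are illustrative, not the actual contents of `ml_gate.py`:

```python
# Illustrative keyword sets -- the real lists live in ml_gate.py.
SALIENT_KEYWORDS = {"my name", "i live", "i work", "remember", "allergic", "birthday"}
NON_SALIENT_KEYWORDS = {"thanks", "hello", "lol", "ok"}

def keyword_salience(text: str) -> float:
    """Roughly: (salient hits - non-salient hits), normalized by length."""
    lowered = text.lower()
    hits = sum(kw in lowered for kw in SALIENT_KEYWORDS)
    misses = sum(kw in lowered for kw in NON_SALIENT_KEYWORDS)
    return (hits - misses) / max(len(lowered.split()), 1)

print(keyword_salience("Remember that I live in Berlin"))  # positive -> save
print(keyword_salience("ok thanks, talk later"))           # negative -> skip
```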

## Implementation Details

All three modes share the same two-stage filtering pipeline (sketched after this list):

1. **Fast heuristic filter** (< 1ms)
   - Regex pattern matching
   - Catches obvious salient/non-salient content
   - Same across all modes
2. **Semantic/keyword check** (mode-specific)
   - LOCAL: sentence-transformer embeddings + cosine similarity
   - ONLINE: OpenAI embeddings + cosine similarity
   - LIGHTWEIGHT: TF-IDF keyword matching
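
An illustrative end-to-end version of this pipeline in LOCAL style. This mirrors the description above, not Memlayer's actual code; the model name, regexes, and prototype sentences are assumptions:

```python
import re

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # model name is an assumption
# Prototype sentences stand in for "memorable" content.
prototypes = model.encode(["My name is Alice.", "I am allergic to peanuts."])

OBVIOUS_SALIENT = re.compile(r"\b(my name is|i live in|remember that)\b", re.I)
OBVIOUS_NOISE = re.compile(r"^\s*(ok|thanks|lol|hi|hello)[\s!.]*$", re.I)

def is_salient(text: str, threshold: float = 0.35) -> bool:
    # Stage 1: fast heuristic filter (< 1ms), identical across modes.
    if OBVIOUS_SALIENT.search(text):
        return True
    if OBVIOUS_NOISE.match(text):
        return False
    # Stage 2: embed and compare against salient prototypes (LOCAL-style).
    score = util.cos_sim(model.encode(text), prototypes).max().item()
    return score >= threshold

print(is_salient("My name is Alice"))  # True via stage 1
print(is_salient("hmm, interesting"))  # decided by stage 2
```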

## Migration Guide

### From LOCAL to ONLINE
```python
import os

from memlayer.wrappers.openai import OpenAI

# Before
client = OpenAI(salience_mode="local")

# After
client = OpenAI(
    salience_mode="online",
    api_key=os.getenv("OPENAI_API_KEY"),
)
```

- **Benefit:** 10s faster startup; scales to serverless
- **Cost:** ~$0.0001 per salience check

### From LOCAL to LIGHTWEIGHT

```python
from memlayer.wrappers.openai import OpenAI

# Before
client = OpenAI(salience_mode="local")

# After
client = OpenAI(salience_mode="lightweight")
```

- **Benefit:** 11s faster startup, no dependencies
- **Trade-off:** ~5-10% lower accuracy on edge cases

## FAQ

**Q: Can I switch modes after initialization?**
A: No, the mode is set during `__init__()`. Create a new client to change modes.

**Q: Which mode is most cost-effective?**
A: LOCAL for >100k checks/month, ONLINE for <100k, LIGHTWEIGHT for prototyping.

**Q: Does ONLINE mode require an OpenAI API key?**
A: Yes, it uses OpenAI's embeddings API. Set the `OPENAI_API_KEY` environment variable.

**Q: Can I use ONLINE mode with other LLM providers?**
A: Currently only OpenAI embeddings are supported for ONLINE mode. Use LOCAL or LIGHTWEIGHT with other providers.

**Q: How accurate is LIGHTWEIGHT mode?**
A: ~80-90% of LOCAL/ONLINE accuracy on typical conversations; lower on nuanced content.

## Next Steps

- Try all three modes with `examples/compare_salience_modes.py`
- Read the Performance Guide for optimization tips
- Check the Examples for usage patterns