Memlayer Quickstart
Get started with Memlayer in under 5 minutes. This guide shows you how to add persistent memory to any LLM.
Installation
pip install memlayer
Provider-Specific Dependencies
Install the SDK for your chosen provider:
# OpenAI
pip install openai
# Anthropic Claude
pip install anthropic
# Google Gemini
pip install google-generativeai
# Ollama (local models)
pip install ollama
Quick Start Examples
OpenAI
from memlayer.wrappers.openai import OpenAI
# Initialize with memory
client = OpenAI(
    api_key="your-openai-api-key",
    model="gpt-4.1-mini",
    user_id="alice"
)
# First conversation - teach it something
response = client.chat([
    {"role": "user", "content": "My name is Alice and I work on Project Phoenix"}
])
print(response)
# Later conversation - it remembers!
response = client.chat([
    {"role": "user", "content": "What project do I work on?"}
])
print(response)
# Output: "You work on Project Phoenix."
Anthropic Claude
from memlayer.wrappers.claude import Claude
client = Claude(
    api_key="your-anthropic-api-key",
    model="claude-3-5-sonnet-20241022",
    user_id="alice"
)
# Use exactly like OpenAI wrapper
response = client.chat([
    {"role": "user", "content": "Remember: my favorite color is blue"}
])
Google Gemini
from memlayer.wrappers.gemini import Gemini
client = Gemini(
    api_key="your-gemini-api-key",
    model="gemini-2.5-flash",
    user_id="alice"
)
response = client.chat([
    {"role": "user", "content": "I live in San Francisco"}
])
Ollama (Local Models)
from memlayer.wrappers.ollama import Ollama
# Make sure the Ollama server is running: ollama serve
client = Ollama(
    model="llama3.2",
    host="http://localhost:11434",
    user_id="alice",
    operation_mode="local"  # Use local embeddings too
)
response = client.chat([
    {"role": "user", "content": "My dog's name is Max"}
])
Environment Variables
Instead of passing API keys in code, use environment variables:
# OpenAI
export OPENAI_API_KEY="your-key"
# Anthropic Claude
export ANTHROPIC_API_KEY="your-key"
# Google Gemini
export GOOGLE_API_KEY="your-key"
Then initialize without the api_key parameter:
from memlayer.wrappers.openai import OpenAI
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice"
)
Basic Usage Patterns
1. Regular Chat (Non-Streaming)
# Single turn
response = client.chat([
    {"role": "user", "content": "Hello!"}
])
# Multi-turn conversation
messages = [
    {"role": "user", "content": "My birthday is May 15th"},
    {"role": "assistant", "content": "I'll remember that!"},
    {"role": "user", "content": "When is my birthday?"}
]
response = client.chat(messages)
2. Streaming Chat
# Stream response chunks as they arrive
for chunk in client.chat(
    [{"role": "user", "content": "Tell me a story"}],
    stream=True
):
    print(chunk, end="", flush=True)
print()  # Newline after stream completes
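If you also need the full text once streaming finishes (for example, to append it to your conversation history), collect the chunks as they arrive. A minimal sketch, assuming each chunk is a plain string as in the example above:
# Accumulate streamed chunks into the complete response
chunks = []
for chunk in client.chat(
    [{"role": "user", "content": "Tell me a story"}],
    stream=True
):
    print(chunk, end="", flush=True)
    chunks.append(chunk)
full_response = "".join(chunks)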
3. Direct Knowledge Import
# Import knowledge from documents/emails/notes
client.update_from_text("""
Meeting Notes - Nov 15, 2025:
- Q4 deadline is December 20th
- Budget increased by 15%
- New team member: Bob (joins Monday)
""")
# Now the LLM can answer questions about this
response = client.chat([
    {"role": "user", "content": "When is the Q4 deadline?"}
])
# Output: "The Q4 deadline is December 20th."
4. Memory-Grounded Q&A
# Get a synthesized answer with sources
answer_obj = client.synthesize_answer(
    "What do we know about Project Phoenix?",
    return_object=True
)
print(f"Answer: {answer_obj.answer}")
print(f"Sources: {answer_obj.sources}")
print(f"Confidence: {answer_obj.confidence}")
Configuration Basics
User Isolation
Each user_id gets an isolated memory space:
alice_client = OpenAI(model="gpt-4.1-mini", user_id="alice")
bob_client = OpenAI(model="gpt-4.1-mini", user_id="bob")
# Alice's memories don't leak to Bob
alice_client.chat([{"role": "user", "content": "My secret is XYZ"}])
bob_response = bob_client.chat([{"role": "user", "content": "What's Alice's secret?"}])
# Bob won't know - different memory spaces
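In a multi-user application, a common pattern is to keep one wrapper instance per user_id. A minimal sketch (the get_client helper and its in-memory cache are illustrative, not part of Memlayer):
# One memory-enabled client per user, created lazily
_clients = {}
def get_client(user_id: str) -> OpenAI:
    if user_id not in _clients:
        _clients[user_id] = OpenAI(model="gpt-4.1-mini", user_id=user_id)
    return _clients[user_id]
get_client("alice").chat([{"role": "user", "content": "I prefer dark mode"}])
get_client("bob").chat([{"role": "user", "content": "I prefer light mode"}])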
Storage Paths
Customize where memories are stored:
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice",
    chroma_dir="./memories/vector_db",    # Vector embeddings
    networkx_path="./memories/graph.pkl"  # Knowledge graph
)
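Because memories are stored on disk, a new client pointed at the same paths picks them up again. A minimal sketch, assuming the paths above were used in an earlier run:
# Restart later with the same storage paths to restore memories
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice",
    chroma_dir="./memories/vector_db",
    networkx_path="./memories/graph.pkl"
)
response = client.chat([
    {"role": "user", "content": "What do you remember about me?"}
])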
Operation Modes
Choose how embeddings are computed:
# Online mode (default) - uses OpenAI API for embeddings
client = OpenAI(model="gpt-4.1-mini", operation_mode="online")
# Local mode - uses local sentence-transformer (no API calls)
client = OpenAI(model="gpt-4.1-mini", operation_mode="local")
# Lightweight mode - no embeddings, graph-only (fastest startup)
client = OpenAI(model="gpt-4.1-mini", operation_mode="lightweight")
Common Patterns
Persistent Sessions
# Initialize once, reuse across application lifetime
client = OpenAI(model="gpt-4.1-mini", user_id="alice")
# All conversations automatically build on previous memories
client.chat([{"role": "user", "content": "I like pizza"}])
# ... later ...
client.chat([{"role": "user", "content": "What food do I like?"}])
# Remembers: "You like pizza"
Conversation History Management
# Memlayer handles memory automatically, but you control conversation history
conversation = []
# Turn 1
conversation.append({"role": "user", "content": "My name is Alice"})
response = client.chat(conversation)
conversation.append({"role": "assistant", "content": response})
# Turn 2
conversation.append({"role": "user", "content": "What's my name?"})
response = client.chat(conversation)
# LLM can answer from:
# 1. Conversation history (conversation list)
# 2. Long-term memory (knowledge graph)
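You can wrap this bookkeeping in a small helper so each turn records both sides of the exchange. A minimal sketch (the ask helper is illustrative, not part of Memlayer):
def ask(client, conversation, user_message):
    # Append the user turn, call the model, then record the assistant reply
    conversation.append({"role": "user", "content": user_message})
    reply = client.chat(conversation)
    conversation.append({"role": "assistant", "content": reply})
    return reply
conversation = []
ask(client, conversation, "My name is Alice")
print(ask(client, conversation, "What's my name?"))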
Time-Sensitive Facts
# The system automatically extracts expiration dates
client.chat([{
    "role": "user",
    "content": "The temporary password is 1234, valid for 24 hours"
}])
# After 24 hours, this fact is automatically removed by the curation service
Next Steps
- Streaming Mode Guide: Learn about streaming responses
- Operation Modes: Architecture implications
- Search Tiers: Optimize search performance
- Ollama Setup: Run completely offline with local models
- Examples: Browse complete working examples
Troubleshooting
"No module named 'memlayer'"
pip install memlayer
"API key not found"
Set your environment variable or pass the api_key parameter:
client = OpenAI(api_key="your-key", ...)
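If you load secrets at runtime rather than in your shell, you can also set the environment variable from Python before creating the client (the key value here is a placeholder):
import os
os.environ["OPENAI_API_KEY"] = "your-key"  # e.g. read from a secrets manager
from memlayer.wrappers.openai import OpenAI
client = OpenAI(model="gpt-4.1-mini", user_id="alice")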
"Ollama connection refused"
Start the Ollama server:
ollama serve
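To confirm the server is reachable, you can hit its root endpoint with the standard library; a running Ollama instance normally replies with a short status message:
import urllib.request
# Prints something like "Ollama is running" if the server is up
print(urllib.request.urlopen("http://localhost:11434").read().decode())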
Slow first response
The first call initializes the salience gate (~1-2s). Subsequent calls are fast. Use operation_mode="lightweight" for instant startup in demos.
Memory not persisting
Check that chroma_dir points to a writable directory and that the directory containing networkx_path exists and is writable. By default, both are created in the current working directory.
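To rule out path problems, create the directories yourself and pass absolute paths. A minimal sketch using the chroma_dir and networkx_path parameters shown earlier:
import os
base = os.path.abspath("./memories")
os.makedirs(base, exist_ok=True)  # ensure the parent directory exists and is writable
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice",
    chroma_dir=os.path.join(base, "vector_db"),
    networkx_path=os.path.join(base, "graph.pkl")
)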