Memlayer Quickstart
Get started with Memlayer in under 5 minutes. This guide shows you how to add persistent memory to any LLM.
Installation
pip install memlayer
Provider-Specific Dependencies
Install the SDK for your chosen provider:
# OpenAI
pip install openai
# Anthropic Claude
pip install anthropic
# Google Gemini
pip install google-generativeai
# Ollama (local models)
pip install ollama
Quick Start Examples
OpenAI
from memlayer.wrappers.openai import OpenAI
# Initialize with memory
client = OpenAI(
    api_key="your-openai-api-key",
    model="gpt-4.1-mini",
    user_id="alice"
)
# First conversation - teach it something
response = client.chat([
    {"role": "user", "content": "My name is Alice and I work on Project Phoenix"}
])
print(response)
# Later conversation - it remembers!
response = client.chat([
    {"role": "user", "content": "What project do I work on?"}
])
print(response)
# Output: "You work on Project Phoenix."
Anthropic Claude
from memlayer.wrappers.claude import Claude
client = Claude(
    api_key="your-anthropic-api-key",
    model="claude-3-5-sonnet-20241022",
    user_id="alice"
)
# Use exactly like OpenAI wrapper
response = client.chat([
    {"role": "user", "content": "Remember: my favorite color is blue"}
])
Google Gemini
from memlayer.wrappers.gemini import Gemini
client = Gemini(
    api_key="your-gemini-api-key",
    model="gemini-2.5-flash",
    user_id="alice"
)
response = client.chat([
    {"role": "user", "content": "I live in San Francisco"}
])
Ollama (Local Models)
from memlayer.wrappers.ollama import Ollama
# Make sure the Ollama server is running: ollama serve
client = Ollama(
    model="llama3.2",
    host="http://localhost:11434",
    user_id="alice",
    operation_mode="local"  # Use local embeddings too
)
response = client.chat([
    {"role": "user", "content": "My dog's name is Max"}
])
Environment Variables
Instead of passing API keys in code, use environment variables:
# OpenAI
export OPENAI_API_KEY="your-key"
# Anthropic Claude
export ANTHROPIC_API_KEY="your-key"
# Google Gemini
export GOOGLE_API_KEY="your-key"
Then initialize without the api_key parameter:
from memlayer.wrappers.openai import OpenAI
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice"
)
Basic Usage Patterns
1. Regular Chat (Non-Streaming)
# Single turn
response = client.chat([
    {"role": "user", "content": "Hello!"}
])
# Multi-turn conversation
messages = [
    {"role": "user", "content": "My birthday is May 15th"},
    {"role": "assistant", "content": "I'll remember that!"},
    {"role": "user", "content": "When is my birthday?"}
]
response = client.chat(messages)
2. Streaming Chat
# Stream response chunks as they arrive
for chunk in client.chat(
    [{"role": "user", "content": "Tell me a story"}],
    stream=True
):
    print(chunk, end="", flush=True)
print()  # Newline after stream completes
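If you also need the full text once streaming finishes (for example, to append it to your conversation history), collect the chunks as they arrive. A minimal sketch, assuming each chunk is a plain string as in the example above:
# Accumulate streamed chunks into the complete response
chunks = []
for chunk in client.chat(
    [{"role": "user", "content": "Tell me a story"}],
    stream=True
):
    print(chunk, end="", flush=True)
    chunks.append(chunk)
full_response = "".join(chunks)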
3. Direct Knowledge Import
# Import knowledge from documents/emails/notes
client.update_from_text("""
Meeting Notes - Nov 15, 2025:
- Q4 deadline is December 20th
- Budget increased by 15%
- New team member: Bob (joins Monday)
""")
# Now the LLM can answer questions about this
response = client.chat([
    {"role": "user", "content": "When is the Q4 deadline?"}
])
# Output: "The Q4 deadline is December 20th."
4. Memory-Grounded Q&A
# Get a synthesized answer with sources
answer_obj = client.synthesize_answer(
    "What do we know about Project Phoenix?",
    return_object=True
)
print(f"Answer: {answer_obj.answer}")
print(f"Sources: {answer_obj.sources}")
print(f"Confidence: {answer_obj.confidence}")
Configuration Basics
User Isolation
Each user_id gets an isolated memory space:
alice_client = OpenAI(model="gpt-4.1-mini", user_id="alice")
bob_client = OpenAI(model="gpt-4.1-mini", user_id="bob")
# Alice's memories don't leak to Bob
alice_client.chat([{"role": "user", "content": "My secret is XYZ"}])
bob_response = bob_client.chat([{"role": "user", "content": "What's Alice's secret?"}])
# Bob won't know - different memory spaces
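In a multi-user application, a common pattern is to keep one wrapper instance per user_id. A minimal sketch (the get_client helper and its in-memory cache are illustrative, not part of Memlayer):
# One memory-enabled client per user, created lazily
_clients = {}
def get_client(user_id: str) -> OpenAI:
    if user_id not in _clients:
        _clients[user_id] = OpenAI(model="gpt-4.1-mini", user_id=user_id)
    return _clients[user_id]
get_client("alice").chat([{"role": "user", "content": "I prefer dark mode"}])
get_client("bob").chat([{"role": "user", "content": "I prefer light mode"}])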
Storage Paths
Customize where memories are stored:
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice",
    chroma_dir="./memories/vector_db",    # Vector embeddings
    networkx_path="./memories/graph.pkl"  # Knowledge graph
)
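Because memories are stored on disk, a new client pointed at the same paths picks them up again. A minimal sketch, assuming the paths above were used in an earlier run:
# Restart later with the same storage paths to restore memories
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice",
    chroma_dir="./memories/vector_db",
    networkx_path="./memories/graph.pkl"
)
response = client.chat([
    {"role": "user", "content": "What do you remember about me?"}
])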
Operation Modes
Choose how embeddings are computed:
# Online mode (default) - uses OpenAI API for embeddings
client = OpenAI(model="gpt-4.1-mini", operation_mode="online")
# Local mode - uses local sentence-transformer (no API calls)
client = OpenAI(model="gpt-4.1-mini", operation_mode="local")
# Lightweight mode - no embeddings, graph-only (fastest startup)
client = OpenAI(model="gpt-4.1-mini", operation_mode="lightweight")
Common Patterns
Persistent Sessions
# Initialize once, reuse across application lifetime
client = OpenAI(model="gpt-4.1-mini", user_id="alice")
# All conversations automatically build on previous memories
client.chat([{"role": "user", "content": "I like pizza"}])
# ... later ...
client.chat([{"role": "user", "content": "What food do I like?"}])
# Remembers: "You like pizza"
Conversation History Management
# Memlayer handles memory automatically, but you control conversation history
conversation = []
# Turn 1
conversation.append({"role": "user", "content": "My name is Alice"})
response = client.chat(conversation)
conversation.append({"role": "assistant", "content": response})
# Turn 2
conversation.append({"role": "user", "content": "What's my name?"})
response = client.chat(conversation)
# LLM can answer from:
# 1. Conversation history (conversation list)
# 2. Long-term memory (knowledge graph)
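You can wrap this bookkeeping in a small helper so each turn records both sides of the exchange. A minimal sketch (the ask helper is illustrative, not part of Memlayer):
def ask(client, conversation, user_message):
    # Append the user turn, call the model, then record the assistant reply
    conversation.append({"role": "user", "content": user_message})
    reply = client.chat(conversation)
    conversation.append({"role": "assistant", "content": reply})
    return reply
conversation = []
ask(client, conversation, "My name is Alice")
print(ask(client, conversation, "What's my name?"))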
Time-Sensitive Facts
# The system automatically extracts expiration dates
client.chat([{
    "role": "user",
    "content": "The temporary password is 1234, valid for 24 hours"
}])
# After 24 hours, this fact is automatically removed by the curation service
Next Steps
- Streaming Mode Guide: Learn about streaming responses
- Operation Modes: Architecture implications
- Search Tiers: Optimize search performance
- Ollama Setup: Run completely offline with local models
- Examples: Browse complete working examples
Troubleshooting
"No module named 'memlayer'"
pip install memlayer
"API key not found"
Set your environment variable or pass the api_key parameter:
client = OpenAI(api_key="your-key", ...)
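If you load secrets at runtime rather than in your shell, you can also set the environment variable from Python before creating the client (the key value here is a placeholder):
import os
os.environ["OPENAI_API_KEY"] = "your-key"  # e.g. read from a secrets manager
from memlayer.wrappers.openai import OpenAI
client = OpenAI(model="gpt-4.1-mini", user_id="alice")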
"Ollama connection refused"
Start the Ollama server:
ollama serve
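To confirm the server is reachable, you can hit its root endpoint with the standard library; a running Ollama instance normally replies with a short status message:
import urllib.request
# Prints something like "Ollama is running" if the server is up
print(urllib.request.urlopen("http://localhost:11434").read().decode())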
Slow first response
The first call initializes the salience gate (~1-2s). Subsequent calls are fast. Use operation_mode="lightweight" for instant startup in demos.
Memory not persisting
Check that chroma_dir points to a writable directory and that the directory containing networkx_path exists and is writable. By default, both are created in the current working directory.
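To rule out path problems, create the directories yourself and pass absolute paths. A minimal sketch using the chroma_dir and networkx_path parameters shown earlier:
import os
base = os.path.abspath("./memories")
os.makedirs(base, exist_ok=True)  # ensure the parent directory exists and is writable
client = OpenAI(
    model="gpt-4.1-mini",
    user_id="alice",
    chroma_dir=os.path.join(base, "vector_db"),
    networkx_path=os.path.join(base, "graph.pkl")
)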