Provider-Specific Documentation

Memlayer supports multiple LLM providers with a unified API. Each provider has specific configuration requirements and features documented here.

Supported Providers

OpenAI

  • Models: GPT-4.1, GPT-5, and other OpenAI chat models
  • Streaming: ✅ Full support
  • Best for: Production applications, fastest API responses
  • Setup: Requires OPENAI_API_KEY environment variable
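
For example, the key can be read from the environment instead of being hard-coded. A minimal sketch, assuming the same OpenAI constructor arguments shown under Configuration Basics below:

import os
from memlayer import OpenAI

# Read the key from the OPENAI_API_KEY environment variable
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4.1-mini",
    user_id="alice"
)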

Anthropic Claude

  • Models: Claude 4.5 Sonnet, Claude 4 Opus, Claude 4 Haiku
  • Streaming: ✅ Full support
  • Best for: Long conversations, complex reasoning
  • Setup: Requires ANTHROPIC_API_KEY environment variable

Google Gemini

  • Models: Gemini 2.5 Flash, Gemini 2.5 Pro
  • Streaming: ✅ Full support
  • Best for: Multimodal applications, cost efficiency
  • Setup: Requires GOOGLE_API_KEY environment variable

Ollama (Local Models)

  • Models: Llama 3.2, Llama 3.1, Mistral, Phi 3, 100+ more
  • Streaming: ✅ Full support
  • Best for: Privacy, offline use, zero API costs
  • Setup: Requires local Ollama server (ollama serve)

LMStudio (Local Models)

  • Models: Llama 4, Qwen 3, 100+ more
  • Streaming: ✅ Full support
  • Best for: Privacy, offline use, zero API costs
  • Setup: Requires local LMStudio server

Quick Comparison

| Provider | API Cost | Latency | Privacy | Offline |
|----------|----------|---------|---------|---------|
| OpenAI   | $$       | Fast    | Cloud   | ❌      |
| Claude   | $$       | Fast    | Cloud   | ❌      |
| Gemini   | $        | Fast    | Cloud   | ❌      |
| Ollama   | Free     | Medium  | Local   | ✅      |
| LMStudio | Free     | Medium  | Local   | ✅      |

Configuration Basics

All providers share the same Memlayer API:

from memlayer import OpenAI
from memlayer import Claude
from memlayer import Gemini
from memlayer import Ollama
from memlayer import LMStudio

# OpenAI
client = OpenAI(
    api_key="your-key",
    model="gpt-4.1-mini",
    user_id="alice"
)

# Claude
client = Claude(
    api_key="your-key",
    model="claude-3-5-sonnet-20241022",
    user_id="alice"
)

# Gemini
client = Gemini(
    api_key="your-key",
    model="gemini-2.5-flash",
    user_id="alice"
)

# Ollama (local)
client = Ollama(
    model="llama3.2",
    host="http://localhost:11434",
    user_id="alice",
    operation_mode="local"  # Fully offline
)

# LMStudio (local)
client = LMStudio(
    model="llama3.2",
    host="http://localhost:1234/v1",
    user_id="alice",
    operation_mode="local"  # Fully offline
)
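
Because every provider exposes the same interface, application code does not change when you switch providers. A minimal non-streaming sketch, assuming chat() returns the assistant's reply as a string (the streaming form is shown below):

# Works identically with OpenAI, Claude, Gemini, Ollama, or LMStudio clients
response = client.chat([
    {"role": "user", "content": "What's a good name for a hiking app?"}
])
print(response)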

Common Features Across All Providers

Memory & Knowledge Graph

All providers support:

  • ✅ Automatic knowledge extraction
  • ✅ Persistent memory across sessions
  • ✅ Hybrid search (vector + graph)
  • ✅ Time-aware facts with expiration
  • ✅ User-isolated memory spaces
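
For instance, a fact shared in one session can be recalled in a later one, because memory is tied to the user_id rather than the client instance. A hedged sketch, assuming chat() returns a string and that extraction from the first session completes before the second begins:

from memlayer import OpenAI

# Session 1: the user shares a fact
client = OpenAI(api_key="your-key", model="gpt-4.1-mini", user_id="alice")
client.chat([{"role": "user", "content": "I'm allergic to peanuts."}])

# Session 2 (e.g. after a restart): same user_id, fresh client instance
client = OpenAI(api_key="your-key", model="gpt-4.1-mini", user_id="alice")
print(client.chat([{"role": "user", "content": "What am I allergic to?"}]))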

Streaming Responses

All providers support streaming:

for chunk in client.chat([
    {"role": "user", "content": "Tell me a story"}
], stream=True):
    print(chunk, end="", flush=True)

Operation Modes

All providers support three operation modes:

  • online: API-based embeddings (fast startup)
  • local: Local embeddings (privacy, offline)
  • lightweight: No embeddings (instant startup)
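
The mode is chosen per client. The local mode is shown with Ollama and LMStudio above; the sketch below assumes the cloud providers accept the same operation_mode keyword:

from memlayer import OpenAI

# Assumption: cloud providers accept operation_mode like the local ones do
client = OpenAI(
    api_key="your-key",
    model="gpt-4.1-mini",
    user_id="alice",
    operation_mode="lightweight"  # no embeddings, instant startup
)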

Provider-Specific Pages

Click on any provider below for detailed setup instructions:

  • openai.md — OpenAI configuration, models, and tips
  • claude.md — Anthropic Claude setup and features
  • gemini.md — Google Gemini configuration
  • ollama.md 🆕 — Complete guide to local models: installation, model recommendations, fully offline setup
  • lmstudio.md 🆕 — Complete guide to LMStudio local models: installation, model recommendations, fully offline setup

Getting Started

  1. Choose a provider based on your needs (cost, privacy, performance)
  2. Set up credentials (see individual provider pages)
  3. Follow the quickstart — docs/basics/quickstart.md
  4. Enable streaming (optional) — docs/basics/streaming.md