Provider-Specific Documentation

Memlayer supports multiple LLM providers with a unified API. Each provider has specific configuration requirements and features documented here.

Supported Providers

OpenAI

  • Models: GPT-4.1, GPT-5, and other OpenAI chat models
  • Streaming: ✅ Full support
  • Best for: Production applications, fastest API responses
  • Setup: Requires OPENAI_API_KEY environment variable
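
For example, the key can be read from the environment instead of being hard-coded. A minimal sketch, assuming the same OpenAI constructor arguments shown under Configuration Basics below:

import os
from memlayer import OpenAI

# Read the key from the OPENAI_API_KEY environment variable
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4.1-mini",
    user_id="alice"
)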

Anthropic Claude

  • Models: Claude 4.5 Sonnet, Claude 4 Opus, Claude 4 Haiku
  • Streaming: ✅ Full support
  • Best for: Long conversations, complex reasoning
  • Setup: Requires ANTHROPIC_API_KEY environment variable

Google Gemini

  • Models: Gemini 2.5 Flash, Gemini 2.5 Pro
  • Streaming: ✅ Full support
  • Best for: Multimodal applications, cost efficiency
  • Setup: Requires GOOGLE_API_KEY environment variable

Ollama (Local Models)

  • Models: Llama 3.2, Llama 3.1, Mistral, Phi 3, 100+ more
  • Streaming: ✅ Full support
  • Best for: Privacy, offline use, zero API costs
  • Setup: Requires local Ollama server (ollama serve)

LMStudio (Local Models)

  • Models: Llama 4, Qwen 3, 100+ more
  • Streaming: ✅ Full support
  • Best for: Privacy, offline use, zero API costs
  • Setup: Requires local LMStudio server

Quick Comparison

| Provider | API Cost | Latency | Privacy | Offline |
|----------|----------|---------|---------|---------|
| OpenAI   | $$       | Fast    | Cloud   | ❌      |
| Claude   | $$       | Fast    | Cloud   | ❌      |
| Gemini   | $        | Fast    | Cloud   | ❌      |
| Ollama   | Free     | Medium  | Local   | ✅      |
| LMStudio | Free     | Medium  | Local   | ✅      |

Configuration Basics

All providers share the same Memlayer API:

from memlayer import OpenAI
from memlayer import Claude
from memlayer import Gemini
from memlayer import Ollama
from memlayer import LMStudio

# OpenAI
client = OpenAI(
    api_key="your-key",
    model="gpt-4.1-mini",
    user_id="alice"
)

# Claude
client = Claude(
    api_key="your-key",
    model="claude-3-5-sonnet-20241022",
    user_id="alice"
)

# Gemini
client = Gemini(
    api_key="your-key",
    model="gemini-2.5-flash",
    user_id="alice"
)

# Ollama (local)
client = Ollama(
    model="llama3.2",
    host="http://localhost:11434",
    user_id="alice",
    operation_mode="local"  # Fully offline
)

# LMStudio (local)
client = LMStudio(
    model="llama3.2",
    host="http://localhost:1234/v1",
    user_id="alice",
    operation_mode="local"  # Fully offline
)
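
Because every provider exposes the same interface, application code does not change when you switch providers. A minimal non-streaming sketch, assuming chat() returns the assistant's reply as a string (the streaming form is shown below):

# Works identically with OpenAI, Claude, Gemini, Ollama, or LMStudio clients
response = client.chat([
    {"role": "user", "content": "What's a good name for a hiking app?"}
])
print(response)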

Common Features Across All Providers

Memory & Knowledge Graph

All providers support:

  • ✅ Automatic knowledge extraction
  • ✅ Persistent memory across sessions
  • ✅ Hybrid search (vector + graph)
  • ✅ Time-aware facts with expiration
  • ✅ User-isolated memory spaces
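
For instance, a fact shared in one session can be recalled in a later one, because memory is tied to the user_id rather than the client instance. A hedged sketch, assuming chat() returns a string and that extraction from the first session completes before the second begins:

from memlayer import OpenAI

# Session 1: the user shares a fact
client = OpenAI(api_key="your-key", model="gpt-4.1-mini", user_id="alice")
client.chat([{"role": "user", "content": "I'm allergic to peanuts."}])

# Session 2 (e.g. after a restart): same user_id, fresh client instance
client = OpenAI(api_key="your-key", model="gpt-4.1-mini", user_id="alice")
print(client.chat([{"role": "user", "content": "What am I allergic to?"}]))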

Streaming Responses

All providers support streaming:

for chunk in client.chat([
    {"role": "user", "content": "Tell me a story"}
], stream=True):
    print(chunk, end="", flush=True)

Operation Modes

All providers support three operation modes:

  • online: API-based embeddings (fast startup)
  • local: Local embeddings (privacy, offline)
  • lightweight: No embeddings (instant startup)
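
The mode is chosen per client. The local mode is shown with Ollama and LMStudio above; the sketch below assumes the cloud providers accept the same operation_mode keyword:

from memlayer import OpenAI

# Assumption: cloud providers accept operation_mode like the local ones do
client = OpenAI(
    api_key="your-key",
    model="gpt-4.1-mini",
    user_id="alice",
    operation_mode="lightweight"  # no embeddings, instant startup
)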

Provider-Specific Pages

Click on any provider below for detailed setup instructions:

  • openai.md — OpenAI configuration, models, and tips
  • claude.md — Anthropic Claude setup and features
  • gemini.md — Google Gemini configuration
  • ollama.md 🆕 — Complete guide to local models: installation, model recommendations, fully offline setup
  • lmstudio.md 🆕 — Complete guide to LMStudio local models: installation, model recommendations, fully offline setup

Getting Started

  1. Choose a provider based on your needs (cost, privacy, performance)
  2. Set up credentials (see individual provider pages)
  3. Follow the quickstart — docs/basics/quickstart.md
  4. Enable streaming (optional) — docs/basics/streaming.md