MCP server for the Ollama Cloud API: web search, fetch, chat, embeddings, model management

🕸️ Ollama Web MCP Server

MCP server providing web search, web fetch, AI summarization, chat, embeddings, model management and health monitoring — powered by Ollama's cloud API.


🇩🇪 German version also available: README.de.md


What is this?

This MCP server gives your AI agent (Claude, ChatGPT, Cline, OpenCode, etc.) the ability to search the web, fetch web pages, summarize content with AI, chat with any Ollama model, generate embeddings, and manage models, all through Ollama's API.

With it, your AI agent can:

  • 🔍 Search the web live: current news, facts, prices, anything outside its training data
  • 📄 Read any web page: clean content extraction without ads or navigation
  • 🤖 Summarize pages with AI: condense long articles into short summaries
  • 💬 Chat directly with any Ollama model (streaming, temperature, JSON mode)
  • 📊 Generate vector embeddings for RAG, semantic search, and clustering
  • 💓 Run health checks to confirm the service is up
  • Respond quickly thanks to caching, retry logic, and parallel requests

All of this is exposed through 18 MCP tools that your agent can call like native capabilities.


Tool Overview

| Tool | Description | Type |
|---|---|---|
| websearch | 🔍 Search the web with optional site, language, region & date filters | Read-Only |
| webfetch | 📄 Fetch a single web page and extract its content | Read-Only |
| webfetch_multi | 📚 Fetch multiple URLs concurrently (batch) | Read-Only |
| webfetch_markdown | 📝 Fetch a web page as clean Markdown | Read-Only |
| page_summarize | 🤖 Summarize a web page with AI | Read-Only |
| chat | 💬 Direct LLM conversation with any Ollama model (streaming, temperature, JSON mode) | Read-Only |
| embed | 📊 Vector embeddings for RAG, semantic search & clustering | Read-Only |
| pull_model | 📥 Download models from the Ollama registry | Read-Only |
| push_model | 📤 Upload models to the Ollama registry | Read-Only |
| copy_model | 📋 Create a copy of an existing model | Read-Only |
| create_model | 🏗️ Create a custom model from base components | Read-Only |
| delete_model | 🗑️ Remove a model from local storage | Destructive |
| ps | 📊 List currently running models | Read-Only |
| show_model | Get detailed model information | Read-Only |
| generate | ✍️ Text completion (non-chat) from a model | Read-Only |
| list_models | 📋 List available Ollama models | Read-Only |
| health | 💓 Server health check | Read-Only |
| cache_clear | 🧹 Clear the results cache | Destructive |

Quick Start

Prerequisites

  • Python 3.10+ (check: python3 --version)
  • An Ollama API key (available for free from ollama.com)
  • An MCP client (Claude Desktop, OpenCode, Cline, VS Code Copilot, etc.)

Installation

# Clone the repository
git clone https://git.ivory.green/johannes/ollama-web-mcp.git
cd ollama-web-mcp

# Install with uv (recommended)
uv sync --extra dev

# Or with pip
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

# Create your .env file with your API key
cp .env.example .env
# Edit .env: set OLLAMA_API_KEY=your-api-key

Running the server

uv run ollama-web-mcp

For development with the MCP Inspector:

uv run mcp dev src/ollama_web_mcp/server.py

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | https://ollama.com | Ollama API endpoint. If not set, auto-detects localhost:11434 or falls back to cloud. |
| OLLAMA_API_KEY | (empty) | Your Bearer token from ollama.com |
| OLLAMA_WEB_SEARCH_TIMEOUT | 10 | Request timeout in seconds |
| OLLAMA_WEB_SEARCH_RETRIES | 2 | Number of retries on failure |
| OLLAMA_WEB_SEARCH_FORMAT | string | Default output format (string or json) |
| OLLAMA_WEB_SEARCH_HIGHLIGHTS | 5 | Max highlight snippets per result |
| OLLAMA_WEB_SEARCH_CHARS | 1000 | Max characters per highlight |
| OLLAMA_WEB_SEARCH_MAX_SIZE | 50000 | Max content size in bytes |
| OLLAMA_WEB_SEARCH_MAX_RESULTS | 10 | Max search results total |
| CACHE_ENABLED | true | Enable/disable caching |
| CACHE_TTL_SECONDS | 300 | Cache TTL in seconds (5 min) |
| CACHE_STORAGE | memory | Cache backend: memory (RAM), disk (SQLite), both (RAM + SQLite) |
| CACHE_DB_PATH | (empty) | SQLite database path (default: ~/.cache/ollama-web-mcp/cache.db) |
| RATE_LIMIT_ENABLED | true | Enable/disable rate limiting |
| RATE_LIMIT_MAX_REQUESTS | 10 | Max requests per time window |
| RATE_LIMIT_PER_SECONDS | 60.0 | Time window in seconds |
| CONTENT_MAX_CHARS | 50000 | Max content length after extraction |
| MCP_TRANSPORT | stdio | Transport mode: stdio, sse, streamable-http |
| MCP_HOST | 127.0.0.1 | Network interface (SSE/HTTP only) |
| MCP_PORT | 8000 | TCP port (SSE/HTTP only) |
| LOG_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR |
| OLLAMA_AUTO_DETECT | true | Auto-detect local Ollama instance, fall back to cloud if unavailable |
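
To illustrate how the defaults above combine with overrides, here is a minimal sketch of reading a subset of these variables in Python. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the project's own `config.py` may be organized quite differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""

    host: str
    api_key: str
    timeout_s: float
    retries: int
    cache_enabled: bool
    cache_ttl_s: int
    rate_limit_max: int
    rate_limit_window_s: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings, falling back to the defaults listed in the table."""
    return Settings(
        host=env.get("OLLAMA_HOST", "https://ollama.com"),
        api_key=env.get("OLLAMA_API_KEY", ""),
        timeout_s=float(env.get("OLLAMA_WEB_SEARCH_TIMEOUT", "10")),
        retries=int(env.get("OLLAMA_WEB_SEARCH_RETRIES", "2")),
        cache_enabled=_as_bool(env.get("CACHE_ENABLED", "true")),
        cache_ttl_s=int(env.get("CACHE_TTL_SECONDS", "300")),
        rate_limit_max=int(env.get("RATE_LIMIT_MAX_REQUESTS", "10")),
        rate_limit_window_s=float(env.get("RATE_LIMIT_PER_SECONDS", "60.0")),
    )
```

Passing a plain dict instead of `os.environ` (for example `load_settings({"CACHE_ENABLED": "false"})`) makes the behavior easy to check without touching the real environment.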

API Key Configuration

Option 1: .env file (recommended)

OLLAMA_HOST=https://ollama.com
OLLAMA_API_KEY=your-api-key-here
CACHE_ENABLED=true
CACHE_TTL_SECONDS=300

Option 2: Environment variables

export OLLAMA_API_KEY="your-api-key"
export OLLAMA_HOST="https://ollama.com"
ollama-web-mcp

Option 3: MCP client config (for Claude Desktop, etc.)

{
  "mcpServers": {
    "ollama-web": {
      "command": "ollama-web-mcp",
      "args": [],
      "env": {
        "OLLAMA_API_KEY": "your-api-key",
        "OLLAMA_HOST": "https://ollama.com"
      },
      "transport": "stdio",
      "timeout": 300000
    }
  }
}

Client Integration

Claude Desktop

~/Library/Application Support/Claude/claude_desktop_config.json (macOS) / %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "ollama-web": {
      "command": "ollama-web-mcp",
      "args": [],
      "env": {
        "OLLAMA_API_KEY": "your-api-key"
      },
      "transport": "stdio",
      "timeout": 300000
    }
  }
}

OpenCode

~/.config/opencode/mcp.json:

{
  "servers": {
    "ollama-web": {
      "command": "/path/to/ollama-web-mcp/.venv/bin/ollama-web-mcp",
      "args": [],
      "transport": "stdio",
      "timeout": 300000
    }
  }
}

⚠️ OpenCode uses "servers" as the key, not "mcpServers".
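
If you already maintain a Claude Desktop config, the key difference noted above can be handled with a small conversion step. The following is only a sketch: `to_opencode_config` is a hypothetical helper, and it covers nothing beyond the top-level key rename shown in the two examples.

```python
from typing import Any


def to_opencode_config(claude_config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the config with "mcpServers" renamed to "servers"."""
    if "mcpServers" not in claude_config:
        raise KeyError('expected a top-level "mcpServers" key')
    # Keep any other top-level keys untouched and leave the input unmodified.
    converted = {k: v for k, v in claude_config.items() if k != "mcpServers"}
    converted["servers"] = claude_config["mcpServers"]
    return converted
```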


Examples

Example 1: Web Search (string format)

> websearch(query="latest SpaceX Starship launch", max_results=3)

1. SpaceX Starship Completes Orbital Test Flight
   URL: https://example.com/spacex-starship-orbital
   Summary: SpaceX successfully completed the first orbital test flight...
   Highlights:
     - The Starship reached orbit at 09:42 UTC
     - Heat shield performed beyond expectations

Example 2: Web Search (JSON format)

> websearch(query="Python 3.13 new features", response_format="json", max_results=2)

Example 3: Search with Filters

> websearch(
    query="climate change research",
    site="nature.com",
    language="en",
    date_from="2025-01-01",
    date_to="2025-12-31"
)

Example 4: Fetch Page as Markdown

> webfetch_markdown(url="https://python.org")

Example 5: AI Summarization

> page_summarize(url="https://arxiv.org/abs/2401.00000", style="tldr")

Example 6: Direct LLM Chat

> chat(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in 3 sentences."}
  ], model="llama3.2", temperature=0.3)

Example 7: Chat with JSON Output

> chat(messages=[
    {"role": "user", "content": "Give me 3 facts about Mars in JSON"}
  ], model="llama3.2", response_format="json")

Example 8: Generate Embeddings

> embed(text="What is machine learning?", model="nomic-embed-text")
> embed(text=["Document one", "Document two"], model="all-minilm")
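
Once you have embeddings, semantic search usually comes down to comparing vectors. The sketch below ranks documents by cosine similarity to a query vector. It assumes each embedding arrives as a plain list of floats; the exact return shape of the `embed` tool is not documented here, so adapt the input handling as needed. The function names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], documents: dict[str, Sequence[float]]
) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```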

Project Structure

ollama-web-mcp/
├── src/ollama_web_mcp/    ← Source code
│   ├── server.py          ← Bootstrap + FastMCP init (~190 lines)
│   ├── state.py           ← Global state (_client, _cache, _rate_limiter)
│   ├── prompts.py         ← 3 prompt functions
│   ├── resources.py       ← MCP resource endpoints
│   ├── models/
│   │   └── schemas.py     ← 15+ Pydantic input models
│   ├── tools/
│   │   ├── web.py         ← websearch, webfetch, webfetch_multi, webfetch_markdown
│   │   ├── chat.py        ← chat, embed, generate
│   │   ├── models.py      ← Model management (list, pull, push, copy, delete, ps, show)
│   │   ├── health.py      ← health, cache_clear
│   │   └── summarize.py   ← page_summarize
│   ├── cache.py, config.py, content.py, errors.py
│   └── formatter.py, ratelimit.py, retry.py
├── tests/                 ← 230+ unit tests (pytest, 91% coverage)
├── schemas/               ← JSON Schema for Exa-compatible output
├── Dockerfile             ← Container build
├── pyproject.toml         ← Project configuration
├── .woodpecker.yml        ← CI/CD (ruff, mypy, pytest)
├── .pre-commit-config.yaml
├── README.md              ← This file (English)
└── README.de.md           ← German version

Development

Setup

git clone https://git.ivory.green/johannes/ollama-web-mcp.git
cd ollama-web-mcp
uv sync --extra dev
cp .env.example .env
# Edit .env: set OLLAMA_API_KEY
uv run pre-commit install

Code Quality

All of these commands must pass before you open a PR:

uv run ruff check .                                # 0 errors
uv run ruff format .                               # auto-format
uv run mypy src/ --ignore-missing-imports          # 0 type errors (strict mode)
uv run pytest tests/ -v                            # all green

Features

| Feature | Description |
|---|---|
| Retry with Backoff | 1s → 2s → 4s wait on timeout, then abort |
| Content Extraction | HTML → clean text, headings, highlights, summary |
| Exa-compatible JSON | Output validated against exa-response.json schema |
| Typed Errors | Every error has code, message, and action hint |
| Multi-level Caching | Memory, SQLite disk, or composite (memory → disk) |
| Pydantic Validation | All tool inputs strictly validated |
| Rate Limiting | Token-bucket limiter, 10 requests/minute default |
| Batch Fetching | Up to 20 URLs in parallel with semaphore limit |
| AI Summarization | 5 styles: tldr, brief, detailed, bullet, keypoints |
| Chat Tool | Direct LLM conversation with streaming, temperature, JSON mode |
| Embeddings | Vector embeddings for RAG workflows |
| Extended Search | site, language, region, date_from, date_to filters |
| Model Auto-Detection | Automatically finds first available Ollama model |
| Host Auto-Detection | Pings localhost:11434, falls back to cloud API |
| mypy Strict Mode | 0 type errors at strictest configuration |
| Pre-Commit Hooks | Automatic quality checks on every commit |
| Semantic Release | Angular-commit-based versioning + changelog |
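
The retry behavior listed above (wait 1s, then 2s, then 4s on timeout, then abort) can be sketched as follows. This is one possible reading, not the project's `retry.py`: it assumes a timeout surfaces as Python's built-in `TimeoutError`, and that "abort" means re-raising after one last attempt. The `sleep` parameter exists only so the waits can be observed in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    delays: tuple[float, ...] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation; after each timeout wait the next delay and try again."""
    for delay in delays:
        try:
            return operation()
        except TimeoutError:
            sleep(delay)  # waits 1s, then 2s, then 4s with the default delays
    # Final attempt after the last wait; a timeout here propagates (abort).
    return operation()
```

With the default delays this makes at most four attempts. Doubling the wait each time gives a struggling upstream service progressively more room to recover.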

Benchmarks

See BENCHMARKS.md for the full benchmark suite.

uv run pytest tests/test_benchmarks.py --benchmark-only -v

| Operation | Expected Latency | Notes |
|---|---|---|
| Health check | ~1ms | Pure in-memory, no network |
| List models | 200-500ms | Cloud API, 39 models |
| Web search (basic) | 500-1000ms | Cloud API + search engine |
| Web search (filtered) | 500-1200ms | Extended params via raw HTTP |
| Page fetch | 200-1000ms | Depends on target page size |
| Cache hit | <1ms | In-memory lookup |

Cache architectures:

| Backend | Hit Latency | Persistence |
|---|---|---|
| memory | <1ms | Lost on restart |
| disk (SQLite) | 1-5ms | Survives restarts |
| both | <1ms + 1-5ms | Best of both |
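
To make the "both" row concrete, here is a rough sketch of a memory-then-SQLite lookup with a shared TTL. `TwoLevelCache` is a hypothetical class written for this README, not the project's `cache.py`; the table layout and the promotion of disk hits into memory are assumptions.

```python
import sqlite3
import time
from typing import Optional


class TwoLevelCache:
    """Hypothetical memory-then-SQLite cache with a shared TTL."""

    def __init__(self, db_path: str = ":memory:", ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._memory: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, expires_at REAL, value TEXT)"
        )

    def set(self, key: str, value: str) -> None:
        """Write through to both layers."""
        expires_at = time.time() + self._ttl
        self._memory[key] = (expires_at, value)
        self._db.execute(
            "INSERT OR REPLACE INTO cache (key, expires_at, value) VALUES (?, ?, ?)",
            (key, expires_at, value),
        )
        self._db.commit()

    def get(self, key: str) -> Optional[str]:
        """Check memory first, then SQLite; return None on miss or expiry."""
        now = time.time()
        entry = self._memory.get(key)
        if entry and entry[0] > now:
            return entry[1]  # fast path: memory hit
        row = self._db.execute(
            "SELECT expires_at, value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row and row[0] > now:
            self._memory[key] = (row[0], row[1])  # promote disk hit into memory
            return row[1]
        return None

    def drop_memory(self) -> None:
        """Simulate a restart: memory is lost, the SQLite layer remains."""
        self._memory.clear()
```

After `drop_memory()`, the first `get` for a key pays the slower SQLite lookup and later ones are served from memory again, which is the trade-off the table summarizes.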

FAQ

"Authorization header with Bearer token is required"

→ Your API key is missing or incorrect. Check that:

  1. The .env file exists and contains OLLAMA_API_KEY=your-key
  2. There are no spaces around the = sign
  3. The key is entered without the Bearer prefix
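
The checks above can also be automated. The following sketch inspects a single line from a .env file and reports the problems from the checklist. `check_api_key_line` is a hypothetical helper for illustration and is not part of the server.

```python
def check_api_key_line(line: str) -> list[str]:
    """Return the problems found in one OLLAMA_API_KEY line of a .env file."""
    name, sep, value = line.rstrip("\n").partition("=")
    if not sep or name.strip() != "OLLAMA_API_KEY":
        return ["line does not set OLLAMA_API_KEY"]

    problems: list[str] = []
    # Whitespace directly before or after the = sign.
    if name != name.rstrip() or value != value.lstrip():
        problems.append("spaces around the = sign")

    # Ignore surrounding whitespace and optional quotes for the value checks.
    value = value.strip().strip('"').strip("'")
    if not value:
        problems.append("key is empty")
    if value.lower().startswith("bearer "):
        problems.append("key includes the Bearer prefix")
    return problems
```

An empty list means the line passed all three checks.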

"No module named 'ollama_web_mcp'"

→ Start the server from the project root directory:

cd /path/to/ollama-web-mcp
uv run ollama-web-mcp

list_models crashes with "/api/api/tags not found"

→ Your OLLAMA_HOST ends with /api, so the /api prefix ends up in the request path twice. Change it to:

OLLAMA_HOST=https://ollama.com
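
If host values come from user input, a small normalization step can guard against this mistake. This is a sketch only: `normalize_host` is a hypothetical helper, and whether the server itself performs such normalization is not stated here.

```python
def normalize_host(host: str) -> str:
    """Drop trailing slashes and one trailing "/api" segment from a host URL."""
    cleaned = host.strip().rstrip("/")
    if cleaned.endswith("/api"):
        cleaned = cleaned[: -len("/api")]
    return cleaned
```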

Contributing

See CONTRIBUTING.md for setup, code style, and PR process.

Quick summary:

  1. Fork → Branch → Changes
  2. Write tests (tests/test_xyz.py)
  3. uv run ruff check . + uv run mypy src/ --ignore-missing-imports + uv run pytest tests/ -v
  4. Run pre-commit: uv run pre-commit run --all-files
  5. Open a PR


License

MIT — see LICENSE.