MCP server for the Ollama Cloud API: web search, fetch, chat, embeddings, model management

🕸️ Ollama Web MCP Server

MCP server providing web search, web fetch, AI summarization, chat, embeddings, model management and health monitoring — powered by Ollama's cloud API.

🇩🇪 German version also available: README.de.md

What is this?

This MCP server gives your AI agent (Claude, ChatGPT, Cline, OpenCode, etc.) the ability to search the web, fetch web pages, summarize content with AI, chat with any Ollama model, generate embeddings, and much more — all through Ollama's API.

With it, your AI agent gains:

🔍 Search the web live — current news, facts, prices, anything outside its training data
📄 Read any web page — clean content extraction without ads or navigation
🤖 AI-powered summarization — understand long articles in seconds
💬 Direct LLM conversation — chat with any Ollama model (streaming, temperature, JSON mode)
📊 Vector embeddings — generate embeddings for RAG, semantic search, and clustering
💓 Health checks — always know if the service is running
⚡ Blazing fast — caching, retry logic, parallel requests

All through 18 MCP tools that your agent uses like native capabilities.

Tool Overview

Tool	Description	Type
`websearch`	🔍 Search the web with optional site, language, region & date filters	Read-Only
`webfetch`	📄 Fetch a single web page and extract its content	Read-Only
`webfetch_multi`	📚 Fetch multiple URLs concurrently (batch)	Read-Only
`webfetch_markdown`	📝 Fetch a web page as clean Markdown	Read-Only
`page_summarize`	🤖 Summarize a web page with AI	Read-Only
`chat`	💬 Direct LLM conversation with any Ollama model (streaming, temperature, JSON mode)	Read-Only
`embed`	📊 Vector embeddings for RAG, semantic search & clustering	Read-Only
`pull_model`	📥 Download models from the Ollama registry	Read-Only
`push_model`	📤 Upload models to the Ollama registry	Read-Only
`copy_model`	📋 Create a copy of an existing model	Read-Only
`create_model`	🏗️ Create a custom model from base components	Read-Only
`delete_model`	🗑️ Remove a model from local storage	Destructive
`ps`	📊 List currently running models	Read-Only
`show_model`	ℹ️ Get detailed model information	Read-Only
`generate`	✍️ Text completion (non-chat) from a model	Read-Only
`list_models`	📋 List available Ollama models	Read-Only
`health`	💓 Server health check	Read-Only
`cache_clear`	🧹 Clear the results cache	Destructive

Quick Start

Prerequisites

Python 3.10+ (check: python3 --version)
An Ollama API key (get one free at ollama.com/settings/api-keys)
An MCP client (Claude Desktop, OpenCode, Cline, VS Code Copilot, etc.)

Installation

# Clone the repository
git clone https://git.ivory.green/johannes/ollama-web-mcp.git
cd ollama-web-mcp

# Install with uv (recommended)
uv sync --extra dev

# Or with pip
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

# Create your .env file with your API key
cp .env.example .env
# Edit .env: set OLLAMA_API_KEY=your-api-key

Running the server

uv run ollama-web-mcp

For development with the MCP Inspector:

uv run mcp dev src/ollama_web_mcp/server.py

Configuration

Environment Variables

Variable	Default	Description
`OLLAMA_HOST`	`https://ollama.com`	Ollama API endpoint. If not set, auto-detects localhost:11434 or falls back to cloud.
`OLLAMA_API_KEY`	``	Your Bearer token from ollama.com
`OLLAMA_WEB_SEARCH_TIMEOUT`	`10`	Request timeout in seconds
`OLLAMA_WEB_SEARCH_RETRIES`	`2`	Number of retries on failure
`OLLAMA_WEB_SEARCH_FORMAT`	`string`	Default output format (`string` or `json`)
`OLLAMA_WEB_SEARCH_HIGHLIGHTS`	`5`	Max highlight snippets per result
`OLLAMA_WEB_SEARCH_CHARS`	`1000`	Max characters per highlight
`OLLAMA_WEB_SEARCH_MAX_SIZE`	`50000`	Max content size in bytes
`OLLAMA_WEB_SEARCH_MAX_RESULTS`	`10`	Max search results total
`CACHE_ENABLED`	`true`	Enable/disable caching
`CACHE_TTL_SECONDS`	`300`	Cache TTL in seconds (5 min)
`CACHE_STORAGE`	`memory`	Cache backend: `memory` (RAM), `disk` (SQLite), `both` (RAM + SQLite)
`CACHE_DB_PATH`	``	SQLite database path (default: `~/.cache/ollama-web-mcp/cache.db`)
`RATE_LIMIT_ENABLED`	`true`	Enable/disable rate limiting
`RATE_LIMIT_MAX_REQUESTS`	`10`	Max requests per time window
`RATE_LIMIT_PER_SECONDS`	`60.0`	Time window in seconds
`CONTENT_MAX_CHARS`	`50000`	Max content length after extraction
`MCP_TRANSPORT`	`stdio`	Transport mode: `stdio`, `sse`, `streamable-http`
`MCP_HOST`	`127.0.0.1`	Network interface (SSE/HTTP only)
`MCP_PORT`	`8000`	TCP port (SSE/HTTP only)
`LOG_LEVEL`	`INFO`	Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`
`OLLAMA_AUTO_DETECT`	`true`	Auto-detect local Ollama instance, fall back to cloud if unavailable
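
The two settings RATE_LIMIT_MAX_REQUESTS and RATE_LIMIT_PER_SECONDS together define a budget of 10 requests per 60 seconds by default, and the feature list describes the limiter as a token bucket. The sketch below shows one way such a limiter could interpret the two values. The class name, method name, and injectable clock are hypothetical and are not taken from the project's ratelimit.py.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter (illustration only)."""

    def __init__(self, max_requests=10, per_seconds=60.0, clock=time.monotonic):
        self.capacity = float(max_requests)
        # Tokens regained per second, e.g. 10 / 60 with the documented defaults.
        self.refill_rate = max_requests / per_seconds
        self.tokens = self.capacity  # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the budget is spent."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the defaults, a burst of 10 calls is allowed immediately, after which roughly one further call becomes available every 6 seconds.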

API Key Configuration

Option 1: .env file (recommended)

OLLAMA_HOST=https://ollama.com
OLLAMA_API_KEY=your-api-key-here
CACHE_ENABLED=true
CACHE_TTL_SECONDS=300

Option 2: Environment variables

export OLLAMA_API_KEY="your-api-key"
export OLLAMA_HOST="https://ollama.com"
ollama-web-mcp
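
For orientation, here is a minimal sketch of how exported variables like these could be read in Python, falling back to the defaults from the table above. The Settings dataclass, load_settings, and the accepted boolean spellings are assumptions made for illustration; the project's own config.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy spellings; which spellings are accepted is an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str
    api_key: str
    timeout: float
    retries: int
    cache_enabled: bool
    cache_ttl_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the documented variables, using the documented defaults."""
    return Settings(
        host=env.get("OLLAMA_HOST", "https://ollama.com"),
        api_key=env.get("OLLAMA_API_KEY", ""),
        timeout=float(env.get("OLLAMA_WEB_SEARCH_TIMEOUT", "10")),
        retries=int(env.get("OLLAMA_WEB_SEARCH_RETRIES", "2")),
        cache_enabled=_env_bool(env, "CACHE_ENABLED", True),
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "300")),
    )
```

Passing a plain dict instead of os.environ makes the function easy to exercise in tests without touching the real environment.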

Option 3: MCP client config (for Claude Desktop, etc.)

{
  "mcpServers": {
    "ollama-web": {
      "command": "ollama-web-mcp",
      "args": [],
      "env": {
        "OLLAMA_API_KEY": "your-api-key",
        "OLLAMA_HOST": "https://ollama.com"
      },
      "transport": "stdio",
      "timeout": 300000
    }
  }
}

Client Integration

Claude Desktop

~/Library/Application Support/Claude/claude_desktop_config.json (macOS) / %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "ollama-web": {
      "command": "ollama-web-mcp",
      "args": [],
      "env": {
        "OLLAMA_API_KEY": "your-api-key"
      },
      "transport": "stdio",
      "timeout": 300000
    }
  }
}

OpenCode

~/.config/opencode/mcp.json:

{
  "servers": {
    "ollama-web": {
      "command": "/path/to/ollama-web-mcp/.venv/bin/ollama-web-mcp",
      "args": [],
      "transport": "stdio",
      "timeout": 300000
    }
  }
}

⚠️ OpenCode uses "servers" as the key, not "mcpServers".
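
Because the two clients differ only in the top-level key, the entries can be generated from one helper. The following is a small sketch under that assumption; server_entry and client_config are hypothetical helper names, and the field values are copied from the JSON examples above.

```python
import json
from typing import Optional


def server_entry(command: str, api_key: Optional[str] = None) -> dict:
    """Build the per-server block shown in the JSON examples."""
    entry = {"command": command, "args": [], "transport": "stdio", "timeout": 300000}
    if api_key is not None:
        entry["env"] = {"OLLAMA_API_KEY": api_key}
    return entry


def client_config(client: str, command: str, api_key: Optional[str] = None) -> dict:
    """Wrap the entry under "mcpServers" (Claude Desktop) or "servers" (OpenCode)."""
    top_level_key = {"claude": "mcpServers", "opencode": "servers"}[client]
    return {top_level_key: {"ollama-web": server_entry(command, api_key)}}


if __name__ == "__main__":
    config = client_config("opencode", "/path/to/ollama-web-mcp/.venv/bin/ollama-web-mcp")
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's config file; an unknown client name raises KeyError rather than silently producing the wrong key.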

Examples

Example 1: Web Search (string format)

> websearch(query="latest SpaceX Starship launch", max_results=3)

1. SpaceX Starship Completes Orbital Test Flight
   URL: https://example.com/spacex-starship-orbital
   Summary: SpaceX successfully completed the first orbital test flight...
   Highlights:
     - The Starship reached orbit at 09:42 UTC
     - Heat shield performed beyond expectations

Example 2: Web Search (JSON format)

> websearch(query="Python 3.13 new features", response_format="json", max_results=2)

Example 3: Search with Filters

> websearch(
    query="climate change research",
    site="nature.com",
    language="en",
    date_from="2025-01-01",
    date_to="2025-12-31"
)

Example 4: Fetch Page as Markdown

> webfetch_markdown(url="https://python.org")

Example 5: AI Summarization

> page_summarize(url="https://arxiv.org/abs/2401.00000", style="tldr")

Example 6: Direct LLM Chat

> chat(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in 3 sentences."}
  ], model="llama3.2", temperature=0.3)

Example 7: Chat with JSON Output

> chat(messages=[
    {"role": "user", "content": "Give me 3 facts about Mars in JSON"}
  ], model="llama3.2", response_format="json")

Example 8: Generate Embeddings

> embed(text="What is machine learning?", model="nomic-embed-text")
> embed(text=["Document one", "Document two"], model="all-minilm")
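
Embeddings become useful for semantic search once they are compared. The sketch below ranks documents by cosine similarity to a query vector. It assumes the vectors are available as plain lists of floats; the exact shape of the embed tool's return value is not shown in this README, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec: Sequence[float], doc_vecs: List[Sequence[float]]) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real embeddings (illustration only).
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
print(rank(query, docs))
```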

Project Structure

ollama-web-mcp/
├── src/ollama_web_mcp/    ← Source code
│   ├── server.py          ← Bootstrap + FastMCP init (~190 lines)
│   ├── state.py           ← Global state (_client, _cache, _rate_limiter)
│   ├── prompts.py         ← 3 prompt functions
│   ├── resources.py       ← MCP resource endpoints
│   ├── models/
│   │   └── schemas.py     ← 15+ Pydantic input models
│   ├── tools/
│   │   ├── web.py         ← websearch, webfetch, webfetch_multi, webfetch_markdown
│   │   ├── chat.py        ← chat, embed, generate
│   │   ├── models.py      ← Model management (list, pull, push, copy, delete, ps, show)
│   │   ├── health.py      ← health, cache_clear
│   │   └── summarize.py   ← page_summarize
│   ├── cache.py, config.py, content.py, errors.py
│   ├── formatter.py, ratelimit.py, retry.py
├── tests/                 ← 230+ unit tests (pytest, 91% coverage)
├── schemas/               ← JSON Schema for Exa-compatible output
├── Dockerfile             ← Container build
├── pyproject.toml         ← Project configuration
├── .woodpecker.yml        ← CI/CD (ruff, mypy, pytest)
├── .pre-commit-config.yaml
├── README.md              ← This file (English)
└── README.de.md           ← German version

Development

Setup

git clone https://git.ivory.green/johannes/ollama-web-mcp.git
cd ollama-web-mcp
uv sync --extra dev
cp .env.example .env
# Edit .env: set OLLAMA_API_KEY
uv run pre-commit install

Code Quality

All commands must pass before a PR:

uv run ruff check .                          # 0 errors
uv run ruff format .                         # auto-format
uv run mypy src/ --ignore-missing-imports    # 0 type errors (strict mode)
uv run pytest tests/ -v                      # all green

Features

Feature	Description
Retry with Backoff	1s → 2s → 4s wait on timeout, then abort
Content Extraction	HTML → clean text, headings, highlights, summary
Exa-compatible JSON	Output validated against exa-response.json schema
Typed Errors	Every error has code, message, and action hint
Multi-level Caching	Memory, SQLite disk, or composite (memory → disk)
Pydantic Validation	All tool inputs strictly validated
Rate Limiting	Token-bucket limiter, 10 requests/minute default
Batch Fetching	Up to 20 URLs in parallel with semaphore limit
AI Summarization	5 styles: tldr, brief, detailed, bullet, keypoints
Chat Tool	Direct LLM conversation with streaming, temperature, JSON mode
Embeddings	Vector embeddings for RAG workflows
Extended Search	`site`, `language`, `region`, `date_from`, `date_to` filters
Model Auto-Detection	Automatically finds first available Ollama model
Host Auto-Detection	Pings localhost:11434, falls back to cloud API
mypy Strict Mode	0 type errors at strictest configuration
Pre-Commit Hooks	Automatic quality checks on every commit
Semantic Release	Angular-commit-based versioning + changelog
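
The retry schedule in the first row (1s, 2s, 4s, then abort) can be sketched as follows. This is an illustration, not the project's retry.py: the function name is hypothetical, only TimeoutError is retried, and making one final attempt after the 4s wait is an assumption about what "then abort" means.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    delays: Sequence[float] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); after each TimeoutError wait the next delay, then try again."""
    for delay in delays:
        try:
            return call()
        except TimeoutError:
            sleep(delay)
    # Final attempt: a timeout here propagates to the caller (the abort case).
    return call()
```

Injecting the sleep function keeps the sketch testable without real waiting, for example by passing a list's append method to record the delays.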

Benchmarks

See BENCHMARKS.md for the full benchmark suite.

uv run pytest tests/test_benchmarks.py --benchmark-only -v

Operation	Expected Latency	Notes
Health check	~1ms	Pure in-memory, no network
List models	200-500ms	Cloud API, 39 models
Web search (basic)	500-1000ms	Cloud API + search engine
Web search (filtered)	500-1200ms	Extended params via raw HTTP
Page fetch	200-1000ms	Depends on target page size
Cache hit	<1ms	In-memory lookup

Cache architectures:

Backend	Hit Latency	Persistence
`memory`	<1ms	❌ Lost on restart
`disk` (SQLite)	1-5ms	✅ Survives restarts
`both`	<1ms on a memory hit, 1-5ms on a disk hit	✅ Best of both
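
To make the `both` row concrete, here is a minimal sketch of a composite cache that checks memory first and falls back to SQLite. The class, table layout, and promotion of disk hits into memory are assumptions for illustration and are not taken from the project's cache.py.

```python
import sqlite3
import time
from typing import Dict, Optional, Tuple


class CompositeCache:
    """Hypothetical memory-then-SQLite cache (illustration only)."""

    def __init__(self, db_path: str = ":memory:", ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.memory: Dict[str, Tuple[str, float]] = {}
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value TEXT, expires REAL)"
        )

    def set(self, key: str, value: str) -> None:
        """Write through to both layers with the same expiry time."""
        expires = time.time() + self.ttl
        self.memory[key] = (value, expires)
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (key, value, expires)
        )
        self.db.commit()

    def get(self, key: str) -> Optional[str]:
        now = time.time()
        hit = self.memory.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # fast path: memory hit
        row = self.db.execute(
            "SELECT value, expires FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and row[1] > now:
            self.memory[key] = (row[0], row[1])  # promote the disk hit into memory
            return row[0]
        return None
```

Pointing db_path at a file instead of ":memory:" is what would let entries survive a restart in this sketch.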

FAQ

"Authorization header with Bearer token is required"

→ Your API key is missing or incorrect. Check:

.env file exists and contains OLLAMA_API_KEY=your-key
No spaces around the = sign
The key is given without the Bearer prefix

"No module named 'ollama_web_mcp'"

→ Start the server from the project root directory:

cd /path/to/ollama-web-mcp
uv run ollama-web-mcp

`list_models` crashes with "/api/api/tags not found"

→ Your OLLAMA_HOST ends with /api. Change to:

OLLAMA_HOST=https://ollama.com
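
The doubled path arises because the client appends /api/tags to whatever base URL it is given. The sketch below shows how a base URL could be normalized defensively. normalize_host and tags_url are hypothetical helpers; nothing here implies that the server performs this normalization itself, so correcting OLLAMA_HOST remains the fix.

```python
def normalize_host(host: str) -> str:
    """Strip trailing slashes and one trailing '/api' segment from a base URL."""
    host = host.strip().rstrip("/")
    if host.endswith("/api"):
        host = host[: -len("/api")]
    return host


def tags_url(host: str) -> str:
    """Build the model-listing URL from a (possibly misconfigured) base URL."""
    return normalize_host(host) + "/api/tags"


print(tags_url("https://ollama.com/api"))
```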

Contributing

See CONTRIBUTING.md for setup, code style, and PR process.

Quick summary:

Fork → Branch → Changes
Write tests (tests/test_xyz.py)
Run the checks: uv run ruff check ., uv run mypy src/ --ignore-missing-imports, and uv run pytest tests/ -v
Run pre-commit: uv run pre-commit run --all-files
Open a PR

🔗 Links

Repository: git.ivory.green/johannes/ollama-web-mcp
PyPI: pypi.org/project/ollama-web-mcp
Ollama API Docs: ollama.com/docs
Ollama API Keys: ollama.com/settings/api-keys
MCP Spec: modelcontextprotocol.io
FastMCP Docs: gofastmcp.com
Changelog: CHANGELOG.md

License

MIT — see LICENSE.
