🕸️ Ollama Web MCP Server
MCP server providing web search, web fetch, AI summarization, chat, embeddings, model management and health monitoring — powered by Ollama's cloud API.
🇩🇪 German version also available: README.de.md
What is this?
This MCP server gives your AI agent (Claude, ChatGPT, Cline, OpenCode, etc.) the ability to search the web, fetch web pages, summarize content with AI, chat with any Ollama model, generate embeddings, and manage models, all through Ollama's API.
With it, your AI agent gains:
- 🔍 Search the web live — current news, facts, prices, anything outside its training data
- 📄 Read any web page — clean content extraction without ads or navigation
- 🤖 AI-powered summarization — understand long articles in seconds
- 💬 Direct LLM conversation — chat with any Ollama model (streaming, temperature, JSON mode)
- 📊 Vector embeddings — generate embeddings for RAG, semantic search, and clustering
- 💓 Health checks — always know if the service is running
- ⚡ Blazing fast — caching, retry logic, parallel requests
All through 18 MCP tools that your agent uses like native capabilities.
Tool Overview
| Tool | Description | Type |
|---|---|---|
| `websearch` | 🔍 Search the web with optional site, language, region & date filters | Read-Only |
| `webfetch` | 📄 Fetch a single web page and extract its content | Read-Only |
| `webfetch_multi` | 📚 Fetch multiple URLs concurrently (batch) | Read-Only |
| `webfetch_markdown` | 📝 Fetch a web page as clean Markdown | Read-Only |
| `page_summarize` | 🤖 Summarize a web page with AI | Read-Only |
| `chat` | 💬 Direct LLM conversation with any Ollama model (streaming, temperature, JSON mode) | Read-Only |
| `embed` | 📊 Vector embeddings for RAG, semantic search & clustering | Read-Only |
| `pull_model` | 📥 Download models from the Ollama registry | Read-Only |
| `push_model` | 📤 Upload models to the Ollama registry | Read-Only |
| `copy_model` | 📋 Create a copy of an existing model | Read-Only |
| `create_model` | 🏗️ Create a custom model from base components | Read-Only |
| `delete_model` | 🗑️ Remove a model from local storage | Destructive |
| `ps` | 📊 List currently running models | Read-Only |
| `show_model` | ℹ️ Get detailed model information | Read-Only |
| `generate` | ✍️ Text completion (non-chat) from a model | Read-Only |
| `list_models` | 📋 List available Ollama models | Read-Only |
| `health` | 💓 Server health check | Read-Only |
| `cache_clear` | 🧹 Clear the results cache | Destructive |
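The `webfetch_multi` row describes concurrent batch fetching, and the Features table further down mentions a cap of 20 URLs with a semaphore limit. The sketch below shows one way such a bounded batch could look with `asyncio`. It is an illustration only: `fetch_one` and `fetch_many` are hypothetical names, the network call is simulated, and the concurrency value of 5 is an assumption because this README does not state the semaphore size.

```python
import asyncio

MAX_URLS = 20        # batch cap quoted in the Features table
MAX_CONCURRENCY = 5  # assumed value; the README does not state the semaphore size


async def fetch_one(url: str) -> str:
    """Hypothetical stand-in for a real HTTP fetch."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"content of {url}"


async def fetch_many(urls: list[str]) -> dict[str, str]:
    """Fetch all URLs concurrently, with at most MAX_CONCURRENCY in flight."""
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per batch")
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def guarded(url: str) -> tuple[str, str]:
        # The semaphore blocks here once MAX_CONCURRENCY fetches are running.
        async with semaphore:
            return url, await fetch_one(url)

    pairs = await asyncio.gather(*(guarded(u) for u in urls))
    return dict(pairs)


if __name__ == "__main__":
    print(asyncio.run(fetch_many(["https://a.example", "https://b.example"])))
```

The semaphore keeps a large batch from opening every connection at once, while `asyncio.gather` still returns results in input order.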
Quick Start
Prerequisites
- Python 3.10+ (check: `python3 --version`)
- An Ollama API Key (get one free at ollama.com/settings/api-keys)
- An MCP client (Claude Desktop, OpenCode, Cline, VS Code Copilot, etc.)
Installation
# Clone the repository
git clone https://git.ivory.green/johannes/ollama-web-mcp.git
cd ollama-web-mcp
# Install with uv (recommended)
uv sync --extra dev
# Or with pip
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
# Create your .env file with your API key
cp .env.example .env
# Edit .env: set OLLAMA_API_KEY=your-api-key
Running the server
uv run ollama-web-mcp
For development with the MCP Inspector:
uv run mcp dev src/ollama_web_mcp/server.py
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `https://ollama.com` | Ollama API endpoint. If not set, auto-detects localhost:11434 or falls back to cloud. |
| `OLLAMA_API_KEY` | (empty) | Your Bearer token from ollama.com |
| `OLLAMA_WEB_SEARCH_TIMEOUT` | `10` | Request timeout in seconds |
| `OLLAMA_WEB_SEARCH_RETRIES` | `2` | Number of retries on failure |
| `OLLAMA_WEB_SEARCH_FORMAT` | `string` | Default output format (string or json) |
| `OLLAMA_WEB_SEARCH_HIGHLIGHTS` | `5` | Max highlight snippets per result |
| `OLLAMA_WEB_SEARCH_CHARS` | `1000` | Max characters per highlight |
| `OLLAMA_WEB_SEARCH_MAX_SIZE` | `50000` | Max content size in bytes |
| `OLLAMA_WEB_SEARCH_MAX_RESULTS` | `10` | Max search results total |
| `CACHE_ENABLED` | `true` | Enable/disable caching |
| `CACHE_TTL_SECONDS` | `300` | Cache TTL in seconds (5 min) |
| `CACHE_STORAGE` | `memory` | Cache backend: memory (RAM), disk (SQLite), both (RAM + SQLite) |
| `CACHE_DB_PATH` | (empty) | SQLite database path (default: ~/.cache/ollama-web-mcp/cache.db) |
| `RATE_LIMIT_ENABLED` | `true` | Enable/disable rate limiting |
| `RATE_LIMIT_MAX_REQUESTS` | `10` | Max requests per time window |
| `RATE_LIMIT_PER_SECONDS` | `60.0` | Time window in seconds |
| `CONTENT_MAX_CHARS` | `50000` | Max content length after extraction |
| `MCP_TRANSPORT` | `stdio` | Transport mode: stdio, sse, streamable-http |
| `MCP_HOST` | `127.0.0.1` | Network interface (SSE/HTTP only) |
| `MCP_PORT` | `8000` | TCP port (SSE/HTTP only) |
| `LOG_LEVEL` | `INFO` | Log level: DEBUG, INFO, WARNING, ERROR |
| `OLLAMA_AUTO_DETECT` | `true` | Auto-detect local Ollama instance, fall back to cloud if unavailable |
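To make the `RATE_LIMIT_MAX_REQUESTS` and `RATE_LIMIT_PER_SECONDS` defaults concrete, here is a minimal token-bucket sketch using the documented values (10 requests per 60 seconds). The Features table names a token-bucket limiter, but the `TokenBucket` class below is a hypothetical illustration, not the code in `ratelimit.py`. The injectable `clock` parameter is also an assumption, added so the behavior can be checked without waiting.

```python
import time


class TokenBucket:
    """Allow up to `max_requests` per `per_seconds`, refilling continuously."""

    def __init__(self, max_requests: int = 10, per_seconds: float = 60.0, clock=time.monotonic):
        self.capacity = float(max_requests)
        self.refill_rate = max_requests / per_seconds  # tokens added per second
        self.tokens = self.capacity                    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket()
    print([bucket.try_acquire() for _ in range(11)])  # the 11th call is rejected
```

With these defaults a burst of 10 calls passes immediately, after which roughly one request becomes available every 6 seconds.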
API Key Configuration
Option 1: .env file (recommended)
OLLAMA_HOST=https://ollama.com
OLLAMA_API_KEY=your-api-key-here
CACHE_ENABLED=true
CACHE_TTL_SECONDS=300
Option 2: Environment variables
export OLLAMA_API_KEY="your-api-key"
export OLLAMA_HOST="https://ollama.com"
ollama-web-mcp
Option 3: MCP client config (for Claude Desktop, etc.)
{
  "mcpServers": {
    "ollama-web": {
      "command": "ollama-web-mcp",
      "args": [],
      "env": {
        "OLLAMA_API_KEY": "your-api-key",
        "OLLAMA_HOST": "https://ollama.com"
      },
      "transport": "stdio",
      "timeout": 300000
    }
  }
}
Client Integration
Claude Desktop
~/Library/Application Support/Claude/claude_desktop_config.json (macOS) / %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "ollama-web": {
      "command": "ollama-web-mcp",
      "args": [],
      "env": {
        "OLLAMA_API_KEY": "your-api-key"
      },
      "transport": "stdio",
      "timeout": 300000
    }
  }
}
OpenCode
~/.config/opencode/mcp.json:
{
  "servers": {
    "ollama-web": {
      "command": "/path/to/ollama-web-mcp/.venv/bin/ollama-web-mcp",
      "args": [],
      "transport": "stdio",
      "timeout": 300000
    }
  }
}
⚠️ OpenCode uses `"servers"` as the key, not `"mcpServers"`.
Examples
Example 1: Web Search (string format)
> websearch(query="latest SpaceX Starship launch", max_results=3)
1. SpaceX Starship Completes Orbital Test Flight
URL: https://example.com/spacex-starship-orbital
Summary: SpaceX successfully completed the first orbital test flight...
Highlights:
- The Starship reached orbit at 09:42 UTC
- Heat shield performed beyond expectations
Example 2: Web Search (JSON format)
> websearch(query="Python 3.13 new features", response_format="json", max_results=2)
Example 3: Search with Filters
> websearch(
    query="climate change research",
    site="nature.com",
    language="en",
    date_from="2025-01-01",
    date_to="2025-12-31"
  )
Example 4: Fetch Page as Markdown
> webfetch_markdown(url="https://python.org")
Example 5: AI Summarization
> page_summarize(url="https://arxiv.org/abs/2401.00000", style="tldr")
Example 6: Direct LLM Chat
> chat(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in 3 sentences."}
  ], model="llama3.2", temperature=0.3)
Example 7: Chat with JSON Output
> chat(messages=[
    {"role": "user", "content": "Give me 3 facts about Mars in JSON"}
  ], model="llama3.2", response_format="json")
Example 8: Generate Embeddings
> embed(text="What is machine learning?", model="nomic-embed-text")
> embed(text=["Document one", "Document two"], model="all-minilm")
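Embedding vectors like those requested above are commonly compared with cosine similarity for semantic search. The following is a minimal sketch under stated assumptions: the 3-dimensional vectors are made-up placeholders rather than real `embed` output (whose exact response shape is not shown in this README), and `cosine_similarity` and `rank_documents` are hypothetical helpers, not part of this server.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder vectors standing in for real embeddings.
    docs = {"Document one": [1.0, 0.0, 0.0], "Document two": [0.0, 1.0, 0.0]}
    print(rank_documents([0.9, 0.1, 0.0], docs))
```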
Project Structure
ollama-web-mcp/
├── src/ollama_web_mcp/ ← Source code
│ ├── server.py ← Bootstrap + FastMCP init (~190 lines)
│ ├── state.py ← Global state (_client, _cache, _rate_limiter)
│ ├── prompts.py ← 3 prompt functions
│ ├── resources.py ← MCP resource endpoints
│ ├── models/
│ │ └── schemas.py ← 15+ Pydantic input models
│ ├── tools/
│ │ ├── web.py ← websearch, webfetch, webfetch_multi, webfetch_markdown
│ │ ├── chat.py ← chat, embed, generate
│ │ ├── models.py ← Model management (list, pull, push, copy, delete, ps, show)
│ │ ├── health.py ← health, cache_clear
│ │ └── summarize.py ← page_summarize
│ ├── cache.py, config.py, content.py, errors.py
│ ├── formatter.py, ratelimit.py, retry.py
├── tests/ ← 230+ unit tests (pytest, 91% coverage)
├── schemas/ ← JSON Schema for Exa-compatible output
├── Dockerfile ← Container build
├── pyproject.toml ← Project configuration
├── .woodpecker.yml ← CI/CD (ruff, mypy, pytest)
├── .pre-commit-config.yaml
├── README.md ← This file (English)
└── README.de.md ← German version
Development
Setup
git clone https://git.ivory.green/johannes/ollama-web-mcp.git
cd ollama-web-mcp
uv sync --extra dev
cp .env.example .env
# Edit .env: set OLLAMA_API_KEY
uv run pre-commit install
Code Quality
All commands must pass before a PR:
uv run ruff check . # 0 errors
uv run ruff format . # auto-format
uv run mypy src/ --ignore-missing-imports # 0 type errors (strict mode)
uv run pytest tests/ -v # all green
Features
| Feature | Description |
|---|---|
| Retry with Backoff | 1s → 2s → 4s wait on timeout, then abort |
| Content Extraction | HTML → clean text, headings, highlights, summary |
| Exa-compatible JSON | Output validated against exa-response.json schema |
| Typed Errors | Every error has code, message, and action hint |
| Multi-level Caching | Memory, SQLite disk, or composite (memory → disk) |
| Pydantic Validation | All tool inputs strictly validated |
| Rate Limiting | Token-bucket limiter, 10 requests/minute default |
| Batch Fetching | Up to 20 URLs in parallel with semaphore limit |
| AI Summarization | 5 styles: tldr, brief, detailed, bullet, keypoints |
| Chat Tool | Direct LLM conversation with streaming, temperature, JSON mode |
| Embeddings | Vector embeddings for RAG workflows |
| Extended Search | site, language, region, date_from, date_to filters |
| Model Auto-Detection | Automatically finds first available Ollama model |
| Host Auto-Detection | Pings localhost:11434, falls back to cloud API |
| mypy Strict Mode | 0 type errors at strictest configuration |
| Pre-Commit Hooks | Automatic quality checks on every commit |
| Semantic Release | Angular-commit-based versioning + changelog |
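The "Retry with Backoff" row can be read as the schedule sketched below: wait 1s, 2s, then 4s after successive timeouts, and give up if the call still fails. This is a hedged illustration, not the code in `retry.py`. `retry_with_backoff` is a hypothetical name, the final attempt after the 4s wait is an assumed interpretation of "then abort", and the injectable `sleep` parameter exists only so the schedule can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    delays: tuple[float, ...] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; after each TimeoutError wait the next delay, then try again."""
    for delay in delays:
        try:
            return call()
        except TimeoutError:
            sleep(delay)  # back off before the next attempt
    # Final attempt: a TimeoutError raised here propagates to the caller (abort).
    return call()


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky() -> str:
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise TimeoutError("simulated timeout")
        return "ok"

    print(retry_with_backoff(flaky, sleep=lambda s: None))
```

Note that `OLLAMA_WEB_SEARCH_RETRIES` defaults to 2 in the configuration table, so the number of waits actually used depends on that setting.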
Benchmarks
See BENCHMARKS.md for the full benchmark suite.
uv run pytest tests/test_benchmarks.py --benchmark-only -v
| Operation | Expected Latency | Notes |
|---|---|---|
| Health check | ~1ms | Pure in-memory, no network |
| List models | 200-500ms | Cloud API, 39 models |
| Web search (basic) | 500-1000ms | Cloud API + search engine |
| Web search (filtered) | 500-1200ms | Extended params via raw HTTP |
| Page fetch | 200-1000ms | Depends on target page size |
| Cache hit | <1ms | In-memory lookup |
Cache architectures:
| Backend | Hit Latency | Persistence |
|---|---|---|
| `memory` | <1ms | ❌ Lost on restart |
| `disk` (SQLite) | 1-5ms | ✅ Survives restarts |
| `both` | <1ms + 1-5ms | ✅ Best of both |
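The `both` backend can be pictured as a memory layer in front of SQLite: reads try memory first, fall back to disk, and promote disk hits back into memory. The sketch below illustrates that idea with the standard-library `sqlite3` module. `TwoTierCache`, its table layout, and the `clock` parameter are assumptions for illustration and not the schema used in `cache.py`. The default `":memory:"` database keeps the example self-contained, so pass a file path to get real persistence.

```python
import sqlite3
import time
from typing import Optional


class TwoTierCache:
    """Memory-first cache with an SQLite fallback and a shared TTL."""

    def __init__(self, db_path: str = ":memory:", ttl_seconds: float = 300.0, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.memory: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, expires_at REAL, value TEXT)"
        )

    def set(self, key: str, value: str) -> None:
        """Write through to both layers."""
        expires_at = self.clock() + self.ttl
        self.memory[key] = (expires_at, value)
        self.db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (key, expires_at, value))
        self.db.commit()

    def get(self, key: str) -> Optional[str]:
        now = self.clock()
        hit = self.memory.get(key)
        if hit and hit[0] > now:
            return hit[1]  # fast path: memory hit
        row = self.db.execute(
            "SELECT expires_at, value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row and row[0] > now:
            self.memory[key] = (row[0], row[1])  # promote the disk hit into memory
            return row[1]
        return None  # missing or expired in both layers


if __name__ == "__main__":
    cache = TwoTierCache()
    cache.set("query", "cached result")
    print(cache.get("query"))
```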
FAQ
"Authorization header with Bearer token is required"
→ Your API key is missing or incorrect. Check:
- `.env` file exists and contains `OLLAMA_API_KEY=your-key`
- No spaces around the `=` sign
- Key without the `Bearer` prefix
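The three checks above can also be scripted. The helper below is a hypothetical sketch, not a tool shipped with this project: `check_env_text` takes the contents of a `.env` file as a string and returns a list of problems with the `OLLAMA_API_KEY` line, following exactly the checklist in this FAQ entry.

```python
def check_env_text(env_text: str) -> list[str]:
    """Return problems found with OLLAMA_API_KEY in the given .env contents."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#") or "OLLAMA_API_KEY" not in line:
            continue  # skip comments and unrelated settings
        name, sep, value = line.partition("=")
        if not sep:
            return ["OLLAMA_API_KEY line has no '=' sign"]
        problems = []
        if name != name.rstrip() or value != value.lstrip():
            problems.append("remove spaces around '='")
        value = value.strip().strip("\"'")  # ignore optional quotes
        if not value:
            problems.append("key is empty")
        if value.lower().startswith("bearer"):
            problems.append("drop the 'Bearer' prefix")
        return problems
    return ["OLLAMA_API_KEY is not set"]


if __name__ == "__main__":
    print(check_env_text("OLLAMA_API_KEY = Bearer abc123"))
```

An empty list means the line passes all three checks. It does not confirm that the key itself is valid.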
"No module named 'ollama_web_mcp'"
→ Start the server from the project root directory:
cd /path/to/ollama-web-mcp
uv run ollama-web-mcp
list_models crashes with "/api/api/tags not found"
→ Your OLLAMA_HOST ends with /api. Change to:
OLLAMA_HOST=https://ollama.com
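The doubled `/api/api/tags` path appears because the client appends `/api/...` to a host that already ends in `/api`. As an illustration of the fix, the sketch below strips a trailing `/api` before building the URL. `normalize_host` and `tags_url` are hypothetical helpers for this explanation. Whether the server performs such normalization itself is not stated in this README, so correcting `OLLAMA_HOST` remains the documented remedy.

```python
def normalize_host(host: str) -> str:
    """Strip trailing slashes and a trailing '/api' segment from a host URL."""
    host = host.strip().rstrip("/")
    if host.endswith("/api"):
        host = host[: -len("/api")]
    return host


def tags_url(host: str) -> str:
    """Build the model-listing URL without duplicating the '/api' segment."""
    return f"{normalize_host(host)}/api/tags"


if __name__ == "__main__":
    print(tags_url("https://ollama.com/api"))  # same result as for https://ollama.com
```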
Contributing
See CONTRIBUTING.md for setup, code style, and PR process.
Quick summary:
- Fork → Branch → Changes
- Write tests (`tests/test_xyz.py`)
- `uv run ruff check .` + `uv run mypy src/ --ignore-missing-imports` + `uv run pytest tests/ -v`
- Run pre-commit: `uv run pre-commit run --all-files`
- Open a PR
🔗 Links
- Repository: git.ivory.green/johannes/ollama-web-mcp
- PyPI: pypi.org/project/ollama-web-mcp
- Ollama API Docs: ollama.com/docs
- Ollama API Keys: ollama.com/settings/api-keys
- MCP Spec: modelcontextprotocol.io
- FastMCP Docs: gofastmcp.com
- Changelog: CHANGELOG.md
License
MIT — see LICENSE.