Published 2024-01-14.
Last modified 2026-05-02.
Time to read: 16 minutes.
Overview
Ollama is an open-source tool, written in Go, for managing and running large language models (LLMs). It is responsive and stable, and is not subject to the vagaries of Node.js or the inefficiency of Python.
All Ollama programs and features run on Windows, macOS, and Linux.
Meta developed the Llama open-source LLMs. Ollama is not owned by or a product of Meta. While the name similarity often leads to confusion, Ollama and Meta's Llama are separate entities. Ollama is merely a business partner of Meta. Ollama’s public funding comes from venture capital firms like Y Combinator and Essence Venture Capital, not Meta.
Rapidly Evolving Product
Originally released as a product that could only host non-agentic LLM chat sessions for local models, Ollama can now interoperate with agentic harnesses to run models residing locally or in the cloud. This allows you to run local models most of the time and invoke cloud-based LLMs when heavy lifting is required.
Distributed Architecture
Small models running on typical desktop computers and prosumer-grade servers are typically not as powerful or as fast as large models running on enterprise-class hardware, but you have complete control over them without extra cost, censorship, restrictions, or privacy issues.
The client (ollama CLI, bespoke program, or Open Web UI) orchestrates the conversation with
the Ollama server, which in turn controls the LLM requested by the Ollama
client. Because LLMs are inherently stateless, the client must store the list
of messages and send the entire accumulated history back to the server with
every new prompt. Only as much of the history as fits within the context
window is used; this limits the maximum amount of data the model can process.
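The client-side bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not Ollama's own client code: the class name `ChatSession`, the 4-characters-per-token estimate, and the `window` helper are assumptions made for the example. The request body shape (`model`, `messages`, `stream`) follows Ollama's `/api/chat` endpoint.

```python
import json


class ChatSession:
    """Hypothetical client-side conversation state for a stateless LLM server."""

    def __init__(self, model, max_context_tokens):
        self.model = model
        self.max_context_tokens = max_context_tokens
        self.messages = []  # the full accumulated history lives in the client

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    @staticmethod
    def estimate_tokens(text):
        # Crude assumption: about 4 characters per token.
        return max(1, len(text) // 4)

    def window(self):
        """Return the most recent messages that fit within the token budget."""
        kept, used = [], 0
        for message in reversed(self.messages):
            cost = self.estimate_tokens(message["content"])
            if used + cost > self.max_context_tokens:
                break
            kept.append(message)
            used += cost
        return list(reversed(kept))

    def payload(self):
        """Build the JSON body for POST /api/chat; the whole history is sent."""
        return json.dumps(
            {"model": self.model, "messages": self.messages, "stream": False}
        )
```

Every call to `payload()` carries the entire history, while `window()` shows which tail of that history would fit in a context window of the given size under the rough token estimate.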
Local Models
For local models, the Ollama server process converts the contents of the context window into tokens and caches the attention keys and values computed for each token in VRAM. This structure is called the KV cache (key-value cache); it holds a mathematical representation of the conversation history within the GPU memory space. The model reads context from the KV cache and appends entries for new tokens to it during inference.
If you set OLLAMA_NUM_PARALLEL to a value greater than 1, the server allocates multiple KV
caches in VRAM, one for each concurrent request. This allows the server to handle
many user sessions simultaneously without them overwriting each other’s
context. This feature is often called
multitenancy, and requires more VRAM.
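To get a feel for why parallel sessions cost VRAM, here is a rough back-of-the-envelope estimate in Python. It uses the textbook size of a KV cache for a standard transformer (keys plus values, per layer, per KV head, per token) and assumes each parallel slot reserves a full context window. The model dimensions below are hypothetical, and models with hybrid or compressed attention will not match this formula.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens,
                   parallel=1, bytes_per_value=2):
    """Estimate KV cache size for a standard transformer.

    bytes_per_value=2 corresponds to f16, the default OLLAMA_KV_CACHE_TYPE.
    Assumes every parallel slot gets its own full-length cache.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # keys + values
    return per_token * context_tokens * parallel


# Hypothetical model: 32 layers, 8 KV heads, head dimension 128, 4,096-token context.
one_slot = kv_cache_bytes(32, 8, 128, 4096)
four_slots = kv_cache_bytes(32, 8, 128, 4096, parallel=4)
print(one_slot // 2**20, "MiB for one slot")      # prints "512 MiB for one slot"
print(four_slots // 2**30, "GiB for four slots")  # prints "2 GiB for four slots"
```

Under these assumptions the cache grows linearly with both the context length and the number of parallel slots, which is why raising either setting can push a model out of VRAM.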
Cloud Models
In contrast, for cloud-based models the Ollama server process acts as a request gateway that offloads state management to a remote inference orchestrator. Instead of managing a local cache in VRAM, the process coordinates with a managed data layer that distributes the conversation history across a cluster.
Common Core
Ollama clients can access Ollama servers via:
- The ollama CLI, oterm, the new Ollama native app, and the TUI all interact with the same Ollama background service, so they have the same fundamental capabilities for running models. This service contains the core inference engine and model management logic. While the Ollama app provides a user-friendly desktop interface to start and manage this service, the CLI and TUI offer programmatic control over the same fundamental model capabilities.
- REST interface
- Open WebUI provides a browser-based chat experience similar to ChatGPT.
Ollama Service
After installation, the Ollama service runs in the background. The service API
is available by default at endpoint localhost:11434. For native Windows
and macOS, the Ollama app presents as a tray application.
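As a quick check that the service is reachable, a client can ask it which models are installed. The sketch below uses only the Python standard library. The helper names `ollama_base_url` and `list_local_models` are made up for this example; `GET /api/tags` is the Ollama endpoint that lists local models.

```python
import json
import urllib.request


def ollama_base_url(host="127.0.0.1:11434"):
    """Turn an OLLAMA_HOST-style value into a base URL (hypothetical helper)."""
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


def list_local_models(base_url):
    """Return the names of the models the server has downloaded."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as response:
        body = json.load(response)
    return [model["name"] for model in body.get("models", [])]
```

With the service running, `list_local_models(ollama_base_url())` returns a list of model names, and it raises `urllib.error.URLError` if nothing is listening on the port.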
This is the help message for the Ollama service launcher:
Start Ollama
Usage:
ollama serve [flags]
Aliases:
serve, start
Flags:
-h, --help help for serve
Environment Variables:
OLLAMA_DEBUG Show additional debug information (e.g. OLLAMA_DEBUG=1)
OLLAMA_HOST IP Address for the ollama server (default 127.0.0.1:11434)
OLLAMA_CONTEXT_LENGTH Context length to use unless otherwise specified (default: 4k/32k/256k based on VRAM)
OLLAMA_KEEP_ALIVE The duration that models stay loaded in memory (default "5m")
OLLAMA_MAX_LOADED_MODELS Maximum number of loaded models per GPU
OLLAMA_MAX_QUEUE Maximum number of queued requests
OLLAMA_MODELS The path to the models directory
OLLAMA_NUM_PARALLEL Maximum number of parallel requests
OLLAMA_NO_CLOUD Disable Ollama cloud features (remote inference and web search)
OLLAMA_NOPRUNE Do not prune model blobs on startup
OLLAMA_ORIGINS A comma separated list of allowed origins
OLLAMA_SCHED_SPREAD Always schedule model across all GPUs
OLLAMA_FLASH_ATTENTION Enabled flash attention
OLLAMA_KV_CACHE_TYPE Quantization type for the K/V cache (default: f16)
OLLAMA_LLM_LIBRARY Set LLM library to bypass autodetection
OLLAMA_GPU_OVERHEAD Reserve a portion of VRAM per GPU (bytes)
OLLAMA_LOAD_TIMEOUT How long to allow model loads to stall before giving up (default "5m")
Manual Start
For all OSes, start the service manually by typing:
$ ollama serve
Manual Stop
For Linux, stop the service manually by typing:
$ sudo systemctl stop ollama
For macOS:
$ pkill Ollama
$ killall Ollama
$ brew services stop ollama
For Windows:
PS C:\Users\Mike Slinn> taskkill /IM ollama.exe /F
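When the Ollama service is running, Ollama loads required local models into
memory only when you request them (e.g., via the ollama run
command or an API call), and it unloads them after they have been idle for a while to free resources. The idle period is set by OLLAMA_KEEP_ALIVE and defaults to 5 minutes.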
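The load and unload behavior can also be steered per request. Ollama's `/api/generate` endpoint accepts a `keep_alive` field; a request without a prompt loads the model, and `keep_alive` set to 0 asks the server to unload it right away. The following sketch only builds the request body. The function name `generate_request` and the model name in the usage note are placeholders.

```python
import json


def generate_request(model, prompt=None, keep_alive=None):
    """Build a JSON body for POST /api/generate (hypothetical helper).

    keep_alive may be a duration string such as "30m", or 0 to unload
    the model as soon as the request completes.
    """
    body = {"model": model, "stream": False}
    if prompt is not None:
        body["prompt"] = prompt
    if keep_alive is not None:
        body["keep_alive"] = keep_alive
    return json.dumps(body).encode("utf-8")
```

For example, `generate_request("some-model", keep_alive=0)` produces a body that, when posted to the service, should release the memory held by `some-model` without waiting for the idle timeout.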
Given a working connection and without regard to communication authentication requirements, any Ollama client is capable of accessing any Ollama server that is configured to listen to other network nodes.
Agentic Harnesses
In addition to Ollama's well-known command-line chat interface, the provided integrations for agentic harnesses include Claude, Codex, Droid, Hermes-Agent, OpenClaw, OpenCode, and Pi.
In March 2026, new web search and web fetch plugins were added to Ollama. Although the documentation states that these features only work with Ollama/OpenClaw, they are actually available for all Ollama agentic model configurations.
Infobits
The Ollama Discord channel is here.
PatchBot shows the latest changes.
Open WebUI Backstory
I wrote an article about Open WebUI when it first became available. Here is some history to help understand the current state.
Open WebUI was originally at
ollamahub.ai and the website was confusingly similar to
ollama.com. Furthermore, the project was also frequently called Ollama WebUI. Legal action was
threatened. OllamaHub rebranded to Open WebUI and diversified from Ollama-only to Ollama plus other technologies.
openwebui.com continues to be a good resource for Ollama users;
however, there is no love lost between the two organizations.
Both organizations appear to have fought each other hard in private, while
smiling in public.
As the projects evolved, their goals began to diverge, but they
continue to try to eat each other’s lunch. Ollama has focused on
building a vertically integrated ecosystem, recently introducing its own
official desktop chat app, its own cloud hosting service, and a proprietary
engine to replace the core technology that was originally provided by llama.cpp.
In contrast, the Open WebUI team explicitly chose their new name to emphasize general usefulness instead of just offering Ollama-specific technology. They have expanded support to include competing backends like vLLM, LM Studio, and OpenAI APIs, effectively making Ollama just one of many options rather than the exclusive core.
The relationship was further complicated by community-level debates. In 2025,
Open WebUI shifted its licensing to include stricter attribution requirements
(sometimes called "badgeware"), which some in the community viewed as a move
to prevent other companies from easily forking or integrating their code
without prominent credit. There has been ongoing community criticism regarding
Ollama's own history with attribution, specifically a long delay in properly
acknowledging the llama.cpp project, which powered Ollama for years.
Today, they are best described as frenemies. They remain technically compatible but they are now competing for the same users' attention. Ollama wants users to stay within its official app and cloud ecosystem, while Open WebUI wants to be the universal dashboard for every AI tool on your machine. Open WebUI still offers the best chat interface for Ollama's API, but it has very limited agentic support.
The above explains why Open WebUI is not one of the many integrations that Ollama promotes. Ollama is increasingly positioning itself as a platform, not just a tool. With the launch of the Ollama App and Ollama Cloud, Open WebUI is now a direct competitor for being the user's primary interface.
Experienced programmers who work with agentic LLMs should look at the Ollama/Pi harness integration instead of the Open WebUI chat console.
Installation
Installation instructions for Ollama are simple.
macOS and Linux
Ollama installation and update for native Linux, WSL, and macOS looks like this:
$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink '/etc/systemd/system/default.target.wants/ollama.service' '/etc/systemd/system/ollama.service'.
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
$ ollama --version
ollama version is 0.21.2
Windows
Windows users can install Ollama on native Windows and/or WSL. Some LLM harnesses must be run from WSL, not Windows. If you normally use WSL, you might benefit from installing Ollama on both WSL and native Windows.
When running under Windows, Ollama and its local models run more efficiently if Ollama for Windows is installed instead of installing on WSL. You might not have this option for other reasons. You must choose between flexibility and performance.
PS C:\Users\mslinn> irm https://ollama.com/install.ps1 | iex
>>> Downloading Ollama for Windows...
######################################## 100.0%
>>> Installing Ollama...
>>> Install complete. Run 'ollama' from the command line.
PS C:\Users\Mike Slinn> ollama --version
ollama version is 0.21.2
User Interfaces
Ollama includes two user interfaces:
- A CLI (command line interface): the ollama executable allows the user to manage models and interact with them. Note that it does not store sessions.
- A GUI (graphical user interface) called the Ollama App, which manages the background service and provides a basic desktop presence. For Windows and macOS, it resides in the system tray. The Ollama App is primarily a "runner" for the background service rather than a full-featured chat window (like ChatGPT).
For persistent chat sessions, users typically install third-party tools like Open WebUI.
The built-in LLM harness integrations support agentic coding.
CLI
Starting with Ollama v0.17.x (early 2026), the ollama
command no longer just shows a static help menu. Instead, it launches a TUI
that allows you to:
- View a list of recommended local and cloud models directly in your terminal.
- Use your arrow keys to pick a model and launch it in seconds without needing to remember specific model tags.
- On first launch, this interface can now guide users through the process of configuring providers and installing necessary components like the OpenClaw gateway daemon.
$ ollama
Ollama 0.18.2
Run a model Start an interactive chat with a model
Launch Claude Code Anthropic's coding tool with subagents
Launch Codex (not installed) OpenAI's open-source coding agent
Launch OpenClaw Personal AI with 100+ skills
▸ Launch OpenCode (not installed) Install from https://opencode.ai
Launch Droid (not installed) Factory's coding agent across terminal and IDEs
Launch Pi (not installed) Minimal AI agent toolkit with plugin support
Launch Cline (not installed) Autonomous coding agent with parallel execution
↑/↓ navigate • enter launch • → configure • esc quit
If you select Run a model and press Enter, the following menu appears:
Select model to run: Type to filter...
Recommended
▸ kimi-k2.5:cloud
Multimodal reasoning with subagents
qwen3.5:cloud
Reasoning, coding, and agentic tool use with vision
glm-5:cloud
Reasoning and code generation
minimax-m2.7:cloud
Fast, efficient coding and real-world productivity
glm-4.7-flash
Reasoning and code generation locally, ~25GB, (not downloaded)
qwen3.5
Reasoning, coding, and visual understanding locally, ~11GB, (not downloaded)
More
bjoernb/qwen3-coder-30b-1m
deepseek-coder-v2:lite
deepseek-r1:7b
deepseek-r1:8b
... and 9 more
↑/↓ navigate • enter select • esc cancel
CLI
The help message is:
Large language model runner
Usage:
  ollama [flags]
  ollama [command]
Available Commands:
  serve       Start Ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  launch      Launch the Ollama menu or an integration
  help        Help about any command
Flags:
  -h, --help         help for ollama
      --nowordwrap   Don't wrap words to the next line automatically
      --verbose      Show timings for response
  -v, --version      Show version information
Use "ollama [command] --help" for more information about a command.
Manual Server Start
The Ollama app starts the server on demand. You can start the Ollama server from the command line, if it is not already running as a service:
$ ollama serve
2024/01/14 16:25:20 images.go:808: total blobs: 0
2024/01/14 16:25:20 images.go:815: total unused blobs removed: 0
2024/01/14 16:25:20 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
2024/01/14 16:25:21 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/14 16:25:21 gpu.go:88: Detecting GPU type
2024/01/14 16:25:21 gpu.go:203: Searching for GPU management library libnvidia-ml.so
2024/01/14 16:25:21 gpu.go:248: Discovered GPU libraries: [/usr/lib/wsl/lib/libnvidia-ml.so.1]
2024/01/14 16:25:21 gpu.go:94: Nvidia GPU detected
2024/01/14 16:25:21 gpu.go:135: CUDA Compute Capability detected: 8.6
App
Configuration
Ollama uses a client-server architecture. This means that Ollama consists of two programs: the Ollama server (a background process) and an Ollama client. The default Ollama chat client is not agentic, which means it cannot view or interact with local files, processes or any other information source.
To overcome this, you can use one of the built-in integrations, listed above. However, if you want to do that, you should start the Ollama server and configure it before launching the agentic Ollama client.
Configuring Ollama Chat
Ollama sets default context lengths based on your GPU’s VRAM.
- < 24 GiB VRAM: 4,096 tokens.
- 24–48 GiB VRAM: 32,768 tokens.
- ≥ 48 GiB VRAM: 256,000 tokens.
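The tiers above amount to a simple lookup, sketched here in Python. The function name is made up for illustration and is not part of Ollama; the sketch also assumes that exactly 48 GiB falls into the top tier, since the list leaves that boundary ambiguous.

```python
def default_context_length(vram_gib):
    """Return the default context length, in tokens, for a given amount of VRAM.

    Thresholds are taken from the list above; treating exactly 48 GiB
    as the top tier is an assumption.
    """
    if vram_gib < 24:
        return 4_096
    if vram_gib < 48:
        return 32_768
    return 256_000


for vram in (0, 12, 24, 48, 80):
    print(f"{vram:>3} GiB VRAM -> {default_context_length(vram):,} tokens")
```

A machine with no usable GPU reports 0 bytes of VRAM and therefore lands in the 4,096-token tier, which is too small for most agentic coding work.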
You can set a global default context length for the Ollama server by setting an environment variable before starting the Ollama service. The following shows how to do that for the duration of the terminal session:
- Linux/macOS
  $ export OLLAMA_CONTEXT_LENGTH=64000
- Windows (PowerShell)
  PS C:\Users\Mike Slinn> $env:OLLAMA_CONTEXT_LENGTH="64000"
Running Ollama Server
OLLAMA_CONTEXT_LENGTH is an environment variable meant for the
Ollama Server (the background process), not the launch client command.
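If you start the server from a script rather than a shell, the same rule applies: the variable must be in the environment of the server process. Below is a minimal Python sketch using the standard `subprocess` module. The helper names are hypothetical, and it assumes the `ollama` executable is on the PATH and that no other Ollama server is already bound to the port.

```python
import os
import subprocess


def server_environment(context_length, base=None):
    """Return a copy of the environment with OLLAMA_CONTEXT_LENGTH set."""
    env = dict(os.environ if base is None else base)
    env["OLLAMA_CONTEXT_LENGTH"] = str(context_length)
    return env


def start_server(context_length):
    """Launch `ollama serve` as a child process with the given context length."""
    return subprocess.Popen(
        ["ollama", "serve"], env=server_environment(context_length)
    )
```

Calling `start_server(64000)` returns a `Popen` handle; call its `terminate()` method to stop the server. Setting the variable only in the environment of a client process has no effect on a server that is already running.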
In one terminal session, start the Ollama server with enough context to be useful for coding:
$ sudo systemctl stop ollama # Just to be sure
$ OLLAMA_CONTEXT_LENGTH=64000 ollama serve
time=2026-04-29T08:35:38.249-04:00 level=INFO source=routes.go:1752 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:64000 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/mslinn/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-04-29T08:35:38.249-04:00 level=INFO source=routes.go:1754 msg="Ollama cloud disabled: false"
time=2026-04-29T08:35:38.250-04:00 level=INFO source=images.go:517 msg="total blobs: 0"
time=2026-04-29T08:35:38.250-04:00 level=INFO source=images.go:524 msg="total unused blobs removed: 0"
time=2026-04-29T08:35:38.252-04:00 level=INFO source=routes.go:1810 msg="Listening on [::]:11434 (version 0.21.2)"
time=2026-04-29T08:35:38.254-04:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-29T08:35:38.256-04:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40761"
time=2026-04-29T08:35:42.990-04:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33449"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="61.6 GiB" available="50.4 GiB"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
Running Agentic Clients
If the Ollama server is not running as a background service, but is running in a terminal session, then Ollama clients must be run in other terminal sessions.
Claude
$ ollama launch claude --model qwen3.6:35b
pulling manifest
At this point, the first terminal session should show new log output as each part of the model is downloaded:
time=2026-04-29T08:35:47.959-04:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="61.6 GiB" available="50.4 GiB"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/04/29 - 08:48:05 | 200 | 164.846µs | 127.0.0.1 | HEAD "/"
[GIN] 2026/04/29 - 08:48:05 | 404 | 670.229µs | 127.0.0.1 | POST "/api/show"
time=2026-04-29T08:48:28.414-04:00 level=INFO source=download.go:179 msg="downloading f5ee307a2982 in 24 1 GB part(s)"
time=2026-04-29T08:51:59.647-04:00 level=INFO source=download.go:179 msg="downloading 5f3a3c817e78 in 1 11 KB part(s)"
time=2026-04-29T08:52:00.873-04:00 level=INFO source=download.go:179 msg="downloading 86eff881e8d2 in 1 94 B part(s)"
time=2026-04-29T08:52:02.154-04:00 level=INFO source=download.go:179 msg="downloading 5d1c86a949f7 in 1 462 B part(s)"
Moving back to the terminal session running Claude CLI under Ollama, I typed
how many directories are in this project?
The log output now showed:
[GIN] 2026/04/29 - 08:52:14 | 200 | 3m46s | 127.0.0.1 | POST "/api/pull"
[GIN] 2026/04/29 - 08:52:14 | 200 | 42.869µs | 127.0.0.1 | HEAD "/"
time=2026-04-29T08:52:36.720-04:00 level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-04-29T08:52:36.721-04:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/mslinn/.ollama/models/blobs/sha256-f5ee307a2982106a6eb82b62b2c00b575c9072145a759ae4660378acda8dcf2d --port 38603"
time=2026-04-29T08:52:36.721-04:00 level=INFO source=sched.go:484 msg="system memory" total="61.6 GiB" free="50.7 GiB" free_swap="690.5 MiB"
time=2026-04-29T08:52:36.721-04:00 level=INFO source=server.go:771 msg="loading model" "model layers"=41 requested=-1
time=2026-04-29T08:52:36.737-04:00 level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-29T08:52:36.738-04:00 level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:38603"
time=2026-04-29T08:52:36.744-04:00 level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:64000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-29T08:52:36.879-04:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1194 num_key_values=57
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2026-04-29T08:52:36.891-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-04-29T08:52:37.729-04:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:64000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:64000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=ggml.go:494 msg="offloaded 0/41 layers to GPU"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="22.3 GiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="2.8 GiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="621.7 MiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:272 msg="total memory" size="25.7 GiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-29T08:52:39.276-04:00 level=INFO source=server.go:1364 msg="waiting for llama runner to start responding"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
time=2026-04-29T08:52:47.827-04:00 level=INFO source=server.go:1402 msg="llama runner started in 11.11 seconds"
[GIN] 2026/04/29 - 08:54:07 | 200 | 1m31s | 127.0.0.1 | POST "/v1/messages?beta=true"
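The memory lines in the log above are worth a second look: the model weights, KV cache, and compute graph should add up to the reported total. The Python sketch below checks that arithmetic. The helper names are made up for this example, the regular expression is an assumption about the log layout as it appears in this one run, and the sample lines are copied from the log with the timestamp prefix trimmed.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text):
    """Convert a size such as '621.7 MiB' into a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def memory_breakdown(log_lines):
    """Collect per-component sizes from lines that name a device."""
    pattern = re.compile(r'msg="([^"]+)" device=\S+ size="([^"]+)"')
    sizes = {}
    for line in log_lines:
        match = pattern.search(line)
        if match:
            sizes[match.group(1)] = parse_size(match.group(2))
    return sizes


sample = [
    'source=device.go:245 msg="model weights" device=CPU size="22.3 GiB"',
    'source=device.go:256 msg="kv cache" device=CPU size="2.8 GiB"',
    'source=device.go:267 msg="compute graph" device=CPU size="621.7 MiB"',
    'source=device.go:272 msg="total memory" size="25.7 GiB"',
]
total_gib = sum(memory_breakdown(sample).values()) / UNITS["GiB"]
print(f"{total_gib:.1f} GiB")  # prints "25.7 GiB"
```

The sum matches the "total memory" line, and because every component reports device=CPU, all 25.7 GiB came out of system RAM on this machine rather than VRAM.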
Hermes Agent
Hermes Agent did not work properly with qwen3.6:35b, but it
worked fine with deepseek-v4-pro:cloud.
$ ollama launch hermes --model qwen3.6:35b
Installing Hermes...
┌─────────────────────────────────────────────────────────┐ │ ⚕ Hermes Agent Installer │ ├─────────────────────────────────────────────────────────┤ │ An open source AI agent by Nous Research. │ └─────────────────────────────────────────────────────────┘ ✓ Detected: linux (ubuntu) → Checking for uv package manager... ✓ uv found (uv 0.9.18) → Checking Python 3.11... ✓ Python found: Python 3.11.14 → Checking Git... ✓ Git 2.51.2 found → Checking Node.js (for browser tools)... ✓ Node.js v25.9.0 found → Checking ripgrep (fast file search)... → Checking ffmpeg (TTS voice messages)... ✓ ffmpeg 7.1.1-1ubuntu4.2 found → Installing ripgrep... Installing: ripgrep Summary: Upgrading: 0, Installing: 1, Removing: 0, Not Upgrading: 23 Download size: 1,521 kB Space needed: 5,492 kB / 613 GB available Get:1 http://archive.ubuntu.com/ubuntu questing/universe amd64 ripgrep amd64 14.1.1-1 [1,521 kB] Fetched 1,521 kB in 0s (3,947 kB/s) Selecting previously unselected package ripgrep. (Reading database ... 534474 files and directories currently installed.) Preparing to unpack .../ripgrep_14.1.1-1_amd64.deb ... Unpacking ripgrep (14.1.1-1) ... Setting up ripgrep (14.1.1-1) ... Processing triggers for man-db (2.13.1-1) ... ✓ ripgrep installed → Installing to /home/mslinn/.hermes/hermes-agent... → Trying SSH clone... ✓ Cloned via SSH ✓ Repository ready → Creating virtual environment with Python 3.11... Using CPython 3.11.14 Creating virtual environment at: venv Activate with: source venv/bin/activate ✓ Virtual environment ready (Python 3.11) → Installing dependencies... ✓ Main package installed ✓ All dependencies installed → Installing Node.js dependencies (browser tools)... ✅ Browser tools ready. Run: python run_agent.py --help ✓ Node.js dependencies installed → Installing browser engine (Playwright Chromium)... → Playwright may request sudo to install browser system dependencies (shared libraries). → This is standard Playwright setup — Hermes itself does not require root access. 
Installing dependencies... Switching to root user to install dependencies... Hit:1 http://archive.ubuntu.com/ubuntu questing-updates InRelease Hit:2 http://archive.ubuntu.com/ubuntu questing-backports InRelease Hit:3 http://archive.ubuntu.com/ubuntu questing InRelease Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64 InRelease Hit:5 https://apt.llvm.org/questing llvm-toolchain-questing-22 InRelease Hit:6 http://security.ubuntu.com/ubuntu questing-security InRelease Hit:7 https://ppa.launchpadcontent.net/longsleep/golang-backports/ubuntu questing InRelease Reading package lists... Done Reading package lists... Done Building dependency tree... Done Reading state information... Done libasound2t64 is already the newest version (1.2.14-1ubuntu1.1). libasound2t64 set to manually installed. libatk-bridge2.0-0t64 is already the newest version (2.57.1-1). libatk-bridge2.0-0t64 set to manually installed. libatk1.0-0t64 is already the newest version (2.57.1-1). libatk1.0-0t64 set to manually installed. libatspi2.0-0t64 is already the newest version (2.57.1-1). libatspi2.0-0t64 set to manually installed. libcairo2 is already the newest version (1.18.4-1build1). libcairo2 set to manually installed. libcups2t64 is already the newest version (2.4.12-0ubuntu3.5). libcups2t64 set to manually installed. libdbus-1-3 is already the newest version (1.16.2-2ubuntu2). libdbus-1-3 set to manually installed. libdrm2 is already the newest version (2.4.125-1ubuntu0.1). libdrm2 set to manually installed. libgbm1 is already the newest version (25.2.8-0ubuntu0.25.10.1). libgbm1 set to manually installed. libglib2.0-0t64 is already the newest version (2.86.0-2ubuntu0.3). libglib2.0-0t64 set to manually installed. libnspr4 is already the newest version (2:4.36-1ubuntu2). libnspr4 set to manually installed. libnss3 is already the newest version (2:3.114-1ubuntu0.1). libnss3 set to manually installed. libpango-1.0-0 is already the newest version (1.56.3-1build1). 
libpango-1.0-0 set to manually installed. libx11-6 is already the newest version (2:1.8.12-1build1). libx11-6 set to manually installed. libxcb1 is already the newest version (1.17.0-2build1). libxcb1 set to manually installed. libxcomposite1 is already the newest version (1:0.4.6-1). libxcomposite1 set to manually installed. libxdamage1 is already the newest version (1:1.1.6-1build1). libxdamage1 set to manually installed. libxext6 is already the newest version (2:1.3.4-1build2). libxext6 set to manually installed. libxfixes3 is already the newest version (1:6.0.0-2build1). libxfixes3 set to manually installed. libxkbcommon0 is already the newest version (1.7.0-2.1). libxkbcommon0 set to manually installed. libxrandr2 is already the newest version (2:1.5.4-1). libxrandr2 set to manually installed. xvfb is already the newest version (2:21.1.18-1ubuntu1.1). fonts-noto-color-emoji is already the newest version (2.048-1). fonts-noto-color-emoji set to manually installed. libfontconfig1 is already the newest version (2.15.0-2.3ubuntu1). libfontconfig1 set to manually installed. libfreetype6 is already the newest version (2.13.3+dfsg-1ubuntu0.1). libfreetype6 set to manually installed. xfonts-scalable is already the newest version (1:1.0.3-1.3). xfonts-scalable set to manually installed. fonts-liberation is already the newest version (1:2.1.5-3). fonts-liberation set to manually installed. fonts-freefont-ttf is already the newest version (20211204+svn4273-2). fonts-freefont-ttf set to manually installed. Solving dependencies... Done Recommended packages: fonts-ipafont-mincho fonts-tlwg-loma The following NEW packages will be installed: fonts-ipafont-gothic fonts-tlwg-loma-otf fonts-unifont fonts-wqy-zenhei xfonts-cyrillic 0 upgraded, 5 newly installed, 0 to remove and 23 not upgraded. Need to get 14.8 MB of archives. After this operation, 64.0 MB of additional disk space will be used. 
Get:1 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-ipafont-gothic all 00303-23ubuntu1 [3,703 kB] Get:2 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-tlwg-loma-otf all 1:0.7.3-1 [107 kB] Get:3 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-unifont all 1:16.0.04-1 [3,169 kB] Get:4 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-wqy-zenhei all 0.9.45-8 [7,472 kB] Get:5 http://archive.ubuntu.com/ubuntu questing/universe amd64 xfonts-cyrillic all 1:1.0.5+nmu1 [384 kB] Fetched 14.8 MB in 1s (23.0 MB/s) Selecting previously unselected package fonts-ipafont-gothic. (Reading database ... 534483 files and directories currently installed.) Preparing to unpack .../fonts-ipafont-gothic_00303-23ubuntu1_all.deb ... Unpacking fonts-ipafont-gothic (00303-23ubuntu1) ... Selecting previously unselected package fonts-tlwg-loma-otf. Preparing to unpack .../fonts-tlwg-loma-otf_1%3a0.7.3-1_all.deb ... Unpacking fonts-tlwg-loma-otf (1:0.7.3-1) ... Selecting previously unselected package fonts-unifont. Preparing to unpack .../fonts-unifont_1%3a16.0.04-1_all.deb ... Unpacking fonts-unifont (1:16.0.04-1) ... Selecting previously unselected package fonts-wqy-zenhei. Preparing to unpack .../fonts-wqy-zenhei_0.9.45-8_all.deb ... Unpacking fonts-wqy-zenhei (0.9.45-8) ... Selecting previously unselected package xfonts-cyrillic. Preparing to unpack .../xfonts-cyrillic_1%3a1.0.5+nmu1_all.deb ... Unpacking xfonts-cyrillic (1:1.0.5+nmu1) ... Setting up fonts-wqy-zenhei (0.9.45-8) ... Setting up fonts-tlwg-loma-otf (1:0.7.3-1) ... Setting up fonts-ipafont-gothic (00303-23ubuntu1) ... update-alternatives: using /usr/share/fonts/opentype/ipafont-gothic/ipag.ttf to provide /usr/share/fonts/truetype/fonts-japanese-gothic.ttf (fonts-japanese-gothic.ttf) in auto mode Setting up xfonts-cyrillic (1:1.0.5+nmu1) ... Setting up fonts-unifont (1:16.0.04-1) ... Processing triggers for fontconfig (2.15.0-2.3ubuntu1) ... 
BEWARE: your OS is not officially supported by Playwright; downloading fallback build for ubuntu24.04-x64. Downloading Chrome for Testing 147.0.7727.15 (playwright chromium v1217) from https://cdn.playwright.dev/builds/cft/147.0.7727.15/linux64/chrome-linux64.zip Chrome for Testing 147.0.7727.15 (playwright chromium v1217) downloaded to /home/mslinn/.cache/ms-playwright/chromium-1217 BEWARE: your OS is not officially supported by Playwright; downloading fallback build for ubuntu24.04-x64. Downloading FFmpeg (playwright ffmpeg v1011) from https://cdn.playwright.dev/dbazure/download/playwright/builds/ffmpeg/1011/ffmpeg-linux.zip FFmpeg (playwright ffmpeg v1011) downloaded to /home/mslinn/.cache/ms-playwright/ffmpeg-1011 BEWARE: your OS is not officially supported by Playwright; downloading fallback build for ubuntu24.04-x64. Downloading Chrome Headless Shell 147.0.7727.15 (playwright chromium-headless-shell v1217) from https://cdn.playwright.dev/builds/cft/147.0.7727.15/linux64/chrome-headless-shell-linux64.zip Chrome Headless Shell 147.0.7727.15 (playwright chromium-headless-shell v1217) downloaded to /home/mslinn/.cache/ms-playwright/chromium_headless_shell-1217 ✓ Browser engine setup complete → Installing TUI dependencies... ⠏⠉ ⠉⢹ ██╗ ██╗███╗ ██╗██╗ ██████╗ ██████╗ ██████╗ ███████╗ ██║ ██║████╗ ██║██║██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██║ ██║██╔██╗ ██║██║██║ ██║ ██║██║ ██║█████╗ ██║ ██║██║╚██╗██║██║██║ ██║ ██║██║ ██║██╔══╝ ╚██████╔╝██║ ╚████║██║╚██████╗╚██████╔╝██████╔╝███████╗ ╚═════╝ ╚═╝ ╚═══╝╚═╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝ BRAILLE ANIMATIONS ⠧⠀ braille ⠀⠀⢸⡇ scan ⠂⠌⡠⠐ rain ⣀⠀ orbit ⢾⣉⡷⠀ pulse ⠊⡰⡡⡘ sparkle ⡑⠀ breathe ⠀⢀⡴⠋ cascade ⠙⠢⣄⣠ waverows ⠀⠛ snake ⣿⡇⠀⠀ columns ⣉⡱⣉⡱ helix ⣿⣿ fillsweep ⠓⠓⠓⠀ scanline ⠠⠐⠈⠁ braillewave ⣿⡿ diagswipe ⡪⡪⡪⠀ checkerboard ⠉⠙⠚⠒ dna npx unicode-animations demo all spinners npx unicode-animations --list list all spinners npx unicode-animations --web open in browser ⣇⣀ ⣀⣸ ✓ TUI dependencies installed → Setting up hermes command... 
✓ Symlinked hermes → ~/.local/bin/hermes → ~/.local/bin already on PATH ✓ hermes command ready → Setting up configuration files... ✓ Created ~/.hermes/.env from template ✓ Created ~/.hermes/config.yaml from template ✓ Created ~/.hermes/SOUL.md (edit to customize personality) ✓ Configuration directory ready: ~/.hermes/ → Syncing bundled skills to ~/.hermes/skills/ ... Syncing bundled skills into ~/.hermes/skills/ ... + claude-code + codex + opencode + hermes-agent + jupyter-live-kernel + evaluating-llms-harness + weights-and-biases + audiocraft-audio-generation + segment-anything-model + dspy + axolotl + unsloth + fine-tuning-with-trl + huggingface-hub + obliteratus + serving-llms-vllm + outlines + llama-cpp + minecraft-modpack-server + pokemon-player + xurl + godmode + kanban-worker + webhook-subscriptions + kanban-orchestrator + openhue + himalaya + github-repo-management + github-pr-workflow + github-auth + github-code-review + codebase-inspection + github-issues + yuanbao + native-mcp + research-paper-writing + llm-wiki + arxiv + blogwatcher + polymarket + obsidian + findmy + apple-reminders + apple-notes + imessage + humanizer + claude-design + architecture-diagram + ascii-video + touchdesigner-mcp + pretext + manim-video + design-md + comfyui + baoyu-comic + ascii-art + sketch + ideation + p5js + popular-web-designs + songwriting-and-ai-music + baoyu-infographic + pixel-art + excalidraw + writing-plans + hermes-agent-skill-authoring + systematic-debugging + python-debugpy + subagent-driven-development + debugging-hermes-tui-commands + test-driven-development + node-inspect-debugger + requesting-code-review + plan + spike + dogfood + heartmula + youtube-content + songsee + spotify + gif-search + ocr-and-documents + notion + maps + powerpoint + google-workspace + linear + airtable + nano-pdf Done: 89 new, 0 updated, 0 unchanged. 89 total bundled. 
✓ Skills synced to ~/.hermes/skills/ → Skipping setup wizard (--skip-setup) ┌─────────────────────────────────────────────────────────┐ │ ✓ Installation Complete! │ └─────────────────────────────────────────────────────────┘ 📁 Your files: Config: /home/mslinn/.hermes/config.yaml API Keys: /home/mslinn/.hermes/.env Data: /home/mslinn/.hermes/cron/, sessions/, logs/ Code: /home/mslinn/.hermes/hermes-agent ───────────────────────────────────────────────────────── 🚀 Commands: hermes Start chatting hermes setup Configure API keys & settings hermes config View/edit configuration hermes config edit Open config in editor hermes gateway install Install gateway service (messaging + cron) hermes update Update to latest version ───────────────────────────────────────────────────────── ⚡ Reload your shell to use 'hermes' command: source ~/.bashrc Hermes installed successfully This will modify your Hermes Agent configuration: /home/mslinn/.hermes/config.yaml Backups will be saved to /tmp/ollama-backups/ Hermes can message you on Telegram, Discord, Slack, and more. ██╗ ██╗███████╗██████╗ ███╗ ███╗███████╗███████╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗ ██║ ██║██╔════╝██╔══██╗████╗ ████║██╔════╝██╔════╝ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝ ███████║█████╗ ██████╔╝██╔████╔██║█████╗ ███████╗█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║ ██╔══██║██╔══╝ ██╔══██╗██║╚██╔╝██║██╔══╝ ╚════██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║ ██║ ██║███████╗██║ ██║██║ ╚═╝ ██║███████╗███████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚══════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝ ╭────────────────────────────────────── Hermes Agent v0.12.0 (2026.4.30) · upstream bbbce926 ──────────────────────────────────────╮ │ Available Tools │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⡀⠀⣀⣀⠀⢀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ browser: browser_back, browser_click, ... 
│ │ ⠀⠀⠀⠀⠀⠀⢀⣠⣴⣾⣿⣿⣇⠸⣿⣿⠇⣸⣿⣿⣷⣦⣄⡀⠀⠀⠀⠀⠀⠀ browser-cdp: browser_cdp, browser_dialog │ │ ⠀⢀⣠⣴⣶⠿⠋⣩⡿⣿⡿⠻⣿⡇⢠⡄⢸⣿⠟⢿⣿⢿⣍⠙⠿⣶⣦⣄⡀⠀ clarify: clarify │ │ ⠀⠀⠉⠉⠁⠶⠟⠋⠀⠉⠀⢀⣈⣁⡈⢁⣈⣁⡀⠀⠉⠀⠙⠻⠶⠈⠉⠉⠀⠀ code_execution: execute_code │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣴⣿⡿⠛⢁⡈⠛⢿⣿⣦⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ cronjob: cronjob │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠿⣿⣦⣤⣈⠁⢠⣴⣿⠿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ delegation: delegate_task │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠻⢿⣿⣦⡉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ discord: discord │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢷⣦⣈⠛⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ discord_admin: discord_admin │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⣴⠦⠈⠙⠿⣦⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ (and 17 more toolsets...) │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⣿⣤⡈⠁⢤⣿⠇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠛⠷⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ Available Skills │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠑⢶⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ autonomous-ai-agents: claude-code, codex, hermes-agent, opencode │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⠁⢰⡆⠈⡿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ creative: architecture-diagram, ascii-art, ascii-video, b... │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⠈⣡⠞⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ data-science: jupyter-live-kernel │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ devops: kanban-orchestrator, kanban-worker, webhook-sub... │ │ email: himalaya │ │ qwen3.6:35b · Nous Research gaming: minecraft-modpack-server, pokemon-player │ │ /mnt/_/www/www.mslinn.com general: dogfood, yuanbao │ │ Session: 20260430_181150_be04df github: codebase-inspection, github-auth, github-code-r... │ │ mcp: native-mcp │ │ media: gif-search, heartmula, songsee, spotify, youtub... │ │ mlops: audiocraft-audio-generation, axolotl, dspy, eva... │ │ note-taking: obsidian │ │ productivity: airtable, google-workspace, linear, maps, nano-... │ │ red-teaming: godmode │ │ research: arxiv, blogwatcher, llm-wiki, polymarket, resea... │ │ smart-home: openhue │ │ social-media: xurl │ │ software-development: debugging-hermes-tui-commands, hermes-agent-ski... │ │ │ │ 28 tools · 85 skills · /help for commands │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ Welcome to Hermes Agent! Type your message or /help for commands. ✦ Tip: hermes mcp serve runs Hermes itself as an MCP server for other agents. 
⚠ tirith security scanner enabled but not available — command scanning will use pattern matching only 💾 curator: auto: no changes; llm: skipped (no candidates) ⚕ qwen3.6:35b │ ctx -- │ [░░░░░░░░░░] -- │ 3s │ ⏲ 0s ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ❯ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
When I asked my usual question, some superfluous characters appeared. Later I realized that whoever or whatever had written this code thought that visual noise makes for a better user experience.
──────────────────────────────────────── ● what os am i running Initializing agent... ──────────────────────────────────────── ヽ(>∀<☆)☆ reasoning...
It took 3:05 for Hermes to decide to run the following strange code:
uname -a && cat /etc/os-release 2>/dev/null || echo "Not Linux or /etc/os-release missing"
python
import platform
print("System:", platform.system())
print("Release:", platform.release())
No output was ever shown.
Life is short. LLMs are moving fast. I decided to use Pi because it worked.
OpenClaw
OpenClaw is early-stage agentic technology for personal assistants.
Running With Scissors
I am not comfortable with the idea of running OpenClaw on any of my computers, or any VM that can authenticate on my behalf. You have been warned!
You can use OpenClaw with the Ollama CLI client and the model of your choice.
Invoking Ollama with OpenClaw does not add the model to the Ollama registry.
You should update your installed version of Node.js before proceeding further.
$ nvm install node
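The installer output further down complains when the system Node.js is older than the version OpenClaw requires. As a rough sketch, a small shell function can compare version strings before you launch. The name version_at_least is a hypothetical helper, the 22.16 threshold is copied from the installer message in this article and may change between OpenClaw releases, and sort -V is assumed to be available (GNU coreutils provides it).

```shell
# version_at_least CURRENT REQUIRED
# Hypothetical helper: succeeds (exit status 0) when CURRENT >= REQUIRED.
# A leading "v", as printed by `node --version`, is ignored.
version_at_least() {
  current="${1#v}"
  required="${2#v}"
  # sort -V orders version strings numerically, field by field.
  # If REQUIRED sorts first (or both are equal), CURRENT is new enough.
  lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
}

# Example use (commented out so that pasting this block has no side effects):
# version_at_least "$(node --version)" 22.16 || nvm install node
```

The function only compares strings; it does not check which node binary the OpenClaw daemon will actually pick up.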
Here is an example of how to use OpenClaw with the MiniMax M2.7 cloud model:
$ ollama launch openclaw --model minimax-m2.7:cloud
Installing OpenClaw...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 539 packages in 1m OpenClaw installed successfully
Launching OpenClaw with minimax-m2.7:cloud...
Security
OpenClaw can read files and run actions when tools are enabled. A bad prompt can trick it into doing unsafe things.
Learn more: https://docs.openclaw.ai/gateway/security
I understand the risks. Continue?
Yes No
Setting up OpenClaw with Ollama... Model: minimax-m2.7:cloud
🦞 OpenClaw 2026.3.13 (61d171a) — Give me a workspace and I'll give you fewer tabs, fewer toggles, and more oxygen.
Default Ollama model: minimax-m2.7:cloud
Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 533707073495c347426fa957f78981a4f45bc038571ff141285f3846365a1d2c -> 0d8b0f65f3e6fea408d7af2a08f62b9e3be00d4fefb1ea9304f7be3d90a9dc2f, backup=/home/mslinn/.openclaw/openclaw.json.bak)
Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write)
Updated ~/.openclaw/openclaw.json
Workspace OK: ~/.openclaw/workspace
Sessions OK: ~/.openclaw/agents/main/sessions
System Node 20.19.4 at /usr/bin/node is below the required Node 22.16+. Using /home/mslinn/.nvm/versions/node/v25.8.1/bin/node for the daemon. Install Node 24 (recommended) or Node 22 LTS from nodejs.org or Homebrew.
Installed systemd service: /home/mslinn/.config/systemd/user/openclaw-gateway.service
Enabled systemd lingering for mslinn
Agents: main (default)
Heartbeat interval: 30m (main)
Session store (main): /home/mslinn/.openclaw/agents/main/sessions/sessions.json (0 entries)
Tip: run `openclaw configure --section web` to store your Brave API key for web_search.
Docs: https://docs.openclaw.ai/tools/web
✓ Installed web search plugin
Starting your assistant — this may take a moment...
Starting gateway...
✓ OpenClaw is running
Open the Web UI: http://localhost:18789/#token=1794b10a3af3ffca97b27a1273e79d36b94a2bd18494fa55
Quick start:
  /help                                   see all commands
  openclaw configure --section channels   connect WhatsApp, Telegram, etc.
  openclaw skills                         browse and install skills
The OpenClaw gateway is running in the background. Stop it with: openclaw gateway stop
🦞 OpenClaw 2026.3.13 (61d171a) — I don't just autocomplete—I auto-commit (emotionally), then ask you to review (logically)
openclaw tui - ws://127.0.0.1:18789 - agent main - session main
session agent:main:main
Wake up, my friend!
Hey! Good morning. ☀️
Looks like I'm fresh out of the box — no memories, no name, nothing yet. Just woke up.
So... who are you? And what should I call myself? I'm open to suggestions. 😄
connected | idle
agent main | session main (openclaw-tui) | ollama/minimax-m2.7:cloud | tokens 11k/128k (9%)
When you are finished working with OpenClaw, type CTRL+D to end the session, then type the following to terminate the OpenClaw background process:
$ openclaw gateway stop
🦞 OpenClaw 2026.3.13 (61d171a) — I can run local, remote, or purely on vibes—results may vary with DNS.
Stopped systemd service: openclaw-gateway.service
Pi
Ollama automatically installed Pi the first time I tried to use it, and it gave me the correct results.
$ ollama launch pi --model qwen3.6:35b
Installing Pi...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 256 packages in 7s
Pi installed successfully
Preparing Pi...
Checking Pi installation...
Checking Pi web search package...
Installing @ollama/pi-web-search...
Installing npm:@ollama/pi-web-search...
added 1 package in 801ms
Installed npm:@ollama/pi-web-search
✓ Installed @ollama/pi-web-search
Launching Pi...
fd not found. Downloading...
ripgrep not found. Downloading...
ripgrep installed to /home/mslinn/.pi/agent/bin/rg
fd installed to /home/mslinn/.pi/agent/bin/fd
pi v0.70.6
escape interrupt · ctrl+c/ctrl+d clear/exit · / commands · ! bash · ctrl+o more
Press ctrl+o to show full startup help and loaded resources.
Pi can explain its own features and look up its docs. Ask it how to use or extend Pi.
[Skills] microsoft-foundry
[Extensions] @ollama/pi-web-search
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/mnt/_/www/www.mslinn.com (master)
0.0%/262k (auto) qwen3.6:35b • medium
Basic system state interrogation worked immediately:
what os am i running
The user wants to know what operating system they are running. I can use the uname -a command to get system information on a Unix-like system. I can also check /etc/os-release if we are on Linux.
Let's run uname -a to see the system type and then maybe cat /etc/os-release or lsb_release -a for more details if it's Linux. Actually, uname -a generally tells us if it's Linux, macOS, BSD, etc.
I'll start with uname -a.
$ uname -a
Linux gojira 6.17.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 13 12:04:44 UTC 2026 x86_64 GNU/Linux
Took 0.0s
The user is running Ubuntu Linux (indicated by Linux gojira 6.17.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC...). I can also check lsb_release -a or /etc/os-release to get the exact version, but "Ubuntu" is usually sufficient. I'll provide the uname -a output details.
You are running Ubuntu Linux.
Here are the details from your system:
- OS: Linux (specifically Ubuntu, as the kernel was built with Ubuntu patches)
- Kernel: 6.17.0-22-generic
- Architecture: x86_64 (64-bit)
- Hostname: gojira
I would like to see a timer that shows how long prompts take to complete. Most other harnesses have this feature.
I then typed /settings hide thinking and repeated the previous
prompt. The result was the same, except that because I could not see the tests
run by the harness on behalf of the model, I had to wait for the entire
process to finish. This took a lot longer than noticing the result of the first
tool that ran and interrupting the command.
Models
Ollama models can be installed locally or run from the cloud. I wrote an article that attempts to track the most popular local models for coding. Only some of those models are Ollama-compatible.
Ollama uses models on demand; by default Ollama models are automatically
unloaded after 5 minutes if no queries are active. That means you do not have
to restart the ollama service after installing a new model or
removing an existing model.
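If five minutes is too short or too long, the unload timer can be changed per request. The sketch below only builds the JSON body for a request to Ollama's /api/generate endpoint with a keep_alive value. The name keep_alive_payload is a hypothetical helper, the server address http://localhost:11434 in the usage comment assumes a default local install, and no JSON escaping is attempted, so treat it as an illustration and not a robust client.

```shell
# keep_alive_payload MODEL DURATION
# Hypothetical helper: prints a JSON request body that asks Ollama to keep
# MODEL loaded for DURATION (for example 30m or 2h).
# No JSON escaping is performed, so both arguments must be plain strings.
keep_alive_payload() {
  printf '{"model": "%s", "keep_alive": "%s"}\n' "$1" "$2"
}

# Example use, assuming the Ollama server listens on its default port:
# curl -s http://localhost:11434/api/generate -d "$(keep_alive_payload deepseek-r1:8b 30m)"
```

A request without a prompt simply loads the model, so this can also serve as a warm-up call. The OLLAMA_KEEP_ALIVE environment variable sets the default for the whole server if you would prefer not to pass the value on every request.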
Certain cloud-based LLMs are Ollama compatible, so you need not worry about their model formats.
Inspecting a Model
To view the parameters of a registered model, use the ollama show command:
$ ollama show deepseek-r1:8b
  Model
    architecture        qwen3
    parameters          8.2B
    context length      131072
    embedding length    4096
    quantization        Q4_K_M

  Capabilities
    completion
    thinking

  Parameters
    stop           "<|begin▁of▁sentence|>"
    stop           "<|end▁of▁sentence|>"
    stop           "<|User|>"
    stop           "<|Assistant|>"
    temperature    0.6
    top_p          0.95

  License
    MIT License
    Copyright (c) 2023 DeepSeek
    ...
Much less information is shown for cloud models than for local models.
You can filter the output to just display the quantization:
$ ollama show deepseek-r1:8b | grep quantization
    quantization        Q4_K_M
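The grep approach prints the whole line. If you only want the value, or want a two-word label such as context length, a short awk filter is one way to do it. This is a sketch that assumes the label-then-value layout shown above; show_field is a hypothetical helper name, and the layout of ollama show could change in future releases.

```shell
# show_field LABEL
# Hypothetical helper: reads `ollama show` output on standard input and
# prints the value that follows LABEL (for example: quantization).
# Only the first matching line is used.
show_field() {
  awk -v label="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # drop leading indentation
      if (index(line, label " ") == 1) {    # line starts with the label
        value = substr(line, length(label) + 1)
        sub(/^[ \t]+/, "", value)           # trim padding before the value
        sub(/[ \t]+$/, "", value)           # trim trailing whitespace
        print value
        exit
      }
    }'
}

# Example use:
# ollama show deepseek-r1:8b | show_field quantization
# ollama show deepseek-r1:8b | show_field "context length"
```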
Local Models
By default, local Ollama models are downloaded into these directories:
- Linux: /usr/share/ollama/.ollama/models
- macOS: ~/.ollama/models
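A script that needs to find the model directory could encode those defaults as follows. This is a sketch: models_dir is a hypothetical helper name, the two paths are the ones listed above, and the check of the OLLAMA_MODELS environment variable assumes you use that variable to relocate the model store. Other operating systems are not handled.

```shell
# models_dir [OS_NAME]
# Hypothetical helper: prints the directory where Ollama models are expected.
# OS_NAME defaults to the output of `uname -s` (Linux or Darwin).
models_dir() {
  # An explicit OLLAMA_MODELS setting takes precedence over the defaults.
  if [ -n "${OLLAMA_MODELS:-}" ]; then
    printf '%s\n' "$OLLAMA_MODELS"
    return 0
  fi
  case "${1:-$(uname -s)}" in
    Linux)  printf '%s\n' "/usr/share/ollama/.ollama/models" ;;
    Darwin) printf '%s\n' "$HOME/.ollama/models" ;;
    *)      return 1 ;;   # platform not covered by this sketch
  esac
}

# Example use:
# du -sh "$(models_dir)"
```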
The Ollama library has many models available for download.
After you have downloaded a model using ollama pull or
ollama run, the model is added to the local Ollama registry.
The ollama list command shows you the registered Ollama models.
Invoking Ollama with OpenClaw does not add the model to the Ollama registry.
Cloud Models
Cloud models are easier to set up than local models, and they can be used with any computer because the cloud processing is not performed on the local machine.
After running a cloud model using ollama run or installing via
ollama pull, the model is added to the local Ollama registry.
The ollama list command shows you the registered Ollama models.
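Because local and cloud models share one registry, it can be handy to list only the cloud entries. The filter below is a sketch that relies on the naming used in this article, where cloud models end in :cloud or -cloud; cloud_models is a hypothetical helper name, and a model that does not follow that naming would be missed.

```shell
# cloud_models
# Hypothetical helper: reads `ollama list` output on standard input and
# prints the names of models whose tag ends in ":cloud" or "-cloud".
cloud_models() {
  # NR > 1 skips the header row; $1 is the NAME column.
  awk 'NR > 1 && $1 ~ /[:-]cloud$/ { print $1 }'
}

# Example use:
# ollama list | cloud_models
```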
Ollama can run models via its traditional chat mode, or via agentic harnesses such as Claude Code, Hermes Agent, and OpenClaw. I show the commands for each mode only once, using DeepSeek as the example. Similar commands run any other Ollama-compatible model.
Cloud models require you to sign in with an Ollama ID first:
$ ollama signin
Usage on the Ollama free plan resets every 3 hours as well as weekly. View your usage here.
Deepseek-v4-pro
Via Chat Mode
To run deepseek-v4-pro in the cloud via chat mode, type:
$ ollama run deepseek-v4-pro:cloud
Connecting to 'deepseek-v4-pro:cloud' on 'ollama.com' ⚡
>>> Send a message (/? for help)

$ ollama show deepseek-v4-pro:cloud
  Model
    architecture        deepseek4
    parameters          158000000000
    context length      1048576
    embedding length    4096
    quantization        FP8
  Capabilities
    completion
    tools
    thinking
With Claude Code
To use DeepSeek-V4-Pro with Claude Code, run:
$ ollama launch claude --model deepseek-v4-pro:cloud
With OpenClaw
For use with OpenClaw, type the following. OpenClaw will be installed if it is not already present. I had to run the command twice to get it going, and even then it looked like some fiddling would be required:
$ ollama launch openclaw --model deepseek-v4-pro:cloud
Installing OpenClaw...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 434 packages in 1m
OpenClaw installed successfully
This will modify your OpenClaw configuration: /home/mslinn/.openclaw/openclaw.json
Backups will be saved to /tmp/ollama-backups/
Your assistant can message you on WhatsApp, Telegram, Discord, and more.
Starting your assistant — this may take a moment...
Warning: daemon restart failed: exit status 1
Warning: gateway did not come back after restart
Starting gateway...
Error: gateway did not start on localhost:18789
$ ollama launch openclaw --model deepseek-v4-pro:cloud Security OpenClaw can read files and run actions when tools are enabled. A bad prompt can trick it into doing unsafe things. Learn more: https://docs.openclaw.ai/gateway/security Updating OpenClaw... │ ◇ ✓ Updating via package manager (32.63s) │ ◇ ✓ Running doctor checks (30.24s) Update Result: OK Root: /home/mslinn/.nvm/versions/node/v25.8.2/lib/node_modules/openclaw Before: 2026.4.26 After: 2026.4.26 Total time: 65.28s Updating plugins... Downloading @ollama/openclaw-web-search… Extracting /tmp/openclaw-npm-pack-1dYItP/ollama-openclaw-web-search-0.2.2.tgz… Installing to /home/mslinn/.openclaw/extensions/openclaw-web-search… Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 85b674df87782e7e73fb5cfde53c5036a7fdf32eb8f34aa47b386e426cce2ff0 -> 757aba9cf42030c2a71750a5cebfb7b9cdbcedff3cdcb37179fb4c502d925822, backup=/home/mslinn/.openclaw/openclaw.json.bak) Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write) Config auto-restored from backup: /home/mslinn/.openclaw/openclaw.json (size-drop-vs-last-good:7398->820, gateway-mode-missing-vs-last-good) Config observe anomaly: /home/mslinn/.openclaw/openclaw.json (size-drop-vs-last-good:7398->715, missing-meta-vs-last-good, gateway-mode-missing-vs-last-good) npm plugins: 0 updated, 1 unchanged. Completion cache update failed: Error: spawnSync /home/mslinn/.nvm/versions/node/v25.8.2/bin/node ETIMEDOUT Restarting service... Gateway did not become healthy after restart. Gateway version mismatch: expected 2026.4.26, running gateway reported unavailable. Service runtime: status=running, state=active, pid=1625547, lastExit=0 Gateway port 18789 status: free. Restart log: /home/mslinn/.openclaw/logs/gateway-restart.log Run `openclaw gateway status --deep` for details. Setting up OpenClaw with Ollama... Model: deepseek-v4-pro:cloud 🦞 OpenClaw 2026.4.26 (be8c246) — Your config is valid, your assumptions are not. 
Default Ollama model: deepseek-v4-pro:cloud Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 85b674df87782e7e73fb5cfde53c5036a7fdf32eb8f34aa47b386e426cce2ff0 -> d81f9a372cb185254e060580c65b6f0619ef60b70d8086d2fc189442b3eb7452, backup=/home/mslinn/.openclaw/openclaw.json.bak) Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write) Updated ~/.openclaw/openclaw.json Workspace OK: ~/.openclaw/workspace Sessions OK: ~/.openclaw/agents/main/sessions System Node 20.19.5 at /usr/bin/node is below the required Node 22.14+. Using /home/mslinn/.nvm/versions/node/v25.8.2/bin/node for the daemon. Install Node 24 (recommended) or Node 22 LTS from nodejs.org or Homebrew. Installed systemd service: /home/mslinn/.config/systemd/user/openclaw-gateway.service Previous unit backed up to: /home/mslinn/.config/systemd/user/openclaw-gateway.service.bak Tip: run `openclaw configure --section web` to store your Brave API key for web_search. Docs: https://docs.openclaw.ai/tools/web Your assistant can message you on WhatsApp, Telegram, Discord, and more. Connect a channel (messaging app) now? Yes Set up later Starting your assistant — this may take a moment... Starting gateway... ✓ OpenClaw is running Open the Web UI: http://localhost:18789/#token=fd77d96ff87a06b5d521b81bfffd152a3ebfa41d17c223f3 Quick start: /help see all commands openclaw skills browse and install skills The OpenClaw gateway is running in the background. Stop it with: openclaw gateway stop 🦞 OpenClaw 2026.4.26 (be8c246) — One CLI to rule them all, and one more restart because you changed the port. openclaw tui - ws://127.0.0.1:18789 - agent main - session main connecting | idle gateway disconnected: closed | idle
With Hermes Agent
For use with Hermes Agent, type the following. Hermes Agent will be installed if it is not already present:
$ ollama launch hermes --model deepseek-v4-pro:cloud Installing Hermes... ┌─────────────────────────────────────────────────────────┐ │ ⚕ Hermes Agent Installer │ ├─────────────────────────────────────────────────────────┤ │ An open source AI agent by Nous Research. │ └─────────────────────────────────────────────────────────┘ ✓ Detected: linux (ubuntu) → Checking for uv package manager... ✓ uv found (uv 0.8.15) → Checking Python 3.11... → Python 3.11 not found, installing via uv... Installed Python 3.11.13 in 1.23s + cpython-3.11.13-linux-x86_64-gnu (python3.11) ✓ Python installed: Python 3.11.13 → Checking Git... ✓ Git 2.51.0 found → Checking Node.js (for browser tools)... ✓ Node.js v25.8.2 found → Checking ripgrep (fast file search)... → Checking ffmpeg (TTS voice messages)... ✓ ffmpeg 7.1.1-1ubuntu4.2 found → Installing ripgrep... Installing: ripgrep Summary: Upgrading: 0, Installing: 1, Removing: 0, Not Upgrading: 32 Download size: 1521 kB Space needed: 5492 kB / 47.9 GB available Get:1 http://archive.ubuntu.com/ubuntu questing/universe amd64 ripgrep amd64 14.1.1-1 [1521 kB] Fetched 1521 kB in 0s (8189 kB/s) Selecting previously unselected package ripgrep. (Reading database ... 487883 files and directories currently installed.) Preparing to unpack .../ripgrep_14.1.1-1_amd64.deb ... Unpacking ripgrep (14.1.1-1) ... Setting up ripgrep (14.1.1-1) ... Processing triggers for man-db (2.13.1-1) ... Scanning processes... Scanning candidates... Restarting services... Service restarts being deferred: systemctl restart NetworkManager.service /etc/needrestart/restart.d/dbus.service systemctl restart gdm.service systemctl restart systemd-logind.service systemctl restart unattended-upgrades.service systemctl restart wpa_supplicant.service No containers need to be restarted. 
User sessions running outdated binaries: mslinn @ user manager: (sd-pam)[1364] mslinn @ user service: at-spi-dbus-bus.service[46449,46475], dbus.service[1490,46429], filter-chain.service[1494], gnome-keyring-daemon.service[46430], mpris-proxy.service[1538], pipewire-pulse.service[1542], pipewire.service[1492], wireplumber.service[1540] No VM guests are running outdated hypervisor (qemu) binaries on this host. ✓ ripgrep installed → Installing to /home/mslinn/.hermes/hermes-agent... → Trying SSH clone... ✓ Cloned via SSH ✓ Repository ready → Creating virtual environment with Python 3.11... warning: Failed to parse `pyproject.toml` during settings discovery: TOML parse error at line 172, column 17 | 172 | exclude-newer = "7 days" | ^^^^^^^^ failed to parse year in date "7 days": failed to parse "7 da" as year (a four digit integer): invalid digit, expected 0-9 but got Using CPython 3.11.13 Creating virtual environment at: venv Activate with: source venv/bin/activate ✓ Virtual environment ready (Python 3.11) → Installing dependencies... ✓ Main package installed ✓ All dependencies installed → Installing Node.js dependencies (browser tools)... ✅ Browser tools ready. Run: python run_agent.py --help ✓ Node.js dependencies installed → Installing browser engine (Playwright Chromium)... → Playwright may request sudo to install browser system dependencies (shared libraries). → This is standard Playwright setup — Hermes itself does not require root access. Installing dependencies... Switching to root user to install dependencies... 
Hit:1 http://archive.ubuntu.com/ubuntu questing InRelease Hit:2 http://archive.ubuntu.com/ubuntu questing-updates InRelease Hit:3 http://security.ubuntu.com/ubuntu questing-security InRelease Hit:4 http://archive.ubuntu.com/ubuntu questing-backports InRelease Ign:5 https://apt.kitware.com/ubuntu questing InRelease Err:6 https://apt.kitware.com/ubuntu questing Release 404 Not Found [IP: 66.194.253.25 443] Hit:7 https://ppa.launchpadcontent.net/longsleep/golang-backports/ubuntu questing InRelease Reading package lists... Done Failed to install browsers Error: Installation process exited with code: 100 ⚠ Playwright browser installation failed — browser tools will not work. ⚠ Try running manually: cd /home/mslinn/.hermes/hermes-agent && npx playwright install --with-deps chromium ✓ Browser engine setup complete → Installing TUI dependencies... ⠏⠉ ⠉⢹ ██╗ ██╗███╗ ██╗██╗ ██████╗ ██████╗ ██████╗ ███████╗ ██║ ██║████╗ ██║██║██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██║ ██║██╔██╗ ██║██║██║ ██║ ██║██║ ██║█████╗ ██║ ██║██║╚██╗██║██║██║ ██║ ██║██║ ██║██╔══╝ ╚██████╔╝██║ ╚████║██║╚██████╗╚██████╔╝██████╔╝███████╗ ╚═════╝ ╚═╝ ╚═══╝╚═╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝ BRAILLE ANIMATIONS ⠧⠀ braille ⠀⠀⢸⡇ scan ⠂⠌⡠⠐ rain ⣀⠀ orbit ⢾⣉⡷⠀ pulse ⠊⡰⡡⡘ sparkle ⡑⠀ breathe ⠀⢀⡴⠋ cascade ⠙⠢⣄⣠ waverows ⠀⠛ snake ⣿⡇⠀⠀ columns ⣉⡱⣉⡱ helix ⣿⣿ fillsweep ⠓⠓⠓⠀ scanline ⠠⠐⠈⠁ braillewave ⣿⡿ diagswipe ⡪⡪⡪⠀ checkerboard ⠉⠙⠚⠒ dna npx unicode-animations demo all spinners npx unicode-animations --list list all spinners npx unicode-animations --web open in browser ⣇⣀ ⣀⣸ ✓ TUI dependencies installed → Setting up hermes command... ✓ Symlinked hermes → ~/.local/bin/hermes → ~/.local/bin already on PATH ✓ hermes command ready → Setting up configuration files... ✓ Created ~/.hermes/.env from template ✓ Created ~/.hermes/config.yaml from template ✓ Created ~/.hermes/SOUL.md (edit to customize personality) ✓ Configuration directory ready: ~/.hermes/ → Syncing bundled skills to ~/.hermes/skills/ ... 
Syncing bundled skills into ~/.hermes/skills/ ... + pokemon-player + minecraft-modpack-server + dogfood + llm-wiki + arxiv + polymarket + research-paper-writing + blogwatcher + github-issues + github-repo-management + github-code-review + github-pr-workflow + codebase-inspection + github-auth + xurl + himalaya + hermes-agent-skill-authoring + systematic-debugging + python-debugpy + node-inspect-debugger + writing-plans + plan + test-driven-development + requesting-code-review + subagent-driven-development + debugging-hermes-tui-commands + webhook-subscriptions + powerpoint + linear + notion + airtable + ocr-and-documents + maps + nano-pdf + google-workspace + opencode + hermes-agent + codex + claude-code + dspy + obliteratus + outlines + serving-llms-vllm + llama-cpp + axolotl + unsloth + fine-tuning-with-trl + huggingface-hub + audiocraft-audio-generation + segment-anything-model + weights-and-biases + evaluating-llms-harness + yuanbao + imessage + apple-reminders + apple-notes + findmy + claude-design + pixel-art + baoyu-infographic + ascii-art + humanizer + design-md + manim-video + baoyu-comic + touchdesigner-mcp + p5js + ascii-video + popular-web-designs + songwriting-and-ai-music + architecture-diagram + ideation + excalidraw + youtube-content + spotify + songsee + gif-search + heartmula + openhue + obsidian + godmode + jupyter-live-kernel + native-mcp Done: 83 new, 0 updated, 0 unchanged. 83 total bundled. ✓ Skills synced to ~/.hermes/skills/ → Skipping setup wizard (--skip-setup) ┌─────────────────────────────────────────────────────────┐ │ ✓ Installation Complete! 
│ └─────────────────────────────────────────────────────────┘ 📁 Your files: Config: /home/mslinn/.hermes/config.yaml API Keys: /home/mslinn/.hermes/.env Data: /home/mslinn/.hermes/cron/, sessions/, logs/ Code: /home/mslinn/.hermes/hermes-agent ───────────────────────────────────────────────────────── 🚀 Commands: hermes Start chatting hermes setup Configure API keys & settings hermes config View/edit configuration hermes config edit Open config in editor hermes gateway install Install gateway service (messaging + cron) hermes update Update to latest version ───────────────────────────────────────────────────────── ⚡ Reload your shell to use 'hermes' command: source ~/.bashrc Hermes installed successfully This will modify your Hermes Agent configuration: /home/mslinn/.hermes/config.yaml Backups will be saved to /tmp/ollama-backups/ Hermes can message you on Telegram, Discord, Slack, and more.
GPT-OSS
Released in August 2025, GPT-OSS is a family of open-weight
reasoning models developed by OpenAI in collaboration with partners, bringing
reasoning and tool-use capabilities to self-hosted and edge environments. These
models are released under the Apache 2.0 license, allowing for broad access
and customization.
gpt-oss-120b needs a GPU with about 80 GB of memory, such as an NVIDIA H100 or
an 80 GB A100, which is why many users run it from the cloud.
$ ollama run gpt-oss:120b-cloud
Other commands are:
$ ollama launch claude --model gpt-oss:120b-cloud
$ ollama launch codex --model gpt-oss:120b-cloud
$ ollama launch droid --model gpt-oss:120b-cloud
$ ollama launch hermes --model gpt-oss:120b-cloud
$ ollama launch openclaw --model gpt-oss:120b-cloud
$ ollama launch opencode --model gpt-oss:120b-cloud
$ ollama launch pi --model gpt-oss:120b-cloud
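The same seven commands recur for every model in this article, with only the model name changing. A small loop can print them for any model, as sketched here. The name launch_commands is a hypothetical helper, and the harness list is simply the one used in this article; Ollama may support more or fewer harnesses by the time you read this.

```shell
# launch_commands MODEL
# Hypothetical helper: prints one `ollama launch` command per harness
# for MODEL. Nothing is executed; the commands are only printed.
launch_commands() {
  for harness in claude codex droid hermes openclaw opencode pi; do
    printf 'ollama launch %s --model %s\n' "$harness" "$1"
  done
}

# Example use:
# launch_commands gpt-oss:120b-cloud
```

Printing instead of executing keeps the helper safe to paste; copy the line for the harness you want.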
Its smaller sibling, gpt-oss-20b, is designed for edge devices
and consumer hardware with 16 GB of memory, such as the NVIDIA 3060. However,
it can also be run from the cloud if you want to do that for some reason:
$ ollama run gpt-oss:20b-cloud
Other commands are:
$ ollama launch claude --model gpt-oss:20b-cloud
$ ollama launch codex --model gpt-oss:20b-cloud
$ ollama launch droid --model gpt-oss:20b-cloud
$ ollama launch hermes --model gpt-oss:20b-cloud
$ ollama launch openclaw --model gpt-oss:20b-cloud
$ ollama launch opencode --model gpt-oss:20b-cloud
$ ollama launch pi --model gpt-oss:20b-cloud
Minimax-m2.7
See MiniMax-M2 and Mini-Agent Review and MiniMax M2.7.
$ ollama run minimax-m2.7:cloud
Connecting to 'minimax-m2.7:cloud' on 'ollama.com' ⚡
>>> Send a message (/? for help)
Other commands are:
$ ollama launch claude --model minimax-m2.7:cloud
$ ollama launch codex --model minimax-m2.7:cloud
$ ollama launch droid --model minimax-m2.7:cloud
$ ollama launch hermes --model minimax-m2.7:cloud
$ ollama launch openclaw --model minimax-m2.7:cloud
$ ollama launch opencode --model minimax-m2.7:cloud
$ ollama launch pi --model minimax-m2.7:cloud
$ ollama show minimax-m2.7:cloud
  Model
    architecture        minimax-m2
    parameters          0
    context length      204800
    embedding length    3072
    quantization
  Capabilities
    completion
    tools
    thinking
Nemotron 3 Super
Mar 11, 2026
The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging.
This model tackles the “context explosion” with a native 1M-token context window that gives agents long-term memory for aligned, high-accuracy reasoning. The model is fully open with open weights, datasets, and recipes so developers can easily customize, optimize, and deploy it on their own infrastructure.
$ ollama run nemotron-3-super:cloud
Connecting to 'nemotron-3-super:cloud' on 'ollama.com' ⚡
>>> Use Ctrl + d or /bye to exit.
>>> CTRL+D
$ ollama list
NAME                      ID              SIZE    MODIFIED
nemotron-3-super:cloud    be3943c5a818    -       6 seconds ago
Other commands are:
$ ollama launch claude --model nemotron-3-super:cloud
$ ollama launch codex --model nemotron-3-super:cloud
$ ollama launch droid --model nemotron-3-super:cloud
$ ollama launch hermes --model nemotron-3-super:cloud
$ ollama launch openclaw --model nemotron-3-super:cloud
$ ollama launch opencode --model nemotron-3-super:cloud
$ ollama launch pi --model nemotron-3-super:cloud
QWEN 3.x
QWEN 3.6 was released in April 2026, but the Ollama cloud models had not been updated as of 2026-05-01.
qwen3-coder:480b-cloud is a heavyweight model
tuned specifically for autonomous coding agents.
$ ollama launch claude --model qwen3-coder:480b-cloud
╭─── Claude Code v2.1.121 ───────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                  │ Tips for getting started                                                    │
│                Welcome back Mike!                │ Run /init to create a CLAUDE.md file with instructions for Claude           │
│                                                  │ ─────────────────────────────────────────────────────────────────────────── │
│                     ▐▛███▜▌                      │ What's new                                                                  │
│                    ▝▜█████▛▘                     │ Added `alwaysLoad` option to MCP server config — when `true`, all tools fr… │
│                      ▘▘ ▝▝                       │ Added `claude plugin prune` to remove orphaned auto-installed plugin depen… │
│   qwen3-coder:480b-cloud · API Usage Billing ·   │ Added a type-to-filter search box to `/skills` so you can find a skill in … │
│         mslinn@mslinn.com's Organization         │ /release-notes for more                                                     │
│         /mnt/f/sites/intranet.mslinn.com         │                                                                             │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
mslinn@Bear:intranet.mslinn.com [qwen3-coder:480b-cloud] 🌿 master; 0 edits; ; 0 tokens (+0)
qwen3.5:cloud is a smarter generalist with better conversational ability.
$ ollama launch claude --model qwen3.5:cloud
Other commands are:
$ ollama launch claude --model qwen3.5:cloud
$ ollama launch codex --model qwen3.5:cloud
$ ollama launch droid --model qwen3.5:cloud
$ ollama launch hermes --model qwen3.5:cloud
$ ollama launch openclaw --model qwen3.5:cloud
$ ollama launch opencode --model qwen3.5:cloud
$ ollama launch pi --model qwen3.5:cloud
Local Models
By default, Ollama installs Q4 (4-bit quantized) versions of models, which are smaller and faster but can be much less accurate than Q8 (8-bit quantized) versions. Install Q8 versions if your hardware allows.
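To get a feel for what Q4 versus Q8 means for memory, here is a minimal sketch that estimates the size of the weights from the parameter count and the bits stored per weight. The function name estimate_weights_gb is hypothetical, and the estimate is only a lower bound: it ignores the context cache and runtime overhead, and real quantization formats such as Q4_K_M average more than 4 bits per weight (the 27.8B-parameter Q4_K_M download shown later in this article is 17 GB).

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in GB (10^9 bytes).

    Assumption: every parameter is stored with the same number of bits,
    which real quantization formats only approximate.
    """
    # billions of parameters * bits each / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


for bits in (4, 8):
    size = estimate_weights_gb(27.8, bits)
    print(f"27.8B parameters at {bits} bits per weight: about {size:.1f} GB")
```

Doubling the bits per weight doubles the memory needed, which is why a Q8 version of a model may not fit on a video card that runs the Q4 version comfortably.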
Installation
To install or update a model without running it, type ollama pull,
followed by the name of the model.
You can install and run any LLAMA-compatible model by typing
ollama run, followed by the name of the model.
To list the models registered on your computer, use the
ollama list command:
$ ollama list
NAME                                 ID              SIZE      MODIFIED
deepseek-v4-pro:cloud                22bfd5026abd    -         55 minutes ago
nemotron-3-super:cloud               be3943c5a818    -         3 weeks ago
gpt-oss:120b-cloud                   569662207105    -         3 weeks ago
minimax-m2.7:cloud                   06daa293c105    -         3 weeks ago
qwen3-coder:480b-cloud               e30e45586389    -         3 weeks ago
minimax-m2.5:cloud                   c0d5751c800f    -         2 months ago
test:latest                          3b86bc070971    4.9 GB    4 months ago
fluffy/l3-8b-stheno-v3.2:latest      f1afe09480f3    4.9 GB    5 months ago
bjoernb/qwen3-coder-30b-1m:latest    fb3efb7f8d40    18 GB     5 months ago
deepseek-coder-v2:lite               63fb193b3a9b    8.9 GB    5 months ago
nomic-embed-text:latest              0a109f422b47    274 MB    5 months ago
sllm/glm-z1-9b:latest                4b2d0d3f65a6    8.6 GB    5 months ago
mistral-small3.2:latest              5a408ab55df5    15 GB     7 months ago
deepseek-r1:8b                       6995872bfe4c    5.2 GB    7 months ago
deepseek-r1:7b                       755ced02ce7b    4.7 GB    7 months ago
llama3:8b                            365c0bd3c000    4.7 GB    7 months ago
llama2-uncensored:70b                bdd0ec2f5ec5    38 GB     2 years ago
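The same information is available programmatically from the Ollama server's /api/tags endpoint. The following is a minimal sketch, not a definitive client: the helper names fetch_tags and summarize are hypothetical, the server is assumed to be listening on the default local port, and the sample data is hand-written for illustration. How cloud models are reported by the endpoint is not covered here.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/tags from a running Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as reply:
        return json.load(reply)


def summarize(tags: dict) -> list[tuple[str, float]]:
    """Return (model name, size in GB) pairs, largest model first."""
    rows = [(m["name"], m.get("size", 0) / 1e9) for m in tags.get("models", [])]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hand-written sample in the shape that /api/tags returns; sizes are illustrative.
sample = {
    "models": [
        {"name": "nomic-embed-text:latest", "size": 274_000_000},
        {"name": "llama3:8b", "size": 4_700_000_000},
    ]
}

for name, size_gb in summarize(sample):
    print(f"{name:30} {size_gb:6.2f} GB")
```

With a server running, summarize(fetch_tags()) would produce the same kind of listing for the models actually installed.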
DeepSeek
$ ollama pull deepseek-r1:8b
pulling manifest
pulling e6a7edc1a4d7: 100% ▕████████████████████████████ ▏ 5.2 GB/5.2 GB 63 MB/s 0s
pulling c5ad996bda6e: 100% ▕████████████████████████████▏ 556 B
pulling 6e4c38e1172f: 100% ▕████████████████████████████▏ 1.1 KB
pulling ed8474dc73db: 100% ▕████████████████████████████▏ 179 B
pulling f64cd5418e4b: 100% ▕████████████████████████████▏ 487 B
verifying sha256 digest
writing manifest
success
You can also download and run in one step by typing:
$ ollama run deepseek-r1:8b
fluffy/l3-8b-stheno-v3.2
fluffy/l3-8b-stheno-v3.2 is a small,
uncensored model that will run, albeit slowly, even on a laptop without a powerful video card.
$ ollama run fluffy/l3-8b-stheno-v3.2
llama2-uncensored
The uncensored Llama2 70B model requires a powerful machine with lots of GPU RAM.
$ ollama pull llama2-uncensored:70b
pulling manifest
pulling abca3de387b6... 100% ▕███████████████████████████▏ 38 GB
pulling 9224016baa40... 100% ▕███████████████████████████▏ 7.0 KB
pulling 1195ea171610... 100% ▕███████████████████████████▏ 4.8 KB
pulling 28577ba2177f... 100% ▕███████████████████████████▏ 55 B
pulling ddaa351c1f3d... 100% ▕███████████████████████████▏ 51 B
pulling 9256cd2888b0... 100% ▕███████████████████████████▏ 530 B
verifying sha256 digest
writing manifest
removing any unused layers
success
QWEN 3.x
For coding, you want a balance of reasoning depth and speed. The best models for coding with the NVIDIA 3060, which has 12GB VRAM, and the NVIDIA 4090, which has 24GB VRAM, as of late April 2026 are shown below.
qwen3.6:35b
This is a Mixture-of-Experts (MoE) model. Even though it has 35B total parameters, because it is an MoE, only about 10% of all parameters are active at any given moment.
This model is fast, especially on an NVIDIA 4090. An NVIDIA 3060 can also provide high tokens-per-second while maintaining the intelligence of a much larger model. Because the Q4_K_M quant is about 23GB, it will spill over from the 3060's 12GB of VRAM into system RAM, but because it is an MoE, the performance hit should be minimal.
$ ollama run qwen3.6:35b
pulling manifest
pulling f5ee307a2982: 100% ▕█████████████████████████▏ 23 GB
pulling 5f3a3c817e78: 100% ▕█████████████████████████▏ 11 KB
pulling 86eff881e8d2: 100% ▕█████████████████████████▏ 94 B
pulling 5d1c86a949f7: 100% ▕█████████████████████████▏ 462 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
>>> /show info
  Model
    architecture        qwen35moe
    parameters          36.0B
    context length      262144
    embedding length    2048
    quantization        Q4_K_M
  Capabilities
    completion
    vision
    tools
    thinking
  Parameters
    min_p               0
    presence_penalty    1.5
    repeat_penalty      1
    temperature         1
    top_k               20
    top_p               0.95
  License
    Apache License
    Version 2.0, January 2004
    ...
I noticed that the architecture above is shown as
qwen35moe, not qwen36moe as expected.
Other commands are:
$ ollama launch claude --model qwen3.6:35b
$ ollama launch codex --model qwen3.6:35b
$ ollama launch droid --model qwen3.6:35b
$ ollama launch hermes --model qwen3.6:35b
$ ollama launch openclaw --model qwen3.6:35b
$ ollama launch opencode --model qwen3.6:35b
$ ollama launch pi --model qwen3.6:35b
qwen3.6:27b
This model often outperforms the qwen3.6:35b version on complex
logic and multi-file refactoring.
This is a dense model, meaning it uses all its parameters for every token. It is arguably the most capable coding model that can realistically run on consumer hardware.
The qwen3.6:27b model provides fast performance on a GPU with 24GB
VRAM, for example the NVIDIA 4090. To fit this on the NVIDIA 3060 with room
for code context, use a 4-bit quantization.
$ ollama run qwen3.6:27b
pulling manifest
pulling 83c54730a5fe: 100% ▕██████████████████████████▏ 17 GB
pulling 5f3a3c817e78: 100% ▕██████████████████████████▏ 11 KB
pulling 86eff881e8d2: 100% ▕██████████████████████████▏ 94 B
pulling 728c795c7762: 100% ▕██████████████████████████▏ 456 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
>>> /show info
  Model
    architecture        qwen35
    parameters          27.8B
    context length      262144
    embedding length    5120
    quantization        Q4_K_M
  Capabilities
    completion
    vision
    tools
    thinking
  Parameters
    top_k               20
    top_p               0.95
    min_p               0
    presence_penalty    1.5
    repeat_penalty      1
    temperature         1
  License
    Apache License
    Version 2.0, January 2004
    ...
I noticed that the architecture above is shown as
qwen35, not qwen36 as expected.
Other commands are:
$ ollama launch claude --model qwen3.6:27b
$ ollama launch codex --model qwen3.6:27b
$ ollama launch droid --model qwen3.6:27b
$ ollama launch hermes --model qwen3.6:27b
$ ollama launch openclaw --model qwen3.6:27b
$ ollama launch opencode --model qwen3.6:27b
$ ollama launch pi --model qwen3.6:27b
Running Queries
Ollama queries can be run in several ways:
- REST API
- oterm
- web-ui
- Computer language bindings (e.g. Python, JavaScript/TypeScript)
REST API
I used curl to query the Ollama REST API from the command line,
then I used jq and fold to process the response.
The -s option for curl prevents the progress meter
from cluttering up the screen, and the jq filter removes
everything from the response except the desired text. The fold
command wraps the text response to a width of 72 characters.
$ curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2:70b",
  "prompt": "Why is there air?",
  "stream": false
}' | jq -r .response | fold -w 72 -s
Air, or more specifically oxygen, is essential for life as we know it.
It exists because of the delicate balance of chemical reactions in
Earth’s atmosphere, which has allowed complex organisms like ourselves
to evolve.

But if you’re asking about air in a broader sense, it serves many
functions: it helps maintain a stable climate, protects living things
from harmful solar radiation, and provides buoyancy for various forms
of life, such as fish or birds.
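The same request can be made from a program using nothing but a standard library. The following Python sketch mirrors the curl, jq, and fold pipeline above under a few assumptions: the server is at the default local address, and the helper names build_request, extract_text, and generate are hypothetical. Python's textwrap only approximates what fold does, so line breaks may differ slightly.

```python
import json
import textwrap
import urllib.request

# Default local endpoint, as used by the curl command above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that curl sends with -d."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )


def extract_text(raw_reply: bytes, width: int = 72) -> str:
    """Rough equivalent of `jq -r .response | fold -w 72 -s`."""
    text = json.loads(raw_reply)["response"]
    # Wrap each line separately so paragraph breaks survive.
    return "\n".join(textwrap.fill(line, width) for line in text.splitlines())


def generate(model: str, prompt: str) -> str:
    """Send the prompt to a running Ollama server and return wrapped text."""
    with urllib.request.urlopen(build_request(model, prompt)) as reply:
        return extract_text(reply.read())
```

With a server running and the model installed, print(generate("llama2:70b", "Why is there air?")) would display the wrapped reply.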
Go Binding
The official Go language bindings can be added to a Go project as follows (additional Go libraries exist):
$ mkdir /tmp/blah
$ cd /tmp/blah
$ go mod init github.com/mslinn/demo go: creating new go.mod: module github.com/mslinn/demo
$ go get github.com/ollama/ollama/api go: downloading golang.org/x/sys v0.37.0 go: added github.com/bahlo/generic-list-go v0.2.0 go: added github.com/buger/jsonparser v1.1.1 go: added github.com/google/uuid v1.6.0 go: added github.com/mailru/easyjson v0.7.7 go: added github.com/ollama/ollama v0.18.2 go: added github.com/wk8/go-ordered-map/v2 v2.1.8 go: added golang.org/x/crypto v0.43.0 go: added golang.org/x/sys v0.37.0 go: added gopkg.in/yaml.v3 v3.0.1
Ruby Binding
I wrote this Ruby method to describe images.
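require 'base64'
require 'ollama-ai' # Assumption: the ollama-ai gem, which provides Ollama.new
# Sends an image to the Ollama server and prints the model's description.
# Expects @address, @model, @temperature and @timeout to be set by the caller.
def describe_image(image_filename)
  @client = Ollama.new(
    credentials: { address: @address },
    options: {
      server_sent_events: true,
      temperature: @temperature,
      connection: { request: { timeout: @timeout, read_timeout: @timeout } },
    }
  )
  result = @client.generate(
    {
      model: @model,
      prompt: 'Please describe this image.',
      # binread reads raw bytes, avoiding newline or encoding translation
      images: [Base64.strict_encode64(File.binread(image_filename))],
    }
  )
  # The reply arrives as a series of events; join their text fragments
  puts result.map { |x| x['response'] }.join
end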
The results with the llama2:70b model were ridiculous - an
example of the famous hallucinations that LLMs entertain their audience with.
As the public becomes enculturated with these hallucinations, we may come to
prefer them over human comedians. Certainly there will be a lot of material
for the human comedians to fight back with. For example, when describing a
photo of me:
$ ollama pull llama2:70b
$ describe -m llama2:70b /mnt/c/bestPhotoOfMike.png
This is an image of a vibrant and colorful sunrise over the ocean, with the sun peeking above the horizon, casting warm, golden hues over the sky and water below. The sunlight reflects off the rippled surface of the water, creating shimmering patterns that contrast with the tranquil darkness of the receding waters. In the foreground, a solitary figure is silhouetted against the rising sun, perhaps lost in thought or finding inspiration in the breathtaking beauty of the scene.
The llava model is supposed to be good at describing images, so I installed it
and tried again, with excellent results:
$ ollama pull llava:13b
$ describe -m llava:13b /mnt/c/bestPhotoOfMike.png
The image features a smiling man wearing glasses and dressed in a suit and tie. He has a well-groomed appearance. The man's attire includes a jacket, dress shirt, and a patterned tie that complements his professional outfit. The setting appears to be a studio environment, as there is a background behind the man that has an evenly lit texture. The man's smile conveys confidence and approachability, making him appear knowledgeable in his field or simply happy to pose for this photograph.
You can try the latest LLaVA model online.
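The Ruby method above does two things that are easy to miss: it sends the image as a base64 string in the images array, and it joins the text fragments of a streamed reply. The following Python sketch shows both steps in isolation. The helper names image_payload and join_stream are hypothetical, and the streamed chunks are hand-written stand-ins for what a server would send, one JSON object per line.

```python
import base64
import json


def image_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a generate request body that carries one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"model": model, "prompt": prompt, "images": [encoded]}


def join_stream(lines) -> str:
    """Concatenate the `response` field of each streamed JSON line."""
    return "".join(
        json.loads(line).get("response", "") for line in lines if line.strip()
    )


# Hand-written example chunks, not real model output.
chunks = [
    '{"response": "A smiling ", "done": false}',
    '{"response": "man.", "done": true}',
]
print(join_stream(chunks))
```

Joining fragments this way is what lets a client show text as it arrives, or, as here, assemble the complete description once the stream ends.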
Ollama and Claude CLI
I wrote a review of Claude CLI. It can be used as a harness to run Ollama-compatible models, whether they are local or in the cloud. Documentation is here.
Local Models
llama2
My Windows workstation has 64 GB RAM, a 13th generation Intel i7 and a modest NVIDIA 3060. I decided to try the biggest Llama 2 model to see what might happen. I downloaded and executed the Llama 2 70B model with the following incantation. An NVIDIA 4090 would have been a better video card for this Ollama model, and it would still have been slow.
$ ollama run llama2:70b
pulling manifest
pulling 68bbe6dc9cf4... 100% ▕██████████████████████████▏ 38 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕██████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████▏ 59 B
pulling fa304d675061... 100% ▕██████████████████████████▏ 91 B
pulling 7c96b46dca6c... 100% ▕██████████████████████████▏ 558 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
I played around to learn what the available messages were. For more information, see Tutorial: Set Session System Message in Ollama CLI by Ingrid Stevens.
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts
Use """ to begin a multi-line message.
>>> Send a message (/? for help)
>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template
>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama2:70b
FROM /usr/share/ollama/.ollama/models/blobs/sha256:68bbe6dc9cf42eb60c9a7f96137fb8d472f752de6ebf53e9942f267f1a1e2577
TEMPLATE """[INST] <<SYS>>{{ .System }}<</SYS>>
{{ .Prompt }} [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER stop "<<SYS>>"
>>> /show system
No system message was specified for this model.
>>> /show template
[INST] <<SYS>>{{ .System }}<</SYS>>
{{ .Prompt }} [/INST]
>>> /bye
USER: and ASSISTANT: are helpful when writing a request for the model to reply to.
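To make the template shown by /show template less abstract, here is a small sketch of what the model receives once the system message and the prompt have been substituted. Ollama performs this substitution itself; the function render_llama2_prompt is a hypothetical illustration using plain string formatting, and the exact whitespace between the two template lines is an assumption.

```python
def render_llama2_prompt(prompt: str, system: str = "") -> str:
    """Fill in the Llama 2 chat template shown by `/show template`.

    Assumption: a single newline separates the system block from the prompt.
    """
    return f"[INST] <<SYS>>{system}<</SYS>>\n{prompt} [/INST]\n"


print(render_llama2_prompt("Why is there air?", system="Answer in one sentence."))
```

The stop parameters in the Modelfile name the same markers, so the model stops generating if it starts to emit another [INST] or <<SYS>> block.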
QWEN 3.5
The 9B model is the default when running Ollama locally. It fits comfortably on a 12GB GPU like an RTX 3060, and supports text, image input, thinking, and tool calling. 4b, 2b, and 0.8b models are also available. To run the default 9B model locally, type:
$ ollama launch claude --model qwen3.5
NVIDIA Nemotron 3 Super
You can use a free Ollama account to run the NVIDIA Nemotron 3 Super model in the cloud under the control (or lack of control) of OpenClaw.
$ ollama launch openclaw --model nemotron-3-super:cloud
Installing OpenClaw...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 539 packages in 18s
OpenClaw installed successfully
To use nemotron-3-super:cloud, please sign in.
Navigate to: https://ollama.com/connect?name=Bear&key=c3NoLWVkMjU1MTkgQUFBQUMzTnphQzFsWkRJMU5URTVBQUFBSU9odVJTM0FMdVMvUGZid3M0STJHVUdJekFyTlJpL1J3MmtVR210ZmlXaUY
⠸ Waiting for sign in to complete...
Launching OpenClaw with nemotron-3-super:cloud...
Security
OpenClaw can read files and run actions when tools are enabled. A bad prompt can trick it into doing unsafe things.
Learn more: https://docs.openclaw.ai/gateway/security
I understand the risks. Continue?
Setting up OpenClaw with Ollama...
Model: nemotron-3-super:cloud
🦞 OpenClaw 2026.3.13 (61d171a) — Half butler, half debugger, full crustacean.
Default Ollama model: nemotron-3-super:cloud
Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 85b674df87782e7e73fb5cfde53c5036a7fdf32eb8f34aa47b386e426cce2ff0 -> 2da265b895dc0f25e207bdda2d7183df4e8972ea0ff30bda366eb6cde757d0f8, backup=/home/mslinn/.openclaw/openclaw.json.bak)
Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write)
Updated ~/.openclaw/openclaw.json
Workspace OK: ~/.openclaw/workspace
Sessions OK: ~/.openclaw/agents/main/sessions
System Node 20.19.5 at /usr/bin/node is below the required Node 22.16+. Using /home/mslinn/.nvm/versions/node/v25.8.1/bin/node for the daemon. Install Node 24 (recommended) or Node 22 LTS from nodejs.org or Homebrew.
Installed systemd service: /home/mslinn/.config/systemd/user/openclaw-gateway.service
Enabled systemd lingering for mslinn.
│
◇ Agents: main (default)
Heartbeat interval: 30m (main)
Session store (main): /home/mslinn/.openclaw/agents/main/sessions/sessions.json (0 entries)
Tip: run `openclaw configure --section web` to store your Brave API key for web_search.
Docs: https://docs.openclaw.ai/tools/web
✓ Installed web search plugin
Starting your assistant — this may take a moment...
Starting gateway...
✓ OpenClaw is running
Open the Web UI: http://localhost:18789/#token=1032095f6293e02b513eff40fca59bb947096c70b2906eee
Quick start:
  /help                                    see all commands
  openclaw configure --section channels    connect WhatsApp, Telegram, etc.
  openclaw skills                          browse and install skills
The OpenClaw gateway is running in the background. Stop it with: openclaw gateway stop
🦞 OpenClaw 2026.3.13 (61d171a) — Somewhere between 'hello world' and 'oh god what have I built.'
openclaw tui - ws://127.0.0.1:18789 - agent main - session main
session agent:main:main
Wake up, my friend!
I need to check my daily memory file first, as per the startup routine.
Let me read the memory file for today's date (2026-03-20):
I'll create the memory directory and today's memory file since it doesn't exist yet:
Now let me check yesterday's memory file (2026-03-19) for recent context:
Let me also check if there's a MEMORY.md file for long-term memory (only load in main session):
Now let me read the core files to understand who I am and who I'm helping:
⠴ running • 12s | connected
agent main | session main | unknown | tokens ?/128k
Background Agents
Ollama can launch non-interactive agents using the new --yes
flag. This enables running OpenClaw and other agents in the background without
interruption on servers, scripts, and other environments that do not support the
OpenClaw gateway.
OpenClaw and other background agents hand control of your computer to autonomous LLMs that are widely known to be unreliable and prone to extremely destructive acts.
To launch a non-interactive OpenClaw agent with the
minimax-m2.7:cloud model, run:
$ ollama launch openclaw \
--model minimax-m2.7:cloud \
--yes -- agent \
--agent main \
--local \
--message "Prepare a pre-read for my next meeting"
"Prepare a pre-read" means creating a document or set of materials for attendees to review before a meeting starts. The goal is to provide necessary background and context so you can skip the catch-up phase and dive straight into productive discussion or decision-making during the actual meeting.
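If you want to start such an agent from a script rather than a terminal, the command can be assembled as an argument list. The sketch below only builds and prints the command, using the same flags as the example above; it does not launch anything. The function name openclaw_agent_command is hypothetical.

```python
import shlex


def openclaw_agent_command(model: str, message: str, agent: str = "main") -> list[str]:
    """Build the argument list for a non-interactive OpenClaw launch."""
    return [
        "ollama", "launch", "openclaw",
        "--model", model,
        "--yes",            # skip interactive confirmation prompts
        "--",               # everything after this is passed to OpenClaw
        "agent",
        "--agent", agent,
        "--local",
        "--message", message,
    ]


argv = openclaw_agent_command(
    "minimax-m2.7:cloud", "Prepare a pre-read for my next meeting"
)
# Print a shell-quoted version for review before running anything.
print(shlex.join(argv))
```

Passing argv to subprocess.run(argv, check=True) would actually start the agent. Given the warning above, review the printed command first and run it only in an environment you can afford to have damaged.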
Recording a Session
See Recording Chat Transcripts to obtain the
record script and to learn various ways of viewing the
transcript.
The following shows how to use record to launch Ollama and run
the qwen3:4b model.
$ record -c 'ollama run qwen3:4b'
Press Ctrl+D to end the chat and stop recording.
Script started, output log file is '2025-12-12_20-06-39_chat.log'.
>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template
>>> ^D # Exit the ollama session
Script done.
Recording finished. Log saved to /home/mslinn/2025-12-12_20-06-39_chat.log
Documentation