Large Language Models with Ollama

Published 2024-01-14. Last modified 2026-05-02.
Time to read: 16 minutes.

This page is part of the llm collection.

Overview

Ollama is an open-source tool built with the Go language for managing and using large language models (LLMs). It is responsive and stable, and it is not subject to the vagaries of Node.js or the inefficiency of Python.

All Ollama programs and features run on Windows, macOS, and Linux.

Meta developed the Llama open-source LLMs. Ollama is not owned by or a product of Meta. While the name similarity often leads to confusion, Ollama and Meta's Llama are separate entities. Ollama is merely a business partner of Meta. Ollama’s public funding comes from venture capital firms like Y Combinator and Essence Venture Capital, not Meta.

Rapidly Evolving Product

Ollama was originally released as a product that could only host non-agentic LLM chat sessions for local models. It can now interoperate with agentic harnesses to run models residing locally or in the cloud. This allows you to run local models most of the time and invoke cloud-based LLMs when heavy lifting is required.

Distributed Architecture

Small models running on typical desktop computers and prosumer-grade servers are typically not as powerful or as fast as large models running on enterprise-class hardware, but you have complete control over them without extra cost, censorship, restrictions, or privacy issues.

The client (ollama CLI, bespoke program, or Open WebUI) orchestrates the conversation with the Ollama server, which in turn controls the LLM requested by the Ollama client. Because LLMs are inherently stateless, the client must store the list of messages and send the entire accumulated history back to the server with every new prompt. Only as much of the history as fits within the context window is used; this limits the maximum amount of data the model can process.
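The client-side bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular Ollama client is implemented; the function names are hypothetical, and the sketch assumes a server listening on the default endpoint and the `/api/chat` REST route with `stream` disabled.

```python
import json
import urllib.request

# Default endpoint; an assumption that holds only if OLLAMA_HOST was not changed.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model, history, prompt):
    """Append the new user prompt and return (payload, updated_history).

    The whole accumulated history goes into every request because the
    server does not remember earlier turns.
    """
    updated = history + [{"role": "user", "content": prompt}]
    payload = {"model": model, "messages": updated, "stream": False}
    return payload, updated


def chat(model, history, prompt, url=OLLAMA_CHAT_URL):
    """Send one turn to a running Ollama server and return (reply_text, history)."""
    payload, updated = build_chat_request(model, history, prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read())["message"]
    # Keep the assistant reply so that the next turn includes it.
    return reply["content"], updated + [reply]
```

Each call to `chat` returns the grown history, which the caller passes back in on the next turn. A real client would also need to drop or summarize old messages once the history no longer fits in the context window.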

Local Models

For local models, the Ollama server process converts the contents of the context window into tokens and caches the attention keys and values computed for each token. This store is called the KV cache (key-value cache). It normally lives in VRAM, or in system RAM when the model runs on the CPU, and it holds a mathematical representation of the conversation history. The model obtains context from the KV cache, and appends entries for new tokens to it during inference.

If you set OLLAMA_NUM_PARALLEL to a value greater than 1, the server allocates that many KV caches, one for each request it processes in parallel. This allows the server to handle several user sessions simultaneously without them overwriting each other’s context. This feature is often called multitenancy, and it requires proportionally more VRAM.
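To get a feel for why parallel sessions cost memory, here is a rough estimate of KV cache size. It is a sketch under stated assumptions: a plain transformer that stores one key and one value vector per layer, per KV head, per token, with each parallel slot given the full context length. The model shape used below is hypothetical, and the memory Ollama actually allocates for a given model may differ.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_length,
                   parallel=1, bytes_per_value=2):
    """Rough KV cache size for a plain transformer.

    Two tensors (keys and values) are stored per layer for every token.
    bytes_per_value=2 corresponds to f16, the default OLLAMA_KV_CACHE_TYPE.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_length * parallel


# Hypothetical model shape, chosen only to make the arithmetic easy to follow.
one_slot = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_length=4096)
four_slots = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                            context_length=4096, parallel=4)
print(one_slot // 2**20, "MiB for one slot")      # 512 MiB for one slot
print(four_slots // 2**20, "MiB for four slots")  # 2048 MiB for four slots
```

The estimate grows linearly with both the context length and the number of parallel slots, which is why raising either setting can push a model out of VRAM.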

Cloud Models

In contrast, for cloud-based models the Ollama server process acts as a request gateway that offloads state management to a remote inference orchestrator. Instead of managing a local cache in VRAM, the process coordinates with a managed data layer that distributes the conversation history across a cluster.

Common Core

Ollama clients can access Ollama servers via:

  • The ollama CLI, oterm, the new Ollama native app, and the TUI all interact with the same Ollama background service, so they have the same fundamental capabilities for running models. This service contains the core inference engine and model management logic. While the Ollama app provides a user-friendly desktop interface to start and manage this service, the CLI and TUI offer programmatic control over the same fundamental model capabilities.
  • REST interface
  • Open WebUI provides a browser-based chat experience similar to ChatGPT.

Ollama Service

After installation, the Ollama service runs in the background. The service API is available by default at endpoint localhost:11434. For native Windows and macOS, the Ollama app presents as a tray application.
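As a quick way to confirm that the service is reachable, the following sketch asks it for the list of locally stored models. It assumes the default endpoint and the `/api/tags` REST route; the helper names are made up for this example.

```python
import json
import urllib.request


def model_names(tags_response):
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def list_local_models(base_url="http://localhost:11434"):
    """Ask a running Ollama service which models are stored locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as response:
        return model_names(json.loads(response.read()))
```

Calling `list_local_models()` raises `urllib.error.URLError` when nothing is listening on the port, which is itself a useful signal that the service is not running.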

This is the help message for the Ollama service launcher:

Output of ollama serve --help
Start Ollama

Usage:
  ollama serve [flags]

Aliases:
  serve, start

Flags:
  -h, --help   help for serve

Environment Variables:
      OLLAMA_DEBUG               Show additional debug information (e.g. OLLAMA_DEBUG=1)
      OLLAMA_HOST                IP Address for the ollama server (default 127.0.0.1:11434)
      OLLAMA_CONTEXT_LENGTH      Context length to use unless otherwise specified (default: 4k/32k/256k based on VRAM)
      OLLAMA_KEEP_ALIVE          The duration that models stay loaded in memory (default "5m")
      OLLAMA_MAX_LOADED_MODELS   Maximum number of loaded models per GPU
      OLLAMA_MAX_QUEUE           Maximum number of queued requests
      OLLAMA_MODELS              The path to the models directory
      OLLAMA_NUM_PARALLEL        Maximum number of parallel requests
      OLLAMA_NO_CLOUD            Disable Ollama cloud features (remote inference and web search)
      OLLAMA_NOPRUNE             Do not prune model blobs on startup
      OLLAMA_ORIGINS             A comma separated list of allowed origins
      OLLAMA_SCHED_SPREAD        Always schedule model across all GPUs
      OLLAMA_FLASH_ATTENTION     Enabled flash attention
      OLLAMA_KV_CACHE_TYPE       Quantization type for the K/V cache (default: f16)
      OLLAMA_LLM_LIBRARY         Set LLM library to bypass autodetection
      OLLAMA_GPU_OVERHEAD        Reserve a portion of VRAM per GPU (bytes)
      OLLAMA_LOAD_TIMEOUT        How long to allow model loads to stall before giving up (default "5m")

Manual Start

For all OSes, start the service manually by typing:

Shell
$ ollama serve

Manual Stop

For Linux, stop the service manually by typing:

Shell
$ sudo systemctl stop ollama

For macOS:

3 macOS alternatives
$ pkill Ollama
$ killall Ollama
$ brew services stop ollama

For Windows:

PowerShell or CMD
PS C:\Users\Mike Slinn> taskkill /IM ollama.exe /F

When the Ollama service is running, Ollama loads required local models into memory only when you request them (e.g., via the ollama run command or an API call). It unloads them again after they have been idle for the duration set by OLLAMA_KEEP_ALIVE (5 minutes by default) to save resources.
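The idle timeout can also be overridden per request with the `keep_alive` field of the REST API. The sketch below only builds the request bodies; the helper names are hypothetical, and the bodies are meant to be sent with POST to `/api/generate` on a running server.

```python
import json


def unload_request(model):
    """Body for POST /api/generate that asks the server to unload a model now."""
    return json.dumps({"model": model, "keep_alive": 0})


def preload_request(model, keep_alive="30m"):
    """Body for POST /api/generate that loads a model and keeps it resident.

    Sending no prompt loads the model without generating any text.
    """
    return json.dumps({"model": model, "keep_alive": keep_alive})
```

A `keep_alive` of `0` releases the memory immediately, while a duration string such as `"30m"` keeps the model loaded for longer than the server default.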

Given a working connection and without regard to communication authentication requirements, any Ollama client is capable of accessing any Ollama server that is configured to listen to other network nodes.
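A bespoke client can follow the same convention as the ollama CLI and read the server address from OLLAMA_HOST. The helper below is a simplified sketch with a hypothetical name: it handles only plain `host`, `host:port`, and full URL values, and it does not attempt to handle IPv6 literals.

```python
import os


def server_url(default="http://127.0.0.1:11434"):
    """Build a base URL from OLLAMA_HOST, falling back to the local default."""
    host = os.environ.get("OLLAMA_HOST", "").strip()
    if not host:
        return default
    if "://" in host:
        # A full URL was given; use it unchanged.
        return host
    if ":" not in host:
        # Bare host name or IPv4 address: add the default Ollama port.
        host += ":11434"
    return "http://" + host
```

With `OLLAMA_HOST=192.168.1.10` set in the environment (an example address), the function returns `http://192.168.1.10:11434`, and every request built on top of it goes to that machine instead of localhost.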

Agentic Harnesses

In addition to Ollama's well-known command-line chat interface, the provided integrations for agentic harnesses include Claude, Codex, Droid, Hermes-Agent, OpenClaw, OpenCode, and Pi.

In March 2026, new web search and web fetch plugins were added to Ollama. Although the documentation states that these features only work with Ollama/OpenClaw, they are actually available for all Ollama agentic model configurations.

Infobits

The Ollama Discord channel is here.

PatchBot shows the latest changes.

Open WebUI Backstory

I wrote an article about Open WebUI when it first became available. Here is some history to help understand the current state.

Open WebUI was originally at ollamahub.ai and the website was confusingly similar to ollama.com. Furthermore, the project was also frequently called Ollama WebUI. Legal action was threatened. OllamaHub rebranded to Open WebUI and diversified from Ollama-only to Ollama plus other technologies. openwebui.com continues to be a good resource for Ollama users; however, there is no love lost between the two organizations. Both organizations appear to have fought each other hard in private, while smiling in public.

As the projects evolved, their goals began to diverge, but they continue to try to eat each other’s lunch. Ollama has focused on building a vertically integrated ecosystem, recently introducing its own official desktop chat app, its own cloud hosting service, and a proprietary engine to replace the core technology that was originally provided by llama.cpp.

In contrast, the Open WebUI team explicitly chose their new name to emphasize general usefulness instead of just offering Ollama-specific technology. They have expanded support to include competing backends like vLLM, LM Studio, and OpenAI APIs, effectively making Ollama just one of many options rather than the exclusive core.

The relationship was further complicated by community-level debates. In 2025, Open WebUI shifted its licensing to include stricter attribution requirements (sometimes called "badgeware"), which some in the community viewed as a move to prevent other companies from easily forking or integrating their code without prominent credit. There has been ongoing community criticism regarding Ollama's own history with attribution, specifically a long delay in properly acknowledging the llama.cpp project, which powered Ollama for years.

Today, they are best described as frenemies. They remain technically compatible but they are now competing for the same users' attention. Ollama wants users to stay within its official app and cloud ecosystem, while Open WebUI wants to be the universal dashboard for every AI tool on your machine. Open WebUI still offers the best chat interface for Ollama's API, but it has very limited agentic support.

The above explains why Open WebUI is not one of the many integrations that Ollama promotes. Ollama is increasingly positioning itself as a platform, not just a tool. With the launch of the Ollama App and Ollama Cloud, Open WebUI is now a direct competitor for being the user's primary interface.

Experienced programmers who work with agentic LLMs should look at the Ollama/Pi harness integration instead of the Open WebUI chat console.

Installation

Installation instructions for Ollama are simple.

macOS and Linux

Ollama installation and update for native Linux, WSL, and macOS looks like this:

Shell
$ curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink '/etc/systemd/system/default.target.wants/ollama.service'  '/etc/systemd/system/ollama.service'.
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line. 
$ ollama --version
ollama version is 0.21.2

Windows

Windows users can install Ollama on native Windows and/or WSL. Some LLM harnesses must be run from WSL, not Windows. If you normally use WSL, you might benefit from installing Ollama on both WSL and native Windows.

On a Windows machine, Ollama and its local models run more efficiently when the native Windows build is installed rather than the WSL build. That option is not always available, for example when a harness must be run from WSL, so you must choose between flexibility and performance.

PowerShell
PS C:\Users\mslinn> irm https://ollama.com/install.ps1 | iex
>>> Downloading Ollama for Windows...
######################################## 100.0%
>>> Installing Ollama...
>>> Install complete. Run 'ollama' from the command line. 
PS C:\Users\Mike Slinn> ollama --version
ollama version is 0.21.2

User Interfaces

Ollama includes two user interfaces:

  1. A CLI (command-line interface): the ollama executable allows the user to manage models and interact with them. Note that it does not store sessions.
  2. A GUI (graphical user interface) called the Ollama App, which manages the background service and provides a basic desktop presence. For Windows and macOS, it resides in the system tray. The Ollama App is primarily a "runner" for the background service rather than a full-featured chat window (like ChatGPT).

For persistent chat sessions, users typically install third-party tools like Open WebUI.
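If all you need is to resume a conversation later from your own script, persistence can be as simple as writing the message list to disk. This is a minimal sketch with hypothetical helper names and a file layout of my own choosing, not a feature of Ollama itself.

```python
import json
from pathlib import Path


def save_session(path, messages):
    """Write the accumulated chat messages to a JSON file."""
    Path(path).write_text(json.dumps(messages, indent=2), encoding="utf-8")


def load_session(path):
    """Return previously saved messages, or an empty history if none exist."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))
```

The list returned by `load_session` has the same shape as the `messages` array of the chat API, so it can be sent back to the server unchanged on the next prompt.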

The built-in LLM harness integrations support agentic coding.

CLI

Starting with Ollama v0.17.x (early 2026), the ollama command no longer just shows a static help menu. Instead, it launches a TUI that allows you to:

  • View a list of recommended local and cloud models directly in your terminal.
  • Use your arrow keys to pick a model and launch it in seconds without needing to remember specific model tags.
  • Be guided on first launch through configuring providers and installing necessary components such as the OpenClaw gateway daemon.

Shell
$ ollama
Ollama 0.18.2
  Run a model                      Start an interactive chat with a model
  Launch Claude Code               Anthropic's coding tool with subagents
  Launch Codex (not installed)     OpenAI's open-source coding agent
  Launch OpenClaw                  Personal AI with 100+ skills
▸ Launch OpenCode (not installed)  Install from https://opencode.ai
  Launch Droid (not installed)     Factory's coding agent across terminal and IDEs
  Launch Pi (not installed)        Minimal AI agent toolkit with plugin support
  Launch Cline (not installed)     Autonomous coding agent with parallel execution

↑/↓ navigate • enter launch • → configure • esc quit

If you select Run a model and press Enter, the following menu appears:

Ollama TUI session (continued)
Select model to run: Type to filter...
Recommended
▸ kimi-k2.5:cloud     Multimodal reasoning with subagents
  qwen3.5:cloud       Reasoning, coding, and agentic tool use with vision
  glm-5:cloud         Reasoning and code generation
  minimax-m2.7:cloud  Fast, efficient coding and real-world productivity
  glm-4.7-flash       Reasoning and code generation locally, ~25GB, (not downloaded)
  qwen3.5             Reasoning, coding, and visual understanding locally, ~11GB, (not downloaded)

More
  bjoernb/qwen3-coder-30b-1m
  deepseek-coder-v2:lite
  deepseek-r1:7b
  deepseek-r1:8b
  ... and 9 more

↑/↓ navigate • enter select • esc cancel

CLI

The help message is:

Output of ollama -h
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start Ollama
  create      Create a model
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  signin      Sign in to ollama.com
  signout     Sign out from ollama.com
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  launch      Launch the Ollama menu or an integration
  help        Help about any command

Flags:
  -h, --help         help for ollama
      --nowordwrap   Don't wrap words to the next line automatically
      --verbose      Show timings for response
  -v, --version      Show version information

Use "ollama [command] --help" for more information about a command.

Manual Server Start

The Ollama app starts the server on demand. If the server is not already running as a service, you can start it from the command line:

Shell
$ ollama serve
2024/01/14 16:25:20 images.go:808: total blobs: 0
2024/01/14 16:25:20 images.go:815: total unused blobs removed: 0
2024/01/14 16:25:20 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
2024/01/14 16:25:21 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/14 16:25:21 gpu.go:88: Detecting GPU type
2024/01/14 16:25:21 gpu.go:203: Searching for GPU management library libnvidia-ml.so
2024/01/14 16:25:21 gpu.go:248: Discovered GPU libraries: [/usr/lib/wsl/lib/libnvidia-ml.so.1]
2024/01/14 16:25:21 gpu.go:94: Nvidia GPU detected
2024/01/14 16:25:21 gpu.go:135: CUDA Compute Capability detected: 8.6 

App

Download Ollama App here.

Configuration

Ollama uses a client-server architecture. This means that Ollama consists of two programs: the Ollama server (a background process) and an Ollama client. The default Ollama chat client is not agentic, which means it cannot view or interact with local files, processes or any other information source.

To overcome this, you can use one of the built-in integrations, listed above. However, if you want to do that, you should start the Ollama server and configure it before launching the agentic Ollama client.

Configuring Ollama Chat

Ollama sets default context lengths based on your GPU’s VRAM.

  • < 24 GiB VRAM: 4,096 tokens.
  • 24–48 GiB VRAM: 32,768 tokens.
  • ≥ 48 GiB VRAM: 256,000 tokens.
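The thresholds above translate directly into a small lookup. The function name is hypothetical and the code simply restates the list; it is not taken from the Ollama source. Exactly 48 GiB is treated as belonging to the largest tier.

```python
def default_context_length(vram_gib):
    """Return the default context length (in tokens) for a given amount of VRAM in GiB."""
    if vram_gib >= 48:
        return 256_000
    if vram_gib >= 24:
        return 32_768
    return 4_096


for vram in (0, 12, 24, 48, 80):
    print(f"{vram:>3} GiB -> {default_context_length(vram)} tokens")
```

A machine with no usable GPU reports 0 B of VRAM, so it falls into the smallest tier and gets 4,096 tokens unless you override the default.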

You can set a global default context length for the Ollama server by setting an environment variable before starting the Ollama service. The following shows how to do that for the duration of the terminal session:

  • Linux/macOS
    $ export OLLAMA_CONTEXT_LENGTH=64000
  • Windows (PowerShell)
    PS C:\Users\Mike Slinn> $env:OLLAMA_CONTEXT_LENGTH="64000"
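The environment variable sets the server-wide default. A client that talks to the REST API can instead request a context length for a single call through the `options.num_ctx` field. The helper below is a sketch with a hypothetical name that only builds the request body.

```python
import json


def chat_body(model, messages, num_ctx=None):
    """Build a /api/chat body, optionally overriding the context length."""
    body = {"model": model, "messages": messages, "stream": False}
    if num_ctx is not None:
        # Per-request override; omitted entirely when the server default should apply.
        body["options"] = {"num_ctx": num_ctx}
    return json.dumps(body)
```

Leaving `num_ctx` unset keeps the request subject to whatever OLLAMA_CONTEXT_LENGTH or the VRAM-based default dictates.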

Running Ollama Server

OLLAMA_CONTEXT_LENGTH is an environment variable meant for the Ollama Server (the background process), not the launch client command.

In one terminal session, start the Ollama server with enough context to be useful for coding:

Shell
$ sudo systemctl stop ollama # Just to be sure
$ OLLAMA_CONTEXT_LENGTH=64000 ollama serve
time=2026-04-29T08:35:38.249-04:00 level=INFO source=routes.go:1752 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:64000 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/mslinn/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-04-29T08:35:38.249-04:00 level=INFO source=routes.go:1754 msg="Ollama cloud disabled: false"
time=2026-04-29T08:35:38.250-04:00 level=INFO source=images.go:517 msg="total blobs: 0"
time=2026-04-29T08:35:38.250-04:00 level=INFO source=images.go:524 msg="total unused blobs removed: 0"
time=2026-04-29T08:35:38.252-04:00 level=INFO source=routes.go:1810 msg="Listening on [::]:11434 (version 0.21.2)"
time=2026-04-29T08:35:38.254-04:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-29T08:35:38.256-04:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 40761"
time=2026-04-29T08:35:42.990-04:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33449"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="61.6 GiB" available="50.4 GiB"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096

Running Agentic Clients

If the Ollama server is not running as a background service, but is running in a terminal session, then Ollama clients must be run in other terminal sessions.

Claude

Shell
$ ollama launch claude --model qwen3.6:35b
pulling manifest 

At this point, the first terminal session should show new log output as each part of the model is downloaded:

First terminal session output
time=2026-04-29T08:35:47.959-04:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="61.6 GiB" available="50.4 GiB"
time=2026-04-29T08:35:47.959-04:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/04/29 - 08:48:05 | 200 |     164.846µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/29 - 08:48:05 | 404 |     670.229µs |       127.0.0.1 | POST     "/api/show"
time=2026-04-29T08:48:28.414-04:00 level=INFO source=download.go:179 msg="downloading f5ee307a2982 in 24 1 GB part(s)"
time=2026-04-29T08:51:59.647-04:00 level=INFO source=download.go:179 msg="downloading 5f3a3c817e78 in 1 11 KB part(s)"
time=2026-04-29T08:52:00.873-04:00 level=INFO source=download.go:179 msg="downloading 86eff881e8d2 in 1 94 B part(s)"
time=2026-04-29T08:52:02.154-04:00 level=INFO source=download.go:179 msg="downloading 5d1c86a949f7 in 1 462 B part(s)"

Moving back to the terminal session running Claude CLI under Ollama, I typed "how many directories are in this project?" The server log in the first terminal session then showed:

First terminal session output (continued)
[GIN] 2026/04/29 - 08:52:14 | 200 |         3m46s |       127.0.0.1 | POST     "/api/pull"
[GIN] 2026/04/29 - 08:52:14 | 200 |      42.869µs |       127.0.0.1 | HEAD     "/"
time=2026-04-29T08:52:36.720-04:00 level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-04-29T08:52:36.721-04:00 level=INFO source=server.go:444 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/mslinn/.ollama/models/blobs/sha256-f5ee307a2982106a6eb82b62b2c00b575c9072145a759ae4660378acda8dcf2d --port 38603"
time=2026-04-29T08:52:36.721-04:00 level=INFO source=sched.go:484 msg="system memory" total="61.6 GiB" free="50.7 GiB" free_swap="690.5 MiB"
time=2026-04-29T08:52:36.721-04:00 level=INFO source=server.go:771 msg="loading model" "model layers"=41 requested=-1
time=2026-04-29T08:52:36.737-04:00 level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-29T08:52:36.738-04:00 level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:38603"
time=2026-04-29T08:52:36.744-04:00 level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:64000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-29T08:52:36.879-04:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=1194 num_key_values=57
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2026-04-29T08:52:36.891-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-04-29T08:52:37.729-04:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:64000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:64000 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=ggml.go:494 msg="offloaded 0/41 layers to GPU"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="22.3 GiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="2.8 GiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="621.7 MiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:272 msg="total memory" size="25.7 GiB"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-29T08:52:39.276-04:00 level=INFO source=server.go:1364 msg="waiting for llama runner to start responding"
time=2026-04-29T08:52:39.276-04:00 level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
time=2026-04-29T08:52:47.827-04:00 level=INFO source=server.go:1402 msg="llama runner started in 11.11 seconds"
[GIN] 2026/04/29 - 08:54:07 | 200 |         1m31s |       127.0.0.1 | POST     "/v1/messages?beta=true"
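The memory figures in the log above are easy to pull out with a short script. The sketch below parses the simple `key=value` lines with `shlex` and adds up the three components reported by device.go. The helper names are hypothetical, and the parser is only intended for lines of this shape, not for the `[GIN]` request lines.

```python
import shlex

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def parse_log_line(line):
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def size_in_bytes(text):
    """Convert a size such as '22.3 GiB' into bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit]


# Three lines copied from the server log shown above.
lines = [
    'time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="22.3 GiB"',
    'time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="2.8 GiB"',
    'time=2026-04-29T08:52:39.276-04:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="621.7 MiB"',
]
total = sum(size_in_bytes(parse_log_line(line)["size"]) for line in lines)
print(f"{total / 2**30:.1f} GiB")  # 25.7 GiB
```

The sum agrees with the "total memory" line in the log, and the `device=CPU` field confirms that on this machine the weights and the KV cache were placed in system RAM rather than VRAM.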

Hermes Agent

Hermes Agent did not work properly with qwen3.6:35b, but it worked fine with deepseek-v4-pro:cloud.

Shell
$ ollama launch hermes --model qwen3.6:35b


Installing Hermes...

┌─────────────────────────────────────────────────────────┐
│ ⚕ Hermes Agent Installer                                │
├─────────────────────────────────────────────────────────┤
│ An open source AI agent by Nous Research.               │
└─────────────────────────────────────────────────────────┘

✓ Detected: linux (ubuntu)
→ Checking for uv package manager...
✓ uv found (uv 0.9.18)
→ Checking Python 3.11...
✓ Python found: Python 3.11.14
→ Checking Git...
✓ Git 2.51.2 found
→ Checking Node.js (for browser tools)...
✓ Node.js v25.9.0 found
→ Checking ripgrep (fast file search)...
→ Checking ffmpeg (TTS voice messages)...
✓ ffmpeg 7.1.1-1ubuntu4.2 found
→ Installing ripgrep...
Installing:
  ripgrep
Summary:
  Upgrading: 0, Installing: 1, Removing: 0, Not Upgrading: 23
  Download size: 1,521 kB
  Space needed: 5,492 kB / 613 GB available
Get:1 http://archive.ubuntu.com/ubuntu questing/universe amd64 ripgrep amd64 14.1.1-1 [1,521 kB]
Fetched 1,521 kB in 0s (3,947 kB/s)
Selecting previously unselected package ripgrep.
(Reading database ... 534474 files and directories currently installed.)
Preparing to unpack .../ripgrep_14.1.1-1_amd64.deb ...
Unpacking ripgrep (14.1.1-1) ...
Setting up ripgrep (14.1.1-1) ...
Processing triggers for man-db (2.13.1-1) ...
✓ ripgrep installed
→ Installing to /home/mslinn/.hermes/hermes-agent...
→ Trying SSH clone...
✓ Cloned via SSH
✓ Repository ready
→ Creating virtual environment with Python 3.11...
Using CPython 3.11.14
Creating virtual environment at: venv
Activate with: source venv/bin/activate
✓ Virtual environment ready (Python 3.11)
→ Installing dependencies...
✓ Main package installed
✓ All dependencies installed
→ Installing Node.js dependencies (browser tools)...
✅ Browser tools ready. Run: python run_agent.py --help
✓ Node.js dependencies installed
→ Installing browser engine (Playwright Chromium)...
→ Playwright may request sudo to install browser system dependencies (shared libraries).
→ This is standard Playwright setup — Hermes itself does not require root access.
Installing dependencies...
Switching to root user to install dependencies...
Hit:1 http://archive.ubuntu.com/ubuntu questing-updates InRelease
Hit:2 http://archive.ubuntu.com/ubuntu questing-backports InRelease
Hit:3 http://archive.ubuntu.com/ubuntu questing InRelease
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64 InRelease
Hit:5 https://apt.llvm.org/questing llvm-toolchain-questing-22 InRelease
Hit:6 http://security.ubuntu.com/ubuntu questing-security InRelease
Hit:7 https://ppa.launchpadcontent.net/longsleep/golang-backports/ubuntu questing InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libasound2t64 is already the newest version (1.2.14-1ubuntu1.1).
libasound2t64 set to manually installed.
libatk-bridge2.0-0t64 is already the newest version (2.57.1-1).
libatk-bridge2.0-0t64 set to manually installed.
libatk1.0-0t64 is already the newest version (2.57.1-1).
libatk1.0-0t64 set to manually installed.
libatspi2.0-0t64 is already the newest version (2.57.1-1).
libatspi2.0-0t64 set to manually installed.
libcairo2 is already the newest version (1.18.4-1build1).
libcairo2 set to manually installed.
libcups2t64 is already the newest version (2.4.12-0ubuntu3.5).
libcups2t64 set to manually installed.
libdbus-1-3 is already the newest version (1.16.2-2ubuntu2).
libdbus-1-3 set to manually installed.
libdrm2 is already the newest version (2.4.125-1ubuntu0.1).
libdrm2 set to manually installed.
libgbm1 is already the newest version (25.2.8-0ubuntu0.25.10.1).
libgbm1 set to manually installed.
libglib2.0-0t64 is already the newest version (2.86.0-2ubuntu0.3).
libglib2.0-0t64 set to manually installed.
libnspr4 is already the newest version (2:4.36-1ubuntu2).
libnspr4 set to manually installed.
libnss3 is already the newest version (2:3.114-1ubuntu0.1).
libnss3 set to manually installed.
libpango-1.0-0 is already the newest version (1.56.3-1build1).
libpango-1.0-0 set to manually installed.
libx11-6 is already the newest version (2:1.8.12-1build1).
libx11-6 set to manually installed.
libxcb1 is already the newest version (1.17.0-2build1).
libxcb1 set to manually installed.
libxcomposite1 is already the newest version (1:0.4.6-1).
libxcomposite1 set to manually installed.
libxdamage1 is already the newest version (1:1.1.6-1build1).
libxdamage1 set to manually installed.
libxext6 is already the newest version (2:1.3.4-1build2).
libxext6 set to manually installed.
libxfixes3 is already the newest version (1:6.0.0-2build1).
libxfixes3 set to manually installed.
libxkbcommon0 is already the newest version (1.7.0-2.1).
libxkbcommon0 set to manually installed.
libxrandr2 is already the newest version (2:1.5.4-1).
libxrandr2 set to manually installed.
xvfb is already the newest version (2:21.1.18-1ubuntu1.1).
fonts-noto-color-emoji is already the newest version (2.048-1).
fonts-noto-color-emoji set to manually installed.
libfontconfig1 is already the newest version (2.15.0-2.3ubuntu1).
libfontconfig1 set to manually installed.
libfreetype6 is already the newest version (2.13.3+dfsg-1ubuntu0.1).
libfreetype6 set to manually installed.
xfonts-scalable is already the newest version (1:1.0.3-1.3).
xfonts-scalable set to manually installed.
fonts-liberation is already the newest version (1:2.1.5-3).
fonts-liberation set to manually installed.
fonts-freefont-ttf is already the newest version (20211204+svn4273-2).
fonts-freefont-ttf set to manually installed.
Solving dependencies... Done
Recommended packages:
  fonts-ipafont-mincho fonts-tlwg-loma
The following NEW packages will be installed:
  fonts-ipafont-gothic fonts-tlwg-loma-otf fonts-unifont fonts-wqy-zenhei xfonts-cyrillic
0 upgraded, 5 newly installed, 0 to remove and 23 not upgraded.
Need to get 14.8 MB of archives.
After this operation, 64.0 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-ipafont-gothic all 00303-23ubuntu1 [3,703 kB]
Get:2 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-tlwg-loma-otf all 1:0.7.3-1 [107 kB]
Get:3 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-unifont all 1:16.0.04-1 [3,169 kB]
Get:4 http://archive.ubuntu.com/ubuntu questing/universe amd64 fonts-wqy-zenhei all 0.9.45-8 [7,472 kB]
Get:5 http://archive.ubuntu.com/ubuntu questing/universe amd64 xfonts-cyrillic all 1:1.0.5+nmu1 [384 kB]
Fetched 14.8 MB in 1s (23.0 MB/s)
Selecting previously unselected package fonts-ipafont-gothic.
(Reading database ... 534483 files and directories currently installed.)
Preparing to unpack .../fonts-ipafont-gothic_00303-23ubuntu1_all.deb ...
Unpacking fonts-ipafont-gothic (00303-23ubuntu1) ...
Selecting previously unselected package fonts-tlwg-loma-otf.
Preparing to unpack .../fonts-tlwg-loma-otf_1%3a0.7.3-1_all.deb ...
Unpacking fonts-tlwg-loma-otf (1:0.7.3-1) ...
Selecting previously unselected package fonts-unifont.
Preparing to unpack .../fonts-unifont_1%3a16.0.04-1_all.deb ...
Unpacking fonts-unifont (1:16.0.04-1) ...
Selecting previously unselected package fonts-wqy-zenhei.
Preparing to unpack .../fonts-wqy-zenhei_0.9.45-8_all.deb ...
Unpacking fonts-wqy-zenhei (0.9.45-8) ...
Selecting previously unselected package xfonts-cyrillic.
Preparing to unpack .../xfonts-cyrillic_1%3a1.0.5+nmu1_all.deb ...
Unpacking xfonts-cyrillic (1:1.0.5+nmu1) ...
Setting up fonts-wqy-zenhei (0.9.45-8) ...
Setting up fonts-tlwg-loma-otf (1:0.7.3-1) ...
Setting up fonts-ipafont-gothic (00303-23ubuntu1) ...
update-alternatives: using /usr/share/fonts/opentype/ipafont-gothic/ipag.ttf to provide /usr/share/fonts/truetype/fonts-japanese-gothic.ttf (fonts-japanese-gothic.ttf) in auto mode
Setting up xfonts-cyrillic (1:1.0.5+nmu1) ...
Setting up fonts-unifont (1:16.0.04-1) ...
Processing triggers for fontconfig (2.15.0-2.3ubuntu1) ...
BEWARE: your OS is not officially supported by Playwright; downloading fallback build for ubuntu24.04-x64. Downloading Chrome for Testing 147.0.7727.15 (playwright chromium v1217) from https://cdn.playwright.dev/builds/cft/147.0.7727.15/linux64/chrome-linux64.zip Chrome for Testing 147.0.7727.15 (playwright chromium v1217) downloaded to /home/mslinn/.cache/ms-playwright/chromium-1217 BEWARE: your OS is not officially supported by Playwright; downloading fallback build for ubuntu24.04-x64. Downloading FFmpeg (playwright ffmpeg v1011) from https://cdn.playwright.dev/dbazure/download/playwright/builds/ffmpeg/1011/ffmpeg-linux.zip FFmpeg (playwright ffmpeg v1011) downloaded to /home/mslinn/.cache/ms-playwright/ffmpeg-1011 BEWARE: your OS is not officially supported by Playwright; downloading fallback build for ubuntu24.04-x64. Downloading Chrome Headless Shell 147.0.7727.15 (playwright chromium-headless-shell v1217) from https://cdn.playwright.dev/builds/cft/147.0.7727.15/linux64/chrome-headless-shell-linux64.zip Chrome Headless Shell 147.0.7727.15 (playwright chromium-headless-shell v1217) downloaded to /home/mslinn/.cache/ms-playwright/chromium_headless_shell-1217 ✓ Browser engine setup complete → Installing TUI dependencies... ⠏⠉ ⠉⢹ ██╗ ██╗███╗ ██╗██╗ ██████╗ ██████╗ ██████╗ ███████╗ ██║ ██║████╗ ██║██║██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██║ ██║██╔██╗ ██║██║██║ ██║ ██║██║ ██║█████╗ ██║ ██║██║╚██╗██║██║██║ ██║ ██║██║ ██║██╔══╝ ╚██████╔╝██║ ╚████║██║╚██████╗╚██████╔╝██████╔╝███████╗ ╚═════╝ ╚═╝ ╚═══╝╚═╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝ BRAILLE ANIMATIONS ⠧⠀ braille ⠀⠀⢸⡇ scan ⠂⠌⡠⠐ rain ⣀⠀ orbit ⢾⣉⡷⠀ pulse ⠊⡰⡡⡘ sparkle ⡑⠀ breathe ⠀⢀⡴⠋ cascade ⠙⠢⣄⣠ waverows ⠀⠛ snake ⣿⡇⠀⠀ columns ⣉⡱⣉⡱ helix ⣿⣿ fillsweep ⠓⠓⠓⠀ scanline ⠠⠐⠈⠁ braillewave ⣿⡿ diagswipe ⡪⡪⡪⠀ checkerboard ⠉⠙⠚⠒ dna npx unicode-animations demo all spinners npx unicode-animations --list list all spinners npx unicode-animations --web open in browser ⣇⣀ ⣀⣸ ✓ TUI dependencies installed → Setting up hermes command... 
✓ Symlinked hermes → ~/.local/bin/hermes → ~/.local/bin already on PATH ✓ hermes command ready → Setting up configuration files... ✓ Created ~/.hermes/.env from template ✓ Created ~/.hermes/config.yaml from template ✓ Created ~/.hermes/SOUL.md (edit to customize personality) ✓ Configuration directory ready: ~/.hermes/ → Syncing bundled skills to ~/.hermes/skills/ ... Syncing bundled skills into ~/.hermes/skills/ ... + claude-code + codex + opencode + hermes-agent + jupyter-live-kernel + evaluating-llms-harness + weights-and-biases + audiocraft-audio-generation + segment-anything-model + dspy + axolotl + unsloth + fine-tuning-with-trl + huggingface-hub + obliteratus + serving-llms-vllm + outlines + llama-cpp + minecraft-modpack-server + pokemon-player + xurl + godmode + kanban-worker + webhook-subscriptions + kanban-orchestrator + openhue + himalaya + github-repo-management + github-pr-workflow + github-auth + github-code-review + codebase-inspection + github-issues + yuanbao + native-mcp + research-paper-writing + llm-wiki + arxiv + blogwatcher + polymarket + obsidian + findmy + apple-reminders + apple-notes + imessage + humanizer + claude-design + architecture-diagram + ascii-video + touchdesigner-mcp + pretext + manim-video + design-md + comfyui + baoyu-comic + ascii-art + sketch + ideation + p5js + popular-web-designs + songwriting-and-ai-music + baoyu-infographic + pixel-art + excalidraw + writing-plans + hermes-agent-skill-authoring + systematic-debugging + python-debugpy + subagent-driven-development + debugging-hermes-tui-commands + test-driven-development + node-inspect-debugger + requesting-code-review + plan + spike + dogfood + heartmula + youtube-content + songsee + spotify + gif-search + ocr-and-documents + notion + maps + powerpoint + google-workspace + linear + airtable + nano-pdf Done: 89 new, 0 updated, 0 unchanged. 89 total bundled. 
✓ Skills synced to ~/.hermes/skills/ → Skipping setup wizard (--skip-setup) ┌─────────────────────────────────────────────────────────┐ │ ✓ Installation Complete! │ └─────────────────────────────────────────────────────────┘ 📁 Your files: Config: /home/mslinn/.hermes/config.yaml API Keys: /home/mslinn/.hermes/.env Data: /home/mslinn/.hermes/cron/, sessions/, logs/ Code: /home/mslinn/.hermes/hermes-agent ───────────────────────────────────────────────────────── 🚀 Commands: hermes Start chatting hermes setup Configure API keys & settings hermes config View/edit configuration hermes config edit Open config in editor hermes gateway install Install gateway service (messaging + cron) hermes update Update to latest version ───────────────────────────────────────────────────────── ⚡ Reload your shell to use 'hermes' command: source ~/.bashrc Hermes installed successfully This will modify your Hermes Agent configuration: /home/mslinn/.hermes/config.yaml Backups will be saved to /tmp/ollama-backups/ Hermes can message you on Telegram, Discord, Slack, and more. ██╗ ██╗███████╗██████╗ ███╗ ███╗███████╗███████╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗ ██║ ██║██╔════╝██╔══██╗████╗ ████║██╔════╝██╔════╝ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝ ███████║█████╗ ██████╔╝██╔████╔██║█████╗ ███████╗█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║ ██╔══██║██╔══╝ ██╔══██╗██║╚██╔╝██║██╔══╝ ╚════██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║ ██║ ██║███████╗██║ ██║██║ ╚═╝ ██║███████╗███████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚══════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝ ╭────────────────────────────────────── Hermes Agent v0.12.0 (2026.4.30) · upstream bbbce926 ──────────────────────────────────────╮ │ Available Tools │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⡀⠀⣀⣀⠀⢀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ browser: browser_back, browser_click, ... 
│ │ ⠀⠀⠀⠀⠀⠀⢀⣠⣴⣾⣿⣿⣇⠸⣿⣿⠇⣸⣿⣿⣷⣦⣄⡀⠀⠀⠀⠀⠀⠀ browser-cdp: browser_cdp, browser_dialog │ │ ⠀⢀⣠⣴⣶⠿⠋⣩⡿⣿⡿⠻⣿⡇⢠⡄⢸⣿⠟⢿⣿⢿⣍⠙⠿⣶⣦⣄⡀⠀ clarify: clarify │ │ ⠀⠀⠉⠉⠁⠶⠟⠋⠀⠉⠀⢀⣈⣁⡈⢁⣈⣁⡀⠀⠉⠀⠙⠻⠶⠈⠉⠉⠀⠀ code_execution: execute_code │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣴⣿⡿⠛⢁⡈⠛⢿⣿⣦⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ cronjob: cronjob │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠿⣿⣦⣤⣈⠁⢠⣴⣿⠿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ delegation: delegate_task │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠻⢿⣿⣦⡉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ discord: discord │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢷⣦⣈⠛⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ discord_admin: discord_admin │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⣴⠦⠈⠙⠿⣦⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ (and 17 more toolsets...) │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⣿⣤⡈⠁⢤⣿⠇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠛⠷⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ Available Skills │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠑⢶⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ autonomous-ai-agents: claude-code, codex, hermes-agent, opencode │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⠁⢰⡆⠈⡿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ creative: architecture-diagram, ascii-art, ascii-video, b... │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⠈⣡⠞⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ data-science: jupyter-live-kernel │ │ ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ devops: kanban-orchestrator, kanban-worker, webhook-sub... │ │ email: himalaya │ │ qwen3.6:35b · Nous Research gaming: minecraft-modpack-server, pokemon-player │ │ /mnt/_/www/www.mslinn.com general: dogfood, yuanbao │ │ Session: 20260430_181150_be04df github: codebase-inspection, github-auth, github-code-r... │ │ mcp: native-mcp │ │ media: gif-search, heartmula, songsee, spotify, youtub... │ │ mlops: audiocraft-audio-generation, axolotl, dspy, eva... │ │ note-taking: obsidian │ │ productivity: airtable, google-workspace, linear, maps, nano-... │ │ red-teaming: godmode │ │ research: arxiv, blogwatcher, llm-wiki, polymarket, resea... │ │ smart-home: openhue │ │ social-media: xurl │ │ software-development: debugging-hermes-tui-commands, hermes-agent-ski... │ │ │ │ 28 tools · 85 skills · /help for commands │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ Welcome to Hermes Agent! Type your message or /help for commands. ✦ Tip: hermes mcp serve runs Hermes itself as an MCP server for other agents. 
⚠ tirith security scanner enabled but not available — command scanning will use pattern matching only 💾 curator: auto: no changes; llm: skipped (no candidates) ⚕ qwen3.6:35b │ ctx -- │ [░░░░░░░░░░] -- │ 3s │ ⏲ 0s ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ❯ ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

When I asked my usual question, some superfluous characters appeared. Later I realized that whoever or whatever wrote this code thought that visual noise makes for a better user experience.

Hermes-Agent Session (continued)
────────────────────────────────────────
● what os am i running
Initializing agent...

────────────────────────────────────────
ヽ(>∀<☆)☆ reasoning...

It took 3 minutes and 5 seconds for Hermes to decide to run the following strange code:

Hermes-influenced QWEN3
uname -a && cat /etc/os-release 2>/dev/null || echo "Not Linux or /etc/os-release missing"


python
import platform
print("System:", platform.system())
print("Release:", platform.release())

No output was ever shown.

Life is short. LLMs are moving fast. I decided to use Pi because it worked.

OpenClaw

OpenClaw is early-stage agentic technology for personal assistants.

Running With Scissors

I am not comfortable with the idea of running OpenClaw on any of my computers, or any VM that can authenticate on my behalf. You have been warned!

You can use OpenClaw with the Ollama CLI client and the model of your choice.

Invoking Ollama with OpenClaw does not add the model to the Ollama registry.

You should update your installed version of Node.js before proceeding further.

Shell
$ nvm install node
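
The OpenClaw installer output shown later in this section reports that the system Node 20.19.4 is below the required Node 22.16+ (a later run on this page reports 22.14+). The following is a minimal sketch of checking that requirement before launching; the function names are hypothetical, and the threshold is simply copied from that log rather than from OpenClaw's documentation.

```python
import re

REQUIRED_NODE = (22, 16)  # minimum reported by the OpenClaw log in this article


def parse_node_version(text: str) -> tuple:
    """Turn `node --version` output such as 'v20.19.4' into (20, 19, 4)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {text!r}")
    return tuple(int(part) for part in match.groups())


def node_is_new_enough(text: str, required: tuple = REQUIRED_NODE) -> bool:
    """Compare version tuples numerically rather than as strings."""
    return parse_node_version(text)[: len(required)] >= required
```

Comparing tuples of integers matters here: as strings, "22.9" would sort after "22.16", which is the wrong answer.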

Here is an example of how to use OpenClaw with MiniMax M2.7 Cloud:

Shell
$ ollama launch openclaw --model minimax-m2.7:cloud
Installing OpenClaw...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 539 packages in 1m
OpenClaw installed successfully

Launching OpenClaw with minimax-m2.7:cloud...
Security
OpenClaw can read files and run actions when tools are enabled. A bad prompt can trick it into doing unsafe things.
Learn more: https://docs.openclaw.ai/gateway/security
I understand the risks. Continue?
Yes No
Setting up OpenClaw with Ollama...
Model: minimax-m2.7:cloud
🦞 OpenClaw 2026.3.13 (61d171a) — Give me a workspace and I'll give you fewer tabs, fewer toggles, and more oxygen.
Default Ollama model: minimax-m2.7:cloud
Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 533707073495c347426fa957f78981a4f45bc038571ff141285f3846365a1d2c -> 0d8b0f65f3e6fea408d7af2a08f62b9e3be00d4fefb1ea9304f7be3d90a9dc2f, backup=/home/mslinn/.openclaw/openclaw.json.bak)
Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write)
Updated ~/.openclaw/openclaw.json
Workspace OK: ~/.openclaw/workspace
Sessions OK: ~/.openclaw/agents/main/sessions
System Node 20.19.4 at /usr/bin/node is below the required Node 22.16+. Using /home/mslinn/.nvm/versions/node/v25.8.1/bin/node for the daemon. Install Node 24 (recommended) or Node 22 LTS from nodejs.org or Homebrew.
Installed systemd service: /home/mslinn/.config/systemd/user/openclaw-gateway.service
Enabled systemd lingering for mslinn
Agents: main (default)
Heartbeat interval: 30m (main)
Session store (main): /home/mslinn/.openclaw/agents/main/sessions/sessions.json (0 entries)
Tip: run `openclaw configure --section web` to store your Brave API key for web_search.
Docs: https://docs.openclaw.ai/tools/web
✓ Installed web search plugin
Starting your assistant — this may take a moment...
Starting gateway...
✓ OpenClaw is running
Open the Web UI: http://localhost:18789/#token=1794b10a3af3ffca97b27a1273e79d36b94a2bd18494fa55
Quick start:
  /help    see all commands
  openclaw configure --section channels    connect WhatsApp, Telegram, etc.
  openclaw skills    browse and install skills
The OpenClaw gateway is running in the background.
Stop it with: openclaw gateway stop

🦞 OpenClaw 2026.3.13 (61d171a) — I don't just autocomplete—I auto-commit (emotionally), then ask you to review (logically)
openclaw tui - ws://127.0.0.1:18789 - agent main - session main
session agent:main:main

Wake up, my friend!

Hey! Good morning. ☀️
Looks like I'm fresh out of the box — no memories, no name, nothing yet. Just woke up.
So... who are you? And what should I call myself? I'm open to suggestions. 😄
connected | idle
agent main | session main (openclaw-tui) | ollama/minimax-m2.7:cloud | tokens 11k/128k (9%)

When you are finished working with OpenClaw, type CTRL+D to end the session, then type the following to terminate the OpenClaw background process:

Shell
$ openclaw gateway stop
🦞 OpenClaw 2026.3.13 (61d171a) — I can run local, remote, or purely on vibes—results may vary with DNS.
Stopped systemd service: openclaw-gateway.service

Pi

Ollama automatically installed Pi the first time I tried to use it, and it gave me the correct results.

Shell
$ ollama launch pi --model qwen3.6:35b

Installing Pi...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 256 packages in 7s
Pi installed successfully
Preparing Pi...
Checking Pi installation...
Checking Pi web search package...
Installing @ollama/pi-web-search...
Installing npm:@ollama/pi-web-search...
added 1 package in 801ms
Installed npm:@ollama/pi-web-search
✓ Installed @ollama/pi-web-search
Launching Pi...
fd not found. Downloading...
ripgrep not found. Downloading...
ripgrep installed to /home/mslinn/.pi/agent/bin/rg
fd installed to /home/mslinn/.pi/agent/bin/fd
pi v0.70.6
escape interrupt · ctrl+c/ctrl+d clear/exit · / commands · ! bash · ctrl+o more
Press ctrl+o to show full startup help and loaded resources.
Pi can explain its own features and look up its docs. Ask it how to use or extend Pi.
[Skills] microsoft-foundry
[Extensions] @ollama/pi-web-search
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/mnt/_/www/www.mslinn.com (master)
0.0%/262k (auto) qwen3.6:35b • medium

Basic system state interrogation worked immediately:

Pi / QWEN3.6:35b session (continued)
what os am i running
The user wants to know what operating system they are running. I can use the uname -a command to get system information on a Unix-like system. I can also check /etc/os-release if we are on Linux.
Let's run uname -a to see the system type and then maybe cat /etc/os-release or lsb_release -a for more details if it's Linux. Actually, uname -a generally tells us if it's Linux, macOS, BSD, etc.
I'll start with uname -a.


$ uname -a
Linux gojira 6.17.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 13 12:04:44 UTC 2026 x86_64 GNU/Linux
Took 0.0s

The user is running Ubuntu Linux (indicated by Linux gojira 6.17.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC...). I can also check lsb_release -a or /etc/os-release to get the exact version, but "Ubuntu" is usually sufficient. I'll provide the uname -a output details.
You are running Ubuntu Linux.
Here are the details from your system:
- OS: Linux (specifically Ubuntu, as the kernel was built with Ubuntu patches)
- Kernel: 6.17.0-22-generic
- Architecture: x86_64 (64-bit)
- Hostname: gojira
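
In the session above, the model inferred the distribution from the kernel string and mentioned /etc/os-release as the place to find the exact version. As a rough sketch of that second step, the hypothetical helper below parses text in the os-release KEY=value style. It is deliberately simplified: it strips surrounding quotes but does not handle shell escapes.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the style of /etc/os-release into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: only removes surrounding quotes, ignores escapes.
        fields[key] = value.strip().strip("\"'")
    return fields
```

On a Linux machine you could feed it the contents of /etc/os-release and read the PRETTY_NAME key. On Python 3.10 and later, the standard library's platform.freedesktop_os_release() does the same job.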

I would like to see a timer that shows how long prompts take to complete. Most other harnesses have this feature.

I then typed /settings hide thinking and repeated the previous prompt. The result was the same, except that because I could not see the tools run by the harness on behalf of the model, I had to wait for the entire process to finish. This took a lot longer than noticing the result of the first tool that ran and interrupting the command.

Models

Ollama models can be installed locally or run from the cloud. I wrote an article that attempts to track the most popular local models for coding. Only some of those models are Ollama-compatible.

Ollama loads models on demand; by default, a model is automatically unloaded after 5 minutes without queries. Because models are loaded on demand, you do not have to restart the ollama service after installing a new model or removing an existing one.
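
The five-minute default can be overridden per request. The following is a minimal sketch, assuming the Ollama server listens on its default address of http://localhost:11434 and that the keep_alive field of the /api/generate endpoint behaves as the Ollama API documentation describes. The helper names are hypothetical, and nothing is sent until you call send yourself.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default server address


def build_generate_request(model, prompt, keep_alive="5m"):
    """Build (but do not send) a non-streaming generate request.

    keep_alive is a duration string such as "30m"; the number 0 asks the
    server to unload the model as soon as the response has been produced.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

For example, send(build_generate_request("deepseek-r1:8b", "hello", keep_alive="30m")) would keep that model resident for half an hour after the reply. A server-wide default can also be set with the OLLAMA_KEEP_ALIVE environment variable.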

Certain cloud-based LLMs are Ollama compatible, so you need not worry about their model formats.

Inspecting a Model

To view the parameters of a registered model, use the ollama show command:

Shell
$ ollama show deepseek-r1:8b
Model
    architecture        qwen3
    parameters          8.2B
    context length      131072
    embedding length    4096
    quantization        Q4_K_M
Capabilities
    completion
    thinking
Parameters
    stop           "<|begin▁of▁sentence|>"
    stop           "<|end▁of▁sentence|>"
    stop           "<|User|>"
    stop           "<|Assistant|>"
    temperature    0.6
    top_p          0.95
License
    MIT License
    Copyright (c) 2023 DeepSeek
    ...

Much less information is shown for cloud models than for local models.

You can filter the output to just display the quantization:

Shell
$ ollama show deepseek-r1:8b | grep quantization
quantization Q4_K_M 
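
When the value is needed inside a program rather than on the terminal, the same filtering can be done in Python. This is a sketch only: the sample text is copied from the ollama show listing earlier in this section, the function name is hypothetical, and the parser assumes that keys and values are separated by at least two spaces, as they are in that listing.

```python
import re

SAMPLE_SHOW_OUTPUT = """\
Model
    architecture        qwen3
    parameters          8.2B
    context length      131072
    embedding length    4096
    quantization        Q4_K_M
"""


def parse_show_fields(show_output: str) -> dict:
    """Map each 'key    value' row of `ollama show` output to a dict entry."""
    fields = {}
    for line in show_output.splitlines():
        # Keys may contain single spaces ("context length"), so split on
        # the run of two or more spaces that separates key from value.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
    return fields
```

Section titles such as Model have no value column and are skipped. Repeated keys, such as the several stop rows under Parameters, would overwrite each other, so this sketch is only suitable for the Model section.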

Local Models

By default, local Ollama models are downloaded into these directories:

  • Linux: /usr/share/ollama/.ollama/models
  • macOS: ~/.ollama/models
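
These defaults can be overridden by setting the OLLAMA_MODELS environment variable for the Ollama server process. The sketch below resolves the directory a script should look in; the function name is hypothetical, and the fallback used for platforms not listed above is an assumption.

```python
import os
import sys
from pathlib import Path


def default_models_dir(platform: str = sys.platform, env=None) -> Path:
    """Return the directory where Ollama is expected to store models."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA_MODELS")
    if override:
        # An explicit setting always wins over the platform default.
        return Path(override)
    if platform.startswith("linux"):
        # Location used when Ollama runs as a system service on Linux.
        return Path("/usr/share/ollama/.ollama/models")
    # macOS default; also used here as a guess for any other platform.
    return Path.home() / ".ollama" / "models"
```

Note that the variable must be visible to the server, not just to your shell; on Linux that usually means setting it in the service definition.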

The Ollama library has many models available for download.

After you have downloaded a model using ollama pull or ollama run, the model is added to the local Ollama registry. The ollama list command shows you the registered Ollama models.
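
The registry can also be read programmatically: the Ollama server exposes the same information as ollama list through its /api/tags endpoint. The sketch below extracts the model names from such a response. The sample body is abridged and hypothetical (a real reply carries more fields per model), and the function name is an invention for illustration.

```python
import json

# Abridged, hypothetical response body; model names borrowed from this article.
SAMPLE_TAGS_RESPONSE = '{"models": [{"name": "qwen3.6:35b"}, {"name": "deepseek-r1:8b"}]}'


def registered_model_names(tags_response: str) -> list:
    """Return the model names found in an /api/tags response body, sorted."""
    models = json.loads(tags_response).get("models", [])
    return sorted(entry["name"] for entry in models)
```

Assuming the server runs at its default address, the response body can be fetched from http://localhost:11434/api/tags with any HTTP client and passed to this function.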

Invoking Ollama with OpenClaw does not add the model to the Ollama registry.

Cloud Models

Cloud models are easier to set up than local models, and they can be used with any computer because the cloud processing is not performed on the local machine.

After running a cloud model using ollama run or installing via ollama pull, the model is added to the local Ollama registry. The ollama list command shows you the registered Ollama models.

Ollama can run models in its traditional chat mode, or through agentic harnesses such as Claude Code, Hermes Agent, and OpenClaw. I show the full output for each mode only once, using DeepSeek as the example. You can use similar commands to run all other Ollama-compatible models.

Cloud models require you to sign in with an Ollama ID first:

Shell
$ ollama signin

Usage on the Ollama free plan resets every 3 hours as well as weekly. View your usage here.

Deepseek-v4-pro

DeepSeek-V4-Pro is the flagship model in the DeepSeek-V4 series: a Mixture-of-Experts model with 1.6T total parameters and 49B activated, built for frontier-level reasoning across a 1M-token context window.

Via Chat Mode

To run deepseek-v4-pro in the cloud via chat mode, type:

Shell
$ ollama run deepseek-v4-pro:cloud
Connecting to 'deepseek-v4-pro:cloud' on 'ollama.com' ⚡
>>> Send a message (/? for help)

$ ollama show deepseek-v4-pro:cloud
Model
    architecture        deepseek4
    parameters          158000000000
    context length      1048576
    embedding length    4096
    quantization        FP8
Capabilities
    completion
    tools
    thinking

With Claude Code

To use DeepSeek-V4-Pro with Claude Code, run:

Shell
$ ollama launch claude --model deepseek-v4-pro:cloud

With OpenClaw

For use with OpenClaw, type the following. OpenClaw will be installed if it is not already present. My first attempt failed, so I had to retry, and even then it looked like some fiddling would be required:

Shell
$ ollama launch openclaw --model deepseek-v4-pro:cloud
Installing OpenClaw...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead

added 434 packages in 1m
OpenClaw installed successfully

This will modify your OpenClaw configuration:
/home/mslinn/.openclaw/openclaw.json
Backups will be saved to /tmp/ollama-backups/


Your assistant can message you on WhatsApp, Telegram, Discord, and more.


Starting your assistant — this may take a moment...

  Warning: daemon restart failed: exit status 1
  Warning: gateway did not come back after restart
Starting gateway...
Error: gateway did not start on localhost:18789 

$ ollama launch openclaw --model deepseek-v4-pro:cloud
Security
OpenClaw can read files and run actions when tools are enabled. A bad prompt can trick it into doing unsafe things.
Learn more: https://docs.openclaw.ai/gateway/security
Updating OpenClaw...
│
◇ ✓ Updating via package manager (32.63s)
│
◇ ✓ Running doctor checks (30.24s)
Update Result: OK
Root: /home/mslinn/.nvm/versions/node/v25.8.2/lib/node_modules/openclaw
Before: 2026.4.26
After: 2026.4.26
Total time: 65.28s
Updating plugins...
Downloading @ollama/openclaw-web-search…
Extracting /tmp/openclaw-npm-pack-1dYItP/ollama-openclaw-web-search-0.2.2.tgz…
Installing to /home/mslinn/.openclaw/extensions/openclaw-web-search…
Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 85b674df87782e7e73fb5cfde53c5036a7fdf32eb8f34aa47b386e426cce2ff0 -> 757aba9cf42030c2a71750a5cebfb7b9cdbcedff3cdcb37179fb4c502d925822, backup=/home/mslinn/.openclaw/openclaw.json.bak)
Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write)
Config auto-restored from backup: /home/mslinn/.openclaw/openclaw.json (size-drop-vs-last-good:7398->820, gateway-mode-missing-vs-last-good)
Config observe anomaly: /home/mslinn/.openclaw/openclaw.json (size-drop-vs-last-good:7398->715, missing-meta-vs-last-good, gateway-mode-missing-vs-last-good)
npm plugins: 0 updated, 1 unchanged.
Completion cache update failed: Error: spawnSync /home/mslinn/.nvm/versions/node/v25.8.2/bin/node ETIMEDOUT
Restarting service...
Gateway did not become healthy after restart.
Gateway version mismatch: expected 2026.4.26, running gateway reported unavailable.
Service runtime: status=running, state=active, pid=1625547, lastExit=0
Gateway port 18789 status: free.
Restart log: /home/mslinn/.openclaw/logs/gateway-restart.log
Run `openclaw gateway status --deep` for details.
Setting up OpenClaw with Ollama...
Model: deepseek-v4-pro:cloud
🦞 OpenClaw 2026.4.26 (be8c246) — Your config is valid, your assumptions are not.
Default Ollama model: deepseek-v4-pro:cloud
Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 85b674df87782e7e73fb5cfde53c5036a7fdf32eb8f34aa47b386e426cce2ff0 -> d81f9a372cb185254e060580c65b6f0619ef60b70d8086d2fc189442b3eb7452, backup=/home/mslinn/.openclaw/openclaw.json.bak)
Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write)
Updated ~/.openclaw/openclaw.json
Workspace OK: ~/.openclaw/workspace
Sessions OK: ~/.openclaw/agents/main/sessions
System Node 20.19.5 at /usr/bin/node is below the required Node 22.14+. Using /home/mslinn/.nvm/versions/node/v25.8.2/bin/node for the daemon. Install Node 24 (recommended) or Node 22 LTS from nodejs.org or Homebrew.
Installed systemd service: /home/mslinn/.config/systemd/user/openclaw-gateway.service
Previous unit backed up to: /home/mslinn/.config/systemd/user/openclaw-gateway.service.bak
Tip: run `openclaw configure --section web` to store your Brave API key for web_search.
Docs: https://docs.openclaw.ai/tools/web
Your assistant can message you on WhatsApp, Telegram, Discord, and more.
Connect a channel (messaging app) now?
Yes Set up later
Starting your assistant — this may take a moment...
Starting gateway...
✓ OpenClaw is running
Open the Web UI: http://localhost:18789/#token=fd77d96ff87a06b5d521b81bfffd152a3ebfa41d17c223f3
Quick start:
  /help    see all commands
  openclaw skills    browse and install skills
The OpenClaw gateway is running in the background.
Stop it with: openclaw gateway stop
🦞 OpenClaw 2026.4.26 (be8c246) — One CLI to rule them all, and one more restart because you changed the port.
openclaw tui - ws://127.0.0.1:18789 - agent main - session main
connecting | idle
gateway disconnected: closed | idle

With Hermes Agent

For use with Hermes Agent, type the following. Hermes Agent will be installed if it is not already present:

Shell
$ ollama launch hermes --model deepseek-v4-pro:cloud
Installing Hermes...


┌─────────────────────────────────────────────────────────┐
│             ⚕ Hermes Agent Installer                    │
├─────────────────────────────────────────────────────────┤
│  An open source AI agent by Nous Research.              │
└─────────────────────────────────────────────────────────┘

✓ Detected: linux (ubuntu)
→ Checking for uv package manager...
✓ uv found (uv 0.8.15)
→ Checking Python 3.11...
→ Python 3.11 not found, installing via uv...
Installed Python 3.11.13 in 1.23s
 + cpython-3.11.13-linux-x86_64-gnu (python3.11)
✓ Python installed: Python 3.11.13
→ Checking Git...
✓ Git 2.51.0 found
→ Checking Node.js (for browser tools)...
✓ Node.js v25.8.2 found
→ Checking ripgrep (fast file search)...
→ Checking ffmpeg (TTS voice messages)...
✓ ffmpeg 7.1.1-1ubuntu4.2 found
→ Installing ripgrep...
Installing:
  ripgrep

Summary:
  Upgrading: 0, Installing: 1, Removing: 0, Not Upgrading: 32
  Download size: 1521 kB
  Space needed: 5492 kB / 47.9 GB available

Get:1 http://archive.ubuntu.com/ubuntu questing/universe amd64 ripgrep amd64 14.1.1-1 [1521 kB]
Fetched 1521 kB in 0s (8189 kB/s)
Selecting previously unselected package ripgrep.
(Reading database ... 487883 files and directories currently installed.)
Preparing to unpack .../ripgrep_14.1.1-1_amd64.deb ...
Unpacking ripgrep (14.1.1-1) ...
Setting up ripgrep (14.1.1-1) ...
Processing triggers for man-db (2.13.1-1) ...
Scanning processes...
Scanning candidates...

Restarting services...

Service restarts being deferred:
 systemctl restart NetworkManager.service
 /etc/needrestart/restart.d/dbus.service
 systemctl restart gdm.service
 systemctl restart systemd-logind.service
 systemctl restart unattended-upgrades.service
 systemctl restart wpa_supplicant.service

No containers need to be restarted.

User sessions running outdated binaries:
 mslinn @ user manager: (sd-pam)[1364]
 mslinn @ user service: at-spi-dbus-bus.service[46449,46475], dbus.service[1490,46429], filter-chain.service[1494],
  gnome-keyring-daemon.service[46430], mpris-proxy.service[1538], pipewire-pulse.service[1542], pipewire.service[1492],
  wireplumber.service[1540]

No VM guests are running outdated hypervisor (qemu) binaries on this host.
✓ ripgrep installed
→ Installing to /home/mslinn/.hermes/hermes-agent...
→ Trying SSH clone...
✓ Cloned via SSH
✓ Repository ready
→ Creating virtual environment with Python 3.11...
warning: Failed to parse `pyproject.toml` during settings discovery:
  TOML parse error at line 172, column 17
      |
  172 | exclude-newer = "7 days"
      |                 ^^^^^^^^
  failed to parse year in date "7 days": failed to parse "7 da" as year (a four digit integer): invalid digit, expected 0-9 but got

Using CPython 3.11.13
Creating virtual environment at: venv
Activate with: source venv/bin/activate
✓ Virtual environment ready (Python 3.11)
→ Installing dependencies...
✓ Main package installed
✓ All dependencies installed
→ Installing Node.js dependencies (browser tools)...
✅ Browser tools ready. Run: python run_agent.py --help
✓ Node.js dependencies installed
→ Installing browser engine (Playwright Chromium)...
→ Playwright may request sudo to install browser system dependencies (shared libraries).
→ This is standard Playwright setup — Hermes itself does not require root access.
Installing dependencies...
Switching to root user to install dependencies...
Hit:1 http://archive.ubuntu.com/ubuntu questing InRelease
Hit:2 http://archive.ubuntu.com/ubuntu questing-updates InRelease
Hit:3 http://security.ubuntu.com/ubuntu questing-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu questing-backports InRelease
Ign:5 https://apt.kitware.com/ubuntu questing InRelease
Err:6 https://apt.kitware.com/ubuntu questing Release
  404  Not Found [IP: 66.194.253.25 443]
Hit:7 https://ppa.launchpadcontent.net/longsleep/golang-backports/ubuntu questing InRelease
Reading package lists... Done
Failed to install browsers
Error: Installation process exited with code: 100
⚠ Playwright browser installation failed — browser tools will not work.
⚠ Try running manually: cd /home/mslinn/.hermes/hermes-agent && npx playwright install --with-deps chromium
✓ Browser engine setup complete
→ Installing TUI dependencies...

  ⠏⠉                                                         ⠉⢹

    ██╗   ██╗███╗   ██╗██╗ ██████╗ ██████╗ ██████╗ ███████╗
    ██║   ██║████╗  ██║██║██╔════╝██╔═══██╗██╔══██╗██╔════╝
    ██║   ██║██╔██╗ ██║██║██║     ██║   ██║██║  ██║█████╗
    ██║   ██║██║╚██╗██║██║██║     ██║   ██║██║  ██║██╔══╝
    ╚██████╔╝██║ ╚████║██║╚██████╗╚██████╔╝██████╔╝███████╗
     ╚═════╝ ╚═╝  ╚═══╝╚═╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝
    BRAILLE ANIMATIONS

    ⠧⠀ braille        ⠀⠀⢸⡇ scan           ⠂⠌⡠⠐ rain
    ⣀⠀ orbit          ⢾⣉⡷⠀ pulse          ⠊⡰⡡⡘ sparkle
    ⡑⠀ breathe        ⠀⢀⡴⠋ cascade        ⠙⠢⣄⣠ waverows
    ⠀⠛ snake          ⣿⡇⠀⠀ columns        ⣉⡱⣉⡱ helix
    ⣿⣿ fillsweep      ⠓⠓⠓⠀ scanline       ⠠⠐⠈⠁ braillewave
    ⣿⡿ diagswipe      ⡪⡪⡪⠀ checkerboard   ⠉⠙⠚⠒ dna

    npx unicode-animations                  demo all spinners
    npx unicode-animations --list           list all spinners
    npx unicode-animations --web              open in browser

  ⣇⣀                                                         ⣀⣸

✓ TUI dependencies installed
→ Setting up hermes command...
✓ Symlinked hermes → ~/.local/bin/hermes
→ ~/.local/bin already on PATH
✓ hermes command ready
→ Setting up configuration files...
✓ Created ~/.hermes/.env from template
✓ Created ~/.hermes/config.yaml from template
✓ Created ~/.hermes/SOUL.md (edit to customize personality)
✓ Configuration directory ready: ~/.hermes/
→ Syncing bundled skills to ~/.hermes/skills/ ...
Syncing bundled skills into ~/.hermes/skills/ ...
  + pokemon-player
  + minecraft-modpack-server
  + dogfood
  + llm-wiki
  + arxiv
  + polymarket
  + research-paper-writing
  + blogwatcher
  + github-issues
  + github-repo-management
  + github-code-review
  + github-pr-workflow
  + codebase-inspection
  + github-auth
  + xurl
  + himalaya
  + hermes-agent-skill-authoring
  + systematic-debugging
  + python-debugpy
  + node-inspect-debugger
  + writing-plans
  + plan
  + test-driven-development
  + requesting-code-review
  + subagent-driven-development
  + debugging-hermes-tui-commands
  + webhook-subscriptions
  + powerpoint
  + linear
  + notion
  + airtable
  + ocr-and-documents
  + maps
  + nano-pdf
  + google-workspace
  + opencode
  + hermes-agent
  + codex
  + claude-code
  + dspy
  + obliteratus
  + outlines
  + serving-llms-vllm
  + llama-cpp
  + axolotl
  + unsloth
  + fine-tuning-with-trl
  + huggingface-hub
  + audiocraft-audio-generation
  + segment-anything-model
  + weights-and-biases
  + evaluating-llms-harness
  + yuanbao
  + imessage
  + apple-reminders
  + apple-notes
  + findmy
  + claude-design
  + pixel-art
  + baoyu-infographic
  + ascii-art
  + humanizer
  + design-md
  + manim-video
  + baoyu-comic
  + touchdesigner-mcp
  + p5js
  + ascii-video
  + popular-web-designs
  + songwriting-and-ai-music
  + architecture-diagram
  + ideation
  + excalidraw
  + youtube-content
  + spotify
  + songsee
  + gif-search
  + heartmula
  + openhue
  + obsidian
  + godmode
  + jupyter-live-kernel
  + native-mcp

Done: 83 new, 0 updated, 0 unchanged. 83 total bundled.
✓ Skills synced to ~/.hermes/skills/
→ Skipping setup wizard (--skip-setup)


┌─────────────────────────────────────────────────────────┐
│              ✓ Installation Complete!                   │
└─────────────────────────────────────────────────────────┘


📁 Your files:

   Config:    /home/mslinn/.hermes/config.yaml
   API Keys:  /home/mslinn/.hermes/.env
   Data:      /home/mslinn/.hermes/cron/, sessions/, logs/
   Code:      /home/mslinn/.hermes/hermes-agent

─────────────────────────────────────────────────────────

🚀 Commands:

   hermes              Start chatting
   hermes setup        Configure API keys & settings
   hermes config       View/edit configuration
   hermes config edit  Open config in editor
   hermes gateway install Install gateway service (messaging + cron)
   hermes update       Update to latest version

─────────────────────────────────────────────────────────

⚡ Reload your shell to use 'hermes' command:

   source ~/.bashrc

Hermes installed successfully

This will modify your Hermes Agent configuration:
  /home/mslinn/.hermes/config.yaml
Backups will be saved to /tmp/ollama-backups/

Hermes can message you on Telegram, Discord, Slack, and more. 

GPT-OSS

Released in August 2025, GPT-OSS is a family of open-weight reasoning models developed by OpenAI in collaboration with partners, bringing reasoning and tool-use capabilities to self-hosted and edge environments. These models are released under the Apache 2.0 license, allowing for broad access and customization.

gpt-oss-120b requires an NVIDIA H100 or A100, which is why many users run it from the cloud.

Shell
$ ollama run gpt-oss:120b-cloud

Other commands are:

Shell
$ ollama launch claude   --model gpt-oss:120b-cloud
$ ollama launch codex    --model gpt-oss:120b-cloud
$ ollama launch droid    --model gpt-oss:120b-cloud
$ ollama launch hermes   --model gpt-oss:120b-cloud
$ ollama launch openclaw --model gpt-oss:120b-cloud
$ ollama launch opencode --model gpt-oss:120b-cloud
$ ollama launch pi       --model gpt-oss:120b-cloud

Its smaller sibling, gpt-oss-20b, is designed for edge devices and consumer hardware with 16 GB of memory, such as the NVIDIA 3060. However, it can also be run from the cloud if you want to do that for some reason:

Shell
$ ollama run gpt-oss:20b-cloud

Other commands are:

Shell
$ ollama launch claude   --model gpt-oss:20b-cloud
$ ollama launch codex    --model gpt-oss:20b-cloud
$ ollama launch droid    --model gpt-oss:20b-cloud
$ ollama launch hermes   --model gpt-oss:20b-cloud
$ ollama launch openclaw --model gpt-oss:20b-cloud
$ ollama launch opencode --model gpt-oss:20b-cloud
$ ollama launch pi       --model gpt-oss:20b-cloud

Minimax-m2.7

See MiniMax-M2 and Mini-Agent Review and MiniMax M2.7.

Shell
$ ollama run minimax-m2.7:cloud
Connecting to 'minimax-m2.7:cloud' on 'ollama.com' ⚡
>>> Send a message (/? for help) 

Other commands are:

Shell
$ ollama launch claude   --model minimax-m2.7:cloud
$ ollama launch codex    --model minimax-m2.7:cloud
$ ollama launch droid    --model minimax-m2.7:cloud
$ ollama launch hermes   --model minimax-m2.7:cloud
$ ollama launch openclaw --model minimax-m2.7:cloud
$ ollama launch opencode --model minimax-m2.7:cloud
$ ollama launch pi       --model minimax-m2.7:cloud

Shell
$ ollama show minimax-m2.7:cloud
  Model
    architecture        minimax-m2
    parameters          0
    context length      204800
    embedding length    3072
    quantization

  Capabilities
    completion
    tools
    thinking 

Nemotron 3 Super

Mar 11, 2026

The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging.

This model tackles the “context explosion” with a native 1M-token context window that gives agents long-term memory for aligned, high-accuracy reasoning. The model is fully open with open weights, datasets, and recipes so developers can easily customize, optimize, and deploy it on their own infrastructure.

Shell
$ ollama run nemotron-3-super:cloud
Connecting to 'nemotron-3-super:cloud' on 'ollama.com' ⚡
>>>
Use Ctrl + d or /bye to exit.
>>> CTRL+D
$ ollama list
NAME                                 ID              SIZE      MODIFIED
nemotron-3-super:cloud               be3943c5a818    -         6 seconds ago 

Other commands are:

Shell
$ ollama launch claude   --model nemotron-3-super:cloud
$ ollama launch codex    --model nemotron-3-super:cloud
$ ollama launch droid    --model nemotron-3-super:cloud
$ ollama launch hermes   --model nemotron-3-super:cloud
$ ollama launch openclaw --model nemotron-3-super:cloud
$ ollama launch opencode --model nemotron-3-super:cloud
$ ollama launch pi       --model nemotron-3-super:cloud

QWEN 3.x

QWEN 3.6 was released in April 2026, but the Ollama cloud models had not been updated as of 2026-05-01.

qwen3-coder:480b-cloud is a specialized heavyweight model specifically tuned for autonomous coding agents.

Shell
$ ollama launch claude --model qwen3-coder:480b-cloud
╭─── Claude Code v2.1.121 ─────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                    │ Tips for getting started                                                    │
│                 Welcome back Mike!                 │ Run /init to create a CLAUDE.md file with instructions for Claude           │
│                                                    │ ─────────────────────────────────────────────────────────────────────────── │
│                       ▐▛███▜▌                     │ What's new                                                                  │
│                      ▝▜█████▛▘                   │ Added `alwaysLoad` option to MCP server config — when `true`, all tools fr… │
│                        ▘▘ ▝▝                     │ Added `claude plugin prune` to remove orphaned auto-installed plugin depen… │
│    qwen3-coder:480b-cloud · API Usage Billing ·    │ Added a type-to-filter search box to `/skills` so you can find a skill in … │
│    mslinn@mslinn.com's Organization                │ /release-notes for more                                                     │
│          /mnt/f/sites/intranet.mslinn.com          │                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯ 
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  mslinn@Bear:intranet.mslinn.com [qwen3-coder:480b-cloud] 🌿 master; 0 edits; ; 0 tokens (+0) 

qwen3.5:cloud is a smarter generalist with better conversational ability.

Shell
$ ollama launch claude --model qwen3.5:cloud

Other commands are:

Shell
$ ollama launch claude   --model qwen3.5:cloud
$ ollama launch codex    --model qwen3.5:cloud
$ ollama launch droid    --model qwen3.5:cloud
$ ollama launch hermes   --model qwen3.5:cloud
$ ollama launch openclaw --model qwen3.5:cloud
$ ollama launch opencode --model qwen3.5:cloud
$ ollama launch pi       --model qwen3.5:cloud

Local Models

The default quantization for Ollama models is Q4 (4-bit), which is faster and smaller but can be much less accurate than Q8 (8-bit). Install Q8 versions if your hardware can hold them.
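A rough size estimate shows why this choice is tied to hardware. The Ruby sketch below assumes that weights take about parameters × bits per weight ÷ 8 bytes; `weight_gb` is a hypothetical helper, and real model files mix precisions and carry metadata, so actual downloads will differ somewhat.

```ruby
# Rough size of the weights alone, in GB (1 GB = 10^9 bytes).
# Assumption: size is about parameters x bits per weight / 8.
def weight_gb(params_billions, bits_per_weight)
  params_billions * bits_per_weight / 8.0
end

[4, 8].each do |bits|
  puts format('8B parameters at %d-bit: about %.1f GB', bits, weight_gb(8, bits))
end
```

Doubling the bits per weight doubles the memory needed for the weights, so a Q8 model is only practical when the larger file still fits in your GPU memory.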

Installation

To install or update a model without running it, type ollama pull, followed by the name of the model.

You can install and run any LLAMA-compatible model by typing ollama run, followed by the name of the model.

To list the models registered on your computer, use the ollama list command:

Shell
$ ollama list
NAME                                 ID              SIZE      MODIFIED
deepseek-v4-pro:cloud                22bfd5026abd    -         55 minutes ago
nemotron-3-super:cloud               be3943c5a818    -         3 weeks ago
gpt-oss:120b-cloud                   569662207105    -         3 weeks ago
minimax-m2.7:cloud                   06daa293c105    -         3 weeks ago
qwen3-coder:480b-cloud               e30e45586389    -         3 weeks ago
minimax-m2.5:cloud                   c0d5751c800f    -         2 months ago
test:latest                          3b86bc070971    4.9 GB    4 months ago
fluffy/l3-8b-stheno-v3.2:latest      f1afe09480f3    4.9 GB    5 months ago
bjoernb/qwen3-coder-30b-1m:latest    fb3efb7f8d40    18 GB     5 months ago
deepseek-coder-v2:lite               63fb193b3a9b    8.9 GB    5 months ago
nomic-embed-text:latest              0a109f422b47    274 MB    5 months ago
sllm/glm-z1-9b:latest                4b2d0d3f65a6    8.6 GB    5 months ago
mistral-small3.2:latest              5a408ab55df5    15 GB     7 months ago
deepseek-r1:8b                       6995872bfe4c    5.2 GB    7 months ago
deepseek-r1:7b                       755ced02ce7b    4.7 GB    7 months ago
llama3:8b                            365c0bd3c000    4.7 GB    7 months ago
llama2-uncensored:70b                bdd0ec2f5ec5    38 GB     2 years ago 
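
The SIZE column separates cloud models, which are listed with a dash, from models stored locally. As a rough sketch, the Ruby below parses a few rows copied from the listing above and totals the disk space used by the local ones. The helper names are hypothetical, and the parsing assumes columns are separated by two or more spaces, which holds for the output shown but is not a documented format.

```ruby
# A few rows copied from the `ollama list` output above.
SAMPLE = <<~LIST
  NAME                                 ID              SIZE      MODIFIED
  gpt-oss:120b-cloud                   569662207105    -         3 weeks ago
  nomic-embed-text:latest              0a109f422b47    274 MB    5 months ago
  deepseek-r1:8b                       6995872bfe4c    5.2 GB    7 months ago
LIST

# Multipliers that convert each unit to GB.
UNITS = { 'KB' => 1e-6, 'MB' => 1e-3, 'GB' => 1.0, 'TB' => 1e3 }.freeze

# Turn the table into an array of hashes, skipping the header row.
def parse_list(text)
  text.lines.map(&:strip).reject(&:empty?).drop(1).map do |line|
    name, id, size, modified = line.split(/\s{2,}/)
    { name: name, id: id, size: size, modified: modified }
  end
end

# Convert a SIZE cell such as "274 MB" to GB; a dash counts as zero.
def size_gb(size)
  return 0.0 if size == '-'

  number, unit = size.split
  number.to_f * UNITS.fetch(unit)
end

models = parse_list(SAMPLE)
local = models.reject { |m| m[:size] == '-' }
total = local.sum { |m| size_gb(m[:size]) }
puts format('%d local models use about %.1f GB of disk; %d cloud models use none',
            local.size, total, models.size - local.size)
```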

DeepSeek

Shell
$ ollama pull deepseek-r1:8b
pulling manifest
pulling e6a7edc1a4d7: 100% ▕████████████████████████████ ▏ 5.2 GB/5.2 GB 63 MB/s 0s
pulling c5ad996bda6e: 100% ▕████████████████████████████▏ 556 B
pulling 6e4c38e1172f: 100% ▕████████████████████████████▏ 1.1 KB
pulling ed8474dc73db: 100% ▕████████████████████████████▏ 179 B
pulling f64cd5418e4b: 100% ▕████████████████████████████▏ 487 B
verifying sha256 digest
writing manifest
success 

You can also download and run in one step by typing:

Shell
$ ollama run deepseek-r1:8b

fluffy/l3-8b-stheno-v3.2

fluffy/l3-8b-stheno-v3.2 is a small, uncensored model that will even run, albeit slowly, on a laptop without a powerful video card.

Shell
$ ollama run fluffy/l3-8b-stheno-v3.2

llama2-uncensored

The uncensored Llama2 70B model requires a powerful machine with lots of GPU RAM.

Shell
$ ollama pull llama2-uncensored:70b
pulling manifest
pulling abca3de387b6... 100% ▕███████████████████████████▏  38 GB
pulling 9224016baa40... 100% ▕███████████████████████████▏ 7.0 KB
pulling 1195ea171610... 100% ▕███████████████████████████▏ 4.8 KB
pulling 28577ba2177f... 100% ▕███████████████████████████▏   55 B
pulling ddaa351c1f3d... 100% ▕███████████████████████████▏   51 B
pulling 9256cd2888b0... 100% ▕███████████████████████████▏  530 B
verifying sha256 digest
writing manifest
removing any unused layers
success 

QWEN 3.x

For coding, you want a balance of reasoning depth and speed. The best models for coding with the NVIDIA 3060, which has 12GB VRAM, and the NVIDIA 4090, which has 24GB VRAM, as of late April 2026 are shown below.

qwen3.6:35b

This is a Mixture-of-Experts (MoE) model. Even though it has 35B total parameters, because it is an MoE, only about 10% of all parameters are active at any given moment.

This model is fast, especially on an NVIDIA 4090. An NVIDIA 3060 can provide high tokens-per-second while maintaining the intelligence of a much larger model. Because the Q4_K_M quant is about 22GB, it will spill over into system RAM, but because it is an MoE, the performance hit should be minimal.
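
The figures in the two paragraphs above can be checked with simple arithmetic. The Ruby sketch below uses only numbers stated in this article; `spill_gb` is a hypothetical helper, and it deliberately ignores the KV cache and other runtime overhead, so the real amount pushed into system RAM will be somewhat higher.

```ruby
# Back-of-the-envelope figures for qwen3.6:35b, taken from the text above.
TOTAL_PARAMS_B  = 35.0 # total parameters, in billions
ACTIVE_FRACTION = 0.10 # "about 10%" of parameters are active per token
MODEL_GB        = 22.0 # approximate size of the Q4_K_M quant

# GB of weights that do not fit in VRAM and must live in system RAM.
# Assumption: ignores the KV cache and other runtime overhead.
def spill_gb(model_gb, vram_gb)
  [model_gb - vram_gb, 0.0].max
end

puts format('Active parameters per token: about %.1fB', TOTAL_PARAMS_B * ACTIVE_FRACTION)
{ 'NVIDIA 3060 (12 GB)' => 12.0, 'NVIDIA 4090 (24 GB)' => 24.0 }.each do |gpu, vram|
  puts format('%s: about %.0f GB spills into system RAM', gpu, spill_gb(MODEL_GB, vram))
end
```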

Shell
$ ollama run qwen3.6:35b
pulling manifest
pulling f5ee307a2982: 100% ▕█████████████████████████▏  23 GB
pulling 5f3a3c817e78: 100% ▕█████████████████████████▏  11 KB
pulling 86eff881e8d2: 100% ▕█████████████████████████▏   94 B
pulling 5d1c86a949f7: 100% ▕█████████████████████████▏  462 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
>>> /show info
  Model
    architecture        qwen35moe
    parameters          36.0B
    context length      262144
    embedding length    2048
    quantization        Q4_K_M

  Capabilities
    completion
    vision
    tools
    thinking

  Parameters
    min_p               0
    presence_penalty    1.5
    repeat_penalty      1
    temperature         1
    top_k               20
    top_p               0.95

  License
    Apache License
    Version 2.0, January 2004
    ...

I noticed that architecture above is shown as qwen35moe, not qwen36moe as expected.

Other commands are:

Shell
$ ollama launch claude   --model qwen3.6:35b
$ ollama launch codex    --model qwen3.6:35b
$ ollama launch droid    --model qwen3.6:35b
$ ollama launch hermes   --model qwen3.6:35b
$ ollama launch openclaw --model qwen3.6:35b
$ ollama launch opencode --model qwen3.6:35b
$ ollama launch pi       --model qwen3.6:35b

qwen3.6:27b

This model often outperforms the qwen3.6:35b version on complex logic and multi-file refactoring.

This is a dense model, meaning it uses all its parameters for every token. It is arguably the most capable coding model that can realistically run on consumer hardware.

The qwen3.6:27b provides fast performance on a GPU with 24GB VRAM, for example the NVIDIA 4090. To fit this on the NVIDIA 3060 with room for code context, use a 4-bit quantization.

Shell
$ ollama run qwen3.6:27b
pulling manifest
pulling 83c54730a5fe: 100% ▕██████████████████████████▏  17 GB
pulling 5f3a3c817e78: 100% ▕██████████████████████████▏  11 KB
pulling 86eff881e8d2: 100% ▕██████████████████████████▏   94 B
pulling 728c795c7762: 100% ▕██████████████████████████▏  456 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
>>> /show info
  Model
    architecture        qwen35
    parameters          27.8B
    context length      262144
    embedding length    5120
    quantization        Q4_K_M

  Capabilities
    completion
    vision
    tools
    thinking

  Parameters
    top_k               20
    top_p               0.95
    min_p               0
    presence_penalty    1.5
    repeat_penalty      1
    temperature         1

  License
    Apache License
    Version 2.0, January 2004
    ...

I noticed that architecture above is shown as qwen35, not qwen36 as expected.

Other commands are:

Shell
$ ollama launch claude   --model qwen3.6:27b
$ ollama launch codex    --model qwen3.6:27b
$ ollama launch droid    --model qwen3.6:27b
$ ollama launch hermes   --model qwen3.6:27b
$ ollama launch openclaw --model qwen3.6:27b
$ ollama launch opencode --model qwen3.6:27b
$ ollama launch pi       --model qwen3.6:27b

Running Queries

Ollama queries can be run in several ways:

REST API

I used curl to query the Ollama REST API from the command line, then I used jq and fold to process the response. The -s option for curl prevents the progress meter from cluttering up the screen, and the jq filter removes everything from the response except the desired text. The fold command wraps the text response to a width of 72 characters.

Shell
$ curl -s http://localhost:11434/api/generate -d '{
  "model":  "llama2:70b",
  "prompt": "Why is there air?",
  "stream": false
}' | jq -r .response | fold -w 72 -s
Air, or more specifically oxygen, is essential for life as we know it.
It exists because of the delicate balance of chemical reactions in
Earth’s atmosphere, which has allowed complex organisms like
ourselves to evolve.
But if you’re asking about air in a broader sense, it serves many
functions: it helps maintain a stable climate, protects living things
from harmful solar radiation, and provides buoyancy for various forms
of life, such as fish or birds.
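
The same request can be made from a program. The following Ruby sketch uses only the standard library and mirrors the pipeline above: `build_request` plays the role of curl, `extract_response` the role of jq, and `wrap` the role of fold. All four function names are hypothetical, and the 600-second read timeout is an assumption chosen because large models can take a long time to answer.

```ruby
require 'json'
require 'net/http'
require 'uri'

OLLAMA_URL = URI('http://localhost:11434/api/generate')

# Build the same POST request that the curl command sends.
def build_request(model, prompt)
  request = Net::HTTP::Post.new(OLLAMA_URL, 'Content-Type' => 'application/json')
  request.body = JSON.generate({ model: model, prompt: prompt, stream: false })
  request
end

# Equivalent of `jq -r .response`: keep only the generated text.
def extract_response(json_text)
  JSON.parse(json_text).fetch('response')
end

# Rough equivalent of `fold -w 72 -s`: break lines at whitespace.
def wrap(text, width = 72)
  text.gsub(/(.{1,#{width}})(\s+|\z)/, "\\1\n")
end

# Send the prompt to a running Ollama server and return the wrapped reply.
def ask(model, prompt)
  reply = Net::HTTP.start(OLLAMA_URL.hostname, OLLAMA_URL.port, read_timeout: 600) do |http|
    http.request(build_request(model, prompt))
  end
  wrap(extract_response(reply.body))
end

# Example (needs a running server with the model installed):
#   puts ask('llama2:70b', 'Why is there air?')
```

Keeping the request building and response parsing in separate functions means they can be exercised without a server; only `ask` touches the network.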

Go Binding

The official Go language bindings can be added to a Go project as follows (additional Go libraries exist):

Shell
$ mkdir /tmp/blah
$ cd /tmp/blah
$ go mod init github.com/mslinn/demo
go: creating new go.mod: module github.com/mslinn/demo
$ go get github.com/ollama/ollama/api
go: downloading golang.org/x/sys v0.37.0
go: added github.com/bahlo/generic-list-go v0.2.0
go: added github.com/buger/jsonparser v1.1.1
go: added github.com/google/uuid v1.6.0
go: added github.com/mailru/easyjson v0.7.7
go: added github.com/ollama/ollama v0.18.2
go: added github.com/wk8/go-ordered-map/v2 v2.1.8
go: added golang.org/x/crypto v0.43.0
go: added golang.org/x/sys v0.37.0
go: added gopkg.in/yaml.v3 v3.0.1

Ruby Binding

I wrote this Ruby method to describe images.

Ruby
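require 'base64'
require 'ollama-ai' # assumed: the ollama-ai gem, whose API matches the calls below

# Sends an image to an Ollama model and prints the model's description of it.
# Expects @address, @model, @temperature, and @timeout to be set by the enclosing class.
def describe_image(image_filename)
  @client = Ollama.new(
    credentials: { address: @address },
    options: {
      server_sent_events: true, # stream the reply as a series of events
      temperature: @temperature,
      connection: { request: { timeout: @timeout, read_timeout: @timeout } },
    }
  )
  result = @client.generate(
    {
      model: @model,
      prompt: 'Please describe this image.',
      # Images are sent as Base64 strings; binread avoids newline translation on Windows
      images: [Base64.strict_encode64(File.binread(image_filename))],
    }
  )
  # Each streamed event carries a fragment of the reply; join them into one string
  puts result.map { |x| x['response'] }.join
end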
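
For readers who prefer not to depend on a gem, the two data-handling steps in describe_image can be sketched with the standard library alone. The Ruby below shows how an image might be packed into the JSON body for the /api/generate endpoint, and how streamed events are joined into one reply. `image_payload` and `join_events` are hypothetical names, and the sample events are made up for illustration rather than real model output.

```ruby
require 'base64'
require 'json'

# Build the JSON body for POST /api/generate with one attached image.
# File.binread avoids newline translation of binary data on Windows.
def image_payload(model, image_filename, prompt = 'Please describe this image.')
  JSON.generate(
    {
      model: model,
      prompt: prompt,
      images: [Base64.strict_encode64(File.binread(image_filename))],
      stream: false,
    }
  )
end

# Join streamed events into one string, as describe_image does.
def join_events(events)
  events.map { |event| event['response'] }.join
end

# The events below are invented for this sketch, not real model output.
sample_events = [{ 'response' => 'A man ' }, { 'response' => 'in a suit.' }]
puts join_events(sample_events)
```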

The results with the llama2:70b model were ridiculous - an example of the famous hallucinations that LLMs entertain their audience with. As the public becomes enculturated with these hallucinations, we may come to prefer them over human comedians. Certainly there will be a lot of material for the human comedians to fight back with. For example, when describing a photo of me:

Shell
$ ollama pull llama2:70b
$ describe -m llama2:70b /mnt/c/bestPhotoOfMike.png
This is an image of a vibrant and colorful sunrise over the ocean, with the sun peeking above the horizon, casting warm, golden hues over the sky and water below. The sunlight reflects off the rippled surface of the water, creating shimmering patterns that contrast with the tranquil darkness of the receding waters. In the foreground, a solitary figure is silhouetted against the rising sun, perhaps lost in thought or finding inspiration in the breathtaking beauty of the scene.

The llava model is supposed to be good at describing images, so I installed it and tried again, with excellent results:

Shell
$ ollama pull llava:13b
$ describe -m llava:13b /mnt/c/bestPhotoOfMike.png
The image features a smiling man wearing glasses and dressed in a suit and tie. He has a well-groomed appearance. The man's attire includes a jacket, dress shirt, and a patterned tie that complements his professional outfit. The setting appears to be a studio environment, as there is a background behind the man that has an evenly lit texture. The man's smile conveys confidence and approachability, making him appear knowledgeable in his field or simply happy to pose for this photograph.

You can try the latest LLaVA model online.

Ollama and Claude CLI

I wrote a review of Claude CLI. It can be used as a harness to run Ollama-compatible models, whether they are local or in the cloud. Documentation is here.

Local Models

llama2

My Windows workstation has 64 GB RAM, a 13th generation Intel i7 and a modest NVIDIA 3060. I decided to try the biggest Llama 2 model to see what might happen. I downloaded and executed the Llama 2 70B model with the following incantation. An NVIDIA 4090 would have been a better video card for this Ollama model, and it would still have been slow.

Shell
$ ollama run llama2:70b
pulling manifest
pulling 68bbe6dc9cf4... 100% ▕██████████████████████████▏  38 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕██████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████▏   59 B
pulling fa304d675061... 100% ▕██████████████████████████▏   91 B
pulling 7c96b46dca6c... 100% ▕██████████████████████████▏  558 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help) 

I played around to learn what the available messages were. For more information, see Tutorial: Set Session System Message in Ollama CLI by Ingrid Stevens.

Ollama messages (continued)
>>> /?
Available Commands:
  /set          Set session variables
  /show         Show model information
  /bye          Exit
  /?, /help     Help for a command
  /? shortcuts  Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> Send a message (/? for help)
>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template

>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama2:70b

FROM /usr/share/ollama/.ollama/models/blobs/sha256:68bbe6dc9cf42eb60c9a7f96137fb8d472f752de6ebf53e9942f267f1a1e2577
TEMPLATE """[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER stop "<<SYS>>"
>>> /show system
No system message was specified for this model.
>>> /show template
[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]
>>> /bye

USER: and ASSISTANT: are helpful when writing a request for the model to reply to.
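
The TEMPLATE reported by /show modelfile above determines the exact text the model receives. As a minimal sketch, the Ruby below fills in that template for one system message and one prompt. Ollama evaluates templates with Go's text/template package; the plain string substitution here is a simplification that only handles the two placeholders in this particular template, and `render_prompt` is a hypothetical name.

```ruby
# The TEMPLATE from the Modelfile shown above.
TEMPLATE = "[INST] <<SYS>>{{ .System }}<</SYS>>\n\n{{ .Prompt }} [/INST]\n"

# Simplification: plain substitution instead of Go's text/template.
# The block form of gsub inserts the text literally, backslashes included.
def render_prompt(system, prompt, template = TEMPLATE)
  template.gsub('{{ .System }}') { system }.gsub('{{ .Prompt }}') { prompt }
end

puts render_prompt('You are a terse assistant.', 'Why is there air?')
```

The PARAMETER stop lines in the Modelfile list the same markers, so generation halts if the model starts to emit one of them instead of continuing into a new turn.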

QWEN 3.5

The 9B model is the default when running Ollama locally. It fits comfortably on a 12GB GPU like an RTX 3060, and supports text, image input, thinking, and tool calling. 4b, 2b, and 0.8b models are also available. To run the default 9B model locally, type:

Shell
$ ollama launch claude --model qwen3.5

NVIDIA Nemotron 3 Super

You can use a free Ollama account to run the NVIDIA Nemotron 3 Super model in the cloud under the control (or lack of control) of OpenClaw.

Shell
$ ollama launch openclaw --model nemotron-3-super:cloud

Installing OpenClaw...
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead
added 539 packages in 18s
OpenClaw installed successfully
To use nemotron-3-super:cloud, please sign in.
Navigate to: https://ollama.com/connect?name=Bear&key=c3NoLWVkMjU1MTkgQUFBQUMzTnphQzFsWkRJMU5URTVBQUFBSU9odVJTM0FMdVMvUGZid3M0STJHVUdJekFyTlJpL1J3MmtVR210ZmlXaUY
⠸ Waiting for sign in to complete...
Launching OpenClaw with nemotron-3-super:cloud...
Security
OpenClaw can read files and run actions when tools are enabled. A bad prompt can trick it into doing unsafe things.
Learn more: https://docs.openclaw.ai/gateway/security
I understand the risks. Continue?
Setting up OpenClaw with Ollama...
Model: nemotron-3-super:cloud
🦞 OpenClaw 2026.3.13 (61d171a) — Half butler, half debugger, full crustacean.
Default Ollama model: nemotron-3-super:cloud
Config overwrite: /home/mslinn/.openclaw/openclaw.json (sha256 85b674df87782e7e73fb5cfde53c5036a7fdf32eb8f34aa47b386e426cce2ff0 -> 2da265b895dc0f25e207bdda2d7183df4e8972ea0ff30bda366eb6cde757d0f8, backup=/home/mslinn/.openclaw/openclaw.json.bak)
Config write anomaly: /home/mslinn/.openclaw/openclaw.json (missing-meta-before-write)
Updated ~/.openclaw/openclaw.json
Workspace OK: ~/.openclaw/workspace
Sessions OK: ~/.openclaw/agents/main/sessions
System Node 20.19.5 at /usr/bin/node is below the required Node 22.16+. Using /home/mslinn/.nvm/versions/node/v25.8.1/bin/node for the daemon. Install Node 24 (recommended) or Node 22 LTS from nodejs.org or Homebrew.
Installed systemd service: /home/mslinn/.config/systemd/user/openclaw-gateway.service
Enabled systemd lingering for mslinn.
│
◇ Agents: main (default)
Heartbeat interval: 30m (main)
Session store (main): /home/mslinn/.openclaw/agents/main/sessions/sessions.json (0 entries)
Tip: run `openclaw configure --section web` to store your Brave API key for web_search.
Docs: https://docs.openclaw.ai/tools/web
✓ Installed web search plugin
Starting your assistant — this may take a moment...
Starting gateway...
✓ OpenClaw is running
Open the Web UI: http://localhost:18789/#token=1032095f6293e02b513eff40fca59bb947096c70b2906eee
Quick start:
  /help                                    see all commands
  openclaw configure --section channels    connect WhatsApp, Telegram, etc.
  openclaw skills                          browse and install skills
The OpenClaw gateway is running in the background. Stop it with: openclaw gateway stop

🦞 OpenClaw 2026.3.13 (61d171a) — Somewhere between 'hello world' and 'oh god what have I built.'
openclaw tui - ws://127.0.0.1:18789 - agent main - session main
session agent:main:main

Wake up, my friend!

I need to check my daily memory file first, as per the startup routine.
Let me read the memory file for today's date (2026-03-20):
I'll create the memory directory and today's memory file since it doesn't exist yet:
Now let me check yesterday's memory file (2026-03-19) for recent context:
Let me also check if there's a MEMORY.md file for long-term memory (only load in main session):
Now let me read the core files to understand who I am and who I'm helping:
⠴ running • 12s | connected
agent main | session main | unknown | tokens ?/128k

Background Agents

Ollama can launch non-interactive agents using the new --yes flag. This enables running OpenClaw and other agents in the background without interruption on servers, scripts, and other environments that do not support the OpenClaw gateway.

OpenClaw and other background agents hand control of your computer to autonomous LLMs, which are widely known to be unreliable and prone to extremely destructive acts.

To launch a non-interactive OpenClaw agent with the minimax-m2.7:cloud model, run:

Shell
$ ollama launch openclaw \
    --model minimax-m2.7:cloud \
    --yes -- agent \
    --agent main \
    --local \
    --message "Prepare a pre-read for my next meeting"
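
When the launch needs to happen from a script, the command above can be started programmatically. The Ruby sketch below builds the same argument list and starts it as a detached process. Every flag is copied from the command above; `launch_args`, `launch_in_background`, and the log file name are hypothetical choices for this sketch, and the warning about autonomous agents applies just as much here.

```ruby
require 'shellwords'

# Build the argument list for the non-interactive launch shown above.
# All flags are copied from that command; nothing here is new.
def launch_args(model, message)
  ['ollama', 'launch', 'openclaw',
   '--model', model,
   '--yes', '--', 'agent',
   '--agent', 'main',
   '--local',
   '--message', message]
end

# Start the agent in the background and send its output to a log file.
# log_path is a hypothetical location chosen for this sketch.
def launch_in_background(model, message, log_path = 'openclaw-agent.log')
  pid = Process.spawn(*launch_args(model, message), out: log_path, err: %i[child out])
  Process.detach(pid)
  pid
end

# Print the command for review instead of running it.
puts Shellwords.join(launch_args('minimax-m2.7:cloud', 'Prepare a pre-read for my next meeting'))
```

Passing the arguments as an array rather than one string means no shell is involved, so the message text cannot be misread as extra options or shell syntax.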

"Prepare a pre-read" means creating a document or set of materials for attendees to review before a meeting starts. The goal is to provide necessary background and context so you can skip the catch-up phase and dive straight into productive discussion or decision-making during the actual meeting.

Recording a Session

See Recording Chat Transcripts to obtain the record script and to learn various ways of viewing the transcript.

The following shows how to use record to launch Ollama and run the qwen3:4b model.

Shell
$ record -c 'ollama run qwen3:4b'
Press Ctrl+D to end the chat and stop recording.
Script started, output log file is '2025-12-12_20-06-39_chat.log'.
>>> /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template
>>> ^D # Exit the ollama session
Script done. Recording finished. Log saved to /home/mslinn/2025-12-12_20-06-39_chat.log

Documentation
