Large Language Models with Ollama

Published 2024-01-14.
Time to read: 4 minutes.

This page is part of the llm collection.

I've been playing with large language models (LLMs) online and locally. Local models are not as powerful or as fast as OpenAI's hosted models, but you have complete control over them, without censorship.

Ollama is a way to run LLMs locally. It is an open-source tool, built in Go, for running and packaging generative machine-learning models. Ollama can be used programmatically, via a chat CLI, via a web interface, and via its REST interface.

Here are some example open-source models that can be downloaded:

Model                Parameters  Size    Download Command
Llama 2              7B          3.8 GB  ollama run llama2
Mistral              7B          4.1 GB  ollama run mistral
Dolphin Phi          2.7B        1.6 GB  ollama run dolphin-phi
Phi-2                2.7B        1.7 GB  ollama run phi
Neural Chat          7B          4.1 GB  ollama run neural-chat
Starling             7B          4.1 GB  ollama run starling-lm
Code Llama           7B          3.8 GB  ollama run codellama
Llama 2 Uncensored   7B          3.8 GB  ollama run llama2-uncensored
Llama 2 13B          13B         7.3 GB  ollama run llama2:13b
Llama 2 70B          70B         39 GB   ollama run llama2:70b
Orca Mini            3B          1.9 GB  ollama run orca-mini
Vicuna               7B          3.8 GB  ollama run vicuna
LLaVA                7B          4.5 GB  ollama run llava

Each model has unique attributes. Some are designed for describing images, while others are designed for generating code, chatting, or other special purposes.

The 70B-parameter model really strains the computer, and it takes much longer than the other models to yield a result.

Installation

I installed Ollama on WSL like this:

Shell
$ curl -s https://ollama.ai/install.sh | sh
>>> Downloading ollama...
################################################################# 100.0%
>>> Installing ollama to /usr/local/bin...
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service →
/etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed. 

Bind: address already in use

The new-user experience has a speed bump. This was supposedly fixed, but I hit it anyway.

Shell
$ ollama serve
Couldn't find '/home/mslinn/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1ABCDEFIDuwe6mUdM4aXFZYxpVYejCO/Kp/cq1HrfC59w7qb
Error: listen tcp 127.0.0.1:11434: bind: address already in use

I do not know why the bind error appeared, because I checked and the port seemed to be available. Restarting the server solved the problem.
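
If you hit this, you can check what, if anything, is listening on Ollama's default port (11434) before restarting. Either of these commands works; this is a quick sketch, and lsof may need to be installed first:

Shell
$ sudo lsof -i :11434
$ ss -ltn | grep 11434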

macOS Solution

On macOS, the Ollama app runs in the menu bar and automatically restarts the server if it stops. To stop the server, quit the menu bar app. If you installed Ollama with Homebrew, you can restart the service like this:

Shell
$ brew services restart ollama

Linux Solution

On Linux, the Ollama server is installed as a systemd service. You can control it with these commands:

Shell
$ sudo systemctl stop ollama
$ sudo systemctl start ollama
$ sudo systemctl restart ollama
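
To verify that the service is running, or to inspect its logs, the usual systemd tooling applies:

Shell
$ systemctl status ollama
$ journalctl -u ollama -e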

Command Line Start

You can start the server from the command line, if it is not already running as a service:

Shell
$ ollama serve
2024/01/14 16:25:20 images.go:808: total blobs: 0
2024/01/14 16:25:20 images.go:815: total unused blobs removed: 0
2024/01/14 16:25:20 routes.go:930: Listening on 127.0.0.1:11434 (version 0.1.20)
2024/01/14 16:25:21 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/14 16:25:21 gpu.go:88: Detecting GPU type
2024/01/14 16:25:21 gpu.go:203: Searching for GPU management library libnvidia-ml.so
2024/01/14 16:25:21 gpu.go:248: Discovered GPU libraries: [/usr/lib/wsl/lib/libnvidia-ml.so.1]
2024/01/14 16:25:21 gpu.go:94: Nvidia GPU detected
2024/01/14 16:25:21 gpu.go:135: CUDA Compute Capability detected: 8.6 

Ollama Models

Ollama loads models on demand; idle models are unloaded when no queries are active. That means you do not have to restart Ollama after installing a new model or removing an existing one.
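
For example, you can add and remove models while the server is running, and the next query simply sees the updated model list. A quick sketch using the stock CLI:

Shell
$ ollama pull orca-mini   # install a model; no server restart needed
$ ollama list             # the new model is visible immediately
$ ollama rm orca-mini     # remove it again, also without a restart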

My workstation has 64 GB RAM, a 13th-generation Intel i7, and a modest NVIDIA 3060. I decided to try the biggest model to see what might happen, so I downloaded the Llama 2 70B model with the following incantation. (Spoiler: an NVIDIA 4090 would have been a better video card for this Ollama model, and it would still be slow.)

Shell
$ ollama run llama2:70b
pulling manifest
pulling 68bbe6dc9cf4... 100% ▕████████████████████████████████████▏  38 GB
pulling 8c17c2ebb0ea... 100% ▕████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕████████████████████████████████████▏   59 B
pulling fa304d675061... 100% ▕████████████████████████████████████▏   91 B
pulling 7c96b46dca6c... 100% ▕████████████████████████████████████▏  558 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help) 

I played around to learn what the available commands were. For more information, see Tutorial: Set Session System Message in Ollama CLI by Ingrid Stevens.

Ollama commands (continued)
>>> /?
Available Commands:
  /set          Set session variables
  /show         Show model information
  /bye          Exit
  /?, /help     Help for a command
  /? shortcuts  Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> Send a message (/? for help)
>>>  /show
Available Commands:
  /show info         Show details for this model
  /show license      Show model license
  /show modelfile    Show Modelfile for this model
  /show parameters   Show parameters for this model
  /show system       Show system message
  /show template     Show prompt template

>>> /show modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama2:70b

FROM /usr/share/ollama/.ollama/models/blobs/sha256:68bbe6dc9cf42eb60c9a7f96137fb8d472f752de6ebf53e9942f267f1a1e2577
TEMPLATE """[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER stop "<<SYS>>"
>>> /show system
No system message was specified for this model.
>>> /show template
[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]

>>> /bye
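
For example, /set system changes the system message for the current session. A sketch of how that looks; the model's reply here is illustrative, not actual output:

Ollama commands (continued)
>>> /set system "You are terse. Answer in one sentence."
Set system message.
>>> Why is there air?
Air is the mix of gases that Earth's gravity holds around the planet.
>>> /bye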

Prefixes such as USER: and ASSISTANT: are helpful when writing a request that embeds a conversation for the model to continue.
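
For example, a conversation can be embedded directly in the prompt sent over the REST interface; the roles are plain-text conventions in the prompt, not an API feature. A minimal sketch:

Shell
$ curl -s http://localhost:11434/api/generate -d '{
  "model":  "llama2:70b",
  "prompt": "USER: In one sentence, why is the sky blue?\nASSISTANT:",
  "stream": false
}' | jq -r .response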

By default, Ollama models are stored in these directories:

  • Linux: /usr/share/ollama/.ollama/models
  • macOS: ~/.ollama/models
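
Those directories can get large quickly; on Linux you can check how much space the downloaded models consume like this:

Shell
$ du -sh /usr/share/ollama/.ollama/models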

The Ollama library has many models available. OllamaHub has more. For applications that may not be safe for work, there is an equivalent uncensored Llama 2 70B model that can be downloaded. Do not try to work with this model unless you have a really powerful machine!

Shell
$ ollama pull llama2-uncensored:70b
pulling manifest
pulling abca3de387b6... 100% ▕█████████████████████████████████████▏  38 GB
pulling 9224016baa40... 100% ▕█████████████████████████████████████▏ 7.0 KB
pulling 1195ea171610... 100% ▕█████████████████████████████████████▏ 4.8 KB
pulling 28577ba2177f... 100% ▕█████████████████████████████████████▏   55 B
pulling ddaa351c1f3d... 100% ▕█████████████████████████████████████▏   51 B
pulling 9256cd2888b0... 100% ▕█████████████████████████████████████▏  530 B
verifying sha256 digest
writing manifest
removing any unused layers
success 

Some additional models that interested me:

  • falcon - A large language model built by the Technology Innovation Institute (TII) for use in summarization, text generation, and chatbots.
  • samantha-mistral - A companion assistant trained in philosophy, psychology, and personal relationships. Based on Mistral.
  • yarn-llama2 - An extension of Llama 2 that supports a context of up to 128k tokens.

I then listed the models on my computer in another console:

Shell
$ ollama list
NAME            ID              SIZE    MODIFIED
llama2:70b      e7f6c06ffef4    38 GB   9 minutes ago 

Running Queries

Ollama queries can be run in many ways.

I used curl, jq, and fold to write my first query from a Bash prompt. The -s option for curl prevents the progress meter from cluttering the screen, the jq filter removes everything from the response except the desired text, and the fold command wraps the response to a width of 72 characters.

Shell
$ curl -s http://localhost:11434/api/generate -d '{
  "model":  "llama2:70b",
  "prompt": "Why is there air?",
  "stream": false
}' | jq -r .response | fold -w 72 -s
Air, or more specifically oxygen, is essential for life as we know it.
It exists because of the delicate balance of chemical reactions in
Earth’s atmosphere, which has allowed complex organisms like
ourselves to evolve.
But if you're asking about air in a broader sense, it serves many
functions: it helps maintain a stable climate, protects living things
from harmful solar radiation, and provides buoyancy for various forms
of life, such as fish or birds.

Describing Images

I wrote this method, using the ollama-ai Ruby gem, to describe images.

Ruby
require 'base64'
require 'ollama-ai'

# Describe an image with a multimodal model via the ollama-ai gem.
# Assumes @address, @model, @temperature, and @timeout are set elsewhere.
def describe_image(image_filename)
  @client = Ollama.new(
    credentials: { address: @address },
    options:     {
      server_sent_events: true,
      temperature:        @temperature,
      connection:         { request: { timeout: @timeout, read_timeout: @timeout } },
    }
  )
  result = @client.generate(
    {
      model:  @model,
      prompt: 'Please describe this image.',
      # Images must be base64-encoded; binread avoids newline mangling
      images: [Base64.strict_encode64(File.binread(image_filename))],
    }
  )
  # generate returns an array of streamed events; join the response fragments
  puts result.map { |x| x['response'] }.join
end

The results were ridiculous, an example of the famous hallucinations that LLMs entertain their audiences with. As the public becomes accustomed to these hallucinations, we may come to prefer them over human comedians. Certainly there will be a lot of material for the human comedians to fight back with. For example, when describing the photo of me at the top of this page:

This is an image of a vibrant and colorful sunrise over the ocean, with the sun peeking above the horizon, casting warm, golden hues over the sky and water below. The sunlight reflects off the rippled surface of the water, creating shimmering patterns that contrast with the tranquil darkness of the receding waters. In the foreground, a solitary figure is silhouetted against the rising sun, perhaps lost in thought or finding inspiration in the breathtaking beauty of the scene.

Another attempt, with an equally ridiculous result:

The photograph depicts an intricate pattern of geometric shapes and lines, creating an abstract design that appears to be in motion. The colors used, including vibrant shades of blue, purple, and red, add energy and dynamism to the piece. It has a sense of fluidity and movement which evokes a feeling of excitement or anticipation. The artwork's abstract nature allows for multiple interpretations, leaving room for personal perspectives and emotions that each viewer may associate with it.

The llava model is supposed to be good at describing images, so I installed it and ran my describe script again, with excellent results:

Shell
$ ollama pull llava:13b
$ describe -m llava:13b /mnt/c/sites/photos/me/bestPhotoOfMike.png
The image features a smiling man wearing glasses and dressed in a suit
and tie. He has a well-groomed appearance. The man's attire includes a
jacket, dress shirt, and a patterned tie that complements his
professional outfit. The setting appears to be a studio environment, as
there is a background behind the man that has an evenly lit texture.
The man's smile conveys confidence and approachability, making him
appear knowledgeable in his field or simply happy to pose for this
photograph.

You can try the latest LLaVA model online.
