Ollama Benchmark Tool

A Go-based command-line tool for benchmarking Ollama models with configurable parameters and multiple output formats.

Features

  • Benchmark multiple models in a single run
  • Support for both text and image prompts
  • Configurable generation parameters (temperature, max tokens, seed, etc.)
  • Markdown, benchstat, and CSV output formats
  • Detailed performance metrics (prefill, generate, load, total durations)

Building from Source

go build -o bench bench.go
./bench -model gpt-oss:20b -epochs 6 -format csv

Using Go Run (without building)

go run bench.go -model gpt-oss:20b -epochs 3

Usage

Basic Example

./bench -model gemma3 -epochs 6

Benchmark Multiple Models

./bench -model gemma3,gemma3n -epochs 6 -max-tokens 100 -p "Write me a short story" | tee gemma.bench
benchstat -col /name gemma.bench

With Image Prompt

./bench -model qwen3-vl -image photo.jpg -epochs 6 -max-tokens 100 -p "Describe this image"

Advanced Example

./bench -model llama3 -epochs 10 -temperature 0.7 -max-tokens 500 -seed 42 -format csv -output results.csv

Command Line Options

| Option | Description | Default |
|--------|-------------|---------|
| -model | Comma-separated list of models to benchmark | (required) |
| -epochs | Number of iterations per model | 1 |
| -max-tokens | Maximum tokens for model response | 0 (unlimited) |
| -temperature | Temperature parameter | 0.0 |
| -seed | Random seed | 0 (random) |
| -timeout | Timeout in seconds | 300 |
| -p | Prompt text | "Write a long story." |
| -image | Image file to include in prompt | |
| -k | Keep-alive duration in seconds | 0 |
| -format | Output format (benchstat, csv) | benchstat |
| -output | Output file for results | "" (stdout) |
| -v | Verbose mode | false |
| -debug | Show debug information | false |
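
For example, to keep the model loaded between epochs and write the results to a file (the file name here is illustrative):

./bench -model gemma3 -epochs 6 -k 300 -max-tokens 200 -output results.bench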

Output Formats

Markdown Format

The markdown format is suitable for copying and pasting into a GitHub issue and looks like this:

| Model | Step | Count | Duration | nsPerToken | tokensPerSec |
|-------|------|-------|----------|------------|--------------|
| gpt-oss:20b | prefill | 124 | 30.006458ms | 241987.56 | 4132.44 |
| gpt-oss:20b | generate | 200 | 2.646843954s | 13234219.77 | 75.56 |
| gpt-oss:20b | load | 1 | 121.674208ms | - | - |
| gpt-oss:20b | total | 1 | 2.861047625s | - | - |

Benchstat Format

Compatible with Go's benchstat tool for statistical analysis:

BenchmarkModel/name=gpt-oss:20b/step=prefill 128 78125.00 ns/token 12800.00 token/sec
BenchmarkModel/name=gpt-oss:20b/step=generate 512 19531.25 ns/token 51200.00 token/sec
BenchmarkModel/name=gpt-oss:20b/step=load 1 1500000000 ns/request
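
If benchstat is not already installed, it can be fetched with the Go toolchain, and two saved runs can then be compared side by side (the file names below are illustrative):

go install golang.org/x/perf/cmd/benchstat@latest
benchstat before.bench after.bench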

CSV Format

Machine-readable comma-separated values:

NAME,STEP,COUNT,NS_PER_COUNT,TOKEN_PER_SEC
gpt-oss:20b,prefill,128,78125.00,12800.00
gpt-oss:20b,generate,512,19531.25,51200.00
gpt-oss:20b,load,1,1500000000,0
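
The CSV output is easy to post-process. Below is a minimal Go sketch, not part of the tool itself, that averages the generate-step throughput per model; it assumes the column layout shown above and a hypothetical results.csv produced with -format csv -output results.csv:

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"strconv"
)

func main() {
	// Hypothetical input produced with: ./bench ... -format csv -output results.csv
	f, err := os.Open("results.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	if len(rows) == 0 {
		log.Fatal("empty CSV")
	}

	// Average the TOKEN_PER_SEC column for the generate step, per model.
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, row := range rows[1:] { // rows[0] is the header
		if row[1] != "generate" {
			continue
		}
		tps, err := strconv.ParseFloat(row[4], 64)
		if err != nil {
			log.Fatal(err)
		}
		sums[row[0]] += tps
		counts[row[0]]++
	}
	for model, n := range counts {
		fmt.Printf("%s: %.2f token/sec over %d epochs\n", model, sums[model]/float64(n), n)
	}
}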

Metrics Explained

The tool reports four metrics for each model:

  • prefill: Time spent processing the prompt
  • generate: Time spent generating the response
  • load: Model loading time (one-time cost)
  • total: Total request duration
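
The nsPerToken and tokensPerSec columns are derived from these durations and are reciprocals of one another: nsPerToken = duration / count, and tokensPerSec = 1e9 / nsPerToken. Using the generate row from the markdown example above:

nsPerToken   = 2.646843954s / 200 tokens ≈ 13234219.77 ns/token
tokensPerSec = 1e9 / 13234219.77         ≈ 75.56 token/sec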