Ollama Benchmark Tool

A Go-based command-line tool for benchmarking Ollama models with configurable parameters and multiple output formats.

Features

  • Benchmark multiple models in a single run
  • Support for both text and image prompts
  • Configurable generation parameters (temperature, max tokens, seed, etc.)
  • Support for benchstat and CSV output formats
  • Detailed performance metrics (prefill, generate, load, total durations)

Building from Source

go build -o bench bench.go
./bench -model gpt-oss:20b -epochs 6 -format csv

Using Go Run (without building)

go run bench.go -model gpt-oss:20b -epochs 3

Usage

Basic Example

./bench -model gemma3 -epochs 6

Benchmark Multiple Models

./bench -model gemma3,gemma3n -epochs 6 -max-tokens 100 -p "Write me a short story" | tee gemma.bench
benchstat -col /name gemma.bench

With Image Prompt

./bench -model qwen3-vl -image photo.jpg -epochs 6 -max-tokens 100 -p "Describe this image"

Advanced Example

./bench -model llama3 -epochs 10 -temperature 0.7 -max-tokens 500 -seed 42 -format csv -output results.csv

Command Line Options

| Option | Description | Default |
|--------|-------------|---------|
| -model | Comma-separated list of models to benchmark | (required) |
| -epochs | Number of iterations per model | 1 |
| -max-tokens | Maximum tokens for model response | 0 (unlimited) |
| -temperature | Temperature parameter | 0.0 |
| -seed | Random seed | 0 (random) |
| -timeout | Timeout in seconds | 300 |
| -p | Prompt text | "Write a long story." |
| -image | Image file to include in prompt | |
| -k | Keep-alive duration in seconds | 0 |
| -format | Output format (benchstat, csv) | benchstat |
| -output | Output file for results | "" (stdout) |
| -v | Verbose mode | false |
| -debug | Show debug information | false |
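
The options above map naturally onto Go's standard flag package. The declarations below are a hedged sketch that mirrors the table; they are not the actual definitions in bench.go, which may declare or group its flags differently.

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical flag declarations mirroring the options table above;
// bench.go may structure these differently.
var (
	model       = flag.String("model", "", "comma-separated list of models to benchmark (required)")
	epochs      = flag.Int("epochs", 1, "number of iterations per model")
	maxTokens   = flag.Int("max-tokens", 0, "maximum tokens for model response (0 = unlimited)")
	temperature = flag.Float64("temperature", 0.0, "temperature parameter")
	seed        = flag.Int("seed", 0, "random seed (0 = random)")
	timeout     = flag.Int("timeout", 300, "timeout in seconds")
	prompt      = flag.String("p", "Write a long story.", "prompt text")
	image       = flag.String("image", "", "image file to include in prompt")
	keepAlive   = flag.Int("k", 0, "keep-alive duration in seconds")
	format      = flag.String("format", "benchstat", "output format: benchstat or csv")
	output      = flag.String("output", "", "output file for results (stdout if empty)")
	verbose     = flag.Bool("v", false, "verbose mode")
	debug       = flag.Bool("debug", false, "show debug information")
)

func main() {
	flag.Parse()
	fmt.Printf("models=%s epochs=%d max-tokens=%d\n", *model, *epochs, *maxTokens)
}
```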

Output Formats

Benchstat Format

Compatible with Go's benchstat tool for statistical analysis:

BenchmarkModel/name=gpt-oss:20b/step=prefill 128 78125.00 ns/token 12800.00 token/sec
BenchmarkModel/name=gpt-oss:20b/step=generate 512 19531.25 ns/token 51200.00 token/sec
BenchmarkModel/name=gpt-oss:20b/step=load 1 1500000000 ns/request
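
As a rough sketch of how such lines can be produced (illustrative only, not the code from bench.go), each model/step pair is printed as an iteration count followed by value/unit pairs, which is the format benchstat parses:

```go
package main

import (
	"fmt"
	"time"
)

// printBenchstatLine emits one Go-benchmark-style line that benchstat
// can aggregate. The values in main reproduce the prefill sample above.
func printBenchstatLine(model, step string, tokens int, d time.Duration) {
	nsPerToken := float64(d.Nanoseconds()) / float64(tokens)
	tokensPerSec := float64(tokens) / d.Seconds()
	fmt.Printf("BenchmarkModel/name=%s/step=%s %d %.2f ns/token %.2f token/sec\n",
		model, step, tokens, nsPerToken, tokensPerSec)
}

func main() {
	// 128 prompt tokens processed in 10ms -> 78125.00 ns/token, 12800.00 token/sec
	printBenchstatLine("gpt-oss:20b", "prefill", 128, 10*time.Millisecond)
}
```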

CSV Format

Machine-readable comma-separated values:

NAME,STEP,COUNT,NS_PER_COUNT,TOKEN_PER_SEC
gpt-oss:20b,prefill,128,78125.00,12800.00
gpt-oss:20b,generate,512,19531.25,51200.00
gpt-oss:20b,load,1,1500000000,0
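
A minimal sketch of emitting these rows with Go's encoding/csv package; the column names follow the sample above, but this is not the code from bench.go:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {
	w := csv.NewWriter(os.Stdout)
	defer w.Flush()

	// Header row matching the sample output above.
	w.Write([]string{"NAME", "STEP", "COUNT", "NS_PER_COUNT", "TOKEN_PER_SEC"})

	// One illustrative data row for the prefill step.
	w.Write([]string{
		"gpt-oss:20b",
		"prefill",
		"128",
		fmt.Sprintf("%.2f", 78125.00),
		fmt.Sprintf("%.2f", 12800.00),
	})
}
```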

Metrics Explained

The tool reports four types of metrics for each model:

  • prefill: Time spent processing the prompt
  • generate: Time spent generating the response
  • load: Model loading time (one-time cost)
  • total: Total request duration
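
These metrics correspond to the timing fields that Ollama's /api/generate response exposes (total_duration, load_duration, prompt_eval_count, prompt_eval_duration, eval_count, eval_duration, with durations reported in nanoseconds). The sketch below shows one way the per-step numbers could be derived from those fields; it assumes bench.go relies on the same response metrics, which is not confirmed by this README.

```go
package main

import (
	"fmt"
	"time"
)

// GenerateMetrics holds the timing fields from Ollama's /api/generate
// response (the JSON reports durations in nanoseconds).
type GenerateMetrics struct {
	TotalDuration      time.Duration
	LoadDuration       time.Duration
	PromptEvalCount    int
	PromptEvalDuration time.Duration
	EvalCount          int
	EvalDuration       time.Duration
}

func report(m GenerateMetrics) {
	// prefill: prompt processing throughput
	fmt.Printf("prefill:  %.2f token/sec\n", float64(m.PromptEvalCount)/m.PromptEvalDuration.Seconds())
	// generate: response generation throughput
	fmt.Printf("generate: %.2f token/sec\n", float64(m.EvalCount)/m.EvalDuration.Seconds())
	// load: one-time model loading cost
	fmt.Printf("load:     %v\n", m.LoadDuration)
	// total: end-to-end request duration
	fmt.Printf("total:    %v\n", m.TotalDuration)
}

func main() {
	// Illustrative values only.
	report(GenerateMetrics{
		TotalDuration:      2 * time.Second,
		LoadDuration:       1500 * time.Millisecond,
		PromptEvalCount:    128,
		PromptEvalDuration: 10 * time.Millisecond,
		EvalCount:          512,
		EvalDuration:       490 * time.Millisecond,
	})
}
```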