Files
ollama/integration/README.md
Daniel Hiltgen 517807cdf2 perf: build graph for next batch async to keep GPU busy (#11863)
* perf: build graph for next batch in parallel to keep GPU busy

This refactors the main run loop of the ollama runner to perform the main GPU
intensive tasks (Compute+Floats) in a go routine so we can prepare the next
batch in parallel to reduce the amount of time the GPU stalls waiting for the
next batch of work.

* tests: tune integration tests for ollama engine

This tunes the integration tests to focus more on models supported
by the new engine.
2025-08-29 14:20:28 -07:00

1.3 KiB

Integration Tests

This directory contains integration tests to exercise Ollama end-to-end to verify behavior

By default, these tests are disabled so go test ./... will exercise only unit tests. To run integration tests you must pass the integration tag. go test -tags=integration ./... Some tests require additional tags to enable to allow scoped testing to keep the duration reasonable. For example, testing a broad set of models requires -tags=integration,models and a longer timeout (~60m or more depending on the speed of your GPU.). To view the current set of tag combinations use find integration -type f | xargs grep "go:build"

The integration tests have 2 modes of operating.

  1. By default, they will start the server on a random port, run the tests, and then shutdown the server.
  2. If OLLAMA_TEST_EXISTING is set to a non-empty string, the tests will run against an existing running server, which can be remote based on your OLLAMA_HOST environment variable

Important

Before running the tests locally without the "test existing" setting, compile ollama from the top of the source tree go build . in addition to GPU support with cmake if applicable on your platform. The integration tests expect to find an ollama binary at the top of the tree.