6 Commits

Author SHA1 Message Date
Parth Sareen
7e34f4fbfa
sample: add numerical stability to temperature/softmax transform () 2025-03-10 14:43:53 -07:00
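The stability fix referenced in this commit title is usually the standard max-subtraction trick: subtracting the largest logit before exponentiation keeps `exp` from overflowing without changing the resulting distribution. A minimal Go sketch of the idea (illustrative only, not Ollama's actual sampler code; names are made up):

```go
package main

import (
	"fmt"
	"math"
)

// softmax applies temperature scaling followed by a numerically stable
// softmax. Subtracting the maximum logit before exponentiation bounds
// the exponent at 0, preventing overflow for large logits while leaving
// the normalized probabilities unchanged.
func softmax(logits []float32, temperature float32) []float32 {
	maxLogit := float32(math.Inf(-1))
	for _, l := range logits {
		if l > maxLogit {
			maxLogit = l
		}
	}

	out := make([]float32, len(logits))
	var sum float32
	for i, l := range logits {
		out[i] = float32(math.Exp(float64((l - maxLogit) / temperature)))
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	// A naive exp() on logits this large would overflow to +Inf and
	// produce NaN probabilities; the stable version stays finite.
	fmt.Println(softmax([]float32{1000, 1001, 1002}, 1))
}
```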
Jeffrey Morgan
e093db92c4
sample: temporarily use grammars for constrained generation in new engine () 2025-03-10 16:17:39 +01:00
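Grammar-constrained generation generally works by masking out tokens the grammar cannot accept at the current position, so sampling can only pick grammar-valid continuations. A toy Go sketch of that core idea (a hedged illustration with made-up names, nothing like the real GBNF machinery this commit wires in):

```go
package main

import (
	"fmt"
	"math"
)

// maskDisallowed sets the logits of tokens rejected by the predicate to
// -Inf, so they receive zero probability after softmax and can never be
// sampled. In real grammar-constrained decoding the predicate is driven
// by the grammar's current parse state.
func maskDisallowed(logits []float32, vocab []string, allowed func(string) bool) []float32 {
	out := make([]float32, len(logits))
	negInf := float32(math.Inf(-1))
	for i, l := range logits {
		if allowed(vocab[i]) {
			out[i] = l
		} else {
			out[i] = negInf
		}
	}
	return out
}

func main() {
	// Toy "grammar": only digit tokens are valid at this step.
	vocab := []string{"7", "cat", "42", "!"}
	isDigits := func(s string) bool {
		for _, r := range s {
			if r < '0' || r > '9' {
				return false
			}
		}
		return len(s) > 0
	}
	fmt.Println(maskDisallowed([]float32{1, 2, 3, 4}, vocab, isDigits))
}
```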
Parth Sareen
0682dae027
sample: improve ollama engine sampler performance ()
This change brings in various interface cleanups and greatly improves the performance of the sampler.

Tested with llama3.2 on local machine.
Improves performance from ~70 tokens/s to ~135 tokens/s with topK(40) enabled.
Without topK, performance is ~110 tokens/s.
2025-03-07 12:37:48 -08:00
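The topK(40) setting benchmarked above restricts sampling to the 40 most likely tokens. A rough Go sketch of the technique (illustrative only; the engine's implementation uses a faster selection strategy than a full sort):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// topK masks all but the k largest logits to -Inf so that, after
// softmax, only the k most likely tokens have nonzero probability.
func topK(logits []float32, k int) []float32 {
	out := make([]float32, len(logits))
	copy(out, logits)
	if k >= len(logits) {
		return out
	}

	// Sort token indices by logit, descending. A heap or partial
	// selection avoids the full O(n log n) sort in practice.
	idx := make([]int, len(logits))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return logits[idx[a]] > logits[idx[b]] })

	negInf := float32(math.Inf(-1))
	for _, i := range idx[k:] {
		out[i] = negInf
	}
	return out
}

func main() {
	// Keep the two largest logits; mask the rest.
	fmt.Println(topK([]float32{0.3, 2.1, -1.0, 0.9}, 2))
}
```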
Parth Sareen
c245b0406f
sample: remove transforms from greedy sampling () 2025-02-27 15:44:53 -08:00
Parth Sareen
0b7e1676eb
sample: add sampling package for new engine () 2025-02-24 17:19:01 -08:00
Michael Yang
58245413f4
next ollama runner ()
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models, built around three high-level interfaces plus a number of smaller helper interfaces.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, implement the model's forward propagation in the `Forward` method, which is called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.
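The three interfaces above might be sketched roughly as follows. These signatures are simplified placeholders for illustration, not the actual declarations in `model/model.go` and `ml/backend.go`, which carry many more methods and context parameters:

```go
package main

import "fmt"

// Tensor is a handle to an n-dimensional array plus its operations.
type Tensor interface {
	Shape() []int
}

// Backend loads a pretrained model into hardware and exposes its
// named weight tensors to Models.
type Backend interface {
	Get(name string) Tensor
}

// Model implements an architecture's forward pass over input tokens,
// producing output logits as a Tensor.
type Model interface {
	Forward(b Backend, tokens []int32) (Tensor, error)
}

// Toy implementations, just to show how the pieces fit together.
type vec struct{ n int }

func (v vec) Shape() []int { return []int{v.n} }

type memBackend map[string]Tensor

func (m memBackend) Get(name string) Tensor { return m[name] }

func main() {
	var b Backend = memBackend{"output.weight": vec{n: 4096}}
	fmt.Println(b.Get("output.weight").Shape())
}
```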

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling ()
- integration with Ollama and KV caching ()
- more model support () with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00