highperfocused/ollama

mirror of https://github.com/ollama/ollama.git synced 2025-03-19 14:21:57 +01:00

Commit Graph

Author	SHA1	Message	Date
Parth Sareen	c245b0406f	sample: remove transforms from greedy sampling (#9377)	2025-02-27 15:44:53 -08:00
Parth Sareen	0b7e1676eb	sample: add sampling package for new engine (#8410)	2025-02-24 17:19:01 -08:00
Michael Yang	58245413f4	next ollama runner (#7913)	2025-02-13 16:31:21 -08:00

Michael Yang

58245413f4

next ollama runner (#7913)

feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and several smaller helper interfaces to support them.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, implement their forward propagation in the `Forward` method, which is called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.
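
To make the relationship between the three interfaces concrete, here is a minimal sketch of how they could fit together. It is not the actual Ollama API: the method sets (`Shape`, `Add`, `Get`), the `Forward` signature, and the toy types `sliceTensor`, `mapBackend`, and `biasModel` are all hypothetical and chosen only for illustration. The real definitions live in `model/model.go` and `ml/backend.go`.

```go
package main

import "fmt"

// Tensor is a hypothetical, heavily reduced stand-in for ml.Tensor.
type Tensor interface {
	Shape() []int
	Add(other Tensor) Tensor
}

// Backend is a hypothetical stand-in for ml.Backend: it owns the loaded
// weights and lets a model look them up by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a hypothetical stand-in for model.Model: an architecture
// expresses its forward pass in terms of Backend and Tensor only.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// sliceTensor is a toy one-dimensional CPU tensor.
type sliceTensor struct{ data []float32 }

func (t sliceTensor) Shape() []int { return []int{len(t.data)} }

// Add returns the element-wise sum. It assumes other is a sliceTensor
// of the same length.
func (t sliceTensor) Add(other Tensor) Tensor {
	o := other.(sliceTensor)
	out := make([]float32, len(t.data))
	for i := range t.data {
		out[i] = t.data[i] + o.data[i]
	}
	return sliceTensor{out}
}

// mapBackend is a toy backend that keeps "loaded" tensors in a map.
type mapBackend map[string]Tensor

func (b mapBackend) Get(name string) Tensor { return b[name] }

// biasModel is a toy architecture whose forward pass adds a bias tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, fmt.Errorf("tensor %q not loaded", "bias")
	}
	return input.Add(bias), nil
}

func main() {
	backend := mapBackend{"bias": sliceTensor{[]float32{1, 1, 1}}}
	var m Model = biasModel{}
	out, err := m.Forward(backend, sliceTensor{[]float32{1, 2, 3}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.(sliceTensor).data) // prints [2 3 4]
}
```

The point of the split is that `biasModel` never touches hardware or storage directly, so in this sketch a different `Backend` could be swapped in without changing the model code.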

This is the first implementation of the new engine. Follow-up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080), with more coming soon
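
For readers unfamiliar with the distinction between greedy and non-greedy sampling, the sketch below contrasts the two. It is a generic illustration, not the code from #8410 or #9377: the function names `greedy` and `sampleTemperature` and their signatures are assumptions. Greedy sampling picks the highest logit; temperature sampling draws from the softmax distribution. Because positive temperature scaling and softmax both preserve ordering, the argmax can be taken over the raw logits directly.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// greedy returns the index of the largest logit (argmax).
// It assumes logits is non-empty.
func greedy(logits []float32) int {
	best := 0
	for i, v := range logits {
		if v > logits[best] {
			best = i
		}
	}
	return best
}

// sampleTemperature draws an index from softmax(logits / temperature).
// u is a uniform random number in [0, 1), passed in so that the function
// stays deterministic for a given u. It assumes temperature > 0 and a
// non-empty logits slice.
func sampleTemperature(logits []float32, temperature, u float64) int {
	// Subtract the maximum logit before exponentiating for numerical stability.
	maxLogit := float64(logits[greedy(logits)])
	weights := make([]float64, len(logits))
	var sum float64
	for i, v := range logits {
		weights[i] = math.Exp((float64(v) - maxLogit) / temperature)
		sum += weights[i]
	}
	// Walk the cumulative (unnormalized) distribution until u*sum is used up.
	r := u * sum
	for i, w := range weights {
		r -= w
		if r < 0 {
			return i
		}
	}
	return len(logits) - 1
}

func main() {
	logits := []float32{0.1, 2.5, -1.0}
	fmt.Println(greedy(logits)) // prints 1
	// Non-greedy: the result varies from run to run.
	fmt.Println(sampleTemperature(logits, 0.8, rand.Float64()))
}
```

As the temperature approaches zero, `sampleTemperature` concentrates almost all probability on the largest logit and so behaves like `greedy`.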

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

2025-02-13 16:31:21 -08:00

3 Commits