9 Commits

Author SHA1 Message Date
Parth Sareen
108fe02165
sample: make mutations in transforms explicit ()
* Updated minP to exit early by relying on the tokens already being sorted
2025-03-17 11:24:18 -07:00
Parth Sareen
5c0b663969
sample: separate softmax and temperature transforms ()
2025-03-13 09:53:27 -07:00
ParthSareen
4aeb67ef4c
sample: do all sorting in topK
2025-03-12 11:59:17 -07:00
ParthSareen
3ba91634c1
sample: simplify top_k=0 sorting
2025-03-12 11:59:17 -07:00
ParthSareen
1b7433b71e
sample: use container/heap for top_k
2025-03-12 11:59:17 -07:00
Parth Sareen
7e34f4fbfa
sample: add numerical stability to temperature/softmax transform ()
2025-03-10 14:43:53 -07:00
Jeffrey Morgan
e093db92c4
sample: temporarily use grammars for constrained generation in new engine ()
2025-03-10 16:17:39 +01:00
Parth Sareen
0682dae027
sample: improve ollama engine sampler performance ()
This change brings in various interface cleanups and greatly improves the performance of the sampler.

Tested with llama3.2 on a local machine.
With topK(40) enabled, performance improves from ~70 tokens/s to ~135 tokens/s.
Without topK, performance is ~110 tokens/s.
2025-03-07 12:37:48 -08:00
Parth Sareen
0b7e1676eb
sample: add sampling package for new engine ()
2025-02-24 17:19:01 -08:00