Commit Graph

  • ae5c33008e docs: move turbo.md to cloud.md jmorganca 2025-09-19 15:49:56 -07:00
  • 4ef2b2852d server: serve original error for remote models jmorganca/cloud-errors jmorganca 2025-09-20 16:46:32 -07:00
  • 3677842ff1 Merge pull request #12358 from ollama/drifkin/qwen3-coder-ampersands v0.12.1-rc0 Devon Rifkin 2025-09-20 12:40:33 -07:00
  • 242df70a75 parsers: fix &s in qwen3coder parameter values Devon Rifkin 2025-09-20 12:10:58 -07:00
  • dba39b2eee gemma: fix rope scaling for qat models (#12348) Patrick Devine 2025-09-19 15:04:40 -07:00
  • 220a0da37e simplify expand path mxyng/expand-path Michael Yang 2025-09-19 10:01:28 -07:00
  • 9f3a37fd36 fix: model load for unsupported embedding models (#12311) v0.12.0-rc1 v0.12.0 Michael Yang 2025-09-18 16:11:08 -07:00
  • 7460259eb3 feat: qwen3 embed (#12301) Michael Yang 2025-09-18 15:50:32 -07:00
  • 22ccdd74c2 server: add unauthorized error to remote chat handler (#12338) Jeffrey Morgan 2025-09-18 19:40:31 -03:00
  • 0c3d0e7533 build: avoid unbounded parallel builds (#12319) Daniel Hiltgen 2025-09-18 14:57:01 -07:00
  • e7f56ef3d8 harmony: remove special casing in routes.go Devon Rifkin 2025-09-18 14:55:59 -07:00
  • eb0a5d4459 auth: check the permissions on the private key to see if it's readable (#12336) Patrick Devine 2025-09-18 14:34:34 -07:00
  • ceac416ec2 fix(integration): check truncated length (#12337) Michael Yang 2025-09-18 14:00:21 -07:00
  • 2717dce6fe convert: convert bf16 vision weights to fp16 (#12324) v0.12.0-rc0 Patrick Devine 2025-09-17 17:43:17 -07:00
  • 9b8187b487 server: skip parsing initial <think> if provided in the prompt for /api/generate (#12289) frob 2025-09-18 01:39:04 +02:00
  • 8b894933a7 engine: add remote proxy (#12307) Patrick Devine 2025-09-17 14:40:53 -07:00
  • 9c5bf342bc fix: multi-cuda version skew (#12318) Daniel Hiltgen 2025-09-17 13:05:09 -07:00
  • 564b558c92 fix(llama): other llama flavours (#12308) Michael Yang 2025-09-17 12:12:21 -07:00
  • a417ac97ee prefer ollama engine for qwen3 (#12310) Michael Yang 2025-09-17 09:48:21 -07:00
  • 05d53457af refactor: use the built-in max/min to simplify the code (#12280) russcoss 2025-09-16 20:14:21 -04:00
  • b225508c9b logutil: fix source field (#12279) Michael Yang 2025-09-16 16:18:07 -07:00
  • fa1c987a29 Merge pull request #12248 from ollama/drifkin/qwen3-coder-parsing Devon Rifkin 2025-09-16 10:21:43 -07:00
  • ad95d5b30b use split activations when possible (#12293) Michael Yang 2025-09-16 09:51:19 -07:00
  • c253433d68 embed: cleanup (#12299) Michael Yang 2025-09-16 09:48:42 -07:00
  • a1cff89b30 fix: fix CUDA detection for older GPUs (#12300) Beshoy Girgis 2025-09-16 09:47:06 -05:00
  • c10a40db99 parser: tidy up parameter/message parsing pdevine/parser-tidy Patrick Devine 2025-09-15 18:09:05 -07:00
  • 93c64ea1b1 doc: show how to clear the cgo cache (#12298) Daniel Hiltgen 2025-09-15 15:45:35 -07:00
  • 3f6642f6fc model: implement bert in ollama engine (#9080) Michael Yang 2025-09-15 15:35:59 -07:00
  • 6f7117145f batch: use tensors for outputs (#12185) Michael Yang 2025-09-15 14:33:06 -07:00
  • 7eb0ff7dca set_rows jessegross/set_rows Jesse Gross 2025-08-18 10:45:58 -07:00
  • 472feec2ff address comments Devon Rifkin 2025-09-15 11:46:25 -07:00
  • 47991940d4 add qwen3-coder tool support Devon Rifkin 2025-09-11 13:40:35 -07:00
  • 92b96d54ef Revert "runner: move harmony to runner (#12052)" v0.11.11-rc3 v0.11.11-rc2 v0.11.11 jmorganca 2025-09-12 13:32:30 -07:00
  • 9d56e63dbf Revert "runner: simplify parser entrypoints in runner (#12233)" jmorganca 2025-09-12 13:32:02 -07:00
  • 053092185e Fix image cannot be seen with slice image on llama engine tc-mb 2025-09-13 07:25:12 +08:00
  • 44a6792873 tests: tighten up a few flaky tests (#12271) Daniel Hiltgen 2025-09-12 13:59:34 -07:00
  • 45fecff6c0 Revert "runner: simplify parser entrypoints in runner (#12233)" revert-12233-parth/simplify-entrypoints-runner Jeffrey Morgan 2025-09-12 13:31:15 -07:00
  • e4ce68311a cuda: remove compression for better compatibility (#12259) v0.11.11-rc1 Daniel Hiltgen 2025-09-12 07:59:14 -07:00
  • 26214125e8 ollamarunner: Suppress stack trace during memory allocation Jesse Gross 2025-09-11 13:48:51 -07:00
  • 61fb912ca4 CI: fix windows cuda build (#12246) v0.11.11-rc0 Daniel Hiltgen 2025-09-11 12:25:26 -07:00
  • aba1575315 llm: Don't try to load split vision models in the Ollama engine Jesse Gross 2025-09-10 11:03:06 -07:00
  • eb10390de9 llm: Enable new memory estimates by default Jesse Gross 2025-09-11 10:30:18 -07:00
  • feb18cd710 feat: add dimensions field to embed requests (#12242) Michael Yang 2025-09-11 10:36:10 -07:00
  • 8a7e2055d2 cmd: use slices.Contains to simplify code (#12249) fengyuchuanshen 2025-09-12 00:57:31 +08:00
  • c0aeb3531b runner: add sync between computeBatch and completion parth/enable-so-gpt-oss ParthSareen 2025-09-10 18:50:01 -07:00
  • 29ddfc2cab ggml: Disable flash attention for gemma2 Jesse Gross 2025-09-09 10:48:34 -07:00
  • 71cb86af3e llm: Remove unneeded warning with flash attention enabled Jesse Gross 2025-09-09 10:37:28 -07:00
  • 5198956372 docs: add ollama-co2 to community integrations (#12230) CarbonatedWater.org 2025-09-10 16:37:10 -07:00
  • f5c9eb5aa2 models: qwen3vl brucemacd/qwen3vl Bruce MacDonald 2025-09-10 12:11:46 -07:00
  • 17a023f34b Add v12 + v13 cuda support (#12000) Daniel Hiltgen 2025-09-10 12:05:18 -07:00
  • 1e5fecbbc3 runner/parser: allow on-the-fly grammar constraining ParthSareen 2025-09-10 11:50:12 -07:00
  • 8d6fffaead runner: simplify parser entrypoints in runner (#12233) Parth Sareen 2025-09-10 11:24:42 -07:00
  • 20b53eaa72 tests: add tool calling integration test (#12232) Parth Sareen 2025-09-09 14:01:11 -07:00
  • 6745182885 tests: reduce stress on CPU to 2 models (#12161) Daniel Hiltgen 2025-09-09 09:32:15 -07:00
  • 02403c2e62 readme: simplify readme jmorganca/readme-simplify jmorganca 2025-09-08 21:37:25 -07:00
  • f810ec741c readme: add Clueless to community integrations (#12188) Kashyap Tanuku 2025-09-09 00:31:29 -04:00
  • e119783e66 llm: Clamp batch size to context size Jesse Gross 2025-09-08 17:33:31 -07:00
  • 1a558f98e2 runner: move harmony to runner (#12052) Parth Sareen 2025-09-08 15:07:59 -07:00
  • 7b91c9ce51 Hybrid and recurrent memory estimates (#12186) Gabe Goodhart 2025-09-08 15:53:22 -06:00
  • 950d33aa30 docs: show how to debug nvidia init failures (#12216) Daniel Hiltgen 2025-09-08 11:39:00 -07:00
  • 9714e38dd0 fix: nil pointer dereference if cache is nil (#12215) Michael Yang 2025-09-08 09:53:59 -07:00
  • 4378ae4ffa parser: don't check the file type of safetensors to prevent false negatives. (#12176) frob 2025-09-06 01:27:40 +02:00
  • 5994e8e8fd embedding gemma model (#12181) v0.11.10 Michael Yang 2025-09-04 09:09:07 -07:00
  • b3e6120736 more logutil.Trace (#12177) Michael Yang 2025-09-03 17:24:39 -07:00
  • 1fe7e07f63 sampler/runner: enable gpt-oss structured outputs parth/gpt-oss-structured-outputs ParthSareen 2025-09-03 15:04:20 -07:00
  • 1aa4947cdc Revert "tools: avoid matching braces that are part of tool content (#12039)" revert-12039-jmorganca/tools-braces Jeffrey Morgan 2025-09-03 15:02:39 -07:00
  • 40d3436cd1 cleanup passing in harmony flag and add generate support ParthSareen 2025-09-02 09:58:36 -07:00
  • 5bc783b58e harmony: move tests from routes to parser ParthSareen 2025-08-28 13:47:18 -07:00
  • 87714c1c39 harmony: add harmony parsing to runner ParthSareen 2025-08-22 15:47:10 -07:00
  • f7ca3b7f7e routes: ChatHandler to get parsed harmony from runner ParthSareen 2025-08-22 15:46:42 -07:00
  • 72189c6d6e harmony: simplify prefill, add marshalling for functions, and update harmony check ParthSareen 2025-08-22 15:45:11 -07:00
  • 1d09e01431 server: update completion request signature and update token repeat ParthSareen 2025-08-22 15:40:32 -07:00
  • eb7660d724 server: add thinking and tool calls to CompletionResponse ParthSareen 2025-08-21 14:50:34 -07:00
  • 4a5bdd5f12 harmony: move harmony parsing into a package ParthSareen 2025-08-21 12:34:52 -07:00
  • fb92b61754 logutil: add Trace and TraceContext helpers (#12110) v0.11.9 Michael Yang 2025-09-02 13:09:12 -07:00
  • 8149a3c86e llm: Avoid underflow in free memory logging Jesse Gross 2025-09-02 10:47:33 -07:00
  • 46e485f32c runner: disable embedding models in ollama engine mxyng/embeddings Michael Yang 2025-09-02 10:42:16 -07:00
  • 0cc90a8186 harden uncaught exception registration (#12120) v0.11.9-rc0 Daniel Hiltgen 2025-09-02 09:43:55 -07:00
  • e42300f25b ml: fix struct field name in comment (#12123) pxwanglu 2025-09-01 07:26:11 +08:00
  • 66e73809a1 readme: add NOMYO Router to community integrations (#12129) alpha-nerd-nomyo 2025-08-31 22:49:10 +02:00
  • 517807cdf2 perf: build graph for next batch async to keep GPU busy (#11863) Daniel Hiltgen 2025-08-29 14:20:28 -07:00
  • ead4a9a1d0 Always filter devices (#12108) Daniel Hiltgen 2025-08-29 12:17:31 -07:00
  • 8d97d4b0ea use fs.gguf.File to show models mxyng/gguf Michael Yang 2025-08-28 17:04:45 -07:00
  • d9d980c760 lazy gguf arrays Michael Yang 2025-06-12 10:57:17 -07:00
  • 12e13573a8 benchmark tests Michael Yang 2025-06-12 10:56:23 -07:00
  • 4383a3ab7a readme: add Neuro SAN to community integrations (#12109) v0.11.8 ofrancon 2025-08-28 12:27:13 -07:00
  • 9d97e6a9f1 ggml: Avoid allocating CUDA primary context on unused GPUs Jesse Gross 2025-08-26 14:17:43 -07:00
  • 1081532430 fix keep alive (#12041) Michael Yang 2025-08-27 11:51:25 -07:00
  • 59412fbb43 convert(gptoss): mxfp4 to ggml layout to avoid jit conversion (#12018) v0.11.8-rc0 Michael Yang 2025-08-26 16:41:02 -07:00
  • 86834a2797 convert: fix tensor sorting (#12015) Michael Yang 2025-08-26 13:57:46 -07:00
  • 85ccf7354d gptoss: enable flash attention by default (#11996) Michael Yang 2025-08-26 13:34:45 -07:00
  • bdfc82b351 add model benchmark mxyng/benchmark Michael Yang 2025-08-08 14:54:32 -07:00
  • d05fc26570 null truncate mxyng/types-null Michael Yang 2025-08-22 16:57:34 -07:00
  • c457628090 null stream Michael Yang 2025-08-23 13:28:12 -07:00
  • 30fb7e19f8 remove extra field attr (#11205) Michael Yang 2025-08-25 09:58:16 -07:00
  • e914477bb6 types: add types.Null[T] Michael Yang 2025-08-22 16:17:37 -07:00
  • d3450dd52e api: implement stringer for ToolFunctionParameters (#12038) v0.11.7-rc1 v0.11.7-rc0 v0.11.7 Jeffrey Morgan 2025-08-22 16:26:48 -07:00
  • f30d01801d routes: update generate handler to use runner with harmony parth/move-parsing ParthSareen 2025-08-22 16:06:41 -07:00
  • b08c7dad0a harmony: add harmony parsing to runner ParthSareen 2025-08-22 15:47:10 -07:00
  • bc5ab5784b routes: ChatHandler to get parsed harmony from runner ParthSareen 2025-08-22 15:46:42 -07:00