Michael Yang 5ade3db040 fix race
Block on the write, which only returns when the channel is closed. This is
contrary to the previous arrangement, where the handler could return before
the stream had finished writing. That could lead to the client receiving
unexpected responses (since the request had already been handled) or, in the
worst case, a nil-pointer dereference as the stream tried to flush a nil writer.
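
A minimal sketch of the pattern this fix describes, assuming a plain net/http handler (the handler shape and names are illustrative, not the project's actual server code):

package main

import (
	"fmt"
	"net/http"
)

// handleGenerate drains the channel and returns only once it is closed,
// so nothing can try to flush the response writer after the handler exits.
func handleGenerate(w http.ResponseWriter, r *http.Request) {
	ch := make(chan string)

	go func() {
		defer close(ch) // closing the channel is the only thing that lets the handler return
		for _, token := range []string{"hello", " ", "world"} {
			ch <- token
		}
	}()

	flusher, _ := w.(http.Flusher)
	for token := range ch { // blocks until ch is closed
		fmt.Fprint(w, token)
		if flusher != nil {
			flusher.Flush()
		}
	}
}

func main() {
	http.HandleFunc("/generate", handleGenerate)
	http.ListenAndServe(":8080", nil)
}

Because the only way out of the range loop is the producer closing the channel, the handler cannot return while writes are still pending.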

Ollama

Run large language models with llama.cpp.

Note: certain models that can be run with Ollama are intended for research and/or non-commercial use only.

Features

  • Download and run popular large language models
  • Switch between multiple models on the fly
  • Hardware acceleration where available (Metal, CUDA)
  • Fast inference server written in Go, powered by llama.cpp
  • REST API to use with your application (Python and TypeScript SDKs coming soon)

Install

  • Download for macOS
  • Download for Windows (coming soon)

You can also build the binary from source.

Quickstart

Run a fast and simple model.

ollama run orca

Example models

💬 Chat

Have a conversation.

ollama run vicuna "Why is the sky blue?"

🗺️ Instructions

Get a helping hand.

ollama run orca "Write an email to my boss."

🔎 Ask questions about documents

Send the contents of a document and ask questions about it.

ollama run nous-hermes "$(cat input.txt), please summarize this story"

📖 Storytelling

Venture into the unknown.

ollama run nous-hermes "Once upon a time"

Advanced usage

Run a local model

ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

Building

go build .

To run it, start the server:

./ollama serve &

Finally, run a model!

./ollama run ~/Downloads/vicuna-7b-v1.3.ggmlv3.q4_1.bin

API Reference

POST /api/pull

Download a model

curl -X POST http://localhost:11434/api/pull -d '{"model": "orca"}'

POST /api/generate

Complete a prompt

curl -X POST http://localhost:11434/api/generate -d '{"model": "orca", "prompt": "hello!"}'
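
Responses from the generate endpoint are streamed. Below is a minimal Go client sketch; it assumes the server emits newline-delimited JSON objects with a response field (the field name is an assumption here, check the server source for the exact schema):

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	payload := []byte(`{"model": "orca", "prompt": "hello!"}`)

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The server streams the completion; read it line by line as it arrives.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		var chunk struct {
			Response string `json:"response"` // assumed field name
		}
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			continue // skip lines that aren't JSON objects
		}
		fmt.Print(chunk.Response)
	}
	fmt.Println()
}

Reading the body line by line lets the client print tokens as they arrive instead of waiting for the full completion.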