---
title: Vision
---

Vision models accept images alongside text so the model can describe, classify, and answer questions about what it sees.

## Quick start

```shell
ollama run gemma3 "What's in this image? ./image.png"
```

## Usage with Ollama's API

Provide an `images` array. SDKs accept file paths, URLs, or raw bytes, while the REST API expects base64-encoded image data.

```shell
# 1. Download a sample image
curl -L -o test.jpg "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"

# 2. Encode the image as base64 (stripping newlines)
IMG=$(base64 < test.jpg | tr -d '\n')

# 3. Send it to Ollama
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3",
    "messages": [{
      "role": "user",
      "content": "What is in this image?",
      "images": ["'"$IMG"'"]
    }],
    "stream": false
  }'
```

```python
from ollama import chat
# import base64
# from pathlib import Path

# Pass in the path to the image
path = input('Please enter the path to the image: ')

# You can also pass in base64 encoded image data
# img = base64.b64encode(Path(path).read_bytes()).decode()
# or the raw bytes
# img = Path(path).read_bytes()

response = chat(
    model='gemma3',
    messages=[
        {
            'role': 'user',
            'content': 'What is in this image? Be concise.',
            'images': [path],
        }
    ],
)

print(response.message.content)
```

```javascript
import ollama from 'ollama'

const imagePath = '/absolute/path/to/image.jpg'

const response = await ollama.chat({
  model: 'gemma3',
  messages: [
    {
      role: 'user',
      content: 'What is in this image?',
      images: [imagePath],
    },
  ],
  stream: false,
})

console.log(response.message.content)
```
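Since the SDKs also accept raw bytes, you can read the image yourself and pass the buffer directly instead of a path. A minimal sketch with the JavaScript SDK, assuming the same placeholder path and `gemma3` model as above:

```javascript
import ollama from 'ollama'
import { readFile } from 'node:fs/promises'

// Read the image into a Buffer (a Uint8Array); the SDK encodes the
// bytes to base64 before sending them to the Ollama server.
const imageBytes = await readFile('/absolute/path/to/image.jpg')

const response = await ollama.chat({
  model: 'gemma3',
  messages: [
    {
      role: 'user',
      content: 'Describe this image in one sentence.',
      images: [imageBytes],
    },
  ],
  stream: false,
})

console.log(response.message.content)
```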