Running Gemma 3 with Ollama on an NVIDIA Jetson Orin Nano computer
Written by Georgios Soloupis, AI and Android GDE.
In this blog post we delve into running Gemma 3 with Ollama on a Jetson Orin Nano device. This combination offers a compact yet powerful AI setup optimized for fast inference at the edge. Leveraging the Orin Nano’s efficient GPU acceleration, Ollama orchestrates the deployment and execution of the lightweight Gemma 3 models, enabling responsive performance for natural language processing tasks. The result is low-latency AI inference without the need for cloud connectivity, ideal for embedded, real-time scenarios where privacy is also a priority.
Jetson Orin Nano developer kit:
The NVIDIA Jetson Orin™ Nano Super Developer Kit is a compact yet powerful computer that redefines generative AI for small edge devices. It provides developers, students, and makers with the most affordable and accessible platform, backed by the support of NVIDIA AI software and a broad AI software ecosystem. With the JetPack 6.1 (rev. 1) update from December 2024, it delivers up to 70% more performance. Guides exist for both the microSD card and SDK Manager setup paths, walking you through unboxing, firmware updates, flashing JetPack 6.2, and initial setup to get you ready for AI tutorials and projects.
Ollama is a lightweight framework designed to run large language models (LLMs) locally with minimal setup. It simplifies deploying and managing models by handling downloading, configuration, and execution behind the scenes. Ollama supports a variety of open-source models and enables fast, offline inference on compatible hardware, making it a great tool for developers looking to integrate AI capabilities into local or edge applications.
Gemma 3 is a collection of lightweight, advanced open models based on the same technology behind Gemini 2.0. With a 128K-token context window, it excels at handling complex tasks and processing large amounts of information. It supports over 140 languages for seamless multilingual communication and can understand and analyze text and images, enabling the creation of powerful, intelligent applications. Gemma models are a safe choice for AI development due to their focus on safety, efficiency, and performance.
Procedure
Once you’ve flashed the Jetson Orin Nano, the next step is to install Ollama and run the Gemma 3 model. The device runs Ubuntu out of the box, which is incredibly convenient for experimentation and development, providing a flexible and familiar environment for deploying AI workloads.
Here you can find the basic tutorial for installing Ollama on the computer and starting to experiment. The steps are pretty straightforward:
1. Install Ollama from the terminal with this simple command
curl -fsSL https://ollama.com/install.sh | sh
It creates a service that runs ollama serve on startup, so you can start using the ollama command right away.
2. Run Gemma 3 effortlessly with the run command
ollama run gemma3:4b "Describe the image in 100 words '/home/Downloads/bike.png'"
Ollama takes care of downloading the model, setting it up, and writing the manifests to the computer, and starts working immediately. You can browse the models that are available for immediate use on this Ollama web page.
3. Not a fan of using the terminal after setup? No problem! Open WebUI lets you interact with Ollama through a clean, browser-based interface, making it easy to run models and manage tasks from your favorite web browser. Run a docker command in the terminal to install Open WebUI.
docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main
Check the local host IP address of the Jetson computer on the terminal by typing:
hostname -I
This will give you something like 192.168.1.92. Use this address when you open your browser of choice and type:
http://192.168.1.92:8080
Open WebUI opens in your browser, and after a simple sign-up a familiar chat environment is ready to help you use the LLMs:
The above address can be used from any browser on any computer in the house that is connected to the same network (Wi-Fi).
*For some of the commands above you may need to install curl and docker first.
Using the Ollama API
Some devices, such as mobile phones, have to use the Ollama API to gain access to the LLMs. The procedure is also straightforward:
1. Stop the Ollama server
systemctl stop ollama
2. Set some environment variables
export OLLAMA_MODELS="/usr/share/ollama/.ollama/models/"
export OLLAMA_HOST="0.0.0.0:11434"
The first variable sets the path to the folder where Ollama keeps the downloaded models. The second sets OLLAMA_HOST. When you’re using the Ollama CLI or an app that communicates with the Ollama API, it needs to know where to send requests. By default, the API runs at localhost:11434, but if you've configured it differently, for example to be accessible on a different interface or machine, you need to tell the tools where to find it. Why 0.0.0.0? It is a special IP address that means "listen on all network interfaces." If you’re running Ollama on a server or want it to be accessible from other devices on the same network (or even externally), you bind to 0.0.0.0 so the API is not limited to localhost.
3. Start the Ollama server
ollama serve
You’re now ready to use the Ollama API from any device on the same network!
Simply connect to the server at http://192.168.1.92:11434. Whether you're on a laptop, mobile phone, or any other device with browser or API access, just use this IP and port to interact with the Ollama API. Remember to use hostname -I to check your host machine's IP address; the port 11434 stays the same. For example, your address might be http://188.123.1.34:11434.
Examples
Using the terminal you can ping the Ollama server easily. First check that the port is listening with the command:
nc -vz 192.168.1.92 11434
Then use the curl command:
curl http://192.168.1.92:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "Why is my cat not eating?",
  "stream": false
}'
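Because stream is set to false, the server replies with a single JSON object whose response field holds the generated text. If you prefer to test the endpoint from code rather than curl, here is a minimal Kotlin sketch of the same request, assuming OkHttp 4.x is on the classpath (replace the IP with your Jetson's address from hostname -I):
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody

fun main() {
    val client = OkHttpClient()
    // Same JSON body as the curl example above.
    val json = """{"model": "gemma3:1b", "prompt": "Why is my cat not eating?", "stream": false}"""
    val request = Request.Builder()
        .url("http://192.168.1.92:11434/api/generate") // your Jetson's IP and port
        .post(json.toRequestBody("application/json".toMediaType()))
        .build()
    // Blocking call: fine for a quick test, not for a UI thread.
    client.newCall(request).execute().use { response ->
        println(response.body?.string())
    }
}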
Want to do more with the API? Check the documentation of the Ollama API for examples of how to get structured outputs or send a request with images.
curl http://192.168.1.92:11434/api/generate -d '{
"model": "gemma3:4b",
"prompt":"What is in this picture?",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF
169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}'
Android setup
Once you have the curl command and its parameters ready, it's straightforward to convert it into an equivalent Retrofit API call for use in an Android app. Check this app on GitHub if you want to follow along. The most important parts of the code are:
1. The interface, using the @Streaming annotation so we can show results on screen as the model generates the output. A possible typed request body is sketched right after the interface.
interface ApiStreamingService {
    @POST("api/generate")
    @Streaming
    suspend fun generate(
        @Body request: Any
    ): Response<ResponseBody>
}
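The generate() function above accepts Any as its body. A typed request class keeps call sites honest; here is a minimal sketch matching the JSON fields used in the curl examples. The class name and defaults are my own, not necessarily those of the linked app, and Gson serializes the field names as written:
// Sketch of a request body for /api/generate. Hypothetical names;
// adapt to the classes actually used in the GitHub app.
data class GenerateRequest(
    val model: String,               // e.g. "gemma3:4b"
    val prompt: String,
    val stream: Boolean = true,      // true -> newline-delimited JSON chunks
    val images: List<String>? = null // optional base64-encoded images
)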
2. The network module. The app uses Hilt for dependency injection.
@Module
@InstallIn(SingletonComponent::class)
object NetworkModule {

    // On the host machine do a "hostname -I" to check the IP
    // In my case it was 192.168.1.92
    // Port for Jetson Orin Nano is 11434
    // Since we use http for the local server, use android:usesCleartextTraffic="true" in the manifest
    private const val BASE_URL = "http://192.168.1.92:11434/"

    @Provides
    @Singleton
    fun provideRetrofit(): Retrofit {
        val okHttpClient = OkHttpClient.Builder()
            .connectTimeout(60, TimeUnit.SECONDS)
            .readTimeout(60, TimeUnit.SECONDS)
            .writeTimeout(60, TimeUnit.SECONDS)
            .build()

        return Retrofit.Builder()
            .baseUrl(BASE_URL)
            .client(okHttpClient)
            .addConverterFactory(GsonConverterFactory.create())
            .build()
    }

    // Use for non-streaming.
    /*@Provides
    @Singleton
    fun provideApiService(retrofit: Retrofit): ApiService =
        retrofit.create(ApiService::class.java)*/

    @Provides
    @Singleton
    fun provideApiService(retrofit: Retrofit): ApiStreamingService =
        retrofit.create(ApiStreamingService::class.java)
}
3. The processStream() function inside the view model.
private suspend fun processStream(responseBody: ResponseBody) {
    // Wrap the byte stream with a BufferedReader.
    responseBody.byteStream().bufferedReader().use { reader: BufferedReader ->
        while (true) {
            val line = reader.readLine() ?: break
            // Update the Compose state on the main thread.
            withContext(Dispatchers.Main) {
                Log.v("streaming_", line)
                _serverResult.value += JsonParser.parseResponse(line)
                updateJetsonIsWorking(false)
            }
        }
    }
}
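The JsonParser.parseResponse() helper is not shown above. When streaming, Ollama sends one JSON object per line, each carrying a fragment of the reply in its response field, so a minimal sketch could look like this (using the org.json classes from the Android SDK; the real implementation in the app may differ):
import org.json.JSONObject

object JsonParser {
    // Each streamed line is a standalone JSON object; return its "response"
    // fragment, or an empty string if the line cannot be parsed.
    fun parseResponse(line: String): String =
        runCatching { JSONObject(line).optString("response") }.getOrDefault("")
}
Tying it together, a hypothetical call site inside the same view model could send a typed request (see the GenerateRequest sketch above) and hand the streaming body to processStream():
// Illustrative only: apiStreamingService is the injected ApiStreamingService.
viewModelScope.launch(Dispatchers.IO) {
    val response = apiStreamingService.generate(
        GenerateRequest(model = "gemma3:1b", prompt = "Why is my cat not eating?")
    )
    if (response.isSuccessful) {
        response.body()?.let { processStream(it) }
    }
}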
You can customize the app to use any of the Gemma 3 models available in the Ollama model library!
Conclusion
In this blog post we explored how to run Gemma 3 with Ollama on the NVIDIA Jetson Orin Nano, a compact yet powerful edge AI computer. This setup enables fast, local inference without cloud dependency, ideal for real-time applications. The Orin Nano leverages GPU acceleration to run the lightweight Gemma 3 models efficiently via Ollama, which simplifies deployment. Users can interact through the terminal or the browser-based Open WebUI, and the system also supports API access from remote devices. With support for over 140 languages and image understanding, Gemma 3 is powerful, optimized for edge use cases, and a safe choice for AI development thanks to its focus on safety, efficiency, and performance.