
Gemma 3 running with Ollama on an NVIDIA Jetson Orin Nano computer

7 min read · Apr 7, 2025

Written by Georgios Soloupis, AI and Android GDE.

The Gemma 3 logo.

In this blog post we delve into running Gemma 3 with Ollama on a Jetson Orin Nano device. This combination offers a compact yet powerful AI setup optimized for fast inference at the edge. Leveraging the Orin Nano's efficient GPU acceleration, Ollama orchestrates the deployment and execution of the lightweight Gemma 3 models, enabling responsive performance for natural language processing tasks. The result is low-latency AI inference without the need for cloud connectivity, ideal for embedded, real-time scenarios where privacy is also a priority.

Jetson Orin Nano developer kit:
The NVIDIA Jetson Orin™ Nano Super Developer Kit is a compact yet powerful computer that redefines generative AI for small edge devices. It provides developers, students, and makers with the most affordable and accessible platform, backed by the support of NVIDIA AI software and a broad AI software ecosystem. With the JetPack 6.1 (rev. 1) update from December 2024, it delivers up to 70% more performance. Guides for the microSD card and SDK Manager setup methods walk you through unboxing, firmware updates, flashing JetPack 6.2, and initial setup to get you ready for AI tutorials and projects.

The Jetson Orin Nano Developer Kit.

Ollama is a lightweight framework designed to run large language models (LLMs) locally with minimal setup. It simplifies deploying and managing models by handling downloading, configuration, and execution behind the scenes. Ollama supports a variety of open-source models and enables fast, offline inference on compatible hardware, making it a great tool for developers looking to integrate AI capabilities into local or edge applications.

The Ollama logo.

Gemma 3 is a collection of lightweight, advanced open models based on the same technology behind Gemini 2.0. With a 128K-token context window, it excels at handling complex tasks and processing large amounts of information. It supports over 140 languages for seamless multilingual communication and can understand and analyze text and images, enabling the creation of powerful, intelligent applications. Gemma models are a safe choice for AI development due to their focus on safety, efficiency, and performance.

Gemma’s logo.

Procedure

Once you’ve flashed the Jetson Orin Nano, the next step is to install Ollama and run the Gemma 3 model. The device runs Ubuntu out of the box, which is incredibly convenient for experimentation and development, providing a flexible and familiar environment for deploying AI workloads.

The Jetson Orin Nano screen, featuring the NVIDIA logo.

Here you can find the basic tutorial for installing Ollama on the computer and starting to experiment. The steps are pretty straightforward:

  1. Install Ollama from the terminal with this simple command:
curl -fsSL https://ollama.com/install.sh | sh
Ollama installed.

It creates a service that runs ollama serve on startup, so you can start using the ollama command right away. A quick check is sketched below the screenshot.

Executing the ollama command to check the installation and list the available flags and subcommands.
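If you want to verify the installation before moving on, the following is a minimal sketch, assuming the installer registered the usual ollama systemd service:

systemctl status ollama   # the service should be reported as active (running)
ollama --version          # prints the installed Ollama version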

2. Run Gemma 3 effortlessly with the run command

ollama run gemma3:4b "Describe the image in 100 words '/home/Downloads/bike.png'"

Ollama takes care of downloading the model, setting it up, and writing the manifests to the computer, then starts working immediately. You can browse the models that are available for immediate use on the Ollama web page.
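You don't have to embed an image path in the prompt; plain text prompts work the same way. The commands below are a small sketch using the standard Ollama CLI, with gemma3:1b as an example tag:

ollama pull gemma3:1b     # download a model without starting a chat
ollama list               # show the models already stored on the device
ollama run gemma3:1b "Suggest three edge AI project ideas for a Jetson Orin Nano."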

3. Not a fan of using the terminal after setup? No problem! Open WebUI lets you interact with Ollama through a clean, browser-based interface, making it easy to run models and manage tasks from your favorite web browser. Run a docker command in the terminal to install Open WebUI.

docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main

Check the local network IP address of the Jetson computer from the terminal by typing:

hostname -I

This will give you something like 192.168.1.92. Use this address in your preferred browser and type:

http://192.168.1.92:8080

Open WebUI opens in your browser and, after a simple sign-up, presents a familiar chat environment for using the LLMs:

The Open WebUI interface on a laptop, running the Gemma 3 4B model offline.

The above address can be used from the browser of any computer in the house that is connected to the same network (Wi-Fi).
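Keep in mind that the docker command above uses --rm, so the container, along with its sign-up data and chat history, is discarded when it stops. The variant below is a sketch that keeps the data in a named volume and restarts the container with the device; the open-webui volume and container names are just examples:

docker run -d --network=host \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main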

*For some commands above you may need to install curl and docker first.

Using the Ollama API

Some devices, such as mobile phones, have to use the Ollama API to gain access to the LLMs. The procedure is also straightforward:

  1. Stop the Ollama server
systemctl stop ollama

2. Set some environment variables

export OLLAMA_MODELS="/usr/share/ollama/.ollama/models/"
export OLLAMA_HOST="0.0.0.0:11434"

The first variable sets the path of the folder where Ollama keeps the downloaded models. The second sets OLLAMA_HOST. When you're using the Ollama CLI or an app that communicates with the Ollama API, it needs to know where to send requests. By default it expects the API to be running at localhost:11434, but if you've configured it differently, for example to be accessible on a different interface or machine, you need to tell the tools where to find it. Note that exported variables only apply to the current shell session; a persistent, systemd-based alternative is sketched after the list below. Why 0.0.0.0?

  • 0.0.0.0 is a special IP that means "listen on all network interfaces."
  • If you’re running Ollama on a server or want it to be accessible from other devices on the same network (or even externally), you use 0.0.0.0 so it’s not just bound to localhost.
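If you want the settings to survive a reboot, a common alternative on Ubuntu is to add them to the ollama systemd service instead of exporting them in the shell. The sketch below assumes the default service created by the installer; with this approach you restart the service rather than running ollama serve manually:

sudo systemctl edit ollama.service
# In the editor that opens, add:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0:11434"
# Environment="OLLAMA_MODELS=/usr/share/ollama/.ollama/models/"
sudo systemctl daemon-reload
sudo systemctl restart ollama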

3. Start the Ollama server

ollama serve

You’re now ready to use the Ollama API from any device on the same network!
Simply connect to the server at http://192.168.1.92:11434. Whether you're on a laptop, mobile phone, or any other device with browser or API access, just use this IP and port to interact with the Ollama API.
Remember to use hostname -I to check your host machine's IP address; the port 11434 stays the same. For example, your address might be http://188.123.1.34:11434.
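Before wiring up a client, you can quickly check that the server is reachable from another machine. The call below is a sketch against the version endpoint of the Ollama API, using the example IP from this post:

curl http://192.168.1.92:11434/api/version
# returns a small JSON object with the server version, e.g. {"version":"..."}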

Examples

Using the terminal you can ping the Ollama server easily. First check that the port is listening with the command:

nc -vz 192.168.1.92 11434

Then use the curl command:

curl http://192.168.1.92:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "Why is my cat not eating?",
  "stream": false
}'
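The reply is a JSON object whose generated text is in the response field. If jq is installed on your machine, a quick sketch for printing just the text looks like this:

curl -s http://192.168.1.92:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "Why is my cat not eating?",
  "stream": false
}' | jq -r '.response'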

Want to do more with the API? Check the Ollama API documentation for examples of how to get structured outputs or send a request with images. An image request looks like the one below, and a minimal structured-output sketch follows it.

curl http://192.168.1.92:11434/api/generate -d '{
"model": "gemma3:4b",
"prompt":"What is in this picture?",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF
169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}'
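For structured outputs, the generate endpoint also accepts a format field. The request below is a sketch that constrains the model to valid JSON; the prompt and key names are only illustrative:

curl http://192.168.1.92:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "List two facts about the Jetson Orin Nano as JSON with the keys fact1 and fact2. Respond using JSON.",
  "format": "json",
  "stream": false
}'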

Android setup

Once you have the curl command and its parameters ready, it's straightforward to convert it into an equivalent Retrofit API call for use in an Android app. Check this app on GitHub if you want to follow along. The most important parts of the code are:

  1. The interface uses the @Streaming annotation so results appear on screen as the model generates the output.
interface ApiStreamingService {
    @POST("api/generate")
    @Streaming
    suspend fun generate(
        @Body request: Any
    ): Response<ResponseBody>
}

2. The network module. Hilt dependency injection is used in the app.

@Module
@InstallIn(SingletonComponent::class)
object NetworkModule {
    // On the host machine run "hostname -I" to check the IP.
    // In my case it was 192.168.1.92.
    // The port for the Jetson Orin Nano is 11434.
    // Since we use plain http for the local server, set android:usesCleartextTraffic="true" in the manifest.
    private const val BASE_URL = "http://192.168.1.92:11434/"

    @Provides
    @Singleton
    fun provideRetrofit(): Retrofit {
        val okHttpClient = OkHttpClient.Builder()
            .connectTimeout(60, TimeUnit.SECONDS)
            .readTimeout(60, TimeUnit.SECONDS)
            .writeTimeout(60, TimeUnit.SECONDS)
            .build()

        return Retrofit.Builder()
            .baseUrl(BASE_URL)
            .client(okHttpClient)
            .addConverterFactory(GsonConverterFactory.create())
            .build()
    }

    // Use for non-streaming responses.
    /*@Provides
    @Singleton
    fun provideApiService(retrofit: Retrofit): ApiService =
        retrofit.create(ApiService::class.java)*/

    @Provides
    @Singleton
    fun provideApiService(retrofit: Retrofit): ApiStreamingService =
        retrofit.create(ApiStreamingService::class.java)
}

3. The processStream() function inside the view model.

private suspend fun processStream(responseBody: ResponseBody) {
    // Wrap the byte stream with a BufferedReader.
    responseBody.byteStream().bufferedReader().use { reader: BufferedReader ->
        while (true) {
            val line = reader.readLine() ?: break
            // Update the Compose state on the main thread.
            withContext(Dispatchers.Main) {
                Log.v("streaming_", line)
                _serverResult.value += JsonParser.parseResponse(line)
                updateJetsonIsWorking(false)
            }
        }
    }
}

You can customize the app to use any of the Gemma 3 models from the Ollama model garden!

Conclusion
In this blog post we explored how to run Gemma 3 with Ollama on the NVIDIA Jetson Orin Nano, a compact, powerful edge AI computer. This setup enables fast, local inference without cloud dependency, ideal for real-time applications. The Orin Nano leverages GPU acceleration to run the lightweight Gemma 3 models efficiently via Ollama, which simplifies deployment. Users can interact through the terminal or a browser-based WebUI, and the system also supports API access for remote devices. With support for over 140 languages and image understanding, Gemma 3 is powerful, well suited to edge use cases, and a safe choice for AI development thanks to its focus on safety, efficiency, and performance.

Written by Georgios Soloupis

Pharmacist turned Android and AI Google Developer Expert. Right now I am rocking with Envision and working on accessibility at the Zolup browser.
