Using the PaLM API inside Android

George Soloupis
4 min read · Sep 27, 2023


Written by George Soloupis, ML and Android GDE.

This is an implementation showcasing the seamless integration of Google's Speech Recognizer and the PaLM API within a single Android application. Users can communicate effortlessly through their Android device's microphone: the SpeechRecognizer component translates their spoken words into text, while the PaLM API summarizes the content within a contextual framework. As a result, the user can trigger various UI actions without ever physically touching the phone. For example, the user can describe the ambient noise in their surroundings, and the phone responds intelligently by decreasing or increasing the volume accordingly.

Setting up the Speech Recognizer

Using the Speech Recognizer inside Android is pretty straightforward.
First we create the speechRecognizer and the recognitionIntent:

private var speechRecognizer: SpeechRecognizer? = null
private val recognitionIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)

speechRecognizer = SpeechRecognizer.createSpeechRecognizer(context)
recognitionIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
recognitionIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US")
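Before starting recognition, the app also needs the RECORD_AUDIO runtime permission (declared in the manifest) and should confirm that a recognition service exists on the device. A minimal sketch of such a check, with an illustrative helper name:

```kotlin
import android.Manifest
import android.content.Context
import android.content.pm.PackageManager
import android.speech.SpeechRecognizer
import androidx.core.content.ContextCompat

// Returns true when it is safe to call startListening():
// a recognition service is available and the microphone permission is granted.
fun canStartRecognition(context: Context): Boolean {
    val serviceAvailable = SpeechRecognizer.isRecognitionAvailable(context)
    val micGranted = ContextCompat.checkSelfPermission(
        context, Manifest.permission.RECORD_AUDIO
    ) == PackageManager.PERMISSION_GRANTED
    return serviceAvailable && micGranted
}
```

The manifest entry is the standard `<uses-permission android:name="android.permission.RECORD_AUDIO" />`.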

ACTION_RECOGNIZE_SPEECH: starts an activity that prompts the user for speech and sends it through a speech recognizer.
EXTRA_LANGUAGE_MODEL: informs the recognizer which speech model to prefer when performing ACTION_RECOGNIZE_SPEECH.
LANGUAGE_MODEL_FREE_FORM: a language model based on free-form speech recognition.
EXTRA_LANGUAGE: an optional language tag, for example "en-US".
Check the RecognizerIntent documentation for more recognitionIntent options.

Following this, setting a RecognitionListener on the speechRecognizer object is enough to extract the text content from the spoken voice. Check these lines of code in the GitHub example.
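The listener is a RecognitionListener implementation; most of its callbacks can stay empty for this use case. A minimal sketch along the lines of the GitHub example:

```kotlin
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.SpeechRecognizer

val listener = object : RecognitionListener {
    override fun onReadyForSpeech(params: Bundle?) {}
    override fun onBeginningOfSpeech() {}
    override fun onRmsChanged(rmsdB: Float) {}
    override fun onBufferReceived(buffer: ByteArray?) {}
    override fun onEndOfSpeech() {}
    override fun onPartialResults(partialResults: Bundle?) {}
    override fun onEvent(eventType: Int, params: Bundle?) {}

    override fun onError(error: Int) {
        // e.g. SpeechRecognizer.ERROR_NO_MATCH when nothing was recognized
    }

    override fun onResults(results: Bundle?) {
        // The bundle holds an ordered list of recognition hypotheses.
        val matches = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
        val recognizedText = matches?.firstOrNull() // best hypothesis
    }
}

speechRecognizer?.setRecognitionListener(listener)
```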

Finally, you can start or stop listening to record the voice:

fun startListening() {
    speechRecognizer?.startListening(recognitionIntent)
}

fun stopListening() {
    speechRecognizer?.stopListening()
}

Inside the onResults override of the speechRecognizer listener we get the results bundle:

override fun onResults(results: Bundle?) {
    // Called when recognition results are available
    val matches = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
    if (!matches.isNullOrEmpty()) {
        val result = matches[0]
    }
}

The "result" is the recognized text that is going to be passed to the PaLM API so we can get the summarization/context of the spoken words.

Setting up the PaLM API

To use the API, you need an API key; you can follow the instructions here to get one. As of this writing, the API is open for use only in specific countries, so users outside these regions have to use a VPN.

Putting everything together is pretty straightforward, and you can follow along with the instructions here. To link the Speech Recognizer with the PaLM API, we pass the text output of the first into the second. The implementation looks like this:

//...implementation for Speech recognizer
override fun onResults(results: Bundle?) {
    // Called when recognition results are available
    val matches = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
    if (!matches.isNullOrEmpty()) {
        val result = matches[0]
        //////////////////////////////////////
        // For the PaLM API.
        // Create the text prompt
        val prompt = createPrompt(result)
        // Send the first request
        val request = createTextRequest(prompt)
        generateText(request)
    }
}

You can find the above lines of code in the main GitHub repository.
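The helper functions createPrompt, createTextRequest, and generateText live in the repository; one possible shape for them, calling the PaLM text endpoint over plain HTTPS, is sketched below. The endpoint and JSON payload follow the public PaLM REST reference for text-bison-001; the function bodies here are illustrative assumptions, not the repository's actual implementation, and the network call must run off the main thread (e.g. inside a coroutine on Dispatchers.IO).

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical helpers mirroring createPrompt / createTextRequest / generateText.
fun createPrompt(textContent: String): String =
    "$textContent means:\n1 Volume up\n2 Volume down\n3 Unidentified"

fun createTextRequest(prompt: String): String {
    // Escape quotes and newlines so the prompt is valid inside a JSON string.
    val escaped = prompt.replace("\"", "\\\"").replace("\n", "\\n")
    return """{"prompt": {"text": "$escaped"}}"""
}

fun generateText(requestBody: String, apiKey: String): String {
    val url = URL(
        "https://generativelanguage.googleapis.com/v1beta2/" +
            "models/text-bison-001:generateText?key=$apiKey"
    )
    val conn = url.openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Content-Type", "application/json")
    conn.outputStream.use { it.write(requestBody.toByteArray()) }
    // Returns the raw JSON response; the generated text sits under "candidates".
    return conn.inputStream.bufferedReader().use { it.readText() }
}
```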

The PaLM API can easily create context. By sending the output of the Speech Recognizer directly to PaLM, you get responses based on your queries:

Responses with unmodified text from the Speech Recognizer.

However, to ensure it responds to our specific areas of interest, we must adjust the text prompt we send to the PaLM API. Specifically, if we want to instruct it to pick only one of three choices based on the given text, we can change the prompt to something like this:

val fullText = "$textContent means:\n" +
    "1 Volume up\n" +
    "2 Volume down\n" +
    "3 Unidentified"

where "textContent" is the response from the Speech Recognizer. Some responses from the PaLM API will be:

Responses with the adjusted prompt sent to the PaLM API.

Based on the number in the PaLM API's output, we can use AudioManager, the Android component in charge of audio, to increase or decrease the volume.
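A sketch of that mapping, assuming the PaLM response begins with the chosen number ("1", "2", or "3"); AudioManager.adjustStreamVolume performs the actual volume change:

```kotlin
import android.content.Context
import android.media.AudioManager

// Maps the "1" / "2" / "3" answer from the PaLM prompt to a volume action.
fun applyVolumeAction(context: Context, palmAnswer: String) {
    val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
    val direction = when (palmAnswer.trim().firstOrNull()) {
        '1' -> AudioManager.ADJUST_RAISE
        '2' -> AudioManager.ADJUST_LOWER
        else -> return // "3 Unidentified" or unexpected output: do nothing
    }
    audioManager.adjustStreamVolume(
        AudioManager.STREAM_MUSIC, direction, AudioManager.FLAG_SHOW_UI
    )
}
```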

Conclusion

This article showcases the integration of Google's Speech Recognizer and the PaLM API into an Android application, enabling users to communicate through voice commands and receive context-aware responses. It explains the setup of the Speech Recognizer, the passing of recognized text to the PaLM API, and the importance of adjusting the input prompt for specific outcomes. The result is an Android app that empowers users to control UI actions, such as volume adjustments, by describing their surroundings.

Written by George Soloupis

I am a pharmacist turned Android developer and machine learning engineer. Right now I work on accessibility for the Zolup browser, and I am an ML & Android GDE.
