Question Answering with KerasNLP

George Soloupis
4 min read · Feb 25, 2024

Written by George Soloupis, ML and Android GDE.

In this blog post, we delve into the task of question answering, where a Machine Learning (ML) model extracts answers from input documents. In this task, the model is given a context (essentially the input document) along with a question about the document’s content. Its objective? To pinpoint the exact span of text within the document that contains the answer. To do this, the model computes two probability distributions over the document’s tokens, indicating the start and end positions of the answer span.
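
To make this concrete, here is a toy sketch (with made-up numbers) of how the two distributions determine the span:

import numpy as np

# Toy start/end probability distributions over six document tokens.
start_probs = np.array([0.05, 0.70, 0.10, 0.05, 0.05, 0.05])
end_probs = np.array([0.05, 0.05, 0.10, 0.65, 0.10, 0.05])

# The predicted answer runs from the most likely start token to the
# most likely end token.
start, end = int(np.argmax(start_probs)), int(np.argmax(end_probs))
print(f"Answer span: tokens {start}..{end}")  # Answer span: tokens 1..3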

KerasNLP is a powerful library for building Natural Language Processing (NLP) applications with Keras. It is versatile and can handle various NLP tasks, including:

  • Text Classification: Categorizing text into different classes, like sentiment analysis or topic labeling.
  • Machine Translation: Converting text from one language to another.
  • Named Entity Recognition: Identifying and classifying named entities like people, locations, and organizations.
  • Text Summarization: Generating concise summaries of lengthy text.
  • Question Answering: Finding answers to questions from a given text corpus.

Let’s dive into the code to see how we can put KerasNLP to work on the question answering task. We are going to use Colab and the BertBackbone model.

First, we install and upgrade the KerasNLP library:

!pip install -q --upgrade keras-nlp

Then we import the libraries we need:

import os
import re
import json
import string

import numpy as np
import tensorflow as tf
import keras
import keras_nlp
import tensorflow_datasets as tfds
from tensorflow.keras import layers

print(tf.__version__)

In this project we use the BertWordPieceTokenizer:

from tokenizers import BertWordPieceTokenizer
from transformers import BertTokenizer

# Save the slow pretrained tokenizer
slow_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
save_path = "bert_base_uncased/"
if not os.path.exists(save_path):
    os.makedirs(save_path)
slow_tokenizer.save_pretrained(save_path)

# Load the fast tokenizer from saved file
tokenizer = BertWordPieceTokenizer("bert_base_uncased/vocab.txt", lowercase=True)

The dataset for this task is SQuAD version 1.1:

train_data_url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json"
train_path = keras.utils.get_file("train.json", train_data_url)
eval_data_url = "https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json"
eval_path = keras.utils.get_file("eval.json", eval_data_url)

Then we need to create the train and evaluation datasets. Follow the full procedure in this Colab notebook; a condensed sketch of the idea is shown below.
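
The notebook does this with helper functions that turn each SQuAD record into model inputs and start/end targets. What follows is a simplified sketch assuming the tokenizer created above; the notebook’s version handles more edge cases (answers lost to tokenization, multiple answers per question, and so on):

import json

import numpy as np

max_len = 512

def create_inputs_targets(path):
    with open(path) as f:
        raw = json.load(f)
    ids, types, mask, starts, ends = [], [], [], [], []
    for article in raw["data"]:
        for para in article["paragraphs"]:
            ctx = tokenizer.encode(para["context"])
            for qa in para["qas"]:
                ans = qa["answers"][0]
                start_char = ans["answer_start"]
                end_char = start_char + len(ans["text"])
                # Map the answer's character span to context token indices,
                # skipping special tokens (whose offsets are zero-length).
                span = [
                    i
                    for i, (s, e) in enumerate(ctx.offsets)
                    if e > s and s >= start_char and e <= end_char
                ]
                qst = tokenizer.encode(qa["question"])
                input_ids = ctx.ids + qst.ids[1:]
                pad = max_len - len(input_ids)
                if not span or pad < 0:
                    continue  # answer unrecoverable or sequence too long
                ids.append(input_ids + [0] * pad)
                types.append(
                    [0] * len(ctx.ids) + [1] * len(qst.ids[1:]) + [0] * pad
                )
                mask.append([1] * len(input_ids) + [0] * pad)
                starts.append(span[0])
                ends.append(span[-1])
    x = [np.array(ids), np.array(types), np.array(mask)]
    y = [np.array(starts), np.array(ends)]
    return x, y

x_train, y_train = create_inputs_targets(train_path)
x_eval, y_eval = create_inputs_targets(eval_path)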

Since the backbone model is not ready for the question answering task out of the box, we have to customize it:


def create_model():
    ## BERT encoder
    encoder = keras_nlp.models.BertBackbone.from_preset("bert_medium_en_uncased")

    ## QA Model
    # max_len is the maximum sequence length (512 in this project).
    token_ids = layers.Input(shape=(max_len,), dtype=tf.int32)
    segment_ids = layers.Input(shape=(max_len,), dtype=tf.int32)
    padding_mask = layers.Input(shape=(max_len,), dtype=tf.int32)

    embedding = encoder(
        inputs={
            "token_ids": token_ids,
            "segment_ids": segment_ids,
            "padding_mask": padding_mask,
        }
    )

    # One dense unit per token gives the start/end logit for each position.
    start_logits = layers.Dense(1, name="start_logit", use_bias=False)(
        embedding["sequence_output"]
    )
    start_logits = layers.Flatten()(start_logits)

    end_logits = layers.Dense(1, name="end_logit", use_bias=False)(
        embedding["sequence_output"]
    )
    end_logits = layers.Flatten()(end_logits)

    start_probs = layers.Activation(keras.activations.softmax)(start_logits)
    end_probs = layers.Activation(keras.activations.softmax)(end_logits)

    model = keras.Model(
        inputs=[token_ids, segment_ids, padding_mask],
        outputs=[start_probs, end_probs],
    )
    # The outputs are already softmax probabilities, hence from_logits=False.
    loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
    optimizer = keras.optimizers.Adam(learning_rate=5e-5)
    model.compile(optimizer=optimizer, loss=[loss, loss])
    return model

model = create_model()
model.summary()

Check the end of this page for the different sizes of the model.
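
This project uses the bert_medium_en_uncased preset, but other BERT sizes can be swapped in (larger presets need more GPU memory and train more slowly). The available preset names can be listed directly:

# List the BERT presets bundled with KerasNLP; pass a different name to
# from_preset() in create_model() to try another model size.
print(list(keras_nlp.models.BertBackbone.presets.keys()))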

We need a custom callback to evaluate the model after each epoch. For this procedure, check the ExactMatch class in the Colab notebook; a simplified sketch follows.
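
This sketch assumes the x_eval/y_eval arrays built during preprocessing. For brevity it compares predicted token indices directly, whereas the notebook’s ExactMatch decodes and normalizes the answer text before comparing:

class ExactMatch(keras.callbacks.Callback):
    def __init__(self, x_eval, y_eval):
        self.x_eval = x_eval
        self.y_eval = y_eval

    def on_epoch_end(self, epoch, logs=None):
        # Predict start/end distributions for the whole eval set.
        pred_start, pred_end = self.model.predict(self.x_eval)
        count = 0
        for i in range(len(pred_start)):
            start = np.argmax(pred_start[i])
            end = np.argmax(pred_end[i])
            if start == self.y_eval[0][i] and end == self.y_eval[1][i]:
                count += 1
        acc = count / len(pred_start)
        print(f"\nepoch={epoch + 1}, exact match score={acc:.2f}")

exact_match_callback = ExactMatch(x_eval, y_eval)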

Finally we start the training:

model.fit(
    x_train,
    y_train,
    epochs=10,  # 10 epochs are recommended
    verbose=1,
    batch_size=16,  # for the medium-size model and a 16GB GPU
    callbacks=[exact_match_callback],
)

When training is finished, we can convert the model to the .tflite format and use it inside an Android application. The code for the conversion is as follows:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the model.
with open('model_bert_medium_512_quant.tflite', 'wb') as f:
    f.write(tflite_model)

You can download the trained and converted file here. This .tflite file corresponds to a medium-size BERT model with a 512-token input sequence length. You can use it directly with this Android application, which is a revamp of this older application that relied on custom .aar libraries and supported only one version of the model.

Inference with TensorFlow Lite

We can now run inference on the model with the TensorFlow Lite Interpreter. First we create the inputs:

max_len = 512
context = "Nikola Tesla (Serbian Cyrillic: 10 July 1856 - 7 January 1943) was a Serbian American inventor, electrical engineer, mechanical engineer, physicist, and futurist best known for his contributions to the design of the modern alternating current (AC) electricity supply system."
question = "In what year did Tesla die?"

tokenized_context = tokenizer.encode(context)
print(tokenized_context)
# Tokenize question
tokenized_question = tokenizer.encode(question)
print(tokenized_question)

# Create inputs
input_ids = tokenized_context.ids + tokenized_question.ids[1:]
token_type_ids = [0] * len(tokenized_context.ids) + [1] * len(
    tokenized_question.ids[1:]
)
attention_mask = [1] * len(input_ids)

# Pad and create attention masks.
# Skip if truncation is needed
padding_length = max_len - len(input_ids)
if padding_length > 0:  # pad
    input_ids = input_ids + ([0] * padding_length)
    attention_mask = attention_mask + ([0] * padding_length)
    token_type_ids = token_type_ids + ([0] * padding_length)

Then we feed the inputs into the Interpreter:

import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="/content/model_bert_medium_512_quant.tflite")

interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
#print(input_details)
#print(output_details)

input_ids = np.array(input_ids, dtype=np.int32)
input_ids = np.reshape(input_ids,(1, input_ids.size))

token_type_ids = np.array(token_type_ids, dtype=np.int32)
token_type_ids = np.reshape(token_type_ids,(1, token_type_ids.size))

attention_mask = np.array(attention_mask, dtype=np.int32)
attention_mask = np.reshape(attention_mask,(1, attention_mask.size))

interpreter.set_tensor(input_details[0]['index'], input_ids)
interpreter.set_tensor(input_details[1]['index'], attention_mask)
interpreter.set_tensor(input_details[2]['index'], token_type_ids)


interpreter.invoke()
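
Note that the code above assumes the interpreter exposes the inputs in the order token ids, attention mask, token type ids. That ordering is a property of the converted file and can vary between conversions, so it is worth printing the input details to confirm which index expects which tensor:

# Inspect the converted model's input layout; the name and shape reveal
# which index corresponds to which input tensor.
for d in interpreter.get_input_details():
    print(d["index"], d["name"], d["shape"], d["dtype"])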

Finally, we print the answer based on the interpreter’s outputs:


output_data_0 = interpreter.get_tensor(output_details[0]['index'])
output_data_1 = interpreter.get_tensor(output_details[1]['index'])

# In this converted model, output 1 holds the start probabilities and
# output 0 the end probabilities; verify the order via output_details.
start_position = tf.argmax(output_data_1, axis=1)
end_position = tf.argmax(output_data_0, axis=1)

# Slice the answer tokens out of the input and decode them back to text.
answer_ids = input_ids[0, int(start_position) : int(end_position) + 1]
answer = tokenizer.decode(answer_ids.tolist())
print(answer)
Result from the model.

Conclusion

In this blog post, we described how to use KerasNLP for question answering tasks. We explained the concept of question answering, introduced KerasNLP, and guided you through the steps of building and deploying a question answering model using KerasNLP and TensorFlow Lite. Check out the full code in this GitHub repository.


George Soloupis

I am a pharmacist turned Android developer and machine learning engineer. Currently I am a senior Android developer at Invisalign, and an ML & Android GDE.