LLM on Android with Keras and TensorFlow Lite

Wesley Kambale
December 02, 2023

Using GPT-2 and Keras, we built a large language model and ran it on an Android device with TensorFlow Lite.


Transcript

  1. What are LLMs? Large language models (LLMs) are ML models
     trained on large amounts of text data to generate outputs for various
     NLP tasks, such as text generation, question answering, and machine
     translation. LLMs are deep neural networks based on the Transformer
     architecture, introduced by Google researchers in 2017. Examples:
     Google's LaMDA and PaLM, OpenAI's GPT-2.
  2. What is KerasNLP? KerasNLP is a natural language processing library
     that supports users through the entire development cycle of building
     language models. KerasNLP comes with pre-trained models such as GPT-2
     and is supported in the TensorFlow ecosystem for deployment to mobile
     devices with TensorFlow Lite.
  3. The work plan…
     - Model building
     - Model conversion
     - Quantization
     - Android app integration
  4. import keras_nlp

     # Load the GPT-2 tokenizer from the pre-trained preset
     gpt2_tokenizer = keras_nlp.models.GPT2Tokenizer.from_preset(
         "gpt2_base_en")

     # The preprocessor tokenizes prompts and pads them to sequence_length
     gpt2_preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
         "gpt2_base_en",
         sequence_length=256,
         add_end_token=True,
     )

     # The pre-trained GPT-2 causal language model itself
     gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
         "gpt2_base_en",
         preprocessor=gpt2_preprocessor,
     )
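
     A quick sanity check that the Keras model generates text before any
     conversion (a minimal sketch; the prompt and max_length value are
     illustrative, not from the slides):

     # Generate up to 200 tokens from an example prompt with the Keras model
     output = gpt2_lm.generate("My trip to Kampala was", max_length=200)
     print(output)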

  5. import numpy as np
     import tensorflow_text as tf_text
     from tensorflow.lite.python import interpreter

     def run_inference(input, generate_tflite):
         # TF Text ops are not built into TFLite, so register them as
         # custom ops when creating the interpreter
         interp = interpreter.InterpreterWithCustomOps(
             model_content=generate_tflite,
             custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS,
         )
         interp.get_signature_list()

         # Run the model's serving signature on the input prompt
         generator = interp.get_signature_runner('serving_default')
         output = generator(prompt=np.array([input]))
         return output

  6. Note: TF Text ops are not built-in ops in the TFLite runtime, so we
     need to register these custom ops for the interpreter to run inference
     on this model. The run_inference() helper accepts an input prompt and
     the converted model, and generates text through the generator()
     signature runner defined above.
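
     The next two slides use a concrete_func that the deck never defines. A
     minimal sketch of one way to obtain it, following the TensorFlow
     codelab linked in the resources (the generate() wrapper and the
     max_length of 100 are assumptions):

     import tensorflow as tf

     # Trace the Keras model's generate() call into a tf.function so the
     # TFLite converter can consume it as a concrete function
     @tf.function
     def generate(prompt, max_length):
         return gpt2_lm.generate(prompt, max_length)

     concrete_func = generate.get_concrete_function(
         tf.TensorSpec([], tf.string), 100)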
  7. # Disable XLA compilation before conversion
     gpt2_lm.jit_compile = False
     converter = tf.lite.TFLiteConverter.from_concrete_functions(
         [concrete_func],
         gpt2_lm)

     """
     Code…
     """

     converter._experimental_guarantee_all_funcs_one_use = True
     generate_tflite = converter.convert()
     run_inference("I'm at DevFest Kampala", generate_tflite)
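
     The slides don't show how the converted model reaches the Android app.
     A common approach, and the one used in the codelab, is to write the
     flatbuffer to disk and bundle it as an app asset (the
     autocomplete.tflite filename is taken from the codelab, not the
     slides):

     # Save the converted model so it can be shipped in the app's assets
     with open('autocomplete.tflite', 'wb') as f:
         f.write(generate_tflite)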

  8. # Same conversion flow, this time producing a quantized model
     gpt2_lm.jit_compile = False
     converter = tf.lite.TFLiteConverter.from_concrete_functions(
         [concrete_func],
         gpt2_lm)

     """
     Code…
     """

     converter._experimental_guarantee_all_funcs_one_use = True
     quant_generate_tflite = converter.convert()
     run_inference("I'm at DevFest Kampala", quant_generate_tflite)
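
     The elided block above is where quantization would be configured. A
     plausible sketch based on the dynamic-range quantization used in the
     TensorFlow codelab (these specific flags are assumptions, not shown in
     the slides):

     # Allow select TF ops that have no built-in TFLite equivalent, and
     # enable default (dynamic-range) quantization to shrink the model
     converter.target_spec.supported_ops = [
         tf.lite.OpsSet.TFLITE_BUILTINS,
         tf.lite.OpsSet.SELECT_TF_OPS,
     ]
     converter.allow_custom_ops = True
     converter.optimizations = [tf.lite.Optimize.DEFAULT]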
 

  9. Resources Because these slides clearly aren’t enough to become an
     ‘LLM Pro’:
     - Colab Notebook
     - Codelab
     - TensorFlow tutorial