LLM on Android with Keras and TensorFlow Lite

Wesley Kambale
December 02, 2023

Using GPT-2 and Keras, we built a large language model and ran it on an Android device with TensorFlow Lite.


Transcript

  1. What are LLMs? Large language models (LLMs) are ML models
     trained on large amounts of text data to generate outputs for various
     NLP tasks, such as text generation, question answering, and machine
     translation. LLMs are deep neural networks based on the Transformer
     architecture, introduced by Google researchers in 2017. Examples:
     Google's LaMDA and PaLM, OpenAI's GPT-2.
  2. What is KerasNLP? KerasNLP is a natural language processing library
     that supports users through the entire development cycle of building
     language models. KerasNLP comes with pre-trained models such as GPT-2
     and is supported in the TensorFlow ecosystem for deployment to mobile
     devices with TensorFlow Lite.
  3. The work plan…
     - Model building
     - Model conversion
     - Quantization
     - Android app integration
  4. import keras_nlp

     # Load the GPT-2 tokenizer from the pre-trained preset
     gpt2_tokenizer = keras_nlp.models.GPT2Tokenizer.from_preset(
         "gpt2_base_en")

     # The preprocessor tokenizes prompts and pads them to sequence_length
     gpt2_preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
         "gpt2_base_en",
         sequence_length=256,
         add_end_token=True,
     )

     # The pre-trained GPT-2 causal language model itself
     gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
         "gpt2_base_en",
         preprocessor=gpt2_preprocessor,
     )
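
     A quick sanity check that the Keras model generates text before any
     conversion (a minimal sketch; the prompt and max_length value are
     illustrative, not from the slides):

     # Generate up to 200 tokens from an example prompt with the Keras model
     output = gpt2_lm.generate("My trip to Kampala was", max_length=200)
     print(output)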

  5. import numpy as np
     import tensorflow_text as tf_text
     from tensorflow.lite.python import interpreter

     def run_inference(input, generate_tflite):
         # TF Text ops are not built into TFLite, so register them as
         # custom ops when creating the interpreter
         interp = interpreter.InterpreterWithCustomOps(
             model_content=generate_tflite,
             custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS,
         )
         interp.get_signature_list()

         # Run the model's serving signature on the input prompt
         generator = interp.get_signature_runner('serving_default')
         output = generator(prompt=np.array([input]))
         return output

  6. Note: TF Text ops are not built-in ops in the TFLite runtime, so we
     need to register these custom ops for the interpreter to run inference
     on this model. The run_inference() helper accepts an input prompt and
     the converted model, and generates text through the generator()
     signature runner defined above.
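
     The next two slides use a concrete_func that the deck never defines. A
     minimal sketch of one way to obtain it, following the TensorFlow
     codelab linked in the resources (the generate() wrapper and the
     max_length of 100 are assumptions):

     import tensorflow as tf

     # Trace the Keras model's generate() call into a tf.function so the
     # TFLite converter can consume it as a concrete function
     @tf.function
     def generate(prompt, max_length):
         return gpt2_lm.generate(prompt, max_length)

     concrete_func = generate.get_concrete_function(
         tf.TensorSpec([], tf.string), 100)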
  7. # Disable XLA compilation before conversion
     gpt2_lm.jit_compile = False
     converter = tf.lite.TFLiteConverter.from_concrete_functions(
         [concrete_func],
         gpt2_lm)

     """
     Code…
     """

     converter._experimental_guarantee_all_funcs_one_use = True
     generate_tflite = converter.convert()
     run_inference("I'm at DevFest Kampala", generate_tflite)
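
     The slides don't show how the converted model reaches the Android app.
     A common approach, and the one used in the codelab, is to write the
     flatbuffer to disk and bundle it as an app asset (the
     autocomplete.tflite filename is taken from the codelab, not the
     slides):

     # Save the converted model so it can be shipped in the app's assets
     with open('autocomplete.tflite', 'wb') as f:
         f.write(generate_tflite)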

  8. # Same conversion flow, this time producing a quantized model
     gpt2_lm.jit_compile = False
     converter = tf.lite.TFLiteConverter.from_concrete_functions(
         [concrete_func],
         gpt2_lm)

     """
     Code…
     """

     converter._experimental_guarantee_all_funcs_one_use = True
     quant_generate_tflite = converter.convert()
     run_inference("I'm at DevFest Kampala", quant_generate_tflite)
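
     The elided block above is where quantization would be configured. A
     plausible sketch based on the dynamic-range quantization used in the
     TensorFlow codelab (these specific flags are assumptions, not shown in
     the slides):

     # Allow select TF ops that have no built-in TFLite equivalent, and
     # enable default (dynamic-range) quantization to shrink the model
     converter.target_spec.supported_ops = [
         tf.lite.OpsSet.TFLITE_BUILTINS,
         tf.lite.OpsSet.SELECT_TF_OPS,
     ]
     converter.allow_custom_ops = True
     converter.optimizations = [tf.lite.Optimize.DEFAULT]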
 

  9. Resources Because these slides clearly aren’t enough to become an
     ‘LLM Pro’:
     - Colab Notebook
     - Codelab
     - TensorFlow tutorial