Slide 1

Slide 1 text

LLMs on Android with Keras and TensorFlow Lite | Cloud Kampala

Slide 2

Slide 2 text

What are LLMs? Large language models (LLMs) are ML models trained on large text datasets to generate outputs for various natural language processing (NLP) tasks, such as text generation, question answering, and machine translation. LLMs are deep neural networks based on the Transformer architecture, invented by Google researchers in 2017. Examples: Google's LaMDA and PaLM, and OpenAI's GPT-2.

Slide 3

Slide 3 text

What is KerasNLP? KerasNLP is a natural language processing library that supports users through the entire development cycle of building language models. KerasNLP comes with pre-trained models such as GPT-2 and is supported in the TensorFlow ecosystem for deployment to mobile devices with TensorFlow Lite.

Slide 4

Slide 4 text

The work plan… - Model building - Model conversion - Quantization - Android App integration

Slide 5

Slide 5 text

Model building… ● KerasNLP ● GPT-2

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

import keras_nlp

gpt2_tokenizer = keras_nlp.models.GPT2Tokenizer.from_preset("gpt2_base_en")

gpt2_preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
    sequence_length=256,
    add_end_token=True,
)

gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en",
    preprocessor=gpt2_preprocessor,
)


Slide 8

Slide 8 text

Model conversion… ● Keras -> TFLite

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

@tf.function
def generate(prompt, max_length):
    return gpt2_lm.generate(prompt, max_length)

concrete_func = generate.get_concrete_function(tf.TensorSpec([], tf.string), 100)


Slide 11

Slide 11 text

Note: You can also use from_keras_model() from TFLiteConverter to perform the conversion.
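As a hedged sketch of that alternative path: a tiny Dense model stands in for gpt2_lm here, since converting the real GPT-2 also needs the custom-op settings shown on the later slides.

```python
import tensorflow as tf

# Stand-in model; the slides would use gpt2_lm instead.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # regular TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TensorFlow ops where needed
]
tflite_model = converter.convert()  # serialized FlatBuffer bytes
```

The from_concrete_functions() route used on the next slides gives finer control over the traced signature, which is why the codelab prefers it for generation.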

Slide 12

Slide 12 text

from tensorflow.lite.python import interpreter
import numpy as np
import tensorflow_text as tf_text

def run_inference(input, generate_tflite):
    # Load the converted model with the TF Text custom ops registered.
    interp = interpreter.InterpreterWithCustomOps(
        model_content=generate_tflite,
        custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS,
    )
    interp.get_signature_list()

    # Run generation through the model's default signature.
    generator = interp.get_signature_runner('serving_default')
    output = generator(prompt=np.array([input]))
    return output


Slide 13

Slide 13 text

Note: TF Text ops are not built-in ops in the TFLite runtime, so we need to register these custom ops for the interpreter to run inference on this model. The run_inference() helper accepts an input prompt and the converted model, and runs generation through the model's signature runner.

Slide 14

Slide 14 text

gpt2_lm.jit_compile = False
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [concrete_func],
    gpt2_lm,
)

"""
Code…
"""

converter._experimental_guarantee_all_funcs_one_use = True
generate_tflite = converter.convert()
run_inference("I'm at DevFest Kampala", generate_tflite)


Slide 15

Slide 15 text

Quantization… ● Dynamic range ● FP16 ● Full integer
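A hedged sketch of how those three options map onto TFLiteConverter flags. The helper names are mine, not from the slides; full-integer quantization additionally needs a representative dataset for calibration, passed in by the caller.

```python
import tensorflow as tf

def dynamic_range(converter):
    # Weights stored as int8; activations computed in float at runtime.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter

def fp16(converter):
    # Weights stored as float16; roughly halves model size.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    return converter

def full_integer(converter, representative_dataset):
    # Weights and activations in int8; needs sample inputs for calibration.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    return converter
```

Any one of these can be applied to the converter from the previous slides before calling convert(); dynamic range is the usual starting point for LLMs since it needs no calibration data.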

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

gpt2_lm.jit_compile = False
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [concrete_func],
    gpt2_lm,
)

"""
Code…
"""

converter._experimental_guarantee_all_funcs_one_use = True
quant_generate_tflite = converter.convert()
run_inference("I'm at DevFest Kampala", quant_generate_tflite)
 


Slide 18

Slide 18 text

Android integration… ● Android Studio 2022.2.1 or above ● Android device/emulator

Slide 19

Slide 19 text

Let’s see this in play…

Slide 20

Slide 20 text

Resources Because these slides clearly aren’t enough to become an ‘LLM Pro’: - Colab Notebook - Codelab - TensorFlow tutorial

Slide 21

Slide 21 text

Thank you! Any questions? Wesley Kambale @weskambale kambale.dev