
CLIP Indonesian

Galuh Sahid
December 04, 2021


PyCon ID 2021



Transcript

  1. Galuh Sahid | Dec 4, 2021
    CLIP-Indonesian
    A Contrastive Language–Image Pre-training (CLIP) model trained on Indonesian data


  2. Outline
    • High-level overview of CLIP

    • Building CLIP-Indonesian

    • Introducing JAX

    • Environment setup

    • Datasets

    • Code

    • Monitoring

    • Experiments

    • Demo


  3. Slides
    https://bit.ly/pycon-clip-indonesian


    GitHub repository
    https://github.com/galuhsahid/clip-indonesian
    Still a work in progress, so may not give the best performance (yet) :)


  4. High-level overview of CLIP: how to connect images and text with CLIP


  5. https://openai.com/blog/clip/


  6. What CLIP does
    Image classification task
    https://openai.com/blog/clip/



  8. What CLIP does
    Image search
    https://cloud.google.com/blog/topics/developers-practitioners/image-search-natural-language-queries


  9. How does CLIP work?
    Encoders
    The CLIP model consists of two encoders:

    • a text encoder that embeds text as vectors in a shared embedding space
        • Examples: BERT, RoBERTa

    • an image encoder that embeds images as vectors in the same space
        • Examples: Vision Transformer (ViT)
    https://openai.com/blog/clip/


  10. How does CLIP work?
    Measuring how good our model is
    https://openai.com/blog/clip/
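
As a rough illustration of the kind of objective used to measure how well the two encoders agree, the sketch below scores every image against every caption in a batch and applies a symmetric cross-entropy loss, so that matching pairs get high similarity and mismatched pairs get low similarity. The function names, the toy 2-D embeddings, and the temperature of 0.07 are assumptions made for this example; this is not code from CLIP-Indonesian.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_matrix(image_embeds, text_embeds, temperature=0.07):
    """Entry [i][j] is the scaled cosine similarity between image i and text j."""
    images = [normalize(v) for v in image_embeds]
    texts = [normalize(v) for v in text_embeds]
    return [
        [sum(a * b for a, b in zip(image, text)) / temperature for text in texts]
        for image in images
    ]


def cross_entropy(logits_row, target):
    """Numerically stable cross-entropy for one row of logits."""
    peak = max(logits_row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits_row))
    return log_sum - logits_row[target]


def clip_loss(image_embeds, text_embeds, temperature=0.07):
    """Average of the image-to-text and text-to-image losses over the batch."""
    logits = similarity_matrix(image_embeds, text_embeds, temperature)
    n = len(logits)
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [list(column) for column in zip(*logits)]
    text_to_image = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


# Toy batch: pair i of images and captions belongs together.
images = [[1.0, 0.0], [0.0, 1.0]]
matched = clip_loss(images, [[0.9, 0.1], [0.1, 0.9]])
shuffled = clip_loss(images, [[0.1, 0.9], [0.9, 0.1]])
print(matched, shuffled)
```

With correctly paired captions the loss is close to zero, while swapping the captions makes it much larger, which is the signal the training process minimizes.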



  12. How does CLIP work?
    Zero-shot prediction
    https://openai.com/blog/clip/
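
Zero-shot prediction can be sketched in a few lines once embeddings are available: embed one text prompt per candidate label, compare each with the image embedding, and turn the similarities into probabilities with a softmax. The embeddings below are hand-written toy vectors standing in for real encoder outputs, and the function name, prompts, and temperature are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embed, prompt_embeds, temperature=0.07):
    """Return a probability for each prompt given one image embedding."""
    prompts = list(prompt_embeds)
    logits = [cosine(image_embed, prompt_embeds[p]) / temperature for p in prompts]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {prompt: value / total for prompt, value in zip(prompts, exps)}


# Toy stand-ins for text-encoder outputs of three Indonesian prompts.
prompt_embeds = {
    "foto seekor kucing": [0.9, 0.1, 0.0],  # "a photo of a cat"
    "foto seekor anjing": [0.1, 0.9, 0.0],  # "a photo of a dog"
    "foto sebuah mobil": [0.0, 0.1, 0.9],   # "a photo of a car"
}
image_embed = [0.8, 0.2, 0.1]  # toy stand-in for an image-encoder output

probs = zero_shot_classify(image_embed, prompt_embeds)
best = max(probs, key=probs.get)
print(best, probs)
```

No classifier is trained here: changing the label set only means changing the prompts.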


  13. Fun 😭 fact: the original CLIP was trained on 400
    million image-text pairs. According to the CLIP paper,
    the largest ResNet variant took 18 days on 592 V100 GPUs
    and the largest ViT variant took 12 days on 256 V100 GPUs.

  14. So... how can we build our own CLIP?


  15. Building CLIP-Indonesian


  16. Building CLIP-Indonesian
    What we need
    • Computing resources

    • Dataset

    • Code, compute-intensive NLP+CV

    • Monitoring system


  17. Building CLIP-Indonesian
    What we need
    • Computing resources → TPU Research Cloud

    • Dataset → A large image-text pairs dataset in Indonesian

    • Code, compute-intensive NLP+CV → Flax/JAX + HuggingFace

    • Monitoring system → Weights & Biases


  18. Computing resources


  19. Computing resources
    Signing up to TPU Research Cloud (https://sites.research.google/trc/about/)
    • Free TPU v2-8 and v3-8 device(s)!

    • Participants in the TRC program will be expected to share their TRC-supported
    research with the world through peer-reviewed publications, open source code,
    blog posts, or other means.


  20. Computing resources
    Signing up to TPU Research Cloud (https://sites.research.google/trc/about/)


  21. Computing resources
    Setting up the TPU VM
    • There are two ways to set up the TPU VM: the GUI (https://console.cloud.google.com/) or the CLI

    • Tip: for projects requiring large datasets, you might need to set up persistent disks
        • Calculate how much storage you'll need (in GB)

        • The zone of the disks must be the same as the zone of the TPU VM

        • Persistent disks are not covered by TRC, but you can pay for them with your GCP free trial credits ($300)


  22. Computing resources
    Setting up the TPU VM
    1. Creating the first persistent disk
    $ gcloud compute disks create clip-indonesian-disk-1 \
        --size 300GB \
        --zone europe-west4-a \
        --type pd-balanced

    2. Creating the second persistent disk
    $ gcloud compute disks create clip-indonesian-disk-2 \
        --size 300GB \
        --zone europe-west4-a \
        --type pd-balanced


  23. Computing resources
    Setting up the TPU VM
    3. Setting up the TPU VM
    $ gcloud alpha compute tpus tpu-vm create clip-indonesian \
        --zone=europe-west4-a \
        --version=v2-alpha \
        --accelerator-type="v3-8" \
        --data-disk source=projects/clip-indonesian/zones/europe-west4-a/disks/clip-indonesian-disk-1 \
        --data-disk source=projects/clip-indonesian/zones/europe-west4-a/disks/clip-indonesian-disk-2
    Complete instructions for setting up a TPU VM with persistent disks can be found here.
    (Don't forget to mount your disks as described in those instructions!)

    4. SSH to your TPU VM
    $ gcloud alpha compute tpus tpu-vm ssh clip-indonesian \
        --zone=europe-west4-a


  24. https://github.com/galuhsahid/clip-indonesian/tree/master/data


  25. Building the Indonesian dataset
    The original CLIP model was trained on 400M image-text pairs. Is such data
    a) available to the public and b) in Indonesian?

    • Answer: it wasn't readily available, but with a little bit of work we can get some
    decent data :)


  26. What datasets are we using to build CLIP-Indonesian?
    Name        Count (Train)*   Count (Validation)*   Original dataset   Translated annotations
    CC12M       9,480,140        650,000               Link               Link
    CC3M        2,520,816        300,000               Link               Link
    COCO 2017   108,285          10,000                Link               Link
    Flickr8k    5,670            800                   Link               Link
    WiT         89,610           9,000                 Link               Dataset is already in Indonesian (filter by lang = id)
    Total       12,204,521       969,800
    *) Excludes broken images, SVGs, and images that could not be downloaded. For WiT, also excludes image-text pairs whose captions consist of at least 80% proper nouns.


  27. Building the Indonesian dataset
    What are the readily available image-text pairs datasets?
    • Conceptual 12M (CC12M): ~12 million image-text pairs; covers a much more diverse
    set of visual concepts than CC3M. Language: English

    • Conceptual Captions (CC3M): 3 million images, paired with natural-language captions,
    collected from the web. The raw descriptions were extracted from the alt-text HTML
    attribute of each image. Language: English

    • COCO (Microsoft Common Objects in Context): 328K images. A large-scale object detection,
    segmentation, key-point detection, and captioning dataset. Language: English


  28. Building the Indonesian dataset
    What are the readily available image-text pairs datasets?
    • Wikipedia-based Image Text (WIT): a large multimodal multilingual dataset
    (including Indonesian!)

    • Flickr 8k: 8,000 images that are each paired with five different captions which
    provide clear descriptions of the salient entities and events. Language: English


  29. Building the Indonesian dataset
    General step-by-step


  30. Building the Indonesian dataset
    General step-by-step
    1. Translate the dataset to Indonesian (except for the WiT dataset)
    File name        Original caption / Indonesian caption
    1000092795.jpg   Two blonde haired youths looked at their hands while hanging out in the courtyard.
                     Dua pemuda berambut lusuh melihat tangan mereka sambil nongkrong di halaman.
    1000092795.jpg   Two young, white boys were outside near a bunch of bushes.
                     Dua anak muda, laki-laki kulit putih berada di luar dekat banyak semak.
    1000092795.jpg   Two men in green shirts are standing in the courtyard.
                     Dua pria berkemeja hijau berdiri di halaman.
    1000092795.jpg   A man in a blue shirt stands in the park.
                     Seorang pria dengan kemeja biru berdiri di taman.
    1000092795.jpg   Two friends enjoying time spent together.
                     Dua teman menikmati waktu yang dihabiskan bersama.


  31. Building the Indonesian dataset
    General step-by-step
    1. Translate the dataset to Indonesian (except for the WiT dataset)
    Repository: https://github.com/acul3/translated-dataset

    Available datasets in Indonesian:

    • Flickr30

    • Coco (2017 train)

    • Sub Caption

    • VizWiz train

    • CC3M

    • CC12M


  32. Building the Indonesian dataset
    General step-by-step
    1. Translate the dataset to Indonesian (except for the WiT dataset)
    $ pip install mariantranslate

    from mariantranslate import Translator

    lang_from = "en"  # source language
    lang_to = "id"  # target language
    en_id_translator = Translator(lang_from, lang_to)

    en_id_translator.translate("Due to the limited vegetation cover of the Faroe Islands, it is relatively easy to follow the history of geology.")
    # Output: Karena tumbuhan terbatas menutupi Kepulauan Faroe, relatif mudah untuk
    # mengikuti sejarah geologi.
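
Datasets such as Flickr8k pair one image with several captions, and large caption files often contain repeated sentences, so a translation pass can benefit from caching. The sketch below is one possible way to translate a two-column TSV (file name, caption) while translating each distinct caption only once. The helper name `translate_captions`, the column layout, and the stand-in `fake_translate` function are assumptions for this example; in practice the `translate` argument could be a method such as `en_id_translator.translate` from the snippet above.

```python
import csv
import io


def translate_captions(tsv_text, translate):
    """Return (file_name, original, translated) rows, translating each caption once."""
    cache = {}
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for file_name, caption in reader:
        if caption not in cache:
            cache[caption] = translate(caption)
        rows.append((file_name, caption, cache[caption]))
    return rows


# Stand-in translator so the example runs without downloading a model.
calls = []


def fake_translate(text):
    calls.append(text)
    return f"[id] {text}"


sample = (
    "a.jpg\tTwo friends enjoying time spent together.\n"
    "b.jpg\tTwo friends enjoying time spent together.\n"
)
rows = translate_captions(sample, fake_translate)
print(rows, len(calls))
```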



  33. Building the Indonesian dataset
    General step-by-step
    2. Download the images


  34. Building the Indonesian dataset
    General step-by-step
    2. Download the images (link to complete code)
    import os
    import sys

    import contexttimer
    import pandas as pd
    import requests
    from PIL import Image

    # Collect the URLs that need to be downloaded
    with contexttimer.Timer(prefix="Loading from tsv"):
        df = pd.read_csv(sys.argv[1], delimiter='\t', header=None)

    url_to_idx_map = {url: index for index, caption, url in df.itertuples()}
    base_dir = os.path.join(os.getcwd(), sys.argv[2])

    # Download one image and save it as an RGB JPEG
    def process(item):
        url, image_id = item
        base_url = os.path.basename(url)  # last path component of the URL
        stem, ext = os.path.splitext(base_url)  # split into stem and extension
        filename = f'{image_id:08d}---{stem}.jpg'  # create filename
        filepath = os.path.join(base_dir, filename)  # concat to get filepath
        if not os.path.isfile(filepath):  # skip images that already exist
            req = requests.get(url, stream=True, timeout=1, verify=False).raw
            image = Image.open(req).convert('RGB')
            image.save(filepath)  # save PIL image


  35. Building the Indonesian dataset
    General step-by-step
    2. Download the images (link to complete code)
    • Tip: since downloading all the images might take a while, it can be
    beneficial to implement multiprocessing
        • Multiprocessing makes downloading faster; however, approximately 20% of
        the images will be lost.

        • This might not be a problem for large datasets such as CC3M and CC12M, but
        it is a problem for WiT, which only has ~100k image-caption pairs in Indonesian.

        • Thus for WiT, we download the images without multiprocessing in order
        to preserve as many images as possible.
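
One way to keep most of the speed-up while losing fewer images is to record which downloads failed in the parallel pass and retry only those sequentially. The sketch below illustrates that idea with a thread pool from the standard library (a swap from the multiprocessing mentioned above, chosen here because downloads are I/O-bound). The function names and the `fetch` callable are hypothetical, and whether a retry pass would recover the images lost in the talk's runs is an assumption, not a measured result.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(items, fetch, workers=8):
    """Call fetch(item) in parallel and return the items whose fetch raised."""

    def attempt(item):
        try:
            fetch(item)
            return item, True
        except Exception:
            return item, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(attempt, items))
    return [item for item, ok in results if not ok]


def download_with_retry(items, fetch, workers=8):
    """Parallel first pass, then one sequential retry for the failures."""
    still_failed = []
    for item in download_all(items, fetch, workers):
        try:
            fetch(item)
        except Exception:
            still_failed.append(item)
    return still_failed


# Demo with a fake fetch that fails the first time it sees an even item.
seen = set()


def flaky_fetch(item):
    if item % 2 == 0 and item not in seen:
        seen.add(item)
        raise IOError("simulated timeout")


leftover = download_with_retry(list(range(10)), flaky_fetch, workers=4)
print(leftover)
```

In a real downloader, `fetch` would wrap a function like `process` from the previous slide, and the items that still fail after the retry could be written to a log for inspection.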


  36. Building the Indonesian dataset
    General step-by-step
    2. Download the images (link to complete code)
    python downloaders/cc12m.py

    python downloaders/cc12m.py datasets/cc12m/cc12m.tsv datasets/cc12m/images

    • Tip: build a command-line interface for your script to make the program
    easier to run; you just need to define the input and output

    • This way you can write a shell script that replicates the procedure
    automatically
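
The scripts shown in this talk read `sys.argv` directly. A minimal sketch of the same two-argument interface using `argparse` from the standard library could look like the following; the argument names and the optional `--timeout` flag are assumptions for illustration and are not part of the repository's scripts.

```python
import argparse


def parse_args(argv=None):
    """Parse the input TSV path and the output image directory."""
    parser = argparse.ArgumentParser(
        description="Download the images listed in a TSV file."
    )
    parser.add_argument("input_tsv", help="TSV file with captions and image URLs")
    parser.add_argument("output_dir", help="directory where images are saved")
    # Hypothetical option, shown only to illustrate an optional flag.
    parser.add_argument("--timeout", type=float, default=1.0,
                        help="per-request timeout in seconds")
    return parser.parse_args(argv)


# Equivalent to: python downloaders/cc12m.py datasets/cc12m/cc12m.tsv datasets/cc12m/images
args = parse_args(["datasets/cc12m/cc12m.tsv", "datasets/cc12m/images"])
print(args.input_tsv, args.output_dir, args.timeout)
```

Compared with indexing `sys.argv`, this gives a `--help` message for free and fails with a clear error when an argument is missing.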


  37. Building the Indonesian dataset
    General step-by-step
    3. Preprocess the dataset (link to complete code)
    • We need to process the datasets so that they all have the same format

    • The code that we will be using accepts JSON lines (jsonl) files as input

    • The scripts in the `/preprocessors` folder convert JSON or .tsv files (depending on the dataset) into JSON lines files.

    • Each dataset will have separate `train` and `val` files.

    • In summary, the preprocessing script does the following:
        • Separates the training and validation datasets

        • Converts the original dataset (still in .csv or .tsv) into a common jsonl file

    • At the end we will have files like cc12m_train.json, cc12m_val.json, cc3m_train.json, cc3m_val.json, etc., all following the
    same format:
    {"image_path": "29374927984.jpeg", "captions": "Buah di atas meja"}
    {"image_path": "34875339282.jpeg", "captions": "Orang sedang berlari di pantai"}
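
The conversion described above can be sketched with the standard library alone. The example below assumes a two-column TSV with the image path first and the caption second (the real datasets may order their columns differently) and a hypothetical helper name `tsv_to_jsonl`; it returns the training and validation lines in the JSON lines format shown above.

```python
import csv
import io
import json


def tsv_to_jsonl(tsv_text, n_val):
    """Convert TSV rows to JSON lines and hold out the last n_val rows for validation."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    lines = [
        json.dumps({"image_path": image_path, "captions": caption}, ensure_ascii=False)
        for image_path, caption in reader
    ]
    split = max(len(lines) - n_val, 0)  # index where the validation part starts
    return lines[:split], lines[split:]


sample = (
    "29374927984.jpeg\tBuah di atas meja\n"
    "34875339282.jpeg\tOrang sedang berlari di pantai\n"
)
train_lines, val_lines = tsv_to_jsonl(sample, n_val=1)
print("\n".join(train_lines))
print("\n".join(val_lines))
```

Computing the split index explicitly avoids the pitfall of `lines[:-0]`, which would return an empty training set when no validation rows are requested.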


  38. Building the Indonesian dataset
    General step-by-step
    3. Preprocess the dataset (link to complete code). Sample: COCO dataset
    import collections
    import json

    # annotation_file, images_dir and output_file are the script's three inputs

    # Parse the caption and image path
    with open(annotation_file, "r") as f:
        annotations = json.load(f)["annotations"]

    image_path_to_caption = collections.defaultdict(list)
    for element in annotations:
        caption = element["caption"].lower().rstrip(".")
        image_path = images_dir + "/%012d.jpg" % (element["image_id"])
        image_path_to_caption[image_path].append(caption)

    # Convert to the JSON lines format
    lines = []
    for image_path, captions in image_path_to_caption.items():
        lines.append(json.dumps({"image_path": image_path, "captions": captions}))

    # Separate into training and validation
    train_lines = lines[:-10_001]
    valid_lines = lines[-10_001:]

    # Write into separate training and validation files
    with open(output_file + "_train.json", "w") as f:
        f.write("\n".join(train_lines))
    with open(output_file + "_val.json", "w") as f:
        f.write("\n".join(valid_lines))


  39. Building the Indonesian dataset
    General step-by-step
    3. Preprocessing (sample: COCO dataset)
    python preprocessors/coco.py

    python preprocessors/coco.py datasets/coco/coco_captions_train2017.json datasets/coco/images datasets/coco/coco_dataset

    • The script will output two files: coco_dataset_train.json and
    coco_dataset_val.json


  40. Building the Indonesian dataset
    Additional steps for the WiT dataset
    • Tip: There are many different kinds of preprocessing that you can do to get a
    high-quality dataset


  41. Building the Indonesian dataset
    Additional steps for the WiT dataset (source code)
    3. Remove image-text pairs that consist mostly of proper nouns
    import sys

    import pandas as pd
    from nltk.tag import CRFTagger

    # Load the part-of-speech (POS) tagger
    ct = CRFTagger()
    ct.set_model_file('all_indo_man_tag_corpus_model.crf.tagger')

    # Load data
    df = pd.read_csv(sys.argv[1], delimiter='\t')
    df = df[["caption_reference_description", "image_url"]]

    # Command-line arguments are strings, so convert the threshold before comparing
    threshold = float(sys.argv[3])

    def drop_propn(text):
        try:
            if len(text) == 0:
                return True
            tokens = text.split()
            result = ct.tag_sents([tokens])

            # Calculate the share of tokens tagged as proper nouns (NNP)
            nnp_cnt = 0
            total = len(result[0])
            for x in result[0]:
                if x[1] == "NNP":
                    nnp_cnt += 1

            if (nnp_cnt / total) >= threshold:
                return True
            return False
        except Exception as e:
            # Captions that cannot be processed (e.g. missing values) are dropped too
            print(e)
            return True

    df["to_drop"] = df["caption_reference_description"].apply(drop_propn)

    # Only keep image-caption pairs where to_drop is False
    df = df[df["to_drop"] == False]
    df = df.drop("to_drop", axis=1)
    df.to_csv(sys.argv[2], sep='\t')


  42. Building the Indonesian dataset
    Merging them all together
    awk 1 cc12m_dataset_disk1_train.json cc12m_dataset_disk2_train.json \
        cc3m_dataset_train.json coco_dataset_train.json flickr8k_dataset_train.json \
        wit_dataset_train.json > train_dataset_v6.json

    awk 1 cc12m_dataset_disk1_val.json cc12m_dataset_disk2_val.json cc3m_dataset_val.json \
        coco_dataset_val.json flickr8k_dataset_val.json wit_dataset_val.json \
        > val_dataset_v6.json
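
`awk 1` prints every input line followed by a newline, so files that lack a trailing newline are not glued together at the seams when they are concatenated. A rough Python equivalent, which additionally checks that each line parses as JSON, is sketched below; the helper name `merge_jsonl` and the temporary demo files are assumptions for this example.

```python
import json
import os
import tempfile


def merge_jsonl(input_paths, output_path):
    """Concatenate JSON lines files, writing exactly one record per line."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in input_paths:
            with open(path, encoding="utf-8") as source:
                for line in source:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    json.loads(line)  # raises ValueError on a malformed record
                    out.write(line + "\n")
                    count += 1
    return count


# Demo: two small files written without a trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    part_a = os.path.join(tmp, "a_train.json")
    part_b = os.path.join(tmp, "b_train.json")
    with open(part_a, "w", encoding="utf-8") as f:
        f.write('{"image_path": "1.jpg", "captions": "Buah di atas meja"}')
    with open(part_b, "w", encoding="utf-8") as f:
        f.write('{"image_path": "2.jpg", "captions": "x"}\n'
                '{"image_path": "3.jpg", "captions": "y"}')
    merged_path = os.path.join(tmp, "train.json")
    merged_count = merge_jsonl([part_a, part_b], merged_path)
    with open(merged_path, encoding="utf-8") as f:
        merged_lines = f.read().splitlines()

print(merged_count, merged_lines)
```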


  43. https://github.com/galuhsahid/clip-indonesian


  44. https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects/hybrid_clip


  45. Code
    Overview
    • Uses the JAX/Flax backend

    • Is a vision-text dual encoder model using pre-trained vision and text
    encoders
        • For the image encoder, we use Vision Transformer (ViT), more
        specifically openai/clip-vit-base-patch32.

        • For the text encoder, we experimented with two models: IndoBERT Large
        (indobenchmark/indobert-base-p2) and Indonesian RoBERTa Base
        (flax-community/indonesian-roberta-base).

    • The CLIP-Indonesian model uses a modified version of the HybridCLIP code


  46. Code
    Intro to JAX/Flax: JAX motivating example
    • JAX: a framework that is specifically suited for machine learning research

    Classic NumPy: what's missing?
    • Running on accelerated hardware (GPU/TPU)

    • Fast optimization via automatic differentiation

    • Parallelization of data and computation
    https://www.youtube.com/watch?v=WdTeDXsOSj4


  47. Code
    Intro to JAX/Flax: JAX motivating example
    Classic Numpy
    Replace numpy with jax.numpy
    https://www.youtube.com/watch?v=WdTeDXsOSj4


  48. Code
    Intro to JAX/Flax: JAX motivating example
    Classic Numpy
    https://www.youtube.com/watch?v=WdTeDXsOSj4
    Apply jax.grad


  49. Code
    Intro to JAX/Flax: JAX motivating example
    Classic Numpy
    https://www.youtube.com/watch?v=WdTeDXsOSj4
    Apply jax.vmap


  50. Code
    Intro to JAX/Flax: JAX motivating example
    Classic Numpy
    https://www.youtube.com/watch?v=WdTeDXsOSj4
    Apply Just in Time (JIT) compilation


  51. Code
    Intro to JAX/Flax: JAX motivating example
    Classic Numpy
    https://www.youtube.com/watch?v=WdTeDXsOSj4
    Apply pmap
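
As a compact illustration of the transformations named on these slides (`jax.numpy`, `jax.grad`, `jax.vmap`, `jax.jit`, and `jax.pmap`), here is a minimal sketch. The toy `predict` and `loss` functions and all the values are assumptions made for this example; this is not the code from the referenced talk.

```python
import jax
import jax.numpy as jnp


def predict(params, x):
    """A tiny model written with jax.numpy instead of numpy."""
    w, b = params
    return jnp.tanh(jnp.dot(x, w) + b)


def loss(params, x, y):
    """Squared error for a single example (returns a scalar)."""
    return (predict(params, x) - y) ** 2


params = (jnp.array([0.5, -0.25]), jnp.array(0.1))

# jax.grad: gradient of the loss with respect to the first argument (params).
grad_fn = jax.grad(loss)

# jax.vmap: vectorize over a batch of (x, y) pairs while sharing params.
per_example_grads = jax.vmap(grad_fn, in_axes=(None, 0, 0))

# jax.jit: compile the whole function with XLA.
fast_grads = jax.jit(per_example_grads)

xs = jnp.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
ys = jnp.zeros(3)
gw, gb = fast_grads(params, xs, ys)

# jax.pmap: run across devices; the leading axis must equal the device count.
n_devices = jax.local_device_count()
sharded = jax.pmap(lambda x: predict(params, x))(jnp.ones((n_devices, 2)))

print(gw.shape, gb.shape, sharded.shape)
```

On a TPU v3-8, `jax.local_device_count()` would typically report eight devices, so the `pmap` call would process eight shards in parallel; on a laptop CPU it usually reports one.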


  52. Code
    Intro to JAX/Flax: JAX examples in the wild
    https://www.youtube.com/watch?v=WdTeDXsOSj4


  53. Code
    Intro to JAX/Flax: What is Flax?
    • Flax: a deep learning framework built on top of JAX

    • Contains all the elements you usually encounter in deep learning frameworks:
        • Neural network API (flax.linen): Dense, Conv, {Batch|Layer|Group} Norm, Attention, Pooling, {LSTM|GRU} Cell, Dropout

        • Optimizers (flax.optim): SGD, Momentum, Adam, LARS, Adagrad, LAMB, RMSprop

        • Utilities and patterns: replicated training, serialization and checkpointing, metrics, prefetching on device

        • And more!
    https://github.com/google/flax


  54. Code
    FlaxHybridCLIP code from HuggingFace
    https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects/hybrid_clip


  55. Code
    FlaxHybridCLIP code from HuggingFace
    https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects/hybrid_clip
    python run_hybrid_clip.py \
        --output_dir ${MODEL_DIR} \
        --text_model_name_or_path="roberta-base" \
        --vision_model_name_or_path="openai/clip-vit-base-patch32" \
        --tokenizer_name="roberta-base" \
        --train_file="coco_dataset/train_dataset.json" \
        --validation_file="coco_dataset/validation_dataset.json" \
        --do_train --do_eval \
        --num_train_epochs="40" --max_seq_length 96 \
        --per_device_train_batch_size="64" \
        --per_device_eval_batch_size="64" \
        --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
        --overwrite_output_dir \
        --preprocessing_num_workers 32 \
        --push_to_hub



  56. Code
    Prior work: clip-italian
    https://arxiv.org/pdf/2108.08688.pdf


  57. Code
    Modifications: Image augmentation
    1. Code for image augmentation (source code; docs for torchvision transforms; based on clip-italian)
    import torch
    from torchvision.transforms import (
        ColorJitter, ConvertImageDtype, InterpolationMode, Normalize, RandomAffine,
        RandomAutocontrast, RandomCrop, RandomEqualize, RandomHorizontalFlip,
        RandomPerspective, Resize,
    )

    # Inside the transform module's __init__:
    self.transforms = torch.nn.Sequential(
        Resize([image_size], interpolation=InterpolationMode.BICUBIC),
        RandomCrop([image_size], pad_if_needed=True, padding_mode="edge"),
        ColorJitter(hue=0.1),
        RandomHorizontalFlip(),
        RandomAffine(
            degrees=15,
            translate=(0.1, 0.1),
            scale=(0.8, 1.2),
            shear=(-15, 15, -15, 15),
            interpolation=InterpolationMode.BILINEAR,
            fill=127,
        ),
        RandomPerspective(
            distortion_scale=0.3,
            p=0.3,
            interpolation=InterpolationMode.BILINEAR,
            fill=127,
        ),
        RandomAutocontrast(p=0.3),
        RandomEqualize(p=0.3),
        ConvertImageDtype(torch.float),
        Normalize(
            (0.48145466, 0.4578275, 0.40821073),
            (0.26862954, 0.26130258, 0.27577711),
        ),
    )



  58. Code
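    Modifications: Better optimizer
    2. Optimizer (source code; based on clip-italian)
    optimizer = optax.chain(
        optax.adaptive_grad_clip(0.01, eps=0.001),  # adaptive gradient clipping
        optax.scale_by_belief(),  # AdaBelief update scaling
        optax.scale_by_schedule(decay_lr_schedule_fn),  # learning-rate schedule
        optax.scale(-1.0),  # negate the update so the loss is minimized
    )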



  59. Code
    Modifications: backbone freezing
    3. Backbone Freezing (source code; based on clip-italian)
    image_embeds = vision_outputs[1]
    if self.freeze_backbones:
        # Block gradients so the vision encoder's weights are not updated
        image_embeds = jax.lax.stop_gradient(image_embeds)
    image_embeds = self.visual_projection(image_embeds)

    text_embeds = text_outputs[1]
    if self.freeze_backbones:
        # Block gradients so the text encoder's weights are not updated
        text_embeds = jax.lax.stop_gradient(text_embeds)
    text_embeds = self.text_projection(text_embeds)
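
To see what `jax.lax.stop_gradient` does in isolation, the sketch below builds a toy two-layer model with a "backbone" matrix and a "projection" matrix and compares the gradients with and without freezing. The parameter names, shapes, and the `forward` function are assumptions for this example and are much simpler than the actual encoders.

```python
import jax
import jax.numpy as jnp


def forward(params, x, freeze_backbone):
    """Backbone features followed by a projection, reduced to a scalar."""
    features = jnp.tanh(x @ params["backbone"])
    if freeze_backbone:
        # Gradients stop here, so the backbone receives a zero gradient.
        features = jax.lax.stop_gradient(features)
    return jnp.sum(features @ params["projection"])


params = {
    "backbone": jnp.ones((3, 4)),
    "projection": jnp.ones((4, 2)),
}
x = jnp.ones((5, 3))

frozen_grads = jax.grad(forward)(params, x, True)
trainable_grads = jax.grad(forward)(params, x, False)

frozen_backbone_norm = float(jnp.abs(frozen_grads["backbone"]).sum())
frozen_projection_norm = float(jnp.abs(frozen_grads["projection"]).sum())
trainable_backbone_norm = float(jnp.abs(trainable_grads["backbone"]).sum())
print(frozen_backbone_norm, frozen_projection_norm, trainable_backbone_norm)
```

With freezing enabled only the projection layer receives a non-zero gradient, which is why a frozen-backbone run updates only the projection layers that sit on top of the pre-trained encoders.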



  60. Code
    Running the script
    #!/bin/bash

    SCRIPT_DIR=clip-indonesian
    MODEL_DIR=/mnt/disks/data-1/models/training_indobert
    IMAGE_ENCODER="openai/clip-vit-base-patch32"
    TEXT_ENCODER="indobenchmark/indobert-base-p2"

    python ${SCRIPT_DIR}/run_hybrid_clip.py \
        --output_dir ${MODEL_DIR} \
        --overwrite_output_dir \
        --tokenizer_name=${TEXT_ENCODER} \
        --train_file="../data/train_dataset_v6.json" \
        --validation_file="../data/val_dataset_v6.json" \
        --do_train --do_eval \
        --num_train_epochs="10" --max_seq_length 96 \
        --per_device_train_batch_size="64" \
        --per_device_eval_batch_size="64" \
        --learning_rate="0.00005" --warmup_ratio 0.1 --weight_decay 0.0 \
        --preprocessing_num_workers 16 \
        --exp_name training_v3 \
        --text_model_name_or_path=${TEXT_ENCODER} \
        --vision_model_name_or_path=${IMAGE_ENCODER} \
        --eval_steps 500 \
        --logging_steps 100 \
        --save_steps 500 \
        --save_total_limit 5 \
        --adabelief \
        --freeze_backbones



  61. Monitoring
    Setting up Weights & Biases



  63. Monitoring
    Setting up Weights & Biases
    Enabling wandb (source code)
    # Enable wandb
    if jax.process_index() == 0 and args.log_wandb:
        try:
            import wandb  # imported here so that a missing package is caught below

            wandb.init(
                name=args.exp_name,
                entity="galuh",
                project="clip-Indonesian",
                sync_tensorboard=True
            )
            wandb.config.update(training_args)
            wandb.config.update(model_args)
            wandb.config.update(data_args)
        except ImportError as e:
            print(e)


  64. Experiments
    • All image encoders are OpenAI ViT

    • Interestingly, in terms of validation loss, IndoBERT Large and RoBERTa Base
    do not differ much


  65. Demo
    Zero-shot image classification


    Image search on Unsplash25k dataset


  66. Future improvements
    • Text/caption augmentation

    • Quantitative evaluation (e.g. MRR, accuracy)

    • Web app demo
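
For the quantitative evaluation item, mean reciprocal rank (MRR) is straightforward to compute once each text query has a ranked list of retrieved images. The sketch below assumes one relevant image per query and uses made-up query and image names; it is not an evaluation that was run for CLIP-Indonesian.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the relevant item; a query scores 0 if it is missing."""
    total = 0.0
    for query, ranked_items in rankings.items():
        target = relevant[query]
        if target in ranked_items:
            total += 1.0 / (ranked_items.index(target) + 1)
    return total / len(rankings)


# Hypothetical retrieval results: query -> images ordered by similarity.
rankings = {
    "kucing di sofa": ["img_a", "img_b", "img_c"],   # relevant image ranked 1st
    "anjing di pantai": ["img_b", "img_a", "img_c"], # relevant image ranked 2nd
    "mobil merah": ["img_c", "img_b"],               # relevant image not retrieved
}
relevant = {
    "kucing di sofa": "img_a",
    "anjing di pantai": "img_a",
    "mobil merah": "img_x",
}
mrr = mean_reciprocal_rank(rankings, relevant)
print(mrr)
```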


  67. Takeaway
    It's possible to do a project that requires a lot of computation and data on your own
    • Computing resources -> TPU Research Cloud

    • Dataset -> A large image-text pairs dataset in Indonesian

    • Code, compute-intensive NLP+CV -> Flax/JAX + HuggingFace

    • Monitoring system -> Weights & Biases


  68. References
    Bianchi, F., Attanasio, G., Pisoni, R., Terragni, S., Sarti, G., & Lakshmi, S. (2021). Contrastive Language-Image Pre-training for
    the Italian Language. arXiv preprint arXiv:2108.08688.

    Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G.,
    & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.

    Wilie, B., Vincentio, K., Winata, G. I., Cahyawijaya, S., Li, X., Lim, Z. Y., ... & Purwarianti, A. (2020). IndoNLU: Benchmark and
    resources for evaluating Indonesian natural language understanding. arXiv preprint arXiv:2009.05387.

    Hybrid CLIP by the HuggingFace team

    Indonesian RoBERTa Base by Wilson Wongso, Steven Limcorn, Samsul Rahmadani, and Chew Kok Wah

    Indonesian Translated Datasets by Samsul Rahmadani


  69. Thank you!
    email: [email protected]
    linkedin: linkedin.com/in/galuhsahid
