• Language Model: ESM-1B[2], a contextual language model trained without supervision on large protein sequence datasets to reconstruct sequences in which amino acids have been masked.
  ◦ Although the model cannot observe protein structure directly, it observes sequence patterns that are determined by structure.
  ◦ The model spans a representation space that reflects structural knowledge.
• Selected Antibody Structure Prediction Model: DeepH3, which learns to predict inter-residue distances and angles.
  ◦ It is “hooked” to the second-to-last layer of ESM-1B, which contains a richer representation, encoding not only the underlying amino acid sequence but also features related to structure.
• Datasets:
  ◦ Unlabeled dataset: the UniProt Archive (UniParc)[3], with approximately 250 million sequences.
  ◦ Labeled dataset: SAbDab[4], which contains all structure-labeled antibody sequences in the Protein Data Bank. After pre-processing, 1433 sequences were selected.
  ◦ Test set: the Rosetta antibody benchmark dataset[5], comprising 49 curated antibody targets.

[2] Rives A, et al., PNAS (2020)
[3] Leinonen R, et al., Bioinformatics (2004)
[4] Dunbar J, et al., Nucleic Acids Res. (2014)
[5] Marze NA, et al., Prot. Eng. Des. Selection (2016)
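
The masked-reconstruction objective described for the language model can be illustrated with a small sketch. This is not ESM-1B's actual preprocessing: the function name `mask_sequence`, the `<mask>` token string, and the 15% masking rate are assumptions chosen for illustration only. The sketch hides a random subset of residues and records the originals, which are what a model would be trained to recover.

```python
import random

MASK = "<mask>"  # hypothetical mask token, for illustration only


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of residues in an amino acid sequence.

    Returns (tokens, targets): tokens is the sequence as a list with
    some positions replaced by MASK; targets maps each masked position
    to the residue that was hidden there.
    mask_rate=0.15 is an assumed value, not taken from the source.
    """
    rng = random.Random(seed)
    # Mask at least one residue, but never more than the sequence holds.
    n_mask = min(len(seq), max(1, round(len(seq) * mask_rate)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))

    tokens = list(seq)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]  # the residue the model must reconstruct
        tokens[i] = MASK
    return tokens, targets


def restore(tokens, targets):
    """Fill masked positions back in from the recorded targets."""
    return "".join(targets.get(i, tok) for i, tok in enumerate(tokens))


if __name__ == "__main__":
    fragment = "EVQLVESGGGLVQPGGSLRL"  # arbitrary example fragment
    tokens, targets = mask_sequence(fragment)
    print(" ".join(tokens))
    print(targets)
```

During training, the loss would be computed only at the positions listed in `targets`, so the model has to infer each hidden residue from its surrounding sequence context.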