Save 37% off PRO during our Black Friday Sale! »

Going Organic - Genomic sequencing in Elasticsearch

Going Organic - Genomic sequencing in Elasticsearch

My presentation from Elasticsearch Boston, given at @hackreduce. It details my journey through a side project - the problems, the pitfalls and finally a locality sensitive hashing algorithm that saved the day.

Deca372d0ae79d8d84f90e1e68bd1618?s=128

Zachary Tong

August 14, 2013
Tweet

Transcript

  1. Going Organic Genomic sequence alignment in elasticsearch Tuesday, August 13,

    13
  2. Going Organic A story of a side project gone horribly

    wrong Tuesday, August 13, 13
  3. you are human. hopefully ^ Tuesday, August 13, 13

  4. you have lots of these 60s ribosomal protein Tuesday, August

    13, 13
  5. this is a bacteria. ʘ‿ʘ hey Tuesday, August 13, 13

  6. he has a bunch of these too 50s ribosomal protein

    Tuesday, August 13, 13
  7. They look pretty similar... 60s ribosomal protein 50s ribosomal protein

    Tuesday, August 13, 13
  8. ...or do they? MAKLTKRMRVIREKVDATKQ YDINEAIALLKELATAKFVE SVDVAVNLGIDARKSDQNVR GATVLPHGTGRSVRVAVFTQ GANAEAAKAAGAELVGMEDL ADQIKKGEMNFDVVIASPDA MRVVGQLGQVLGPRGLMPNP

    KVGTVTPNVAEAVKNAKAGQ VRYRNDKNGIIHTTIGKVDF DADKLKENLEALLVALKKAK PTQAKGVYIKKVSISTTMGA GVAVDQAGLSASVN MAQDQGEKENPMRELRIRKL CLNICVGESGDRLTRAAKVL EQLTGQTPVFSKARYTVRSF GIRRNEKIAVHCTVRGAKAE EILEKGLKVREYELRKNNFS DTGNFGFGIQEHIDLGIKYD PSIGIYGLDFYVVLGRPGFS IADKKRRTGCIGAKHRISKE EAMRWFQQKYDGIILPGK 60s ribosomal protein 50s ribosomal protein Tuesday, August 13, 13
  9. ಠ_ಠ @ZacharyTong Tuesday, August 13, 13

  10. Shhh...don’t tell them I’m a biologist Tuesday, August 13, 13

  11. 60% similar MAKLTKRMRVIREKVDATKQ YDINEAIALLKELATAKFVE SVDVAVNLGIDARKSDQNVR GATVLPHGTGRSVRVAVFTQ GANAEAAKAAGAELVGMEDL ADQIKKGEMNFDVVIASPDA MRVVGQLGQVLGPRGLMPNP KVGTVTPNVAEAVKNAKAGQ

    VRYRNDKNGIIHTTIGKVDF DADKLKENLEALLVALKKAK PTQAKGVYIKKVSISTTMGA GVAVDQAGLSASVN MAQDQGEKENPMRELRIRKL CLNICVGESGDRLTRAAKVL EQLTGQTPVFSKARYTVRSF GIRRNEKIAVHCTVRGAKAE EILEKGLKVREYELRKNNFS DTGNFGFGIQEHIDLGIKYD PSIGIYGLDFYVVLGRPGFS IADKKRRTGCIGAKHRISKE EAMRWFQQKYDGIILPGK 60s ribosomal protein 50s ribosomal protein Tuesday, August 13, 13
  12. Exact Identity Tuesday, August 13, 13

  13. Exact Identity Physical Similarity Tuesday, August 13, 13

  14. Exact Identity Physical Similarity Gaps Tuesday, August 13, 13

  15. Exact Identity Physical Similarity Gaps High Score! Tuesday, August 13,

    13
  16. actual sequence alignment Tuesday, August 13, 13

  17. exact identity Tuesday, August 13, 13

  18. physical similarity Tuesday, August 13, 13

  19. gaps Tuesday, August 13, 13

  20. pairwise alignment problem. is a solved Tuesday, August 13, 13

  21. so, what’s the problem? Tuesday, August 13, 13

  22. pairwise alignment is the problem. Tuesday, August 13, 13

  23. pairwise alignment glacially is slow Tuesday, August 13, 13

  24. “nr” database Tuesday, August 13, 13

  25. 16 Gb flat file “nr” database Tuesday, August 13, 13

  26. 26,000,000 sequences 16 Gb flat file “nr” database Tuesday, August

    13, 13
  27. 26,000,000 sequences 9,600,000,000 residues 16 Gb flat file “nr” database

    Tuesday, August 13, 13
  28. MTTQAPTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGVQISFSDGRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLVKAETAPPNFVQRLQSMTVRQGSQVRLQVRVTGIPTPVVKFYRDGAEIQSSLDFQISQEGDLYSLLIAEAYPEDSGTYSVNATNSVGRATSTAELLVQGEEEVPAKKTKTIVSTAQISESRQTRIEKKIEAHFDARSIATVEMVIDGAAGQQLPHKTPPRIPPKPKSRSPTPPSIAAKAQLARQQSPSPIRHSPSPVRHVRAPTPSPVRSVSPAARISTSPIRSVRSPLLMRKTQA STVATGPEVPPPWKQEGYVASSSEAEMRETTLTTSTQIRTEERWEGRYGVQEQVTISGAAGAAASVSASASYAAEAVATGAKEVKQDADKSAAVATVVAAVDMARVREPVISAVEQTAQRTTTTAVHIQPAQEQVRKEAEKTAVTKVVVAADKAKEQELKSRTKEVITTKQEQMHVTHEQIRKETEKTFVPKVVISAAKAKEQETRISEEITKKQKQVTQEAIRQETEITAASMVVVATAKSTKLETVPGAQEETTTQQDQMHLSYEKIMKETRKTVVPKVIVATPKVKEQDLVSRGREGITTKREQVQITQEKMRKEAEKTALSTIAVATA KAKEQETILRTRETMATRQEQIQVTHGKVDVGKKAEAVATVVAAVDQARVREPREPGHLEESYAQQTTLEYGYKERISAAKVAEPPQRPASEPHVVPKAVKPRVIQAPSETHIKTTDQKGMHISSQIKKTTDLTTERLVHVDKRPRTASPHFTVSKISVPKTEHGYEASIAGSAIATLQKELSATSSAQKITKSVKAPTVKPSETRVRAEPTPLPQFPFADTPDTYKSEAGVEVKKEVGVSITGTTVREERFEVLHGREAKVTETARVPAPVEIPVTPPTLVSGLKNVTVIEGESVTLECHISGYPSPTVTWYREDYQIESSIDFQITFQSG IARLMIREAFAEDSGRFTCSAVNEAGTVSTSCYLAVQVSEEFEKETTAVTEKFTTEEKRFVESRDVVMTDTSLTEEQAGPGEPAAPYFITKPVVQKLVEGGSVVFGCQVGGNPKPHVYWKKSGVPLTTGYRYKVSYNKQTGECKLVISMTFADDAGEYTIVVRNKHGETSASASLLEEADYELLMKSQQEMLYQTQVTAFVQEPKVGETAPGFVYSEYEKEYEKEQALIRKKMAKDTVVVRTYVEDQEFHISSFEERLIKEIEYRIIKTTLEELLEEDGEEKMAVDISESEAVESGFDSRIKNYRILEGMGVTFHCKMSGYPLPKIAWYKDG KRIKHGERYQMDFLQDGRASLRIPVVLPEDEGIYTAFASNIKGNAICSGKLYVEPAAPLGAPTYIPTLEPVSRIRSLSPRSVSRSPIRMSPARMSPARMSPARMSPARMSPGRRLEETDESQLERLYKPVFVLKPVSFKCLEGQTARFDLKVVGRPMPETFWFHDGQQIVNDYTHKVVIKEDGTQSLIIVPATPSDSGEWTVVAQNRAGRSSISVILTVEAVEHQVKPMFVEKLKNVNIKEGSRLEMKVRATGNPNPDIVWLKNSDIIVPHKYPKIRIEGTKGEAALKIDSTVSQDSAWYTATAINKAGRDTTRCKVNVEVEFAEPEPERKL IIPRGTYRAKEIAAPELEPLHLRYGQEQWEEGDLYDKEKQQKPFFKKKLTSLRLKRFGPAHFECRLTPIGDPTMVVEWLHDGKPLEAANRLRMINEFGYCSLDYGVAYSRDSGIITCRATNKYGTDHTSATLIVKDEKSLVEESQLPEGRKGLQRIEELERMAHEGALTGVTTDQKEKQKPDIVLYPEPVRVLEGETARFRCRVTGYPQPKVNWYLNGQLIRKSKRFRVRYDGIHYLDIVDCKSYDTGEVKVTAENPEGVIEHKVKLEIQQREDFRSVLRRAPEPRPEFHVHEPGKLQFEVQKVDRPVDTTETKEVVKLKRAERITHEKVPE ESEELRSKFKRRTEEGYYEAITAVELKSRKKDESYEELLRKTKDELLHWTKELTEEEKKALAEEGKITIPTFKPDKIELSPSMEAPKIFERIQSQTVGQGSDAHFRVRVVGKPDPECEWYKNGVKIERSDRIYWYWPEDNVCELVIRDVTAEDSASIMVKAINIAGETSSHAFLLVQAKQLITFTQELQDVVAKEKDTMATFECETSEPFVKVKWYKDGMEVHEGDKYRMHSDRKVHFLSILTIDTSDAEDYSCVLVEDENVKTTAKLIVEGAVVEFVKELQDIEVPESYSGELECIVSPENIEGKWYHNDVELKSNGKYTITSRRGRQNLT VKDVTKEDQGEYSFVIDGKKTTCKLKMKPRPIAILQGLSDQKVCEGDIVQLEVKVSLESVEGVWMKDGQEVQPSDRVHIVIDKQSHMLLIEDMTKEDAGNYSFTIPALGLSTSGRVSVYSVDVITPLKDVNVIEGTKAVLECKVSVPDVTSVKWYLNDEQIKPDDRVQAIVKGTKQRLVINRTHASDEGPYKLIVGRVETNCNLSVEKIKIIRGLRDLTCTETQNVVFEVELSHSGIDVLWNFKDKEIKPSSKYKIEAHGKIYKLTVLNMMKDDEGKYTFYAGENMTSGKLTVAGGAISKPLTDQTVAESQEAVFECEVANPDSKGEWLRDG KHLPLTNNIRSESDGHKRRLIIAATKLDDIGEYTYKVATSKTSAKLKVEAVKIKKTLKNLTVTETQDAVFTVELTHPNVKGVQWIKNGVVLESNEKYAISVKGTIYSLRIKNCAIVDESVYGFRLGRLGASARLHVETVKIIKKPKDVTALENATVAFEVSVSHDTVPVKWFHKSVEIKPSDKHRLVSERKVHKLMLQNISPSDAGEYTAVVGQLECKAKLFVETLHITKTMKNIEVPETKTASFECEVSHFNVPSMWLKNGVEIEMSEKFKIVVQGKLHQLIIMNTSTEDSAEYTFVCGNDQVSATLTVTPIMITSMLKDINAEEKDTITF EVTVNYEGISYKWLKNGVEIKSTDKCQMRTKKLTHSLNIRNVHFGDAADYTFVAGKATSTATLYVEARHIEFRKHIKDIKVLEKKRAMFECEVSEPDITVQWMKDDQELQITDRIKIQKEKYVHRLLIPSTRMSDAGKYTVVAGGNVSTAKLFVEGRDVRIRSIKKEVQVIEKQRAVVEFEVNEDDVDAHWYKDGIEINFQVQERHKYVVERRIHRMFISETRQSDAGEYTFVAGRNRSSVTLYVNAPEPPQVLQELQPVTVQSGKPARFCAVISGRPQPKISWYKEEQLLSTGFKCKFLHDGQEYTLLLIEAFPEDAAVYTCEAKNDYGVA

    TTSASLSVEVPEVVSPDQEMPVYPPAIITPLQDTVTSEGQPARFQCRVSGTDLKVSWYSKDKKIKPSRFFRMTQFEDTYQLEIAEAYPEDEGTYTFVASNAVGQVSSTANLSLEAPESILHERIEQEIEMEMKEFSSSFLSAEEEGLHSAELQLSKINETLELLSESPVYPTKFDSEKEGTGPIFIKEVSNADISMGDVATLSVTVIGIPKPKIQWFFNGVLLTPSADYKFVFDGDDHSLIILFTKLEDEGEYTCMASNDYGKTICSAYLKINSKGEGHKDTETESAVAKSLEKLGGPCPPHFLKELKPIRCAQGLPAIFEYTVVGEPAPTV TWFKENKQLCTSVYYTIIHNPNGSGTFIVNDPQREDSGLYICKAENMLGESTCAAELLVLLEDTDMTDTPCKAKSTPEAPEDFPQTPLKGPAVEALDSEQEIATFVKDTILKAALITEENQQLSYEHIAKANELSSQLPLGAQELQSILEQDKLTPESTREFLCINGSIHFQPLKEPSPNLQLQIVQSQKTFSKEGILMPEEPETQAVLSDTEKIFPSAMSIEQINSLTVEPLKTLLAEPEGNYPQSSIEPPMHSYLTSVAEEVLSPKEKTVSDTNREQRVTLQKQEAQSALILSQSLAEGHVESLQSPDVMISQVNYEPLVPSEHSCTEGG KILIESANPLENAGQDSAVRIEEGKSLRFPLALEEKQVLLKEEHSDNVVMPPDQIIESKREPVAIKKVQEVQGRDLLSKESLLSGIPEEQRLNLKIQICRALQAAVASEQPGLFSEWLRNIEKVEVEAVNITQEPRHIMCMYLVTSAKSVTEEVTIIIEDVDPQMANLKMELRDALCAIIYEEIDILTAEGPRIQQGAKTSLQEEMDSFSGSQKVEPITEPEVESKYLISTEEVSYFNVQSRVKYLDATPVTKGVASAVVSDEKQDESLKPSEEKEESSSESGTEEVATVKIQEAEGGLIKEDGPMIHTPLVDTVSEEGDIVHLTTSITNAK EVNWYFENKLVPSDEKFKCLQDQNTYTLVIDKVNTEDHQGEYVCEALNDSGKTATSAKLTVVKRAAPVIKRKIEPLEVALGHLAKFTCEIQSAPNVRFQWFKAGREIYESDKCSIRSSKYISSLEILRTQVVDCGEYTCKASNEYGSVSCTATLTVTEAYPPTFLSRPKSLTTFVGKAAKFICTVTGTPVIETIWQKDGAALSPSPNWRISDAENKHILELSNLTIQDRGVYSCKASNKFGADICQAELIIIDKPHFIKELEPVQSAINKKVHLECQVDEDRKVTVTWSKDGQKLPPGKDYKICFEDKIATLEIPLAKLKDSGTYVCTASNE AGSSSCSATVTVREPPSFVKKVDPSYLMLPGESARLHCKLKGSPVIQVTWFKNNKELSESNTVRMYFVNSEAILDITDVKVEDSGSYSCEAVNDVGSDSCSTEIVIKEPPSFIKTLEPADIVRGTNALLQCEVSGTGPFEISWFKDKKQIRSSKKYRLFSQKSLVCLEIFSFNSADVGEYECVVANEVGKCGCMATHLLKEPPTFVKKVDDLIALGGQTVTLQAAVRGSEPISVTWMKGQEVIREDGKIKMSFSNGVAVLIIPDVQISFGGKYTCLAENEAGSQTSVGELIVKEPAKIIERAELIQVTAGDPATLEYTVAGTPELKPKWYKD GRPLVASKKYRISFKNNVAQLKFYSAELHDSGQYTFEISNEVGSSSCETTFTVLDRDIAPFFTKPLRNVDSVVNGTCRLDCKIAGSLPMRVSWFKDGKEIAASDRYRIAFVEGTASLEIIRVDMNDAGNFTCRATNSVGSKDSSGALIVQEPPSFVTKPGSKDVLPGSAVCLKSTFQGSTPLTIRWFKGNKELVSGGSCYITKEALESSLELYLVKTSDSGTYTCKVSNVAGGVECSANLFVKEPATFVEKLEPSQLLKKGDATQLACKVTGTPPIKITWFANDREIKESSKHRMSFVESTAVLRLTDVGIEDSGEYMCEAQNEAGSDHCSS IVIVKESPYFTKEFKPIEVLKEYDVMLLAEVAGTPPFEITWFKDNTILRSGRKYKTFIQDHLVSLQILKFVAADAGEYQCRVTNEVGSSICSARVTLREPPSFIKKIESTSSLRGGTAAFQATLKGSLPITVTWLKDSDEITEDDNIRMTFENNVASLYLSGIEVKHDGKYVCQAKNDAGIQRCSALLSVKEPATITEEAVSIDVTQGDPATLQVKFSGTKEITAKWFKDGQELTLGSKYKISVTDTVSILKIISTEKKDSGEYTFEVQNDVGRSSCKARINVLDLIIPPSFTKKLKKMDSIKGSFIDLECIVAGSHPISIQWFKDDQEISA SEKYKFSFHDNTAFLEISQLEGTDSGTYTCSATNKAGHNQCSGHLTVKEPPYFVEKPQSQDVNPNTRVQLKALVGGTAPMTIKWFKDNKELHSGAARSVWKDDTSTSLELFAAKATDSGTYICQLSNDVGTATSKATLFVKEPPQFIKKPSPVLVLRNGQSTTFECQITGTPKIRVSWYLDGNEITAIQKHGISFIDGLATFQISGARVENSGTYVCEARNDAGTASCSIELKVKEPPTFIRELKPVEVVKYSDVELECEVTGTPPFEVTWLKNNREIRSSKKYTLTDRVSVFNLHITKCDPSDTGEYQCIVSNEGGSCSCSTRVALKEPPS FIKKIENTTTVLKSSATFQSTVAGSPPISITWLKDDQILDEDDNVYISFVDSVATLQIRSVDNGHSGRYTCQAKNESGVERCYAFLLVQEPAQIVEKAKSVDVTEKDPMTLECVVAGTPELKVKWLKDGKQIVPSRYFSMSFENNVASFRIQSVMKQDSGQYTFKVENDFGSSSCDAYLRVLDQNIPPSFTKKLTKMDKVLGSSIHMECKVSGSLPISAQWFKDGKEISTSAKYRLVCHERSVSLEVNNLELEDTANYTCKVSNVAGDDACSGILTVKEPPSFLVKPGRQQAIPDSTVEFKAILKGTPPFKIKWFKDDVELVSGPKCFIGLE GSTSFLNLYSVDASKTGQYTCHVTNDVGSDSCTTMLLVTEPPKFVKKLEASKIVKAGDSSRLECKIAGSPEIRVVWFRNEHELPASDKYRMTFIDSVAVIQMNNLSTEDSGDFICEAQNPAGSTSCSTKVIVKEPPVFSSFPPIVETLKNAEVSLECELSGTPPFEVVWYKDKRQLRSSKKYKIASKNFHTSIHILNVDTSDIGEYHCKAQNEVGSDTCVCTVKLKEPPRFVSKLNSLTVVAGEPAELQASIEGAQPIFVQWLKEKEEVIRESENIRITFVENVATLQFAKAEPANAGKYICQIKNDGGMRENMATLMVLEPAVIVEKAGPM TVTVGETCTLECKVAGTPELSVEWYKDGKLLTSSQKHKFSFYNKISSLRILSVERQDAGTYTFQVQNNVGKSSCTAVVDVSDRAVPPSFTRRLKNTGGVLGASCILECKVAGSSPISVAWFHEKTKIVSGAKYQTTFSDNVCTLQLNSLDSSDMGNYTCVAANVAGSDECRAVLTVQEPPSFVKEPEPLEVLPGKNVTFTSVIRGTPPFKVNWFRGARELVKGDRCNIYFEDTVAELELFNIDISQSGEYTCVVSNNAGQASCTTRLFVKEPAAFLKRLSDHSVEPGKSIILESTYTGTLPISVTWKKDGFNITTSEKCNIVTTEKTCILEI LNSTKRDAGQYSCEIENEAGRDVCGALVSTLEPPYFVTELEPLEAAVGDSVSLQCQVAGTPEITVSWYKGDTKLRPTPEYRTYFTNNVATLVFNKVNINDSGEYTCKAENSIGTASSKTVFRIQERQLPPSFARQLKDIEQTVGLPVTLTCRLNGSAPIQVCWYRDGVLLRDDENLQTSFVDNVATLKILQTDLSHSGQYSCSASNPLGTASSSARLTAREPKKSPFFDIKPVSIDVIAGESADFECHVTGAQPMRITWSKDNKEIRPGGNYTITCVGNTPHLRILKVGKGDSGQYTCQATNDVGKDMCSAQLSVKEPPKFVKKLEASKVAK QGESIQLECKISGSPEIKVSWFRNDSELHESWKYNMSFINSVALLTINEASAEDSGDYICEAHNGVGDASCSTALTVKAPPVFTQKPSPVGALKGSDVILQCEISGTPPFEVVWVKDRKQVRNSKKFKITSKHFDTSLHILNLEASDVGEYHCKATNEVGSDTCSCSVKFKEPPRFVKKLSDTSTLIGDAVELRAIVEGFQPISVVWLKDRGEVIRESENTRISFIDNIATLQLGSPEASNSGKYICQIKNDAGMRECSAVLTVLEPARIIEKPEPMTVTTGNPFALECVVTGTPELSAKWFKDGRELSADSKHHITFINKVASLKIPCAEM SDKGLYSFEVKNSVGKSNCTVSVHVSDRIVPPSFIRKLKDVNAILGASVVLECRVSGSAPISVGWFQDGNEIVSGPKCQSSFSENVCTLNLSLLEPSDTGIYTCVAANVAGSDECSAVLTVQEPPSFEQTPDSVEVLPGMSLTFTSVIRGTPPFKVKWFKGSRELVPGESCNISLEDFVTELELFEVQPLESGDYSCLVTNDAGSASCTTHLFVKEPATFVKRLADFSVETGSPIVLEATYTGTPPISVSWIKDEYLISQSERCSITMTEKSTILEILESTIEDYAQYSCLIENEAGQDICEALVSVLEPPYFIEPLEHVEAVIGEPATLQC KVDGTPEIRISWYKEHTKLRSAPAYKMQFKNNVASLVINKVDHSDVGEYSCKADNSVGAVASSAVLVIKARKLPPFFARKLKDVHETLGFPVAFECRINGSEPLQVSWYKDGVLLKDDANLQTSFVHNVATLQILQTDQSHIGQYNCSASNPLGTASSSAKLILSEHEVPPFFDLKPVSVDLALGESGTFKCHVTGTAPIKITWAKDNREIRPGGNYKMTLVENTATLTVLKVGKGDAGQYTCYASNIAGKDSCSAQLGVQEPPRFIKKLEPSRIVKQDEFTRYECKIGGSPEIKVLWYKDETEIQESSKFRMSFVDSVAVLEMHNLSVEDS GDYTCEAHNAAGSASSSTSLKVKEPPIFRKKPHPIETLKGADVHLECELQGTPPFHVSWYKDKRELRSGKKYKIMSENFLTSIHILNVDAADIGEYQCKATNDVGSDTCVGSIALKAPPRFVKKLSDISTVVGKEVQLQTTIEGAEPISVVWFKDKGEIVRESDNIWISYSENIATLQFSRVEPANAGKYTCQIKNDAGMQECFATLSVLEPATIVEKPESIKVTTGDTCTLECTVAGTPELSTKWFKDGKELTSDNKYKISFFNKVSGLKIINVAPSDSGVYSFEVQNPVGKDSCTASLQVSDRTVPPSFTRKLKETNGLSGSSVVMECKV YGSPPISVSWFHEGNEISSGRKYQTTLTDNTCALTVNMLEESDSGDYTCIATNMAGSDECSAPLTVREPPSFVQKPDPMDVLTGTNVTFTSIVKGTPPFSVSWFKGSSELVPGDRCNVSLEDSVAELELFDVDTSQSGEYTCIVSNEAGKASCTTHLYIKAPAKFVKRLNDYSIEKGKPLILEGTFTGTPPISVTWKKNGINVTPSQRCNITTTEKSAILEIPSSTVEDAGQYNCYIENASGKDSCSAQILILEPPYFVKQLEPVKVSVGDSASLQCQLAGTPEIGVSWYKGDTKLRPTTTYKMHFRNNVATLVFNQVDINDSGEYICKAEN SVGEVSASTFLTVQEQKLPPSFSRQLRDVQETVGLPVVFDCAISGSEPISVSWYKDGKPLKDSPNVQTSFLDNTATLNIFKTDRSLAGQYSCTATNPIGSASSSARLILTEGKNPPFFDIRLAPVDAVVGESADFECHVTGTQPIKVSWAKDSREIRSGGKYQISYLENSAHLTVLKVDKGDSGQYTCYAVNEVGKDSCTAQLNIKERLIPPSFTKRLSETVEETEGNSFKLEGRVAGSQPITVAWYKNNIEIQPTSNCEITFKNNTLVLQVRKAGMNDAGLYTCKVSNDAGSALCTSSIVIKEPKKPPVFDQHLTPVTVSEGEYVQLSCHV QGSEPIRIQWLKAGREIKPSDRCSFSFASGTAVLELRDVAKADSGDYVCKASNVAGSDTTKSKVTIKDKPAVAPATKKAAVDGRLFFVSEPQSIRVVEKTTATFIAKVGGDPIPNVKWTKGKWRQLNQGGRVFIHQKGDEAKLEIRDTTKTDSGLYRCVAFNEHGEIESNVNLQVDERKKQEKIEGDLRAMLKKTPILKKGAGEEEEIDIMELLKNVDPKEYEKYARMYGITDFRGLLQAFELLKQSQEEETHRLEIEEIERSERDEKEFEELVSFIQQRLSQTEPVTLIKDIENQTVLKDNDAVFEIDIKINYPEIKLSWYKGTEKLEPSD KFEISIDGDRHTLRVKNCQLKDQGNYRLVCGPHIASAKLTVIEPAWERHLQDVTLKEGQTCTMTCQFSVPNVKSEWFRNGRILKPQGRHKTEVEHKVHKLTIADVRAEDQGQYTCKYEDLETSAELRIEAEPIQFTKRIQNIVVSEHQSATFECEVSFDDAIVTWYKGPTELTESQKYNFRNDGRCHYMTIHNVTPDDEGVYSVIARLEPRGEARSTAELYLTTKEIKLELKPPDIPDSRVPIPTMPIRAVPPEEIPPVVAPPIPLLLPTPEEKKPPPKRIEVTKKAVKKDAKKVVAKPKEMTPREEIVKKPPPPTTLIPAKAPEIIDVSSK AEEVKIMTITRKKEVQKEKEAVYEKKQAVHKEKRVFIESFEEPYDELEVEPYTEPFEQPYYEEPDEDYEEIKVEAKKEVHEEWEEDFEEGQEYYEREEGYDEGEEEWEEAYQEREVIQVQKEVYEESHERKVPAKVPEKKAPPPPKVIKKPVIEKIEKTSRRMEEEKVQVTKVPEVSKKIVPQKPSRTPVQEEVIEVKVPAVHTKKMVISEEKMFFASHTEEEVSVTVPEVQKEIVTEEKIHVAISKRVEPPPKVPELPEKPAPEEVAPVPIPKKVEPPAPKVPEVPKKPVPEEKKPVPVPKKEPAAPPKVPEVPKKPVPEEKIPVPVAKKK EAPPAKVPEVQKGVVTEEKITIVTQREESPPPAVPEIPKKKVPEERKPVPRKEEEVPPPPKVPALPKKPVPEEKVAVPVPVAKKAPPPRAEVSKKTVVEEKRFVAEEKLSFAVPQRVEVTRHEVSAEEEWSYSEEEEGVSISVYREEEREEEEEAEVTEYEVMEEPEEYVVEEKLHIISKRVEAEPAEVTERQEKKIVLKPKIPAKIEEPPPAKVPEAPKKIVPEKKVPAPVPKKEKVPPPKVPEEPKKPVPEKKVPPKVIKMEEPLPAKVTERHMQITQEEKVLVAVTKKEAPPKARVPEEPKRAVPEEKVLKLKPKREEEPPAKVTEFRK RVVKEEKVSIEAPKREPQPIKEVTIMEEKERAYTLEEEAVSVQREEEYEEYEEYDYKEFEEYEPTEEYDQYEEYEEREYERYEEHEEYITEPEKPIPVKPVPEEPVPTKPKAPPAKVLKKAVPEEKVPVPIPKKLKPPPPKVPEEPKKVFEEKIRISITKREKEQVTEPAAKVPMKPKRVVAEEKVPVPRKEVAPPVRVPEVPKELEPEEVAFEEEVVTHVEEYLVEEEEEYIHEEEEFITEEEVVPVIPVKVPEVPRKPVPEEKKPVPVPKKKEAPPAKVPEVPKKPEEKVPVLIPKKEKPPPAKVPEVPKKPVPEEKVPVPVPKKVEAPP AKVPEVPKKPVPEKKVPVPAPKKVEAPPAKVPEVPKKLIPEEKKPTPVPKKVEAPPPKVPKKREPVPVPVALPQEEEVLFEEEIVPEEEVLPEEEEVLPEEEEVLPEEEEVLPEEEEIPPEEEEVPPEEEYVPEEEEFVPEEEVLPEVKPKVPVPAPVPEIKKKVTEKKVVIPKKEEAPPAKVPEVPKKVEEKRIILPKEEEVLPVEVTEEPEEEPISEEEIPEEPPSIEEVEEVAPPRVPEVIKKAVPEAPTPVPKKVEAPPAKVSKKIPEEKVPVPVQKKEAPPAKVPEVPKKVPEKKVLVPKKEAVPPAKGRTVLEEKVSVAFRQEVVV KERLELEVVEAEVEEIPEEEEFHEVEEYFEEGEFHEVEEFIKLEQHRVEEEHRVEKVHRVIEVFEAEEVEVFEKPKAPPKGPEISEKIIPPKKPPTKVVPRKEPPAKVPEVPKKIVVEEKVRVPEEPRVPPTKVPEVLPPKEVVPEKKVPVPPAKKPEAPPPKVPEAPKEVVPEKKVPVPPPKKPEVPPTKVPEVPKAAVPEKKVPEAIPPKPESPPPEVPEAPKEVVPEKKVPAAPPKKPEVTPVKVPEAPKEVVPEKKVPVPPPKKPEVPPTKVPEVPKVAVPEKKVPEAIPPKPESPPPEVFEEPEEVALEEPPAEVVEEPEPAAPPQV TVPPKKPVPEKKAPAVVAKKPELPPVKVPEVPKEVVPEKKVPLVVPKKPEAPPAKVPEVPKEVVPEKKVAVPKKPEVPPAKVPEVPKKPVLEEKPAVPVPERAESPPPEVYEEPEEIAPEEEIAPEEEKPVPVAEEEEPEVPPPAVPEEPKKIIPEKKVPVIKKPEAPPPKEPEPEKVIEKPKLKPRPPPPPPAPPKEDVKEKIFQLKAIPKKKVPEKPQVPEKVELTPLKVPGGEKKVRKLLPERKPEPKEEVVLKSVLRKRPEEEEPKVEPKKLEKVKKPAVPEPPPPKPVEEVEVPTVTKRERKIPEPTKVPEIKPAIPLPAPEPKPKP EAEVKTIKPPPVEPEPTPIAAPVTVPVVGKKAEAKAPKEEAAKPKGPIKGVPKKTPSPIEAERRKLRPGSGGEKPPDEAPFTYQLKAVPLKFVKEIKDIILTESEFVGSSAIFECLVSPSTAITTWMKDGSNIRESPKHRFIADGKDRKLHIIDVQLSDAGEYTCVLRLGNKEKTSTAKLVVEELPVRFVKTLEEEVTVVKGQPLYLSCELNKERDVVWRKDGKIVVEKPGRIVPGVIGLMRALTINDADDTDAGTYTVTVENANNLECSSCVKVVEVIRDWLVKPIRDQHVKPKGTAIFACDIAKDTPNIKWFKGYDEIPAEPNDKTEILR DGNHLYLKIKNAMPEDIAEYAVEIEGKRYPAKLTLGEREVELLKPIEDVTIYEKESASFDAEISEADIPGQWKLKGELLRPSPTCEIKAEGGKRFLTLHKVKLDQAGEVLYQALNAITTAILTVKEIELDFAVPLKDVTVPERRQARFECVLTREANVIWSKGPDIIKSSDKFDIIADGKKHILVINDSQFDDEGVYTAEVEGKKTSARLFVTGIRLKFMSPLEDQTVKEGETATFVCELSHEKMHVVWFKNDAKLHTSRTVLISSEGKTHKLEMKEVTLDDISQIKAQVKELSSTAQLKVLEADPYFTVKLHDKTAVEKDEITLKCEVSKD VPVKWFKDGEEIVPSPKYSIKADGLRRILKIKKADLKDKGEYVCDCGTDKTKANVTVEARLIKVEKPLYGVEVFVGETAHFEIELSEPDVHGQWKLKGQPLTASPDCEIIEDGKKHILILHNCQLGMTGEVSFQAANAKSAANLKVKELPLIFITPLSDVKVFEKDEAKFECEVSREPKTFRWLKGTQEITGDDRFELIKDGTKHSMVIKSAAFEDEAKYMFEAEDKHTSGKLIIEGIRLKFLTPLKDVTAKEKESAVFTVELSHDNIRVKWFKNDQRLHTTRSVSMQDEGKTHSITFKDLSIDDTSQIRVEAMGMSSEAKLTVLEGDPYFT GKLQDYTGVEKDEVILQCEISKADAPVKWFKDGKEIKPSKNAVIKADGKKRMLILKKALKSDIGQYTCDCGTDKTSGKLDIEDREIKLVRPLHSVEVMETETARFETEISEDDIHANWKLKGEALLQTPDCEIKEEGKIHSLVLHNCRLDQTGGVDFQAANVKSSAHLRVKPRVIGLLRPLKDVTVTAGETATFDCELSYEDIPVEWYLKGKKLEPSDKVVPRSEGKVHTLTLRDVKLEDAGEVQLTAKDFKTHANLFVKEPPVEFTKPLEDQTVEEGATAVLECEVSRENAKVKWFKNGTEILKSKKYEIVADGRVRKLVIHDCTPEDIKT YTCDAKDFKTSCNLNVVPPHVEFLRPLTDLQVREKEMARFECELSRENAKVKWFKDGAEIKKGKKYDIISKGAVRILVINKCLLDDEAEYSCEVRTARTSGMLTVLEEEAVFTKNLANIEVSETDTIKLVCEVSKPGAEVIWYKGDEEIIETGRYEILTEGRKRILVIQNAHLEDAGNYNCRLPSSRTDGKVKVHELAAEFISKPQNLEILEGEKAEFVCSISKESFPVQWKRDDKTLESGDKYDVIADGKKRVLVVKDATLQDMGTYVVMVGAARAAAHLTVIEKLRIVVPLKDTRVKEQQEVVFNCEVNTEGAKAKWFRNEEAIFDSSKY IILQKDLVYTLRIRDAHLDDQANYNVSLTNHRGENVKSAANLIVEEEDLRIVEPLKDIETMEKKSVTFWCKVNRLNVTLKWTKNGEEVPFDNRVSYRVDKYKHMLTIKDCGFPDEGEYIVTAGQDKSVAELLIIEAPTEFVEHLEDQTVTEFDDAVFSCQLSREKANVKWYRNGREIKEGKKYKFEKDGSIHRLIIKDCRLDDECEYACGVEDRKSRARLFVEEIPVEIIRPPQDILEAPGADVVFLAELNKDKVEVQWLRNNMVVVQGDKHQMMSEGKIHRLQICDIKPRDQGEYRFIAKDKEARAKLELAAAPKIKTADQDLVVDVGKPL TMVVPYDAYPKAEAEWFKENEPLSTKTIDTTAEQTSFRILEAKKGDKGRYKIVLQNKHGKAEGFINLKVIDVPGPVRNLEVTETFDGEVSLAWEEPLTDGGSKIIGYVVERRDIKRKTWVLATDRAESCEFTVTGLQKGGVEYLFRVSARNRVGTGEPVETDNPVEARSKYDVPGPPLNVTITDVNRFGVSLTWEPPEYDGGAEITNYVIELRDKTSIRWDTAMTVRAEDLSATVTDVVEGQEYSFRVRAQNRIGVGKPSAATPFVKVADPIERPSPPVNLTSSDQTQSSVQLKWEPPLKDGGSPILGYIIERCEEGKDNWIRCNMKLVPEL TYKVTGLEKGNKYLYRVSAENKAGVSDPSEILGPLTADDAFVEPTMDLSAFKDGLEVIVPNPITILVPSTGYPRPTATWCFGDKVLETGDRVKMKTLSAYAELVISPSERSDKGIYTLKLENRVKTISGEIDVNVIARPSAPKELKFGDITKDSVHLTWEPPDDDGGSPLTGYVVEKREVSRKTWTKVMDFVTDLEFTVPDLVQGKEYLFKVCARNKCGPGEPAYVDEPVNMSTPATVPDPPENVKWRDRTANSIFLTWDPPKNDGGSRIKGYIVERCPRGSDKWVACGEPVAETKMEVTGLEEGKWYAYRVKALNRQGASKPSRPTEEIQA VDTQEAPEIFLDVKLLAGLTVKAGTKIELPATVTGKPEPKITWTKADMILKQDKRITIENVPKKSTVTIVDSKRSDTGTYIIEAVNVCGRATAVVEVNVLDKPGPPAAFDITDVTNESCLLTWNPPRDDGGSKITNYVVERRATDSEVWHKLSSTVKDTNFKATKLIPNKEYIFRVAAENMYGVGEPVQASPITAKYQFDPPGPPTRLEPSDITKDAVTLTWCEPDDDGGSPITGYWVERLDPDTDKWVRCNKMPVKDTTYRVKGLTNKKKYRFRVLAENLAGPGKPSKSTEPILIKDPIDPPWPPGKPTVKDVGKTSVRLNWTKPEHDGGA KIESYVIEMLKTGTDEWVRVAEGVPTTQHLLPGLMEGQEYSFRVRAVNKAGESEPSEPSDPVLCREKLYPPSPPRWLEVINITKNTADLKWTVPEKDGGSPITNYIVEKRDVRRKGWQTVDTTVKDTKCTVTPLTEGSLYVFRVAAENAIGQSDYTEIEDSVLAKDTFTTPGPPYALAVVDVTKRHVDLKWEPPKNDGGRPIQRYVIEKKERLGTRWVKAGKTAGPDCNFRVTDVIEGTEVQFQVRAENEAGVGHPSEPTEILSIEDPTSPPSPPLDLHVTDAGRKHIAIAWKPPEKNGGSPIIGYHVEMCPVGTEKWMRVNSRPIKDLKFK VEEGVVPDKEYVLRVRAVNAIGVSEPSEISENVVAKDPDCKPTIDLETHDIIVIEGEKLSIPVPFRAVPVPTVSWHKDGKEVKASDRLTMKNDHISAHLEVPKSVRADAGIYTITLENKLGSATASINVKVIGLPGPCKDIKASDITKSSCKLTWEPPEFDGGTPILHYVLERREAGRRTYIPVMSGENKLSWTVKDLIPNGEYFFRVKAVNKVGGGEYIELKNPVIAQDPKQPPDPPVDVEVHNPTAEAMTITWKPPLYDGGSKIMGYIIEKIAKGEERWKRCNEHLVPILTYTAKGLEEGKEYQFRVRAENAAGISEPSRATPPTKAVDP IDAPKVILRTSLEVKRGDEIALDASISGSPYPTITWIKDENVIVPEEIKKRAAPLVRRRKGEVQEEEPFVLPLTQRLSIDNSKKGESQLRVRDSLRPDHGLYMIKVENDHGIAKAPCTVSVLDTPGPPINFVFEDIRKTSVLCKWEPPLDDGGSEIINYTLEKKDKTKPDSEWIVVTSTLRHCKYSVTKLIEGKEYLFRVRAENRFGPGPPCVSKPLVAKDPFGPPDAPDKPIVEDVTSNSMLVKWNEPKDNGSPILGYWLEKREVNSTHWSRVNKSLLNALKANVDGLLEGLTYVFRVCAENAAGPGKFSPPSDPKTAHDPISPPGPPIPR VTDTSSTTIELEWEPPAFNGGGEIVGYFVDKQLVGTNEWSRCTEKMIKVRQYTVKEIREGADYKLRVSAVNAAGEGPPGETQPVTVAEPQEPPAVELDVSVKGGIQIMAGKTLRIPAVVTGRPVPTKVWTKEEGELDKDRVVIDNVGTKSELIIKDALRKDHGRYVITATNSCGSKFAAARVEVFDVPGPVLDLKPVVTNRKMCLLNWSDPEDDGGSEITGFIIERKDAKMHTWRQPIETERSKCDITGLLEGQEYKFRVIAKNKFGCGPPVEIGPILAVDPLGPPTSPERLTYTERTKSTITLDWKEPRSNGGSPIQGYIIEKRRHDKPDF ERVNKRLCPTTSFLVENLDEHQMYEFRVKAVNEIGESEPSLPLNVVIQDDEVPPTIKLRLSVRGDTIKVKAGEPVHIPADVTGLPMPKIEWSKNETVIEKPTDALQITKEEVSRSEAKTELSIPKAVREDKGTYTVTASNRLGSVFRNVHVEVYDRPSPPRNLAVTDIKAESCYLTWDAPLDNGGSEITHYVIDKRDASRKKAEWEEVTNTAVEKRYGIWKLIPNGQYEFRVRAVNKYGISDECKSDKVVIQDPYRLPGPPGKPKVLARTKGSMLVSWTPPLDNGGSPITGYWLEKREEGSPYWSRVSRAPITKVGLKGVEFNVPRLLEGVK YQFRAMAINAAGIGPPSEPSDPEVAGDPIFPPGPPSCPEVKDKTKSSISLGWKPPAKDGGSPIKGYIVEMQEEGTTDWKRVNEPDKLITTCECVVPNLKELRKYRFRVKAVNEAGESEPSDTTGEIPATDIQEEPEVFIDIGAQDCLVCKAGSQIRIPAVIKGRPTPKSSWEFDGKAKKAMKDGVHDIPEDAQLETAENSSVIIIPECKRSHTGKYSITAKNKAGQKTANCRVKVMDVPGPPKDLKVSDITRGSCRLSWKMPDDDGGDRIKGYVIEKRTIDGKAWTKVNPDCGSTTFVVPDLLSEQQYFFRVRAENRFGIGPPVETIQRTTA RDPIYPPDPPIKLKIGLITKNTVHLSWKPPKNDGGSPVTHYIVECLAWDPTGTKKEAWRQCNKRDVEELQFTVEDLVEGGEYEFRVKAVNAAGVSKPSATVGPVTVKDQTCPPSIDLKEFMEVEEGTNVNIVAKIKGVPFPTLTWFKAPPKKPDNKEPVLYDTHVNKLVVDDTCTLVIPQSRRSDTGLYTITAVNNLGTASKEMRLNVLGRPGPPVGPIKFESVSADQMTLSWFPPKDDGGSKITNYVIEKREANRKTWVHVSSEPKECTYTIPKLLEGHEYVFRIMAQNKYGIGEPLDSEPETARNLFSVPGAPDKPTVSSVTRNSMTVNW EEPEYDGGSPVTGYWLEMKDTTSKRWKRVNRDPIKAMTLGVSYKVTGLIEGSDYQFRVYAINAAGVGPASLPSDPATARDPIAPPGPPFPKVTDWTKSSADLEWSPPLKDGGSKVTGYIVEYKEEGKEEWEKGKDKEVRGTKLVVTGLKEGAFYKFRVRAVNIAGIGEPGEVTDVIEMKDRLVSPDLQLDASVRDRIVVHAGGVIRIIAYVSGKPPPTVTWNMNERTLPQEATIETTAISSSMVIKNCQRSHQGVYSLLAKNEAGERKKTIIVDVLDVPGPVGTPFLAHNLTNESCKLTWFSPEDDGGSPITNYVIEKRESDRRAWTPVTYT VTRQNATVQGLIQGKAYFFRIAAENSIGMGPFVETSEALVIREPITVPERPEDLEVKEVTKNTVTLTWNPPKYDGGSEIINYVLESRLIGTEKFHKVTNDNLLSRKYTVKGLKEGDTYEYRVSAVNIVGQGKPSFCTKPITCKDELAPPTLHLDFRDKLTIRVGEAFALTGRYSGKPKPKVSWFKDEADVLEDDRTHIKTTPATLALEKIKAKRSDSGKYCVVVENSTGSRKGFCQVNVVDRPGPPVGPVSFDEVTKDYMVISWKPPLDDGGSKITNYIIEKKEVGKDVWMPVTSASAKTTCKVSKLLEGKDYIFRIHAENLYGISDPLVSD SMKAKDRFRVPDAPDQPIVTEVTKDSALVTWNKPHDGGKPITNYILEKRETMSKRWARVTKDPIHPYTKFRVPDLLEGCQYEFRVSAENEIGIGDPSPPSKPVFAKDPIAKPSPPVNPEAIDTTCNSVDLTWQPPRHDGGSKILGYIVEYQKVGDEEWRRANHTPESCPETKYKVTGLRDGQTYKFRVLAVNAAGESDPAHVPEPVLVKDRLEPPELILDANMAREQHIKVGDTLRLSAIIKGVPFPKVTWKKEDRDAPTKARIDVTPVGSKLEIRNAAHEDGGIYSLTVENPAGSKTVSVKVLVLDKPGPPRDLEVSEIRKDSCYLTWKEP LDDGGSVITNYVVERRDVASAQWSPLSATSKKKSHFAKHLNEGNQYLFRVAAENQYGRGPFVETPKPIKALDPLHPPGPPKDLHHVDVDKTEVSLVWNKPDRDGGSPITGYLVEYQEEGTQDWIKFKTVTNLECVVTGLQQGKTYRFRVKAENIVGLGLPDTTIPIECQEKLVPPSVELDVKLIEGLVVKAGTTVRFPAIIRGVPVPTAKWTTDGSEIKTDEHYTVETDNFSSVLTIKNCLRRDTGEYQITVSNAAGSKTVAVHLTVLDVPGPPTGPINILDVTPEHMTISWQPPKDDGGSPVINYIVEKQDTRKDTWGVVSSGSSKTKLKI PHLQKGCEYVFRVRAENKIGVGPPLDSTPTVAKHKFSPPSPPGKPVVTDITENAATVSWTLPKSDGGSPITGYYMERREVTGKWVRVNKTPIADLKFRVTGLYEGNTYEFRVFAENLAGLSKPSPSSDPIKACRPIKPPGPPINPKLKDKSRETADLVWTKPLSDGGSPILGYVVECQKPGTAQWNRINKDELIRQCAFRVPGLIEGNEYRFRIKAANIVGEGEPRELAESVIAKDILHPPEVELDVTCRDVITVRVGQTIRILARVKGRPEPDITWTKEGKVLVREKRVDLIQDLPRVELQIKEAVRADHGKYIISAKNSSGHAQGSAIVN VLDRPGPCQNLKVTNVTKENCTISWENPLDNGGSEITNFIVEYRKPNQKGWSIVASDVTKRLIKANLLANNEYYFRVCAENKVGVGPTIETKTPILAINPIDRPGEPENLHIADKGKTFVYLKWRRPDYDGGSPNLSYHVERRLKGSDDWERVHKGSIKETHYMVDRCVENQIYEFRVQTKNEGGESDWVKTEEVVVKEDLQKPVLDLKLSGVLTVKAGDTIRLEAGVRGKPFPEVAWTKDKDATDLTRSPRVKIDTRADSSKFSLTKAKRSDGGKYVVTATNTAGSFVAYATVNVLDKPGPVRNLKIVDVSSDRCTVCWDPPEDDGGCEIQ NYILEKCETKRMVWSTYSATVLTPGTTVTRLIEGNEYIFRVRAENKIGTGPPTESKPVIAKTKYDKPGRPDPPEVTKVSKEEMTVVWNPPEYDGGKSITGYFLEKKEKHSTRWVPVNKSAIPERRMKVQNLLPDHEYQFRVKAENEIGIGEPSLPSRPVVAKDPIEPPGPPTNFRVVDTTKHSITLGWGKPVYDGGAPIIGYVVEMRPKIADASPDEGWKRCNAAAQLVRKEFTVTSLDENQEYEFRVCAQNQVGIGRPAELKEAIKPKEILEPPEIDLDASMRKLVIVRAGCPIRLFAIVRGRPAPKVTWRKVGIDNVVRKGQVDLVDTMA FLVIPNSTRDDSGKYSLTLVNPAGEKAVFVNVRVLDTPGPVSDLKVSDVTKTSCHVSWAPPENDGGSQVTHYIVEKREADRKTWSTVTPEVKKTSFHVTNLVPGNEYYFRVTAVNEYGPGVPTDVPKPVLASDPLSEPDPPRKLEVTEMTKNSATLAWLPPLRDGGAKIDGYITSYREEEQPADRWTEYSVVKDLSLVVTGLKEGKKYKFRVAARNAVGVSLPREAEGVYEAKEQLLPPKILMPEQITIKAGKKLRIEAHVYGKPHPTCKWKKGEDEVVTSSHLAVHKADSSSILIIKDVTRKDSGYYSLTAENSSGTDTQKIKVVVMDAPG PPQPPFDISDIDADACSLSWHIPLEDGGSNITNYIVEKCDVSRGDWVTALASVTKTSCRVGKLIPGQEYIFRVRAENRFGISEPLTSPKMVAQFPFGVPSEPKNARVTKVNKDCIFVAWDRPDSDGGSPIIGYLIERKERNSLLWVKANDTLVRSTEYPCAGLVEGLEYSFRIYALNKAGSSPPSKPTEYVTARMPVDPPGKPEVIDVTKSTVSLIWARPKHDGGSKIIGYFVEACKLPGDKWVRCNTAPHQIPQEEYTATGLEEKAQYQFRAIARTAVNISPPSEPSDPVTILAENVPPRIDLSVAMKSLLTVKAGTNVCLDATVFGKPMP TVSWKKDGTLLKPAEGIKMAMQRNLCTLELFSVNRKDSGDYTITAENSSGSKSATIKLKVLDKPGPPASVKINKMYSDRAMLSWEPPLEDGGSEITNYIVDKRETSRPNWAQVSATVPITSCSVEKLIEGHEYQFRICAENKYGVGDPVFTEPAIAKNPYDPPGRCDPPVISNITKDHMTVSWKPPADDGGSPITGYLLEKRETQAVNWTKVNRKPIIERTLKATGLQEGTEYEFRVTAINKAGPGKPSDASKAAYARDPQYPPGPPAFPKVYDTTRSSVSLSWGKPAYDGGSPIIGYLVEVKRADSDNWVRCNLPQNLQKTRFEVTGLMED TQYQFRVYAVNKIGYSDPSDVPDKHYPKDILIPPEGELDADLRKTLILRAGVTMRLYVPVKGRPPPKITWSKPNVNLRDRIGLDIKSTDFDTFLRCENVNKYDAGKYILTLENSCGKKEYTIVVKVLDTPGPPVNVTVKEISKDSAYVTWEPPIIDGGSPIINYVVQKRDAERKSWSTVTTECSKTSFRVANLEEGKSYFFRVFAENEYGIGDPGETRDAVKASQTPGPVVDLKVRSVSKSSCSIGWKKPHSDGGSRIIGYVVDFLTEENKWQRVMKSLSLQYSAKDLTEGKEYTFRVSAENENGEGTPSEITVVARDDVVAPDLDLKGLPD LCYLAKENSNFRLKIPIKGKPAPSVSWKKGEDPLATDTRVSVESSAVNTTLIVYDCQKSDAGKYTITLKNVAGTKEGTISIKVVGKPGIPTGPIKFDEVTAEAMTLKWAPPKDDGGSEITNYILEKRDSVNNKWVTCASAVQKTTFRVTRLHEGMEYTFRVSAENKYGVGEGLKSEPIVARHPFDVPDAPPPPNIVDVRHDSVSLTWTDPKKTGGSPITGYHLEFKERNSLLWKRANKTPIRMRDFKVTGLTEGLEYEFRVMAINLAGVGKPSLPSEPVVALDPIDPPGKPEVINITRNSVTLIWTEPKYDGGHKLTGYIVEKRDLPSKSWM KANHVNVPECAFTVTDLVEGGKYEFRIRAKNTAGAISAPSESTETIICKDEYEAPTIVLDPTIKDGLTIKAGDTIVLNAISILGKPLPKSSWSKAGKDIRPSDITQITSTPTSSMLTIKYATRKDAGEYTITATNPFGTKVEHVKVTVLDVPGPPGPVEISNVSAEKATLTWTPPLEDGGSPIKSYILEKRETSRLLWTVVSEDIQSCRHVATKLIQGNEYIFRVSAVNHYGKGEPVQSEPVKMVDRFGPPGPPEKPEVSNVTKNTATVSWKRPVDDGGSEITGYHVERREKKSLRWVRAIKTPVSDLRCKVTGLQEGSTYEFRVSAENRAG IGPPSEASDSVLMKDAAYPPGPPSNPHVTDTTKKSASLAWGKPHYDGGLEITGYVVEHQKVGDEAWIKDTTGTALRITQFVVPDLQTKEKYNFRISAINDAGVGEPAVIPDVEIVEREMAPDFELDAELRRTLVVRAGLSIRIFVPIKGRPAPEVTWTKDNINLKNRANIENTESFTLLIIPECNRYDTGKFVMTIENPAGKKSGFVNVRVLDTPGPVLNLRPTDITKDSVTLHWDLPLIDGGSRITNYIVEKREATRKSYSTATTKCHKCTYKVTGLSEGCEYFFRVMAENEYGIGEPTETTEPVKASEAPSPPDSLNIMDITKSTVSLAW PKPKHDGGSKITGYVIEAQRKGSDQWTHITTVKGLECVVRNLTEGEEYTFQVMAVNSAGRSAPRESRPVIVKEQTMLPELDLRGIYQKLVIAKAGDNIKVEIPVLGRPKPTVTWKKGDQILKQTQRVNFETTATSTILNINECVRSDSGPYPLTARNIVGEVGDVITIQVHDIPGPPTGPIKFDEVSSDFVTFSWDPPENDGGVPISNYVVEMRQTDSTTWVELATTVIRTTYKATRLTTGLEYQFRVKAQNRYGVGPGITSACIVANYPFKVPGPPGTPQVTAVTKDSMTISWHEPLSDGGSPILGYHVERKERNGILWQTVSKALVPGNI FKSSGLTDGIAYEFRVIAENMAGKSKPSKPSEPMLALDPIDPPGKPVPLNITRHTVTLKWAKPEYTGGFKITSYIVEKRDLPNGRWLKANFSNILENEFTVSGLTEDAAYEFRVIAKNAAGAISPPSEPSDAITCRDDVEAPKIKVDVKFKDTVILKAGEAFRLEADVSGRPPPTMEWSKDGKELEGTAKLEIKIADFSTNLVNKDSTRRDSGAYTLTATNPGGFAKHIFNVKVLDRPGPPEGPLAVTEVTSEKCVLSWFPPLDDGGAKIDHYIVQKRETSRLAWTNVASEVQVTKLKVTKLLKGNEYIFRVMAVNKYGVGEPLESEPVLAV NPYGPPDPPKNPEVTTITKDSMVVCWGHPDSDGGSEIINYIVERRDKAGQRWIKCNKKTLTDLRYKVSGLTEGHEYEFRIMAENAAGISAPSPTSPFYKACDTVFKPGPPGNPRVLDTSRSSISIAWNKPIYDGGSEITGYMVEIALPEEDEWQIVTPPAGLKATSYTITGLTENQEYKIRIYAMNSEGLGEPALVPGTPKAEDRMLPPEIELDADLRKVVTIRACCTLRLFVPIKGRPAPEVKWARDHGESLDKASIESTSSYTLLIVGNVNRFDSGKYILTVENSSGSKSAFVNVRVLDTPGPPQDLKVKEVTKTSVTLTWDPPLLDGGS KIKNYIVEKRESTRKAYSTVATNCHKTSWKVDQLQEGCSYYFRVLAENEYGIGLPAETAESVKASERPLPPGKITLMDVTRNSVSLSWEKPEHDGGSRILGYIVEMQTKGSDKWATCATVKVTEATITGLIQGEEYSFRVSAQNEKGISDPRQLSVPVIAKDLVIPPAFKLLFNTFTVLAGEDLKVDVPFIGRPTPAVTWHKDNVPLKQTTRVNAESTENNSLLTIKDACREDVGHYVVKLTNSAGEAIETLNVIVLDKPGPPTGPVKMDEVTADSITLSWGPPKYDGGSSINNYIVEKRDTSTTTWQIVSATVARTTIKACRLKTGCEYQF RIAAENRYGKSTYLNSEPTVAQYPFKVPGPPGTPVVTLSSRDSMEVQWNEPISDGGSRVIGYHLERKERNSILWVKLNKTPIPQTKFKTTGLEEGVEYEFRVSAENIVGIGKPSKVSECYVARDPCDPPGRPEAIIVTRNSVTLQWKKPTYDGGSKITGYIVEKKELPEGRWMKASFTNIIDTHFEVTGLVEDHRYEFRVIARNAAGVFSEPSESTGAITARDEVDPPRISMDPKYKDTIVVHAGESFKVDADIYGKPIPTIQWIKGDQELSNTARLEIKSTDFATSLSVKDAVRVDSGNYILKAKNVAGERSVTVNVKVLDRPGPPEGPVV ISGVTAEKCTLAWKPPLQDGGSDIINYIVERRETSRLVWTVVDANVQTLSCKVTKLLEGNEYTFRIMAVNKYGVGEPLESEPVVAKNPFVVPDAPKAPEVTTVTKDSMIVVWERPASDGGSEILGYVLEKRDKEGIRWTRCHKRLIGELRLRVTGLIENHDYEFRVSAENAAGLSEPSPPSAYQKACDPIYKPGPPNNPKVIDITRSSVFLSWSKPIYDGGCEIQGYIVEKCDVSVGEWTMCTPPTGINKTNIEVEKLLEKHEYNFRICAINKAGVGEHADVPGPIIVEEKLEAPDIDLDLELRKIINIRAGGSLRLFVPIKGRPTPEVKWG KVDGEIRDAAIIDVTSSFTSLVLDNVNRYDSGKYTLTLENSSGTKSAFVTVRVLDTPSPPVNLKVTEITKDSVSITWEPPLLDGGSKIKNYIVEKREATRKSYAAVVTNCHKNSWKIDQLQEGCSYYFRVTAENEYGIGLPAQTADPIKVAEVPQPPGKITVDDVTRNSVSLSWTKPEHDGGSKIIQYIVEMQAKHSEKWSECARVKSLQAVITNLTQGEEYLFRVVAVNEKGRSDPRSLAVPIVAKDLVIEPDVKPAFSSYSVQVGQDLKIEVPISGRPKPTITWTKDGLPLKQTTRINVTDSLDLTTLSIKETHKDDGGQYGITVANVVG QKTASIEIVTLDKPDPPKGPVKFDDVSAESITLSWNPPLYTGGCQITNYIVQKRDTTTTVWDVVSATVARTTLKVTKLKTGTEYQFRIFAENRYGQSFALESDPIVAQYPYKEPGPPGTPFATAISKDSMVIQWHEPVNNGGSPVIGYHLERKERNSILWTKVNKTIIHDTQFKAQNLEEGIEYEFRVYAENIVGVGKASKNSECYVARDPCDPPGTPEPIMVKRNEITLQWTKPVYDGGSMITGYIVEKRDLPDGRWMKASFTNVIETQFTVSGLTEDQRYEFRVIAKNAAGAISKPSDSTGPITAKDEVELPRISMDPKFRDTIVVNAGE TFRLEADVHGKPLPTIEWLRGDKEIEESARCEIKNTDFKALLIVKDAIRIDGGQYILRASNVAGSKSFPVNVKVLDRPGPPEGPVQVTGVTSEKCSLTWSPPLQDGGSDISHYVVEKRETSRLAWTVVASEVVTNSLKVTKLLEGNEYVFRIMAVNKYGVGEPLESAPVLMKNPFVLPGPPKSLEVTNIAKDSMTVCWNRPDSDGGSEIIGYIVEKRDRSGIRWIKCNKRRITDLRLRVTGLTEDHEYEFRVSAENAAGVGEPSPATVYYKACDPVFKPGPPTNAHIVDTTKNSITLAWGKPIYDGGSEILGYVVEICKADEEEWQIVTPQT GLRVTRFEISKLTEHQEYKIRVCALNKVGLGEATSVPGTVKPEDKLEAPELDLDSELRKGIVVRAGGSARIHIPFKGRPTPEITWSREEGEFTDKVQIEKGVNYTQLSIDNCDRNDAGKYILKLENSSGSKSAFVTVKVLDTPGPPQNLAVKEVRKDSAFLVWEPPIIDGGAKVKNYVIDKRESTRKAYANVSSKCSKTSFKVENLTEGAIYYFRVMAENEFGVGVPVETVDAVKAAEPPSPPGKVTLTDVSQTSASLMWEKPEHDGGSRVLGYVVEMQPKGTEKWSIVAESKVCNAVVTGLSSGQEYQFRVKAYNEKGKSDPRVLGVPVIA KDLTIQPSLKLPFNTYSIQAGEDLKIEIPVIGRPRPNISWVKDGEPLKQTTRVNVEETATSTVLHIKEGNKDDFGKYTVTATNSAGTATENLSVIVLEKPGPPVGPVRFDEVSADFVVISWEPPAYTGGCQISNYIVEKRDTTTTTWHMVSATVARTTIKITKLKTGTEYQFRIFAENRYGKSAPLDSKAVIVQYPFKEPGPPGTPFVTSISKDQMLVQWHEPVNDGGTKIIGYHLEQKEKNSILWVKLNKTPIQDTKFKTTGLDEGLEYEFKVSAENIVGIGKPSKVSECFVARDPCDPPGRPEAIVITRNNVTLKWKKPAYDGGSKITGY IVEKKDLPDGRWMKASFTNVLETEFTVSGLVEDQRYEFRVIARNAAGNFSEPSDSSGAITARDEIDAPNASLDPKYKDVIVVHAGETFVLEADIRGKPIPDVVWSKDGKELEETAARMEIKSTIQKTTLVVKDCIRTDGGQYILKLSNVGGTKSIPITVKVLDRPGPPEGPLKVTGVTAEKCYLAWNPPLQDGGANISHYIIEKRETSRLSWTQVSTEVQALNYKVTKLLPGNEYIFRVMAVNKYGIGEPLESGPVTACNPYKPPGPPSTPEVSAITKDSMVVTWARPVDDGGTEIEGYILEKRDKEGVRWTKCNKKTLTDLRLRVTGLTEG HSYEFRVAAENAAGVGEPSEPSVFYRACDALYPPGPPSNPKVTDTSRSSVSLAWSKPIYDGGAPVKGYVVEVKEAAADEWTTCTPPTGLQGKQFTVTKLKENTEYNFRICAINSEGVGEPATLPGSVVAQERIEPPEIELDADLRKVVVLRASATLRLFVTIKGRPEPEVKWEKAEGILTDRAQIEVTSSFTMLVIDNVTRFDSGRYNLTLENNSGSKTAFVNVRVLDSPSAPVNLTIREVKKDSVTLSWEPPLIDGGAKITNYIVEKRETTRKAYATITNNCTKTTFRIENLQEGCSYYFRVLASNEYGIGLPAETTEPVKVSEPPLPPGR VTLVDVTRNTATIKWEKPESDGGSKITGYVVEMQTKGSEKWSTCTQVKTLEATISGLTAGEEYVFRVAAVNEKGRSDPRQLGVPVIARDIEIKPSVELPFHTFNVKAREQLKIDVPFKGRPQATVNWRKDGQTLKETTRVNVSSSKTVTSLSIKEASKEDVGTYELCVSNSAGSITVPITIIVLDRPGPPGPIRIDEVSCDSITISWNPPEYDGGCQISNYIVEKKETTSTTWHIVSQAVARTSIKIVRLTTGSEYQFRVCAENRYGKSSYSESSAVVAEYPFSPPGPPGTPKVVHATKSTMLVTWQVPVNDGGSRVIGYHLEYKERSSILW SKANKILIADTQMKVSGLDEGLMYEYRVYAENIAGIGKCSKSCEPVPARDPCDPPGQPEVTNITRKSVSLKWSKPHYDGGAKITGYIVERRELPDGRWLKCNYTNIQETYFEVTELTEDQRYEFRVFARNAADSVSEPSESTGPIIVKDDVEPPRVMMDVKFRDVIVVKAGEVLKINADIAGRPLPVISWAKDGIEIEERARTEIISTDNHTLLTVKDCIRRDTGQYVLTLKNVAGTRSVAVNCKVLDKPGPPAGPLEINGLTAEKCSLSWGRPQEDGGADIDYYIVEKRETSHLAWTICEGELQMTSCKVTKLLKGNEYIFRVTGVNKYGV GEPLESVAIKALDPFTVPSPPTSLEITSVTKESMTLCWSRPESDGGSEISGYIIERREKNSLRWVRVNKKPVYDLRVKSTGLREGCEYEYRVYAENAAGLSLPSETSPLIRAEDPVFLPSPPSKPKIVDSGKTTITIAWVKPLFDGGAPITGYTVEYKKSDDTDWKTSIQSLRGTEYTISGLTTGAEYVFRVKSVNKVGASDPSDSSDPQIAKEREEEPLFDIDSEMRKTLIVKAGASFTMTVPFRGRPVPNVLWSKPDTDLRTRAYVDTTDSRTSLTIENANRNDSGKYTLTIQNVLSAASLTLVVKVLDTPGPPTNITVQDVTKESAVLS WDVPENDGGAPVKNYHIEKREASKKAWVSVTNNCNRLSYKVTNLQEGAIYYFRVSGENEFGVGIPAETKEGVKITEKPSPPEKLGVTSISKDSVSLTWLKPEHDGGSRIVHYVVEALEKGQKNWVKCAVAKSTHHVVSGLRENSEYFFRVFAENQAGLSDPRELLLPVLIKEQLEPPEIDMKNFPSHTVYVRAGSNLKVDIPISGKPLPKVTLSRDGVPLKATMRFNTEITAENLTINLKESVTADAGRYEITAANSSGTTKAFINIVVLDRPGPPTGPVVISDITEESVTLKWEPPKYDGGSQVTNYILLKRETSTAVWTEVSATVARTMM KVMKLTTGEEYQFRIKAENRFGISDHIDSACVTVKLPYTTPGPPSTPWVTNVTRESITVGWHEPVSNGGSAVVGYHLEMKDRNSILWQKANKLVIRTTHFKVTTISAGLIYEFRVYAENAAGVGKPSHPSEPVLAIDACEPPRNVRITDISKNSVSLSWQQPAFDGGSKITGYIVERRDLPDGRWTKASFTNVTETQFIISGLTQNSQYEFRVFARNAVGSISNPSEVVGPITCIDSYGGPVIDLPLEYTEVVKYRAGTSVKLRAGISGKPAPTIEWYKDDKELQTNALVCVENTTDLASILIKDADRLNSGCYELKLRNAMGSASATIRVQ ILDKPGPPGGPIEFKTVTAEKITLLWRPPADDGGAKITHYIVEKRETSRVVWSMVSEHLEECIITTTKIIKGNEYIFRVRAVNKYGIGEPLESDSVVAKNAFVTPGPPGIPEVTKITKNSMTVVWSRPIADGGSDISGYFLEKRDKKSLGWFKVLKETIRDTRQKVTGLTENSDYQYRVCAVNAAGQGPFSEPSEFYKAADPIDPPGPPAKIRIADSTKSSITLGWSKPVYDGGSAVTGYVVEIRQGEEEEWTTVSTKGEVRTTEYVVSNLKPGVNYYFRVSAVNCAGQGEPIEMNEPVQAKDILEAPEIDLDVALRTSVIAKAGEDVQVLI PFKGRPPPTVTWRKDEKNLGSDARYSIENTDSSSLLTIPQVTRNDTGKYILTIENGVGEPKSSTVSVKVLDTPAACQKLQVKHVSRGTVTLLWDPPLIDGGSPIINYVIEKRDATKRTWSVVSHKCSSTSFKLIDLSEKTPFFFRVLAENEIGIGEPCETTEPVKAAEVPAPIRDLSMKDSTKTSVILSWTKPDFDGGSVITEYVVERKGKGEQTWSHAGISKTCEIEVSQLKEQSVLEFRVFAKNEKGLSDPVTIGPITVKELIITPEVDLSDIPGAQVTVRIGHNVHLELPYKGKPKPSISWLKDGLPLKESEFVRFSKTENKITLSIKN AKKEHGGKYTVILDNAVCRIAVPITVITLGPPSKPKGPIRFDEIKADSVILSWDVPEDNGGGEITCYSIEKRETSQTNWKMVCSSVARTTFKVPNLVKDAEYQFRVRAENRYGVSQPLVSSIIVAKHQFRIPGPPGKPVIYNVTSDGMSLTWDAPVYDGGSEVTGFHVEKKERNSILWQKVNTSPISGREYRATGLVEGLDYQFRVYAENSAGLSSPSDPSKFTLAVSPVDPPGTPDYIDVTRETITLKWNPPLRDGGSKIVGYSIEKRQGNERWVRCNFTDVSECQYTVTGLSPGDRYEFRIIARNAVGTISPPSQSSGIIMTRDENVPPI VEFGPEYFDGLIIKSGESLRIKALVQGRPVPRVTWFKDGVEIEKRMNMEITDVLGSTSLFVRDATRDHRGVYTVEAKNASGSAKAEIKVKVQDTPGKVVGPIRFTNITGEKMTLWWDAPLNDGCAPITHYIIEKRETSRLAWALIEDKCEAQSYTAIKLINGNEYQFRVSAVNKFGVGRPLDSDPVVAQIQYTVPDAPGIPEPSNITGNSITLTWARPESDGGSEIQQYILERREKKSTRWVKVISKRPISETRFKVTGLTEGNEYEFHVMAENAAGVGPASGISRLIKCREPVNPPGPPTVVKVTDTSKTTVSLEWSKPVFDGGMEIIGYI IEMCKADLGDWHKVNAEACVKTRYTVTDLQAGEEYKFRVSAINGAGKGDSCEVTGTIKAVDRLTAPELDIDANFKQTHVVRAGASIRLFIAYQGRPTPTAVWSKPDSNLSLRADIHTTDSFSTLTVENCNRNDAGKYTLTVENNSGSKSITFTVKVLDTPGPPGPITFKDVTRGSATLMWDAPLLDGGARIHHYVVEKREASRRSWQVISEKCTRQIFKVNDLAEGVPYYFRVSAVNEYGVGEPYEMPEPIVATEQPAPPRRLDVVDTSKSSAVLAWLKPDHDGGSRITGYLLEMRQKGSDFWVEAGHTKQLTFTVERLVEKTEYEFRVKAK NDAGYSEPREAFSSVIIKEPQIEPTADLTGITNQLITCKAGSPFTIDVPISGRPAPKVTWKLEEMRLKETDRVSITTTKDRTTLTVKDSMRGDSGRYFLTLENTAGVKTFSVTVVVIGRPGPVTGPIEVSSVSAESCVLSWGEPKDGGGTEITNYIVEKRESGTTAWQLVNSSVKRTQIKVTHLTKYMEYSFRVSSENRFGVSKPLESAPIIAEHPFVPPSAPTRPEVYHVSANAMSIRWEEPYHDGGSKIIGYWVEKKERNTILWVKENKVPCLECNYKVTGLVEGLEYQFRTYALNAAGVSKASEASRPIMAQNPVDAPGRPEVTDVTRS TVSLIWSAPAYDGGSKVVGYIIERKPVSEVGDGRWLKCNYTIVSDNFFTVTALSEGDTYEFRVLAKNAAGVISKGSESTGPVTCRDEYAPPKAELDARLHGDLVTIRAGSDLVLDAAVGGKPEPKIIWTKGDKELDLCEKVSLQYTGKRATAVIKFCDRSDSGKYTLTVKNASGTKAVSVMVKVLDSPGPCGKLTVSRVTQEKCTLAWSLPQEDGGAEITHYIVERRETSRLNWVIVEGECPTLSYVVTRLIKNNEYIFRVRAVNKYGPGVPVESEPIVARNSFTIPSPPGIPEEVGTGKEHIIIQWTKPESDGGNEISNYLVDKREKKSLR WTRVNKDYVVYDTRLKVTSLMEGCDYQFRVTAVNAAGNSEPSEASNFISCREPSYTPGPPSAPRVVDTTKHSISLAWTKPMYDGGTDIVGYVLEMQEKDTDQWYRVHTNATIRNTEFTVPDLKMGQKYSFRVAAVNVKGMSEYSESIAEIEPVERIEIPDLELADDLKKTVTIRAGASLRLMVSVSGRPPPVITWSKQGIDLASRAIIDTTESYSLLIVDKVNRYDAGKYTIEAENQSGKKSATVLVKVYDTPGPCPSVKVKEVSRDSVTITWEIPTIDGGAPVNNYIVEKREAAMRAFKTVTTKCSKTLYRISGLVEGTMYYFRVLPENIY GIGEPCETSDAVLVSEVPLVPAKLEVVDVTKSTVTLAWEKPLYDGGSRLTGYVLEACKAGTERWMKVVTLKPTVLEHTVTSLNEGEQYLFRIRAQNEKGVSEPRETVTAVTVQDLRVLPTIDLSTMPQKTIHVPAGRPVELVIPIAGRPPPAASWFFAGSKLRESERVTVETHTKVAKLTIRETTIRDTGEYTLELKNVTGTTSETIKVIILDKPGPPTGPIKIDEIDATSITISWEPPELDGGAPLSGYVVEQRDAHRPGWLPVSESVTRSTFKFTRLTEGNEYVFRVAATNRFGIGSYLQSEVIECRSSIRIPGPPETLQIFDVSRDGMT LTWYPPEDDGGSQVTGYIVERKEVRADRWVRVNKVPVTMTRYRSTGLTEGLEYEHRVTAINARGSGKPSRPSKPIVAMDPIAPPGKPQNPRVTDTTRTSVSLAWSVPEDEGGSKVTGYLIEMQKVDQHEWTKCNTTPTKIREYTLTHLPQGAEYRFRVLACNAGGPGEPAEVPGTVKVTEMLEYPDYELDERYQEGIFVRQGGVIRLTIPIKGKPFPICKWTKEGQDISKRAMIATSETHTELVIKEADRGDSGTYDLVLENKCGKKAVYIKVRVIGSPNSPEGPLEYDDIQVRSVRVSWRPPADDGGADILGYILERREVPKAAWYTIDSR VRGTSLVVKGLKENVEYHFRVSAENQFGISKPLKSEEPVTPKTPLNPPEPPSNPPEVLDVTKSSVSLSWSRPKDDGGSRVTGYYIERKETSTDKWVRHNKTQITTTMYTVTGLVPDAEYQFRIIAQNDVGLSETSPASEPVVCKDPFDKPSQPGELEILSISKDSVTLQWEKPECDGGKEILGYWVEYRQSGDSAWKKSNKERIKDKQFTIGGLLEATEYEFRVFAENETGLSRPRRTAMSIKTKLTSGEAPGIRKEMKDVTTKLGEAAQLSCQIVGRPLPDIKWYRFGKELIQSRKYKMSSDGRTHTLTVMTEEQEDEGVYTCIATNEVGE VETSSKLLLQATPQFHPGYPLKEKYYGAVGSTLRLHVMYIGRPVPAMTWFHGQKLLQNSENITIENTEHYTHLVMKNVQRKTHAGKYKVQLSNVFGTVDAILDVEIQDKPDKPTGPIVIEALLKNSAVISWKPPADDGGSWITNYVVEKCEAKEGAEWQLVSSAISVTTCRIVNLTENAGYYFRVSAQNTFGISDPLEVSSVVIIKSPFEKPGAPGKPTITAVTKDSCVVAWKPPASDGGAKIRNYYLEKREKKQNKWISVTTEEIRETVFSVKNLIEGLEYEFRVKCENLGGESEWSEISEPITPKSDVPIQAPHFKEELRNLNVRYQSNA TLVCKVTGHPKPIVKWYRQGKEIIADGLKYRIQEFKGGYHQLIIASVTDDDATVYQVRATNQGGSVSGTASLEVEVPAKIHLPKTLEGMGAVHALRGEVVSIKIPFSGKPDPVITWQKGQDLIDNNGHYQVIVTRSFTSLVFPNGVERKDAGFYVVCAKNRFGIDQKTVELDVADVPDPPRGVKVSDVSRDSVNLTWTEPASDGGSKITNYIVEKCATTAERWLRVGQARETRYTVINLFGKTSYQFRVIAENKFGLSKPSEPSEPTITKEDKTRAMNYDEEVDETREVSMTKASHSSTKELYEKYMIAEDLGRGEFGIVHRCVETSSKKTY MAKFVKVKGTDQVLVKKEISILNIARHRNILHLHESFESMEELVMIFEFISGLDIFERINTSAFELNEREIVSYVHQVCEALQFLHSHNIGHFDIRPENIIYQTRRSSTIKIIEFGQARQLKPGDNFRLLFTAPEYYAPEVHQHDVVSTATDMWSLGTLVYVLLSGINPFLAETNQQIIENIMNAEYTFDEEAFKEISIEAMDFVDRLLVKERKSRMTASEALQHPWLKQKIERVSTKVIRTLKHRRYYHTLIKKDLNMVVSAARISCGGAIRSQKGVSVAKVKVASIEIGPVSGQIMHAVGEEGGHVKYVCKIENYDQSTQVTWYFGVRQL ENSEKYEITYEDGVAILYVKDITKLDDGTYRCKVVNDYGEDSSYAELFVKGVREVYDYYCRRTMKKIKRRTDTMRLLERPPEFTLPLYNKTAYVGENVRFGVTITVHPEPHVTWYKSGQKIKPGDNDKKYTFESDKGLYQLTINSVTTDDDAEYTVVARNKYGEDSCKAKLTVTLHPPPTDSTLRPMFKRLLANAECQEGQSVCFEIRVSGIPPPTLKWEKDGQPLSLGPNIEIIHEGLDYYALHIRDTLPEDTGYYRVTATNTAGSTSCQAHLQVERLRYKKQEFKSKEEHERHVQKQIDKTLRMAEILSGTESVPLTQVAKEALREAAVL YKPAVSTKTVKGEFRLEIEEKKEERKLRMPYDVPEPRKYKQTTIEEDQRIKQFVPMSDMKWYKKIRDQYEMPGKLDRVVQKRPKRIRLSRWEQFYVMPLPRITDQYRPKWRIPKLSQDDLEIVRPARRRTPSPDYDFYYRPRRRSLGDISDEELLLPIDDYLAMKRTEEERLRLEEELELGFSASPPSRSPPHFELSSLRYSSPQAHVKVEETRKDFRYSTYHIPTKAEASTSYAELRERHAQAAYRQPKQRQRIMAEREDEELLRPVTTTQHLSEYKSELDFMSKEEKSRKKSRRQREVTEITEIEEEYEISKHAQRESSSSASRLLRRRR SLSPTYIELMRPVSELIRSRPQPAEEYEDDTERRSPTPERTRPRSPSPVSSERSLSRFERSARFDIFSRYESMKAALKTQKTSERKYEVLSQQPFTLDHAPRITLRMRSHRVPCGQNTRFILNVQSKPTAEVKWYHNGVELQESSKIHYTNTSGVLTLEILDCHTDDSGTYRAVCTNYKGEASDYATLDVTGGDYTTYASQRRDEEVPRSVFPELTRTEAYAVSSFKKTSEMEASSSVREVKSQMTETRESLSSYEHSASAEMKSAALEEKSLEEKSTTRKIKTTLAARILTKPRSMTVYEGESARFSCDTDGEPVPTVTWLRKGQVLSTSA RHQVTTTKYKSTFEISSVQASDEGNYSVVVENSEGKQEAEFTLTIQKARVTEKAVTSPPRVKSPEPRVKSPEAVKSPKRVKSPEPSHPKAVSPTETKPTPTEKVQHLPVSAPPKITQFLKAEASKEIAKLTCVVESSVLRAKEVTWYKDGKKLKENGHFQFHYSADGTYELKINNLTESDQGEYVCEISGEGGTSKTNLQFMGQAFKSIHEKVSKISETKKSDQKTTESTVTRKTEPKAPEPISSKPVIVTGLQDTTVSSDSVAKFAVKATGEPRPTAIWTKDGKAITQGGKYKLSEDKGGFFLEIHKTDTSDSGLYTCTVKNSAGSVSSSC KLTIKAIKDTEAQKVSTQKTSEITPQKKAVVQEEISQKALRSEEIKMSEAKSQEKLALKEEASKVLISEEVKKSAATSLEKSIVHEEITKTSQASEEVRTHAEIKAFSTQMSINEGQRLVLKANIAGATDVKWVLNGVELTNSEEYRYGVSGSDQTLTIKQASHRDEGILTCISKTKEGIVKCQYDLTLSKELSDAPAFISQPRSQNINEGQNVLFTCEISGEPSPEIEWFKNNLPISISSNVSISRSRNVYSLEIRNASVSDSGKYTIKAKNFRGQCSATASLMVLPLVEEPSREVVLRTSGDTSLQGSFSSQSVQMSASKQEASFSSFSS SSASSMTEMKFASMSAQSMSSMQESFVEMSSSSFMGISNMTQLESSTSKMLKAGIRGIPPKIEALPSDISIDEGKVLTVACAFTGEPTPEVTWSCGGRKIHSQEQGRFHIENTDDLTTLIIMDVQKQDGGLYTLSLGNEFGSDSATVNIHIRSI Titin34,350 residues Tuesday, August 13, 13
  29. BLAST Tuesday, August 13, 13

  30. 30-90 seconds you say? Tuesday, August 13, 13

  31. challenge accepted. Tuesday, August 13, 13

  32. Exact identity: term filter Tuesday, August 13, 13

  33. Exact identity: term filter Similarity: fuzzy query Tuesday, August 13,

    13
  34. Exact identity: term filter Similarity: fuzzy query Gaps: sloppy phrase

    Tuesday, August 13, 13
  35. a month passes Tuesday, August 13, 13

  36. panic sets in Tuesday, August 13, 13

  37. a sampler of my failures (with success at the end)

    Tuesday, August 13, 13
  38. n-grams MAQDQGEKENPMRELRIRKL Tuesday, August 13, 13

  39. n-grams MAQDQGEKENPMRELRIRKL MAQ MAQ Tuesday, August 13, 13

  40. n-grams MAQDQGEKENPMRELRIRKL MAQ AQD AQD Tuesday, August 13, 13

  41. n-grams MAQDQGEKENPMRELRIRKL MAQ AQD QDQ QDQ ... ... Tuesday, August

    13, 13
  42. query: match: query: “MAQDQGEKENPMRELRIRKL” analyzer: ngram3 min_should_match: 20% match query

    Tuesday, August 13, 13
  43. match query precision: recall: speed: C+ B+ D+ not fuzzy

    enough - poor gap support - I/O thrashing Tuesday, August 13, 13
  44. query: bool: must: match: query: “MAQDQGEKENPMRELRIRKL” min_should_match: 20% should: -

    custom_filters_score: query: match_all filters: - “MAQ” (exact match), boost10 - “MAE” (alternative seed), boost5 - “MRL” (alternative seed), boost3 score_mode: first - custom_filters_score: ... ... min_should_match: 20% custom_filters_score Tuesday, August 13, 13
  45. custom_filters_score precision: recall: speed: C+ A+ D+ fuzzy with no

    ordering - I/O thrashing - filter evictions Tuesday, August 13, 13
  46. query: bool: should: - span_near: - span_term (MAQ) - span_term

    (KEN) - span_near: - span_term (KEN) - span_or: - span_term (RKL) - span_term (ROL) - span_term (RED) - span_near: - span_or: - span_term (RKL) - span_term (ROL) - span_term (RED) - span_term (...) ... span queries MAQDQGEKENPMRELRIRKL Tuesday, August 13, 13
  47. span queries precision: recall: speed: A- A- F- slow, unable

    to filter Tuesday, August 13, 13
  48. Bucketed spans Chaos Game Representation w/ geohash Common Terms query

    Fuzzy query Synonym injection ... many hours of experiments FAILED Tuesday, August 13, 13
  49. what’s the problem? Tuesday, August 13, 13

  50. sparsity inverted indices love Tuesday, August 13, 13

  51. 20 characters alphabet 3-gram: 4-gram: 5-gram: 6,840 permutations 116,280 1,860,480

    Tuesday, August 13, 13
  52. 20 characters alphabet 3-gram: 5-gram: too much sharing! too insensitive!

    Tuesday, August 13, 13
  53. 3-gram: single hit Tuesday, August 13, 13

  54. 3-gram: single hit Tuesday, August 13, 13

  55. we need a new approach. Tuesday, August 13, 13

  56. we need a new approach. sparsity. Tuesday, August 13, 13

  57. we need a new approach. sparsity. fuzziness. Tuesday, August 13,

    13
  58. we need a new approach. sparsity. fuzziness. a hash! Tuesday,

    August 13, 13
  59. hash? we need a Tuesday, August 13, 13

  60. hash: distributes similar values into unique buckets Tuesday, August 13,

    13
  61. locality sensitive hash: distributes similar values into the same bucket

    Tuesday, August 13, 13
  62. collisions useful Tuesday, August 13, 13

  63. Random Projections 1) pick k random 3-grams QDK CVV QED

    MRE ... Tuesday, August 13, 13
  64. Random Projections 2) iterate over sequence MAQDQGEKENPMRELRIRKL -3 QDK QDK

    Tuesday, August 13, 13
  65. Random Projections 2) iterate over sequence ...QDK MAQDQGEKENPMRELRIRKL 6 QDK

    Tuesday, August 13, 13
  66. Random Projections 2) iterate over sequence MAQDQGEKENPMRELRIRKL 13 ......QDK QDK

    Tuesday, August 13, 13
  67. Random Projections 2) iterate over sequence MAQDQGEKENPMRELRIRKL 13 Emit a

    “1” ......QDK QDK 1 Tuesday, August 13, 13
  68. Random Projections 3) repeat for all trigrams MAQDQGEKENPMRELRIRKL .............................................. 0

    CVV ......QDK QDK 1 Tuesday, August 13, 13
  69. Random Projections 3) repeat for all trigrams MAQDQGEKENPMRELRIRKL .............................................. 0

    ...........QED 1 ..........................MRE 1 CVV QED MRE ......QDK QDK 1 Tuesday, August 13, 13
  70. 101101000101000101 bitmap Tuesday, August 13, 13

  71. 101101000101000101 Observation #1 Each bit represents a fuzzy trigram evaluation

    “QDK” Tuesday, August 13, 13
  72. 101101000101000101 Observation #2 Each sequence is compressed to k bits

    Titin 34,350 residues Tuesday, August 13, 13
  73. 101101000101000101 Observation #3 Similar sequences share similar bitmaps 100101000101000101 Tuesday,

    August 13, 13
  74. 101101000101000101 100101000101000101 Hamming Distance SLOW Tuesday, August 13, 13

  75. 101101000101000101 Observation #4 Output index values as tokens 001 003

    Tuesday, August 13, 13
  76. minimum_should_match Match Query FAST! Tuesday, August 13, 13

  77. Observation #5 Term dictionary compressed to k terms Cacheable! Tuesday,

    August 13, 13
  78. query: filtered: query: match: query: “MAQDQGEKENPMRELRIRKL” min_should_match: 20% analyzer: ngram3

    filter: query: match: query: “MAQDQGEKENPMRELRIRKL” min_should_match: 60% analyzer: lsh rescore: smith_waterman: query: “MAQDQGEKENPMRELRIRKL” LSH query Tuesday, August 13, 13
  79. query: filtered: query: match: query: “MAQDQGEKENPMRELRIRKL” min_should_match: 20% analyzer: ngram3

    filter: query: match: query: [0x01, 0x03, 0x16] min_should_match: 60% analyzer: lsh rescore: smith_waterman: query: “MAQDQGEKENPMRELRIRKL” LSH query Tuesday, August 13, 13
  80. LSH query precision: recall: speed: B+ A+ B+ Tuesday, August

    13, 13
  81. Probabilistic Slower indexing speed Many parameters to tune Tuesday, August

    13, 13
  82. Questions? ʘ‿ʘ ಠ_ಠ Tuesday, August 13, 13