Taking on Phone Number Normalization with BigQuery Python UDFs

2026/01/12 Shachi.py Meeting 14th, New Year LT event

leonard475192

January 12, 2026

Transcript

  1. UDF

    A UDF (User-Defined Function) is a feature for running, within SQL, complex logic or specific business requirements that are hard to achieve with a database's built-in functions.

    In BigQuery, until now you could write UDFs in JavaScript and call them directly from standard SQL queries.

        CREATE FUNCTION mydataset.AddFourAndDivide(x INT64, y INT64)
        RETURNS FLOAT64
        AS (
          (x + 4) / y
        );

        SELECT
          val,
          mydataset.AddFourAndDivide(val, 2) AS result
        FROM UNNEST([2, 3, 5, 8]) AS val;

    Sample code: from the official Google Cloud documentation
  2. JS UDF

    One pain point of BigQuery's JavaScript UDFs is that using an external JavaScript library required uploading the script to GCS.

        CREATE TEMP FUNCTION myFunc(a FLOAT64, b STRING)
        RETURNS STRING
        LANGUAGE js
          OPTIONS (
            library=['gs://my-bucket/path/to/lib1.js', 'gs://my-bucket/path/to/lib2.js'])
        AS r"""
          // Assumes 'doInterestingStuff' is defined in one of the library files.
          return doInterestingStuff(a, b);
        """;

    Sample code: from the official Google Cloud documentation
  3. Python UDF

    Just by listing the libraries you want to use and their versions in OPTIONS (packages='...'), you can use external libraries.

        CREATE FUNCTION `PROJECT_ID.DATASET_ID`.area(radius FLOAT64)
        RETURNS FLOAT64
        LANGUAGE python
        OPTIONS (entry_point='area_handler', runtime_version='python-3.11', packages=['scipy==1.15.3'])
        AS r"""
        import scipy

        def area_handler(radius):
          return scipy.constants.pi*radius*radius
        """;

        SELECT `PROJECT_ID.DATASET_ID`.area(4.5);

    Sample code: from the official Google Cloud documentation
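  4. vs BigQuery NoteBook

    As a platform for processing BigQuery data, there is also BigQuery NoteBook (something like Colab), which can be run on a schedule as well.

    Compared with BigQuery NoteBook, BigQuery Python UDFs have the following strengths.

    1 What you implement in BigQuery NoteBook is a script, whereas a BigQuery Python UDF is a function. This means tests can be implemented using the UDF unittest mechanism (probably).

    2 Because what you implement in BigQuery NoteBook is a script, the required logic has to be implemented each time. A BigQuery Python UDF, given execution permission, can be called from anywhere in queries on BigQuery.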
  5. Python

        import phonenumbers
        from phonenumbers import PhoneNumberFormat
        from phonenumbers.phonenumberutil import NumberParseException


        def main(phone_number: str | None) -> dict | None:
            # Missing or non-string input maps to NULL in BigQuery.
            if phone_number is None or not isinstance(phone_number, str):
                return None

            phone_number = phone_number.strip()
            if not phone_number:
                return None

            try:
                # "JP" is the default region for numbers without a country code.
                parsed = phonenumbers.parse(phone_number, "JP")
                return {
                    "is_valid": phonenumbers.is_valid_number(parsed),
                    "national_format": phonenumbers.format_number(parsed, PhoneNumberFormat.NATIONAL),
                    "international_format": phonenumbers.format_number(parsed, PhoneNumberFormat.INTERNATIONAL),
                    "country_code": parsed.country_code,
                    "national_number": parsed.national_number,
                    "extension": parsed.extension,
                    "italian_leading_zero": parsed.italian_leading_zero,
                    "number_of_leading_zeros": parsed.number_of_leading_zeros,
                    "raw_input": parsed.raw_input,
                    "country_code_source": parsed.country_code_source,
                    "preferred_domestic_carrier_code": parsed.preferred_domestic_carrier_code,
                }
            except NumberParseException:
                # Unparseable input also maps to NULL.
                return None
  6. BigQuery Python UDF

        create or replace function `sandbox.normalize_phone_number`(phone_number string)
        -- NOTE: https://github.com/daviddrysdale/python-phonenumbers/blob/dev/python/phonenumbers/phonenumber.pyi
        returns struct<
          is_valid bool,
          national_format string,
          international_format string,
          country_code int,
          national_number int,
          extension string,
          italian_leading_zero bool,
          number_of_leading_zeros int,
          raw_input string,
          country_code_source int,
          preferred_domestic_carrier_code string
        >
        language python
        options (
          runtime_version = "python-3.11",
          entry_point = "main",
          packages = ["phonenumbers"]
        )
        AS /* python */"""
        # (the Python code from the previous slide goes here)
        """
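  7. Python UDF Tips

    The first execution of a BigQuery Python UDF took about 1 minute. Testing and verifying the logic of the Python code in Jupyter Notebook or similar before deploying it as a BigQuery UDF also speeds things up. This shortens the development cycle and reduces debugging work on BigQuery.

    Python UDFs and JS UDFs can degrade BigQuery query performance. If standard SQL functions can meet the requirements, using standard SQL gives fast data processing.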
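The tip about testing locally before deploying can be sketched as follows. This is a minimal sketch, assuming the UDF handler is an ordinary Python function that can be imported or pasted into a notebook. `area_handler` mirrors the earlier area example but uses `math.pi` instead of scipy so that it runs without extra packages, and `run_local_tests` is a hypothetical helper name introduced only for illustration.

```python
import math
import unittest


def area_handler(radius):
    """Handler body as it would appear inside the UDF definition (math.pi is an assumption)."""
    return math.pi * radius * radius


class AreaHandlerTest(unittest.TestCase):
    def test_unit_circle(self):
        # A circle of radius 1 has area pi.
        self.assertAlmostEqual(area_handler(1.0), math.pi)

    def test_zero_radius(self):
        self.assertEqual(area_handler(0.0), 0.0)


def run_local_tests():
    """Run the tests in-process, e.g. from a notebook cell, and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AreaHandlerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_local_tests()` avoids `unittest.main()`, which would try to exit the interpreter and is awkward inside a notebook.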
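  8. Verification

    For some of the data, there were cases where normalization with Google libphonenumber failed. However, these cases looked like they could be handled by future preprocessing.

    Phone numbers containing a fullwidth hyphen "－" (U+FF0D), a subscript minus "₋" (U+208B), or a superscript minus "⁻" (U+207B) could not be normalized.

    When unnecessary symbols or strings (e.g. "TEL:") were mixed into the phone number, the phone number alone could not be extracted correctly.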
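The preprocessing mentioned for these failing cases could look roughly like the sketch below. It is not the author's implementation: `preprocess_phone_number`, `HYPHEN_LIKE`, and `LABEL_PREFIX` are hypothetical names, and the label pattern covers only the "TEL:" example from the slide.

```python
import re

# Hyphen-like characters from the failing cases, each mapped to an ASCII hyphen.
HYPHEN_LIKE = str.maketrans({
    "\uFF0D": "-",  # fullwidth hyphen-minus
    "\u208B": "-",  # subscript minus
    "\u207B": "-",  # superscript minus
})

# Assumption: only a leading "TEL:" label (ASCII or fullwidth colon) is stripped.
LABEL_PREFIX = re.compile(r"^\s*TEL\s*[:：]\s*", re.IGNORECASE)


def preprocess_phone_number(raw):
    """Clean a raw string before handing it to the phone number parser."""
    if raw is None:
        return None
    text = raw.translate(HYPHEN_LIKE)
    text = LABEL_PREFIX.sub("", text)
    return text.strip()
```

The cleaned string would then be passed to the parser in place of the raw input. Whether other labels or separators need handling depends on the actual data.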
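  9. Verification

    Execution time was about 11 times longer. Consumed slot time, which ties directly to cost, also increased to about 8 times.

                            JS UDF    Python UDF
    Execution time          6.76 s    1 min 16.36 s (about 11.3x)
    Consumed slot time      6.61 s    53.15 s (about 8.0x)

    (around 0.1 yen)

    Note that, as a way to achieve equivalent quality, I have not compared against the approach of standing up CloudRun and calling an API from a Remote Function, but my impression is that execution time and consumed slot time would be about the same. With a CloudRun + Remote Function setup there is the cost of standing up CloudRun, whereas Python UDFs, which are in Preview, currently incur no such cost, so this looks like a cost advantage.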
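The ratios quoted above follow directly from the measured values. A small sketch of the arithmetic, using only the figures from the table (the variable and function names are illustrative):

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair into seconds."""
    return minutes * 60 + seconds


# Measured values from the table above.
js_exec_s = 6.76
py_exec_s = to_seconds(1, 16.36)  # 1 min 16.36 s
js_slot_s = 6.61
py_slot_s = 53.15

exec_ratio = py_exec_s / js_exec_s
slot_ratio = py_slot_s / js_slot_s

print(f"execution time ratio: {exec_ratio:.1f}x")
print(f"slot time ratio: {slot_ratio:.1f}x")
```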
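  10. 1 BigQuery Python UDFs are in Preview, so they can be used for free. How pricing is set at GA will likely be an important factor in deciding whether to use them in production.

    2 Each UDF directly specifies its Python version and library versions. With unit tests implemented, checks in CI, and AI-assisted version upgrades, I think they can be kept maintainable.

    3 In this implementation, NULL is returned when an error occurs. With production operation in mind, I think including the error message in the response would be an improvement.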
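The idea of returning the error message instead of a bare NULL can be sketched as below. This is a generic illustration under stated assumptions, not the author's code: `with_error_info` and `parse_digits` are hypothetical names, `parse_digits` is a stand-in for the real phone number parsing, and the UDF's declared return struct would need a matching `error` field.

```python
def with_error_info(handler):
    """Wrap a handler so that failures are reported rather than collapsed into None."""
    def wrapped(value):
        try:
            return {"result": handler(value), "error": None}
        except Exception as exc:  # broad on purpose in this sketch
            return {"result": None, "error": f"{type(exc).__name__}: {exc}"}
    return wrapped


def parse_digits(value):
    """Hypothetical stand-in for phone number parsing: keep the digits only."""
    digits = "".join(ch for ch in value if ch.isdigit())
    if not digits:
        raise ValueError("no digits found")
    return digits


safe_parse = with_error_info(parse_digits)
```

With this shape, rows that fail can be filtered on `error IS NOT NULL` and inspected, at the cost of a slightly wider return type.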
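  11. With BigQuery Python UDFs, powerful external libraries such as phonenumbers can be used directly on BigQuery, which made data processing easier.

    ETL execution time and slot consumption increased, but high-quality normalization was achieved at very low cost, which suggests the approach is cost-effective as a solution.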