Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PyGotham: Absolutely Awesome Automated APIs

Tim A.
October 04, 2019

PyGotham: Absolutely Awesome Automated APIs

When you have 60,000 tables and views across hundreds of schemata sitting on top of a 3 petabyte storage footprint, automation and access privileges are key if you want to expose these data through a RESTful API. Our University’s research service leverages Python, Django, Django REST Framework, and PostgreSQL to accomplish this. We are continuing to open source the tools we have built to make this possible!

Databases stay relevant by continuing to reinvent themselves to serve new technologies further up the stack. The latest buzzwords further up that stack are APIs and microservices, so prevalent that it is hard to see a tech advertisement that doesn’t mention them. While related to “the cloud” and “big data”, whatever the heck those terms actually mean, APIs and microservices have a slightly less annoying marketing schtick and more concrete relations to relational databases.

But what do these relations look like in practice? In this talk, the speaker will show how his team at the University has evolved from providing financial data exclusively in SAS data formats to a robust backend powered by PostgreSQL, which allows financial research to happen in many ecosystems: still available in SAS, but also R, Python, Perl, Matlab, Julia, and more. The speaker will present a case study of using Django REST Framework to build an “API through introspection.” This case study will show how Django web site and RESTful web service were built by introspecting financial data stored in a PostgreSQL database cluster.

The models for the ORM, serializers for the RESTful API, views for presenting the data to a user, filters for refining queries, URL routing, web browsable interface, user token authorization, and permissions, are all created by introspecting the PostgreSQL database information schema, all with Python. These components have now been open-sourced, including the endpoint spreadsheet exporter and the generic automated API builder.

Tim A.

October 04, 2019
Tweet

More Decks by Tim A.

Other Decks in Technology

Transcript

  1. AW E S O M E A U T O

    M AT E D A P I S W I T H A U T O M A G I C - R E S T … G O O D D E C I S I O N S , B A D D E C I S I O N S , A N D P U S H I N G T H E L I M I T S O F D J A N G O T H A N K Y O U T O : @ T H E P R A C T I C A L D E V F O R T H E FA K E C O V E R S , B A RT E K F O R T H E P H O T O , A N D T H E W R D S T E A M F O R M A K I N G T H I S A L L P O S S I B L E !
  2. H O W D Y ! I ’ M T

    I M . M Y P R O N O U N S : H E / H I M I T D I R E C T O R , A D VA N C E D I N I T I AT I V E S F O R W H A R T O N R E S E A R C H D ATA S E R V I C E S ( W R D S ) AT T H E W H A R T O N S C H O O L @ F L I P P E R PA - T W I T T E R , G I T H U B , FA C E B O O K , E V E RY W H E R E ! P H I L LY P U G A N D C O M M U N I T Y T E C H E V E N T O R G A N I Z E R P Y T H O N I S TA A N D D J A N G O N A U T ( P S F & D S F M E M B E R ) F U N L O V I N G G E E K , H O C K E Y FA N ( G O F LY E R S ! ) , G U I TA R I S T, O W N E D B Y A C AT I R E A L LY L O V E I C E C R E A M !
  3. W R D S : A B R I E

    F H I S T O RY A S I N G L E S O U R C E F O R L E A D I N G G L O B A L R E S E A R C H D ATA B A S E S F I R S T A C A D E M I C C L I E N T: S TA N F O R D U N I V E R S I T Y I N 1 9 9 7 F I R S T G O V E R N M E N T C L I E N T: F R B N Y I N 2 0 0 3 F I R S T C O R P O R AT E C L I E N T: C O M PA S S - L E X E C O N I N 2 0 1 1 F O U N D E D I N 1 9 9 2 F O R W H A R T O N FA C U LT Y ) I N I T I A L LY P R O V I D E D F O R S A S , W I T H S & P A N D C R S P D ATA
  4. W R D S : T O D AY A

    S I N G L E S O U R C E F O R L E A D I N G G L O B A L R E S E A R C H D ATA B A S E S S S H S U P P O R T O N O U R C L O U D : S A S , R , P Y T H O N , & M O R E 4 0 0 + T B O F D ATA ; 3 + P B T O TA L S T O R A G E F O O T P R I N T C O N T I N U I N G E X PA N S I O N B E Y O N D F I N A N C I A L D ATA A L M O S T 5 0 0 I N S T I T U T I O N S ; ~ 5 0 % O F G R O W T H H A S B E E N I N T E R N AT I O N A L O V E R T H E PA S T S I X Y E A R S 6 3 , 0 0 0 + A C T I V E U S E R S V I A W E B , S S H , P O S T G R E S Q L , R E M O T E C O N N E C T S
  5. D U B I O U S D ATA B

    A S E D E C I S I O N S S P O I L E R A L E R T ! P O S T G R E S Q L H A S M A D E O U R L I V E S B E T T E R . S Q L S E R V E R F O R W E B S I T E D ATA R E S E A R C H D ATA - E X C E P T N Y S E TA Q - I N O R A C L E P E R M I S S I O N S … A L L R E S E A R C H D ATA WA S S T O R E D I N S A S . ( S A S 7 B D AT ) M Y S Q L F O R W E B S I T E D ATA
  6. T H E N C A M E P O

    S T G R E S ! A P P R E H E N S I O N A B O U T “ Y E T A N O T H E R D ATA B A S E … ” D J A N G O A N D WA G TA I L O V E R C F A N D W O R D P R E S S I N T E G R AT I O N W I T H L D A P A N D P E R M I S S I O N S / A C L S U S E O F C I T U S D ATA E X T E N S I O N F O R N Y S E T R A D E S & Q U O T E S A L L R E S E A R C H D ATA S U C C E S S F U L LY L O A D E D I N P O S T G R E S P O S T G R E S B A C K I N G D J A N G O F O R T H E W E B S I T E
  7. F O U N D AT I O N C

    O M P L E T E . B U I L D I N G T H E R E S T F U L A P I . 1 9 9 2 : W R D S WA S C R E AT E D T O D U M P D ATA T O L O T U S 1 - 2 - 3 . 2 0 1 8 - 2 0 1 9 : W R D S C R E AT E S A R E S T F U L A P I . C O D E G E N E R AT E D B Y I N T R O S P E C T I N G P O S T G R E S Q L . W E B B R O W S A B L E E N D P O I N T S P E R M I T T E D B Y U S E R . F I LT E R S C R E AT E D F O R T H E F I R S T C O L U M N I N I N D I C E S . D J A N G O R E S T F R A M E W O R K : A N A M A Z I N G O P E N S O U R C E P R O J E C T F O R B U I L D I N G R E S T F U L A P I S I N D J A N G O M O D E L S , S E R I A L I Z E R S , V I E W S , F I LT E R S , A N D P E R M I S S I O N S .
  8. H O W W E B U I LT I

    T: F I E L D M A P P I N G U S I N G D J A N G O R E S T F R A M E W O R K W I T H 6 0 , 0 0 0 E N D P O I N T S P R O V E D A C H A L L E N G E . W E S TA R T E D B Y I N T R O S P E C T I O N O F T H E D ATA B A S E . COLUMN_FIELD_MAP = { 'smallint': 'IntegerField({}blank=True, null=True{})', 'integer': 'IntegerField({}blank=True, null=True{})', 'bigint': 'BigIntegerField({}blank=True, null=True{})', 'numeric': 'DecimalField({}blank=True, null=True{})', 'double precision': 'FloatField({}blank=True, null=True{})', 'date': 'DateField({}blank=True, null=True{})', 'timestamp without time zone': 'DateTimeField({}blank=True, null=True{})', 'time without time zone': 'TimeField({}blank=True, null=True{})', 'character varying': 'TextField({}blank=True, null=True{})’, [...] }
  9. R E S E R V E D W O

    R D S : I N P Y T H O N A N D D R F ! W O R D S L I K E Y I E L D A N D R E T U R N A R E U S E D F R E Q U E N T LY I N F I N A N C E . I T T U R N S O U T T H E Y A R E P O P U L A R I N P Y T H O N T O O ! W E M A D E A L I S T O F A L L R E S E R V E D W O R D S . T H E C O L U M N S H AV E _ VA R A P P E N D E D I F T H E Y M AT C H . RESERVED_WORDS = [ 'False', 'None', 'True', ... 'return', 'try', 'while', 'with', 'yield', ]
  10. E N D R E S U LT: M O

    D E L S W E U S E D A F E W H A C K S , B U T E N D E D U P W I T H 6 0 , 0 0 0 D ATA M O D E L S . A N D I T W O R K E D ! N O T E T H E H A C K S B E L O W T O H A N D L E M U LT I P L E S C H E M ATA . class CrspFundSummaryModel(models.Model): summary_period2 = models.TextField(primary_key=True) [...] yield_var = models.FloatField(blank=True, null=True, db_column='yield') div_ytd = models.FloatField(blank=True, null=True) cap_gains_ytd = models.FloatField(blank=True, null=True) [...] class Meta: managed = False db_table = 'crsp\".\"fund_summary'
  11. B I G D ATA P R O B L

    E M S W I T H T H E A M O U N T O F D ATA A N D E N D P O I N T S , W E H A D S E V E R A L I N T E R E S T I N G P R O B L E M S T O S O LV E . T H E C O U N T ( * ) P R O B L E M F I LT E R S : H O W D O W E A U T O M AT I C A L LY C R E AT E T H E M ? F I N A N C E F O L K S ❤ ❤ ❤ S P R E A D S H E E T S 6 0 , 0 0 0 TA B L E S , 6 0 , 0 0 0 M O D E L S , 6 0 , 0 0 0 E N D P O I N T S P E R M I S S I O N S : 6 3 , 0 0 0 + U S E R S A C R O S S 5 0 0 S U B S C R I B E R S .
  12. 6 0 , 0 0 0 M O D E

    L S : S L O W ! # urls.py router.register(r"crsp.dsf", WhartonResearchDataServicesDataViewSet, base_name="pgdata.data.crsp.dsf") # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) def __init__(self, **kwargs): super().__init__(**kwargs) self.db_name, python_path_name, schema_name, table_name = split_basename( self.basename, ) api_model = getattr( import_module(f"{python_path_name}.models.{schema_name}"), f"{schema_name}_{table_name}_model", ) api_serializer = getattr( import_module(f"{python_path_name}.serializers. {schema_name}"), f"{schema_name}_{table_name}_serializer", )
  13. P E R M I S S I O N

    S F O R 6 3 , 0 0 0 + def check_permission(db_name, username, schema_name, table_name): """ This function checks the PostgreSQL database to see if the given user has permission to the schema / product. “"" permitted = False connection = connections[db_name] cursor = connection.cursor() try: # This will return True or False # It will kick a ProgrammingError if the username, schema, # or table do not exist, so we wrap in a try block. cursor.execute( """ SELECT pg_catalog.has_table_privilege(%s, %s, 'SELECT'); """, [username, f"{schema_name}.{table_name}"] ) row = cursor.fetchone() permitted = row[0] except ProgrammingError: # This is to catch when a passed username, schema, or table do # not exist in PostgreSQL. permitted = False return permitted
  14. S P E E D I N G U P

    S L O W C O U N T ( * ) # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) table_estimate_count = estimate_count( f'SELECT * FROM {schema_lower}.{table_lower}' ) if table_estimate_count > 1000000: self.pagination_class = CountEstimatePagination # pagination.py def parse_explain(explain_string): return int(re.search("rows=([0-9]+) ", explain_string).group(1)) class CountEstimatePagination(LimitOffsetPagination): def paginate_queryset(self, queryset, request, view=None): self.count = parse_explain(queryset.explain()) self.limit = self.get_limit(request) if self.limit is None: return None self.offset = self.get_offset(request) self.request = request if self.count > self.limit and self.template is not None: self.display_page_controls = True if self.count == 0 or self.offset > self.count: return [] return list(queryset[self.offset: self.offset + self.limit])
  15. F I LT E R S O N I N

    D E X E D C O L U M N S # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) index_sql = """ SELECT DISTINCT a.attname AS index_column FROM pg_namespace n JOIN pg_class c ON n.oid = c.relnamespace JOIN pg_index i ON c.oid = i.indrelid JOIN pg_attribute a ON a.attnum = i.indkey[0] AND a.attrelid = c.oid WHERE n.nspname = %(table_schema)s AND c.relname = %(table_name)s """ for field in api_model._meta.get_fields(): if field.name in index_columns: field_type = field.get_internal_type() if field_type in ('CharField', 'TextField'): # Add column to searchable fields, with 'starts with' search ('^') self.search_fields.append(f'^{field.name}') # Add column to filterable fields with all search options self.filter_fields[field.name] = [ 'exact', 'contains', 'startswith', 'endswith', ] elif field_type in ('IntegerField', 'BigIntegerField', ‘DecimalField', ‘FloatField’, 'DateField', 'DateTimeField', 'TimeField'): # Add column to filterable fields with all search options self.filter_fields[field.name] = [ 'exact', 'lt', 'lte', 'gt', 'gte', ]
  16. S P R E A D S H E E

    T S U S E R S C A N N O W S L I C E A N D D I C E T H E I R D ATA … … B U T F I N A N C E P E O P L E L O V E T H E I R S P R E A D S H E E T S ! W E C R E AT E D D R F - R E N D E R E R - X L S X : P I P I N S TA L L A B L E ! D R F S H I P S W I T H S E V E R A L R E N D E R E R S : T H E B R O W S A B L E A P I , J S O N , D J A N G O H T M L T E M P L AT E S , T H E A D M I N …
  17. T H E “ T O - D O ”

    L I S T T H E W R D S A P I WA S B U I LT F O R A F O R T U N E 5 0 C L I E N T W H O N E E D E D I T Y E S T E R D AY. T H E R E ’ S S T I L L S O M E T H I N G S T O D O . A B S T R A C T H A R D C O D E D VA L U E S T O S E T T I N G S / M E T H O D S . C R E AT E P I P PA C K A G E S F O R F E AT U R E S B E Y O N D X L S X E X P O R T A N D R E L E A S E T O G I T H U B / P Y P I . C L E A N U P C U R S O R . E X E C U T E ( ) M E T H O D S ; PA R A M E T E R I Z E . S W I T C H T O D J A N G O T E M P L AT E S F O R C O D E G E N E R AT I O N .
  18. T H E D O N E L I S

    T ! T H E W R D S A P I WA S B U I LT F O R A F O R T U N E 5 0 C L I E N T W H O N E E D E D I T Y E S T E R D AY. T H E R E ’ S S T I L L S O M E T H I N G S T O D O . A B S T R A C T H A R D C O D E D VA L U E S T O S E T T I N G S / M E T H O D S . C R E AT E P I P PA C K A G E S F O R F E AT U R E S B E Y O N D X L S X E X P O R T A N D R E L E A S E T O G I T H U B / P Y P I . C L E A N U P C U R S O R . E X E C U T E ( ) M E T H O D S ; PA R A M E T E R I Z E . S W I T C H T O D J A N G O T E M P L AT E S F O R C O D E G E N E R AT I O N .
  19. L E T ’ S T RY T O D

    O A L I V E D E M O ! O K AY E V E RY O N E … P R AY T O T H E W I F I G O D S … T H A N K Y O U , P Y G O T H A M ! @ F L I P P E R PA - T W I T T E R / G I T H U B / E T C T I M @ P Y P H I L LY. O R G Ask me about working at Wharton… or anything you like!