$30 off During Our Annual Pro Sale. View Details »

Absolutely Awesome Automated APIs With Django REST Framework: PyCaribbean

Tim A.
February 17, 2019

Absolutely Awesome Automated APIs With Django REST Framework: PyCaribbean

Databases are the elephants in the room of modern technology: mature, reliable, sturdy fixtures in the technology world that have served us well for many decades. Databases stay relevant by continuing to reinvent themselves to serve new technologies further up the stack. The latest buzzwords further up that stack are APIs and microservices, so prevalent that it is hard to see a tech advertisement that doesn't mention them. While related to "the cloud" and "big data", whatever the heck those terms actually mean, APIs and microservices have a slightly less annoying marketing schtick and more concrete relations to relational databases. But what do these relations look like in practice? In this talk, Timothy Allen of Wharton Research Data Services will show how his team at The Wharton School has evolved from providing financial data exclusively in SAS data formats to a robust backend powered by PostgreSQL, which allows financial research to happen in many ecosystems: still available in SAS, but also R, Python, Perl, Matlab, Julia, and more. He will present a case study of using Django REST Framework to build an "API through introspection." This case study will show how WRDS built a Django web site and RESTful web service by introspecting financial data stored in a PostgreSQL database cluster. The models for the ORM, serializers for the RESTful API, views for presenting the data to a user, filters for refining queries, URL routing, web browsable interface, user token authorization, and permissions, are all handled by introspecting various features of the PostgreSQL database information schema, all with Python. Many of these components have been open-sourced, such as the ability to export an endpoint as a spreadsheet file (XLSX), and we are hoping to open-source more of it!

Tim A.

February 17, 2019
Tweet

More Decks by Tim A.

Other Decks in Technology

Transcript

  1. A B S O L U T E LY AW

    E S O M E A U T O M AT E D A P I S ! … G O O D D E C I S I O N S , B A D D E C I S I O N S , A N D P U S H I N G T H E L I M I T S O F D J A N G O T H A N K Y O U T O : @ T H E P R A C T I C A L D E V F O R T H E FA K E C O V E R S , A N D T H E W R D S P D I & D ATA T E A M S F O R M A K I N G T H I S A L L P O S S I B L E !
  2. ¡ H O L A ! M I N O

    M B R E E S T I M . I T D I R E C T O R , A D VA N C E D I N I T I AT I V E S F O R W H A R T O N R E S E A R C H D ATA S E R V I C E S ( W R D S ) AT T H E W H A R T O N S C H O O L @ F L I P P E R PA - T W I T T E R , G I T H U B , FA C E B O O K , E V E RY W H E R E ! P H I L LY P U G A N D D J A N G O C O N U S O R G A N I Z E R ; P Y T H O N I S TA A N D D J A N G O N A U T ( P S F & D S F M E M B E R ) ; F U N L O V I N G G E E K , H O C K E Y FA N , G U I TA R I S T, O W N E D B Y A C AT I R E A L LY, R E A L LY L O V E I C E C R E A M !
  3. W R D S : A B R I E

    F H I S T O RY A S I N G L E S O U R C E F O R L E A D I N G G L O B A L R E S E A R C H D ATA B A S E S F I R S T A C A D E M I C C L I E N T: S TA N F O R D U N I V E R S I T Y I N 1 9 9 7 F I R S T G O V E R N M E N T C L I E N T: F R B N Y I N 2 0 0 3 F I R S T C O R P O R AT E C L I E N T: C O M PA S S - L E X E C O N I N 2 0 1 1 F O U N D E D I N 1 9 9 2 F O R W H A R T O N FA C U LT Y I N I T I A L LY P R O V I D E D S A S , W I T H S & P A N D C R S P D ATA
  4. W R D S : T O D AY A

    S I N G L E S O U R C E F O R L E A D I N G G L O B A L R E S E A R C H D ATA B A S E S S S H S U P P O R T O N O U R C L O U D : S A S , R , P Y T H O N , & M O R E 4 0 0 + T B O F D ATA ; 3 + P B T O TA L R A W S T O R A G E C O N T I N U I N G E X PA N S I O N B E Y O N D F I N A N C I A L D ATA A L M O S T 5 0 0 I N S T I T U T I O N S ; ~ 5 0 % O F G R O W T H H A S B E E N I N T E R N AT I O N A L O V E R T H E PA S T F I V E Y E A R S 5 0 , 0 0 0 + A C T I V E U S E R S V I A W E B , S S H , P O S T G R E S Q L , R E M O T E C O N N E C T S
  5. D U B I O U S D ATA B

    A S E D E C I S I O N S S P O I L E R A L E R T ! P O S T G R E S Q L H A S M A D E O U R L I V E S B E T T E R . S Q L S E R V E R F O R W E B S I T E D ATA R E S E A R C H D ATA - E X C E P T N Y S E TA Q - I N O R A C L E P E R M I S S I O N S … A L L R E S E A R C H D ATA WA S S T O R E D I N S A S . ( S A S 7 B D AT ) M Y S Q L F O R W E B S I T E D ATA
  6. T H E N C A M E P O

    S T G R E S ! A P P R E H E N S I O N A B O U T “ Y E T A N O T H E R D ATA B A S E … ” D J A N G O A N D WA G TA I L O V E R C F A N D W O R D P R E S S I N T E G R AT I O N W I T H L D A P A N D P E R M I S S I O N S / A C L S U S E O F C I T U S D ATA E X T E N S I O N F O R N Y S E T R A D E S & Q U O T E S A L L R E S E A R C H D ATA S U C C E S S F U L LY L O A D E D I N P O S T G R E S P O S T G R E S B A C K I N G D J A N G O F O R T H E W E B S I T E
  7. F O U N D AT I O N C

    O M P L E T E . B U I L D I N G T H E R E S T F U L A P I . 1 9 9 2 : W R D S WA S C R E AT E D T O D U M P D ATA T O L O T U S 1 - 2 - 3 . 2 0 1 7 - 2 0 1 9 : W R D S C R E AT E S A R E S T F U L A P I . C O D E G E N E R AT E D B Y I N T R O S P E C T I N G P O S T G R E S Q L . W E B B R O W S A B L E E N D P O I N T S P E R M I T T E D B Y U S E R . F I LT E R S C R E AT E D F O R T H E F I R S T C O L U M N I N I N D I C E S . D J A N G O R E S T F R A M E W O R K : A N A M A Z I N G O P E N S O U R C E P R O J E C T F O R B U I L D I N G R E S T F U L A P I S I N D J A N G O M O D E L S , S E R I A L I Z E R S , F I LT E R S , A N D P E R M I S S I O N S .
  8. H O W W E B U I LT I

    T U S I N G D J A N G O R E S T F R A M E W O R K W I T H 6 0 , 0 0 0 E N D P O I N T S P R O V E D A C H A L L E N G E . W E S TA R T E D B Y I N T R O S P E C T I O N O F T H E D ATA B A S E . COLUMN_FIELD_MAP = { 'smallint': 'IntegerField({}blank=True, null=True{})', 'integer': 'IntegerField({}blank=True, null=True{})', 'bigint': 'BigIntegerField({}blank=True, null=True{})', 'numeric': 'DecimalField({}blank=True, null=True{})', 'double precision': 'FloatField({}blank=True, null=True{})', 'date': 'DateField({}blank=True, null=True{})', 'timestamp without time zone': 'DateTimeField({}blank=True, null=True{})', 'time without time zone': 'TimeField({}blank=True, null=True{})', 'character varying': 'TextField({}blank=True, null=True{})', }
  9. R E S E R V E D W O

    R D S W O R D S L I K E Y I E L D A N D R E T U R N A R E U S E D F R E Q U E N T LY I N F I N A N C E . I T T U R N S O U T T H E Y A R E P O P U L A R I N P Y T H O N T O O ! W E M A D E A L I S T O F A L L R E S E R V E D W O R D S . T H E C O L U M N S H AV E _ VA R A P P E N D E D I F T H E Y M AT C H . RESERVED_WORDS = [ 'False', 'None', 'True', ... 'return', 'try', 'while', 'with', 'yield', ]
  10. E N D R E S U LT: M O

    D E L S W E U S E D A F E W H A C K S , B U T E N D E D U P W I T H 6 0 , 0 0 0 D ATA M O D E L S . A N D I T W O R K E D ! N O T E T H E H A C K S B E L O W T O H A N D L E M U LT I P L E S C H E M A S . class CrspFundSummaryModel(models.Model): summary_period2 = models.TextField(primary_key=True) [...] yield_var = models.FloatField(blank=True, null=True, db_column='yield') div_ytd = models.FloatField(blank=True, null=True) cap_gains_ytd = models.FloatField(blank=True, null=True) [...] class Meta: managed = False db_table = 'crsp\".\"fund_summary'
  11. B I G D ATA P R O B L

    E M S W I T H T H E A M O U N T O F D ATA A N D E N D P O I N T S , W E H A D S E V E R A L I N T E R E S T I N G P R O B L E M S T O S O LV E . T H E C O U N T ( * ) P R O B L E M F I LT E R S : H O W D O W E A U T O M AT I C A L LY C R E AT E T H E M ? F I N A N C E F O L K S ❤ ❤ ❤ S P R E A D S H E E T S 6 0 , 0 0 0 TA B L E S , 6 0 , 0 0 0 M O D E L S , 6 0 , 0 0 0 E N D P O I N T S P E R M I S S I O N S : 5 0 , 0 0 0 + U S E R S A C R O S S 5 0 0 S U B S C R I B E R S
  12. 6 0 , 0 0 0 M O D E

    L S : S L O W ! # urls.py router.register(r'crsp_a_stock62.msf62', GenericViewSet, base_name='crsp_a_stock62-msf62') # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) def __init__(self, **kwargs): super().__init__(**kwargs) parts = self.basename.split('-') schema_lower = parts[0].lower() table_lower = parts[1].lower() schema_camel = camelize(parts[0]) table_camel = camelize(parts[1]) api_model = getattr(import_module( f'{self.app_prefix}.models.{schema_lower}'), f'{schema_camel}{table_camel}Model',) api_serializer = getattr(import_module( f'{self.app_prefix}.serializers.{schema_lower}'), f'{schema_camel}{table_camel}Serializer') api_permission = getattr(import_module( f'{self.app_prefix}.permissions.{schema_lower}'), f'{schema_camel}Permission')
  13. P E R M I S S I O N

    S F O R 5 0 , 0 0 0 def check_permission(username, schema_name): permitted = False connection = connections['pgdata'] cursor = connection.cursor() try: cursor.execute( """ WITH RECURSIVE "names" ("name") AS ( SELECT n.nspname AS "name" FROM pg_catalog.pg_namespace n WHERE n.nspname !~ '^pg_' AND n.nspname <> 'information_schema' ) SELECT "name" FROM "names" WHERE pg_catalog.has_schema_privilege('{username}', "name", 'USAGE') = TRUE AND name = '{schema_name}'; """.format( username=username, schema_name=schema_name, ) ) for row in cursor.fetchall(): permitted = True except ProgrammingError: # This is to catch when a role doesn't exist in PostgreSQL for a user. permitted = False return permitted
  14. S P E E D I N G U P

    S L O W C O U N T ( * ) # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) table_estimate_count = estimate_count( f'SELECT * FROM {schema_lower}.{table_lower}' ) if table_estimate_count > 1000000: self.pagination_class = CountEstimatePagination # pagination.py def parse_explain(explain_string): return int(re.search("rows=([0-9]+) ", explain_string).group(1)) class CountEstimatePagination(LimitOffsetPagination): def paginate_queryset(self, queryset, request, view=None): self.count = parse_explain(queryset.explain()) self.limit = self.get_limit(request) if self.limit is None: return None self.offset = self.get_offset(request) self.request = request if self.count > self.limit and self.template is not None: self.display_page_controls = True if self.count == 0 or self.offset > self.count: return [] return list(queryset[self.offset: self.offset + self.limit])
  15. F I LT E R S O N I N

    D E X E D C O L U M N S # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) index_sql = """ SELECT DISTINCT a.attname AS index_column FROM pg_namespace n JOIN pg_class c ON n.oid = c.relnamespace JOIN pg_index i ON c.oid = i.indrelid JOIN pg_attribute a ON a.attnum = i.indkey[0] AND a.attrelid = c.oid WHERE n.nspname = %(table_schema)s AND c.relname = %(table_name)s """ for field in api_model._meta.get_fields(): if field.name in index_columns: field_type = field.get_internal_type() if field_type in ('CharField', 'TextField'): # Add column to searchable fields, with 'starts with' search ('^') self.search_fields.append(f'^{field.name}') # Add column to filterable fields with all search options self.filter_fields[field.name] = [ 'exact', 'contains', 'startswith', 'endswith', ] elif field_type in ('IntegerField', 'BigIntegerField', ‘DecimalField', ‘FloatField’, 'DateField', 'DateTimeField', 'TimeField'): # Add column to filterable fields with all search options self.filter_fields[field.name] = [ 'exact', 'lt', 'lte', 'gt', 'gte', ]
  16. S P R E A D S H E E

    T S U S E R S C A N N O W S L I C E A N D D I C E T H E R E D ATA … … B U T F I N A N C E P E O P L E L O V E T H E I R S P R E A D S H E E T S ! T H E J S O N R E N D E R E R O U T P U T S J S O N W E C R E AT E D D R F - R E N D E R E R - X L S X : P I P I N S TA L L A B L E ! O U T P U T S I N X L S X F O R M AT U S I N G O P E N P Y X L D R F S H I P S W I T H S E V E R A L R E N D E R E R S T H E B R O W S E A B L E A P I R E N D E R E R O U T P U T S I N H T M L
  17. T H E “ T O - D O ”

    L I S T T H E W R D S A P I WA S B U I LT F O R A F O R T U N E 5 0 C L I E N T W H O N E E D E D I T Y E S T E R D AY. T H E R E ’ S S T I L L S O M E T H I N G S T O D O . A B S T R A C T A WAY H A R D C O D E D VA L U E S T O S E T T I N G S . C R E AT E P I P PA C K A G E S F O R F E AT U R E S B E Y O N D X L S X E X P O R T A N D R E L E A S E T O G I T H U B / P Y P I . M A K E P O S T G R E S Q L T H E C A N O N I C A L P E R M I S S I O N S O U R C E . C L E A N U P C U R S O R . E X E C U T E ( ) M E T H O D S ; PA R A M E T E R I Z E . S W I T C H T O D J A N G O T E M P L AT E S F O R C O D E G E N E R AT I O N .
  18. L E T ’ S T RY T O D

    O A L I V E D E M O ! O K AY E V E RY O N E … P R AY T O T H E W I F I G O D S … @ F L I P P E R PA - T W I T T E R / G I T H U B / E T C H T T P S : / / 2 0 1 9 . D J A N G O C O N . U S - J O I N U S ! T I M @ P Y P H I L LY. O R G http://whr.tn/techjobs