Slide 1

Slide 1 text

AW E S O M E A U T O M AT E D A P I S W I T H A U T O M A G I C - R E S T … G O O D D E C I S I O N S , B A D D E C I S I O N S , A N D P U S H I N G T H E L I M I T S O F D J A N G O T H A N K Y O U T O : @ T H E P R A C T I C A L D E V F O R T H E FA K E C O V E R S , B A RT E K F O R T H E P H O T O , A N D T H E W R D S T E A M F O R M A K I N G T H I S A L L P O S S I B L E !

Slide 2

Slide 2 text

H O W D Y ! I ’ M T I M . M Y P R O N O U N S : H E / H I M I T D I R E C T O R , A D VA N C E D I N I T I AT I V E S F O R W H A R T O N R E S E A R C H D ATA S E R V I C E S ( W R D S ) AT T H E W H A R T O N S C H O O L @ F L I P P E R PA - T W I T T E R , G I T H U B , FA C E B O O K , E V E RY W H E R E ! P H I L LY P U G A N D C O M M U N I T Y T E C H E V E N T O R G A N I Z E R P Y T H O N I S TA A N D D J A N G O N A U T ( P S F & D S F M E M B E R ) F U N L O V I N G G E E K , H O C K E Y FA N ( G O F LY E R S ! ) , G U I TA R I S T, O W N E D B Y A C AT I R E A L LY L O V E I C E C R E A M !

Slide 3

Slide 3 text

T H A N K Y O U !

Slide 4

Slide 4 text

W R D S : A B R I E F H I S T O RY A S I N G L E S O U R C E F O R L E A D I N G G L O B A L R E S E A R C H D ATA B A S E S F I R S T A C A D E M I C C L I E N T: S TA N F O R D U N I V E R S I T Y I N 1 9 9 7 F I R S T G O V E R N M E N T C L I E N T: F R B N Y I N 2 0 0 3 F I R S T C O R P O R AT E C L I E N T: C O M PA S S - L E X E C O N I N 2 0 1 1 F O U N D E D I N 1 9 9 2 F O R W H A R T O N FA C U LT Y ) I N I T I A L LY P R O V I D E D F O R S A S , W I T H S & P A N D C R S P D ATA

Slide 5

Slide 5 text

W R D S : T O D AY A S I N G L E S O U R C E F O R L E A D I N G G L O B A L R E S E A R C H D ATA B A S E S S S H S U P P O R T O N O U R C L O U D : S A S , R , P Y T H O N , & M O R E 4 0 0 + T B O F D ATA ; 3 + P B T O TA L S T O R A G E F O O T P R I N T C O N T I N U I N G E X PA N S I O N B E Y O N D F I N A N C I A L D ATA A L M O S T 5 0 0 I N S T I T U T I O N S ; ~ 5 0 % O F G R O W T H H A S B E E N I N T E R N AT I O N A L O V E R T H E PA S T S I X Y E A R S 6 3 , 0 0 0 + A C T I V E U S E R S V I A W E B , S S H , P O S T G R E S Q L , R E M O T E C O N N E C T S

Slide 6

Slide 6 text

D U B I O U S D ATA B A S E D E C I S I O N S S P O I L E R A L E R T ! P O S T G R E S Q L H A S M A D E O U R L I V E S B E T T E R . S Q L S E R V E R F O R W E B S I T E D ATA R E S E A R C H D ATA - E X C E P T N Y S E TA Q - I N O R A C L E P E R M I S S I O N S … A L L R E S E A R C H D ATA WA S S T O R E D I N S A S . ( S A S 7 B D AT ) M Y S Q L F O R W E B S I T E D ATA

Slide 7

Slide 7 text

T H E N C A M E P O S T G R E S ! A P P R E H E N S I O N A B O U T “ Y E T A N O T H E R D ATA B A S E … ” D J A N G O A N D WA G TA I L O V E R C F A N D W O R D P R E S S I N T E G R AT I O N W I T H L D A P A N D P E R M I S S I O N S / A C L S U S E O F C I T U S D ATA E X T E N S I O N F O R N Y S E T R A D E S & Q U O T E S A L L R E S E A R C H D ATA S U C C E S S F U L LY L O A D E D I N P O S T G R E S P O S T G R E S B A C K I N G D J A N G O F O R T H E W E B S I T E

Slide 8

Slide 8 text

F O U N D AT I O N C O M P L E T E . B U I L D I N G T H E R E S T F U L A P I . 1 9 9 2 : W R D S WA S C R E AT E D T O D U M P D ATA T O L O T U S 1 - 2 - 3 . 2 0 1 8 - 2 0 1 9 : W R D S C R E AT E S A R E S T F U L A P I . C O D E G E N E R AT E D B Y I N T R O S P E C T I N G P O S T G R E S Q L . W E B B R O W S A B L E E N D P O I N T S P E R M I T T E D B Y U S E R . F I LT E R S C R E AT E D F O R T H E F I R S T C O L U M N I N I N D I C E S . D J A N G O R E S T F R A M E W O R K : A N A M A Z I N G O P E N S O U R C E P R O J E C T F O R B U I L D I N G R E S T F U L A P I S I N D J A N G O M O D E L S , S E R I A L I Z E R S , V I E W S , F I LT E R S , A N D P E R M I S S I O N S .

Slide 9

Slide 9 text

H O W W E B U I LT I T: F I E L D M A P P I N G U S I N G D J A N G O R E S T F R A M E W O R K W I T H 6 0 , 0 0 0 E N D P O I N T S P R O V E D A C H A L L E N G E . W E S TA R T E D B Y I N T R O S P E C T I O N O F T H E D ATA B A S E . COLUMN_FIELD_MAP = { 'smallint': 'IntegerField({}blank=True, null=True{})', 'integer': 'IntegerField({}blank=True, null=True{})', 'bigint': 'BigIntegerField({}blank=True, null=True{})', 'numeric': 'DecimalField({}blank=True, null=True{})', 'double precision': 'FloatField({}blank=True, null=True{})', 'date': 'DateField({}blank=True, null=True{})', 'timestamp without time zone': 'DateTimeField({}blank=True, null=True{})', 'time without time zone': 'TimeField({}blank=True, null=True{})', 'character varying': 'TextField({}blank=True, null=True{})’, [...] }

Slide 10

Slide 10 text

R E S E R V E D W O R D S : I N P Y T H O N A N D D R F ! W O R D S L I K E Y I E L D A N D R E T U R N A R E U S E D F R E Q U E N T LY I N F I N A N C E . I T T U R N S O U T T H E Y A R E P O P U L A R I N P Y T H O N T O O ! W E M A D E A L I S T O F A L L R E S E R V E D W O R D S . T H E C O L U M N S H AV E _ VA R A P P E N D E D I F T H E Y M AT C H . RESERVED_WORDS = [ 'False', 'None', 'True', ... 'return', 'try', 'while', 'with', 'yield', ]

Slide 11

Slide 11 text

E N D R E S U LT: M O D E L S W E U S E D A F E W H A C K S , B U T E N D E D U P W I T H 6 0 , 0 0 0 D ATA M O D E L S . A N D I T W O R K E D ! N O T E T H E H A C K S B E L O W T O H A N D L E M U LT I P L E S C H E M ATA . class CrspFundSummaryModel(models.Model): summary_period2 = models.TextField(primary_key=True) [...] yield_var = models.FloatField(blank=True, null=True, db_column='yield') div_ytd = models.FloatField(blank=True, null=True) cap_gains_ytd = models.FloatField(blank=True, null=True) [...] class Meta: managed = False db_table = 'crsp\".\"fund_summary'

Slide 12

Slide 12 text

B I G D ATA P R O B L E M S W I T H T H E A M O U N T O F D ATA A N D E N D P O I N T S , W E H A D S E V E R A L I N T E R E S T I N G P R O B L E M S T O S O LV E . T H E C O U N T ( * ) P R O B L E M F I LT E R S : H O W D O W E A U T O M AT I C A L LY C R E AT E T H E M ? F I N A N C E F O L K S ❤ ❤ ❤ S P R E A D S H E E T S 6 0 , 0 0 0 TA B L E S , 6 0 , 0 0 0 M O D E L S , 6 0 , 0 0 0 E N D P O I N T S P E R M I S S I O N S : 6 3 , 0 0 0 + U S E R S A C R O S S 5 0 0 S U B S C R I B E R S .

Slide 13

Slide 13 text

6 0 , 0 0 0 M O D E L S : S L O W ! # urls.py router.register(r"crsp.dsf", WhartonResearchDataServicesDataViewSet, base_name="pgdata.data.crsp.dsf") # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) def __init__(self, **kwargs): super().__init__(**kwargs) self.db_name, python_path_name, schema_name, table_name = split_basename( self.basename, ) api_model = getattr( import_module(f"{python_path_name}.models.{schema_name}"), f"{schema_name}_{table_name}_model", ) api_serializer = getattr( import_module(f"{python_path_name}.serializers. {schema_name}"), f"{schema_name}_{table_name}_serializer", )

Slide 14

Slide 14 text

P E R M I S S I O N S F O R 6 3 , 0 0 0 + def check_permission(db_name, username, schema_name, table_name): """ This function checks the PostgreSQL database to see if the given user has permission to the schema / product. “"" permitted = False connection = connections[db_name] cursor = connection.cursor() try: # This will return True or False # It will kick a ProgrammingError if the username, schema, # or table do not exist, so we wrap in a try block. cursor.execute( """ SELECT pg_catalog.has_table_privilege(%s, %s, 'SELECT'); """, [username, f"{schema_name}.{table_name}"] ) row = cursor.fetchone() permitted = row[0] except ProgrammingError: # This is to catch when a passed username, schema, or table do # not exist in PostgreSQL. permitted = False return permitted

Slide 15

Slide 15 text

S P E E D I N G U P S L O W C O U N T ( * ) # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) table_estimate_count = estimate_count( f'SELECT * FROM {schema_lower}.{table_lower}' ) if table_estimate_count > 1000000: self.pagination_class = CountEstimatePagination # pagination.py def parse_explain(explain_string): return int(re.search("rows=([0-9]+) ", explain_string).group(1)) class CountEstimatePagination(LimitOffsetPagination): def paginate_queryset(self, queryset, request, view=None): self.count = parse_explain(queryset.explain()) self.limit = self.get_limit(request) if self.limit is None: return None self.offset = self.get_offset(request) self.request = request if self.count > self.limit and self.template is not None: self.display_page_controls = True if self.count == 0 or self.offset > self.count: return [] return list(queryset[self.offset: self.offset + self.limit])

Slide 16

Slide 16 text

F I LT E R S O N I N D E X E D C O L U M N S # views.py / class GenericViewSet(XLSXFileMixin, ReadOnlyModelViewSet) index_sql = """ SELECT DISTINCT a.attname AS index_column FROM pg_namespace n JOIN pg_class c ON n.oid = c.relnamespace JOIN pg_index i ON c.oid = i.indrelid JOIN pg_attribute a ON a.attnum = i.indkey[0] AND a.attrelid = c.oid WHERE n.nspname = %(table_schema)s AND c.relname = %(table_name)s """ for field in api_model._meta.get_fields(): if field.name in index_columns: field_type = field.get_internal_type() if field_type in ('CharField', 'TextField'): # Add column to searchable fields, with 'starts with' search ('^') self.search_fields.append(f'^{field.name}') # Add column to filterable fields with all search options self.filter_fields[field.name] = [ 'exact', 'contains', 'startswith', 'endswith', ] elif field_type in ('IntegerField', 'BigIntegerField', ‘DecimalField', ‘FloatField’, 'DateField', 'DateTimeField', 'TimeField'): # Add column to filterable fields with all search options self.filter_fields[field.name] = [ 'exact', 'lt', 'lte', 'gt', 'gte', ]

Slide 17

Slide 17 text

S P R E A D S H E E T S U S E R S C A N N O W S L I C E A N D D I C E T H E I R D ATA … … B U T F I N A N C E P E O P L E L O V E T H E I R S P R E A D S H E E T S ! W E C R E AT E D D R F - R E N D E R E R - X L S X : P I P I N S TA L L A B L E ! D R F S H I P S W I T H S E V E R A L R E N D E R E R S : T H E B R O W S A B L E A P I , J S O N , D J A N G O H T M L T E M P L AT E S , T H E A D M I N …

Slide 18

Slide 18 text

T H E “ T O - D O ” L I S T T H E W R D S A P I WA S B U I LT F O R A F O R T U N E 5 0 C L I E N T W H O N E E D E D I T Y E S T E R D AY. T H E R E ’ S S T I L L S O M E T H I N G S T O D O . A B S T R A C T H A R D C O D E D VA L U E S T O S E T T I N G S / M E T H O D S . C R E AT E P I P PA C K A G E S F O R F E AT U R E S B E Y O N D X L S X E X P O R T A N D R E L E A S E T O G I T H U B / P Y P I . C L E A N U P C U R S O R . E X E C U T E ( ) M E T H O D S ; PA R A M E T E R I Z E . S W I T C H T O D J A N G O T E M P L AT E S F O R C O D E G E N E R AT I O N .

Slide 19

Slide 19 text

T H E D O N E L I S T ! T H E W R D S A P I WA S B U I LT F O R A F O R T U N E 5 0 C L I E N T W H O N E E D E D I T Y E S T E R D AY. T H E R E ’ S S T I L L S O M E T H I N G S T O D O . A B S T R A C T H A R D C O D E D VA L U E S T O S E T T I N G S / M E T H O D S . C R E AT E P I P PA C K A G E S F O R F E AT U R E S B E Y O N D X L S X E X P O R T A N D R E L E A S E T O G I T H U B / P Y P I . C L E A N U P C U R S O R . E X E C U T E ( ) M E T H O D S ; PA R A M E T E R I Z E . S W I T C H T O D J A N G O T E M P L AT E S F O R C O D E G E N E R AT I O N .

Slide 20

Slide 20 text

L E T ’ S T RY T O D O A L I V E D E M O ! O K AY E V E RY O N E … P R AY T O T H E W I F I G O D S … T H A N K Y O U , P Y G O T H A M ! @ F L I P P E R PA - T W I T T E R / G I T H U B / E T C T I M @ P Y P H I L LY. O R G Ask me about working at Wharton… or anything you like!