Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Python & Databases for Students

Python & Databases for Students

Talk as high-level overview of DB as separate from language area, communication with DB & existed packages in Python.

Iuliia Volkova

April 03, 2021
Tweet

More Decks by Iuliia Volkova

Other Decks in Programming

Transcript

  1. Agenda 1. What is DB? 2. Types of DB 3.

    Relationships between Python & DB 4. How DB impact your code 5. DB Interface vs ORM vs Statements Generators 6. Where is Async? 7. Examples of API & code samples 8. About ORM’s 2
  2. What is Data? Data - is a structured information Information:

    I have a party at 7 PM on Friday and my Mom’s birthday on the next day after 4
  3. What is Data? Data - is a structured information Information:

    I have a party at 7 PM on Friday and my Mom’s birthday on the next day after Data: id | event | start_datetime 1 | party | 2019-21-01T19:00:00 2 | mom’s birthday | 2019-22-01T12:00:00 5
  4. What is Database? Database is an organized collection of data

    (non organized - data lakes/file systems) 6
  5. Standard operations with Data - Store (Create) - Obtain (Read)

    - Update (Update) - Remove (Delete) CRUD operations 7
  6. Types of Databases SQL (RDBMS) NoSQL • Relational data •

    Normalized (but not always) • Data stored as tables with columns • Used SQL query language (or modification) 9
  7. Types of Databases SQL (RDBMS) NoSQL • Relational data •

    Normalized (but not always) • Data stored as tables with columns • Used SQL query language (or modification) • Different types of data structure • Key-value storages • Document-oriented • Graph DB • Column databases & etc. 10
  8. Theory about DB • CAP theorem - https://en.wikipedia.org/wiki/CAP_theorem • ACID

    - https://en.wikipedia.org/wiki/ACID • Normalization forms - https://en.wikipedia.org/wiki/Database_normalization • A lot of termins, features: Indexes, partitions, data types, cursors, functions, triggers, etc • Different DB specifications: for example, ways to store data (in-memory, drive, etc) • deep-deep in the rabbit hole 17
  9. Python just one of the dozen similar languages I don’t

    care who are you I send packages directly with TCP/IP !!! 19
  10. DB - server for you code, and it has own

    protocol PostgreSQL uses a message-based protocol for communication between frontends and backends (clients and servers). The protocol is supported over TCP/IP and also over Unix-domain sockets. https://www.postgresql.org/docs/13/protocol.html 20
  11. TCP/IP based protocols in ‘child’ level Protocol describes what bytes

    and that they are used for What content are allowed in messages & understandable by server https://www.postgresql.org/docs/9.0/protoc ol-overview.html 23
  12. Low-level API Features provided by libraries - Establish Connection to

    DB - Maintain pool of Connections (for example, Postgres run 1 command per connection) - SEND message (query) - prepared statements - scrollable cursors - partial iteration on query results - automatic encoding and decoding of composite types, arrays, and any combination of those - straightforward support for custom data types 25
  13. How DB can impact your code? - Can you use

    async code or only sync? 27
  14. import asyncio import asyncpg async def run(): conn = await

    asyncpg.connect(user='user', password='password', database='database', host='127.0.0.1') values = await conn.fetch( 'SELECT * FROM mytable WHERE id = $1', 10, ) await conn.close() loop = asyncio.get_event_loop() loop.run_until_complete(run()) import psycopg2 def run(): conn = psycopg2.connect("dbname=test user=postgres") cur = conn.cursor() cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);") async sync 29
  15. DB Communication Base concepts* *(based on SQL DB, exactly PostgreSQL,

    can be diff for NoSQL) 1. Connection: PostgreSQL is implemented using a simple “process per user” client/server model. I n this model there is one client process connected to exactly one server process. Whenever a request for a connection is detected the postgres process spawns a new server process. https://www.postgresql.org/docs/13/connect-estab.html 34
  16. DB Communication Base concepts* *(based on SQL DB, exactly PostgreSQL,

    can be diff for NoSQL) 1. Connection: PostgreSQL is implemented using a simple “process per user” client/server model. I n this model there is one client process connected to exactly one server process. Whenever a request for a connection is detected the postgres process spawns a new server process. https://www.postgresql.org/docs/13/connect-estab.html When we do something like this: conn = await asyncpg.connect(user='user', password='password', database='database', host='127.0.0.1') 35
  17. DB Communication Base concepts* *(based on SQL DB, exactly PostgreSQL,

    can be diff for NoSQL) And now we already can do something like this: values = await conn.fetch( 'SELECT * FROM mytable WHERE id = $1', 10) Execute the query 36
  18. DB Communication Base concepts* *(based on SQL DB, exactly PostgreSQL,

    can be diff for NoSQL) 2. Transactions The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. https://www.postgresql.org/docs/13/tutorial-transactions.html 38
  19. DB Communication Base concepts* *(based on SQL DB, exactly PostgreSQL,

    can be diff for NoSQL) 2. Transactions The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. https://www.postgresql.org/docs/13/tutorial-transactions.html Transactions in code: async with db.transaction() as transaction: try: ... # register data elements await asyncio.wait([register_data_elements(object_data, object_id, redacted_schema_id, object_schema)]) ... await register_datazone(datazone_data_obj, datazone, object_id, is_dbsource, state.dw, state) ... 39
  20. DB Communication Base concepts* *(based on SQL DB, exactly PostgreSQL,

    can be diff for NoSQL) 3. Cursors Cursors are useful when there is a need to iterate over the results of a large query without fetching all rows at once. 40
  21. DB Communication Base concepts* *(based on SQL DB, exactly PostgreSQL,

    can be diff for NoSQL) Other operations required ‘session’, has ‘historical context’, set of actions one-to-another 41
  22. How DB can impact your code? - Can you use

    async code or not sync? - How much abstractions you need to implement? 42
  23. Why do we want abstractions? def gen_datasource_sql(ds, schema, future_date, sqls):

    sql = """update {}data_sources set effective_end=(CURRENT_DATE -1) WHERE object_id='{}' and effective_start!=CURRENT_DATE and effective_end='{}'""" sql = sql.format(schema, ds.object_id, future_date) sqls.append(sql.replace("\n", " ")) sql = """insert into {}data_sources (object_id, source, file_type, load_method, field_delimiter, tshirt_size, delivery_frequency, contains_header_record, notify, effective_start, effective_end) values ('{}', '{}', '{}', '{}', '{}', '{}', '{}', {}, '{}', CURRENT_DATE, '{}') ON CONFLICT (object_id, effective_start) DO UPDATE SET source=excluded.source, file_type=excluded.file_type,load_method=excluded.load_method,field_delimiter=excluded.field_delimiter, tshirt_size=excluded.tshirt_size,delivery_frequency=excluded.delivery_frequency, contains_header_record=excluded.contains_header_record,notify=excluded.notify, effective_end=excluded.effective_end """ sql = sql.format(schema, ds.object_id, ds.source, ds.file_type, ds.load_method, ds.field_delimiter, ds.tshirt_size, ds.delivery_frequency, ds.contains_header_record, '{{"{}"}}'.format('","'.join(ds.notify)), future_date) sqls.append(sql.replace("\n", " ")) 43
  24. Why do we want abstractions? permission = await Permission.query.where( and_(

    Permission.ns_object_id == ns_object_id, Permission.role_id == entitlement.role, Permission.effective_end == security_future_date, ) ).gino.first() if permission: await permission.update(effective_end=datetime.datetime.utcnow()).apply() 44
  25. ORM Object-Relational Mapping (ORM) is a technique that lets you

    query and manipulate data from a database using an object-oriented paradigm. 45
  26. ORM Object-Relational Mapping (ORM) is a technique that lets you

    query and manipulate data from a database using an object-oriented paradigm. Work with one Table row as with Object And with Table as with Class 46
  27. ORM Object-Relational Mapping (ORM) is a technique that lets you

    query and manipulate data from a database using an object-oriented paradigm. Work with one Table row as with Object And with Table as with Class Table: events id name start_date Table: users id name class Event(Model): id = fields.IntField(pk=True) name = fields.TextField() start_datetime = fields.DateTimeField() class User(Model): id = fields.TextField(pk=True) name = fields.TextField() 47
  28. ORM: Not only 1-to-1 mapping DB to Code • Default

    values (ORM) vs server_default (DB default) CREATE table user_server_default ( id varchar not null ,name varchar not null default 'New User', ) ; 48
  29. ORM: Not only 1-to-1 mapping DB to Code • Default

    values (ORM) vs server_default (DB default) 49
  30. ORM: Not 1-to-1 mapping DB to Code • Default values

    (ORM) vs server_default (DB default) class User(models.Model): class Meta: table = "users" id = fields.TextField(pk=True) name = fields.TextField(default='Default from Tortoise') Tortoise ORM 50
  31. ORM: Not 1-to-1 mapping DB to Code • Default values

    (ORM) vs server_default (DB default) class User(models.Model): class Meta: table = "users" id = fields.TextField(pk=True) name = fields.TextField(default='Default from Tortoise') Tortoise ORM 51
  32. ORM: Not only 1-to-1 mapping DB to Code • Additional

    ‘hooks’ on the code layer • onupdate – A scalar, Python callable, or ClauseElement representing a default value to be applied to the column within UPDATE statements, which wil be invoked upon update if this column is not present in the SET clause of the update. This is a shortcut to using ColumnDefault as a positional argument with for_update=True. 52
  33. ORM: Not only 1-to-1 mapping DB to Code • Additional

    ‘hooks’ on the code layer • onupdate – A scalar, Python callable, or ClauseElement representing a default value to be applied to the column within UPDATE statements, which wil be invoked upon update if this column is not present in the SET clause of the update. This is a shortcut to using ColumnDefault as a positional argument with for_update=True. Good for fields like ‘modified_at` 53
  34. ORM: Not only 1-to-1 mapping DB to Code • ‘Relationships’:

    one-to-one, many-to-many, one-to-many Table: events id name start_date Table: users id name Table: attendees event_id user_id 54
  35. ORM: Not only 1-to-1 mapping DB to Code • ‘Relationships’:

    one-to-one, many-to-many, one-to-many Table: events id name start_date Table: users id name Table: attendees event_id user_id CREATE table events_test( id varchar not null ,user_id_1 varchar not null REFERENCES users (id), user_id_2 varchar not null REFERENCES users (id), ) ; Foreign Keys 55
  36. ORM: Not only 1-to-1 mapping DB to Code • ‘Relationships’:

    one-to-one, many-to-many, one-to-many class Event(models.Model): class Meta: table = "events" id = fields.IntField(pk=True) name = fields.TextField() start_datetime = fields.DatetimeField() participants = fields.ManyToManyField('models.User', related_name='events', through='attendees_tortoise') class User(models.Model): class Meta: table = "users" id = fields.TextField(pk=True) name = fields.TextField(default='Default from Tortoise') 56
  37. ORM: Not only 1-to-1 mapping DB to Code • ‘Relationships’:

    one-to-one, many-to-many, one-to-many 57
  38. ORM: Not only 1-to-1 mapping DB to Code • All

    methods -> convert to SQL query at the end await User.create(id='admin', name='Super Admin') INSERT INTO users(id, name) VALUES('admin', Super Admin); 58
  39. Gino API (client) SQLAlchemy Core Asyncpg (DB Interface) async SQLAlchemy

    Core + async = Async wrappers https://python-gino.org/ 61
  40. Gino API (client) SQLAlchemy Core Asyncpg (DB Interface) async SQLAlchemy

    Core + async = Async wrappers https://python-gino.org/ Do not support Relationships SQLAlchemy Core + async = 62
  41. Statement generators https://github.com/encode/databases (also used SqlAlchemy Core) # Create a

    table. query = """CREATE TABLE HighScores (id INTEGER PRIMARY KEY, name VARCHAR(100), score INTEGER)""" await database.execute(query=query) # Insert some data. query = "INSERT INTO HighScores(name, score) VALUES (:name, :score)" values = [ {"name": "Daisy", "score": 92}, {"name": "Neil", "score": 87}, {"name": "Carol", "score": 43}, ] await database.execute_many(query=query, values=values) # Run a database query. query = "SELECT * FROM HighScores" rows = await database.fetch_all(query=query) print('High Scores:', rows) 65
  42. How DB can impact your code? - Can you use

    async code or not? - How much abstractions you need to implement? - How to store/obtain data in choosed DB (when you try to work with relation data in MongoDB or Cassandra and have no idea why you doing this) 66
  43. Check list • Think of Data structure and operations on

    Data before choose DB • Check does you language has DB Interface library (and does it have async - if async important for you) 68
  44. Check list • Think of Data structure and operations on

    Data before choose DB • Check does you language has DB Interface library (and does it have async - if async important for you) • Check benchmarks / test ORMs/libraries if performance is important for you 69
  45. Check list • Think of Data structure and operations on

    Data before choose DB • Check does you language has DB Interface library (and does it have async - if async important for you) • Check benchmarks / test ORMs/libraries if performance is important for you • Check that library do not limit you in actions (you can use all that you need DB functions - max, distinct and etc) • Easy to use 70