Rapid prototyping in BBC News with Python and AWS

@ben_nuttall Rapid prototyping in BBC News with Python and AWS

Ben Nuttall
• Software Engineer, BBC News Labs
• Former Community Manager at Raspberry Pi
• Based in Cambridgeshire, UK
• bennuttall.com
• twitter.com/ben_nuttall
• github.com/bennuttall

COVID
• I was looking forward to attending EuroPython in person for the first time since 2019 (Basel)
• Unfortunately, I recently got COVID
• Thank you to EuroPython for making a remote-friendly conference

BBC News Labs
• Multi-disciplinary innovation team within BBC News & BBC R&D
• Prototypes of new audience experiences
• Solutions to help journalists
• Research and trying out ideas
• bbcnewslabs.co.uk
• twitter.com/bbc_news_labs

Projects
• IDX (Identify the X) – Automated clipping of content in live radio for social media
• mosromgr – Processing TV/radio running orders to extract structured metadata
• BBC Images – Image metadata enrichment pipeline

Project cycles
• 3 x 2-week sprints
• 2 weeks of tweaks
[Timeline: Sprint 1 · Sprint 2 · Sprint 3 · Tweaks · Sprint 1]

Project cycles
• Research week
• 3 x 2-week sprints
• Wrap-up week
[Timeline: Research week · Sprint 1 · Sprint 2 · Sprint 3 · Wrap-up week · Small projects]

Ideation
• Start with department objectives
• Devise "how might we..." statements
• Explode and converge
• Determine project objectives
• Research
• Bootstrapping
• Spikes
• Sprint goals
• Ticketing

Research week
• Identify stakeholders
• Set up calls with journalists
• Learn about existing systems and workflows
• Get access to systems & data and get to know them
• Set up shadowing

Shadowing
• Sit with a journalist or producer
• Watch them do their job using existing tools
• Work out what their workflows are
• Look for pain points, inefficiencies, slowness, manual work that could be automated

AWS services for building processing pipelines
• Lambda functions
• Step functions / state machines
• Databases (DynamoDB, RDS, Timestream)
• S3
• SNS/SQS
• CloudWatch

AWS Lambda
• Run code without managing server infrastructure
• Pay for compute time instead of provisioning for peak capacity
• Python, Node.js, Go, etc.

Step functions / state machines
• Workflow design
• Sequence of Lambdas
• Lambdas can be implemented in different languages
• Failures, retries, parallelization

Step functions / state machines
• Execute with initial data
• Pass new data on
• Parallel paths and decisional logic
• Specify retry logic
• Whole state machine succeeds or fails
• Easy access to data, exception info and Lambda logs
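
The sequencing and retry behaviour listed above is declared in the state machine definition, written in Amazon States Language. A minimal sketch, built as a Python dict so it can be serialised to JSON; the state names and Lambda ARNs are hypothetical placeholders and are not taken from the talk.

```python
import json

# Hypothetical ARNs: placeholders for illustration only.
EXTRACT_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:extract"
ENRICH_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:enrich"

# Retry policy shared by both tasks: up to 3 retries on task failure,
# waiting 2s, then 4s, then 8s between attempts.
RETRY = [
    {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }
]

definition = {
    "StartAt": "Extract",
    "States": {
        # The output of Extract becomes the input of Enrich.
        "Extract": {
            "Type": "Task",
            "Resource": EXTRACT_ARN,
            "Retry": RETRY,
            "Next": "Enrich",
        },
        # "End": True marks the last state; the execution succeeds here.
        "Enrich": {
            "Type": "Task",
            "Resource": ENRICH_ARN,
            "Retry": RETRY,
            "End": True,
        },
    },
}

definition_json = json.dumps(definition, indent=2)
```

If a task still fails after its retries are exhausted and nothing catches the error, the whole execution is marked as failed, which matches the "whole state machine succeeds or fails" point.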

Step functions / state machines

    def lambda_handler(event: dict, context=None) -> dict:
        ...
        event['thing'] = do_thing(data)
        return event
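
Because each handler returns the event it was given, with new keys added, a chain of handlers can be exercised locally without AWS. A sketch under that assumption; `transcribe_handler`, `summarise_handler` and `run_pipeline` are hypothetical names standing in for real pipeline steps.

```python
def transcribe_handler(event: dict, context=None) -> dict:
    # Hypothetical step: derive a value from the input and attach it.
    event["transcript"] = f"transcript of {event['file_id']}"
    return event


def summarise_handler(event: dict, context=None) -> dict:
    # Reads the key the previous step added, then adds its own.
    event["summary"] = event["transcript"].upper()
    return event


def run_pipeline(event: dict) -> dict:
    # Mimics a linear state machine: each step's output is the next step's input.
    for handler in (transcribe_handler, summarise_handler):
        event = handler(event)
    return event


result = run_pipeline({"file_id": "abc123"})
```

Running the handlers in a plain loop like this is only a local approximation; retries, parallel paths and failure handling still belong to the state machine.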

Pydantic
• Data parsing and settings management using Python type annotations
• Parse and validate a Lambda's input data and configuration

Pydantic models

    from datetime import datetime, timedelta
    from typing import Optional

    from pydantic import BaseModel


    class InputEvent(BaseModel):
        file_id: str
        ncs_id: Optional[str]
        start_time: datetime
        duration: timedelta
        body: list[str] = []

Pydantic models

    from .models import InputEvent
    from .utils import do_thing


    def lambda_handler(event, context=None):
        # Raises pydantic.ValidationError if the event does not match the model
        input_event = InputEvent(**event)
        do_thing(input_event.file_id)
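
To make the validation step concrete, here is a small runnable sketch of a handler that parses its event with a Pydantic model and reports failures instead of crashing later. The return shape (`ok`, `year`, `seconds`, `errors`) is an assumption made for illustration, and `ncs_id` is given an explicit `None` default so the field is optional under both Pydantic v1 and v2.

```python
from datetime import datetime, timedelta
from typing import Optional

from pydantic import BaseModel, ValidationError


class InputEvent(BaseModel):
    file_id: str
    ncs_id: Optional[str] = None  # explicit default: optional in Pydantic v1 and v2
    start_time: datetime
    duration: timedelta
    body: list[str] = []


def lambda_handler(event, context=None):
    try:
        input_event = InputEvent(**event)
    except ValidationError as exc:
        # Fail early with a summary rather than a KeyError deep in the code.
        return {"ok": False, "errors": len(exc.errors())}
    return {
        "ok": True,
        "year": input_event.start_time.year,
        "seconds": input_event.duration.total_seconds(),
    }


# ISO strings and plain numbers are coerced to datetime and timedelta.
good = lambda_handler(
    {"file_id": "f1", "start_time": "2022-07-14T10:00:00", "duration": 90}
)
# Invalid start_time and missing duration are both rejected.
bad = lambda_handler({"file_id": "f1", "start_time": "not a date"})
```

In a state machine, letting the `ValidationError` propagate instead of catching it would mark the step as failed and surface the error details in the execution view.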

Pydantic settings

    # Pydantic v1 import; in Pydantic v2, BaseSettings lives in the pydantic-settings package
    from pydantic import BaseSettings


    class Settings(BaseSettings):
        cert_file_path: str
        key_file_path: str

        @property
        def cert(self):
            return (self.cert_file_path, self.key_file_path)

        class Config:
            env_prefix = 'MOS_'
            env_file = '.env'

Pydantic settings

    import requests

    from .models import Settings

    settings = Settings()


    def fetch_thing(url):
        r = requests.get(url, cert=settings.cert)
        return r.json()

AWS databases
• DynamoDB
  – Serverless
  – NoSQL tables
  – JSON data storage
• Timestream
  – Serverless time series database
  – SQL optimised for time series data
• RDS
  – Managed SQL databases
  – Serverless option available

Serverless PostgreSQL
• Amazon Aurora PostgreSQL-Compatible Edition
• DB instance class: serverless v1
• Specify capacity range and scaling configuration
• Web service data API

Serverless PostgreSQL - CloudFormation

    Database:
      Type: AWS::RDS::DBCluster
      Properties:
        DBClusterIdentifier: !Ref DBClusterName
        MasterUsername: !Ref DBUsername
        MasterUserPassword: !Ref DBPassword
        DatabaseName: !Ref DBName
        Engine: aurora-postgresql
        EngineMode: serverless
        ScalingConfiguration:
          AutoPause: true
          MinCapacity: !Ref DBMinCapacity
          MaxCapacity: !Ref DBMaxCapacity
          SecondsUntilAutoPause: !Ref DBSecondsUntilAutoPause
        EnableHttpEndpoint: true

Serverless PostgreSQL
• Access via boto3
• Or preferably use aurora-data-api or sqlalchemy-aurora-data-api
• Connect using AWS Secrets Manager
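
For the plain boto3 route, the Data API is called through the `rds-data` client's `execute_statement`, which returns each field wrapped in a one-key dict. The sketch below shows one way to turn that response into ordinary dicts. `fetch_rows` and `FakeClient` are hypothetical helpers, and the fake client stands in for `boto3.client("rds-data")` so the example runs without AWS credentials; the table and column names are borrowed from the later sqlalchemy slide.

```python
def fetch_rows(client, cluster_arn: str, secret_arn: str, database: str, sql: str) -> list[dict]:
    """Run a query through the RDS Data API and return rows as dicts."""
    response = client.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,  # credentials are read from Secrets Manager
        database=database,
        sql=sql,
        includeResultMetadata=True,  # needed to get column names back
    )
    columns = [col["name"] for col in response["columnMetadata"]]
    rows = []
    for record in response["records"]:
        # Each field is a one-key dict such as {"stringValue": "x"} or {"isNull": True}.
        values = [
            None if field.get("isNull") else next(iter(field.values()))
            for field in record
        ]
        rows.append(dict(zip(columns, values)))
    return rows


class FakeClient:
    """Stand-in for boto3.client("rds-data") so the sketch runs offline."""

    def execute_statement(self, **kwargs):
        return {
            "columnMetadata": [{"name": "episode_pid"}, {"name": "synopsis"}],
            "records": [[{"stringValue": "p0001"}, {"isNull": True}]],
        }


rows = fetch_rows(
    FakeClient(),
    "cluster-arn",
    "secret-arn",
    "mos",
    "SELECT episode_pid, synopsis FROM episodes",
)
```

This unwrapping boilerplate is a large part of why the slide recommends aurora-data-api or sqlalchemy-aurora-data-api instead of raw boto3 calls.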

News Labs Apps Portal
• EC2 web server hosting static files in S3
• Access via BBC Login or BBC certificate
• Every project can re-use the infrastructure
• Great for SPAs and static sites

Static HTML websites with Chameleon
• Devise a website content structure with a layout template
• Create Chameleon templates for each page type
• Create logic layer for retrieving data required for each page write
• Create lambda for writing/rewriting relevant pages, e.g. new episode processed:
  – write new episode page /<brand>/<episode>/index.htm
  – update brand index page /<brand>/index.htm
  – update homepage /index.htm
• Create CLI for manual rewrites
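
The "which pages need rewriting" rule from the example above can be sketched as a small pure function. `pages_to_rewrite` is a hypothetical helper name, and the brand and episode values are made up; only the three path patterns come from the slide. Template rendering and the S3 upload are left out.

```python
from pathlib import PurePosixPath


def pages_to_rewrite(brand: str, episode: str) -> list[str]:
    """Return the site paths affected when a new episode has been processed."""
    root = PurePosixPath("/")
    return [
        str(root / brand / episode / "index.htm"),  # new episode page
        str(root / brand / "index.htm"),  # brand index page
        str(root / "index.htm"),  # homepage
    ]


paths = pages_to_rewrite("newsnight", "2022-07-14")
```

Keeping this rule separate from rendering means both the Lambda and the manual-rewrite CLI can share it.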

Structlog
• Structured logging
• Looks great when running locally - easy to see relevant information
• JSON logging support ideal for running in AWS - can access and search structured logs in CloudWatch
• Encourages good logging practice!

    import os

    import structlog

    # Switch to JSON output only when MOS_LOGGING=JSON is set (e.g. in AWS)
    if os.environ.get('MOS_LOGGING') == 'JSON':
        processors = [
            structlog.stdlib.add_log_level,
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(),
        ]
        structlog.configure(processors=processors)

Pydantic database settings

    from pydantic import BaseSettings


    class DbSettings(BaseSettings):
        arn: str
        secret_arn: str
        name: str

        class Config:
            env_prefix = 'MOS_DB_'
            env_file = '.env'

sqlalchemy

    from sqlalchemy import Column, ForeignKey, String, DateTime
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()


    class Episode(Base):
        __tablename__ = 'episodes'

        episode_pid = Column(String, primary_key=True)
        version_pid = Column(String, nullable=False, unique=True)
        brand_pid = Column(String, ForeignKey('brands.brand_pid'), nullable=False)
        title = Column(String, nullable=False)
        image_pid = Column(String, nullable=False)
        service_id = Column(String, ForeignKey('services.service_id'), nullable=False)
        start_time = Column(DateTime)
        end_time = Column(DateTime)
        synopsis = Column(String, nullable=False)

sqlalchemy

    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session

    # DbSettings and Episode are the classes defined on the previous slides


    class MosDatabase:
        def __init__(self):
            settings = DbSettings()
            self.engine = create_engine(
                f'postgresql+auroradataapi://:@/{settings.name}',
                connect_args=dict(
                    aurora_cluster_arn=settings.arn,
                    secret_arn=settings.secret_arn,
                )
            )

        def get_episode(self, episode_pid: str) -> Episode:
            with Session(self.engine) as session:
                query = Episode.__table__.select().where(Episode.episode_pid == episode_pid)
                return session.execute(query).mappings().one()

Lambda function URLs and FastAPI
• Dedicated HTTP endpoint for a Lambda function
• Serverless REST API
• FastAPI (built on Starlette and Pydantic) makes it very easy to provide a REST API
  – Serverless API for a serverless database
  – Define in/out data structure with Pydantic
  – Easily add authentication
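
To show what a function URL actually hands to a Lambda, here is a framework-free sketch of a handler that reads the HTTP method and path from the event and returns an HTTP-style response. It deliberately leaves FastAPI out, so it illustrates the underlying request/response shape rather than the approach used in the talk. The `/episodes/<pid>` route and the response payload are hypothetical.

```python
import json


def lambda_handler(event: dict, context=None) -> dict:
    # Function URL events carry the method and path in these fields.
    method = event["requestContext"]["http"]["method"]
    path = event["rawPath"]

    if method == "GET" and path.startswith("/episodes/"):
        episode_pid = path.rsplit("/", 1)[-1]
        body = {"episode_pid": episode_pid}  # hypothetical payload
        status = 200
    else:
        body = {"detail": "Not found"}
        status = 404

    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


found = lambda_handler(
    {"rawPath": "/episodes/p0001", "requestContext": {"http": {"method": "GET"}}}
)
missing = lambda_handler(
    {"rawPath": "/nothing", "requestContext": {"http": {"method": "GET"}}}
)
```

Hand-written routing like this grows quickly, which is the gap a framework such as FastAPI fills with declared routes and Pydantic request and response models.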

Learning
• Move fast and learn things!
• Take learnings into the next project
• Use spikes to try ideas out
• No project is perfect
• No hard rules - determine good practice and keep improving
• Knowledge share
• Prioritise for delivery
• Use research week and wrap-up week wisely

@ben_nuttall Rapid prototyping in BBC News with Python and AWS
