Rapid prototyping in BBC News with Python and AWS

Ben Nuttall, BBC News Labs

IT’S GOOD TO BE BACK

https://www.youtube.com/watch?v=QCte3cOx49U

Ben Nuttall
• Software Engineer, BBC News Labs
• Former Community Manager at Raspberry Pi
• PyPI critical project maintainer
• Based in Cambridgeshire
• bennuttall.com
• twitter.com/ben_nuttall
• github.com/bennuttall

BBC News Labs
• Multi-disciplinary innovation team within BBC News & BBC R&D
• Prototypes of new audience experiences
• Solutions to help journalists
• Research and trying out ideas
• bbcnewslabs.co.uk
• twitter.com/bbc_news_labs

Projects
• IDX (Identify the X): automated clipping of content in live radio for social media
• mosromgr: processing TV/radio running orders to extract structured metadata
• BBC Images: image metadata enrichment pipeline

Project cycles
• 3 x 2-week sprints
• 2 weeks of tweaks
Timeline: Sprint 1, Sprint 2, Sprint 3, Tweaks, then back to Sprint 1

Project cycles: small projects
• Research week
• 3 x 2-week sprints
• Wrap-up week
Timeline: Research week, Sprint 1, Sprint 2, Sprint 3, Wrap-up week

Ideation
• Start with department objectives
• Devise "how might we..." statements
• Explode and converge
• Determine project objectives
• Research
• Bootstrapping
• Spikes
• Sprint goals
• Ticketing

Research week
• Identify stakeholders
• Set up calls with journalists
• Learn about existing systems and workflows
• Get access to systems and data, and get to know them
• Set up shadowing

Shadowing
• Sit with a journalist or producer
• Watch them do their job using existing tools
• Work out what their workflows are
• Look for pain points, inefficiencies, slowness, and manual work that could be automated

AWS services for building processing pipelines
• Lambda functions
• Step functions / state machines
• Databases (DynamoDB, RDS, Timestream, etc.)
• S3
• SNS/SQS
• CloudWatch

AWS Lambda
• Run code without managing server infrastructure
• Pay for compute time instead of provisioning for peak capacity
• Runtimes for Python, JavaScript, Go, etc.
• Python 3.9
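As a minimal sketch of what Lambda runs, a Python handler is just a function. It takes the invocation payload (`event`, already decoded from JSON into a dict) and a `context` object, and returns a JSON-serialisable result. The `name` field below is a hypothetical input used only for illustration.

```python
import json


def lambda_handler(event: dict, context=None) -> dict:
    """Minimal handler: Lambda calls this with the decoded payload and a
    context object; the return value is serialised back to JSON."""
    # 'name' is a hypothetical input field, used only for illustration
    name = event.get('name', 'world')
    return {'message': f'Hello, {name}'}


if __name__ == '__main__':
    # Locally the handler is an ordinary function: call it with a dict
    print(json.dumps(lambda_handler({'name': 'News Labs'})))
```

Because the handler is a plain function, it can be unit-tested locally before being deployed.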

Step functions / state machines
• Workflow design
• Sequence of Lambdas
• Lambdas can be implemented in different languages
• Failures, retries, parallelisation

Step functions / state machines
• Execute with initial data
• Pass new data on
• Parallel paths and decisional logic
• Specify retry logic
• Whole state machine succeeds or fails
• Easy access to data, exception info and Lambda logs
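The retry and decision features listed above are declared in the state machine definition, which is written in the Amazon States Language (JSON). The following is a minimal sketch that builds such a definition in Python. The function names, region, account ID and the boolean `enrich` input field are all hypothetical.

```python
import json


def lambda_arn(name: str) -> str:
    # Hypothetical region and account ID, for illustration only
    return f'arn:aws:lambda:eu-west-1:123456789012:function:{name}'


# Retry any error up to 3 times, waiting 2s, then 4s, then 8s
RETRY = [{
    'ErrorEquals': ['States.ALL'],
    'IntervalSeconds': 2,
    'MaxAttempts': 3,
    'BackoffRate': 2.0,
}]

definition = {
    'Comment': 'Parse a document, optionally enrich it, then store it',
    'StartAt': 'Parse',
    'States': {
        'Parse': {
            'Type': 'Task',
            'Resource': lambda_arn('parse'),
            'Retry': RETRY,
            'Next': 'NeedsEnrichment',
        },
        # Decisional logic: only enrich when the input sets enrich=true
        'NeedsEnrichment': {
            'Type': 'Choice',
            'Choices': [{
                'And': [
                    {'Variable': '$.enrich', 'IsPresent': True},
                    {'Variable': '$.enrich', 'BooleanEquals': True},
                ],
                'Next': 'Enrich',
            }],
            'Default': 'Store',
        },
        'Enrich': {
            'Type': 'Task',
            'Resource': lambda_arn('enrich'),
            'Retry': RETRY,
            'Next': 'Store',
        },
        'Store': {
            'Type': 'Task',
            'Resource': lambda_arn('store'),
            'Retry': RETRY,
            'End': True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

The resulting JSON could be supplied as the `DefinitionString` of an `AWS::StepFunctions::StateMachine` CloudFormation resource, or passed to boto3's `stepfunctions` client via `create_state_machine`.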

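Step functions / state machines

    def lambda_handler(event: dict, context=None) -> dict:
        ...
        # Add this step's result to the incoming state data
        event['thing'] = do_thing(data)
        # The returned dict becomes the input to the next state
        return event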
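Because each handler in this pattern takes the state data and returns it with something added, a sequence of Lambdas can be rehearsed locally by feeding each handler's output into the next. The following minimal sketch uses that approach. `parse`, `count` and `run_pipeline` are hypothetical names, and this is a local stand-in rather than the Step Functions service.

```python
from typing import Callable

# A handler takes the state data (plus an optional context) and returns it
Handler = Callable[..., dict]


def parse(event: dict, context=None) -> dict:
    # Hypothetical first step: split raw text into lines
    event['lines'] = event['raw'].splitlines()
    return event


def count(event: dict, context=None) -> dict:
    # Hypothetical second step: uses data added by the previous step
    event['line_count'] = len(event['lines'])
    return event


def run_pipeline(handlers: list[Handler], event: dict) -> dict:
    """Pass each handler's output to the next, like a sequence of Task states."""
    for handler in handlers:
        event = handler(event)
    return event


result = run_pipeline([parse, count], {'raw': 'first\nsecond'})
print(result['line_count'])  # 2
```

Keeping every handler to the same dict-in, dict-out shape is what makes the steps easy to reorder and to test in isolation.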

Pydantic
• Data parsing and settings management using Python type annotations
• Parse and validate a Lambda's input data and configuration

Pydantic models

    from typing import Optional
    from datetime import datetime, timedelta
    from pydantic import BaseModel

    class InputEvent(BaseModel):
        file_id: str
        ncs_id: Optional[str]
        start_time: datetime
        duration: timedelta
        body: list[str] = []

Pydantic models

    from .models import InputEvent
    from .utils import do_thing

    def lambda_handler(event, context=None):
        # Validate the raw event; invalid input raises pydantic.ValidationError
        input_event = InputEvent(**event)
        do_thing(input_event.file_id)

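Pydantic settings

    # Pydantic v1 API; in Pydantic v2, BaseSettings lives in the pydantic-settings package
    from pydantic import BaseSettings

    class Settings(BaseSettings):
        # Read from MOS_CERT_FILE_PATH and MOS_KEY_FILE_PATH, or from .env
        cert_file_path: str
        key_file_path: str

        @property
        def cert(self):
            return (self.cert_file_path, self.key_file_path)

        class Config:
            env_prefix = 'MOS_'
            env_file = '.env'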

Pydantic settings

    import requests
    from .models import Settings

    settings = Settings()

    def fetch_thing(url):
        # Client certificate authentication: requests accepts a (cert, key) tuple
        r = requests.get(url, cert=settings.cert)
        return r.json()

Serverless PostgreSQL
• Amazon Aurora PostgreSQL-Compatible Edition
• DB instance class: Serverless v1
• Specify capacity range and scaling configuration
• Web service Data API

Serverless PostgreSQL - CloudFormation

    Database:
      Type: AWS::RDS::DBCluster
      Properties:
        DBClusterIdentifier: !Ref DBClusterName
        MasterUsername: !Ref DBUsername
        MasterUserPassword: !Ref DBPassword
        DatabaseName: !Ref DBName
        Engine: aurora-postgresql
        EngineMode: serverless
        ScalingConfiguration:
          AutoPause: true
          MinCapacity: !Ref DBMinCapacity
          MaxCapacity: !Ref DBMaxCapacity
          SecondsUntilAutoPause: !Ref DBSecondsUntilAutoPause
        EnableHttpEndpoint: true


Serverless PostgreSQL
• Access via boto3
• Or, preferably, use aurora-data-api or sqlalchemy-aurora-data-api
• Connect using AWS Secrets Manager
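When accessing the Data API directly through boto3, query parameters must be sent as typed fields rather than plain Python values. The following minimal sketch converts a dict into that `parameters` format. The helper name `to_data_api_params` and the ARNs, database name and PID in the usage comment are hypothetical.

```python
from datetime import datetime


def to_data_api_params(values: dict) -> list[dict]:
    """Convert a plain dict into the typed 'parameters' list accepted by the
    RDS Data API ExecuteStatement call (boto3 'rds-data' client)."""
    params = []
    for name, value in values.items():
        param = {'name': name}
        if value is None:
            param['value'] = {'isNull': True}
        elif isinstance(value, bool):  # test bool before int: bool subclasses int
            param['value'] = {'booleanValue': value}
        elif isinstance(value, int):
            param['value'] = {'longValue': value}
        elif isinstance(value, float):
            param['value'] = {'doubleValue': value}
        elif isinstance(value, datetime):
            # Timestamps are sent as strings with a type hint
            param['value'] = {'stringValue': value.strftime('%Y-%m-%d %H:%M:%S')}
            param['typeHint'] = 'TIMESTAMP'
        elif isinstance(value, str):
            param['value'] = {'stringValue': value}
        else:
            raise TypeError(f'Unsupported type for {name!r}: {type(value).__name__}')
        params.append(param)
    return params


# Usage (needs AWS credentials; cluster_arn, secret_arn and names are hypothetical):
# import boto3
# client = boto3.client('rds-data')
# client.execute_statement(
#     resourceArn=cluster_arn,
#     secretArn=secret_arn,
#     database='news',
#     sql='SELECT title FROM episodes WHERE brand_pid = :brand_pid',
#     parameters=to_data_api_params({'brand_pid': 'p0000000'}),
# )
```

This kind of type mapping is handled automatically by wrappers such as aurora-data-api, which is one reason to prefer them over raw boto3 calls.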

News Labs Apps Portal
• EC2 web server serving static files stored in S3
• Access via BBC Login or BBC certificate
• Every project can re-use the infrastructure
• Great for SPAs and static sites

Static HTML websites with Chameleon
• Devise a website content structure with a layout template
• Create Chameleon templates for each page type
• Create a logic layer for retrieving the data required for each page
• Create a Lambda for writing/rewriting relevant pages, e.g. when a new episode is processed:
  • write the new episode page /<brand>/<episode>/index.htm
  • update the brand index page /<brand>/index.htm
  • update the homepage /index.htm
• Create a CLI for manual rewrites
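The rewrite step above starts by working out which pages a new episode affects. The following minimal sketch follows the site structure listed above. The function name `pages_to_write` is hypothetical, and the brand and episode values in the usage line are made up.

```python
from pathlib import PurePosixPath


def pages_to_write(brand: str, episode: str) -> list[str]:
    """Return the static pages to (re)write when a new episode is processed,
    following the structure /<brand>/<episode>/index.htm."""
    root = PurePosixPath('/')
    return [
        str(root / brand / episode / 'index.htm'),  # new episode page
        str(root / brand / 'index.htm'),            # brand index page
        str(root / 'index.htm'),                    # homepage
    ]


print(pages_to_write('brand1', 'episode1'))
```

A Lambda could render each returned path with the matching Chameleon template and upload the result. A manual-rewrite CLI could call the same function so that both paths stay consistent.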

Structlog
• Structured logging
• Looks great when running locally
  • easy to see relevant information
• JSON logging support, ideal for running in AWS
  • can access and search structured logs in CloudWatch
• Encourages good logging practice!

    import os
    import structlog

    # Switch to JSON output (e.g. in AWS) when MOS_LOGGING=JSON
    if os.environ.get('MOS_LOGGING') == 'JSON':
        processors = [
            structlog.stdlib.add_log_level,
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.processors.JSONRenderer(),
        ]
        structlog.configure(processors=processors)
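Each entry in the `processors` list above is a callable. structlog calls it with `(logger, method_name, event_dict)` and passes the returned dict on to the next processor. The following minimal sketch is a hypothetical custom processor that tags every log entry with an environment name. The environment variable `MOS_ENVIRONMENT` is an assumption.

```python
import os


def add_environment(logger, method_name: str, event_dict: dict) -> dict:
    """Custom structlog processor: add an 'environment' key to every entry.
    MOS_ENVIRONMENT is a hypothetical variable name; defaults to 'local'."""
    event_dict['environment'] = os.environ.get('MOS_ENVIRONMENT', 'local')
    return event_dict


print(add_environment(None, 'info', {'event': 'episode_processed'}))
```

To use it, the function would be placed in the list before `JSONRenderer()`, so the extra key appears in the JSON searched in CloudWatch.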


Pydantic database settings

    from pydantic import BaseSettings

    class DbSettings(BaseSettings):
        arn: str
        secret_arn: str
        name: str

        class Config:
            env_prefix = 'MOS_DB_'
            env_file = '.env'

sqlalchemy

    from sqlalchemy import Column, ForeignKey, String, DateTime
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Episode(Base):
        __tablename__ = 'episodes'

        episode_pid = Column(String, primary_key=True)
        version_pid = Column(String, nullable=False, unique=True)
        brand_pid = Column(String, ForeignKey('brands.brand_pid'), nullable=False)
        title = Column(String, nullable=False)
        image_pid = Column(String, nullable=False)
        service_id = Column(String, ForeignKey('services.service_id'), nullable=False)
        start_time = Column(DateTime)
        end_time = Column(DateTime)
        synopsis = Column(String, nullable=False)
