How we built an AI code reviewer with serverless and Bedrock

Yan Cui http://theburningmonk.com @theburningmonk
AWS user since 2010
Running serverless in production since 2016
Developer Advocate @
Independent consultant

evolua.io Demo

Architecture

API Gateway, DynamoDB, Bedrock, EventBridge, Webhook, AppSync, evolua.io, Authoriser

Challenges (for an AI code reviewer)
Large files.
Large PRs with many files.
Handling sensitive data for customers.

Why Bedrock?

Security
Data is encrypted at rest.

www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak

aws.amazon.com/bedrock/faqs

Security
Data is encrypted at rest.
Inputs & outputs are not shared with model providers.
Inputs & outputs are not used to train other models.

API Gateway, DynamoDB, Bedrock, EventBridge, Webhook, AppSync, evolua.io, Authoriser, Fallback, Primary

privacy.anthropic.com/en/articles/7996885-how-do-you-use-personal-data-in-model-training

Serverless
Usage-based AND provisioned throughput pricing

Model    1M Input Tokens   1M Output Tokens
v3       $0.14             $0.28
r1       $0.55             $2.19
Sonnet   $3.75             $15.0
Haiku    $0.80             $4.00
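To put these per-million-token prices in perspective, here is a rough back-of-the-envelope sketch. The helper name `estimateCostUsd` and the workload (10 files per PR, 20,000 input and 2,000 output tokens per file) are made-up assumptions for illustration, not measurements from evolua.io.

```javascript
// Prices in USD per 1M tokens, copied from the prices listed above.
const PRICES = {
  v3: { input: 0.14, output: 0.28 },
  r1: { input: 0.55, output: 2.19 },
  sonnet: { input: 3.75, output: 15.0 },
  haiku: { input: 0.8, output: 4.0 },
};

// Hypothetical helper: cost in USD of a single request to `model`.
function estimateCostUsd(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`unknown model: ${model}`);
  return (inputTokens / 1e6) * price.input + (outputTokens / 1e6) * price.output;
}

// Assumed workload: 10 files per PR, 20,000 input and 2,000 output tokens per file.
for (const model of Object.keys(PRICES)) {
  const perPr = 10 * estimateCostUsd(model, 20000, 2000);
  console.log(`${model}: ~$${perPr.toFixed(2)} per PR`);
}
```

Even with these invented numbers, the gap between models is large enough that model choice dominates the per-PR cost.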

Very cost efficient!
Data is stored in China.
Data might be used to train other models.

www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak

Operationally immature.

No token-based pricing yet
“GPU-based instance type like ml.p5e.48xlarge is recommended”

ml.p5e.48xlarge 💰💰💰💰💰💰💰💰💰💰 💰💰💰💰💰💰💰💰💰💰 💰💰💰💰💰💰💰💰💰💰 💰💰💰💰💰💰💰💰💰💰 💰💰💰💰💰💰💰💰

Other capabilities
Guardrails
Knowledge base (managed RAG)
Agents
Cross-region inference
Model evaluations

Lessons

Webhook, Analyse changes, Feedback

Condensed view…

Lambda timed out after 15 mins

Succeeded on automatic retry

Webhook, Analyse changes, Feedback
LLM limits, GitHub limits, AWS limits

Lesson: AI is 10% of the problem

Context window, Max response tokens, API rate limit, Reasoning ability

Cost, Performance
Important selection criteria for LLMs

Doing cool AI stuff!
Working around AI limits
Stop playing with my bowl…

Claude 3.5 Sonnet’s default throughput is 50 requests per minute
Can be raised to 1,000 per minute
Bedrock has cross-region inference

Mitigate API rate limit
Raise account limits.
Use Bedrock cross-region inference.
Limit no. of parallel requests per PR.
Fall back to Anthropic & less powerful models (Claude 3 Sonnet, Claude 3.5 Haiku).
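Two of these mitigations, capping parallel requests per PR and falling back to another model when throttled, can be sketched roughly as below. This is a minimal illustration, not the evolua.io implementation: `mapWithConcurrency`, `invokeWithFallback`, `isThrottled` and the `{ name, invoke }` model objects are hypothetical names, and the throttling check (an error named `ThrottlingException` or carrying HTTP status 429) is an assumption about how the underlying clients report rate limiting.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency(items, limit, worker) {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results = new Array(items.length);
  let next = 0;
  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Assumption: throttling shows up as a "ThrottlingException" or an HTTP 429.
function isThrottled(err) {
  return Boolean(err) && (err.name === "ThrottlingException" || err.status === 429);
}

// Try each model in order; move on to the next only when the call was throttled.
// `models` is a hypothetical list of { name, invoke(prompt) } objects.
async function invokeWithFallback(models, prompt) {
  let lastError = new Error("no models configured");
  for (const model of models) {
    try {
      return await model.invoke(prompt);
    } catch (err) {
      if (!isThrottled(err)) throw err; // real failures should not be masked
      lastError = err;
    }
  }
  throw lastError;
}
```

A caller might combine the two as `mapWithConcurrency(files, 5, (file) => invokeWithFallback(models, buildPrompt(file)))`, where the limit of 5 and `buildPrompt` are again placeholders.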

Future work: incorporate other models (Nova, DeepSeek, etc.)
Also good for cost control!

Lesson: LLMs are still quite expensive

Difficult to build a sustainable and competitive business

Cost control
Only analyse changed lines. Good for cost control, good for UX.
Limit free users to a few PRs per month.
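One way to find the changed lines is to read them off a unified diff, such as the `patch` field GitHub returns for each file in a pull request. The sketch below is an assumption about how this could be done, not a description of evolua.io's code; `changedLineNumbers` is a hypothetical helper.

```javascript
// Return the line numbers (in the new version of the file) that a unified
// diff patch adds or modifies.
function changedLineNumbers(patch) {
  const changed = [];
  let newLine = 0;
  let inHunk = false;
  for (const line of patch.split("\n")) {
    // Hunk header, e.g. "@@ -10,2 +11,3 @@": capture where the new side starts.
    const header = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line);
    if (header) {
      newLine = Number(header[1]);
      inHunk = true;
    } else if (!inHunk) {
      continue; // skip file headers that precede the first hunk
    } else if (line.startsWith("+")) {
      changed.push(newLine++); // added line: present only in the new file
    } else if (line.startsWith("-") || line.startsWith("\\")) {
      // removed line, or "\ No newline at end of file": not in the new file
    } else {
      newLine++; // unchanged context line
    }
  }
  return changed;
}

const examplePatch = [
  "@@ -1,3 +1,4 @@",
  " line1",
  "-old",
  "+new",
  "+added",
  " line3",
].join("\n");
console.log(changedLineNumbers(examplePatch));
```

Only the returned line numbers (plus whatever surrounding context the prompt needs) would then be sent to the model.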

API Gateway, DynamoDB, Bedrock, EventBridge, Webhook
Built-in retries & DLQ

Lambda timed out after 15 mins
Reprocess files on retry…
Duplicated side-effects (e.g. GitHub comments)

Cost control
Only analyse changed lines.
Limit free users to a few PRs per month.
Use checkpoints to avoid re-processing files on retries.

const issues = await executeIdempotently(
  `${eventId}-${filename}-analyze`,
  () => analyzeFile(file)
);
...
await executeIdempotently(
  `${eventId}-${filename}-add-gh-comment`,
  () => addReviewComment(filename, comment)
);
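The slides do not show how `executeIdempotently` works. A minimal sketch of the checkpoint idea might look like the following, using an in-memory `Map` as a stand-in for a durable store; a real version would presumably persist checkpoints somewhere that survives a Lambda retry, such as the DynamoDB table in the architecture, which is an assumption here.

```javascript
// In-memory stand-in for a durable checkpoint store (assumption: in production
// this would be a persistent table so that checkpoints survive a retry).
const checkpoints = new Map();

// Run `action` at most once per `key`. On a retry, return the saved result
// instead of repeating the work or the side-effect.
async function executeIdempotently(key, action) {
  if (checkpoints.has(key)) {
    return checkpoints.get(key);
  }
  const result = await action();
  checkpoints.set(key, result);
  return result;
}
```

Note that a crash between running the action and saving the checkpoint can still repeat the action once, so this narrows the window for duplicated side-effects rather than eliminating it.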

Webhook, Analyse changes, Feedback
Why not Step Functions?
Checkpoints are just easier 🤷

Lesson: Latency is a challenge

Models take tens of seconds to analyse each file

Wasted CPU cycles in Lambda

Future work: try other models

Future work: make use of these CPU cycles

Lesson: Beware of hallucinations

“Give me JSON in this format”
“Nope!”

Non-existent codes, invalid URLs

Non-existent line numbers
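A defensive way to handle these failure modes is to validate everything the model returns before acting on it. The sketch below assumes the model was asked for a JSON array of `{ line, comment }` objects; that shape and the `parseReviewComments` helper are hypothetical, chosen only to illustrate the checks.

```javascript
// Parse the model's reply and keep only comments that pass basic sanity checks.
// `changedLines` is the list of line numbers that really changed in the file.
function parseReviewComments(rawReply, changedLines) {
  let parsed;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    return []; // the model ignored the requested JSON format
  }
  if (!Array.isArray(parsed)) return [];
  const allowed = new Set(changedLines);
  return parsed.filter(
    (c) =>
      c !== null &&
      typeof c === "object" &&
      typeof c.comment === "string" &&
      Number.isInteger(c.line) &&
      allowed.has(c.line) // drop comments on line numbers that do not exist
  );
}

const reply = '[{"line": 2, "comment": "Possible null dereference"}, {"line": 999, "comment": "?"}]';
console.log(parseReviewComments(reply, [2, 3]));
```

Returning an empty list on malformed output is one design choice; re-prompting the model is another, at the price of extra latency and tokens.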

Future work

Go to evolua.io to try it out. We’d love your feedback!

Questions?
