Full AI potential with async PHP

Nemanja Marić (maki10)

About me
• PHP Developer
• Member of the PHP Serbia community
• Co-organizer of the Laravel Serbia and PHP Serbia meetups
• Working with PHP since 2014
• In the Laravel world since 2016
• Open source contributor
• Contributing to the Laravel Framework
• And most !important: husband and father of two little angels
• Love dad jokes

Latest motto

Story: What is our representative module?

Imagine an application that can harvest all data about one person or company. For example, say you want to know more about Tesla, the Cybertruck, or Elon Musk: who mentioned the company favorably or unfavorably, and a lot more. The application harvests all available data and processes it. What is "all"? Articles, news, blogs, images, Twitter, documents, etc. It collects all available data, converts it into a form the AI can work with, and then processes the desired data.

In short: an AI-based app for processing news and documents.

When you type "Elon Musk", how much data do you think we get?

How would I describe this app? “Beauty and the Beast”

Beauty mode: Well engineered. The application is designed 90% the way you think it should be designed.

Beast mode: A multi-server application whose CPU peaks at roughly 90% all the time, with small RAM usage.

“Ana”: main application
Our main application is a Laravel API-based application. It runs on a droplet with 64GB of RAM and 32 CPU cores. It can handle 10k scraping jobs per hour while using only 10% of its full capacity. Two MySQL read instances and one write instance. MySQL is capable of 6400 parallel connections. Zero cache. Scaled down from 128GB of RAM, with a 50% cut in revenue. For this presentation, we will focus on Laravel Queue and later on OpenSearch (Elasticsearch).

How do we make PHP async?

Laravel Queue (via Horizon)
While building your web application, you may have some tasks, such as parsing and storing an uploaded CSV file, that take too long to perform during a typical web request. Thankfully, Laravel allows you to easily create queued jobs that may be processed in the background. By moving time intensive tasks to a queue, your application can respond to web requests with blazing speed and provide a better user experience to your customers.

Scaling Horizon
• Scaling by Horizon workers
• Scaling by queue workers

Good scaling example

Pitfall 1
Create a script to send a large number of emails to N users, incorporating intricate logic in the email content. Specify the content of the emails, including subject, body, and any attachments.
How to handle it? API call, job, or command?

Wrong: you should optimize the logic first.

Pitfall 2
Create a script to generate two thumbnail images for each of N pages of PDFs, specifying the desired dimensions and format for the thumbnails.
How to handle it? API call, job, or command?

Do we need to scale up?!? We are optimized to the maximum. It's clear that this is a heavy task, and we need to scale the server to process it.

The bill increases by 2x. Who is going to tell the client?
Checklist:
1. Our fault (Hell no)
2. PHP is not the right choice (Hell yes)
How to solve it:
1. Python
2. Node.js
3. Call a senior? (Huh, no)
PM: Who is going to implement the changes?
Devs: A senior dev.
PM: So let’s call a senior.

The senior joins the project with a first task: write a Python microservice to support the thumbnail logic in the XY component. Optimizing here and there, and scaling down the server.

The bill decreased by 2x.

Conference call
Always try to optimize your code as much as possible. Before adopting another language in your codebase, check whether it is needed.
When do you need to consider another language?
• When the language has proven benefits.
• When the language has better support and can process much more data in a given time.

Microservice 1 “Ceca”

“Ceca” is an async Node.js scraper (news, blogs, Twitter, ...). It runs on a droplet with 24GB of RAM and 32 CPU cores. When I joined the project, it was capable of scraping 10k jobs per hour. Node.js runs under PM2.

Full potential unlocked: 10k jobs per minute, or 500k per hour.

Pitfall 3
How do we scrape, for example, Tesla the company and not the person? Apple the company, not the fruit. Mars the chocolate company, not the planet.
How do we handle this for Tesla, for example?
1. Adding custom indicators to the search query (-person)
2. Adding company aliases to the search query (+tsla)
3. Adding the CEO to the search query (+“Elon Musk”)
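The three query tweaks above can be sketched as a small helper. This is only an illustration, not the code “Ceca” runs: the function name `build_query` and its parameters are hypothetical, and whether the `-`, `+` and quoted-phrase operators are honored depends on the search backend being queried.

```python
from typing import Iterable


def build_query(
    term: str,
    exclude: Iterable[str] = (),
    aliases: Iterable[str] = (),
    people: Iterable[str] = (),
) -> str:
    """Compose a search query that disambiguates `term` (hypothetical helper).

    exclude -> indicators to rule out, e.g. "person"   => -person
    aliases -> company aliases to require, e.g. "tsla" => +tsla
    people  -> exact names to require, e.g. the CEO     => +"Elon Musk"
    """
    parts = [term]
    parts += [f"-{word}" for word in exclude]
    parts += [f"+{alias}" for alias in aliases]
    parts += [f'+"{name}"' for name in people]
    return " ".join(parts)


query = build_query("Tesla", exclude=["person"], aliases=["tsla"], people=["Elon Musk"])
print(query)
```

Keeping the indicators, aliases and names as data per company makes it easy to apply the same treatment to Apple or Mars.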

Microservice 2 “Hana”

“Hana” is an async Python (FastAPI) microservice for processing PDFs. It runs on a droplet with 4GB of RAM and 4 CPU cores. Its purpose is to slice a PDF into images and then send each page to AWS Textract. AWS then returns the parsed table data and all the text, which we use later in our app. “Ana” communicates asynchronously with “Hana”.

Limitations: pitfall 4
We have PDFs with anywhere between 10 and 400 pages. 90% of PDFs have tables or images, so each PDF has a different execution time. How do we get around the AWS quota (50 transactions per second) in async mode?
How do we handle this?
1. Switching “Hana” from asynchronous to synchronous HTTP
2. Asynchronously slicing PDF pages and sending them to AWS
3. With math

So, we know that AWS can process up to 50 images per second (transactions per second), but if an image has more than one table, the count increases by 1. Let's assume that a PDF with 10 pages has 8 tables in total. In that case, we know that this one PDF will need 8 transactions. Downloading all images from S3 and putting them back took roughly 2 seconds per page. There are also several other factors that we need to keep track of. In the end, our math comes out at 70 workers that work asynchronously from the master application (“Ana”); each one hits “Hana” and waits for a response before dispatching a new job.

How we keep usage within the threshold

Results: usage stays between 70% and 95% of the threshold at all times.
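A common way to stay under a transactions-per-second quota is a token bucket. The sketch below is an assumption for illustration only: the talk arrives at a fixed count of 70 workers by calculation and does not say that “Hana” uses a token bucket. The class name `TokenBucket`, the `cost` parameter and the injectable `clock` are all hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `rate` transactions per second, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested without sleeping
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def _refill(self) -> None:
        # Add the tokens earned since the last call, never exceeding capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False when the caller should wait."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 50 transactions per second, matching the quota quoted in the talk.
bucket = TokenBucket(rate=50, capacity=50)
if bucket.try_acquire(cost=8):      # e.g. a PDF that needs 8 transactions
    print("send pages to the OCR service")
else:
    print("back off and retry later")
```

A worker that gets `False` would wait and retry instead of calling AWS, which keeps the total request rate under the quota no matter how many workers are running.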

Microservice 3 “Era”

“Era” is an async Python (FastAPI) microservice for sentiment analysis. It runs on a droplet with 4GB of RAM and 4 CPU cores. Its purpose is machine learning. “Ana” communicates asynchronously with “Era”.

How does “Era” work? It takes a sentence we want to check, and later we train our model from pre-trained sentences. It tokenizes the sentence in a way that machine learning can understand.

Pitfall 5
Since our machine learning model is written in Python and our main app is PHP-based, the two need a way to communicate with each other. Yes, we all know that PHP can communicate with Python, but that is not compatible with training our model on Hugging Face.
How do we handle this?
“Era” is our adapter-pattern microservice that allows us to train our model as we wish. Since the tokenization for our model on Hugging Face is written in Python, the best way is to train our model with Python.

Packing it all together

When all microservices are under heavy load, “Ana” is at only 40% capacity. The reserved capacity allows the end user to work as usual.

Pitfall 6
“Ana” is under heavy load from the microservices, but it also needs to prepare all data to be searchable via OpenSearch (Elastic). Since we are getting 10k results per minute from “Ceca”, Elastic would be DDoS-ed. How do we avoid taking Elastic down?
How do we handle this?
“Ana” always prepares a small amount of data. When the user demands more data, “Ana” prepares it and pushes all search data to Elastic. This is achieved with zero downtime. The end user doesn't have a clue. If scraping is in progress, “Ana” updates Elastic every 5 minutes.
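The "update Elastic every 5 minutes" idea can be sketched as a buffer that collects documents and hands them over in one batch once the interval has passed. This is a minimal illustration under stated assumptions, not “Ana”'s actual code: `BufferedIndexer`, `send_batch` and the injectable `clock` are hypothetical names, and `send_batch` stands in for whatever bulk-indexing call the search cluster exposes.

```python
import time
from typing import Callable, Dict, List


class BufferedIndexer:
    """Collect documents and push them to the search index at most once per interval."""

    def __init__(
        self,
        send_batch: Callable[[List[Dict]], None],
        interval_seconds: float = 300.0,            # 5 minutes, as in the talk
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send_batch = send_batch    # hypothetical hook for a bulk-index request
        self.interval = interval_seconds
        self.clock = clock
        self.pending: List[Dict] = []
        self.last_flush = clock()

    def add(self, document: Dict) -> None:
        """Buffer a scraped document; flush only if the interval has elapsed."""
        self.pending.append(document)
        self.flush_if_due()

    def flush_if_due(self) -> bool:
        """Send everything buffered so far as one batch when the interval has passed."""
        if self.pending and self.clock() - self.last_flush >= self.interval:
            batch, self.pending = self.pending, []
            self.send_batch(batch)
            self.last_flush = self.clock()
            return True
        return False


indexer = BufferedIndexer(send_batch=lambda docs: print(f"indexing {len(docs)} documents"))
indexer.add({"title": "Example article"})   # buffered; nothing is sent yet
```

With 10k results arriving per minute, the search cluster sees one bulk request per interval instead of thousands of individual writes.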

Final pitfall 7
You have multiple queue workers, and they have multiple pending jobs. How do you add your job to the top? How do you make it the first job to be executed?
How do we handle this?
Let's play `nice`.

Questions?
Links:
• www.google.com
• https://laravel.com/docs/10.x/
• https://fastapi.tiangolo.com/
• https://huggingface.co/
Slides: https://speakerdeck.com/maki10
Twitter: https://twitter.com/NemanjaMaki10
