Sequential Operations With REST

Slide 1

Slide 1 text

API Days, December 2023: Handle sequential operations with REST and PHP (within API Platform)

Slide 2

Slide 2 text

Grégoire Hébert, Principal Developer, author of "Memex - La route du REST". @gregoirehebert @gheb_dev

Slide 3

Slide 3 text

Summary
01 What is the subject, and why?
02 Research and implementation
03 Usages
04 Alternatives
05 What's next?

Slide 4

Slide 4 text

1 What is the subject and why? Sequential operations with REST

Slide 5

Slide 5 text

When might we encounter this situation?
➔ Submission of forms in several passes

From a performance or cost perspective, how can we optimise the sending of requests? And in fintech, where finance-related messages are sent, how do we ensure that they are executed in succession, or in groups? Should we perhaps invalidate the first two if the third doesn't succeed?

Slide 6

Slide 6 text

When might we encounter this situation?
➔ Submission of forms in several passes
➔ Contacting multiple endpoints at the same time

Slide 7

Slide 7 text

When might we encounter this situation?
➔ Submission of forms in several passes
➔ Contacting multiple endpoints at the same time
➔ Using a resource right after its creation, without waiting for the result to be sent back

Slide 8

Slide 8 text

What does respecting REST entail?
➔ Using HTTP
➔ Being stateless

HTTP is stateless, meaning that there is no link between two requests executed successively on the same connection. This can be problematic for users trying to interact with certain pages in a consistent way, for example when using shopping carts in e-commerce. So while the HTTP protocol itself is stateless, HTTP cookies allow sessions to share the same context, or state.

Slide 9

Slide 9 text

Solutions offered by the evolution of the HTTP protocol:
➔ With HTTP/1.1
  ◆ using pipelining
➔ With HTTP/2
  ◆ using multiplexing

Slide 10

Slide 10 text


Slide 11

Slide 11 text

In HTTP/1.1, we open a connection, and then the requests stack up and run one after the other. With pipelining, requests are sent back to back without waiting for the previous response, and the responses arrive in the same order as the requests. With multiplexing, the first to respond wins: the order of execution is not guaranteed. For multiplexing to work, the protocol associates each request with a stream identifier, so that each request/response pair can be matched correctly, and sends it as a stream.

Slide 12

Slide 12 text

And in HTTP/3 it's even "quicker".

Slide 13

Slide 13 text

What are the limitations?
➔ Using HTTP/1.1
  ◆ with pipelining
➔ Using HTTP/2
  ◆ with multiplexing

Slide 14

Slide 14 text

What are the limitations?
➔ Using HTTP/1.1
  ◆ with pipelining
    ● HOL blocking, hop-by-hop, disabled by default
    ● idempotent methods only
➔ Using HTTP/2
  ◆ with multiplexing
    ● Order is not guaranteed

Pipelining requests improves load times, but a limitation of HTTP/1.1 still applies: the server must send its responses in the same order as the requests were received, so the entire connection remains FIFO and head-of-line (HOL) blocking can occur. Example: if a client sends 4 pipelined GET requests to a proxy over a single connection and the first one is not in its cache, the proxy must forward this request to the destination web server. Even if the next three requests are found in its cache, the proxy must wait for the web server's response and send it to the client, and only then can it send the three cached responses.

Furthermore, POST requests cannot be pipelined; only idempotent methods can.

And finally, HTTP connection handling is hop-by-hop, not end-to-end: the message is relayed from intermediary to intermediary. If a single one of them does not support or enable pipelining, the benefit is lost.
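The proxy example above can be made concrete with a tiny simulation. This is only a sketch: the timings are invented, transmission time is ignored, and the two function names are hypothetical. It compares when each response can be delivered on a FIFO (pipelined) connection and on a multiplexed one.

```python
def pipelined_delivery(ready_times):
    """FIFO connection: a response cannot leave before all earlier ones have left."""
    delivered, latest = [], 0
    for ready in ready_times:
        latest = max(latest, ready)  # blocked by the slowest response ahead of it
        delivered.append(latest)
    return delivered


def multiplexed_delivery(ready_times):
    """Multiplexed streams: each response leaves as soon as it is ready."""
    return list(ready_times)


# Assumed timings in milliseconds: the first request misses the proxy cache,
# the next three are served from it.
ready = [200, 5, 5, 5]
print(pipelined_delivery(ready))    # [200, 200, 200, 200]
print(multiplexed_delivery(ready))  # [200, 5, 5, 5]
```

With pipelining, the three cached responses are held back for 200 ms by the first one; with multiplexing they are delivered first, which is also why the arrival order can no longer be relied upon.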

Slide 15

Slide 15 text

What if it cannot be in HTTP/2?

What if I don't have HTTP/2? Then we fall back to HTTP/1.1.

Slide 16

Slide 16 text

Another alternative
➔ Domain sharding: ww1.example.com (prefer HTTP/2)
➔ Batch operations

When you are limited to HTTP/1.1 and by the number of connections that can be established, you can always create multiple domains in order to contact the same server several times in parallel. But, of course, HTTP/2 or HTTP/3 is a better choice here. The protocol allows another solution, which pretty much bypasses the notion of statelessness: embedding several requests in a single one.

Slide 17

Slide 17 text

We are talking about a batch request.

Slide 18

Slide 18 text

2 Research and implementations

Slide 19

Slide 19 text

How can we manipulate this to our advantage? Here is what an HTTP request is, according to https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html: a request message from a client to a server includes, in the first line of the message, the method to be applied to the resource, the identifier of the resource, and the protocol version in use. The goal is to send this information several times at once, in the same request. First reflex: if I have had the idea, others must have had it before me.

Slide 20

Slide 20 text

➔ https://datatracker.ietf.org/doc/html/draft-snell-http-batch-01
➔ https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
➔ https://www.odata.org/documentation/#stq=batch&stp=1
  ◆ http://docs.oasis-open.org/odata/odata/v4.01/odata-v4.01-part1-protocol.html#sec_BatchRequests
  ◆ http://docs.oasis-open.org/odata/odata/v4.01/odata-v4.01-part1-protocol.html#sec_Preferencecontinueonerrorodatacontin
  ◆ http://docs.oasis-open.org/odata/odata/v4.01/odata-v4.01-part1-protocol.html#sec_Preferencerespondasync
➔ https://developers.google.com/gmail/api/guides/batch
➔ https://developers.google.com/people/v1/batch
➔ https://cloud.google.com/storage/docs/batch
➔ https://www.doctrine-project.org/projects/doctrine-dbal/en/latest/reference/transactions.html#transaction-nesting

Slide 21

Slide 21 text

➔ https://datatracker.ietf.org/doc/html/draft-snell-http-batch-01
➔ Send everything to one endpoint

The IETF is usually a good place to start, and James M. Snell (an IBM engineer who also worked on PATCH, the Prefer header, and JSON Merge Patch) proposed a draft to handle batch requests.

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Slide 24

Slide 24 text

Slide 25

Slide 25 text

➔ https://datatracker.ietf.org/doc/html/draft-snell-http-batch-01
➔ Send everything to one endpoint
➔ Uses the multipart/http Content-Type header.
➔ Each subquery is described in the payload, separated by a delimiting string.
➔ Each subquery is identified by a Content-ID to associate the responses.
➔ We cannot assume the execution order of each subquery.

Slide 26

Slide 26 text

➔ https://datatracker.ietf.org/doc/html/draft-snell-http-batch-01
➔ Send everything to one endpoint
➔ Uses the multipart/http Content-Type header.
➔ Each subquery is described in the payload, separated by a delimiting string.
➔ Each subquery is identified by a Content-ID to associate the responses.
➔ We cannot assume the execution order of each subquery.
➔ The main query cannot be cached, but the subqueries can be.

In addition, each sub-query has its own authentication and authorisation handling. A batched sub-query should not inherit the authentication/authorisation of the multipart query that contains it, or of the other individual queries. Just because a user is authorised to submit a batch query does not mean they are authorised to submit each of the individual queries. This is a really good start, but there is room for improvement. Let's dive into this multipart Content-Type.

Slide 27

Slide 27 text

➔ Multipart

A typical multipart Content-Type header field declares a boundary parameter. This indicates that the entity consists of multiple parts, each with a structure that is syntactically identical to an RFC 822 message, except that the start line and header area may be completely empty, and that the parts are each preceded by a delimiter line built from that boundary. The boundary can be any string, as long as it does not appear inside the parts.
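To make the structure concrete, here is a minimal sketch that parses a multipart entity with Python's standard `email` package (chosen here only because it ships a MIME parser; it is not what the talk's implementation uses). The header line is the example given in RFC 1341, section 7.2; the two parts are invented for the illustration.

```python
from email import message_from_bytes

# Header taken from the RFC 1341 example; the part contents are made up.
raw = (
    b"Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p\r\n"
    b"\r\n"
    b"This preamble is ignored by multipart-aware parsers.\r\n"
    b"--gc0p4Jq0M2Yt08jU534c0p\r\n"
    b"\r\n"
    b"A part whose header area is completely empty.\r\n"
    b"--gc0p4Jq0M2Yt08jU534c0p\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"A part with an explicit header.\r\n"
    b"--gc0p4Jq0M2Yt08jU534c0p--\r\n"
)

message = message_from_bytes(raw)
boundary = message.get_boundary()
# Each part is itself a message object with its own headers and payload.
bodies = [part.get_payload().strip() for part in message.get_payload()]

print(message.is_multipart(), boundary)
for body in bodies:
    print(body)
```

Each part is delimited by `--` followed by the boundary, and the final delimiter carries a trailing `--`.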

Slide 28

Slide 28 text

➔ Multipart

Slide 29

Slide 29 text

➔ Multipart

Slide 30

Slide 30 text

➔ Multipart
➔ Multipart/mixed: can contain anything

Slide 31

Slide 31 text

➔ Multipart
➔ Multipart/mixed
➔ Multipart/alternative: usually used for email

Slide 32

Slide 32 text

➔ Multipart
➔ Multipart/mixed
➔ Multipart/alternative
➔ Multipart/digest: a variation of mixed containing multiple messages (usually for emails, when there are messages from different people)

Slide 33

Slide 33 text

➔ Multipart
➔ Multipart/mixed
➔ Multipart/alternative
➔ Multipart/digest
➔ Multipart/parallel: the parts are intended to be presented in parallel, on hardware and software capable of doing so

Slide 34

Slide 34 text

➔ Multipart
➔ Multipart/mixed
➔ Multipart/alternative
➔ Multipart/digest
➔ Multipart/parallel
➔ A sub-message can itself be multipart.

Now that we know more, let's look at the others. I looked at what Google was doing, and it's quite similar to what the IETF draft proposes. There is also OData, from Microsoft, which is quite similar, with some nice additions and some that are less painless.

Slide 35

Slide 35 text

➔ OData
  ➔ Dependencies between queries

Slide 36

Slide 36 text

➔ OData
  ➔ Dependencies between queries
  ➔ References in the URL of subsequent queries.

Slide 37

Slide 37 text

Slide 38

Slide 38 text

➔ OData
  ➔ Dependencies between queries
  ➔ References in the URL of subsequent queries.
  ➔ Values of a response body in the query part of the URL or in the body of subsequent queries.
  ➔ Individual processing in the order of reception.

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Slide 41

Slide 41 text

➔ OData
  ➔ Dependencies between queries
  ➔ References in the URL of subsequent queries.
  ➔ Values of a response body in the query part of the URL or in the body of subsequent queries.
  ➔ Individual processing in the order of reception.
  ➔ Stop at the first error, unless the continue-on-error preference is specified.
  ➔ Apply all, or nothing.
  ➔ OData has been standardized by OASIS and approved as an international ISO/IEC standard.

OData offers a more complete, but also more complex, solution, while sharing the same foundations as the previous ones. It is therefore a good base. So for API Platform, I chose to implement batch following the OData specification.

Slide 42

Slide 42 text

➔ Useful knowledge about PHP

We need to be able to parse the HTTP request and extract every part of it. Everything that has been sent must be extracted.

Slide 43

Slide 43 text

➔ Useful knowledge about PHP

Unfortunately, the language offers no API to get our hands on the original HTTP request. Usually, everything has already been processed by FastCGI and made accessible through the global variables. PHP is often executed through CGI or, in a more modern setup, through FastCGI served by FPM (FastCGI Process Manager).

Slide 44

Slide 44 text

➔ Useful knowledge about PHP

The HTTP request is split into specific bits, on the way in and on the way out.

Slide 45

Slide 45 text

➔ Useful knowledge about PHP

And FastCGI chops up the HTTP message before sending it to PHP. Here, we have to use everything PHP offers, but also parse the body of the request, then split it into new sub-requests. Because a multipart/mixed HTTP request can have recursive subparts, and because one of them can theoretically contain files and thus increase memory usage substantially, we can't just take the whole body and parse it without risking memory problems. We have to process the submitted data as a stream.
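The idea of walking through the body as a stream can be sketched as follows. This is an illustration in Python rather than the talk's PHP code: `iter_parts` is a hypothetical helper, and the in-memory `io.BytesIO` object merely stands in for a request-body stream. Only one part is buffered at a time; a real implementation would probably spool large parts to a temporary file and recurse into nested multipart parts.

```python
import io


def iter_parts(stream, boundary):
    """Yield the raw bytes of each part, reading the stream line by line."""
    delimiter = b"--" + boundary
    closing = delimiter + b"--"
    current = None  # stays None until the first delimiter, so the preamble is skipped
    for line in stream:
        marker = line.rstrip(b"\r\n")
        if marker in (delimiter, closing):
            if current is not None:
                part = b"".join(current)
                # The CRLF right before a delimiter belongs to the delimiter.
                yield part[:-2] if part.endswith(b"\r\n") else part
            if marker == closing:
                return  # anything after the closing delimiter is the epilogue
            current = []
        elif current is not None:
            current.append(line)


body = io.BytesIO(
    b"preamble\r\n"
    b"--b\r\nContent-ID: 1\r\n\r\nfirst\r\n"
    b"--b\r\n\r\nsecond\r\n"
    b"--b--\r\nepilogue\r\n"
)
parts = list(iter_parts(body, b"b"))
print(parts)
```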

Slide 46

Slide 46 text

➔ Useful knowledge about PHP

Everything we need comes from the standard input.

Slide 47

Slide 47 text

➔ Wrappers & Stream
  ➔ file://
  ➔ http://
  ➔ ftp://
  ➔ php://
  ➔ zlib://
  ➔ data://
  ➔ glob://
  ➔ phar://
  ➔ ssh2://
  ➔ rar://
  ➔ ogg://
  ➔ expect://

To grab the payload in PHP, we need to go through the available wrappers.

Slide 48

Slide 48 text

Slide 49

Slide 49 text

➔ Wrappers & Stream
  ➔ file://
  ➔ http://
  ➔ ftp://
  ➔ php://
    ➔ php://stdin
    ➔ php://stdout
    ➔ php://stderr
    ➔ php://input
    ➔ php://output
    ➔ php://fd
    ➔ php://memory
    ➔ php://temp
    ➔ php://filter
  ➔ zlib://
  ➔ data://
  ➔ glob://
  ➔ phar://
  ➔ ssh2://
  ➔ rar://
  ➔ ogg://
  ➔ expect://

php://stdin, or even php://input, is a read-only stream that allows you to read the raw data from the request body (either is fine). Despite this, PHP is not really designed to handle multipart outside of file uploads, and at that moment I didn't have the slightest idea of any HTTP parser written in PHP. It was time to scour the internet.

Slide 50

Slide 50 text

➔ Libraries

I found an 8-year-old library, too simplistic and incomplete, and this one, with only a few stars... a bit scary, BUT! There are 2 known OSS contributors behind it: Tobias Nyholm (Symfony Core team) and Nuno Maduro (engineer at @laravel, working on Laravel, Forge, and Vapor; creator of @pestphp and PHP Insights). It might be worth taking a look. The package is used in bref, laravel, firebase, or phpleague. I could have used it, but I also know that there is a will to reduce the number of dependencies in API Platform, especially since a whole part of it could belong in the HttpFoundation and MIME components. So I started from scratch.

Slide 51

Slide 51 text

➔ Libraries

Slide 52

Slide 52 text

3 Usages Sequential Operations With API Platform

Slide 53

Slide 53 text

In API Platform, this is achieved by changing two things in your configuration. By doing this, you get a new endpoint, accessible and visible in your OpenAPI documentation.

Slide 54

Slide 54 text

While Swagger UI allows you to sandbox your API and send requests from the interface, it's not really intended to be used for multipart requests, because you don't have enough control over the headers. You can use it, but I don't recommend it.

Slide 55

Slide 55 text

Let's create a traditional HTTP resource. The beautiful Greetings.

Slide 56

Slide 56 text


Slide 57

Slide 57 text


Slide 58

Slide 58 text

Usages - Request

Let's send our first HTTP batch request with a multipart/mixed content type.

Slide 59

Slide 59 text

Usages - Request

This means that we will include HTTP sub-requests afterwards.
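As an illustration of what such a request can look like on the wire, here is a small sketch that assembles a multipart/mixed batch body. Everything named here is an assumption made for the example: the `build_batch` helper, the `batch_demo` boundary, the `/greetings` paths and the `application/http` part type. It is not the API Platform implementation, whose exact format is not shown in this transcript.

```python
def build_batch(sub_requests, boundary="batch_demo"):
    """Return (headers, body) for a multipart/mixed batch request."""
    lines = []
    for content_id, request_text in sub_requests:
        lines += [
            f"--{boundary}",
            "Content-Type: application/http",
            f"Content-ID: {content_id}",  # lets the client match responses to sub-requests
            "",
            request_text,
        ]
    lines.append(f"--{boundary}--")  # closing delimiter
    headers = {"Content-Type": f"multipart/mixed; boundary={boundary}"}
    return headers, "\r\n".join(lines) + "\r\n"


headers, body = build_batch([
    (1, "GET /greetings HTTP/1.1\r\nAccept: application/json\r\n"),
    (2, 'POST /greetings HTTP/1.1\r\nContent-Type: application/json\r\n\r\n{"name": "hello"}'),
])
print(headers["Content-Type"])
print(body)
```

The outer Content-Type announces the boundary, and each part between two delimiters is a complete HTTP sub-request with its own request line, headers and optional body.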

Slide 60

Slide 60 text

Usages - Request

Slide 61

Slide 61 text

Usages - Request

Slide 62

Slide 62 text

Usages - Request

Maintaining an HTTP parser is not easy, but do you know what is?

Slide 63

Slide 63 text

Usages - JSON

JSON! It is very convenient, because it's easy to produce, and API Platform is already capable of dealing with it. On the other hand, it has one big disadvantage compared to HTTP multipart: you cannot include a batch in a batch. Not with JSON, at least not according to the OData specification.

Slide 64

Slide 64 text

Usages - Embedded requests

We will retrieve a collection, which should be empty. Then we include a second batch in which we post a new greeting resource, and we retrieve the collection again, which should contain our newly created resource.

Slide 65

Slide 65 text

Usages - Embedded requests

Slide 66

Slide 66 text

Usages - Embedded requests

Slide 67

Slide 67 text

Slide 68

Slide 68 text

Usages - Embedded requests

Slide 69

Slide 69 text

Usages - Embedded requests

Slide 70

Slide 70 text

You will have noticed the following:
➔ Nested batch request support (HTTP format only)
➔ Each sub-operation has its own request in the Symfony stack.
➔ Each request has its own profiler.
➔ We use an API Platform Post operation, so we can use any available operation option, to some extent.

Slide 71

Slide 71 text

Advanced Usages - continue-on-error

By default, if one of the sub-requests fails, the execution of the following requests is cancelled, and you will also have to cancel the previous valid operations. But by specifying a Prefer header with the continue-on-error preference, all sub-requests are processed regardless of previous successes or failures.
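The difference between the two behaviours can be sketched in a few lines. This is a simplified model, not the actual implementation: sub-requests are represented by plain callables, `run_batch` and `wants_continue_on_error` are hypothetical names, and rolling back the operations that already succeeded is deliberately left out.

```python
def wants_continue_on_error(prefer_header):
    """True when the Prefer header lists the continue-on-error preference."""
    preferences = [p.strip().lower() for p in prefer_header.split(",")]
    return "continue-on-error" in preferences


def run_batch(operations, prefer_header=""):
    """Run the operations in order; stop at the first failure unless told otherwise."""
    keep_going = wants_continue_on_error(prefer_header)
    results = []
    for operation in operations:
        try:
            results.append(("ok", operation()))
        except Exception as error:
            results.append(("error", str(error)))
            if not keep_going:
                break  # the remaining sub-requests are not executed
    return results


def create_greeting():
    return 201


def invalid_greeting():
    raise ValueError("invalid greeting")


default_run = run_batch([create_greeting, invalid_greeting, create_greeting])
tolerant_run = run_batch(
    [create_greeting, invalid_greeting, create_greeting],
    "respond-async, continue-on-error",
)
print(default_run)   # [('ok', 201), ('error', 'invalid greeting')]
print(tolerant_run)  # [('ok', 201), ('error', 'invalid greeting'), ('ok', 201)]
```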

Slide 72

Slide 72 text

Advanced Usages - continue-on-error

Slide 73

Slide 73 text

Advanced Usages - reference

Each individual request in a batch request MUST have an id in a Content-ID header. The request id is case-sensitive, MUST be unique within the batch request, and MUST be a positive integer. Entities created by a POST request can be referenced in the URL of subsequent requests by using the $-prefixed request id as the first segment of the request path.
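A minimal sketch of how such a reference could be resolved, assuming the server keeps a map from request id to the URL of the entity that request created. The `resolve_reference` helper and the `/greetings/42` URL are hypothetical.

```python
def resolve_reference(path, created):
    """Replace a leading $<id> path segment with the URL of the entity created by request <id>."""
    first, separator, rest = path.lstrip("/").partition("/")
    if first.startswith("$") and first[1:] in created:
        return created[first[1:]] + separator + rest
    return path  # no known reference: the path is left untouched


# Assumed state after sub-request "1" (a POST) created a greeting.
created = {"1": "/greetings/42"}

print(resolve_reference("$1", created))           # /greetings/42
print(resolve_reference("$1/comments", created))  # /greetings/42/comments
print(resolve_reference("/greetings", created))   # /greetings
```

Because only ids that have already produced an entity are in the map, a reference to a later request simply cannot be resolved at the time it is encountered.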

Slide 74

Slide 74 text

Advanced Usages - reference

In the JSON format, it is possible to define a dependency condition on previous requests. If one or more of the previous individual requests it depends on fail, the current and subsequent individual requests are not processed and a status code 424 (Failed Dependency) is returned, unless you have set the continue-on-error preference. Here, the second request has been defined as dependent on request "1", but we have given the first one the id "3". The request will fail. And it is not possible to reference future requests.
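The dependency rule can be sketched as a simple check over the request list. The field names `id`, `method`, `url` and `dependsOn` follow the OData JSON batch format; the `check_dependencies` helper is hypothetical and only flags the dependent requests themselves, without modelling the cancellation of everything that follows.

```python
def check_dependencies(requests, failed_ids=()):
    """Map each request id to "run" or to 424 when one of its dependencies is unmet."""
    outcome = {}
    for request in requests:
        dependencies = request.get("dependsOn", [])
        # A dependency is met only if it appeared earlier, was runnable, and did not fail.
        satisfied = all(
            outcome.get(dep) == "run" and dep not in failed_ids
            for dep in dependencies
        )
        outcome[request["id"]] = "run" if satisfied else 424
    return outcome


# Same mistake as described above: the first request is "3", the second depends on "1".
batch = {
    "requests": [
        {"id": "3", "method": "POST", "url": "/greetings", "body": {"name": "hello"}},
        {"id": "2", "method": "GET", "url": "$1", "dependsOn": ["1"]},
    ]
}
print(check_dependencies(batch["requests"]))  # {'3': 'run', '2': 424}
```

Since an id only enters `outcome` once its request has been visited, a `dependsOn` entry that points to a later request is treated as unmet, which matches the rule that future requests cannot be referenced.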

Slide 75

Slide 75 text

4 Alternatives

Alternatives with API Platform?

Slide 76

Slide 76 text

➔ GET only

Slide 77

Slide 77 text

➔ PUT, POST
  ➔ Embedded resources + Serialization groups
  ➔ Custom Processor / Custom resources

Slide 78

Slide 78 text

➔ RPC
  ➔ RPC endpoint
  ➔ GraphQL endpoint + mutations

Slide 79

Slide 79 text

5 What's next?

Is the code available? Not yet.

Slide 80

Slide 80 text

➔ Dev side that needs to be done
❏ Atomicity check in a request group
❏ Protection against forbidden header passing related to authorization and authentication
❏ Header parser lacks comma separation management
❏ Use of an optional application/http MIME type
❏ Adding configuration keys to the Batch operation
❏ Functional tests (50%)
❏ Unit tests (0%)
❏ Documentation (70%)

I still have this amount of work to do before being satisfied and comfortable sharing it publicly. I do this in my free time, so if this is a feature you're interested in, please contact me, and we can organise a partnership to accelerate things :)

Slide 81

Slide 81 text

➔ OData vs HTTP
❏ Complete the draft of J. Snell and propose a modern alternative to OData in HTTP, without the complexities and specificities of OData.
❏ Implement OData in API Platform?

While OData is a standard, some of its rules were designed for specific reasons that are not tied to HTTP requirements, and it originated at a single vendor. In addition, most teams design classic RESTful APIs with OpenAPI documentation, or specify them with RDF formats and the Web Ontology Language (OWL). Furthermore, OData expects the API to be exposed with its own documentation format. It is difficult to tell a team that they need to rewrite their documentation automation, when nothing is wrong with their technology choices, just because they are lacking definitions. Sometimes, in these situations, pursuing the previous work is also the right move, while continuing to push toward the novelties of HTTP/2 and HTTP/3. My next goal is to specify HTTP batch requests, continuing the work of J. Snell.

Slide 82

Slide 82 text

Thank you