Slide 18
Throughput experiments: Data
• Hypothesis
  • Continuous batching performs better the more variance there is in sequence lengths
• How to test?
  • Generate 1000 prompts, each with 512 input tokens
  • Generate a predetermined output length for each prompt, following an exponential distribution
  • Configure the model to ignore the EOS token, so each request generates exactly its predetermined number of output tokens
• How to control variance in sequence lengths?
  • Artificially cap the random output lengths
  • E.g. at 32, 128, 512, and 1536 output tokens
  • 4 experiments, one per cap
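
The data setup above can be sketched roughly as follows. This is a minimal illustration, not the code used for the experiments: the slide does not state the mean of the exponential distribution, so `MEAN_OUTPUT_TOKENS = 128` is an assumption, and the function name `sample_output_lengths` and the fixed seed are hypothetical choices. Reusing the same seed for every cap means the four experiments differ only in where the lengths are clipped.

```python
import random
import statistics

NUM_PROMPTS = 1000            # from the slide
PROMPT_TOKENS = 512           # from the slide: every prompt has 512 input tokens
CAPS = [32, 128, 512, 1536]   # from the slide: one experiment per cap
MEAN_OUTPUT_TOKENS = 128      # ASSUMPTION: the slide does not give the mean


def sample_output_lengths(cap, n=NUM_PROMPTS, mean=MEAN_OUTPUT_TOKENS, seed=0):
    """Draw n output lengths from an exponential distribution, clipped to [1, cap]."""
    rng = random.Random(seed)  # same seed per cap: same raw draws, different clipping
    lengths = []
    for _ in range(n):
        raw = rng.expovariate(1.0 / mean)  # expovariate takes the rate, i.e. 1 / mean
        lengths.append(max(1, min(cap, round(raw))))
    return lengths


# Variance of the output lengths for each experiment.
variance_by_cap = {}
for cap in CAPS:
    lengths = sample_output_lengths(cap)
    variance_by_cap[cap] = statistics.pvariance(lengths)
    print(f"cap={cap:5d}  mean={statistics.mean(lengths):7.1f}  "
          f"variance={variance_by_cap[cap]:10.1f}")
```

A larger cap clips fewer of the long samples, so the variance in output length grows with the cap. That is what lets the four experiments probe the hypothesis across increasing levels of variance.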