Ray in 2023: Ray in Reflection

December 07, 2023

A quick recap of how Ray has progressed, and a reflection on its pivotal role in the LLM stack
and the generative AI landscape.


  1. Why Ray?
    [Figure: headline figures 12x, 50%, 40%, 10x, 5x, 30%; labels as extracted: faster, cheaper, cheaper, cheaper, faster, cheaper]
  2. As AI capabilities have grown, so have the challenges: scale, future readiness, and cost.
    These are the challenges Ray was built for.
  3. Anyscale Endpoints: fine-tuning
    Superior task-specific performance at 1/300th the cost of GPT-4!
    [Chart: labels Llama-2-7B, GPT-4, fine-tuned; values shown: 86%, 3%, 78%]
  4. Batch inference: costs
    Cost to process 1M images
    [Chart: labels Spark, SageMaker, AWS; axis $0 to $60; values shown: $3.5, $7.3, $57, $2.5]
  5. Anyscale Endpoints: cost-efficient LLM inference
    $1 / million tokens (Llama-2 70B)
    Components shown: single-GPU optimizations, multi-GPU modeling, inference server, autoscaling, multi-region and multi-cloud