LLM Agent - Part 2

Couger
December 24, 2024

In real-world applications such as recommendation systems, an important aspect of LLM-driven agents is their ability to take actions that adapt to users' preferences. In this talk, we will briefly introduce the basic concepts of reinforcement learning (RL) and two widely used policy optimization algorithms, PPO and DPO. Finally, through a demonstration of a recommendation agent, we will show how RL can enable agents to provide more user-adaptive responses.

Transcript

  1. Reinforcement Learning (RL)
    • An area of optimal control concerned with how an intelligent agent ought to take actions in a dynamic environment in order to maximize the cumulative reward.
  2. Policy in RL
    • Policy is the mapping between observations and actions.
    • Policy directly affects the behaviors of your model.
    • The purpose of RL is to optimize the policy by rewarding “correct behaviors”.
  3. Proximal Policy Optimization (PPO)
    • Find the best path to climb a mountain.
    • (Figure labels: Line Search, Trust Region)
  4. RL and LLM Agent
    • RL originated in agent systems, where it is used to optimize action selection.
    • RL is well suited to building “customization” and “personalization”.
    • Use cases
      • Personalized tutorial (each student’s learning style)
      • Customer support (tones, feedback)
      • Game AI (diverse NPC behaviors)
      • Recommendation system (user preferences)
  5. A recommendation sample
    • Use Amazon review dataset to construct the environment
    • Build recommendation agent with certain actions
    • Demonstrate the strength of such LLM agents
    • RL to be implemented...
    • Causal to be implemented...
  6. Identify shopping patterns
    • Categories: What kinds of items is the user interested in?
    • Rating trends: Why does the user give a low rating?
    • Price: How does the user feel about the price-quality trade-off?
    ...
  7. Analysing user information
    • From the reviews left by the user, we can get some hints about
      • personality: practical? clear? mocking?
      • empathy: does the user buy a gift? experienced a loss?
      • attitude: straightforward? detail-oriented?
      • frequency: why does the user buy certain items frequently?
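
The pattern questions on slide 6 (categories, rating trends, price) could be answered with simple aggregates before any LLM is involved. The following is a minimal sketch, not the talk's implementation: the function name `summarize_shopping_patterns`, the record keys `category`, `rating`, and `price`, and the `low_rating` threshold are all hypothetical and do not reflect the actual Amazon review dataset schema.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_shopping_patterns(reviews, low_rating=2.0):
    """Summarize one user's reviews by category, rating, and price.

    Each review is a dict with the hypothetical keys
    'category', 'rating', and 'price' (assumed for this sketch).
    """
    # Categories: which kinds of items does the user review most often?
    category_counts = Counter(r["category"] for r in reviews)

    # Rating trends: average rating per category.
    ratings_by_category = defaultdict(list)
    for r in reviews:
        ratings_by_category[r["category"]].append(r["rating"])
    mean_rating = {c: mean(v) for c, v in ratings_by_category.items()}

    # Low-rated reviews are kept whole so their text can be inspected later.
    low_rated = [r for r in reviews if r["rating"] <= low_rating]

    # Price: average price over reviews that carry a price.
    prices = [r["price"] for r in reviews if r.get("price") is not None]

    return {
        "top_categories": [c for c, _ in category_counts.most_common(3)],
        "mean_rating_by_category": mean_rating,
        "low_rated_reviews": low_rated,
        "mean_price": mean(prices) if prices else None,
    }


# Made-up sample data, for illustration only.
sample_reviews = [
    {"category": "Books", "rating": 5.0, "price": 12.0},
    {"category": "Books", "rating": 4.0, "price": 8.0},
    {"category": "Electronics", "rating": 1.0, "price": 40.0},
]
summary = summarize_shopping_patterns(sample_reviews)
```

A summary like this could be placed in the agent's prompt as context, leaving the "why" questions (for example, why a rating was low) to the review text itself.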
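
Slide 3 places PPO next to the trust-region picture. In the standard formulation of PPO, that trust region is approximated by clipping the probability ratio between the new and old policy, so that a single update cannot move the policy too far. The sketch below shows the clipped surrogate objective in plain Python; the formula is the commonly published one rather than something shown in the slides, and the function name, argument names, and the `clip_eps=0.2` default are illustrative assumptions.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = estimated advantage of action a_i in state s_i
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


# A large ratio with a positive advantage is capped at (1 + eps) * advantage.
capped_gain = ppo_clip_objective([1.5], [1.0])

# A small ratio with a negative advantage is held at (1 - eps) * advantage.
capped_loss = ppo_clip_objective([0.5], [-1.0])
```

Taking the minimum of the two terms means the objective never rewards moving the ratio beyond the clip range, which is what keeps each step "proximal".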