Slide 1

Slide 1 text

LLM Agent - Part 2 Couger AI

Slide 2

Slide 2 text

Previously ...

Slide 3

Slide 3 text

Reinforcement Learning (RL)
• An area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize the cumulative reward.
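
To make "cumulative reward" concrete, here is a minimal sketch of a discounted return. The discount factor `gamma` is an assumption that the slide does not mention; with `gamma=1.0` the function reduces to a plain sum of rewards.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative reward G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

An agent that maximizes this quantity trades off immediate reward against later reward; a smaller `gamma` makes it more short-sighted.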

Slide 4

Slide 4 text

LLM and RL

Slide 5

Slide 5 text

LLM and RL
• InstructGPT

Slide 6

Slide 6 text

Policy in RL
• A policy is the mapping from observations to actions.
• The policy directly determines the behavior of your model.
• The purpose of RL is to optimize the policy by rewarding “correct behaviors”.
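
The idea of a policy as a mapping can be sketched as a small lookup table. This is an illustrative toy, not the deck's implementation: the observation names, action names, and preference scores in `policy_table` are all hypothetical.

```python
import math
import random

def softmax_policy(preferences):
    """Turn per-action preference scores into action probabilities."""
    highest = max(preferences.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {a: math.exp(s - highest) for a, s in preferences.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def act(policy_table, observation, rng=random):
    """Sample an action for the observation from the tabular policy."""
    probs = softmax_policy(policy_table[observation])
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

# Hypothetical toy table: observation -> {action: preference score}.
policy_table = {
    "user_browsing": {"recommend": 2.0, "ask_question": 0.5},
    "user_idle": {"recommend": 0.0, "ask_question": 0.0},
}
```

Optimizing the policy then means adjusting the scores so that rewarded actions become more probable for the observations in which they were taken.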

Slide 7

Slide 7 text

Proximal Policy Optimization (PPO)
• Find the best path to climb a mountain.
[Figure panels: Line Search, Trust Region]
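
PPO approximates a trust region by clipping the probability ratio between the new and old policy. The sketch below computes the standard clipped surrogate objective in plain Python; the function name and the default `clip_eps=0.2` are illustrative choices, and a real training loop would compute this on tensors with automatic differentiation.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over samples.

    ratios: pi_new(a|s) / pi_old(a|s) for each sampled action.
    advantages: advantage estimate for each sampled action.
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes the incentive to push the ratio
        # outside the clip range in the direction that raises the objective.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)
```

The objective is maximized, so in practice its negative is used as the loss.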

Slide 8

Slide 8 text

And for implementation, you only need to ...

Slide 9

Slide 9 text

Direct Preference Optimization (DPO)
• Directly optimize the model on preference data, without training a separate reward model.
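
As a rough illustration, the DPO loss for a single preference pair can be written from four sequence log-probabilities: the chosen and rejected responses under the model being trained and under a frozen reference model. The function and argument names and the default `beta=0.1` are assumptions for this sketch.

```python
import math

def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(x)) equals softplus(-x).
    return _softplus(-logits)
```

The loss falls below log 2 once the model favors the chosen response more strongly than the reference model does.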

Slide 10

Slide 10 text

RL and LLM Agent
• RL originated in agent systems, where it is used to optimize action selection.
• RL is well suited to building “customization” and “personalization”.
• Use cases
  • Personalized tutoring (each student’s learning style)
  • Customer support (tone, feedback)
  • Game AI (diverse NPC behaviors)
  • Recommendation systems (user preferences)

Slide 11

Slide 11 text

A recommendation sample
• Use the Amazon review dataset to construct the environment
• Build a recommendation agent with certain actions
• Demonstrate the strength of such LLM agents
• RL to be implemented...
• Causal to be implemented...
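
One possible shape for such an environment is sketched below. It is not the deck's code: the class name, the two actions, and the record fields (`user_id`, `category`) are hypothetical and would need to be mapped to the fields of the actual dataset.

```python
from collections import defaultdict

class ReviewEnvironment:
    """Holds per-user review histories and exposes a few agent actions."""

    def __init__(self, reviews):
        # Group review records by user so each action is a cheap lookup.
        self._by_user = defaultdict(list)
        for review in reviews:
            self._by_user[review["user_id"]].append(review)

    def get_history(self, user_id):
        """Action: return a copy of all reviews written by the user."""
        return list(self._by_user.get(user_id, []))

    def list_categories(self, user_id):
        """Action: return the sorted categories the user has reviewed."""
        return sorted({r["category"] for r in self.get_history(user_id)})
```

An agent would call these actions as tools and pass their results to the LLM as context.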

Slide 12

Slide 12 text

A recommendation sample

Slide 13

Slide 13 text

Identify shopping patterns
• Categories: What kinds of items is the user interested in?
• Rating trends: Why does the user give a low rating?
• Price: How does the user feel about the price and quality trade-off?
...
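
The three questions above can be approximated with simple statistics before any LLM is involved. In this sketch the record fields (`category`, `rating`, `price`, `text`) and the low-rating threshold are assumptions; the texts of low-rated reviews are collected so that a model can later be asked why the rating was low.

```python
from collections import Counter, defaultdict
from statistics import mean

def shopping_patterns(reviews, low_rating=2):
    """Summarize categories, rating trends, and price for one user.

    Expects a non-empty list of review dicts.
    """
    categories = Counter(r["category"] for r in reviews)
    prices_by_rating = defaultdict(list)
    for r in reviews:
        prices_by_rating[r["rating"]].append(r["price"])
    return {
        "top_categories": categories.most_common(3),
        "mean_rating": mean(r["rating"] for r in reviews),
        # Raw text of low-rated reviews, kept for later LLM analysis.
        "low_rated_texts": [r["text"] for r in reviews
                            if r["rating"] <= low_rating],
        # Mean price per rating hints at the price and quality trade-off.
        "mean_price_by_rating": {rating: mean(prices) for rating, prices
                                 in sorted(prices_by_rating.items())},
    }
```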

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Analysing user information
• From the reviews left by the user, we can get some hints about
  • personality: practical? clear? mocking?
  • empathy: does the user buy gifts? has the user experienced a loss?
  • attitude: straightforward? detail-oriented?
  • frequency: why does the user buy certain items frequently?

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Analysing recommendation features
• Get a report of user preferences.
  • price
  • rating
  • categories
  • ...

Slide 18

Slide 18 text

Give recommendations
• category based
• give a reason
• estimate the expected rating
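
A non-LLM baseline for these three outputs might look like the following. It is a deliberately simple heuristic, not the method used in the deck: the expected rating is assumed to be the user's mean past rating in the item's category, and the field names (`item_id`, `category`, `rating`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def recommend(history, catalog, top_k=3):
    """Recommend unseen catalog items from categories the user has rated."""
    ratings_by_category = defaultdict(list)
    for review in history:
        ratings_by_category[review["category"]].append(review["rating"])
    seen = {review["item_id"] for review in history}

    results = []
    for item in catalog:
        ratings = ratings_by_category.get(item["category"])
        # Skip items already reviewed and categories with no history.
        if item["item_id"] in seen or not ratings:
            continue
        expected = mean(ratings)
        results.append({
            "item_id": item["item_id"],
            "expected_rating": expected,
            "reason": f"User rated {len(ratings)} item(s) in "
                      f"'{item['category']}' with a mean rating of "
                      f"{expected:.1f}.",
        })
    results.sort(key=lambda r: r["expected_rating"], reverse=True)
    return results[:top_k]
```

An LLM agent could replace the templated reason with a generated explanation while keeping the same output structure.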

Slide 19

Slide 19 text

Implementations
• Example prompts for reasoning, planning and actions.
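
The deck's actual prompts are not reproduced in this text, so the templates below are hypothetical placeholders that only illustrate how reasoning, planning, and action prompts could be chained. `llm` is assumed to be any callable that takes a prompt string and returns a reply string; no specific model API is implied.

```python
# Hypothetical prompt templates, one per stage.
REASONING_PROMPT = (
    "You are a shopping assistant.\n"
    "Here are the user's past reviews:\n{reviews}\n"
    "Describe the user's preferences in two sentences."
)
PLANNING_PROMPT = (
    "User preferences:\n{preferences}\n"
    "Available actions: {actions}\n"
    "List the actions to take, in order, one per line."
)
ACTION_PROMPT = (
    "Plan:\n{plan}\n"
    "Carry out the plan and reply with a recommendation, a reason, "
    "and an expected rating from 1 to 5."
)

def run_agent(llm, reviews, actions):
    """Chain the three stages, feeding each reply into the next prompt."""
    preferences = llm(REASONING_PROMPT.format(reviews="\n".join(reviews)))
    plan = llm(PLANNING_PROMPT.format(preferences=preferences,
                                      actions=", ".join(actions)))
    return llm(ACTION_PROMPT.format(plan=plan))
```

Keeping the stages as separate calls makes each intermediate reply easy to log and inspect.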