LLM Agent - Part 2

LLM Agent - Part 2 Couger AI

Previously ...

Reinforcement Learning (RL)
• An area of machine learning and optimal control concerned with how an intelligent agent ought to take actions in a dynamic environment in order to maximize the cumulative reward.
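To make "cumulative reward" concrete, here is a minimal sketch of the quantity an agent tries to maximize over one episode. The discount factor `gamma` is an assumption added for illustration (the slide only says "cumulative reward"); setting `gamma=1.0` gives the plain sum.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over t of gamma**t * rewards[t].

    `gamma` is an illustrative assumption; gamma=1.0 is the plain cumulative sum.
    """
    total = 0.0
    # Walk backwards so each step folds in the discounted future return.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([1, 0, 2], gamma=0.5))
```

A lower `gamma` makes the agent care more about immediate rewards, while a value close to 1 weights long-term outcomes almost as much as immediate ones.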

LLM and RL

LLM and RL
• InstructGPT

Policy in RL
• A policy is the mapping between observations and actions.
• The policy directly affects the behavior of your model.
• The purpose of RL is to optimize the policy by rewarding “correct behaviors”.
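The idea of a policy as a mapping from observations to actions that is adjusted by rewards can be sketched with a small tabular softmax policy. This is only an illustration: the class name `TabularPolicy`, the learning rate, and the REINFORCE-style update are assumptions, not something taken from the deck.

```python
import math
import random


def softmax(scores):
    """Turn a list of preferences into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class TabularPolicy:
    """Hypothetical policy: one preference per (observation, action) pair."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.prefs = {}  # observation -> list of action preferences

    def probabilities(self, observation):
        prefs = self.prefs.setdefault(observation, [0.0] * len(self.actions))
        return softmax(prefs)

    def act(self, observation, rng=random):
        """Sample an action for the given observation."""
        probs = self.probabilities(observation)
        return rng.choices(self.actions, weights=probs, k=1)[0]

    def reinforce(self, observation, action, reward, lr=0.1):
        """REINFORCE-style step: rewarded actions become more likely."""
        probs = self.probabilities(observation)
        chosen = self.actions.index(action)
        prefs = self.prefs[observation]
        for i in range(len(prefs)):
            grad = (1.0 if i == chosen else 0.0) - probs[i]
            prefs[i] += lr * reward * grad
```

After repeatedly calling `reinforce("s", "a", 1.0)`, the probability of action `"a"` in observation `"s"` rises above that of the other actions, which is the sense in which rewards shape behavior.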

Proximal Policy Optimization (PPO)
• Find the best path to climb a mountain.
• [Figure: Line Search vs. Trust Region]
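PPO is commonly implemented with a clipped surrogate objective, which keeps each update close to the old policy in the spirit of a trust region. The sketch below shows that standard formulation in plain Python; the function name and the default `clip_eps=0.2` are illustrative choices, and the deck may present it differently.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the old policy. advantages: advantage estimates per action.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any gain from moving the ratio
        # outside the [1 - eps, 1 + eps] band.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

With a positive advantage, a ratio above `1 + clip_eps` earns no extra objective, so the optimizer has no incentive to take an overly large step.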

And for Implementation, you only need to ....

Direct Preference Optimization (DPO)
• Directly optimize the model on preference data.
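For a single preference pair (a chosen and a rejected response), the standard DPO loss compares how much the policy has shifted toward each response relative to a frozen reference model. The sketch below assumes the summed log-probabilities of each response are already available; the argument names and `beta=0.1` are illustrative.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_shift - rejected_shift)
    # Numerically stable form of -log(sigmoid(x)) = log(1 + exp(-x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

The loss equals log 2 when the policy matches the reference and falls as the policy raises the chosen response relative to the rejected one, so no separate reward model or sampling loop is needed.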

RL and LLM Agent
• RL originated in agent systems, as a way to optimize action selection.
• RL is well suited to building “customization” and “personalization”.
• Use cases
  • Personalized tutoring (each student’s learning style)
  • Customer support (tones, feedback)
  • Game AI (diverse NPC behaviors)
  • Recommendation system (user preferences)

A recommendation sample
• Use the Amazon review dataset to construct the environment
• Build a recommendation agent with certain actions
• Demonstrate the strength of such LLM agents
• RL to be implemented...
• Causal to be implemented...

A recommendation sample

Identify shopping patterns
• Categories: What kinds of items is the user interested in?
• Rating trends: Why does the user give a low rating?
• Price: How does the user feel about the price and quality trade-off?
...
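Before asking an LLM the questions above, the raw signals can be summarized with ordinary code. The sketch below is one possible pre-processing step, not the deck's implementation; the review fields `category`, `rating`, `price`, and `text` are an assumed schema and may not match the actual Amazon review dataset columns.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_shopping_patterns(reviews, low_rating=2):
    """Summarize one user's reviews (expects a non-empty list of dicts).

    Each review is assumed to have 'category', 'rating', 'price', 'text'.
    """
    category_counts = Counter(r["category"] for r in reviews)
    ratings_by_category = defaultdict(list)
    for r in reviews:
        ratings_by_category[r["category"]].append(r["rating"])
    return {
        # Categories: what the user buys most often.
        "top_categories": [c for c, _ in category_counts.most_common(3)],
        # Rating trends: average rating per category.
        "mean_rating_by_category": {
            c: mean(v) for c, v in ratings_by_category.items()
        },
        # Texts of low ratings, to be inspected for the "why".
        "low_rated_reviews": [
            r["text"] for r in reviews if r["rating"] <= low_rating
        ],
        # Price: a rough indicator of the user's usual price level.
        "mean_price": mean(r["price"] for r in reviews),
    }
```

The low-rated review texts are returned verbatim because the reason behind a low rating is a question for the LLM rather than for a counter.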

Analysing user information
• From the reviews left by the user, we can get some hints about
  • personality: practical? clear? mocking?
  • empathy: does the user buy a gift? experienced loss?
  • attitude: straightforward? detail-oriented?
  • frequency: why does the user buy certain items frequently?

Analysing recommendation features
• Get a report of user preferences.
  • price
  • rating
  • categories
  • ...

Give recommendations
• category based
• give a reason
• estimate the expected rating
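The three outputs listed above (a category-based pick, a reason, and an expected rating) map naturally onto a small structured record. In the sketch below a simple heuristic stands in for the LLM call: the expected rating is assumed to be the user's past mean rating in that category. The `Recommendation` class, the `recommend` function, and the catalog fields `title` and `category` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    item: str
    category: str
    reason: str
    expected_rating: float


def recommend(mean_rating_by_category, catalog, top_k=2):
    """Pick catalog items from categories the user has rated before.

    Heuristic stand-in for an LLM: expected rating = the user's mean past
    rating in the item's category.
    """
    recs = []
    for item in catalog:
        category = item["category"]
        if category not in mean_rating_by_category:
            continue  # no evidence about this category
        expected = mean_rating_by_category[category]
        reason = (
            f"The user rated past '{category}' purchases "
            f"{expected:.1f} on average."
        )
        recs.append(Recommendation(item["title"], category, reason, expected))
    recs.sort(key=lambda r: r.expected_rating, reverse=True)
    return recs[:top_k]
```

Keeping the reason and the expected rating as explicit fields makes the output easy to compare against the rating the user later gives, which is the kind of signal an RL stage could use as a reward.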

Implementations
• Example prompts for reasoning, planning and actions.
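The deck's actual prompts are not included in this transcript. Purely as an illustration of how the three stages could be separated, the sketch below uses hypothetical templates; the wording, the `PROMPTS` dictionary, and `build_prompt` are assumptions rather than the author's prompts.

```python
# Hypothetical templates, one per stage; not the prompts used in the deck.
PROMPTS = {
    "reasoning": (
        "You are analysing a shopper.\n"
        "Reviews:\n{reviews}\n"
        "Describe the user's preferences for category, price and rating."
    ),
    "planning": (
        "User profile:\n{profile}\n"
        "List the steps needed to choose a product to recommend."
    ),
    "action": (
        "User profile:\n{profile}\n"
        "Candidate items:\n{candidates}\n"
        "Recommend one item, give a reason, and estimate the expected "
        "rating from 1 to 5."
    ),
}


def build_prompt(stage, **fields):
    """Fill the template for one stage ('reasoning', 'planning', 'action')."""
    return PROMPTS[stage].format(**fields)


if __name__ == "__main__":
    print(build_prompt("reasoning", reviews="- great blender (5 stars)"))
```

Splitting the stages this way lets the output of one prompt (for example the profile from the reasoning stage) be passed as a field into the next.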
