Artificial Intelligence Stack Exchange
What is the difference between the $\epsilon$-greedy and softmax policies?
The $\epsilon$-greedy policy chooses the best action (i.e. the action associated with the highest value) with probability $1-\epsilon$, where $\epsilon \in [0, 1]$, and a random action with probability $\epsilon$. The problem with $\epsilon$-greedy is that, when it chooses a random action (i.e. with probability $\epsilon$), it chooses uniformly (i.e. it considers all actions equally good ...
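To make the contrast concrete, here is a minimal sketch that computes the action probabilities under both policies for the same action values. The function names, the `temperature` parameter, and the example values in `q` are assumptions made for illustration; they do not come from the answer. The sketch also assumes that the random draw in $\epsilon$-greedy is uniform over all actions, including the greedy one.

```python
import math

def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities under epsilon-greedy (illustrative helper)."""
    n = len(q_values)
    # Greedy action: highest value, ties broken by lowest index.
    best = max(range(n), key=lambda a: q_values[a])
    # Exploration mass epsilon is spread uniformly over all n actions.
    probs = [epsilon / n] * n
    # The remaining 1 - epsilon goes to the greedy action.
    probs[best] += 1.0 - epsilon
    return probs

def softmax_probs(q_values, temperature=1.0):
    """Action probabilities under a softmax (Boltzmann) policy."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical action values: one good, one nearly as good, one very bad.
q = [1.0, 0.9, -5.0]
eg = epsilon_greedy_probs(q, epsilon=0.1)
sm = softmax_probs(q, temperature=1.0)
print("epsilon-greedy:", eg)
print("softmax:       ", sm)
```

Under $\epsilon$-greedy, the second and third actions get the same probability even though their values differ a lot, whereas softmax gives the nearly-best action far more probability than the very bad one.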
machine learning - Greedy policy definition - Cross Validated
Your professor's notes are a more general and formal way of expressing exactly the same idea as your first sentence. One possible difference is that you may be thinking in terms of a deterministic policy:
reinforcement learning - Policy Improvement Theorem - Cross Validated
In reinforcement learning, policy improvement is one step of an algorithm called policy iteration, which attempts to find approximate solutions to the Bellman optimality equations. Pages 84 and 85 of Sutton and Barto's book on RL state the following theorem: Policy Improvement Theorem. Given two deterministic policies $\pi$ and $\pi'$:
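For context, the usual form of the theorem says that if $q_\pi(s, \pi'(s)) \geq v_\pi(s)$ for every state $s$, then $v_{\pi'}(s) \geq v_\pi(s)$ for every state $s$. The sketch below checks this numerically on a tiny deterministic MDP. The MDP, the discount `GAMMA`, and all function names are hypothetical, chosen only for illustration. Policy evaluation is done by plain iterative sweeps, which is one possible choice and not necessarily what the book uses on those pages.

```python
GAMMA = 0.9  # discount factor (assumed for this toy example)

# Hypothetical deterministic MDP: MDP[state][action] = (next_state, reward).
MDP = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}

def evaluate(policy, sweeps=1000):
    """Iterative policy evaluation for a deterministic policy."""
    v = {s: 0.0 for s in MDP}
    for _ in range(sweeps):
        v = {
            s: MDP[s][policy[s]][1] + GAMMA * v[MDP[s][policy[s]][0]]
            for s in MDP
        }
    return v

def q_value(v, s, a):
    """One-step lookahead: reward plus discounted value of the next state."""
    next_state, reward = MDP[s][a]
    return reward + GAMMA * v[next_state]

def improve(policy):
    """Greedy policy improvement with respect to v_pi."""
    v = evaluate(policy)
    return {s: max(MDP[s], key=lambda a: q_value(v, s, a)) for s in MDP}

pi = {0: 0, 1: 0}      # initial policy: always take action 0
pi_new = improve(pi)   # greedy with respect to v_pi
v_old, v_new = evaluate(pi), evaluate(pi_new)

for s in MDP:
    # Premise of the theorem holds by construction of the greedy policy.
    assert q_value(v_old, s, pi_new[s]) >= v_old[s]
    # Conclusion of the theorem: the new policy is at least as good.
    assert v_new[s] >= v_old[s]
print(pi_new, v_old, v_new)
```

Because the greedy policy picks $\arg\max_a q_\pi(s, a)$ in each state, the premise of the theorem holds automatically, which is why policy iteration never makes the policy worse.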