What is Statistical Reinforcement Learning?
In an unknown environment (e.g., a maze), a computer agent (e.g., a robot) takes an action (e.g., walking) based on its own control policy. Its state is then updated (e.g., by moving forward), and the action is evaluated with a "reward" (e.g., praise, neutral, or scolding). Through such interaction with the environment, the agent is trained to achieve a certain task (e.g., getting out of the maze) without explicit guidance. A crucial advantage of reinforcement learning is its non-greedy nature: the agent is trained not to improve its performance in the short term (e.g., greedily approaching an exit of the maze), but to optimize its long-term achievement (e.g., successfully getting out of the maze).
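One common way to formalize this long-term objective (a standard formulation, not something specific to this post) is to seek a policy $\pi$ that maximizes the expected discounted sum of rewards:

$$
\max_\pi \; \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t\right], \qquad 0 \le \gamma < 1,
$$

where $r_t$ is the reward received at time step $t$ and the discount factor $\gamma$ controls the trade-off: a small $\gamma$ emphasizes immediate reward (the greedy behavior above), while $\gamma$ close to 1 makes the agent value the eventual success of the whole task.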
A reinforcement learning problem contains various technical components such as states, actions, transitions, rewards, policies, and values.
Let us consider a maze problem, where a robot agent is placed in a maze and we want to guide it to the goal without explicit supervision about which direction to go. States are the positions in the maze that the robot agent can visit. In this example, actions are the possible directions along which the robot agent can move.
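The components above can be sketched in code. The following is a minimal, illustrative example (not from the original post) using tabular Q-learning on a hypothetical one-dimensional "maze": states are positions 0–4 in a corridor, the exit is state 4, actions move left or right, and the reward is +1 on reaching the exit and 0 ("neutral") otherwise. The layout, reward values, and hyperparameters are all assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4          # corridor positions 0..4; the exit is state 4
LEFT, RIGHT = 0, 1
ACTIONS = [LEFT, RIGHT]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):
    """Transition function: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == LEFT else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy policy: mostly exploit, occasionally explore
            a = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(q[s], rng)
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap on the best action at the next
            # state, which is what makes the agent value long-term reward
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q_table = train()
# Recover the greedy policy from the learned values; in the non-terminal
# states it should prefer RIGHT, i.e., head toward the exit.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES)]
```

Note how each component of the problem appears explicitly: `step` encodes the transitions and rewards, the Q-table stores the values, and the epsilon-greedy rule is the policy being improved.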