Along with unsupervised machine learning and supervised learning, another common form of AI creation is reinforcement learning. Beyond regular reinforcement learning, deep reinforcement learning can lead to astonishingly impressive results, thanks to the fact that it combines the best aspects of both deep learning and reinforcement learning. Lets take a look at precisely how deep reinforcement learning operates. Note that this article wont delve too deeply into the formulas used in deep reinforcement learning, rather it aims to give the reader a high level intution for how the process works.
Before we dive into deep reinforcement learning, it might be a good idea to refresh ourselves on how regular reinforcement learning works. In reinforcement learning, goal-oriented algorithms are designed through a process of trial and error, optimizing for the action that leads to the best result/the action that gains the most reward. When reinforcement learning algorithms are trained, they are given rewards or punishments that influence which actions they will take in the future. Algorithms try to find a set of actions that will provide the system with the most reward, balancing both immediate and future rewards.
Reinforcement learning algorithms are very powerful because they can be applied to almost any task, being able to flexibly and dynamically learn from an environment and discover possible actions.
Photo: Megajuice via Wikimedia Commons, CC 1.0 (https://commons.wikimedia.org/wiki/File:Reinforcement_learning_diagram.svg)
When it comes to deep reinforcement learning, the environment is typically represented with images. An image is a capture of the environment at a particular point in time. The agent must analyze the images and extract relevant information from them, using the information to inform which action they should take. Deep reinforcement learning is typically carried out with one of two different techniques: value-based learning and policy-based learning.
Value-based learning techniques make use of algorithms and architectures like convolutional neural networks and Deep-Q-Networks. These algorithms operate by converting the image to greyscale and cropping out unnecessary parts of the image. Afterward, the image undergoes various convolutions and pooling operations, extracting the most relevant portions of the image. The important parts of the image are then used to calculate the Q-value for the different actions the agent can take. Q-values are used to determine the best course of action for the agent. After the initial Q-values are calculated, backpropagation is carried out in order that the most accurate Q-values can be determined.
Policy-based methods are used when the number of possible actions that the agent can take is extremely high, which is typically the case in real-world scenarios. Situations like these require a different approach because calculating the Q-values for all the individual actions isnt pragmatic. Policy-based approaches operate without calculating function values for individual actions. Instead, they adopt policies by learning the policy directly, often through techniques called Policy Gradients.
Policy gradients operate by receiving a state and calculating probabilities for actions based on the agents prior experiences. The most probable action is then selected. This process is repeated until the end of the evaluation period and the rewards are given to the agent. After the rewards have been dealt with the agent, the networks parameters are updated with backpropagation.
Because Q-Learning is such a large part of the deep reinforcement learning process, lets take some time to really understand how the Q-learning system works.
The Markov Decision Process
A markov decision process. Photo: waldoalvarez via Pixabay, Pixbay License (https://commons.wikimedia.org/wiki/File:Markov_Decision_Process.svg)
In order for an AI agent to carry out a series of tasks and reach a goal, the agent must be able to deal with a sequence of states and events. The agent will begin at one state and it must take a series of actions to reach an end state, and there can be a massive number of states existing between the beginning and end states. Storing information regarding every state is impractical or impossible, so the system must find a way to preserve just the most relevant state information. This is accomplished through the use of a Markov Decision Process, which preserves just the information regarding the current state and the previous state. Every state follows a Markov property, which tracks how the agent change from the previous state to the current state.
Deep Q-Learning
Once the model has access to information about the states of the learning environment, Q-values can be calculated. The Q-values are the total reward given to the agent at the end of a sequence of actions.
The Q-values are calculated with a series of rewards. There is an immediate reward, calculated at the current state and depending on the current action. The Q-value for the subsequent state is also calculated, along with the Q-value for the state after that, and so on until all the Q-values for the different states have been calculated. There is also a Gamma parameter that is used to control how much weight future rewards have on the agents actions. Policies are typically calculated by randomly initializing Q-values and letting the model converge toward the optimal Q-values over the course of training.
Deep Q-Networks
One of the fundamental problems involving the use of Q-learning for reinforcement learning is that the amount of memory required to store data rapidly expands as the number of states increases. Deep Q Networks solve this problem by combining neural network models with Q-values, enabling an agent to learn from experience and make reasonable guesses about the best actions to take. With deep Q-learning, the Q-value functions are estimated with neural networks. The neural network takes the state in as the input data, and the network outputs Q-value for all the different possible actions the agent might take.
Deep Q-learning is accomplished by storing all the past experiences in memory, calculating maximum outputs for the Q-network, and then using a loss function to calculate the difference between current values and the theoretical highest possible values.
Deep Reinforcement Learning vs Deep Learning
One important difference between deep reinforcement learning and regular deep learning is that in the case of the former the inputs are constantly changing, which isnt the case in traditional deep learning. How can the learning model account for inputs and outputs that are constantly shifting?
Essentially, to account for the divergence between predicted values and target values, two neural networks can be used instead of one. One network estimates the target values, while the other network is responsible for the predictions. The parameters of the target network are updated as the model learns, after a chosen number of training iterations have passed. The outputs of the respective networks are then joined together to determine the difference.
Policy-based learning approaches operate differently than Q-value based approaches. While Q-value approaches create a value function that predicts rewards for states and actions, policy-based methods determine a policy that will map states to actions. In other words, the policy function that selects for actions is directly optimized without regard to the value function.
Policy Gradients
A policy for deep reinforcement learning falls into one of two categories: stochastic or deterministic. A deterministic policy is one where states are mapped to actions, meaning that when the policy is given information about a state an action is returned. Meanwhile, stochastic policies return a probability distribution for actions instead of a single, discrete action.
Deterministic policies are used when there is no uncertainty about the outcomes of the actions that can be taken. In other words, when the environment itself is deterministic. In contrast, stochastic policy outputs are appropriate for environments where the outcome of actions is uncertain. Typically, reinforcement learning scenarios involve some degree of uncertainty so stochastic policies are used.
Policy gradient approaches have a few advantages over Q-learning approaches, as well as some disadvantages. In terms of advantages, policy-based methods converge on optimal parameters quicker and more reliably. The policy gradient can just be followed until the best parameters are determined, whereas with value-based methods small changes in estimated action values can lead to large changes in actions and their associated parameters.
Policy gradients work better for high dimensional action spaces as well. When there is an extremely high number of possible actions to take, deep Q-learning becomes impractical because it must assign a score to every possible action for all time steps, which may be impossible computationally. However, with policy-based methods, the parameters are adjusted over time and the number of possible best parameters quickly shrinks as the model converges.
Policy gradients are also capable of implementing stochastic policies, unlike value-based policies. Because stochastic policies produce a probability distribution, an exploration/exploitation trade-off does not need to be implemented.
In terms of disadvantages, the main disadvantage of policy gradients is that they can get stuck while searching for optimal parameters, focusing only on a narrow, local set of optimum values instead of the global optimum values.
Policy Score Function
The policies used to optimize a models performance aim to maximize a score function J(). If J() is a measure of how good our policy is for achieving the desired goal, we can find the values of that gives us the best policy. First, we need to calculate an expected policy reward. We estimate the policy reward so we have an objective, something to optimize towards. The Policy Score Function is how we calculate the expected policy reward, and there are different Policy Score Functions that are commonly used, such as: start values for episodic environments, the average value for continuous environments, and the average reward per time step.
Policy Gradient Ascent
Gradient ascent aims to move the parameters until they are at the place where the score is highest. Photo: Public Domain (https://commons.wikimedia.org/wiki/File:Gradient_ascent_(surface).png)
After the desired Policy Score Function is used, and an expected policy reward calculated, we can find a value for the parameter which maximizes the score function. In order to maximize the score function J(), a technique called gradient ascent is used. Gradient ascent is similar in concept to gradient descent in deep learning, but we are optimizing for the steepest increase instead of decrease. This is because our score is not error, like in many deep learning problems. Our score is something we want to maximize. An expression called the Policy Gradient Theorem is used to estimate the gradient with respect to policy .
In summary, deep reinforcement learning combines aspects of reinforcement learning and deep neural networks. Deep reinforcement learning is done with two different techniques: Deep Q-learning and policy gradients.
Deep Q-learning methods aim to predict which rewards will follow certain actions taken in a given state, while policy gradient approaches aim to optimize the action space, predicting the actions themselves. Policy-based approaches to deep reinforcement learning are either deterministic or stochastic in nature. Deterministic policies map states directly to actions while stochastic policies produce probability distributions for actions.
See the article here:
AI Used to Monitor Health of Coral Reefs and Detect Ocean Trash Pollution - Unite.AI
- AI File Extension - Open . AI Files - FileInfo [Last Updated On: June 14th, 2016] [Originally Added On: June 14th, 2016]
- Ai | Define Ai at Dictionary.com [Last Updated On: June 16th, 2016] [Originally Added On: June 16th, 2016]
- ai - Wiktionary [Last Updated On: June 22nd, 2016] [Originally Added On: June 22nd, 2016]
- Adobe Illustrator Artwork - Wikipedia, the free encyclopedia [Last Updated On: June 25th, 2016] [Originally Added On: June 25th, 2016]
- AI File - What is it and how do I open it? [Last Updated On: June 29th, 2016] [Originally Added On: June 29th, 2016]
- Ai - Definition and Meaning, Bible Dictionary [Last Updated On: July 25th, 2016] [Originally Added On: July 25th, 2016]
- ai - Dizionario italiano-inglese WordReference [Last Updated On: July 25th, 2016] [Originally Added On: July 25th, 2016]
- Bible Map: Ai [Last Updated On: August 30th, 2016] [Originally Added On: August 30th, 2016]
- Ai dictionary definition | ai defined - YourDictionary [Last Updated On: August 30th, 2016] [Originally Added On: August 30th, 2016]
- Ai (poet) - Wikipedia, the free encyclopedia [Last Updated On: August 30th, 2016] [Originally Added On: August 30th, 2016]
- AI file extension - Open, view and convert .ai files [Last Updated On: August 30th, 2016] [Originally Added On: August 30th, 2016]
- History of artificial intelligence - Wikipedia, the free ... [Last Updated On: August 30th, 2016] [Originally Added On: August 30th, 2016]
- Artificial intelligence (video games) - Wikipedia, the free ... [Last Updated On: August 30th, 2016] [Originally Added On: August 30th, 2016]
- North Carolina Chapter of the Appraisal Institute [Last Updated On: September 8th, 2016] [Originally Added On: September 8th, 2016]
- Ai Weiwei - Wikipedia, the free encyclopedia [Last Updated On: September 11th, 2016] [Originally Added On: September 11th, 2016]
- Adobe Illustrator Artwork - Wikipedia [Last Updated On: November 17th, 2016] [Originally Added On: November 17th, 2016]
- 5 everyday products and services ripe for AI domination - VentureBeat [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- Realdoll builds artificially intelligent sex robots with programmable personalities - Fox News [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- ZeroStack Launches AI Suite for Self-Driving Clouds - Yahoo Finance [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- AI and the Ghost in the Machine - Hackaday [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- Why Google, Ideo, And IBM Are Betting On AI To Make Us Better Storytellers - Fast Company [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- Roses are red, violets are blue. Thanks to this AI, someone'll fuck you. - The Next Web [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- Wearable AI Detects Tone Of Conversation To Make It Navigable (And Nicer) For All - Forbes [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- Who Leads On AI: The CIO Or The CDO? - Forbes [Last Updated On: February 6th, 2017] [Originally Added On: February 6th, 2017]
- AI For Matching Images With Spoken Word Gets A Boost From MIT - Fast Company [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- Teach undergrads ethics to ensure future AI is safe compsci boffins - The Register [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- AI is here to save your career, not destroy it - VentureBeat [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- A Heroic AI Will Let You Spy on Your Lawmakers' Every Word - WIRED [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- With a $16M Series A, Chorus.ai listens to your sales calls to help your team close deals - TechCrunch [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- Microsoft AI's next leap forward: Helping you play video games - CNET [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- Samsung Galaxy S8's Bixby AI could beat Google Assistant on this front - CNET [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- 3 common jobs AI will augment or displace - VentureBeat [Last Updated On: February 7th, 2017] [Originally Added On: February 7th, 2017]
- Stephen Hawking and Elon Musk endorse new AI code - Irish Times [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- SumUp co-founders are back with bookkeeping AI startup Zeitgold - TechCrunch [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- Five Trends Business-Oriented AI Will Inspire - Forbes [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- AI Systems Are Learning to Communicate With Humans - Futurism [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- Pinterest uses AI and your camera to recommend pins - Engadget [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- Chinese Firms Racing to the Front of the AI Revolution - TOP500 News [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- Real life CSI: Google's new AI system unscrambles pixelated faces - The Guardian [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- AI could transform the way governments deliver public services - The Guardian [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- Amazon Is Humiliating Google & Apple In The AI Wars - Forbes [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- What's Still Missing From The AI Revolution - Co.Design (blog) [Last Updated On: February 9th, 2017] [Originally Added On: February 9th, 2017]
- Legaltech 2017: Announcements, AI, And The Future Of Law - Above the Law [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- Can AI make Facebook more inclusive? - Christian Science Monitor [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- How a poker-playing AI could help prevent your next bout of the flu - ExtremeTech [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- Dynatrace Drives Digital Innovation With AI Virtual Assistant - Forbes [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- AI and the end of truth - VentureBeat [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- Taser bought two computer vision AI companies - Engadget [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- Google's DeepMind pits AI against AI to see if they fight or cooperate - The Verge [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- The Coming AI Wars - Huffington Post [Last Updated On: February 10th, 2017] [Originally Added On: February 10th, 2017]
- Is President Trump a model for AI? - CIO [Last Updated On: February 11th, 2017] [Originally Added On: February 11th, 2017]
- Who will have the AI edge? - Bulletin of the Atomic Scientists [Last Updated On: February 11th, 2017] [Originally Added On: February 11th, 2017]
- How an AI took down four world-class poker pros - Engadget [Last Updated On: February 11th, 2017] [Originally Added On: February 11th, 2017]
- We Need a Plan for When AI Becomes Smarter Than Us - Futurism [Last Updated On: February 11th, 2017] [Originally Added On: February 11th, 2017]
- See how old Amazon's AI thinks you are - The Verge [Last Updated On: February 11th, 2017] [Originally Added On: February 11th, 2017]
- Ford to invest $1 billion in autonomous vehicle tech firm Argo AI - Reuters [Last Updated On: February 11th, 2017] [Originally Added On: February 11th, 2017]
- Zero One: Are You Ready for AI? - MSPmentor [Last Updated On: February 11th, 2017] [Originally Added On: February 11th, 2017]
- Ford bets $1B on Argo AI: Why Silicon Valley and Detroit are teaming up - Christian Science Monitor [Last Updated On: February 12th, 2017] [Originally Added On: February 12th, 2017]
- Google Test Of AI's Killer Instinct Shows We Should Be Very Careful - Gizmodo [Last Updated On: February 12th, 2017] [Originally Added On: February 12th, 2017]
- Google's New AI Has Learned to Become "Highly Aggressive" in Stressful Situations - ScienceAlert [Last Updated On: February 13th, 2017] [Originally Added On: February 13th, 2017]
- An artificially intelligent pathologist bags India's biggest funding in healthcare AI - Tech in Asia [Last Updated On: February 13th, 2017] [Originally Added On: February 13th, 2017]
- Ford pledges $1bn for AI start-up - BBC News [Last Updated On: February 13th, 2017] [Originally Added On: February 13th, 2017]
- Dyson opens new Singapore tech center with focus on R&D in AI and software - TechCrunch [Last Updated On: February 13th, 2017] [Originally Added On: February 13th, 2017]
- How to Keep Your AI From Turning Into a Racist Monster - WIRED [Last Updated On: February 13th, 2017] [Originally Added On: February 13th, 2017]
- How Chinese Internet Giant Baidu Uses AI And Machine Learning - Forbes [Last Updated On: February 13th, 2017] [Originally Added On: February 13th, 2017]
- Humans engage AI in translation competition - The Stack [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- Watch Drive.ai's self-driving car handle California city streets on a ... - TechCrunch [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- Cryptographers Dismiss AI, Quantum Computing Threats - Threatpost [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- Is AI making credit scores better, or more confusing? - American Banker [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- AI and Robotics Trends: Experts Predict - Datamation [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- IoT And AI: Improving Customer Satisfaction - Forbes [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- AI's Factions Get Feisty. But Really, They're All on the Same Team - WIRED [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- Elon Musk: Humans must become cyborgs to avoid AI domination - The Independent [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- Facebook Push Into Video Allows Time To Catch Up On AI Applications - Investor's Business Daily [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- Defining AI, Machine Learning, and Deep Learning - insideHPC [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- AI Predicts Autism From Infant Brain Scans - IEEE Spectrum [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- The Rise of AI Makes Emotional Intelligence More Important - Harvard Business Review [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- Google's AI Learns Betrayal and "Aggressive" Actions Pay Off - Big Think [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- AI faces hype, skepticism at RSA cybersecurity show - PCWorld [Last Updated On: February 15th, 2017] [Originally Added On: February 15th, 2017]
- New AI Can Write and Rewrite Its Own Code to Increase Its Intelligence - Futurism [Last Updated On: February 17th, 2017] [Originally Added On: February 17th, 2017]