When I was in graduate school in the 1990s, one of my favorite classes was neural networks. Back then, we didn’t have access to TensorFlow, PyTorch, or Keras; we programmed neurons, neural networks, and learning algorithms by hand with the formulas from textbooks. We didn’t have access to cloud computing, and we coded sequential experiments that often ran overnight. There weren’t platforms like Alteryx, Dataiku, SageMaker, or SAS to enable a machine learning proof of concept or manage the end-to-end MLOps lifecycle.
I was most interested in reinforcement learning algorithms, and I recall writing hundreds of reward functions to stabilize an inverted pendulum. I never got it working and was never sure whether I had coded the algorithms incorrectly, chosen suboptimal reward functions, or selected poor learning parameters. But today, I can find examples of reinforcement learning applied to the inverted pendulum problem and even the schematics to build one.
Reinforcement learning explained
Reinforcement learning is a class of algorithms that learn by trial and error. A subject operates in an environment, where it has a current state and a set of actions it can perform. In this case, the subject is an inverted pendulum placed on a cart that can move left or right in a straight line. The position and velocity of the pendulum and of the cart holding it represent the state. The only actions available are moving the cart left or right to balance the pendulum.
Instead of programming the cart’s action with a bunch of rules, the cart is given a reward function to score the outcomes based on its actions. As the cart moves, the reward function computes a score, and higher scores are given when the pendulum is upright. A reinforcement learning algorithm uses the reward function to tune a neural network based on the function’s scores.
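To make the idea concrete, here is a minimal sketch of what a reward function for the cart and pendulum might look like. Everything in it is an assumption for illustration: the CartPoleState fields, the 12-degree and 2.4-meter failure limits, and the 80/20 weighting between staying upright and staying centered are my own choices, not values from any particular textbook or framework.

```python
import math
from dataclasses import dataclass


@dataclass
class CartPoleState:
    """Hypothetical state: position and velocity of the cart and the pendulum."""
    cart_position: float   # meters from the middle of the track
    cart_velocity: float   # meters per second
    pole_angle: float      # radians from vertical (0.0 is upright)
    pole_velocity: float   # radians per second


def reward(state: CartPoleState,
           max_angle: float = math.radians(12),
           max_position: float = 2.4) -> float:
    """Score a state: 1.0 when upright and centered, 0.0 once the trial has failed."""
    if abs(state.pole_angle) > max_angle or abs(state.cart_position) > max_position:
        return 0.0
    upright = 1.0 - abs(state.pole_angle) / max_angle
    centered = 1.0 - abs(state.cart_position) / max_position
    # Assumed weighting: balance matters more than staying in the middle.
    return 0.8 * upright + 0.2 * centered


upright_state = CartPoleState(0.0, 0.0, 0.0, 0.0)
tilted_state = CartPoleState(0.0, 0.0, math.radians(6), 0.0)
fallen_state = CartPoleState(0.0, 0.0, math.radians(20), 0.0)

if __name__ == "__main__":
    for name, s in [("upright", upright_state), ("tilted", tilted_state),
                    ("fallen", fallen_state)]:
        print(name, round(reward(s), 3))
```

Small changes to the weights or limits in a function like this can change what the learner ends up doing, which is one reason I wrote so many variations back in school.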
The initial trials will fail, as the pendulum keeps falling. However, with enough attempts, a well-chosen reward function, and optimally selected tuning parameters, the algorithm learns the correct actions to control the cart and balance the pendulum.
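The trial-and-error loop can be sketched in a few dozen lines. Note the substitutions: instead of a neural network tuned by a reinforcement learning algorithm, this sketch uses a four-weight linear policy improved by simple hill climbing (random search), and the score is just the number of time steps the pendulum stays up. The physical constants are assumptions borrowed from the classic cart-pole benchmark, and the function names are hypothetical.

```python
import math
import random

# Assumed constants, following the classic cart-pole benchmark setup.
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
POLE_HALF_LENGTH, FORCE, TIME_STEP = 0.5, 10.0, 0.02
MAX_ANGLE, MAX_POSITION, MAX_STEPS = math.radians(12), 2.4, 200


def step(state, push_right):
    """Advance the simulation one time step with a push to the left or right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if push_right else -FORCE
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass
    return (x + TIME_STEP * x_dot, x_dot + TIME_STEP * x_acc,
            theta + TIME_STEP * theta_dot, theta_dot + TIME_STEP * theta_acc)


def run_trial(weights, rng):
    """Run one trial; the score is the number of steps the pendulum stays up."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for steps in range(MAX_STEPS):
        if abs(state[2]) > MAX_ANGLE or abs(state[0]) > MAX_POSITION:
            return steps
        # Linear policy: push right when the weighted sum of the state is positive.
        push_right = sum(w * s for w, s in zip(weights, state)) > 0
        state = step(state, push_right)
    return MAX_STEPS


def train(trials=200, noise=0.5, seed=0):
    """Hill climbing: keep a random change to the weights if it scores at least as well."""
    rng = random.Random(seed)
    best_weights = [0.0, 0.0, 0.0, 0.0]
    best_score = run_trial(best_weights, rng)
    history = [best_score]
    for _ in range(trials):
        candidate = [w + rng.gauss(0.0, noise) for w in best_weights]
        score = run_trial(candidate, rng)
        if score >= best_score:
            best_weights, best_score = candidate, score
        history.append(best_score)
    return best_weights, history


if __name__ == "__main__":
    weights, history = train()
    print("first trial:", history[0], "steps; best trial:", history[-1], "steps")
```

Because each trial starts from a slightly different random state, a single score is a noisy measure of a policy; averaging several trials per candidate would be a natural refinement.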
Many articles are available to guide you further on the basics of reinforcement learning. You can read overviews of reinforcement learning, learn the basics, jump into its math and algorithms, review research papers, or discover real-world applications.
Getting into more details or experiments will require selecting a programming language, choosing a framework, picking tools, and configuring a cloud environment. I confess that this is an undertaking, so I went looking for opportunities to learn without getting my hands too dirty.
Here’s what I found:
1. Combine work and play with AWS DeepRacer
AWS introduced DeepRacer in November 2018 as the “fastest way to get rolling with machine learning.” In December 2020, they had more than 10,000 competitors and a grand prize that included US$10,000 of AWS promotional credits.
Don’t let the competition scare you away, because DeepRacer is a superb learning tool. Your objective is to train the racer to navigate autonomously around a selected racetrack.
When you sign up for DeepRacer, you get access to a simulator where you can select a track, code a reward function, and adjust tuning parameters. There is a default reward function with tuning parameters to start training your racer and evaluating its performance. From there, you’re off to the races to improve your models and tune the algorithms.
You have more than 20 tracks to choose from and can select from simple time trials to head-to-head racing. You can also purchase a physical DeepRacer, load it with your algorithms, and design tracks to run competitive races.
It didn’t take me long to figure out ways to improve the provided reward function. The basic function scores how far the DeepRacer is from the center of the track, with the highest scores when the racer is on the centerline. I improved the function by factoring in the racer’s steering angle, giving it a higher reward when it was steering toward the centerline.
I felt pretty good that with only my second model and 10 minutes of training, my DeepRacer made it around 26 percent of the track. Of course, my simple model doesn’t work when you factor in obstacles and other racers. You can go it alone to improve your DeepRacer’s performance, or you can learn from others’ code libraries and racing experiences.
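A reward function along these lines might look like the sketch below. It is a reconstruction for illustration, not the code I actually submitted: it assumes the DeepRacer convention of a reward_function(params) entry point and the track_width, distance_from_center, steering_angle, and is_left_of_center input parameters, and the 20 percent steering bonus and 10 percent dead zone are arbitrary values chosen for the example.

```python
def reward_function(params):
    """Reward staying near the centerline, with a bonus for steering back toward it."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    steering_angle = params["steering_angle"]        # degrees; positive steers left
    is_left_of_center = params["is_left_of_center"]

    # Base score: 1.0 on the centerline, falling to a small floor at the track edge.
    reward = max(1e-3, 1.0 - distance_from_center / (0.5 * track_width))

    # Assumed bonus: 20 percent more when the wheels point back toward the centerline.
    steering_toward_center = (
        (is_left_of_center and steering_angle < 0)
        or (not is_left_of_center and steering_angle > 0)
    )
    if distance_from_center > 0.1 * track_width and steering_toward_center:
        reward *= 1.2

    return float(reward)
```

For example, on a track 1.0 meter wide, a racer 0.25 meters left of center scores 0.5 while steering left and 0.6 while steering right, back toward the centerline.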
2. Be inspired by recent accomplishments
It isn’t difficult to find real-world examples of business, academic, and government organizations experimenting and succeeding with reinforcement learning. Consider these recent headlines:
- A robot that plays curling, a sport where opponents alternate sliding stones across the ice onto a target to score points. You can watch this robot use a mix of strategy, computer vision, and motor skills to compete against the South Korean “Garlic Girls” curling team.
- Researchers at Binghamton University are applying reinforcement learning to advanced grid-forming photovoltaic inverter control technologies. They hope to develop ways to support higher amounts of solar power on the electric grid reliably.
- Learn from AI academic experts about how “reinforcement learning is the first computational theory of intelligence.” You can also review these top machine learning white papers released in 2020.
- The U.S. Army uses reinforcement learning to get vehicles in different parts of a battle area to work together.
- A grocery store recommendation engine is one use case of Microsoft Personalizer, a personalization and recommendation engine that uses reinforcement learning.
3. Experiment with code examples
Before embarking on your reinforcement learning journey, you might want to check out coding examples or books, especially when applied to familiar problems. The following options are worth reviewing:
- Examples that apply reinforcement learning to solving the Rubik’s Cube and deep Q-learning to playing Atari Breakout.
- This reinforcement learning code repository has examples from the book Reinforcement Learning: An Introduction. Other repos worth reviewing are Hands-on Reinforcement Learning with Python and Deep Reinforcement Learning Course with TensorFlow and PyTorch.
- These repos are curated: Awesome-rl, RLCode, Keras-rl, Udacity, and RL with TensorFlow.
Given how hard it is to teach and learn by example, reinforcement learning and unsupervised learning techniques are areas of growth and opportunity. Even if you are a couple of steps behind in grasping machine learning techniques, understanding reinforcement learning is a chance to develop expertise while academia, industry, and government evolve the science and algorithms.