Playing A.I. Takes on the World
OpenAI’s game-playing program, called OpenAI Five, develops its artificial intuition through a form of A.I. called reinforcement learning. It’s a bit like training a dog. The system has five identical algorithms, one for each player, and at first they behave almost randomly, running around and dying. But the system plays many games against a copy of itself, and when the players do something good, such as killing an enemy or earning gold, they’re rewarded. (Each of the five bots plays 180 years of Dota 2 every day.)
OpenAI programmers give the bots some guidance. They decide which achievements should be rewarded, as a nudge toward winning games. They also script certain tactics, like the order in which each hero purchases magical items. But they are not Dota 2 experts. Brockman shied away from even calling his team amateur coaches. More like proud parents cheering on their progeny? “Right, that’s what we are,” he says. “I like that.”
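To make the idea of rewarding achievements concrete, here is a minimal Python sketch of a shaped reward. The event names, the weights, and the `shaped_return` helper are all hypothetical, chosen only for illustration; OpenAI’s actual reward table is not given here. Small rewards for intermediate achievements add up alongside a much larger reward for winning, and anything the designers leave out of the table earns nothing.

```python
from collections import Counter

# Hypothetical weights (illustration only): small "nudge" rewards for
# intermediate achievements, plus a much larger reward for winning.
REWARD_WEIGHTS = {
    "enemy_kill": 1.0,
    "gold_earned_per_100": 0.2,
    "tower_destroyed": 2.0,
    "death": -1.0,
    "win": 10.0,
}


def shaped_return(events):
    """Sum the rewards for a list of event names from one game.

    Events missing from REWARD_WEIGHTS earn nothing, so the designers'
    choice of which achievements to list decides what gets encouraged.
    """
    counts = Counter(events)
    return sum(REWARD_WEIGHTS.get(name, 0.0) * n for name, n in counts.items())


game = ["enemy_kill", "enemy_kill", "death", "tower_destroyed", "win"]
print(shaped_return(game))  # prints 13.0
```

Because the win is weighted far above any single achievement, the small rewards act as hints along the way rather than as the goal itself.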
The program sees the game not as a screen, but as a list of numbers describing the state of play — hero positions, health, number of towers remaining, and so on. Imagine the A.I.’s task: “You have a giant list of numbers come in — 20,000 numbers — and now figure out what to do. And no one told you that there are elves or buildings or anything like that,” Brockman says.
And yet it has steadily improved. Last year at the International, a simpler version of the program defeated top human pros with one hero on each team. Then it began playing five against five, and over the following months, it beat better and better teams. At this year’s tournament, OpenAI was not eligible for prize money and would not be in the brackets, but it would play a best-of-three exhibition series, with games on Wednesday, Thursday, and, if need be, Friday.
“Dota has an audience of screaming, belligerent fans watching five dudes trying to take down a computer program,” says Trevor Plint, a fan from Toronto, Ontario. He compared “the spectacle of that, the Roman gladiatorial coliseum moment, versus the AlphaGo moment” (when DeepMind’s program beat a human champion at Go, in 2016) “which was this little South Korean dude sitting at a table in a little room with another guy moving stones around. It’s such a contrast.”
The first game, on Wednesday, was against paiN Gaming, a pro team from Brazil that had been eliminated from the tournament during the group stage. “We were very scared, to be honest,” Heitor “Duster” Pereira, a quiet 18-year-old with long black and pink hair, told me the next day in the team’s catered suite overlooking the arena. “I didn’t have high hopes.” They entered a soundproof glass booth on the center stage that held five computers. The booth facing them sat empty.
“We had the strategy of splitting up the map and farming” (collecting resources) “but for the first 20 minutes, we kept fighting with them and everything went wrong,” Pereira says, “because it’s very hard to win a fight against them.” Observers have universally praised OpenAI Five for its team fights. The bots’ reaction time, about a fifth of a second, is not much faster than that of humans, but in that time each bot processes all of those 20,000 numbers. It knows the exact health, strength, and distance of everyone around it. What’s more, the bots receive information on what’s around their teammates, and each bot broadcasts part of what it’s thinking to the other bots (imagine viewing both your monitor and your teammates’, plus having a telepathic link). And the bots are identical. So even without explicit play calling, they know what the others are up to and are familiar with their moves. “I think they play as a unit, and that’s very beautiful to see in Dota, because they are always helping each other,” Pereira says. “It’s pretty exciting to watch.” Half an hour in, the bots had 30 kills versus the humans’ 21.
PaiN had to adjust in other ways. For example, the bots don’t bluff. “When they jump on you, they already know they’re going to kill you,” Pereira says. “So it makes things a little bit weird. You have to run. If you attack back, you’re not winning the fight.” The bots also don’t have egos: they’ve been trained to value their teammates’ performance as much as their own. “If someone needs to die so the others can be alive, they do it,” Pereira says. “They don’t have feelings; they don’t have emotions. So that can be very helpful in crucial games. For sure that helped them a lot against us. It caught us off-guard.”
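Valuing teammates’ performance as much as one’s own can be pictured as blending rewards. This is a minimal sketch under stated assumptions: the `blend_rewards` helper, its blending parameter `tau`, and the numbers are illustrative (OpenAI has publicly described a similar setting it calls “team spirit,” but the code below is not its implementation). At `tau = 0` each bot sees only its own reward; at `tau = 1` every bot receives the team average, so a teammate’s gain counts exactly as much as its own.

```python
def blend_rewards(rewards, tau):
    """Mix each player's own reward with the team's mean reward.

    Hypothetical helper: tau = 0.0 is purely selfish,
    tau = 1.0 gives every player the team mean.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be between 0 and 1")
    team_mean = sum(rewards) / len(rewards)
    return [(1.0 - tau) * r + tau * team_mean for r in rewards]


# One bot sacrifices itself (-1.0) so that a teammate scores a kill (+3.0).
rewards = [-1.0, 3.0, 0.0, 0.0, 0.0]
print(blend_rewards(rewards, 0.0))  # the sacrificing bot still sees -1.0
print(blend_rewards(rewards, 1.0))  # every bot sees the team mean, 0.4
```

With full blending, the bot that dies shares in the team’s net gain, which is one way a trained bot could come to accept the kind of sacrifice Pereira describes.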
And yet the bots made obvious errors. Near the center of the map is a monster called Roshan, who drops a valuable item when killed and respawns a few minutes later. The bots frequently returned to his pit and loitered well before he could possibly return, providing several moments of comic relief. When Roshan did drop his item, OpenAI Five would let a support hero pick it up, an unusual move. Another oddity: Each character has an “ult,” or ultimate ability, that’s best used for fatal blows. Instead, OpenAI Five used them willy-nilly. “This is what you’re losing to right now,” said William “Blitz” Lee, one of the gamecasters in the arena, in a playful jab at paiN.
Eventually paiN found their groove. “We started split-pushing” — dividing and conquering towers — “and just farming, and the bots got confused and started using their ultimates to farm as well. They didn’t know what to do, and that’s pretty much it.” Players could type messages, which were displayed on the giant game screen overhead. In a note, Pereira needled the gamecasters — Lee and Austin “Capitalist” Walsh — both of whom had been on team human against OpenAI Five in the warmup match two weeks prior: “Team human lost to this.” Walsh replied, “Hey Duster, chill, chill. I thought we were cool, homie.” Pereira couldn’t hear the casters in his soundproof booth, but he felt vibrations from the crowd’s laughter.
This time team human won, after 52 minutes.
OpenAI hadn’t fared too poorly, but they still had work to do, and some of them pulled a late night preparing for the next day’s game.
Pondé says they were careful not to meddle too much. Given OpenAI Five’s extensive training, they didn’t want to overcorrect in response to one game. For that night’s training, however, they reduced the rewards the bots received for minor goals relative to the reward for winning, hoping to avoid more lingering near the Roshan pit.
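The overnight change amounts to shifting the ratio between small rewards and the reward for winning. The sketch below is a hypothetical illustration: the weights, the `episode_return` helper, and its `minor_scale` factor are invented for this example and are not OpenAI’s code. It compares a lost game that piles up many small bonuses with a won game that collects only a few.

```python
WIN_REWARD = 10.0  # hypothetical reward for winning, left unchanged

# Hypothetical per-event rewards for minor goals.
MINOR_WEIGHTS = {"last_hit": 0.5, "gold_earned_per_100": 0.2}


def episode_return(minor_events, won, minor_scale=1.0):
    """Total reward for one game, with every minor reward scaled by minor_scale."""
    minor = sum(MINOR_WEIGHTS[e] for e in minor_events) * minor_scale
    return minor + (WIN_REWARD if won else 0.0)


grindy_loss = ["last_hit"] * 30  # many small bonuses, but the game is lost
close_win = ["last_hit"] * 4     # few small bonuses, but the game is won

for scale in (1.0, 0.25):
    print(scale, episode_return(grindy_loss, False, scale),
          episode_return(close_win, True, scale))
# prints:
# 1.0 15.0 12.0
# 0.25 3.75 10.5
```

With the hypothetical weights at full strength, the losing game that racks up small bonuses scores higher than the win; after the minor rewards are scaled down, the win dominates. That is the direction of the adjustment described above, though the real magnitudes are not given.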
On Thursday, they faced a team of semiretired Chinese superstar players, some of whom had won the tournament in previous years. The bots continued to show their strength at fighting and demonstrated subtle tactical decisions. But they also made the same unforced errors. At one point, they placed five wards (the game’s equivalent of security cameras) next to each other. “Look at all these wards around the base,” one announcer said to the other. “That is a broken team, Bruno.” (Two days later, watching the human grand final match from the press box, Brockman gestured at a lone ward on the giant screen. “I don’t understand why there are so few wards there,” he said dryly. “Shouldn’t there be a big stack of them?”) Again, the humans won, after 46 minutes. There would be no third game.