Tencent TStarBots Defeat StarCraft II’s Powerful Built-in AI in the Full Game

Deep reinforcement learning has shown impressive performance in a wide range of applications, including video games. StarCraft II, one of the most challenging real-time strategy (RTS) games, has however remained out of reach for AI agents playing the full game. Until now.

In a new paper, Tencent AI Lab introduces two AI agents, TStarBot1 and TStarBot2, that can defeat StarCraft II’s built-in AI in the full game.

The StarCraft series is regarded as a challenging platform for AI research because of its huge observation and action spaces and its complex game mechanics. However, most existing AI approaches for RTS games rely on search algorithms or multi-agent algorithms that tackle parts of the game and cannot be applied to the full game.

The StarCraft II Learning Environment (SC2LE), developed jointly by DeepMind and Blizzard, is a testbed for complex decision-making systems. In some of its mini-games, AI agents have reached the level of professional human players, but in the full game they still perform far worse.

Tencent’s TStarBot1 and TStarBot2 are both able to defeat StarCraft II’s built-in AI at every difficulty level from 1 to 10 in the full game (1v1 Zerg-vs-Zerg on the AbyssalReef map). The paper notes that at levels 8 to 10 the built-in AI agents cheat, with full vision of the entire map and boosted resource harvesting, which makes them very powerful opponents.

The two bots take different approaches. TStarBot1 is a reinforcement learning agent built on a set of flat macro actions. This architecture relieves the learning algorithm of the burden of dealing directly with a massive number of atomic operations, while preserving most of the critical decision flexibility of full-game macro strategies.

TStarBot1: Overview of macro actions and reinforcement learning
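To make the flat macro-action idea concrete, here is a minimal Python sketch. The macro names in `MACRO_ACTIONS`, the `expand` helper, and the `MacroQAgent` class are all hypothetical, and the learner is a simple tabular Q-learning stand-in rather than the deep reinforcement learning used for TStarBot1. What it illustrates is the design choice described above: the policy only chooses among a handful of macro actions, and each choice is expanded into a fixed script of atomic operations.

```python
import random
from collections import defaultdict

# Hypothetical macro actions: each name maps to a fixed script of atomic
# operations. The names are illustrative, not TStarBot1's real action set.
MACRO_ACTIONS = {
    "build_drone": ["select_larva", "morph_drone"],
    "build_extractor": ["select_drone", "move_to_geyser", "morph_extractor"],
    "train_zergling": ["select_larva", "morph_zergling"],
    "attack_enemy_base": ["select_army", "attack_move_enemy_base"],
}
ACTION_NAMES = list(MACRO_ACTIONS)  # dicts keep insertion order (Python 3.7+)


def expand(macro_index):
    """Translate the macro action chosen by the policy into atomic operations."""
    return MACRO_ACTIONS[ACTION_NAMES[macro_index]]


class MacroQAgent:
    """Tabular Q-learning whose action space is the macro set, not atomic ops.

    Illustrative stand-in only; the paper's agent is trained with deep RL.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        # One row of action values per (hashable) state, created on first use.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy choice of a macro action index."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done):
        """One-step Q-learning update toward the bootstrapped target."""
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


if __name__ == "__main__":
    agent = MacroQAgent(len(ACTION_NAMES), epsilon=0.0)
    agent.update("early_game", 0, 1.0, "mid_game", done=True)
    choice = agent.act("early_game")
    print(ACTION_NAMES[choice], expand(choice))
```

Because the agent's action space has `len(ACTION_NAMES)` entries rather than one entry per atomic game command, each state needs only a handful of action values, which is the burden-reducing effect the paragraph above describes.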

TStarBot2, meanwhile, was built with hard-coded expert rules over a hierarchical macro-micro action structure. This hierarchy filters out irrelevant information, keeps each controller within its own observation and action space, and better captures the structure of the action space, in particular the expressive power that grows multiplicatively as controllers are combined.

TStarBot2: Overview of macro-micro hierarchical actions
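The hierarchical idea can likewise be sketched in Python under loose assumptions. The controllers, thresholds, and action names below are hypothetical hand-written rules, not TStarBot2's actual modules. The sketch shows two properties mentioned above: each sub-controller receives only its own slice of the observation and has its own small action set, and the number of joint behaviors is the product of the sub-controllers' action counts.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical, heavily simplified game observation."""
    minerals: int
    supply_used: int
    supply_cap: int
    army_size: int
    enemy_army_seen: int


class ProductionController:
    """Rule-based sub-controller that only sees economy-related fields."""
    ACTIONS = ("wait", "build_overlord", "build_drone", "build_zergling")

    def step(self, minerals, supply_used, supply_cap):
        if minerals < 50:
            return "wait"
        if supply_cap - supply_used <= 2:
            # Nearly supply-blocked: save up for an overlord (100 minerals).
            return "build_overlord" if minerals >= 100 else "wait"
        if supply_used < 30:
            return "build_drone"
        return "build_zergling"


class CombatController:
    """Rule-based sub-controller that only sees army-related fields."""
    ACTIONS = ("defend", "attack", "retreat")

    def step(self, army_size, enemy_army_seen):
        if army_size >= 10 and army_size >= 2 * enemy_army_seen:
            return "attack"
        if army_size < enemy_army_seen // 2:
            return "retreat"
        return "defend"


class MacroController:
    """Top level: hands each sub-controller only its own slice of the observation."""

    def __init__(self):
        self.production = ProductionController()
        self.combat = CombatController()

    def step(self, obs):
        return {
            "production": self.production.step(obs.minerals, obs.supply_used, obs.supply_cap),
            "combat": self.combat.step(obs.army_size, obs.enemy_army_seen),
        }


def joint_action_count(*controllers):
    """Number of distinct joint behaviors: product of per-controller action counts."""
    count = 1
    for controller in controllers:
        count *= len(controller.ACTIONS)
    return count


if __name__ == "__main__":
    bot = MacroController()
    print(bot.step(Observation(minerals=150, supply_used=13, supply_cap=14,
                               army_size=0, enemy_army_seen=0)))
    print(joint_action_count(ProductionController, CombatController))
```

With four production actions and three combat actions, the two controllers together can express 12 joint behaviors, while each one only has to reason about its own few options and inputs.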

The researchers found that although TStarBot1 can always defeat TStarBot2 in head-to-head matches, it lacks the strategic diversity needed to consistently beat professional human players. In future work, the team plans to build a more carefully hand-tuned action hierarchy so that reinforcement learning algorithms can develop better strategies for the full StarCraft II game.