Opponent Strategy Identification Using StarCraft

This is the second in a multi-part QOMPLX Intelligence series that examines how QOMPLX uses StarCraft as a training ground for research to support advanced reinforcement learning and effective decision making using machine learning and artificial intelligence. You can download PDF copies of Part 2: Opponent Strategy Identification Using StarCraft or our first installment: Build Order Selection in StarCraft.

As we've noted in a prior blog post, QOMPLX's business value hinges on our ability to continually develop and improve our analytic and computational tools. Ongoing research and development is an important part of the work we do.

To that end, our engineers have found Blizzard’s StarCraft, a real-time strategy (RTS) game, to be a rich canvas for experimentation and concept validation. Like other cutting-edge companies and research organizations working on machine learning and artificial intelligence, QOMPLX engineers are using StarCraft to study reinforcement learning and multi-agent reinforcement learning techniques, and to run selected applied-research experiments that support these decision-making goals.

Opponent Strategy Identification: A Common Objective

In this installment of the series, we look at how we are using StarCraft to hone our ability to identify and learn an opponent's strategy.

Identifying and learning an opponent's strategies (and the way those strategies are chosen) is an important problem in reinforcement learning (RL). While identifying strategies in games like chess or Go is relatively straightforward, it is far more difficult in complex RTS games like StarCraft, where the play space is much larger and players must contend with so-called "partial observability": large parts of the game board are hidden from them, at least initially.

Peering into the Fog

In our latest research, we categorized opponents into types according to how they choose their strategy (their "build order," or BO, in StarCraft terminology), and we use a scoring mechanism to determine which type of opponent we are facing. We use this to inform our own decision making and to choose a counter-strategy that anticipates the opponent's actions.

The four types of opponents are:

1BO - those who use a single BO all the time

RP - those who randomly pick a strategy from a pool of build orders in each game

UCB1 - those who select build orders with UCB1 (Upper Confidence Bound 1), a popular multi-armed bandit algorithm for choosing among strategies

CP - those who reuse the build order that won them the last game or, failing that, pick the build order that maximizes the chance of success against the build order their opponent used last
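To make the four opponent types concrete, here is a minimal Python sketch of them as simulated opponents. It is not QOMPLX's simulator. It rests on several assumptions that are not in the original: there are three hypothetical build orders indexed 0 to 2, a made-up rock-paper-scissors style table `BEATS[a][b]` gives the probability that build order a beats build order b, UCB1 earns a reward of 1 per win, and CP counters the opponent's last build order by picking whichever row of `BEATS` scores highest against it. Class and function names are illustrative only.

```python
import math
import random

# Hypothetical toy model (assumption): K build orders and a rock-paper-scissors
# style win-probability table, BEATS[a][b] = P(build order a beats build order b).
K = 3
BEATS = [
    [0.5, 0.7, 0.3],
    [0.3, 0.5, 0.7],
    [0.7, 0.3, 0.5],
]


class OneBO:
    """1BO: always plays the same build order."""
    def __init__(self, bo=0):
        self.bo = bo
    def choose(self):
        return self.bo
    def update(self, my_bo, their_bo, won):
        pass  # never adapts


class RandomPick:
    """RP: picks a build order uniformly at random each game."""
    def __init__(self, rng):
        self.rng = rng
    def choose(self):
        return self.rng.randrange(K)
    def update(self, my_bo, their_bo, won):
        pass  # never adapts


class UCB1:
    """UCB1: each build order is a bandit arm; reward is 1 for a win (assumption)."""
    def __init__(self):
        self.plays = [0] * K
        self.wins = [0] * K
    def choose(self):
        for bo in range(K):  # try every arm once before using the UCB1 index
            if self.plays[bo] == 0:
                return bo
        total = sum(self.plays)
        # Mean win rate plus the UCB1 exploration bonus sqrt(2 ln n / n_i).
        return max(range(K), key=lambda bo: self.wins[bo] / self.plays[bo]
                   + math.sqrt(2 * math.log(total) / self.plays[bo]))
    def update(self, my_bo, their_bo, won):
        self.plays[my_bo] += 1
        self.wins[my_bo] += int(won)


class CounterPick:
    """CP: keep a winning build order; otherwise counter the opponent's last one."""
    def __init__(self, first_bo=0):
        self.next_bo = first_bo
    def choose(self):
        return self.next_bo
    def update(self, my_bo, their_bo, won):
        if won:
            self.next_bo = my_bo
        else:
            self.next_bo = max(range(K), key=lambda bo: BEATS[bo][their_bo])


def play(opponent, rng, games, our_bo=0):
    """Simulate games against an opponent while we always play `our_bo`.

    Returns a list of (opponent_bo, our_bo, opponent_won) tuples.
    """
    history = []
    for _ in range(games):
        their = opponent.choose()
        won = rng.random() < BEATS[their][our_bo]
        opponent.update(their, our_bo, won)
        history.append((their, our_bo, won))
    return history
```

For example, `play(CounterPick(), random.Random(0), 1000)` settles on build order 2, the counter to our fixed build order 0, once the CP opponent loses a game. The shared `choose`/`update` interface lets any of the four types be dropped into the same experiment loop.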

To examine the effectiveness of the scoring mechanism for each type of opponent, we simulated opponents whose BO choices followed the rule of a given opponent type. In total, we ran four experiments of 1,000 StarCraft games each and then applied the scoring mechanisms for all four opponent types. Our downloadable report discusses each experiment and its findings.
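The post does not describe how its scoring mechanism works, so the following is only a hypothetical illustration of the general idea. Each opponent type is scored by how well it predicts the opponent's observed build orders, using the average log-likelihood of those choices. Everything below is an assumption for illustration: the three build orders, the `BEATS` table, the noise level `EPS`, and the function names. The rules mirror the four types listed above, with UCB1 rewarding wins and CP countering our last build order after a loss. The highest-scoring type is the best guess for what kind of opponent produced the history.

```python
import math

# Hypothetical toy model (assumption): K build orders and a rock-paper-scissors
# style table, BEATS[a][b] = P(build order a beats build order b).
K = 3
BEATS = [[0.5, 0.7, 0.3], [0.3, 0.5, 0.7], [0.7, 0.3, 0.5]]
EPS = 0.05  # assumed chance that an opponent deviates from its type's rule


def peaked(bo):
    """Distribution with 1 - EPS on `bo` and EPS spread evenly over all BOs."""
    return [1 - EPS + EPS / K if b == bo else EPS / K for b in range(K)]


def predict_1bo(past):
    # 1BO: expect the build order seen most often so far.
    if not past:
        return [1 / K] * K
    counts = [0] * K
    for opp_bo, _, _ in past:
        counts[opp_bo] += 1
    return peaked(max(range(K), key=counts.__getitem__))


def predict_rp(past):
    # RP: every build order is equally likely.
    return [1 / K] * K


def predict_ucb1(past):
    # UCB1: replay the opponent's bandit, treating its wins as rewards.
    plays, wins = [0] * K, [0] * K
    for opp_bo, _, opp_won in past:
        plays[opp_bo] += 1
        wins[opp_bo] += int(opp_won)
    for bo in range(K):
        if plays[bo] == 0:
            return peaked(bo)
    total = sum(plays)
    return peaked(max(range(K), key=lambda bo: wins[bo] / plays[bo]
                      + math.sqrt(2 * math.log(total) / plays[bo])))


def predict_cp(past):
    # CP: repeat last BO after a win, otherwise counter our last BO.
    if not past:
        return [1 / K] * K
    opp_bo, our_bo, opp_won = past[-1]
    if opp_won:
        return peaked(opp_bo)
    return peaked(max(range(K), key=lambda bo: BEATS[bo][our_bo]))


PREDICTORS = {"1BO": predict_1bo, "RP": predict_rp,
              "UCB1": predict_ucb1, "CP": predict_cp}


def score(history):
    """Average log-likelihood of the opponent's choices under each type.

    `history` is a non-empty list of (opponent_bo, our_bo, opponent_won).
    """
    totals = {name: 0.0 for name in PREDICTORS}
    for t, (opp_bo, _, _) in enumerate(history):
        for name, predict in PREDICTORS.items():
            totals[name] += math.log(predict(history[:t])[opp_bo])
    return {name: total / len(history) for name, total in totals.items()}
```

For instance, `score([(1, 0, False)] * 20)` rates 1BO highest, because only that type expects an opponent to keep playing build order 1 after repeated losses. Recomputing each prediction from the full prefix costs quadratic time. That is acceptable for histories of around 1,000 games, but an incremental version would be preferable for longer runs.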

Applications in Cyber Security

Our research into build orders and into dynamically assessing an opponent's likely strategy is highly relevant to adversarial domains like information security, which involve sentient, learning actors. IT security teams commonly square off against adversaries whose identity, intentions, objectives, and even location within a compromised environment they know only imperfectly.

Our research using the StarCraft RTS is an important part of QOMPLX’s broader research into optimal decision-making under uncertainty and how different techniques can improve risk management outcomes.