
HIARCS 10 Chess Engine Options

  The HIARCS chess engine has many parameter options which can be modified to change the playing style and strength of the chess engine. We believe the default options were the best known at release, but maybe you can find a better set of parameters!
  Since release, there has been evidence that switching on the Hypermodern setting improves HIARCS 10 by about 10 Elo.
  Below you will find the options available in the HIARCS 10 chess engine and a description of their effect on the engine. Any defaults are in parentheses and explained in the text.


HIARCS 11 UCI Chess Engine Options

  The HIARCS chess engine has many parameter options which can be modified to change the playing style and strength of the chess engine. We believe the default options were the best known at release, but maybe you can find a better set of parameters!
  Below you will find the options available in the HIARCS 11.2 UCI chess engine and a description of their effect on the engine. Any defaults are in parentheses and explained in the text.

Neural nets, genetic algorithms, and other methods for automatically learning evaluation functions

Tuning the evaluation function

  Last time I talked about the different types of functions that evaluate features in a position, and how to combine them into an evaluation function by adding the values of many such functions. But where do the numbers come from?
  E.g., in Othello, you might have, say, three functions:
f(pos) = material (# of my pieces - # of opponent pieces)
g(pos) = corners (# I control - # opponent controls)
h(pos) = mobility (# moves I have available)

  You want to form an evaluation function by combining these (probably with some other terms):
  eval = a f + b g + c h.
For instance, you might try eval = -1*f + 10*g + 1*h. But where do these numbers come from? What combination of numbers gives the best performance?
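
  As a concrete sketch of what such a weighted combination looks like in code (the Position type, the three feature stubs, and the particular weights here are placeholders for illustration, not values from any real Othello program):

    // A linear evaluation: a weighted sum of feature functions.
    // Everything here is a stub except the way the weights combine.
    typedef struct { /* board representation omitted */ int squares[64]; } Position;

    static int material(const Position *p) { (void)p; return 0; }  // # my pieces - # opponent pieces
    static int corners(const Position *p)  { (void)p; return 0; }  // # corners I control - # opponent controls
    static int mobility(const Position *p) { (void)p; return 0; }  // # moves I have available

    // the weights a, b, c that tuning is supposed to choose
    static const int a = -1, b = 10, c = 1;

    int eval(const Position *p)
    {
        return a * material(p) + b * corners(p) + c * mobility(p);
    }
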
  There are various methods for finding numbers by hand:

  • Normalize. Since you only care about the ordering of evaluations, and (usually) not the actual evaluation values, you can multiply everything by the same constant without changing the results. What this means is that you can choose some particular value (say the material value of a pawn) and force it to be one, so that all the other values are expressed in terms of how many pawns that value is worth. The net effect is that you have one fewer parameter that needs setting.

  • Deduce constraints. Sometimes it is possible to choose some of the parameters by considering what you want the machine to do in certain types of positions. For instance, in chess, it is usually bad to trade a rook for a bishop or knight, even if you also end up winning a single pawn, but good if you win two pawns, so the material values should satisfy R>B+P (to prevent the single-pawn trade) and R<B+2P (to encourage the double-pawn trade). The more of these inequalities you have, the smaller the set of weights that satisfy them. This can sometimes help get to a reasonable starting approximation for the evaluation weights, but you probably still need to do some adjustment afterwards.

  • Hand tweaking. Most commonly used. Simply play your program enough times to get an idea of its strengths and weaknesses, guess which parameters could best be changed to fix the weaknesses, and pick new values for those parameters. Produces a reasonable answer quickly. Requires that you understand the game well enough to play reasonable games against the computer and analyze what it does wrong (i.e. it works best when the computer is stupid and you are intelligent).

... and without human intervention:

  • Hill-climbing. Like hand tweaking, but automated: make a small change to the weights, test the performance of that change, keep the change only if performance improves, and repeat many times. It tends to be slow, and to get stuck in "local optima": sets of weights that are bad, but from which any single small change makes things even worse (see the sketch after this list).

  • Simulated annealing. Like hill-climbing, but it sometimes accepts a change that makes performance worse, with a probability that starts out fairly high and is gradually reduced toward zero over the course of the run. Accepting an occasional bad change helps the search escape the local optima that trap plain hill-climbing.

  • Genetic algorithms. Hill-climbing and simulated annealing maintain one good set of weights, which they change gradually. Instead, genetic algorithms maintain a collection of several different good sets of weights, add new sets to the collection by combining pairs of existing ones (take some weights from one and some from another, with a little mutation as well), and keep the size of the collection down by killing off sets of weights with bad performance.

  • Neural networks. Really, this is more a type of evaluation function than a method for choosing weights: a neuron is a function of the form threshold(weighted sum of inputs), and one can form networks in which the neurons in the first layer take as inputs some basic features of the position (e.g. the individual bits of a bitboard representation) and successive layers take as inputs the neurons from the previous layer. So a one-layer network with only one-input neurons is the same as the first-order evaluation functions we talked about last time, but it's straightforward to build much more complicated neural networks, and it's not hard to use such a thing as the evaluation function (just recompute the outputs of neurons with changed inputs). The question, as before, is how to set the weights. Along with the other methods above, there are some that have been developed specifically for neural networks, such as "temporal difference learning". The basic idea is to decide when the network makes a bad decision, and determine for each weight separately whether changing it up or down would lead to a better decision, so it's a lot like hill-climbing. One advantage of neural nets is that they need even less human intelligence than the other automatic learning methods: you don't even really need to understand the game well enough to program a decent evaluation function. However, in the time available to us (a few weeks), you'll get good results faster by being more intelligent yourselves and leaving less of the work to the machines.
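
  Here is a rough sketch of the hill-climbing loop from the list above. The measure function, the number of weights, the step size, and the update rule are all assumptions for illustration, not part of any particular program:

    #include <stdlib.h>

    #define NUM_WEIGHTS 3   // e.g. the Othello weights a, b, c from before

    // Simple hill-climbing over evaluation weights.  `measure` is a
    // caller-supplied hook (hypothetical here) that plays some test games
    // or scores a test suite and returns a number where bigger is better.
    void hill_climb(double weights[NUM_WEIGHTS],
                    double (*measure)(const double weights[NUM_WEIGHTS]),
                    int iterations, double step)
    {
        double best = measure(weights);
        for (int i = 0; i < iterations; i++) {
            int w = rand() % NUM_WEIGHTS;                // pick a weight at random
            double delta = (rand() % 2) ? step : -step;  // small random change
            weights[w] += delta;
            double score = measure(weights);
            if (score > best)
                best = score;                            // improvement: keep it
            else
                weights[w] -= delta;                     // otherwise undo it
        }
    }

  Simulated annealing would differ mainly in the last branch: occasionally keep the worse weights anyway, with a probability that shrinks over the course of the run.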

  All of these methods require some way of automatically evaluating the performance of a program.

  • We can run the program on a large suite of test positions (say, taken from high-quality human games) and check whether it finds the known right answers (see the sketch after this list).

  • We can play the program against some known opponent (say another program) and see how often it wins. Or, we can play the program against itself, or against other versions of itself; e.g. in hill-climbing the modified program can play against the unmodified one. Both of these have the disadvantage that, unless there is some randomness in the system, both programs will play exactly the same each time, so you only get to see the results of one game, which may not be representative of overall play. One possible way around this is to start the games from several different positions taken from some test suite.

  • We can compare the results of the evaluation function with the results of combining evaluation and search. If the eval is good, the two should be similar; but if they are similar, does that also mean the eval is good?
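
  A sketch of the test-suite idea from the first item above; the Position and Move types, engine_best_move(), and moves_equal() are hypothetical hooks standing in for whatever your program provides:

    // Score a set of test positions with known "right answers": the more
    // of the expected moves the engine finds, the better the weight set.
    typedef struct { int dummy; } Position;   // placeholder types
    typedef struct { int dummy; } Move;

    extern Move engine_best_move(const Position *p, int depth);  // assumed hooks
    extern int  moves_equal(Move a, Move b);

    typedef struct {
        Position pos;       // test position, e.g. from a high-quality game
        Move     expected;  // the move the engine is supposed to find
    } TestCase;

    int score_test_suite(const TestCase *suite, int n, int depth)
    {
        int correct = 0;
        for (int i = 0; i < n; i++) {
            Move m = engine_best_move(&suite[i].pos, depth);
            if (moves_equal(m, suite[i].expected))
                correct++;
        }
        return correct;     // higher is better when comparing weight sets
    }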

  What has actually been done in automatically learning evaluation weights? A good source for this is Jay Scott's machine learning in games web page. He lists two experiments that I think are particularly interesting:

  • John Stanback (well-known as a commercial chess programmer) tried using a genetic algorithm to set the weights in the evaluation function of his program Zarkov. He only ran 2000-3000 games, which I think is way too few, and got material values that were ok but still worse than hand-tuned values. I think the lesson is that genetic algorithms do work, but need either a lot of generations or a good initial set of weights.

  • Risto Miikkulainen, genetic algorithms researcher at U. Texas, gave a talk last year on some experiments he'd done with Othello. He used a genetic algorithm to tune the weights in a neural net evaluation function. Performance evaluation of a net was done by play against a fixed opponent. If the fixed opponent played randomly, the neural net learned an evaluation in the form of piece-square tables (pieces in corners are good, pieces next to corners are bad, etc) after which it won all the time and stopped learning. Against an opponent combining piece-square tables with a short search, it eventually (after weeks of computer time) learned a better mobility-based strategy. But if its opponent was already a sophisticated mobility-based program, it lost all the time and never started learning. I think the lesson is to play against programs of similar strength, for instance to compare different evaluations in the genetic algorithm's current collection by playing them against each other, or alternatively to play against several fixed opponents of different strengths.

Horizon effect; quiescence and selective extensions; null move pruning

Which nodes to search? Full-width vs. selective search

  Alpha-beta tells us how to search, but we still need to know when to expand a node (search its children) and when to just stop and call the evaluation function.

The Horizon Effect

  The pseudo-code I've shown you so far searches every move out to a given fixed depth (this depth is also known as the horizon). Although this can be quite effective at seeing tactical threats that could be carried out within the horizon, it (obviously) can't detect threats that would take effect past the horizon; for instance a depth-8 search (that is, a search four moves deep) would likely not have much or any information about a forced checkmate in five moves. What it doesn't know, it can't defend against, and it will simply ignore those long-term threats. But this sort of fixed-depth search can behave even worse when the position contains medium-depth threats in which some bad outcome is forced to occur, but where some lines have that outcome within the search horizon and some don't. In that case, the program can play horrible pointless moves in an attempt to delay the bad outcome long enough that it can't be seen. This phenomenon is known as the horizon effect.
  Here's an example. In the following position, black's bishop is trapped by the white pawns. No matter what black does, the bishop will be taken in a few moves; for instance the white rook could maneuver h2-h1-a1-a2 and capture the bishop. That sequence is 8 plys deep, and suppose that the black program is searching to a depth of 8 plys. Probably the best move for black in the actual position is to trade off the bishop and pawns, e.g. bishop x pawn, pawn x bishop. In the remaining endgame, black's three connected passed pawns may be enough to win or draw against the rook. But, a program searching 8 plys will likely instead move the black pawn forwards, checking the white king. White must respond (e.g. by taking the pawn with his king), but that forced response delays the loss of the bishop long enough that the program can't see it anymore, and thinks the bishop is safe. In fact, in this position, a fixed-depth program can continue throwing away its pawns, delaying the bishop capture a few more moves but probably causing the eventual loss of the game.
[Figure: horizon effect example]

  One way to counter the horizon effect is to add knowledge to your program: if it knows from the evaluation that the bishop is trapped, its search won't try to delay the capture by throwing away pawns. Another is to make the search faster and deeper: the more levels your program searches, the less likely you are to run across a situation like this where it is possible to delay the loss of the bishop past the horizon. But the most effective general solution is to make the search depth more flexible, so that the program searches deeper in the lines where a pawn is being given away and less deep in other lines where it doesn't need the depth.
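
  One standard way to make the depth flexible is a quiescence search: when the nominal depth runs out, instead of calling the evaluation immediately, keep searching captures (and perhaps checks) until the position is quiet. Here is a minimal sketch in the same C-like pseudocode style as the other search listings in these notes; eval(), the capture generator, and making/unmaking moves are assumed to exist:

    // Quiescence search: called where a fixed-depth search would call eval().
    int quiesce(int alpha, int beta)
    {
        int score = eval();                 // "stand pat": score if we stop here
        if (score >= beta) return beta;     // already good enough to cut off
        if (score > alpha) alpha = score;
        for (each capture move m) {         // only "noisy" moves are extended
            make move m;
            score = -quiesce(-beta, -alpha);
            unmake move m;
            if (score >= beta) return beta;
            if (score > alpha) alpha = score;
        }
        return alpha;
    }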

Variants of Alpha-Beta Search

  Although the basic alpha-beta search discussed already is simple and works well, there have been several attempts to search game trees even more efficiently. The basic idea behind most of these is that, if we consider the scores in the range from alpha to beta as being "interesting" and all other scores to be "uninteresting", then alpha-beta gets its efficiency by cutting off the search quickly at nodes with uninteresting scores. If we narrow the gap between alpha and beta, fewer scores will be interesting and more cutoffs will happen.
  First, let's quickly review the original alpha-beta search, omitting details like hashing or adjusting winning scores for the current ply.
    // basic alpha-beta search
    int alphabeta(int depth, int alpha, int beta)
    {
        move bestmove;
        int score;
        if (game over or depth <= 0) return winning score or eval();
        for (each possible move m) {
            make move m;
            score = -alphabeta(depth - 1, -beta, -alpha);
            if (score > alpha) { alpha = score; bestmove = m; }
            unmake move m;
            if (alpha >= beta) break;
        }
        return alpha;
    }
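
  One simple way to exploit a narrower window is an aspiration search: guess the score (for example, from the previous iteration of iterative deepening), search with a small window around that guess, and re-search with a full window if the true score falls outside it. A sketch in the same style as the listing above; WINDOW and INFINITE are assumed constants, with INFINITE larger than any possible score:

    // aspiration search: call alphabeta() with a narrow window around a guess
    int aspiration(int depth, int guess)
    {
        int alpha = guess - WINDOW;     // e.g. WINDOW = a fraction of a pawn
        int beta  = guess + WINDOW;
        int score = alphabeta(depth, alpha, beta);
        if (score <= alpha || score >= beta)                // true score fell outside
            score = alphabeta(depth, -INFINITE, INFINITE);  // re-search, full window
        return score;
    }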


Forcing progress in Winning Positions

  If the game reaches a point where a win can be forced, alpha-beta search will find it. But, paradoxically, making a winning move at each turn is not always enough to win the game. The problem is that in games like checkers or chess, one can make a sequence of moves, each of which leads to a forced win, but which never bring the win any closer.
  For example, consider the following chess position:

White: Kd6, Qa7; Black: Ke8
  White to move can win immediately by moving the queen to square e7, checkmating the black king. But white also has other moves that win more slowly; in fact there is only one move white can make that does not win. For instance, suppose white moves his king to e6; black's king can move only to d8 or f8, after either of which white still has a checkmate available. If black moves to d8, white can still win by moving his king back to d6. But after the sequence of moves 1. Ke6 Kd8 2. Kd6 Ke8 we are back where we started! White is making winning moves, but he isn't making progress toward a win.
  If an alpha-beta search gives the same evaluation to any winning position, it can easily fall into this trap. To prevent this, we need to change the evaluation of winning positions, so that a win in fewer moves is counted slightly better than a delayed win. The code is straightforward: if we keep a variable "ply" denoting how far the current position is from the root of the search, we can adjust the score for a winning position by subtracting the ply. The following pseudocode assumes that we have defined a constant "WIN" which refers to the maximum score possible in a game (in chess, a typical value for WIN would be 100 or 1000 times the value of a pawn).
    // Alpha-beta search with WIN scores adjusted for ply

    int ply;    // global variable initialized to zero at start of search
    int alphabeta(int depth, int alpha, int beta)
    {
        if (game over and current player has won) return WIN - ply;
        else if (game over and current player has lost) return -WIN + ply;
        else if (depth <= 0) return eval();
        ply++;
        for (each possible move m) {
            make move m;
            alpha = max(alpha, -alphabeta(depth - 1, -beta, -alpha));
            unmake move m;
            if (alpha >= beta) break;
        }
        ply--;
        return alpha;
    }

  Now in the example above, the immediate checkmate is seen at ply=1, and gets a score of 999 (WIN-1), while moving the king to e6 forces a win at ply=3, with a score of 997. The program will make the move that maximizes its score, and take the immediate checkmate.
  For some games, such as Othello, there is a natural limit to the length of the game: each move adds a piece to the board, so there can be at most 64 moves before the game finishes. For those games, there is no way to get into the same sort of infinite loop, and we can just use a score of WIN or -WIN without worrying about the ply adjustment.
  There is one further complication with this ply adjustment trick: how does it interact with the hash table? The problem is that the ply may differ between the time we store a move in the hash table, and the time we retrieve it. In order to make the retrieved score's ply adjustment correct, we should store scores in the hash table adjusted relative to the current position, rather than the position at the root of the search.
  That is, when storing a position in the hash table, use something like the following pseudocode, where MAX_PLY is a constant defined to be greater than the maximum depth possible in a search (WIN=1000 and MAX_PLY=100 might be reasonable). The variable x is just the index of the current position in the hash table.
    if (score > WIN - MAX_PLY) hash[x].score = score + ply;
    else if (score < -WIN + MAX_PLY) hash[x].score = score - ply;
    else hash[x].score = score;
  When retrieving a position from the hash table, the opposite adjustment needs to be made:
    if (hash[x].score > WIN - MAX_PLY) score = hash[x].score - ply;
    else if (hash[x].score < -WIN + MAX_PLY) score = hash[x].score + ply;
    else score = hash[x].score;

Hashing and Move Ordering

  I didn't really finish describing alpha-beta - my pseudocode included a mysterious "sort list of moves" step that I didn't explain. I'll continue to leave that dangling while I talk about hashing; we'll connect it up in a little while.
  The idea of hashing is very simple. Many games allow transpositions of moves, meaning different sequences of moves that end up leading to the same position. For instance, in chess, the opening moves 1. d4 Nf6 2. c4 and 1. c4 Nf6 2. d4 both give the same position (known as an Indian defense). White's two pawn moves could be made in either order without changing the result. As an example of a more complicated transposition, the moves 1. e4 c6 2. d4 d5 3. ed Qxd5 4. Nc3 Qd6 (Caro-Kann defense), 1. e4 d5 2. ed Qxd5 3. Nc3 Qd6 4. d4 c6 (Scandinavian opening), and 1. e4 Nf6 2. e5 Ng8 3. d4 d6 4. ed Qxd6 5. Nc3 c6 (Alekhine defense) all lead to the same position, after different numbers of moves.
  Because of transpositions, the same position can show up in many places in the alpha-beta search tree. If we store a data structure that remembers the result of searching each position, we can look a position up rather than searching it again. But... we don't have enough memory to store all the positions we search. And lookups must be very fast for the table to save time over just searching again. Fortunately, we have one advantage: it's ok if we sometimes fail to find the result for a position we already searched, and end up searching it again, as long as that doesn't happen too often.
  The answer: hash tables. Make a big array, as large as possible without blowing out your physical memory (you don't want to eat into virtual memory; it will be slow).
struct {
    long checksum;	// the second hash value y, for checking we found the right position (long long might be even better)
    int depth;		// depth of the search whose result is stored here
    enum { exact, lower_bound, upper_bound } entry_type;	// is eval exact, or only a bound?
    double eval;	// the score returned by that search
} hashtable[HASH_TABLE_SIZE];
  For each position you search, compute a "hash value" x indexing into the hash table and another "hash value" y for checking whether you've found the right position.
  Before searching a position, look up hashtable[x]. If hashtable[x].checksum == y, hashtable[x].entry_type == exact, and hashtable[x].depth is at least the depth you are currently searching, return the eval stored there.
  After searching the position, store y, the current depth, and the eval you just computed, into hashtable[x].
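
  As a sketch, the probe and store steps just described might look like the following, using the struct from the listing above. How x and y are computed (for example, by Zobrist hashing) is a separate issue; here they are taken as given, and only exact scores are handled:

    // probe: before searching a position, see if a deep-enough result is stored
    int probe_hash(long x, long y, int depth, double *eval_out)
    {
        if (hashtable[x].checksum == y &&        // right position (probably)
            hashtable[x].entry_type == exact &&  // this sketch ignores bounds
            hashtable[x].depth >= depth) {
            *eval_out = hashtable[x].eval;
            return 1;                            // hit: reuse the stored eval
        }
        return 0;                                // miss: search as usual
    }

    // store: after searching, overwrite whatever was in this slot
    void store_hash(long x, long y, int depth, double eval)
    {
        hashtable[x].checksum = y;
        hashtable[x].depth = depth;
        hashtable[x].entry_type = exact;
        hashtable[x].eval = eval;
    }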

Alpha-Beta pruning

Alpha-Beta Search

Shallow Pruning

  Suppose you're doing a minimax search (as we described last time) of the following tree:
[Figure: ICS180_l5_1 — game tree with sibling nodes F and G]
  You've searched F, and found its children's evaluations to be 11, 12, 7, and 9; at this level of search, the first player is to move, and we expect him to choose the best of these values, 12. So, the minimax value of F is 12.
  Now, you start searching G, and its first child returns a value of 15. You know immediately that G's value will be at least 15, possibly even higher (if another of G's children is even better). This implies that we don't expect the second player, who chooses between F and G at their parent, to move to G: from the second player's point of view, F's value of 12 is always better than G's value of 15 or higher. So we know that G is not on the principal variation. We can prune the remaining children of G without evaluating them and return immediately from searching G, since any further work evaluating descendants of G would just be wasted.
  In general, we can prune like we did in node G when one of its children returns a value better (from the point of view of the player whose turn it is at node G) than the previously computed evaluation of one of G's siblings.
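
  In pseudocode (the same C-like style used for the search listings elsewhere in these notes; INFINITE is just an assumed constant larger than any evaluation), the pruning test at a node like G looks like this, where `bound` is the value already established for a sibling such as F:

    // value of a node where the first player is to move, with shallow pruning;
    // `bound` is the value already computed for a sibling node (12 for F above)
    int first_player_node(int bound)
    {
        int best = -INFINITE;                                  // nothing found yet
        for (each child position c) {
            int v = value of c (by evaluation or deeper search);  // 15 for G's first child
            if (v > best) best = v;
            if (best >= bound) break;   // second player will prefer the sibling: prune the rest
        }
        return best;
    }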



Game tree evaluation

Minimax and negamax search

Game Trees

  For any game, we can define a rooted tree (the "game tree") in which the nodes correspond to game positions, and the children of a node are the positions that can be reached from it in one move. For instance tic-tac-toe:
[Figure: ICS180_l4_1.gif — game tree for tic-tac-toe]
  (Actually, the root of this tree should have nine children, but I've left out some symmetric cases. If the same board configuration arises from two different sequences of moves, we create two separate nodes, so this really is a tree. We'll see later when we talk about hashing how to take advantage of duplicated nodes to speed up the search -- we only need to search one copy of a position, and use the same search results everywhere else that position appears in the tree. We also assume that the players take turns moving, with no multiple moves or skipped turns; these complications can be dealt with by treating an entire sequence of actions by a single player as forming a single move. Finally, we'll assume this tree is finite, so that we're not dealing with a game that can go on forever or has infinitely many choices for a single move.)
  There are three kinds of nodes in the tree:
  1. Internal nodes at even levels correspond to positions in which the first player is to move.

  2. Internal nodes at odd levels correspond to positions in which the second player is to move.

  3. Leaves correspond to positions at which the game has ended. One player or the other has won, or perhaps the game is drawn.

Game Tree Evaluation

  Suppose we have an internal node for which all children are leaves, so that the game is certain to end in one more move. Then we can assume that the player to move is going to pick the best move. If there is a leaf giving a won position for him, he will move to it and win. If not, but some leaf gives a drawn position for him, he will move to it and draw. But, if all leaves give won positions for his opponent, he will lose no matter what happens.
  So, we know what the game outcome should be from nodes one level above the leaves. Once we know that, we can apply the same analysis bottom up, to determine the correct outcome from nodes two levels above the leaves, three levels up, and so on, until we reach the root of the tree. At each node, the position is won for the player on move if he can find a child of that node giving him a won position; it is drawn if he can find a child giving him a draw; and if neither of these holds then it is lost. This gives us an algorithm for playing games perfectly, if only we had enough computation time; but for any reasonable game we don't, because the trees are far too big.
  This also tells us that a "correct" evaluation function needs only three values: win, lose, and draw. The evaluations we use in computer game programs have a wider range of real-number values only because they're inaccurate. If we represent a first-player win as the value +1, a draw as the value 0, and a second-player win as the value -1, then the value of each internal node in the game tree is the maximum or minimum of its children's values, depending on whether the first or second player is to move, respectively.
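
  Here is a minimal sketch of this bottom-up evaluation, in the same C-like pseudocode style as the search listings elsewhere in these notes; +1, 0, and -1 are the three values just described:

    // exact game-tree evaluation of the current position:
    // +1 = first-player win, 0 = draw, -1 = second-player win
    int game_value(int first_player_to_move)
    {
        if (game over) return +1, 0, or -1 according to the result;
        int best = first_player_to_move ? -1 : +1;   // worst case for the player to move
        for (each legal move m) {
            make move m;
            int v = game_value(!first_player_to_move);
            unmake move m;
            best = first_player_to_move ? max(best, v) : min(best, v);
        }
        return best;
    }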
