利用蒙特卡洛树搜索实现五子棋游戏功能（一）

本功能利用Python语言，基于numpy库实现

利用蒙特卡洛树搜索实现五子棋游戏可以说是一个深度学习入门项目，也算是我第一次接触深度学习领域的项目，过程还算“比较”顺利，但是难度上还是要比之前做过的利用Q-learning实现迷宫寻路要上了一个台阶。
这篇博客主要介绍五子棋环境以及行棋动作的实现，对于蒙特卡洛树搜索算法的介绍留到下一篇博客吧。

五子棋棋局环境的搭建

创建一个棋盘类Class Board，方便我们操作棋子，修改棋盘格局。

此类的基本属性包括：
棋盘的宽/长度size，
棋盘的初盘board，
当前玩家cur_player，
属性初始化代码如下：

def __init__(self, board=None, size=8, cur_player=-1):
    self.size = size
    self.board = np.zeros((self.size, self.size), int) if board is None else board     # 棋盘初始状态，若未传入初盘则当作空棋盘处理
    self.cur_player = cur_player   #-1代表黑子，1代表白子

创建一个玩家类Human,方便我们读取落子位置，得到用户落子后棋盘格局。
此类的基本属性为当前玩家player初始化为黑子-1，属性初始化代码如下：

def __init__(self, player=-1):
        self.player = player

这些环境类下有操作棋盘，修改格局的多种功能，分别是：
is_move_legal(self, move_pos)：判断当前动作在当前棋盘下是否合法。
get_legal_pos(self)：获取当前棋盘内所有可行棋的位置。
move(self, move_pos)：按照move_pos的位置修改当前棋盘格局，即在move_pos位置上放置当前玩家的棋子。
board_result(self, move_pos)：判断当前棋盘在当前动作下是否产生输赢或平的局面。
game_over(self, move_pos)：判断当前游戏是否结束，输赢方分别是谁，或是否产生平局局面。
get_action_pos(self, board)：读取用户输入的落子位置。
action(self, board)：获得用户落子后的棋盘格局。

相应代码如下：

is_move_legal(self, move_pos)：判断当前动作在当前棋盘下是否合法。

def is_move_legal(self, move_pos):
    x, y = move_pos[0], move_pos[1]
    if x < 0 or x > self.size or y < 0 or y > self.size: #判断是否溢出棋盘边界
        return False
    if self.board[x, y] != 0:   #判断是否下在已经有棋子的位置上
        return False
    return True

get_legal_pos(self)：获取当前棋盘内所有可行棋的位置。

def get_legal_pos(self):
    indices = np.where(self.board == 0)   #找到棋盘上未落子的位置
    return list(zip(indices[0], indices[1])) #封装成二元组

move(self, move_pos)：按照move_pos的位置修改当前棋盘格局，即在move_pos位置上放置当前玩家的棋子

def move(self, move_pos):
    if not self.is_move_legal(move_pos): 
        raise ValueError("move {0} on board {1} is not legal". format(move_pos, self.board))
    new_board = Board(board=np.copy(self.board), cur_player=-self.cur_player)
    new_board.board[move_pos[0], move_pos[1]] = self.cur_player
    return new_board 

board_result(self, move_pos)：判断当前棋盘在当前动作下是否产生输赢或平的局面。

def board_result(self, move_pos):
    x, y = move_pos[0], move_pos[1]
    player = self.board[x, y]
    direction = list([[self.board[i][y] for i in range(self.size)]])
    direction.append([self.board[x][j] for j in range(self.size)])
    direction.append(self.board.diagonal(y - x))
    direction.append(np.fliplr(self.board).diagonal(self.size - 1 - y - x))
    for v_list in direction:
        count = 0
        for v in v_list:
            if v == player:
                count += 1
                if count == num_in_a_row_will_win:
                    return True
            else:
                count = 0
    return False

game_over(self, move_pos)：判断当前游戏是否结束，输赢方分别是谁，或是否产生平局局面。

def game_over(self, move_pos):
    if self.board_result(move_pos):
        return 'win'
    elif len(self.get_legal_pos()) == 0:
        return 'tie'
    else:
        return None

get_action_pos(self, board)：读取用户输入的落子位置。

def get_action_pos(self, board):
    try:
        location = input("Your move(please use commas to separate the two index): ")
        if isinstance(location, str) and len(location.split(",")) == 2:  
            move_pos = tuple([int(n, 10) for n in location.split(",")])     # 提取出用户输入的位置信息并封装
        else:
            move_pos = -1
    except:
        move_pos = -1

    if move_pos == -1 or move_pos not in board.get_legal_pos():
        print("Invalid Move")
        move_pos = self.get_action_pos(board)
    return move_pos

action(self, board)：获得用户落子后的棋盘格局。

def action(self, board):
    move_pos = self.get_action_pos(board)
    board = board.move(move_pos) 
    return board, move_pos

差不多就这么多了

将上述简单动作组合在一起，这些环境已经可以实现五子棋的基本功能了，接下来就要把目光聚焦到如何利用蒙特卡洛树搜索训练模型并搭建起人机游戏的模型上，这些内容将放在我的下一篇博客中。