For example, ViZDoom does support playing against other players, but learning to do so is difficult due to the many problems it presents [Wydmuch, Kempka, and Jaśkowski 2018]: the agents have to learn to move themselves efficiently, explore options like shooting at enemies, and re-experience positive feedback many times for learning to happen. These challenges must be addressed before an agent can start learning tactics to beat other opponents aiming for a higher score, such as human players. As it stands, training such super-human agents in video games requires computer-cluster levels of computing resources and manual tuning of the rewards, actions, and observations.

Figure 1: An in-game image of Toribash with two characters fighting each other.

Two players control their respective characters by changing the states of the joints in their characters' bodies. These states define how each joint behaves for the next simulated timesteps. The winner of the game is the player who received the least amount of damage, or the one who did not touch the ground with anything other than their feet or hands.

Although we present Toribash as a novel environment for machine learning, this is not the first time Toribash has been used in this context. A related publication used genetic algorithms to train a computer to attack an immobile opponent, and the authors analyzed how the algorithm explores for better moves. The agent was capable of improving its attacks on the opponent's character over time. Outside academia, users on the Toribash forums have experimented with similar approaches to the same task using NeuroEvolution of Augmenting Topologies (NEAT). In fact, our work was inspired by, and partly based on, code that used genetic algorithms to successfully damage an immobile opponent.

Figure 2: Overview of the Toribash Learning Environment (ToriLLE).

ToriLLE uses Toribash's Lua scripts to communicate with outside controllers over TCP/IP. We also provide a Python library for easier use out of the box, but any language that supports sockets can be used to control a Toribash instance. On each turn, the game sends an observation vector to the controller. The controller replies by sending a list of joint states to be executed, and Toribash progresses by one turn.
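Since the interface is exposed over plain sockets, a controller can be written in almost any language. The following minimal Python sketch illustrates the turn-based exchange described above; the host, port, newline-delimited framing, and 22-value action encoding are hypothetical placeholders for illustration, not ToriLLE's actual wire format, which the bundled Python library implements.

```python
# Sketch of a turn-based controller loop over TCP/IP.
# NOTE: the endpoint, message framing and action encoding below are
# hypothetical placeholders, not ToriLLE's actual wire protocol.
import socket

HOST, PORT = "localhost", 7788  # hypothetical address of a Toribash instance

with socket.create_connection((HOST, PORT)) as conn:
    stream = conn.makefile("rw")
    while True:
        observation = stream.readline().strip()  # game sends an observation vector
        if not observation:
            break  # connection closed; the match has ended
        # Reply with one state per controllable joint (placeholder values);
        # Toribash then simulates frames until the next turn.
        action = " ".join(["3"] * 22)
        stream.write(action + "\n")
        stream.flush()
```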
Toribash provides a dense reward in the form of damage inflicted on the enemy and a penalty in the form of damage received. A random agent is able to receive this reward reliably, since the players start in close proximity to each other. One game of Toribash lasts only hundreds of steps, so using winning/losing as the reward is also feasible.

The experiments conducted in this paper include an off-the-shelf learning agent learning to fight against an immobile opponent. Without any modifications to the learning method or reward shaping, the agent learns to complete the task with ease. The major remaining challenge is learning to win against unseen tactics, while the only opponent provided by Toribash is the immobile "Uke" opponent.

Playing Toribash from a separate program

Listing 1: Python API

```python
import torille
from torille import utils

toribash = torille.ToribashControl()
```

Listing 2: Gym API

```python
import gym
from torille import envs  # Registers environments

e = gym.make("Toribash-DestroyUke-v0")
```

Figure 3: Example snippets of Python code running a random agent on Toribash with ToriLLE, with the default interface (left) and the Gym environment (right). Settings of Toribash can be defined on creation and on episode reset (not shown in the code).

Toribash uses OpenGL to render the game, which requires a valid display where a screen buffer can be created. Creating valid screen buffers on headless servers and/or over SSH connections requires special setup. One way is to use virtual screen buffers like Xvfb. As of writing, Nvidia drivers do not work with Xvfb, and CPU-based rendering must be used. We observed a notably lower frame rate with this setup, and in particular the total FPS gain from multiple instances vanishes. We were not able to include experiments with a server machine running Xvfb, as enabling CPU rendering requires re-installing the Nvidia drivers without the OpenGL libraries.

Figure 4: Benchmark results. (a) Benchmarks on two different machines, with match frames = 1000, turn frames = 10, and engagement distance = 1500. The quad-core machine had an Nvidia GTX 1080 GPU. (b) The turnframes setting defines the number of frames between turns. (c) Effect of simulation complexity on FPS. With a smaller engagement distance, players collide with each other more often, which requires more computation.

Benchmarks were conducted on a 16-core, 2.1 GHz Linux machine with an Nvidia Titan Xp GPU unless otherwise stated. To study the effect of an individual setting, we keep it fixed and take the average result over variations of the other settings.
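To make this averaging scheme concrete, the sketch below fixes one setting at a time and averages FPS over all combinations of the remaining settings. The setting names, value grids, and the measure_fps stub are invented for illustration and do not reproduce the paper's measurements.

```python
# Sketch of the benchmark-averaging scheme: fix one setting and average
# FPS over all combinations of the other settings. Values are invented.
from itertools import product
from statistics import mean

SETTINGS = {
    "turn_frames": [1, 5, 10],                 # hypothetical value grid
    "engagement_distance": [100, 1000, 1500],  # hypothetical value grid
}

def measure_fps(config):
    # Stand-in for running an actual benchmark with the given configuration.
    return 1000.0 / config["turn_frames"] + config["engagement_distance"] / 100.0

def average_fps_for(setting, value):
    # Average FPS with `setting` held fixed at `value`, varying the rest.
    others = {k: v for k, v in SETTINGS.items() if k != setting}
    keys = list(others)
    samples = []
    for combo in product(*(others[k] for k in keys)):
        config = dict(zip(keys, combo))
        config[setting] = value
        samples.append(measure_fps(config))
    return mean(samples)

for value in SETTINGS["turn_frames"]:
    print(f"turn_frames={value}: mean FPS {average_fps_for('turn_frames', value):.1f}")
```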