
Libratus Poker Github



Please visit the wiki page for documentation on how to run the bot. This pokerbot plays automatically on Partypoker and Pokerstars. It works with image recognition, Monte Carlo simulation, and a basic genetic algorithm. The mouse is moved automatically and the bot can play for hours.

  • Libratus beat each of the players individually in the two-player game and collectively amassed more than $1.8 million in chips. Measured in milli-big blinds per hand (mbb/hand), a standard used by imperfect-information game AI researchers, Libratus decisively defeated the humans by 147 mbb/hand. In poker lingo, this is 14.7 big blinds per 100 hands.
  • Libratus eventually won by a staggering 14.7 big blinds per 100 hands, trouncing the world's top poker professionals with 99.98% statistical significance. This was the first AI agent to beat professional players in heads-up no-limit Texas hold 'em.
  • Pluribus was the first AI to beat humans in six-player no-limit Texas hold'em poker.

Poker has always been the poster child for imperfect-information (hidden-information) games in game theory and AI. Prior breakthroughs in poker AI, like Libratus, and in games in general (StarCraft, Go, Dota, etc.) have been limited to two-player zero-sum games. Such AI typically approximates a Nash equilibrium strategy; however, finding or even approximating such equilibria in games with more than two players can be extremely hard, and possibly not even worthwhile.

Tree-search algorithms, like Alpha-Beta Pruning (ABP) and Monte Carlo Tree Search (MCTS), have also been used extensively in prior algorithms. However, using such algorithms in hidden-information games leads to intractable search spaces.

Nash equilibrium

A Nash equilibrium is reached when all players of a game are assumed to know the equilibrium strategies of the other players, and no player can gain by changing their own strategy. An example of this is the lemonade stand game, where every player must stay as far from every other player as possible.
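
For reference, the standard formal statement (a textbook definition, not from the article) is that a strategy profile σ* is a Nash equilibrium when no player i can raise their expected utility u_i by deviating alone:

```latex
u_i(\sigma_i^*, \sigma_{-i}^*) \;\ge\; u_i(\sigma_i, \sigma_{-i}^*)
\qquad \text{for every player } i \text{ and every alternative strategy } \sigma_i
```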

source: ai.facebook.com

In this game, there are infinitely many ways the equilibrium can be reached. However, if every player computes the equilibrium separately, it is very unlikely the players will end up equally spaced. For six-player poker, the authors decided that the algorithm should not try to find such an equilibrium, and instead focused on creating an AI that beats humans empirically.

Hidden information

A successful poker-playing AI must reason about the game's hidden information and pick an optimal strategy. Poker, however, involves bluffing, so the optimal strategy might not depend only on your cards. It is therefore necessary to balance the probability of your opponent having a strong hand against the probability that they are bluffing. Perfect-information games can use tree-search algorithms such as Alpha-Beta Pruning to estimate future action outcomes. In imperfect-information games, the value of an action depends on the probability of it being taken.

Pluribus was trained via self-play, where the agent plays against copies of itself, with a modified version of Monte Carlo Counterfactual Regret Minimization (MCCFRM) to update its policy.

MCCFRM explores the decision tree up to a certain depth, then backtracks and explores what would have happened if other actions had been selected. A 'regret' is then calculated based on how much higher the value of those other paths is. The regret then influences the probability of selecting actions in the future. The video below shows how Pluribus updates its policy.

source: ai.facebook.com
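
To make the regret idea concrete, here is a minimal regret-matching sketch in Python. It is a toy illustration of the update at the heart of CFR-style training, using rock-paper-scissors in self-play rather than poker, and it is not Pluribus's actual code; MCCFRM additionally samples which branches of the tree to explore.

```python
# Toy regret matching: the core update inside CFR-style self-play training.
# Rock-paper-scissors stands in for poker; all names here are illustrative.
import random

ACTIONS = [0, 1, 2]                               # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]     # PAYOFF[mine][opponent's]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)   # no regret yet: uniform
    return [p / total for p in positive]

def train(iterations=100_000):
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
        me = random.choices(ACTIONS, weights=strategy)[0]
        opp = random.choices(ACTIONS, weights=strategy)[0]   # self-play copy
        # Regret: how much better each alternative action WOULD have done.
        for a in ACTIONS:
            regrets[a] += PAYOFF[a][opp] - PAYOFF[me][opp]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]   # the AVERAGE strategy converges

print(train())   # approaches the equilibrium [1/3, 1/3, 1/3]
```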

Because the agent is playing against itself, it can know what would have happened if a different action had been chosen. As you can imagine, maintaining a regret for every action in a game would be intractable. Therefore, the authors used 'bucketing', where similar actions and states are grouped together. For example, raising $200 or $201 will lead to similar outcomes, and an opponent holding a 10-high straight or a 9-high straight will usually act the same.
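
Here is a minimal sketch of bucketing for bet sizes, with made-up bucket boundaries (the real abstraction is far more sophisticated):

```python
# Illustrative action abstraction ('bucketing'): nearby bet sizes share one
# bucket, so regrets learned for $200 also cover $201. Bucket edges are
# invented for this example.
import bisect

BET_BUCKETS = [50, 100, 200, 400, 800, 1600]   # representative bet sizes ($)

def bucket_bet(amount):
    """Map an arbitrary bet to the index of the nearest representative size."""
    i = bisect.bisect_left(BET_BUCKETS, amount)
    if i == 0:
        return 0
    if i == len(BET_BUCKETS):
        return len(BET_BUCKETS) - 1
    # choose the closer of the two neighbouring buckets
    return i if BET_BUCKETS[i] - amount < amount - BET_BUCKETS[i - 1] else i - 1

assert bucket_bet(201) == bucket_bet(200)   # $200 and $201 land in one bucket
```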

The self-play training provides only a 'blueprint' strategy, which influences only the first betting round. Afterward, the agent uses MCCFRM-based search in real time to adapt its strategy to opponents. Moreover, tree-search algorithms usually presume that the opponent will stick to a single strategy for the rest of the game, which is not something poker players do. Pluribus instead assumes that, beyond the leaf nodes, each player may act according to one of several different strategies. In Pluribus, these strategies are variations of the blueprint, each biased more towards folding, calling, or raising.
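
A rough sketch of that leaf evaluation, with hypothetical helper functions standing in for the real machinery:

```python
# Depth-limited search sketch: at a leaf node, assume each player picks
# whichever of the k continuation strategies (blueprint variants biased toward
# folding, calling, or raising) serves them best from there on.
# `rollout_value` is a hypothetical placeholder that estimates the value of
# playing out the rest of the hand under a given strategy.
def leaf_value(state, player, continuation_strategies, rollout_value):
    return max(rollout_value(state, player, strategy)
               for strategy in continuation_strategies)
```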

Finally, in poker, the optimal strategy depends on how the opponents perceive the player's hand. If the player never bluffs, the opponents will know to fold in response to a big raise. To add uncertainty, Pluribus computes the probability of reaching the current situation with every possible hand, not just the hand it is actually holding. Once this strategy, balanced across all hands, is computed, it executes an action for the hand it is actually holding.
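
In rough pseudo-Python, with hypothetical helpers standing in for Pluribus's real interfaces, the balancing step looks like this:

```python
# Balanced play sketch: solve for a strategy over EVERY hand that could have
# reached this spot (weighted by its reach probability), so the chosen action
# frequencies reveal nothing about the actual hand. All helpers are
# hypothetical placeholders.
def balanced_action(actual_hand, possible_hands, reach_prob, solve, sample):
    strategy = {hand: solve(hand, reach_prob(hand)) for hand in possible_hands}
    return sample(strategy[actual_hand])   # act only on the hand really held
```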

Data

Pluribus was trained for eight days on a 64-core server and required less than 512 GB of RAM. No GPUs were used.

source: ai.facebook.com

The score is reported against the final snapshot of the bot, and 'BB/100' means the number of big blinds won per 100 hands. Limping is a strategy often employed by beginner players, where the player enters a hand by just calling the big blind, the absolute minimum needed to stay in. More advanced players prefer to be more aggressive.

Pluribus was tested against five poker professionals over 12 days, during which 10,000 hands were played. A prize of $50,000 was divided amongst the human pros to incentivize them to play their best. Pluribus was estimated to have a win rate of 5 BB/100, which works out to an average of about $5 per hand at the stakes played.
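
As a sanity check on that conversion (assuming the $50/$100 blinds used in the experiment, so one big blind is $100):

```python
win_rate = 5        # big blinds won per 100 hands (BB/100)
big_blind = 100     # dollars, at the $50/$100 stakes played
print(win_rate * big_blind / 100)   # -> 5.0 dollars won per hand on average
```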

source: ai.facebook.com

And here it is in action

source: ai.facebook.com

  • In case it was not clear: Pluribus does not use neural networks, and its policy is modified solely through CFR.

  • Pluribus tends to 'donk bet', where a player starts a betting round with a bet after ending the previous round with a call, which runs contrary to the folk wisdom that doing so is a bad idea.

  • On the other hand, it confirmed that limping is not a good strategy.

  • Human pros generally say that Pluribus plays very differently from humans.

  • A lot of info is available in the supplementary material, including the MCCFR algorithm.

  • Supplementary materials (includes pseudocode and most of the technical details)
  • Blog post
  • HN discussion

It can do it for you … before it does you in

  • Types of machine learning
  • Tools for Machine Learning
  • Data representation

To give you an idea of the pervasiveness of AI and Machine Learning, Google CEO Sundar Pichai said in 2016:


'Machine learning is a core, transformative way by which we're rethinking how we're doing everything. We are thoughtfully applying it across all our products, be it search, ads, YouTube, or Play. And we're in early days, but you will see us — in a systematic way — apply machine learning in all these areas.'

Skill-building from games

In 1997, when Deep Blue beat world chess champion Garry Kasparov, it did so by 'brute force', using a supercomputer to analyze the outcome of every possible move, looking further ahead than any human possibly could. That wasn't Machine Learning or AI.

But in 2011, IBM's Watson software beat top Jeopardy game champions by 'learning' from books and encyclopedias. IBM only created the program that enables the computer to learn. The software makes use of a 'model' built from example inputs to make predictions, versus following strictly static program instructions (logic defined by human developers).

Machine learning is a type of AI (Artificial Intelligence) that enables computers to do things without being explicitly programmed by human developers. Rather than explicit programming, Machine Learning algorithms identify rules through 'training' based on many examples.
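
A minimal illustration of that idea, using scikit-learn (an example of ours, not from the article): the rule separating passing from failing students is never written by hand; it is fit from examples.

```python
# Learning a rule from examples instead of hand-coding it.
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [4], [5], [6]]   # hours studied (toy data)
y = [0, 0, 0, 1, 1, 1]               # 0 = failed the exam, 1 = passed

model = LogisticRegression()
model.fit(X, y)                       # 'training': the model infers the rule
print(model.predict([[1.5], [5.5]]))  # -> [0 1] for unseen inputs
```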


The photo above shows the Todai Robot, which scored among the top 20% of students in Japan's university entrance exams, writing essays using a pen on paper, in Japanese. It knows 8,000 Japanese words and 2,000 mathematical axioms, and uses 'symbolic computation' for 'automatic reasoning' over 15 billion sentences. That was back in 2014.

In 2017, the top-ranked player of the Chinese game Go was defeated by Google's AlphaGo, which is based on Google's DeepMind acquisition. The software made moves that many considered illogical. BTW, Go is considered the most complex game ever invented: whereas chess players have, at any given turn, an average of 35 possible moves, on a Go board's 19-by-19 grid there are about 250 possible moves.

Also in 2017, top-ranked poker players were bested by software named Libratus, from Tuomas Sandholm at CMU. The software adjusted its strategies during the tournament. And its algorithms for strategy and negotiation are game-independent, meaning they apply not just to poker but to a range of adversarial problems.


Rather than neat rows and fixed columns within tables, AI computers deal with less-structured data, such as natural-language text, images, and videos.


On 21 August 2017, Elon Musk tweeted: 'OpenAI first ever to defeat world's best players in competitive eSports [dota2]. Vastly more complex than traditional board games like Chess & Go'.

Use Cases

What ordinary people might appreciate:

  • Estimate the price of a house given real estate data (multiple regression), so you don't waste time on properties that don't fit your criteria.
  • Classify movie reviews from imdb.com into positive and negative categories (binary classification), to spend time only on movies you want to see.
  • Classify news wire articles by topic (multi-class classification), to save time avoiding skimming articles not of interest to you specifically.

For small businesses:

  • Sort vegetables using computer vision

For enterprises:

  • Data Security - identify malware by detecting minute variations in file signatures.
  • Fraud detection - fight money laundering by finding anomalies in transactions.
  • Financial trading
  • Health care - understand risk factors and spot patterns in medical info
  • Marketing - personalize ads based on interests identified for individuals
  • Smart cars
  • Insurance - identify risks in smaller populations


Algorithms


Use of hard-coded (static) 'rules' crafted by human programmers is called 'symbolic AI', used in the 'expert systems' fashionable during the 1980s.

Machine learning algorithms identify information from data fed through 'generic' (general-purpose) algorithms, which build their own logic by detecting patterns within the data.

Patterns are recognized by neural network algorithms. This cycle of 'learning' is implicit in a definition of Machine Learning by Mitchell (in 1997): 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E'.

The 'network' in a neural network has multiple 'layers'.
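
As a bare-bones illustration of what a 'layer' is (our sketch, with random weights in place of learned ones): each layer applies a weighted sum followed by a nonlinearity, and layers are stacked.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One layer: weighted sum of the inputs, then a ReLU nonlinearity."""
    w = rng.normal(size=(x.shape[0], n_out))   # weights (normally learned)
    return np.maximum(w.T @ x, 0.0)

x = rng.normal(size=4)   # a 4-feature input
h = layer(x, 8)          # hidden layer, 8 units
out = layer(h, 2)        # output layer, 2 units
print(out.shape)         # -> (2,)
```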




