[Verge] An unbeatable computer program has finally solved two-player limit Texas hold'em poker - Page 5

post #41 of 60
Quote:
Originally Posted by RiverOfIce View Post

And?????

Here is the problem.

"There are three types of lies, white lies, damn lies, and statistics." -Mark Twain.

The computer will always break even or win if it plays you an unlimited number of times, NOT if it plays you a set number of times. Statistics being what they are, the computer will have to deal with what it is dealt.

Watching professional players play poker, you start to learn that no matter how good the player is, a full house will always beat two of a kind.

The computer still plays with the same statistical average as for each hand. If it is a good hand or a bad hand, it is the hand that was dealt.

Yes, long term over the course of millions of games, it could be a player, but the player could also create the same database it is using and always cause a draw.

PS. AI is not about making long databases and selecting the best outcome. So please stop calling it an AI, it is just a very large data base.
Given a set number of hands... the computer will still most likely tie or win. That doesn't really change anything... the probability of that just increases with the number of hands played.

The machine learning part is that the system itself figures out the best strategies. It is not simply a massive database of all possible combinations.
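To put rough numbers on how the chance of coming out ahead grows with the number of hands, here is a minimal sketch using a normal approximation. The edge and standard deviation are made-up values chosen for illustration; they are assumptions, not measurements of Cepheus or of any real player.

```python
import math

def prob_ahead(edge_per_hand, sd_per_hand, n_hands):
    """Normal-approximation chance that the side with the edge is ahead after n_hands."""
    if n_hands == 0:
        return 0.5
    z = edge_per_hand * math.sqrt(n_hands) / sd_per_hand
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical numbers: edge of 0.05 big blinds per hand,
# standard deviation of 5 big blinds per hand.
for n in (100, 10_000, 1_000_000):
    print(n, round(prob_ahead(0.05, 5.0, n), 4))
```

Under these assumed numbers the better side is only slightly favored over 100 hands and close to certain to be ahead over a million, which is the "set number of hands" point in a nutshell.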
post #42 of 60
Quote:
Originally Posted by DuckieHo View Post

Given a set number of hands... the computer will still most likely tie or win. That doesn't really change anything... the probability of that just increases with the number of hands played.

The machine learning part is that the system itself figures out the best strategies. It is not simply a massive database of all possible combinations.

No, there is no learning.
Quote:
Originally Posted by original article 
The way the program works is actually pretty simple: all it has to do during a game is search its database of pre-computed game situations to find the most optimal move at any given moment.

At no point can you call a machine that just looks up already-played hands to deal with what is in front of it "learning". The word "learning" does not appear anywhere in the article. "Learned" is used once, to describe the insight that the person who created the program gained about the game of poker.

It is completely a massive database of pre-computed game situations. I would re-read the article and look up more information about the code.
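The play-time lookup the article describes could be sketched roughly like this. The table keys, the probabilities, and the function name `choose_action` are all invented for illustration; this is an assumption about the general shape of a strategy lookup, not the program's actual storage format.

```python
import random

# Hypothetical precomputed table: (hole cards, betting history) -> action probabilities.
# The keys and numbers are invented for illustration only.
STRATEGY_TABLE = {
    ("AA", "raise"): {"fold": 0.0, "call": 0.2, "raise": 0.8},
    ("72o", "raise"): {"fold": 0.9, "call": 0.1, "raise": 0.0},
}

def choose_action(hole_cards, history, rng=random):
    """Look up the stored probabilities and sample one action from them."""
    probs = STRATEGY_TABLE[(hole_cards, history)]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

print(choose_action("72o", "raise", random.Random(0)))
```

Nothing is computed about the game at this stage; all the work went into filling the table beforehand.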
post #43 of 60
Quote:
Originally Posted by RiverOfIce View Post

No, there is no learning.
At no point can you call a machine that just looks up already-played hands to deal with what is in front of it "learning". The word "learning" does not appear anywhere in the article. "Learned" is used once, to describe the insight that the person who created the program gained about the game of poker.

It is completely a massive database of pre-computed game situations. I would re-read the article and look up more information about the code.
Except you would be wrong and may want to look deeper yourself. You are getting your information from a non-technical article for the masses.

1) How was this database built? The system itself developed the strategy... hence, machine learning. They used self-play training (their CFR+ algorithm) running on a cluster to build the database.
Quote:
"We had this training phase where the program started off playing uniform random against itself," meaning that "it had no idea what it was doing other than following the rules of the game," explains Michael Johanson, a computer scientist also at the University of Alberta, and a co-author of the study. But as the computer played itself, it got better and updated its strategy.

2) Here's Cepheus page: http://poker.srv.ualberta.ca/about Smells like machine learning to me....
Quote:
Cepheus accomplished this goal with no human expert help, only being given the rules of the game. It was trained against itself, playing the equivalent of more than a billion, billion hands of poker. With each hand it improved its play, refining itself closer and closer to the perfect solution.

3) Understanding the complexities of imperfect-information game theory... this stuff is VERY hard. It would be completely ignorant to believe that they could map out every single possible combination and imperatively program something to solve that. How many possible permutations are there? Go do the math... how many digits are we looking at? Playing through the possible combinations even at one per second would take something on the order of longer than the age of the universe for a perfect game. For this system, they just used more games of poker than the whole of humanity has ever played.

4) While the paper is behind a paywall, I found an older 2012 document by the same authors: https://webdocs.cs.ualberta.ca/~johanson/publications/poker/2012-aaai-cfr-br/2012-aaai-cfr-br-poster.pdf If they are using Counterfactual Regret Minimization, that's game theory. If the system is running it, it is in fact learning.

5) Professor Michael Bowling: http://webdocs.cs.ualberta.ca/~bowling/
Quote:
My research focuses on machine learning, games, and robotics, and I'm particularly fascinated by the problem of how computers can learn to play games through experience. I am the leader of the Computer Poker Research Group, which has built some of the best poker playing programs on the planet. The programs have won international AI competitions as well as being the first to beat top professional players in a meaningful competition. I am also a principal investigator in the Reinforcement Learning and Artificial Intelligence (RLAI) group and the Alberta Ingenuity Centre for Machine Learning (AICML). I completed my Ph.D. at Carnegie Mellon University, where my dissertation was focused on multiagent learning and I was extensively involved in the RoboCup initiative.

6) Michael Bradley Johanson http://webdocs.cs.ualberta.ca/~johanson/
Quote:
I'm a seventh year Ph.D. student at the University of Alberta in Edmonton, Canada, studying artificial intelligence in the Department of Computing Science. I like to study problems that humans find challenging and intriguing, and find ways to program (or, more accurately, train) computers to perform as well or better than the best human experts. Games are great examples of these types of problems. Human experts can spend years studying a game like poker or chess and play the game at a high level of skill, and the clearly defined rules and goals allow us to make computer programs that can compete against them. By pitting human intelligence against artificial intelligence, we can directly measure the progress of our research towards producing computer agents that make good decisions.
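For anyone who wants to see what "started off playing uniform random against itself and updated its strategy" can look like in code, here is a minimal sketch of plain Counterfactual Regret Minimization on Kuhn poker, a three-card toy game. It is not the Cepheus code and not CFR+; the class name `Node`, the functions `cfr` and `train`, and the card encoding (1 = Jack, 2 = Queen, 3 = King) are assumptions made for this example.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check or fold), b = bet (or call)

class Node:
    """Regret and strategy statistics for one information set."""
    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def strategy(self, reach_weight):
        # Regret matching: play in proportion to positive accumulated regret.
        pos = [max(r, 0.0) for r in self.regret_sum]
        total = sum(pos)
        strat = [p / total for p in pos] if total > 0 else [0.5, 0.5]
        for i in range(2):
            self.strategy_sum[i] += reach_weight * strat[i]
        return strat

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]

nodes = {}

def cfr(cards, history, p0, p1):
    """Return the expected utility for the player to act at `history`."""
    player = len(history) % 2
    opponent = 1 - player
    if len(history) >= 2:
        higher = cards[player] > cards[opponent]
        if history[-1] == "p":
            if history == "pp":
                return 1 if higher else -1  # both checked: showdown for 1
            return 1                        # opponent folded to a bet
        if history[-2:] == "bb":
            return 2 if higher else -2      # bet and call: showdown for 2
    node = nodes.setdefault(str(cards[player]) + history, Node())
    strat = node.strategy(p0 if player == 0 else p1)
    util = [0.0, 0.0]
    node_util = 0.0
    for i, action in enumerate(ACTIONS):
        if player == 0:
            util[i] = -cfr(cards, history + action, p0 * strat[i], p1)
        else:
            util[i] = -cfr(cards, history + action, p0, p1 * strat[i])
        node_util += strat[i] * util[i]
    # Regrets are weighted by how likely the opponent was to reach this point.
    opp_reach = p1 if player == 0 else p0
    for i in range(2):
        node.regret_sum[i] += opp_reach * (util[i] - node_util)
    return node_util

def train(iterations):
    nodes.clear()
    deals = list(permutations([1, 2, 3], 2))  # every way to deal two of three cards
    total = 0.0
    for _ in range(iterations):
        for cards in deals:
            total += cfr(cards, "", 1.0, 1.0)
    return total / (iterations * len(deals))

value = train(5000)
print("average value for the first player:", round(value, 4))
for key in sorted(nodes):
    print(key, [round(p, 2) for p in nodes[key].average_strategy()])
```

The known game value of Kuhn poker for the first player is -1/18, so the printed average should settle near that. The table of average strategies that falls out at the end is the "database"; the loop that produced it is the training.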

Edited by DuckieHo - 1/14/15 at 8:42pm
post #44 of 60
BTW... you can actually play against the system: http://poker.srv.ualberta.ca/play
post #45 of 60
I can beat it ;)
post #46 of 60
Quote:
Originally Posted by DuckieHo View Post

3) Understanding the complexities of imperfect-information game theory... this stuff is VERY hard. It would be completely ignorant to believe that they could map out every single possible combination and imperatively program something to solve that. How many possible permutations are there? Go do the math... how many digits are we looking at? Playing through the possible combinations even at one per second would take something on the order of longer than the age of the universe for a perfect game. For this system, they just used more games of poker than the whole of humanity has ever played.

I think you are giving way too much credit in terms of how the system is being trained.

From reading the article and the general information about it, it looks like the hard part was training it to learn the general rules. Once it was playing against itself to build up almost all of the possible hands, it was compiling statistics on which play is better in which situation, out of billions upon billions of outcomes.

The basic idea behind it isn't as complicated as you make it sound. You play hands, you record for each hand whether it eventually led to a win, loss or tie (which is the reason they played billions of self-run games in the first place), and over time you refine those statistics. Yes, there is a lot of math involved, but since all the data is in a database, the math isn't that complicated: it comes down mostly to selecting the play with the highest chance to win.
If a specific play keeps bringing a loss, it will stop making it, but it will not learn whether the person playing against it used that hand specifically to push the system into another play.

The system doesn't "learn" how to play better. It updates its statistics on what gave a better or worse outcome. Updating statistics isn't really learning, at least as I see it.
It would be learning if it could look at the player it is playing against, learn his moves, and learn how to beat that person specifically. This is how good poker players play, not just by statistics.
Plain statistics and outcomes aren't even close to an AI.
post #47 of 60
Quote:
Originally Posted by Defoler View Post

I think you are giving way too much credit in terms of how the system is being trained.

From reading the article and the general information about it, it looks like the hard part was training it to learn the general rules. Once it was playing against itself to build up almost all of the possible hands, it was compiling statistics on which play is better in which situation, out of billions upon billions of outcomes.

The basic idea behind it isn't as complicated as you make it sound. You play hands, you record for each hand whether it eventually led to a win, loss or tie (which is the reason they played billions of self-run games in the first place), and over time you refine those statistics. Yes, there is a lot of math involved, but since all the data is in a database, the math isn't that complicated: it comes down mostly to selecting the play with the highest chance to win.
If a specific play keeps bringing a loss, it will stop making it, but it will not learn whether the person playing against it used that hand specifically to push the system into another play.

The system doesn't "learn" how to play better. It updates its statistics on what gave a better or worse outcome. Updating statistics isn't really learning, at least as I see it.
It would be learning if it could look at the player it is playing against, learn his moves, and learn how to beat that person specifically. This is how good poker players play, not just by statistics.
Plain statistics and outcomes aren't even close to an AI.
Except... I am not. This is an imperfect-information game... things are NOT simple. The odds are not easily calculated. If I have two cards, all I know is that the other player does NOT have those same cards. So the unseen cards are 52 cards - 2 cards that I have - 5 board cards = 45, and those can be ordered in 45! (or 1.1962222e+56) possible ways. There are optimization techniques, and you also have to consider the moves made to get to that point. However, it's a massive number. Humans cannot program the solution directly.
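For anyone who wants to check the arithmetic, here is a quick sketch. Note that 45! counts every ordering of the 45 unseen cards, which is a different quantity from the number of distinct two-card hands the opponent could hold; neither figure is the size of the full game, which also depends on the betting sequences. The variable names are just for this example.

```python
import math

unseen = 52 - 2 - 5                     # cards not in my hand or on the board
orderings = math.factorial(unseen)      # 45!: every ordering of the unseen cards
opponent_hands = math.comb(unseen, 2)   # unordered two-card holdings for the opponent

print(f"{orderings:.7e}")   # 1.1962222e+56
print(opponent_hands)       # 990
```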

You are vastly under-estimating the complexity. The system absolutely learns how to play better. What is learning? "Act of acquiring new, or modifying and reinforcing, existing knowledge, behaviors, skills, values, or preferences". It is generally accepted that machines can learn.

Poker players have to rely on observing people because humans don't have the capability to process enough information.

Updating statistics alone is not learning. Updating statistics and modifying behavior is obviously learning.

Gather information, evaluate information, modify if necessary.... that's learning in a nutshell.
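As a toy illustration of that gather, evaluate, modify loop, here is a minimal regret-matching sketch for rock-paper-scissors against a fixed opponent. The opponent's probabilities and the function name `regret_matching` are assumptions made for the example; this is not how any particular poker program is written.

```python
ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[mine][theirs]: +1 win, 0 tie, -1 loss for the learner.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching(opponent_probs, rounds):
    regrets = [0.0, 0.0, 0.0]
    strategy = [1 / 3] * 3  # start out uniform random
    for _ in range(rounds):
        # Evaluate: expected payoff of each action against the opponent.
        values = [sum(PAYOFF[a][o] * opponent_probs[o] for o in range(3))
                  for a in range(3)]
        current = sum(strategy[a] * values[a] for a in range(3))
        # Update the statistics: how much better each action would have done.
        for a in range(3):
            regrets[a] += values[a] - current
        # Modify behavior: play in proportion to positive regret.
        positive = [max(r, 0.0) for r in regrets]
        total = sum(positive)
        strategy = [p / total for p in positive] if total > 0 else [1 / 3] * 3
    return strategy

# Hypothetical opponent who throws rock 60% of the time.
final = regret_matching([0.6, 0.2, 0.2], rounds=10)
print(dict(zip(ACTIONS, (round(p, 3) for p in final))))
```

The statistics (regrets) only matter because they feed back into the strategy: against a rock-heavy opponent the learner drifts from uniform random to playing paper.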

You forget that humans also rely on statistics when making rational decisions... always. "Based on my prior experience and current observations, I believe the outcome will be..." That's statistics.
Edited by DuckieHo - 1/14/15 at 10:55pm
post #48 of 60
Now we just need a computer that learns and will eventually make the perfect game.
     
post #49 of 60
Quote:
Originally Posted by Sadmoto View Post

Now we just need a computer that learns and will eventually make the perfect game.
The Settlers of Catan?
post #50 of 60
The human knows that the computer is learning, but the computer is not conscious of this. That means you can potentially manipulate it, for example by teaching it wrong moves for a long time and then changing your style of play.
If it learned prior to the game, there is nothing a human can do. The computer will not feel tired, hungry or frustrated; it just applies its formula. It doesn't know impatience or excitement, it doesn't make mistakes like humans do, and it has a perfect memory.