(Btw, reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers!)
vessenes 3 days ago [-]
Cool!
From the prompt it looks like you don’t give the LLMs a harness to step through games or simulate - is that correct? If so I’d suggest it’s not a level playing field vs human-written bots - if the humans are allowed to watch some games, that is.
levmiseri 3 days ago [-]
That’s true, I’m trying to figure out a better testing environment with a feedback loop.
I did try letting the models iterate on the bot code based on a summary of an end-of-game ‘report’, but that showed only marginal improvements vs. zero-shot
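A minimal sketch of that kind of report-driven loop, not the code actually used for the site. `generate` (a model call that returns bot source) and `play_match` (runs a match and returns a score plus a text report) are hypothetical callables supplied by the caller, and the prompt layout is an assumption:

```python
from typing import Callable, Tuple


def iterate_bot(
    generate: Callable[[str], str],                 # hypothetical: prompt -> bot source code
    play_match: Callable[[str], Tuple[float, str]],  # hypothetical: code -> (score, report)
    prompt: str,
    rounds: int = 3,
) -> Tuple[str, float]:
    """Return the best (code, score) seen across the zero-shot try and `rounds` revisions."""
    code = generate(prompt)                 # zero-shot attempt
    score, report = play_match(code)
    best = (code, score)
    for _ in range(rounds):
        # Feed the previous code and its end-of-game report back to the model.
        revised_prompt = (
            f"{prompt}\n\nPrevious bot:\n{code}\n\nMatch report:\n{report}"
        )
        code = generate(revised_prompt)
        score, report = play_match(code)
        if score > best[1]:                 # keep the best version, not just the latest
            best = (code, score)
    return best
```

Keeping the best-scoring version rather than the last one matters here, since a revision based on a short report can easily make the bot worse.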
vessenes 3 days ago [-]
In my mind, I’d give it the following:
Step(n) - up to n steps forward
RunTil(movement|death|??) - iterate until something happens
Board(n) - board at end of step n
BoardAscii(n) - ascii rep of same
Log(m,n) - log of what happened between step m and n
Probably all this could be accomplished with a state structure and a rendering helper.
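Something along these lines, as a rough sketch on a toy grid rather than the real game: the `Harness` class, its fields, and the bot signature are all made up for illustration, and only movement events are modeled (no deaths), so `run_til("death")` would simply run out its step budget.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pos = Tuple[int, int]


@dataclass
class Harness:
    """Toy replay harness: keeps one board snapshot and one event list per step."""
    width: int
    height: int
    units: Dict[str, Pos]                                # unit id -> (x, y)
    bot: Callable[[Dict[str, Pos]], Dict[str, Pos]]      # state -> per-unit (dx, dy)
    history: List[Dict[str, Pos]] = field(default_factory=list)
    events: List[List[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.history.append(dict(self.units))            # snapshot for step 0
        self.events.append([])

    def step(self, n: int = 1) -> int:
        """Advance n steps; return the index of the last step played."""
        for _ in range(n):
            tick: List[str] = []
            for uid, (dx, dy) in self.bot(dict(self.units)).items():
                if uid not in self.units:
                    continue
                x, y = self.units[uid]
                nx = min(max(x + dx, 0), self.width - 1)   # clamp to the board
                ny = min(max(y + dy, 0), self.height - 1)
                if (nx, ny) != (x, y):
                    self.units[uid] = (nx, ny)
                    tick.append(f"move {uid} ({x},{y})->({nx},{ny})")
            self.history.append(dict(self.units))
            self.events.append(tick)
        return len(self.history) - 1

    def run_til(self, kind: str, max_steps: int = 1000) -> int:
        """Step until an event starting with `kind` occurs, or max_steps pass."""
        for _ in range(max_steps):
            n = self.step(1)
            if any(e.startswith(kind) for e in self.events[n]):
                return n
        return len(self.history) - 1

    def board(self, n: int) -> Dict[str, Pos]:
        return dict(self.history[n])

    def board_ascii(self, n: int) -> str:
        grid = [["."] * self.width for _ in range(self.height)]
        for uid, (x, y) in self.history[n].items():
            grid[y][x] = uid[0]                          # first letter of the unit id
        return "\n".join("".join(row) for row in grid)

    def log(self, m: int, n: int) -> List[str]:
        """Events that happened after step m, up to and including step n."""
        return [e for tick in self.events[m + 1 : n + 1] for e in tick]
```

Usage would look like `h = Harness(3, 2, {"a": (0, 0)}, my_bot)`, then `h.step(2)` followed by `print(h.board_ascii(2))` and `h.log(0, 2)`. Since every step is snapshotted, the model can look at any earlier board without re-running the game.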
Do you let humans review opposing team’s code?
javadhu 4 days ago [-]
Cool project, this is my first time seeing a project like this using LLMs. Took me a while to understand what's happening on the home page.
A question though: why did such powerful models as Gemini 3.1 fail against the Clowder bot? Is it because of inefficient code, or because the LLMs did not handle edge cases? Or are they just not as good as humans when it comes to strategy?
levmiseri 4 days ago [-]
I’m not sure honestly. It could be some combination of bad spatial reasoning of the LLMs and lack of any training data for this specific challenge.
You can see replays for all of the matches if you hover over the cells in the table.
neondude 2 days ago [-]
You should check out codingame.com
It has similar battle based objectives
DeathArrow 3 days ago [-]
LLMs need feedback on the outcomes, just like a human does.
Show HN: Yare 2 – Programmable RTS game - https://news.ycombinator.com/item?id=32394902 - Aug 2022 (26 comments)
Show HN: Yare.io – game where you control units with JavaScript - https://news.ycombinator.com/item?id=27365961 - June 2021 (64 comments)