OpenBench Testing Framework

Finished
Styxdo	Reckless	world_min_deviation_test	diff	8.0+0.08	LLR: -2.28 (-2.25, 2.89) [0.00, 4.00] Games: 19844 W: 4953 L: 4998 D: 9893 Ptnml(0-2): 108, 2434, 4870, 2415, 95	World's 2 lines, retrain v25
Styxdo	Reckless	world_min_deviation_test	diff	N=25000	LLR: -2.59 (-2.25, 2.89) [0.00, 4.00] Games: 48502 W: 16093 L: 16054 D: 16355 Ptnml(0-2): 1802, 5371, 9844, 5454, 1780	World's 2 lines, retrain v25
Styxdo	Reckless	weights_s2_same_as_s1	diff	8.0+0.08	LLR: -2.33 (-2.25, 2.89) [0.00, 4.00] Games: 20566 W: 4989 L: 5033 D: 10544 Ptnml(0-2): 77, 2557, 5057, 2517, 75	S2{ 640: 1.0, 1536: 0.5, OB: 1.0}
Styxdo	Reckless	weights_s1_v25_s2_equal	diff	8.0+0.08	LLR: -2.26 (-2.25, 2.89) [0.00, 4.00] Games: 6152 W: 1475 L: 1558 D: 3119 Ptnml(0-2): 16, 792, 1549, 697, 22	Same S1 as v25, S2 equal weights for 640 and 1536
Styxdo	Reckless	weights_s1_v25_s2_equal	diff	N=25000	LLR: -2.33 (-2.25, 2.89) [0.00, 4.00] Games: 11852 W: 3838 L: 3934 D: 4080 Ptnml(0-2): 443, 1325, 2454, 1293, 411	Same S1, S2 same weight for 640 and 1536
Styxdo	Reckless	v25-f71908d1	diff	40.0+0.40	LLR: 2.89 (-2.25, 2.89) [0.00, 4.00] Games: 31488 W: 7463 L: 7250 D: 16775 Ptnml(0-2): 28, 3602, 8267, 3823, 24	S1: higher weight to 640; S2: higher weight to 1536 (S1: 640=1, 1536=0.5; S2: 640=0.5, 1536=1)
Styxdo	Reckless	v25-f71908d1	diff	8.0+0.08	LLR: 2.96 (-2.25, 2.89) [0.00, 4.00] Games: 11194 W: 2845 L: 2675 D: 5674 Ptnml(0-2): 45, 1291, 2764, 1443, 54	S1: higher weight to 640; S2: higher weight to 1536 (S1: 640=1, 1536=0.5; S2: 640=0.5, 1536=1)
Styxdo	Reckless	s1_lr_warmup	diff	8.0+0.08	LLR: -2.25 (-2.25, 2.89) [0.00, 4.00] Games: 27434 W: 6827 L: 6847 D: 13760 Ptnml(0-2): 121, 3372, 6770, 3314, 140	LR Warmup over 1024 batches
Styxdo	Reckless	s1_wdl_warmup_25_5	diff	8.0+0.08	LLR: -2.25 (-2.25, 2.89) [0.00, 4.00] Games: 21814 W: 5409 L: 5446 D: 10959 Ptnml(0-2): 91, 2688, 5396, 2631, 101	S1 WDL Warmup, Includes v24 changes.
Styxdo	Reckless	1024_test_branch	diff	40.0+0.40	LLR: 2.91 (-2.25, 2.89) [0.00, 4.00] Games: 10256 W: 2522 L: 2365 D: 5369 Ptnml(0-2): 6, 1189, 2593, 1322, 18	LTC
Styxdo	Reckless	1024_test_branch	diff	N=25000	Elo: 9.17 +- 3.43 (95%) [N=20000] Games: 20662 W: 7235 L: 6690 D: 6737 Ptnml(0-2): 723, 2111, 4248, 2396, 853	Fixed Node
Styxdo	Reckless	1024_test_branch	diff	8.0+0.08	LLR: 3.00 (-2.25, 2.89) [0.00, 4.00] Games: 14796 W: 3797 L: 3612 D: 7387 Ptnml(0-2): 64, 1751, 3604, 1894, 85	STC
Styxdo	Reckless	896_test_branch	diff	8.0+0.08	LLR: 2.96 (-2.25, 2.89) [0.00, 4.00] Games: 7204 W: 1865 L: 1705 D: 3634 Ptnml(0-2): 26, 827, 1761, 937, 51	Retrain same model, run 1 (for PR message)
Styxdo	Reckless	896_test_branch	diff	40.0+0.40	LLR: 2.90 (-2.25, 2.89) [0.00, 4.00] Games: 7728 W: 1864 L: 1717 D: 4147 Ptnml(0-2): 10, 836, 2027, 979, 12	SF batch size standardization, best STC so far on short training, testing LTC
Styxdo	Reckless	896_test_branch	diff	8.0+0.08	LLR: 2.92 (-2.25, 2.89) [0.00, 4.00] Games: 8102 W: 2137 L: 1978 D: 3987 Ptnml(0-2): 33, 911, 2020, 1038, 49	Short Training with SF batch standardization
Styxdo	Reckless	896_test_branch	diff	8.0+0.08	LLR: 2.92 (-2.25, 2.89) [0.00, 4.00] Games: 10020 W: 2633 L: 2465 D: 4922 Ptnml(0-2): 55, 1149, 2439, 1307, 60	896, short training, no wdl warmup, S2 CosineDecay
Styxdo	Reckless	896_test_branch	diff	8.0+0.08	LLR: 2.89 (-2.25, 2.89) [0.00, 4.00] Games: 7060 W: 1874 L: 1717 D: 3469 Ptnml(0-2): 29, 811, 1711, 932, 47	L1=896, this time on the chonky training
Styxdo	Reckless	896_test_branch	diff	8.0+0.08	LLR: 2.91 (-2.25, 2.89) [0.00, 4.00] Games: 36144 W: 9296 L: 9049 D: 17799 Ptnml(0-2): 205, 4231, 8969, 4446, 221	Consistent S1 WDL Scheduler with previous training
Styxdo	Reckless	896_test_branch	diff	8.0+0.08	LLR: 2.99 (-2.25, 2.89) [0.00, 4.00] Games: 103882 W: 26251 L: 25789 D: 51842 Ptnml(0-2): 559, 12548, 25322, 12896, 616	Update nnue.rs
Styxdo	Reckless	896_test_branch	diff	N=25000	LLR: 3.06 (-2.25, 2.89) [0.00, 4.00] Games: 4828 W: 1754 L: 1538 D: 1536 Ptnml(0-2): 158, 486, 992, 538, 240	Update nnue.rs
Styxdo	Reckless	pre_batches_s1_s2	diff	8.0+0.08	LLR: -2.27 (-2.25, 2.89) [0.00, 4.00] Games: 11382 W: 2916 L: 2988 D: 5478 Ptnml(0-2): 93, 1380, 2799, 1344, 75	Added a few pre-s1 and pre-s2 batches. Model branch: styx_warmup_expt1
Styxdo	Reckless	tb-support-3	diff	40.0+0.40	LLR: 2.90 (-2.25, 2.89) [0.00, 4.00] Games: 11862 W: 2805 L: 2659 D: 6398 Ptnml(0-2): 10, 1126, 3508, 1282, 5	UHO LTC?
Styxdo	Reckless	reduced_training_time_expt4	diff	8.0+0.08	LLR: -1.04 (-2.25, 2.89) [0.00, 4.00] Games: 430 W: 89 L: 142 D: 199 Ptnml(0-2): 8, 72, 103, 29, 3	Since partial blending seemed to work, this is full blend on same slim model that was -1 elo.
Styxdo	Reckless	s0_lite_test2	diff	8.0+0.08	LLR: -2.26 (-2.25, 2.89) [0.00, 4.00] Games: 4366 W: 1051 L: 1143 D: 2172 Ptnml(0-2): 24, 578, 1072, 484, 25	Lowered the S0 batches to 15.
Styxdo	Reckless	s0_lite_test1	diff	8.0+0.08	LLR: -2.26 (-2.25, 2.89) [0.00, 4.00] Games: 3682 W: 885 L: 981 D: 1816 Ptnml(0-2): 26, 501, 877, 417, 20	Introduction of S0 as discussed. This is 'lite' version, full training on 3080 to follow.
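
The statistics column in each row carries enough information to recompute an Elo estimate by hand. The sketch below parses the counts from one row and applies the standard logistic Elo formula to the win/draw/loss totals. It also derives a rough 95% interval from the pentanomial (game-pair) counts. The function names are hypothetical, and the interval method (normal approximation with z = 1.96 on the pair scores) is an assumption, not necessarily the exact calculation OpenBench performs.

```python
import math
import re

# Matches the count portion of a statistics string as it appears in the rows above.
ROW_PATTERN = re.compile(
    r"Games: (\d+) W: (\d+) L: (\d+) D: (\d+) "
    r"Ptnml\(0-2\): (\d+), (\d+), (\d+), (\d+), (\d+)"
)


def parse_counts(stats):
    """Extract games, W/L/D, and pentanomial counts from a statistics string."""
    match = ROW_PATTERN.search(stats)
    if match is None:
        raise ValueError("no game counts found in statistics string")
    nums = [int(g) for g in match.groups()]
    return {
        "games": nums[0],
        "wins": nums[1],
        "losses": nums[2],
        "draws": nums[3],
        "ptnml": nums[4:],
    }


def score_to_elo(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_from_wdl(wins, losses, draws):
    """Point estimate of Elo from win/loss/draw totals."""
    games = wins + losses + draws
    return score_to_elo((wins + 0.5 * draws) / games)


def elo_interval_from_ptnml(ptnml, z=1.96):
    """Approximate Elo interval from game-pair counts (assumed method).

    Each pair scores 0, 0.5, 1, 1.5, or 2 points, normalised here to 0..1.
    """
    pairs = sum(ptnml)
    points = [0.0, 0.25, 0.5, 0.75, 1.0]
    mean = sum(n * x for n, x in zip(ptnml, points)) / pairs
    variance = sum(n * (x - mean) ** 2 for n, x in zip(ptnml, points)) / pairs
    half_width = z * math.sqrt(variance / pairs)
    return score_to_elo(mean - half_width), score_to_elo(mean + half_width)


if __name__ == "__main__":
    # Statistics string copied from the fixed-node 1024_test_branch row.
    row = ("Elo: 9.17 +- 3.43 (95%) [N=20000] Games: 20662 W: 7235 L: 6690 "
           "D: 6737 Ptnml(0-2): 723, 2111, 4248, 2396, 853")
    counts = parse_counts(row)
    elo = elo_from_wdl(counts["wins"], counts["losses"], counts["draws"])
    low, high = elo_interval_from_ptnml(counts["ptnml"])
    print(f"Elo {elo:.2f}, approx. 95% interval [{low:.2f}, {high:.2f}]")
```

For the fixed-node 1024_test_branch row, the point estimate from W/L/D agrees with the reported Elo of 9.17 to two decimals. Note that the pentanomial counts sum to half the game count, since each entry records a pair of games.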