
Winter Marathon 0 inaccuracies game

@ #19
"""My point is that the inaccuracy/mistake/blunder count from the analysis is a rough guide, because it excludes moves with centipawn loss less than 50."""

That is wrong. A blunder is when you lose >= 300 cp; is that a rough guide to you? When you face SF after an error like that, it will probably reach a 99% win probability. I don't believe that is rough.

Take a CPL >= 100: that is a mistake, and SF is still very dangerous with that score difference.

Take a CPL >= 50: that is an inaccuracy, and SF is still dangerous with that score difference.

It is just fine that Lichess did not bother categorizing a CPL < 50 cp, because they are analyzing every move with limited time, and this depends on the hardware too.
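For reference, here is a minimal sketch (Python, purely for illustration) of the classification being discussed, using the 50/100/300 cp thresholds mentioned in this thread; the exact rules Lichess applies server-side may differ.

```python
# Minimal sketch of the move classification discussed above, using the
# centipawn-loss thresholds mentioned in this thread (50/100/300 cp).
# Not the actual Lichess implementation, just an illustration.

def classify(cp_loss: int) -> str:
    """Label a single move by its centipawn loss."""
    if cp_loss >= 300:
        return "blunder"
    if cp_loss >= 100:
        return "mistake"
    if cp_loss >= 50:
        return "inaccuracy"
    return "ok"  # losses under 50 cp are never flagged at all

print(classify(40))   # ok  (never shows up in the counts)
print(classify(120))  # mistake
print(classify(350))  # blunder
```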
@RealKool: Well, if you're going to completely misread what I write, then there's not much I can do for you :)

I said multiple times that the inaccuracy/mistake/blunder count is useful for finding big mistakes.

It is only a rough guide to game quality because a game can be quite poorly played and be 0/0/0.

Yes, if it shows that you have big mistakes, that is helpful. If it doesn't, though, that doesn't necessarily mean you played all that well.

To put it another way, as a test for a high-quality game, the inaccuracy/mistake/blunder count is susceptible to false positives.

False negatives are incredibly rare (if it says you made a big mistake, you almost certainly did), so it's useful for detecting games that were played very poorly by engine-standards.

Again, my point is that the inference from "The game was 0/0/0" to "The game was well-played" is a very weak one. That is why it's a rough guide.

I have never said that the big mistakes (over 50 cp) are not important. Why you are trying to convince me they are is baffling :)
@ #22
I notice you are changing colors now.

First you say very rough guide.
Then rough guide.

And now ...
"""inaccuracy/mistake/blunder count is useful for finding big mistakes"""
Heh, I'm not changing colors at all. I don't much care whether we call it a "very rough guide" or a "rough guide". Both work for me.

Also, nothing in my post #22 was different than what I said in my previous post.

In post #19 I already said "The inaccuracy/mistake/blunder count is important..." and "That latter measure is quite useful for other reasons, like finding your biggest mistakes, but for overall quality of play it is a very rough guide."

So, as I pointed out in #22, I've already multiple times said that the inaccuracy/mistake/blunder count is important and useful. It's not my fault if you didn't read those parts :)

At any rate, it seems you just misunderstood my claim, and I don't think we actually disagree here.

We both agree that the inaccuracy/mistake/blunder count is important, and I've pointed out that while important, a fairly poorly played game can still get 0/0/0, so it's only a rough (or very rough, I don't care what wording we pick) guide for quality of play.

It's quite good at pointing out especially bad moves, but there are a lot of bad moves it misses, so once the quality of play rises above a certain level, that count gets less useful for discerning differences in quality of play.

As I said, our "disagreement" seems more just a misunderstanding of the claims being advanced, and not a disagreement about the facts.

Cheers!
@ #24

Note that I reacted mainly to your post #19, where you seem to disregard the importance of Blunder/Mistake/Inaccuracy.

I have a simple question to clarify this matter.

Do you still consider a Blunder (CPL >= 300) to be only a very rough guide as one of the measures for determining the quality of play?
@ OneOfTheQ

I have to correct my post #25.

I reacted mainly to your post #11, not #19.
@RealKool: Yes, I consider the blunder count especially to be a very rough guide.

If the analysis shows that a game contains blunders (centipawn loss >= 300), then that is a strong indication of poor game quality. In that case it is very useful.

However, if the count shows 0 moves with centipawn loss over 300, then I don't really know much about the game quality.

It could be a fantastic game of amazing quality, or it could be a really terrible game where both players traded 150-centipawn mistakes the entire game.

The same applies to the smaller centipawn losses of mistakes and inaccuracies. When a mistake or inaccuracy is present, that's a strong indication of poor moves, and then the count is very useful.

When the count is 0, we don't really know what the game quality is like. The players might have just traded 40 centipawn mistakes the whole game, in which case the game could be 0/0/0 and of very low quality. Alternatively, it might have been a fantastic game where neither player really had any centipawn loss at all.

The fact that the count is 0 doesn't let us conclude much at all about the quality of the game.

That's been my point the whole time. When those counts are >0, then they're very useful, since we can be quite sure that some big mistakes were made.

When those counts are 0, though, we can't really infer anything from that.

Essentially, we can get two sorts of results from the counts:

1) A given count (inaccuracy, mistake, or blunder) is >0.

2) A given count (inaccuracy, mistake, or blunder) is 0.

If 1), then that gives us a strong reason to think the game was poorly played, and the test is useful.

If 2), then for the reasons I've given, we can't really conclude whether the game was well played or not. It could be terrible, or it could be great.

I think the issue was you were only thinking of possibility 1). If we are only talking about times when a count is >0, then that count is very useful in identifying those big mistakes.

The problem is that there are lots of serious mistakes that the count misses because of the high threshold, so if it says 0, then you don't know much.

That's where the average centipawn loss becomes helpful, although it too is not perfect, of course.
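To make that concrete, here is a small Python sketch with made-up per-move centipawn losses: both games come out 0/0/0 under the 50/100/300 thresholds, yet their average centipawn loss separates them immediately.

```python
# Illustration of the point above, with invented numbers: two games can
# both be 0/0/0 under the 50/100/300 cp thresholds while average
# centipawn loss tells very different stories.

def counts_and_acpl(losses):
    blunders     = sum(l >= 300 for l in losses)
    mistakes     = sum(100 <= l < 300 for l in losses)
    inaccuracies = sum(50 <= l < 100 for l in losses)
    acpl = sum(losses) / len(losses)
    return (inaccuracies, mistakes, blunders), acpl

sloppy_game = [40, 35, 45, 40, 30, 45, 40, 35]  # constant ~40 cp leaks
clean_game  = [5, 0, 10, 5, 0, 5, 10, 0]        # near-perfect play

print(counts_and_acpl(sloppy_game))  # ((0, 0, 0), 38.75)
print(counts_and_acpl(clean_game))   # ((0, 0, 0), 4.375)
```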

Cheers!
Yeah, that result makes a bit more sense.

Honestly, I usually just use average centipawn loss as my bird's eye view of how well a side played, since it's a lot more reliable for that purpose.

Of course, it doesn't point out specific moves to focus on, but I find it easier to eyeball the graph for points where the eval dips or rises.

In short, I'm not so concerned with "fixing" the thresholds for those categories. Any choice of threshold will be arbitrary, and not really "right" or "wrong".

It's more a matter of making sure people understand what those numbers mean so they can use them appropriately.

Having said that, the thresholds you suggest do (in my opinion, at least) make more sense than the current ones, which are a bit on the high side for my tastes.

I know that the thresholds can't be adjusted too far down, because otherwise you start seeing a lot of noise, especially with the relatively shallow depths used.

Having said that, I think the size of centipawn loss that starts to matter is smaller than a lot of us would think.

It's not in SF anymore, I think, but I once held a big bullet tournament with SF compiled with different grain size settings.

That setting basically determines the precision of the evaluations; my original intention was to make a lazy man's tactics finder by making it only "care" about larger evaluations.

An interesting thing that came out of the tournament was how quickly performance deteriorated in game play. With evaluation precision at 1, 2, 4 (default), and 8 centipawns, results were all within the margin of error. At 16 centipawns, performance dropped off fairly substantially, and continued to drop off as the evaluations became more coarse (the coarsest-grained evaluation I used was a precision of 128 centipawns; that version of SF did not do well against the others).

I had always assumed that evaluation differences of 8-16 centipawns were probably mostly noise, but the fact that washing out those evals resulted in a significant drop in strength suggests otherwise (at least for players like SF, who have largely eliminated big tactical mistakes).
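To give a rough idea of what a grain size setting like that does (this is not the actual Stockfish code, just an illustrative Python sketch): evaluations are effectively rounded to the nearest multiple of the grain, so small differences vanish as the grain grows.

```python
# Sketch of evaluation "grain size": scores are rounded to the nearest
# multiple of the grain, so fine distinctions disappear as the grain grows.

def coarsen(eval_cp: int, grain: int) -> int:
    """Round an evaluation (in centipawns) to the nearest multiple of grain."""
    return round(eval_cp / grain) * grain

for grain in (1, 8, 16, 128):
    print(grain, [coarsen(e, grain) for e in (12, 37, 90, 140)])

# grain   1 -> [12, 37, 90, 140]   (all distinctions kept)
# grain   8 -> [16, 40, 88, 144]   (small shifts, ordering intact)
# grain  16 -> [16, 32, 96, 144]
# grain 128 -> [0, 0, 128, 128]    (12 cp and 37 cp become indistinguishable)
```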

At the end of the day, all of these metrics are just tools, and they each have their uses. Average centipawn loss and the eval graph are my favorites here, and I rarely use the inaccuracy/mistake/blunder count, but that's a matter of personal preference.

We just have to understand each metric's limitations and use them appropriately. :)
