
Re: [PATCH] t4062: stop using repetition in regex




René Scharfe <l.s.r@xxxxxx> writes:

> There could be any characters except NUL and LF between the 4096 zeros
> and "0$" for the latter to match wrongly, no?  So there are 4095
> opportunities for the misleading pattern in a page, with probabilities
> like this:
>
>   0$                          1/256 * 2/256
>   .0$         254/256       * 1/256 * 2/256
>   ..0$       (254/256)^2    * 1/256 * 2/256
>   .{3}0$     (254/256)^3    * 1/256 * 2/256
>   ...
>   .{4094}0$  (254/256)^4094 * 1/256 * 2/256
>
> That sums up to ca. 1/256 (did that numerically).  Does that make
> sense?
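René's numeric sum can be reproduced with a short Python sketch. Like the table, it assumes that every byte in the page is independent and uniformly distributed over 256 values. The helper name false_match_probability is hypothetical. Because the terms form a geometric series, the total is (1/256) * (1 - (254/256)^4095), which is indistinguishable from 1/256.

```python
# Sketch: chance that "0$" matches somewhere in a random page, under the
# assumed model from the table above: a match at offset k needs k bytes that
# are neither NUL nor LF (254/256 each), then a '0' (1/256), then a NUL or
# LF (2/256).

def false_match_probability(positions: int = 4095) -> float:
    """Sum (254/256)^k * 1/256 * 2/256 for k = 0 .. positions-1."""
    total = 0.0
    for k in range(positions):
        total += (254 / 256) ** k * (1 / 256) * (2 / 256)
    return total

p = false_match_probability()
print(f"p = {p:.6f}, about 1/{1 / p:.0f}")  # p = 0.003906, about 1/256
```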

Yes, thanks.  I think the number would be different for "^0*$" (the
above is for "0$"), which would bring it down to ~1/30000.  But as I
said, allowing an additional false success rate is unnecessary (even
if it is minuscule enough to be acceptable), so let's take the 64*64
patch.

>> So we are saying that we accept ~1/100 false success rate, but
>> additional ~1/30000 is unacceptable.
>> 
>> I do not know if I buy that argument, but I do think that additional
>> false success rate, even if it is minuscule, is unnecessary.  So as
>> long as everybody's regexp library is happy with "^0{64}{64}$",
>> let's use that.
>
> The parentheses are necessary ("^(0{64}){64}$"), at least on OpenBSD.

Sorry, what I wrote was merely a typo; the version of yours that I
applied did have the parentheses, so we are good.
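The intended semantics of the parenthesized pattern can be sanity-checked with Python's re module. It serves purely as a stand-in here: it is not the POSIX regcomp()/regexec() library that git's test exercises, so it says nothing about how OpenBSD treats the unparenthesized "^0{64}{64}$". Splitting 4096 into 64*64 keeps each interval bound small. One plausible reason for doing so (an assumption, not stated in the thread) is that POSIX only guarantees RE_DUP_MAX to be at least 255. The helper name matches_exactly_4096_zeros is hypothetical.

```python
import re

# Stand-in check of the thread's pattern: 64 repetitions of a group that
# itself matches exactly 64 zeros, i.e. exactly 64 * 64 = 4096 zeros.
PATTERN = re.compile(r"^(0{64}){64}$")

def matches_exactly_4096_zeros(text: str) -> bool:
    """True if PATTERN matches, mirroring how a grep-style search applies it."""
    return PATTERN.search(text) is not None

print(matches_exactly_4096_zeros("0" * 4096))        # True
print(matches_exactly_4096_zeros("0" * 4095))        # False
print(matches_exactly_4096_zeros("0" * 4097))        # False
print(matches_exactly_4096_zeros("0" * 4096 + "1"))  # False
```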

Thanks.