
Re: [PHP] Catch line indetation




OK, in the interest of science I implemented a test:

The difference in performance is minimal: the string version was faster by
0.0723 seconds in total over a million iterations, which is about 0.07
microseconds per iteration (0.795%). With fewer iterations the regex seemed
to be winning, which I didn't expect, even allowing that regexes might be
faster.

*Conclusion: Just use regexes, because string handling code can get
complicated fast, and I think more complex regexes outperform complicated
string handling. Even just adding one character that ltrim does not trim by
default makes the string handling really hard.*
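
For reference, the two approaches compared in this test can be wrapped as
small functions. This is a minimal sketch, not code from the thread: the
function names are hypothetical, and it assumes that "indentation" means
spaces and tabs only, so both variants are given the same explicit
character set.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the thread.
// Assumption: indentation consists of spaces and tabs only.

// String-function approach: whatever ltrim() strips from the left
// is the indentation, so keep exactly that many leading bytes.
function leadingWhitespaceByString(string $line): string
{
    $trimmed = ltrim($line, " \t");
    return substr($line, 0, strlen($line) - strlen($trimmed));
}

// Regex approach: [ \t]* matches zero or more spaces or tabs at the
// start of the string, so the match succeeds even without indentation.
function leadingWhitespaceByRegex(string $line): string
{
    preg_match('/^[ \t]*/', $line, $matches);
    return $matches[0];
}
```

With the inputs from the original question, both functions return three
spaces for '   <table>...</table>', an empty string for 'hello', and a
single space for ' hello'.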

Note also that the regex here might be compiled on every call, in which case
regexes are the clear winner, since they nearly keep up despite that extra
work. I think PHP might cache the compiled regex though.

Result:
php regex.php
time for string: 9.1010239124298
time for regex: 9.1733591556549

Code:
<?php

define('LINE_LENGTH', 80);
define('ITERATIONS', 1000000);

function getSpaced() {
    $amount = rand(0, LINE_LENGTH);
    $nonSpaceAmount = rand(0, LINE_LENGTH);
    $spaces = str_repeat(' ', $amount);
    $rest = '';
    for ($i = 0; $i < $nonSpaceAmount; ++$i) {
        $rest .= chr(rand(ord('A'), ord('z')));
    }
    return $spaces . $rest;
}

$start = microtime(true);
for ($i = 0; $i < ITERATIONS; ++$i) {
    $string = getSpaced();
    substr($string, 0, strlen($string) - strlen(ltrim($string)));
}
echo "time for string: " . (microtime(true) - $start) . "\n";

$start = microtime(true);
for ($i = 0; $i < ITERATIONS; ++$i) {
    $string = getSpaced();
    preg_match('/^( *)/', $string, $matches);
    //$matches[1];
}
echo "time for regex: " . (microtime(true) - $start) . "\n";
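
One caveat about the listing above: both loops also time getSpaced(), which
calls rand() and chr() up to 80 times per iteration, so a large share of the
roughly 9 seconds is likely input generation rather than the extraction
itself. A variant that builds the inputs before starting the clock might
look like the sketch below. makeLine(), timeIt() and the reduced iteration
count are assumptions made for illustration, and the closure call adds the
same small overhead to both approaches.

```php
<?php
// Sketch only: makeLine(), timeIt() and the smaller ITERATIONS value
// are illustrative assumptions, not part of the original test.

const LINE_LENGTH = 80;
const ITERATIONS = 100000;

// Builds a line of 0..80 spaces followed by 0..80 characters from 'A'..'z'.
function makeLine(): string
{
    $rest = '';
    $restLength = mt_rand(0, LINE_LENGTH);
    for ($i = 0; $i < $restLength; ++$i) {
        $rest .= chr(mt_rand(ord('A'), ord('z')));
    }
    return str_repeat(' ', mt_rand(0, LINE_LENGTH)) . $rest;
}

// Runs $extract over every line and returns the elapsed seconds.
function timeIt(callable $extract, array $lines): float
{
    $start = microtime(true);
    foreach ($lines as $line) {
        $extract($line);
    }
    return microtime(true) - $start;
}

// Generate the inputs up front so both approaches see identical data
// and the generation cost stays outside the measurement.
$lines = [];
for ($i = 0; $i < ITERATIONS; ++$i) {
    $lines[] = makeLine();
}

$byString = function (string $s): string {
    return substr($s, 0, strlen($s) - strlen(ltrim($s)));
};
$byRegex = function (string $s): string {
    preg_match('/^( *)/', $s, $matches);
    return $matches[1];
};

echo "time for string: " . timeIt($byString, $lines) . "\n";
echo "time for regex: " . timeIt($byRegex, $lines) . "\n";
```

The absolute numbers will differ from the result above and depend on the
PHP version and machine, so treat any outcome as indicative only.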


On Sat, 29 Oct 2016 at 15:37 German Geek <geek.de@xxxxxxxxx> wrote:

> String functions are very fast. Regexes have to be compiled under the hood
> before they can take advantage of their speed, and PHP does this behind the
> scenes. So if you are only looking for spaces, the string functions are
> going to run faster, in my humble opinion.
>
> However, I agree that regexes are probably better in any case, because
> they are much more powerful and for someone who understands them, just as
> easy to read if not easier, especially in this example.
>
> The difference in performance is probably not noticeable, especially not
> nowadays. Saving developer time is more important and I would use regexes
> as well.
>
> I could be wrong about regexes being slower. It's just what I read
> somewhere. I guess one would have to do the test on a large input to verify
> on a case by case basis. As far as I understand regexes have to perform
> string functions also, which I think are probably more complicated than in
> this example. Again, something to test.
>
> I would want to know, just out of interest though. :-)
>
> On Sat, 29 Oct 2016 at 12:40 Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx>
> wrote:
>
>
>
> On 28 October 2016 23:33:00 BST, German Geek <geek.de@xxxxxxxxx> wrote:
> >regex is nicer, because it is less code and you can detect any white space
> >etc.
> >
> >However!
> >
> >substring etc will be faster and more understandable to others who do not
> >know much about regexes.
> >
> >On Sat, 29 Oct 2016 at 02:21 Christoph M. Becker <cmbecker69@xxxxxx>
> >wrote:
> >
> >> On 28.10.2016 at 14:51, Richard wrote:
> >>
> >> >> Date: Friday, October 28, 2016 12:09:31 +0100
> >> >> From: Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx>
> >> >>
> >> >> On 28 October 2016 12:01:16 BST, Narcis Garcia
> >> >> <informatica@xxxxxxxxx> wrote:
> >> >>
> >> >>> Hello, I have a string (I quote here only) as:
> >> >>>
> >> >>> '   <table>...</table>'
> >> >>>
> >> >>> As you can see there are 3 spaces at the beginning, but it could be
> >> >>> 0 or 4 or any number of spaces.
> >> >>> How can I get a string with only the initial spaces part?
> >> >>>
> >> >>> '   <table>...</table>' -> '   '
> >> >>> 'hello' -> ''
> >> >>> ' hello' -> ' '
> >> >>>
> >> >>> Thanks.
> >> >>
> >> >> Have you tried regular expressions? Something like:
> >> >>
> >> >> ^( *)[^ ]
> >> >>
> >> >> The first captured group holds the leading spaces, from 0 to any
> >> >> amount. Note the space between the brackets and before the closing
> >> >> square bracket
> >> >
> >> > You need to take into consideration that "whitespace" can be created
> >> > by more than the simple "space" (ascii 32) character. A "[horizontal]
> >> > tab" (ascii 9) is common, but also look at the top of php trim
> >> > function documentation:
> >> >
> >> >   <http://php.net/manual/en/function.trim.php>
> >> >
> >> > to see the characters that it handles as "whitespace".
> >>
> >> If general whitespace should be detected with a regexp, \s could be used.
> >>
> >> > While "trim"
> >> > does the opposite of what you're after, […]
> >>
> >> Indeed, so one could do something like
> >>
> >>   substr($string, 0, strlen($string) - strlen(ltrim($string)))
> >>
> >> I'd prefer a regexp solution, though.
> >>
> >> --
> >> Christoph M. Becker
> >>
> >>
> >> --
> >> PHP General Mailing List (http://www.php.net/)
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >>
> >>
>
> I really don't think performing two strlen() calls, a substr(), & an
> ltrim() is going to be faster than a regular expression.
>
> I don't think you should avoid regexes because some people don't
> understand them. It's a very simple regular expression. You wouldn't tell
> someone to avoid PDO and use mysql_* functions because PDO is too
> complicated for some people, would you?