
Re: [PHP] RegOops




Well, I found my answer and thought I'd share here.
It was explained to me that...

[  The regex engine tries to satisfy the conditions I applied as greedily as possible.
`\w` matches letters, digits, or the underscore, and I gave it a greedy quantifier: +
This means grab as much as you can and give back if necessary.
So it grabbed `vehicle10`, but the next token wanted at least one digit,
so it backtracked one character and matched `vehicle1`, leaving `0` for the digit group  ]

(credit : firasdib)
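
For anyone who finds this later, here is a small sketch of that behaviour. The subject string and the first two patterns are the ones from my original question. The third pattern, with a lazy `\w+?` and `\d+`, is my own assumption about one way to keep `\w` and still get `10` in one piece; it was not suggested in the thread.

```php
<?php
// Subject string from the original question.
$subject = 'vehicle10-vehicle-name';

// Greedy \w+ also matches digits, so it first takes "vehicle10",
// then gives back one character so that [0-9+]+ can match "0".
preg_match('/(\w+)([0-9+]+)-(.*)/', $subject, $greedy);
// $greedy[1] is "vehicle1", $greedy[2] is "0"

// \D+ cannot match digits at all, so it stops after "vehicle".
preg_match('/(\D+)([0-9+]+)-(.*)/', $subject, $nonDigit);
// $nonDigit[1] is "vehicle", $nonDigit[2] is "10"

// Assumed alternative (not from the thread): the lazy \w+? takes as
// little as possible, so the digit group gets both digits.
preg_match('/(\w+?)(\d+)-(.*)/', $subject, $lazy);
// $lazy[1] is "vehicle", $lazy[2] is "10"

print_r($greedy);
print_r($nonDigit);
print_r($lazy);
```

Whether `\D+` or the lazy `\w+?` fits better probably depends on the input: `\D+` would also accept dashes and spaces in the first group, while `\w+?` would not.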

Thank you all for your contributions.

Best,

Karl DeSaulniers
Design Drumm
http://designdrumm.com



On Oct 23, 2015, at 8:06 PM, Karl DeSaulniers <karl@xxxxxxxxxxxxxxx> wrote:

> This is where I tested it.
> 
> http://www.phpliveregex.com
> 
> Could it be their parser doing this or is this correct protocol?
> Just seems odd to me.
> 
> Best,
> 
> Karl DeSaulniers
> Design Drumm
> http://designdrumm.com
> 
> 
> 
> On Oct 23, 2015, at 8:04 PM, Karl DeSaulniers <karl@xxxxxxxxxxxxxxx> wrote:
> 
>> Correct. 
>> 
>> So in my case, where I need to grab just words,
>> 
>> [\D]  will grab all words, dashes, hyphens etc.
>> Same with [\S]. 
>> 
>> In essence grabbing all words if there is nothing else to grab except a number.
>> However, the shorthand \w does not, and it would seem (to me) that it should by definition only capture words and not the number 1.
>> Frank's explanation makes some sense to me, but how come it didn't grab the number 0 then?
>> If you notice, the 10 got split up.
>> 
>> Best,
>> 
>> Karl DeSaulniers
>> Design Drumm
>> http://designdrumm.com
>> 
>> 
>> 
>> On Oct 23, 2015, at 7:54 PM, German Geek <geek.de@xxxxxxxxx> wrote:
>> 
>>> In regular expressions, a backslash followed by a capital letter means the opposite of the lowercase shorthand. So, \D is NON-digits, \W is NON-word characters and \S is NON-whitespace. You can also do [A-z]* to get all letters in the English language plus the characters between them like ^ and literal \.
>>> 
>>> I believe you can also do Unicode ranges with the respective \usomehex, but I haven't tried that yet.
>>> 
>>> Tim
>>> 
>>> On Sat, 24 Oct 2015 at 12:38 Karl DeSaulniers <karl@xxxxxxxxxxxxxxx> wrote:
>>> On Oct 23, 2015, at 7:54 AM, Frank Arensmeier <farensmeier@xxxxxxxxx> wrote:
>>> 
>>> >
>>> >> 23 okt 2015 kl. 14:44 skrev Karl DeSaulniers <karl@xxxxxxxxxxxxxxx>:
>>> >>
>>> >> Hello all,
>>> >> With the given string..
>>> >>
>>> >> vehicle10-vehicle-name
>>> >>
>>> >> Running regex in a preg_match like
>>> >>
>>> >> "/(\w+)([0-9+]+)-(.*)/"
>>> >>
>>> >> I am getting.
>>> >>
>>> >> array(
>>> >>      0       =>      vehicle10-vehicle-name
>>> >>      1       =>      vehicle1
>>> >>      2       =>      0
>>> >>      3       =>      vehicle-name
>>> >> )
>>> >>
>>> >> If I change it to.
>>> >>
>>> >> "/(\D+)([0-9+]+)-(.*)/"
>>> >>
>>> >> it works as expected.
>>> >>
>>> >> array(
>>> >>      0       =>      vehicle10-vehicle-name
>>> >>      1       =>      vehicle
>>> >>      2       =>      10
>>> >>      3       =>      vehicle-name
>>> >> )
>>> >>
>>> >> Why is the \w directive including a digit?
>>> >> Since when is the number 1 a word??
>>> >>
>>> >> If anyone could enlighten me, I would greatly appreciate it.
>>> >>
>>> >> TIA
>>> >>
>>> >> Best,
>>> >>
>>> >> Karl DeSaulniers
>>> >> Design Drumm
>>> >> http://designdrumm.com
>>> >>
>>> >
>>> > Hi Karl!
>>> >
>>> > I am not able to pinpoint the exact definition in the official PCRE documentation right now (http://www.pcre.org). But the shorthand \w does indeed include numbers. As you can read here for example (https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html),
>>> >
>>> > \w    A word character: [a-zA-Z_0-9]
>>> >
>>> > Although it's already Friday, your pattern is working as expected.
>>> >
>>> > /frank
>>> >
>>> 
>>> OH, ok, so \w is basically shorthand for [a-zA-Z_0-9]?
>>> That would make sense, however I think it is misleading as there are \D and \S which denote grabbing words and/or digits respectively.
>>> I thought that \w meant one 'word' character (not digit or special characters or space or new line, just a word),
>>> or at least that is what I have read in my searches, hence the question here.
>>> 
>>> Thanks for your response!
>>> 
>>> Best,
>>> 
>>> Karl DeSaulniers
>>> Design Drumm
>>> http://designdrumm.com
>>> --
>>> PHP General Mailing List (http://www.php.net/)
>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>> 
>> 
>