
[PHP] Counting File Lines in XMLReader with a Large File


As a new PHP user, I've recently completed a PHP program that extracts a bunch of data from a relatively unstructured XML file. The file has roughly 500,000 lines and I have no control over its generation.

The file generally has one XML tag like <foo> per line, but sometimes lines are more complicated.

After a lot of reading and experimenting, I found that XMLReader was the tool for getting the data.

As part of my debugging process, I used $lineNumber = $reader->expand()->getLineNo(); (after calling $reader->open("InputFileName");) to get the file line number that the XMLReader cursor was pointing to. Eventually I found that in files longer than about 65535 lines, the line numbers returned were wrong. After some more online searching, I found a discussion from about 2006 between a PHP user and a developer that pretty much explained what was going on: XMLReader (through the libxml2 library it is built on) stores line numbers in a 16-bit unsigned integer, which of course is limited to 65535. The developer said he would not fix this, for various reasons.
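For anyone who wants to reproduce the debugging approach described above, here is a minimal sketch in PHP. The function name elementLineNumbers() and the small temporary test file are illustrative only. The sketch also passes the LIBXML_BIGLINES flag to open() when that constant is defined; the PHP manual describes it (PHP 7.0+, libxml2 2.9.0+) as allowing line numbers greater than 65535 to be reported correctly, but whether it does so for nodes returned by expand() is an assumption this sketch does not verify.

```php
<?php
// Sketch: record the line number of every <$name> element that XMLReader
// visits, using the expand()->getLineNo() pattern described above.
// Assumption: LIBXML_BIGLINES (only defined on PHP >= 7.0 with libxml2 >= 2.9.0)
// may lift the 65535 limit; this is not verified here.

function elementLineNumbers(string $path, string $name): array
{
    // Fall back to no flags on builds where the constant does not exist.
    $flags = defined('LIBXML_BIGLINES') ? LIBXML_BIGLINES : 0;

    $reader = new XMLReader();
    if (!$reader->open($path, null, $flags)) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            // expand() builds a DOMNode for the current element (needs ext/dom).
            $node = $reader->expand();
            if ($node !== false) {
                $lines[] = $node->getLineNo();
            }
        }
    }
    $reader->close();
    return $lines;
}

// Usage: a short file, so the reported numbers are easy to check by eye.
$path = tempnam(sys_get_temp_dir(), 'xr');
file_put_contents($path, "<?xml version=\"1.0\"?>\n<root>\n<foo>a</foo>\n<foo>b</foo>\n</root>\n");
$lines = elementLineNumbers($path, 'foo');
print_r($lines);
unlink($path);
```

On a short file, the printed values should match the lines holding the two <foo> tags. On a file past 65535 lines, comparing the output with and without the flag would show whether a given PHP/libxml2 build is still affected.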

I ended up splitting the original XML file into smaller pieces of fewer than 65535 lines each and concatenating the results.

It appears that this line numbering issue remains today. Are there any plans to make file line numbering work with larger files?

One of the PHP developer's points was that an XML document need not contain any newlines at all, so all of its content could be on one giant line. True in principle, but not in practice where human readers are involved. I know that I would have been hard put to debug my PHP code without being able to correlate file lines with XMLReader cursor positions.



PHP General Mailing List (http://www.php.net/)