[PHP] Counting File Lines in XMLReader with a Large File
- Date: Wed, 30 Aug 2017 11:18:51 -0600
- From: Alan Feuerbacher <alanf00@xxxxxxxxxxx>
- Subject: [PHP] Counting File Lines in XMLReader with a Large File
As a new PHP user, I've recently completed a PHP program that extracts a
bunch of data from a relatively unstructured XML file. The file has
roughly 500,000 lines and I have no control over its generation.
The file generally has one XML tag like <foo> per line, but sometimes
lines are more complicated.
After a lot of reading and experimenting, I found that XMLReader was the
tool for getting the data.
As part of my debugging process, I called $LineNumber =
$reader->expand()->getLineNo(); (after doing $reader->open(
"InputFileName" ); ) to get the file line number that the XMLReader
cursor was pointing to. Eventually I found that for files longer than
about 65535 lines, the returned line numbers were wrong. After some more
online searching, I found a discussion from about 2006 between a PHP user
and a developer that pretty much explained what was going on: XMLReader
uses a 16-bit integer to store file line numbers, which of course limits
them to 65535. The developer said he would not fix this, for various
reasons.
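
For reference, here is a minimal sketch of the kind of call I mean. The
helper name elementLineNumbers() and the tiny demo file are made up for
illustration; only XMLReader::open(), read(), expand() and
DOMNode::getLineNo() are the real calls. Note that expand() on an element
reads that element's whole subtree into memory, so calling it on the root
of a very large file is probably not something to do outside debugging.

```php
<?php
// Hypothetical helper: list each element name together with the line
// number reported for it while streaming through the file.
function elementLineNumbers(string $path): array
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // expand() returns a DOMNode copy of the current node;
            // getLineNo() returns the line recorded for that node.
            $lines[] = [$reader->name, $reader->expand()->getLineNo()];
        }
    }
    $reader->close();

    return $lines;
}

// Demo on a small throwaway file with one tag per line (made-up content).
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, "<root>\n<foo>a</foo>\n<bar>b</bar>\n</root>\n");
$result = elementLineNumbers($path);
unlink($path);

foreach ($result as [$name, $line]) {
    echo "$name is on line $line\n";
}
```

On a small file like this the numbers line up with what a text editor
shows; the trouble I describe only starts once the file goes past about
65535 lines.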
I ended up splitting the original XML file into smaller pieces under
65535 lines each, and concatenating the results.
It appears that this line numbering issue remains today. Are there any
plans to make file line numbering work with larger files?
One of the PHP developer's points was that XML does not necessarily
include newlines that would break it into file lines, and that all the
content could be in one giant string. True in principle, but not in practice
where human readers are involved. I know that I would have been hard put
to debug my PHP code without being able to correlate file lines with
XMLReader cursor positions.