Re: Review Request 129703: [baloo_file_extractor] Limit CPU usage

This is an automatically generated e-mail. To reply, visit: https://git.reviewboard.kde.org/r/129703/

On December 27th, 2016, 4:29 p.m. EET, Michael Stemle wrote:

src/tools/balooctl/indexer.cpp (Diff revision 2)
        break; // we don't want a file to be extracted more than once

This may be a dumb comment, but if there are multiple extractors, each potentially pulling metadata in a different way (say, one pulls demographics of the file, its type, its size, etc) and the other pulls metadata from the file itself, wouldn't we want that to be supported?

This loop only appears to be running multiple extractions in the event that there are multiple extractors for the mime-type, each potentially sticking information into different parts of the result.

Does that make sense? It may be a dumb point, but I'm curious to see where I'm wrong.

Look at the extractors: https://github.com/KDE/kfilemetadata/tree/master/src/extractors. Each one reports the mimetypes it supports, and for a well-known mimetype you will most likely get only one extractor. The questionable part is fetching all extractors when the mimetype is unknown. For example, SVG's mimetype is "image/svg+xml" and there is no extractor for it, so we iterate over all available extractors. Why? We could write more flexible code: get all extractors and test whether one of them satisfies the mimetype's "inherit" rules, e.g. SVG is text/plain, so it can be extracted via plaintextextractor.

- Anthony
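
The inheritance-based lookup suggested in the reply above could be sketched roughly as follows. This is a standalone illustration, not code from baloo or kfilemetadata: the `Extractor` struct, the `ParentTable` type and the `findExtractor` function are hypothetical names, and the parent table is assumed to be filled in by the caller. In a Qt-based implementation the ancestry information would more plausibly come from `QMimeType::inherits()` or `QMimeType::allAncestors()`.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an extractor plugin: a name plus the
// mimetypes it declares support for.
struct Extractor {
    std::string name;
    std::set<std::string> mimetypes;
};

// Hypothetical parent table: mimetype -> mimetypes it directly inherits from.
using ParentTable = std::map<std::string, std::vector<std::string>>;

// Return the name of the first extractor that supports `mimetype` or one of
// its ancestors. The search is breadth-first, so an exact match wins over a
// parent, and a parent wins over a grandparent. Returns an empty string when
// no extractor matches.
std::string findExtractor(const std::string& mimetype,
                          const std::vector<Extractor>& extractors,
                          const ParentTable& parents)
{
    std::vector<std::string> queue{mimetype};
    std::set<std::string> seen{mimetype};

    for (std::size_t i = 0; i < queue.size(); ++i) {
        // Copy the entry: push_back below may reallocate the queue.
        const std::string current = queue[i];

        for (const Extractor& e : extractors) {
            if (e.mimetypes.count(current) != 0) {
                return e.name;
            }
        }

        const auto it = parents.find(current);
        if (it == parents.end()) {
            continue;
        }
        for (const std::string& parent : it->second) {
            // Skip ancestors that were already queued to avoid cycles.
            if (seen.insert(parent).second) {
                queue.push_back(parent);
            }
        }
    }
    return std::string();
}
```

With a parent table in which "image/svg+xml" inherits from "application/xml", which in turn inherits from "text/plain", a lookup for "image/svg+xml" would resolve to an extractor registered for "text/plain" without trying every available extractor.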

On December 27th, 2016, 7:34 a.m. EET, Anthony Fieroni wrote:

Review request for Baloo and Vishesh Handa.
By Anthony Fieroni.

Updated Dec. 27, 2016, 7:34 a.m.

Repository: baloo


Processing large directories (5000+ files) can consume a lot of CPU. A single large file can be another issue.


  • src/file/extractor/app.cpp (97332469)
  • src/tools/balooctl/indexer.cpp (45e42c1c)
