
Re: Review Request 129703: [baloo_file_extractor] Limit CPU usage




This is an automatically generated e-mail. To reply, visit: https://git.reviewboard.kde.org/r/129703/

On January 3rd, 2017, 12:51 a.m. EET, Albert Astals Cid wrote:

Without knowing anything about baloo, this looks totally wrong:

QList<KFileMetaData::Extractor*> exList = m_extractorCollection.fetchExtractors(mimetype);

Why wouldn't you want to iterate over all the extractors that support a given mimetype?

On January 3rd, 2017, 7:01 a.m. EET, Anthony Fieroni wrote:

It's a waste of time. An extractor is supposed to store file content in the DB for fast access when a file content search is performed, so if more than one extractor processes a file, the result is high CPU usage and a huge transaction size in the DB, basically file content * number of extractors. At the very least, we lose time and disk space for nothing.

On January 3rd, 2017, 1:46 p.m. EET, Jan Kundrát wrote:

Do you have some numbers as a result of profiling? Have you checked that the existing extractors are in fact redundant? Is the order of their presence in the returned list of extractors deterministic and is the most specific one returned first?

One small example: there is a generic plaintext extractor which returns the number of lines in any file with a text/* MIME type. Your patch changes that.

On January 3rd, 2017, 1:58 p.m. EET, Anthony Fieroni wrote:

  1. No
  2. Yes
  3. No. In my opinion it would be better to add a flag, or something similar, to indicate that a parser has done its work so we can safely stop iterating. At least this patch tries to reduce CPU usage; it's not a panacea.

On January 3rd, 2017, 2:48 p.m. EET, Stefan Brüns wrote:

  1. You claim it is useful to reduce CPU usage, but fail to provide any data points.
  2. Please provide a list of redundant extractors
  3. An extractor knows whether it has extracted any data itself; it cannot know whether a different extractor may find any data. Extractors may be orthogonal and provide different data.

I claim that there are no redundant extractors, but depending on the mimetype (if there are no known extractors) there can be more than one, which is where I see a potential problem. I don't have any plans to test this feature; I'm just expressing my point of view. I'm also trying to fix the other side of the problem: https://git.reviewboard.kde.org/r/129720/


- Anthony


On January 3rd, 2017, 1:43 p.m. EET, Anthony Fieroni wrote:

Review request for Baloo, Boudhayan Gupta, Pinak Ahuja, and Vishesh Handa.
By Anthony Fieroni.

Updated Jan. 3, 2017, 1:43 p.m.

Repository: baloo

Description

Processing large directories (5000+ files) can eat a lot of CPU. A single large file can be another issue.

Diffs

  • src/file/extractor/app.cpp (97332469)
  • src/tools/balooctl/indexer.cpp (45e42c1c)
