Re: [RFC PATCH 1/2] softirq: Account time and iteration stats per vector

On Fri, Jan 12, 2018 at 10:12:32AM -0800, Linus Torvalds wrote:
> On Fri, Jan 12, 2018 at 6:34 AM, Frederic Weisbecker
> <frederic@xxxxxxxxxx> wrote:
> >
> > That's right. But I thought it was a bit large for the stack:
> >
> >       struct {
> >           u64 time;
> >           u64 count;
> >       } [NR_SOFTIRQS]
>
> Note that you definitely don't want "u64" here.
> Both of these values had better be very limited. The "count" is on the
> order of 10 - it fits in 4 _bits_ without any overflow.
> And 'time' is on the order of 2ms, so even if it's in nanoseconds, we
> already know that we want to limit it to a single ms or so (yes, yes,
> right now our limit is 2ms, but I think that's long). So even that
> doesn't need 64-bit.
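
For reference, a rough userspace sketch of the size difference being discussed.
It assumes NR_SOFTIRQS is 10 (the number of softirq vectors in kernels of this
era); the struct and function names are made up for illustration and are not
taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: ten softirq vectors, as in kernels of this era. */
#define NR_SOFTIRQS 10

/* Layout from the quoted snippet: two 64-bit fields per vector. */
struct vec_stat_wide {
	uint64_t time;
	uint64_t count;
};

/* Hypothetical narrower layout: a ~2ms budget in ns fits in 32 bits,
 * and a count of about 10 fits easily in 8 bits. */
struct vec_stat_narrow {
	uint32_t time_ns;
	uint8_t count;
};

/* Bytes needed for one on-stack array of each layout. */
size_t wide_bytes(void)
{
	return sizeof(struct vec_stat_wide[NR_SOFTIRQS]);
}

size_t narrow_bytes(void)
{
	return sizeof(struct vec_stat_narrow[NR_SOFTIRQS]);
}
```

The wide array costs 16 bytes per vector, 160 bytes in total. The narrow one
is smaller, though padding still rounds each entry up to the alignment of the
32-bit field on typical ABIs.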


> Finally, I think you can join them. If we do a "time or count" limit,
> let's just make the "count" act as some arbitrary fixed time, so that
> we limit things that way.
> Say, if we want to limit it to 2ms, consider one count to be 0.2ms. So
> instead of keeping track of count at all, just say "make each softirq
> call count as at least 200,000ns even if the scheduler clock says it's
> less". End result: we'd loop at most ten times.
> So now you only need one value, and you know it can't be bigger than 2
> million, so it can be a 32-bit one. Boom. Done.
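
To make sure I read the suggestion correctly, here is a minimal userspace
sketch of the merged "time or count" budget. The constants follow the numbers
quoted above (2ms budget, 0.2ms minimum charge per call); the function names
are hypothetical and nothing here is from the actual patch.

```c
#include <stdint.h>

#define SOFTIRQ_BUDGET_NS     2000000u /* 2ms total budget */
#define SOFTIRQ_MIN_CHARGE_NS  200000u /* each call counts as >= 0.2ms */

/* Charge one softirq call against the budget. The call is billed for
 * its measured time, but never for less than the minimum charge.
 * The result saturates at the budget so it always fits in 32 bits. */
uint32_t softirq_charge(uint32_t used_ns, uint64_t elapsed_ns)
{
	uint64_t charge = elapsed_ns < SOFTIRQ_MIN_CHARGE_NS ?
			  SOFTIRQ_MIN_CHARGE_NS : elapsed_ns;

	if (charge >= SOFTIRQ_BUDGET_NS - used_ns)
		return SOFTIRQ_BUDGET_NS;
	return used_ns + (uint32_t)charge;
}

/* Non-zero once the budget is used up. */
int softirq_over_budget(uint32_t used_ns)
{
	return used_ns >= SOFTIRQ_BUDGET_NS;
}

/* Illustration only: how many calls fit in the budget if every call
 * takes elapsed_ns according to the clock. */
unsigned int softirq_max_calls(uint64_t elapsed_ns)
{
	uint32_t used = 0;
	unsigned int calls = 0;

	while (!softirq_over_budget(used)) {
		used = softirq_charge(used, elapsed_ns);
		calls++;
	}
	return calls;
}
```

With these numbers, calls that the clock reports as shorter than 0.2ms are
capped at ten per pass, while slower calls use up the budget sooner, so a
single 32-bit value covers both limits.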


Now I believe that the time was added as a limit because count alone was not
reliable enough to diagnose a softirq overrun. But if everyone is fine with
keeping the count as a single metric, I would be much happier because that
means less overhead, no need to fetch the clock, etc...

> Also, don't you want these to be percpu, and keep accumulating them
> until you decide to either age them away (just clear it in timer
> interrupt?) or if the value gets so big that you want to fall back to
> the thread instead (and then the thread can clear it every iteration,
> so you don't need to track whether the thread is active or not).
> I don't know. I'm traveling today, so I didn't actually have time to
> really look at the patches, I'm just reacting to Eric's reaction.

Clearing the accumulation on tick and flush, that sounds like a good plan.
Well, I'm probably not going to use the tick for that because of nohz (again),
but I can check if jiffies changed since we started the accumulation and
reset it if so.
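
Roughly what I have in mind, as a userspace sketch only. The struct and
function names are hypothetical; in the kernel the state would be per-CPU and
the current jiffies value would come from the global counter rather than a
parameter.

```c
#include <stdint.h>

/* Hypothetical accumulator; one instance per CPU in a real version. */
struct softirq_accum {
	unsigned long start_jiffies; /* jiffies when accumulation began */
	uint32_t used_ns;            /* time charged since then */
};

/* Add delta_ns to the accumulator. If jiffies moved on since the
 * accumulation started, age the old value away first. Returns the
 * accumulated time, saturating at UINT32_MAX. */
uint32_t softirq_accum_add(struct softirq_accum *a,
			   unsigned long now_jiffies, uint32_t delta_ns)
{
	if (a->start_jiffies != now_jiffies) {
		a->start_jiffies = now_jiffies;
		a->used_ns = 0;
	}

	if (delta_ns > UINT32_MAX - a->used_ns)
		a->used_ns = UINT32_MAX;
	else
		a->used_ns += delta_ns;

	return a->used_ns;
}

/* Flush: what the softirq thread would do on each of its iterations. */
void softirq_accum_flush(struct softirq_accum *a, unsigned long now_jiffies)
{
	a->start_jiffies = now_jiffies;
	a->used_ns = 0;
}
```

This avoids depending on the tick: the reset happens lazily on the next
accumulation after jiffies has changed.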

I'm going to respin, thanks!