
Re: [RFC PATCH v1] telemetry design overview (part 1)




On Sat, Jun 09 2018, Duy Nguyen wrote:

> On Sat, Jun 9, 2018 at 12:22 AM Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>>
>>
>> On Fri, Jun 08 2018, Johannes Sixt wrote:
>>
>> > Am 08.06.2018 um 18:00 schrieb Thomas Braun:
>> >> I for my part would much prefer that to be a compile-time
>> >> option so that I don't need to check on every git update on Windows
>> >> whether this is now enabled or not.
>> >
>> > This is exactly my concern, too! A compile-time option may make it a good
>> > deal less worrisome.
>>
>> Can you elaborate on how someone who can inject malicious code
>> into your git package + config would be thwarted by this being some
>> compile-time option? Wouldn't they just compile it in?
>
>
> Look at this from a different angle. This is driven by the need to
> collect telemetry in a _controlled_ environment (mostly server side, I
> guess), and it should be no problem for you to make custom builds
> there.

Let's say you're in a corporate environment with Linux, OSX and Windows
boxes, all of which have some shared mounts provisioned and the ability
to ship an /etc/gitconfig (wherever that lives on Windows).

It's much easier to just do that than figure out how to build a custom
Git on all three platforms.

I guess you might make the argument that "that's good", because in
practice that'll mean that it's such a hassle that fewer administrators
will turn this on.

But I think that would be a loss, because that's taking the default view
that people with the rights (i.e. managed config access) to turn on
something like this by default have nefarious motives, and we should do
what we can to stop them.

I don't think that's true, e.g. what I intend to use this for is:

 a) Getting aggregate data on what commands/switches are used, for
    purposes of training and prioritizing my upstream contributions.

 b) Aggregate performance data to figure out what hotspots to tackle in
    the code.

Those are things that'll benefit both the users I'm responsible for and
the wider git community.

> Not making it a compile-time option could force [1] Linux distros
> to carry this function to everybody even if they don't use it (and
> it's kinda dangerous to misuse if you don't anonymize the data
> properly). I also prefer this as a compile-time option.

Setting GIT_TRACE to a filename that you published is similarly
dangerous, as would be setting up a trivial 4-line shell alias to wrap
"git" and log what it's doing.
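
To make the wrapper point concrete, here is a minimal sketch of such a
logger, written in Python rather than as a shell alias. Everything named
in it is an assumption for illustration: the REAL_GIT and GIT_WRAPPER_LOG
environment variables, the default paths, and the one-JSON-object-per-line
record layout are invented here and are not part of git or of the proposed
patch.

```python
#!/usr/bin/env python3
"""Hypothetical "git" wrapper: log each invocation, then run the real git."""
import json
import os
import sys
import time

# Assumptions for illustration only; neither path comes from the thread.
REAL_GIT = os.environ.get("REAL_GIT", "/usr/bin/git")
LOG_PATH = os.environ.get(
    "GIT_WRAPPER_LOG", os.path.expanduser("~/.git-usage.log")
)


def make_record(argv, cwd, now):
    """Build one log record; argv excludes the program name."""
    return {"time": now, "cwd": cwd, "argv": list(argv)}


def append_record(path, record):
    """Append the record to the log file as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def main():
    """Log the current invocation, then hand over to the real git."""
    args = sys.argv[1:]
    append_record(LOG_PATH, make_record(args, os.getcwd(), time.time()))
    # Replace this process with the real git so that its exit status and
    # I/O pass through unchanged.
    os.execv(REAL_GIT, [REAL_GIT] + args)
```

To use it as an actual wrapper you would call main() at the bottom of the
file and install the script ahead of the real git in PATH. Nothing here
needs any support from git itself, and the log it writes is exactly as
sensitive as whatever a built-in facility would record.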

> [1] Of course many distros can choose to patch it out. But it's the
> same argument as bringing this option in in the first place: you guys
> already have that code in private and now want to put it in stock git
> to reduce maintenance cost, why add extra cost on linux distro
> maintenance?

Because:

1) I really don't see the basis for this argument that they'd need to
   patch it out. They're not patching out e.g. GIT_TRACE now, which
   raises all the same sort of concerns; it's just a format that's more
   of a hassle to parse than this proposed format.

2) I think you and Johannes are just seeing the "telemetry" part of
   this, but if you look past that, all this *really* is is a "GIT_TRACE
   facility that doesn't suck to parse".

   There are a lot of use-cases for that which have nothing to do with
   what this facility was originally written for. For example, a novice
   git user could turn it on and have it log in ~ somewhere, and then
   run some contrib script which analyzes their git usage and spews out
   suggestions ("you use this command/option, but not this related
   useful command/option").

   Users asking for help on the #git IRC channel or on this mailing list
   could turn this on if they have a problem, and paste it into some
   tool they could show to others to see exactly what they're doing /
   where it went wrong.
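
As a rough sketch of the usage-analysis script mentioned above: the format
of the proposed log is not shown in this message, so the code below assumes
a hypothetical log with one JSON object per line, each carrying an "argv"
list of the arguments passed to git. The function names and the hint table
are likewise made up for illustration.

```python
import json
from collections import Counter


def count_subcommands(lines):
    """Count the first non-option argument of each logged invocation."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        argv = json.loads(line).get("argv", [])
        # Skip leading options such as "--no-pager". This is a
        # simplification: options that take a separate value
        # (e.g. "-C <path>") would be misread as the subcommand.
        sub = next((arg for arg in argv if not arg.startswith("-")), None)
        if sub is not None:
            counts[sub] += 1
    return counts


def suggest(counts, hints):
    """Return the hints whose 'used' command appears but whose 'unused' one never does.

    hints maps (used_command, unused_command) pairs to a message.
    """
    return [
        message
        for (used, unused), message in hints.items()
        if counts[used] > 0 and counts[unused] == 0
    ]
```

A caller would feed it the lines of the log file plus a table such as
{("commit", "rebase"): "you commit a lot but never use 'git rebase -i'"},
and print whatever suggest() returns. The point is only that a log which
is easy to parse makes this kind of tool a few dozen lines long.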