Web lists-archives.com

Re: [PATCH v9 6/8] convert: check for detectable errors in UTF encodings




On Sun, Mar 4, 2018 at 3:14 PM,  <lars.schneider@xxxxxxxxxxxx> wrote:
> Check that new content is valid with respect to the user defined
> 'working-tree-encoding' attribute.
>
> Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx>
> ---
> diff --git a/convert.c b/convert.c
> @@ -266,6 +266,53 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
> +static int validate_encoding(const char *path, const char *enc,
> +                     const char *data, size_t len, int die_on_error)
> +{
> +       if (!memcmp("UTF-", enc, 4)) {
> +               [...]
> +               if (has_prohibited_utf_bom(enc, data, len)) {
> +                       [...]
> +                       if (die_on_error)
> +                               die(error_msg, path, enc);
> +                       else {
> +                               return error(error_msg, path, enc);
> +                       }
> +               } [...]
> +       return 0;
> +}
> @@ -291,6 +338,9 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
> +       if (validate_encoding(path, enc, src, src_len, die_on_error))
> +               return 0;

There could be a cleaner separation of responsibilities here which, as
a nice side-effect, would eliminate the repeated "if (die_on_error)
die(...); else return error(...);" pattern. Rather than passing a
'die_on_error' flag to validate_encoding(), it could accept a 'strbuf
*err' to populate in case of error:

    static int validate_encoding(..., struct strbuf *err)
    {
        if validation error:
            populate 'err'
            return -1;
        return 0
    }

and let the caller be responsible for deciding how to handle failure:

    struct strbuf err = STRBUF_INIT;
    ...
    if (validate_encoding(..., &err)) {
        if (die_on_error)
            die(err.buf);
        else {
            error(err.buf);
            strbuf_release(&err);
            return 0;
        }
    }

Not necessarily worth a re-roll, but perhaps a cleanup someone could
submit at some point if interested.

>         dst = reencode_string_len(src, src_len, default_encoding, enc,
>                                   &dst_len);