Re: List words separated by comma and without duplicates




-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, Apr 29, 2018 at 07:43:34PM +0200, Rodolfo Medina wrote:
> Hi all.
> 
> Suppose I have a file made by a list of names and surnames, e.g.:
> 
> Arvo Part
> Harold Pinter
> Lucio Battisti
> Antonio Amurri
> Eduardo De Filippo
> Eduardo De Filippo
> 
> and that I want them listed all on one line, separated by a comma and without
> duplicates, i.e.:
> 
> Arvo Part, Harold Pinter, Lucio Battisti, Antonio Amurri, Eduardo De Filippo
> 
> How can I do that with proper Unix commands?

The "classical":

  tomas@trotzki:/tmp$ cat foo
  Arvo Part
  Harold Pinter
  Lucio Battisti
  Antonio Amurri
  Eduardo De Filippo
  Eduardo De Filippo

  tomas@trotzki:/tmp$ sort -u < foo | sed -e 's/$/,/g' | tr '\n' ' '
  Antonio Amurri, Arvo Part, Eduardo De Filippo, Harold Pinter, Lucio Battisti, 

Note that the list ends with ", " (comma-space) and has no newline at
the end; I preferred to present a simple proto-solution that is easier
for you to munge.
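
If the trailing ", " bothers you, one possible munging (just a sketch, not
the only way) is to let paste do the joining instead of sed and tr. The
scratch directory and the variable name "result" are only there for
illustration, and the final sed assumes that no name contains a comma itself:

```shell
# Recreate the sample file from this thread in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/foo" <<'EOF'
Arvo Part
Harold Pinter
Lucio Battisti
Antonio Amurri
Eduardo De Filippo
Eduardo De Filippo
EOF

# sort -u drops the duplicates, paste -s joins all lines with a single
# comma (no separator after the last one), and sed widens "," to ", ".
result=$(sort -u < "$tmpdir/foo" | paste -s -d, - | sed -e 's/,/, /g')
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Since printf adds the final newline, this also takes care of the missing
newline at the end.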

The tr is there because sed goes line-wise and thus can't replace newlines.
Except that it can; you pay for it, though [1], [2].
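
For the curious, a commonly cited way of making sed join lines looks roughly
like the sketch below. The sample file and the variable name "joined" are
made up for the demonstration. The price is that the whole input ends up
in sed's pattern space before the substitution runs:

```shell
# A small sample file in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/foo" <<'EOF'
Arvo Part
Harold Pinter
Lucio Battisti
EOF

# :a         define a label named "a"
# N          append the next input line to the pattern space
# $!ba       unless this is the last line, branch back to "a"
# s/\n/, /g  with everything collected, turn each newline into ", "
joined=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/, /g' < "$tmpdir/foo")
printf '%s\n' "$joined"

rm -rf "$tmpdir"
```

Splitting the script into separate -e chunks is a deliberate choice here:
some sed implementations are picky about what may follow a label on the
same line.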

Finally, all this assumes that you're OK with the result being sorted
lexicographically: if you want to keep the original order, you'll probably
have to resort to awk or perl (unless you want to earn some obfuscated
Unix ninja prize or something ;-)
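
On the awk side, a minimal sketch that keeps the original order could look
like this. The array name "seen", the variable "sep" and the shell variable
"ordered" are just names I picked for the illustration:

```shell
# Recreate the sample file from this thread in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/foo" <<'EOF'
Arvo Part
Harold Pinter
Lucio Battisti
Antonio Amurri
Eduardo De Filippo
Eduardo De Filippo
EOF

# seen[$0]++ is 0 (false) the first time a line shows up, so the action
# runs only for first occurrences. sep is empty before the first name and
# ", " afterwards, which avoids a trailing separator.
ordered=$(awk '!seen[$0]++ { printf "%s%s", sep, $0; sep = ", " }
               END { printf "\n" }' < "$tmpdir/foo")
printf '%s\n' "$ordered"

rm -rf "$tmpdir"
```

This prints each name as soon as it is first seen, so the output line is
never held in memory as a whole; the "seen" array still grows with the
number of distinct names, though.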

For a Perl example which doesn't sort (sorry for the long line):

  tomas@trotzki:/tmp$ perl -e 'while(<STDIN>) {chomp; push(@result, $_) unless $seen{$_}++;} print(join(", ", @result), "\n");' < foo
  Arvo Part, Harold Pinter, Lucio Battisti, Antonio Amurri, Eduardo De Filippo

Note that the sort/sed/tr example above might cope well with a file
which doesn't fit in your RAM, while the perl example... less so.
Times have changed :-)

(to be fair, the quoted examples below also seem to rely on sucking
the whole file into a (potentially giant) line).

Cheers
[1] https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed
[2] https://unix.stackexchange.com/questions/114943/can-sed-replace-new-line-characters

- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlrmDLwACgkQBcgs9XrR2kbqlACdEvT/8Mdf58Y2X+liEyVAScNC
DxoAnjsB0+cE5eLzLc7+RjjbRP0BhNVw
=fpGE
-----END PGP SIGNATURE-----