
Re: [Samba] Character encoding mystery




On Thu, Apr 26, 2018 at 07:29:41PM +0200, Emmanuel Florac via samba wrote:
> Hi everyone,
> 
> I have a very annoying character encoding problem. Have a look at this:
> 
> #  ls -l M*mo-1.*
> -rw-rw-rw- 1 root root 8417218  6 sept.  2013 Mémo-1.aif
> -rwxr--r-- 1 hope hope 8417218  6 sept.  2013 Mémo-1.aif
> -rw-rw-rw- 1 root root  363175  6 sept.  2013 Mémo-1.m4a
> -rwxr--r-- 1 hope hope  363175  6 sept.  2013 Mémo-1.m4a
> 
> Yes, it looks like two files have exactly the same name, but actually
> they're different: one has "é" encoded as 0xCC81, and the other one (the
> "good one") as 0xC3A9. Of course similar problems occur for all accented
> letters.

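0xC3A9 is the UTF-8 encoding of é (U+00E9), so that's correct.
0xCC81 is the UTF-8 encoding of the combining acute accent
(U+0301), which follows a plain "e".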

> So here's the setup: I have a very weird proprietary system (DDP
> server), probably running internally some ancient version of Samba.
> People copied these files to this old server from Mac workstations. So
> far so good.
> 
> I have a new server, running CentOS 7.3 and Samba 4.6. I mounted the
> CIFS exports from the DDP server :
> 
> # mount | grep temp
> 
> //192.168.5.150/w-rushes-temp on /mnt/w-rushes-temp type cifs
> (ro,relatime,vers=1.0,cache=strict,username=admin,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.5.150,soft,unix,posixpaths,serverino,mapposix,acl,rsize=1048576,wsize=65536,echo_interval=60,actimeo=1)
> 
> Listing the files on this mount everything looks good at first glance:
> 
> # ls -l M*mo-1.*
> 
> -rw-rw-rw- 1 root root 8417218  6 sept.  2013 Mémo-1.aif
> -rw-rw-rw- 1 root root  363175  6 sept.  2013 Mémo-1.m4a
> 
> Now I copy the files from the old system to the new one, using cp -a,
> or rsync.
> 
> Then when connecting with the Mac to the new server using SMB, you
> can't see any of the files with accented characters in the name. But
> they're here, though invisible from the Mac Finder (they look fine when
> listed from the terminal, as you've seen before).
> 
> If I copy the file from the Mac Finder, or I create a new file with
> "touch héhohàhù" they appear perfectly fine, with accents and all.
> 
> What can be the cause of this weird encoding effect? 

I'm guessing this is a combining-character effect. Mac OS X uses
Unicode combining characters (the decomposed form, NFD) to stitch
an accent onto an existing character. I think Mac OS X is the only
system that uses this as standard.
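
To make the difference concrete, here is a small Python sketch that
prints the UTF-8 bytes of both forms. Python is just a convenient way
to show it and is not part of the setup described above, and the
helper name utf8_hex is made up for illustration.

```python
import unicodedata


def utf8_hex(name, form):
    """Normalize `name` to the given Unicode form and return its UTF-8 bytes as hex."""
    return unicodedata.normalize(form, name).encode("utf-8").hex()


# Precomposed form (NFC): one code point, U+00E9.
print(utf8_hex("M\u00e9mo", "NFC"))   # 4dc3a96d6f
# Decomposed form (NFD): "e" (0x65) followed by U+0301 (0xCC 0x81).
print(utf8_hex("M\u00e9mo", "NFD"))   # 4d65cc816d6f
# The two strings render identically but do not compare equal as raw names.
print("M\u00e9mo" == "Me\u0301mo")    # False
```

The two names only match once both are normalized to the same form.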

I think you should be able to fix this using iconv, although
you might want to do this carefully on the server.
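
As a rough sketch of what "carefully" could look like, the following
uses Python's unicodedata module rather than iconv (a swapped-in
technique, not something tested on this setup). It assumes Python 3 is
available on the new server and that the filesystem stores names as
raw bytes, so the NFD and NFC names are distinct entries. The function
names plan_nfc_renames and apply_renames are hypothetical.

```python
import os
import unicodedata


def plan_nfc_renames(root):
    """Return (old_path, new_path) pairs for entries whose name is not in NFC."""
    renames = []
    # Walk bottom-up so a directory's contents are listed before the
    # directory itself; the old paths stay valid while renaming in order.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name != name:
                renames.append((os.path.join(dirpath, name),
                                os.path.join(dirpath, nfc_name)))
    return renames


def apply_renames(renames, dry_run=True):
    """Rename each pair; return the pairs skipped because the target exists."""
    skipped = []
    for old, new in renames:
        if os.path.lexists(new):
            # An NFC-named twin already exists (as in the listing above):
            # never overwrite it, leave the decision to a human.
            skipped.append((old, new))
        elif dry_run:
            print("would rename %r -> %r" % (old, new))
        else:
            os.rename(old, new)
    return skipped
```

Running apply_renames(plan_nfc_renames(path)) with the default
dry_run=True only prints what would change. Entries that come back in
the skipped list are the ones where both forms already exist side by
side and need to be sorted out by hand.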

-- 
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba