[macemacsjp-english 197] Encoding problems (this time with the attachments!)


Jose Figueroa-O'Farrill j.m.f****@ed*****
Tue Sep 13 19:20:28 JST 2005


Hi,

I am having some problems with encoding which I hope someone on this
list can help me with.  I am not sure that they are specific to Carbon
Emacs, so if you think this post is off-topic, please direct me to a
more appropriate list.

Let me preface this by saying that, despite previous NeXTStep
experience in the early 90's, I have been using Mac OS X (and the
"Japanese" Carbon Emacs) for only a couple of months.  I have to say
that it is on the whole a very satisfying working environment and I am
very grateful to the maintainers.  Domo arigato gozaimasu!

Now, since late 2001 and until my switch to Mac OS X I was using XEmacs
(native) on a notebook running Windows XP [I'm not proud, but there
you go: there were good reasons at the time].  This XEmacs was
major version 21 and did not have MULE support.  As a result my
expertise with coding systems is minimal and hence my present woes.  I
have been using iso-accents-mode with default encoding iso-8859-1.  As
Carbon Emacs reminds me once in a while, iso-accents-mode is now
deprecated, but I'm not sure what exactly has replaced it.  In any
case, on occasion I have used the Mac keyboard sequences (Option-E e
for é, for example) and my problems persist.

Here are two problems that I'm experiencing:

Problem 1
---------


Suppose I type accented characters (like é, ü, î, ...) into a Carbon
Emacs buffer, using either their Mac keyboard sequences or the ones in
iso-accents-mode.

If I then copy and paste them from the Carbon Emacs buffer into,
e.g., Mail.app, then I get this:

[Attachment: Picture 1.png, image/png, 9913 bytes]
http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment.png

If I now copy this from Mail.app and paste it back into a Carbon Emacs
buffer I get this:

[Attachment: Picture 2.png, image/png, 5092 bytes]
http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment-0001.png


which is not what I started with.  This happens not just with Mail.app
but also with my university's webmail, using browsers like Camino or
Safari.

Problem 2
---------

I often edit HTML files which reside on a remote host.  These files
are encoded using utf-8.  When I used XEmacs under Windows XP, the
files were saved by XEmacs in iso-8859-1 and then I used 'iconv' to
convert them to utf-8.  When I edit these files now using Carbon
Emacs, I don't see the accented letters.  For example, instead of

[Attachment: Picture 4.png, image/png, 2748 bytes]
http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment-0002.png


I see this:

[Attachment: Picture 3.png, image/png, 2947 bytes]
http://lists.sourceforge.jp/mailman/archives/macemacsjp-english/attachments/20050913/9b65a55d/attachment-0003.png


It seems to me that Emacs is assuming that this is an 8-bit encoding
and, on finding a 16-bit character, breaks it up into 2 characters.
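
The effect guessed at above can be reproduced in Emacs Lisp.  The
sketch below is only an illustration, not a reconstruction of the
attached pictures: the function names are hypothetical, é is just a
sample character, and a MULE-enabled Emacs with Unicode support is
assumed.  Strictly speaking utf-8 is a variable-length 8-bit encoding
rather than a 16-bit one: é is stored as the two bytes C3 A9, and
reading those two bytes as iso-8859-1 gives two separate characters.

```elisp
;; Hypothetical helper names; only the coding functions are built in.

(defun my-utf8-byte-count (text)
  "Return the number of bytes TEXT occupies when encoded as utf-8."
  (length (encode-coding-string text 'utf-8)))

(defun my-misread-utf8-as-latin1 (text)
  "Encode TEXT as utf-8, then decode those bytes as iso-8859-1.
This mimics opening a utf-8 file while Emacs assumes iso-8859-1."
  (decode-coding-string (encode-coding-string text 'utf-8)
                        'iso-8859-1))

(defun my-read-utf8-as-utf8 (text)
  "Encode TEXT as utf-8 and decode it with the matching coding system."
  (decode-coding-string (encode-coding-string text 'utf-8)
                        'utf-8))

(my-utf8-byte-count "é")         ; 2
(my-misread-utf8-as-latin1 "é")  ; "Ã©", two characters
(my-read-utf8-as-utf8 "é")       ; "é", one character
```

If this is what is happening, the file itself is intact and only the
decoding step is wrong.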

Shouldn't Emacs recognise the encoding of the file and act
accordingly?  Or is this happening because I'm somehow preventing it
from doing so, which brings me to my last question: what should I have
in the .emacs file?

Right now, the relevant lines in my .emacs file seem to be the
following:

;; coding system nightmare

(custom-set-variables
 '(enable-multibyte-characters t)
 '(keyboard-coding-system (quote mac-roman))
 '(utf-translate-cjk-mode nil)
)

(setq unibyte-display-via-language-environment t)
(set-language-environment 'latin-1)
(set-buffer-file-coding-system 'iso-8859-1)
(setq default-buffer-file-coding-system 'iso-8859-1)
(set-default-coding-systems 'iso-8859-1)
(modify-coding-system-alist 'file "\\.html\\'" 'utf-8)
(modify-coding-system-alist 'file "\\.tex\\'" 'iso-8859-1)
(modify-coding-system-alist 'file "\\.xml\\'" 'iso-8859-1)
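
How file-name rules like the three modify-coding-system-alist lines
above are resolved can be inspected with find-operation-coding-system.
The sketch below is an illustration under assumptions: it binds
file-coding-system-alist locally to two of those rules so that it does
not depend on the rest of the configuration, and the helper name and
the file names are hypothetical.

```elisp
;; Hypothetical helper: which coding system do the file-name rules
;; assign to FILE when it is read?  Returns nil if no rule matches.
(defun my-read-coding-for (file)
  "Return the decoding system the sample rules assign to FILE, or nil."
  (let ((file-coding-system-alist
         ;; Same shape as the entries modify-coding-system-alist adds.
         '(("\\.html\\'" . utf-8)
           ("\\.tex\\'" . iso-8859-1))))
    ;; The result is a cons (DECODING . ENCODING); keep the decoding part.
    (car (find-operation-coding-system 'insert-file-contents file))))

(my-read-coding-for "index.html")  ; utf-8
(my-read-coding-for "paper.tex")   ; iso-8859-1
(my-read-coding-for "notes.txt")   ; nil, no rule matches
```

When no rule matches, the helper returns nil and Emacs falls back on
its usual detection and defaults.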

I have played with many such settings and the only reason I have
chosen iso-8859-1 as default is that this seems to work, more or less.
I forget exactly where I got many of these lines.  (Although I've been
using some Emacs or other since 1980, I have to admit that I'm very
much an Emacs consumer, to quote a recent post to this list.)

In principle I would like to understand encoding in Emacs (or in
general) since I suspect I have many misconceptions, but at this
point, with term starting in less than a week, I would settle for a
fix :-)

Many thanks in advance for your attention,

José

-- 
Prof José M Figueroa-O'Farrill  | Phone: +44 (0) 131 6505066
School of Mathematics           | Fax: +44 (0) 131 6506553
University of Edinburgh         | Mobile: +44 (0) 7870 239186
Edinburgh EH9 3JZ, Scotland, UK | URL: http://www.maths.ed.ac.uk/~jmf

