Declaring the source file encoding in Python

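Consider the following Python print statement:
print u'I "said" do not touch “this.""'
The string contains a Chinese-style (curly) double quotation mark, and the Python interpreter rejects it. The error message is: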

[wangy@bogon 文档]$ python ex1.py
  File "ex1.py", line 7
SyntaxError: Non-ASCII character '\xe2' in file ex1.py on line 7, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
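
The '\xe2' in the message is the first byte of the curly quote as stored in a UTF-8 file. The sketch below shows where that byte comes from. It assumes Python 3 syntax rather than the Python 2 used in this post, and the variable names are made up for illustration.

```python
# The left curly double quote used in the failing print statement.
curly_quote = "\u201c"

# Encode it the way an editor saving as UTF-8 would store it on disk.
utf8_bytes = curly_quote.encode("utf-8")
print(utf8_bytes)

# An ASCII-only decoder cannot accept these bytes.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)
```

The first of the three bytes is 0xe2, which is the byte the Python 2 parser complains about when it reads the file as ASCII.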

The link in the message, http://www.python.org/peps/pep-0263.html, points to PEP 263. Its main points are as follows:

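In Python 2.1, Unicode literals could only be written using the Latin-1 based "unicode-escape" encoding. Latin-1 covers Western European characters only, so this was a real nuisance for programmers elsewhere, particularly in Asia, who had to spell out their Unicode literals as escape sequences.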


The fix is to declare the encoding of the source file, so that the interpreter knows how to decode the source code.


Ways to declare the encoding:
Python will default to ASCII as standard encoding if no other encoding hints are given.

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

# coding=<encoding name>

or (using formats recognized by popular editors):

#!/usr/bin/python
# -*- coding: <encoding name> -*-

or:

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :




The first or second form is the better choice.
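
All three forms work because the parser only looks for the text "coding:" or "coding=" inside a comment on line 1 or 2. The sketch below illustrates that rule. It assumes Python 3, the function name declared_encoding is hypothetical, and the regular expression is taken to be the one PEP 263 gives, so check it against the PEP before relying on it. The real parser also applies a few extra conditions that are left out here.

```python
import re

# Pattern assumed to match the one given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_text):
    """Return the encoding named on line 1 or 2, or None if there is none."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# The three styles from the text, with a concrete encoding filled in.
print(declared_encoding("# coding=utf-8\n"))
print(declared_encoding("#!/usr/bin/python\n# -*- coding: latin-1 -*-\n"))
print(declared_encoding("#!/usr/bin/python\n# vim: set fileencoding=utf-8 :\n"))

# A declaration on line 3 is too late and is ignored.
print(declared_encoding("#!/usr/bin/python\n#\n# -*- coding: latin-1 -*-\n"))
```

The vim form matches only because "fileencoding=" happens to end in "coding=".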


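The PEP specifically mentions platforms such as Windows, where a Unicode BOM is added to the beginning of Unicode files. To accommodate them, the UTF-8 signature (the bytes \xef\xbb\xbf) at the start of a file is itself interpreted as a 'utf-8' declaration, so such a file needs no magic comment.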

If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.
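
The BOM rule can be observed with the standard library function tokenize.detect_encoding, which mimics the interpreter's detection. This is a sketch assuming Python 3. The helper name detect and the sample sources are made up for illustration.

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the encoding name reported by tokenize.detect_encoding."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# A UTF-8 BOM with no magic comment at all.
bom_only = codecs.BOM_UTF8 + b"print('hi')\n"
# A UTF-8 BOM together with a matching utf-8 declaration.
bom_and_utf8 = codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\nprint('hi')\n"
# A UTF-8 BOM together with a conflicting declaration.
bom_and_latin1 = codecs.BOM_UTF8 + b"# -*- coding: latin-1 -*-\nprint('hi')\n"

print(detect(bom_only))
print(detect(bom_and_utf8))

try:
    detect(bom_and_latin1)
    conflict_rejected = False
except SyntaxError:
    conflict_rejected = True
print(conflict_rejected)
```

In Python 3 the first two cases are reported as 'utf-8-sig', the codec name for UTF-8 with a signature. The conflicting case raises SyntaxError.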


Examples
These are some examples to clarify the different styles for defining the source code encoding at the top of a Python source file:

With interpreter binary and using Emacs style file encoding comment:

#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...


#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys
...

#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys
...
Without interpreter line, using plain text:
# This Python file uses the following encoding: utf-8
import os, sys
...
Text editors might have different ways of defining the file's encoding, e.g.:
#!/usr/local/bin/python
# coding: latin-1
import os, sys
...
Without encoding comment, Python's parser will assume ASCII text:
#!/usr/local/bin/python
import os, sys
...
Encoding comments which don't work:
Missing "coding:" prefix:
#!/usr/local/bin/python
# latin-1
import os, sys
...
Encoding comment not on line 1 or 2:
#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys
...
Unsupported encoding:
#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
...
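
The examples above can be checked with tokenize.detect_encoding from the standard library. This is a sketch assuming Python 3, where the fallback encoding is UTF-8 rather than the ASCII default described in the PEP for Python 2. The helper name detect and the variable names are hypothetical.

```python
import io
import tokenize


def detect(source_bytes):
    """Return the encoding name reported by tokenize.detect_encoding."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


emacs_style = b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nimport os, sys\n"
plain_text = b"# This Python file uses the following encoding: utf-8\nimport os, sys\n"
no_prefix = b"#!/usr/local/bin/python\n# latin-1\nimport os, sys\n"
third_line = b"#!/usr/local/bin/python\n#\n# -*- coding: latin-1 -*-\nimport os, sys\n"
unsupported = b"#!/usr/local/bin/python\n# -*- coding: utf-42 -*-\nimport os, sys\n"

# Working declarations are recognized. detect_encoding normalizes
# "latin-1" to the name "iso-8859-1".
print(detect(emacs_style))
print(detect(plain_text))

# Declarations that do not work are ignored, so the default applies.
print(detect(no_prefix))
print(detect(third_line))

# An unknown encoding name is an error rather than a silent fallback.
try:
    detect(unsupported)
    unsupported_rejected = False
except SyntaxError:
    unsupported_rejected = True
print(unsupported_rejected)
```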

Modify the source code to add the declaration and save the file as UTF-8 (the editor used here was gedit on Linux):
# -*- coding: utf-8 -*-
print "hello world!"
print "hello Again"
print "I like trying this"
print "This is fun"
print 'Yay! Printing'
print "I'd much rather you 'not'."
print u'I "said" 这里有中文双引号 “this.""'

The script now prints normally.
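
The declaration matters because it decides how the bytes inside a string literal are decoded. The sketch below compiles small in-memory sources under different declarations. It assumes Python 3, where compile() honors the magic comment when given bytes. The function name run_with_declaration and the variable text are made up for illustration.

```python
def run_with_declaration(encoding_name, literal_bytes):
    """Compile a tiny source with the given declaration and return its string."""
    header = ("# -*- coding: %s -*-\n" % encoding_name).encode("ascii")
    body = b'text = "' + literal_bytes + b'"\n'
    namespace = {}
    # compile() receives raw bytes, so the magic comment selects the decoder.
    exec(compile(header + body, "<demo>", "exec"), namespace)
    return namespace["text"]


# The UTF-8 bytes of the curly quote decode to one character.
print(run_with_declaration("utf-8", b"\xe2\x80\x9c") == "\u201c")

# The same single byte means different characters under different declarations.
print(run_with_declaration("latin-1", b"\xa4") == "\u00a4")
print(run_with_declaration("iso-8859-15", b"\xa4") == "\u20ac")
```

Byte 0xa4 is the currency sign in Latin-1 but the euro sign in ISO-8859-15, so a wrong declaration can change text silently instead of raising an error.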




