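Here is a simple way to convert it back into proper UTF-8. The session below is Python 2 on a GBK (Chinese Windows) console, which is why the literal '测试' is stored as GBK bytes: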
>>> a = '测试'
>>> a
'\xb2\xe2\xca\xd4'
>>> b = a.decode('gbk').encode('utf-8')
>>> b
'\xe6\xb5\x8b\xe8\xaf\x95'
>>> c = u'\xe6\xb5\x8b\xe8\xaf\x95'
>>> c
u'\xe6\xb5\x8b\xe8\xaf\x95'
>>> import array
>>> arr = array.array('B')
>>> arr.fromlist([ord(i) for i in c])
>>> print arr.tostring().decode('utf-8').encode('gbk')
测试
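To see why the ord() trick works, here is a small Python 3 sketch that reproduces a, b and c from the session. It assumes (the post does not say) that the garbled string c arose by decoding UTF-8 bytes as Latin-1, i.e. one code point per byte, which is exactly what c's contents look like:

```python
# Python 3 sketch: reproduce the three values from the session above.
text = '测试'

gbk_bytes = text.encode('gbk')           # what a GBK console stores for '测试' (a)
utf8_bytes = text.encode('utf-8')        # proper UTF-8 bytes (b)
mojibake = utf8_bytes.decode('latin-1')  # assumption: same bytes misread as code points (c)

print(gbk_bytes)        # b'\xb2\xe2\xca\xd4'
print(utf8_bytes)       # b'\xe6\xb5\x8b\xe8\xaf\x95'
print(ascii(mojibake))  # '\xe6\xb5\x8b\xe8\xaf\x95'

# Every code point in c is below 256, so ord() recovers the original bytes one by one.
assert all(ord(ch) < 256 for ch in mojibake)
assert bytes(ord(ch) for ch in mojibake) == utf8_bytes
print(bytes(ord(ch) for ch in mojibake).decode('utf-8') == text)  # True
```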
The final code looks like this:
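import array

# s holds the UTF-8 bytes of '测试', wrongly stored as unicode code points
s = u'\xe6\xb5\x8b\xe8\xaf\x95'
arr = array.array('B')              # array of unsigned bytes (0..255)
arr.fromlist([ord(i) for i in s])   # each code point (< 256) becomes one byte
# decode the rebuilt UTF-8 bytes, then re-encode as GBK for a Chinese Windows console
print arr.tostring().decode('utf-8').encode('gbk')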
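For Python 3, a minimal port of the same approach might look like the sketch below. Assumptions: Python 3.2 or later, where array.tobytes() replaces tostring() (tostring was removed in 3.9); str is already Unicode, so the GBK re-encode is dropped and printing is left to the console's own encoding. fix_mojibake is a hypothetical helper name, not part of any library.

```python
import array

def fix_mojibake(s):
    """Rebuild the UTF-8 bytes hidden in s (one byte per code point) and decode them."""
    arr = array.array('B')               # 'B' = unsigned char, values 0..255
    arr.fromlist([ord(ch) for ch in s])  # fails if any code point is above 255
    return arr.tobytes().decode('utf-8')

s = '\xe6\xb5\x8b\xe8\xaf\x95'
print(fix_mojibake(s) == '测试')  # True
```

Pure ASCII input passes through unchanged, since its code points are already valid single-byte UTF-8.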
If you know a better way to do this conversion, please let me know.
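One commonly used shortcut skips the array entirely: the Latin-1 codec maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF, so encoding with it yields the same bytes as the ord() loop. A Python 3 sketch (fix_mojibake_latin1 is a hypothetical name); in Python 2 the same expression, s.encode('latin-1').decode('utf-8'), works on a unicode object:

```python
def fix_mojibake_latin1(s):
    # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so encoding with it rebuilds the original UTF-8 byte string.
    # Raises UnicodeEncodeError if s contains a code point above U+00FF.
    return s.encode('latin-1').decode('utf-8')

print(fix_mojibake_latin1('\xe6\xb5\x8b\xe8\xaf\x95') == '测试')  # True
```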