How to get string objects instead of Unicode from JSON

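Question:

I'm using Python 2 to parse JSON from ASCII-encoded text files.

When I load these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects. The problem is that I have to use the data with some libraries that only accept string objects. I can't change the libraries or update them.

Is it possible to get string objects instead of Unicode ones?

Example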

>>> import json
>>> original_list = ['a', 'b']
>>> json_list = json.dumps(original_list)
>>> json_list
'["a", "b"]'
>>> new_list = json.loads(json_list)
>>> new_list
[u'a', u'b']  # I want these to be of type `str`, not `unicode`

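(One easy and clean solution for 2017 is to use a recent version of Python, i.e. Python 3 and onward.)

Solution 1:

While there are some good answers here, I ended up using PyYAML to parse my JSON files, since it gives the keys and values as str type strings instead of the unicode type. Because JSON is a subset of YAML, it works nicely: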

>>> import json
>>> import yaml
>>> list_org = ['a', 'b']
>>> list_dump = json.dumps(list_org)
>>> list_dump
'["a", "b"]'
>>> json.loads(list_dump)
[u'a', u'b']
>>> yaml.safe_load(list_dump)
['a', 'b']

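Notes

Some things to note though:

I get string objects because all my entries are ASCII encoded. If I used Unicode-encoded entries, I would get them back as unicode objects - there is no conversion!
You should (probably always) use PyYAML's safe_load function; if you use it to load JSON files, you don't need the "additional power" of the load function anyway.
If you want a YAML parser that has more support for the 1.2 version of the spec (and correctly parses very low numbers), try Ruamel YAML: pip install ruamel.yaml and import ruamel.yaml as yaml was all I needed in my tests.

Conversion

As stated, there isn't any conversion! If you can't be sure to only deal with ASCII values (and you can't be sure most of the time), better use a conversion function:

I have used the one from Mark Amery a couple of times now; it works great and is very easy to use. You can also use a similar function as an object_hook instead, as it might gain you a performance boost on big files. See the slightly more involved answer from Mirec Miskuf for that.

Solution 2:

There's no built-in option to make the json module functions return byte strings instead of Unicode strings. However, this short and simple recursive function will convert any decoded JSON object from using Unicode strings to UTF-8-encoded byte strings: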

def byteify(input):
    if isinstance(input, dict):
        return {byteify(key): byteify(value)
                for key, value in input.iteritems()}
    elif isinstance(input, list):
        return [byteify(element) for element in input]
    elif isinstance(input, unicode):
        return input.encode('utf-8')
    else:
        return input

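Just call this on the output you get from a json.load or json.loads call.

A couple of notes:

To support Python 2.6 or earlier, replace return {byteify(key): byteify(value) for key, value in input.iteritems()} with return dict([(byteify(key), byteify(value)) for key, value in input.iteritems()]), since dictionary comprehensions weren't supported until Python 2.7.
Since this answer recurses through the entire decoded object, it has a couple of undesirable performance characteristics that can be avoided with very careful use of the object_hook or object_pairs_hook parameters. Mirec Miskuf's answer is so far the only one that manages to pull this off correctly, although as a consequence it is significantly more complicated than my approach.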
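For readers who need the same idea to run unchanged on Python 2.6, 2.7 and 3.x, here is a minimal sketch rather than the answer's own code. It assumes a helper alias text_type (not part of the original) and uses .items() with dict() instead of iteritems() and a dict comprehension. Note that on Python 3 the result contains bytes objects, which is what a Python 2 str corresponds to.

```python
import json

# Assumption for this sketch: text_type is `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def byteify(value, encoding="utf-8"):
    """Recursively encode every text string in a decoded JSON value."""
    if isinstance(value, dict):
        # dict() over a generator also works on Python 2.6,
        # which has no dict comprehensions.
        return dict((byteify(k, encoding), byteify(v, encoding))
                    for k, v in value.items())
    if isinstance(value, list):
        return [byteify(item, encoding) for item in value]
    if isinstance(value, text_type):
        return value.encode(encoding)
    # Numbers, booleans and None pass through untouched.
    return value


decoded = byteify(json.loads('{"name": "caf\\u00e9", "tags": ["a", "b"], "n": 1}'))
```

The encoding parameter is an addition for illustration; pass whatever encoding the downstream library expects.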

Solution 3:

Solution with an object_hook

It works for both Python 2.7 and 3.x.

import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts = False):
    if isinstance(data, str):
        return data

    # If this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item, ignore_dicts=True) for item in data ]
    # If this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.items() # changed to .items() for Python 2.7/3
        }

    # Python 3 compatible duck-typing
    # If this is a Unicode string, return its string representation
    if str(type(data)) == "<type 'unicode'>":
        return data.encode('utf-8')

    # If it's anything else, return it in its original form
    return data

Example usage:

>>> json_loads_byteified('{"Hello": "World"}')
{'Hello': 'World'}
>>> json_loads_byteified('"I am a top-level string"')
'I am a top-level string'
>>> json_loads_byteified('7')
7
>>> json_loads_byteified('["I am inside a list"]')
['I am inside a list']
>>> json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> json_load_byteified(open('somefile.json'))
{'more json': 'from a file'}

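How does this work and why would I use it?

Mark Amery's function is shorter and clearer than these ones, so what's the point of them? Why would you want to use them?

Purely for performance. Mark's answer first decodes the JSON text fully with Unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:

A copy of the entire decoded structure gets created in memory
If your JSON object is really deeply nested (500 levels or more), you'll hit Python's maximum recursion depth

This answer mitigates both of those performance issues by using the object_hook parameter of json.load and json.loads. From the documentation:

object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders

Since dictionaries nested many levels deep in other dictionaries get passed to object_hook as they are decoded, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.

Mark's answer isn't suitable for use as an object_hook as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the ignore_dicts parameter to _byteify, which gets passed to it at all times except when object_hook passes it a new dict to byteify. The ignore_dicts flag tells _byteify to ignore dicts, since they have already been byteified.

Finally, our implementations of json_load_byteified and json_loads_byteified call _byteify (with ignore_dicts=True) on the result returned from json.load or json.loads, to handle the case where the JSON text being decoded doesn't have a dict at the top level.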
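To see why the hook can skip dicts it meets again, and why a final top-level pass is still needed, it may help to record when object_hook actually fires. The sketch below is only an illustration; recording_hook and the sample documents are made up for it.

```python
import json

calls = []


def recording_hook(decoded_dict):
    # Record the keys of every dict the decoder hands to the hook, in call order.
    calls.append(sorted(decoded_dict))
    return decoded_dict


json.loads('{"outer": {"inner": {"leaf": 1}}, "list": [{"x": 2}]}',
           object_hook=recording_hook)
calls_for_nested = list(calls)

del calls[:]
json.loads('["no", "dict", "here"]', object_hook=recording_hook)
calls_for_top_level_list = list(calls)
```

The innermost dict arrives first and the outermost last, so by the time a parent dict reaches the hook its child dicts have already been processed. A document whose top level is a list or a string never triggers the hook at all, which is the case the extra _byteify call covers.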

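Solution 4:

You can use the object_hook parameter for json.loads to pass in a converter. You don't have to do the conversion after the fact. The json module will always pass the object_hook dicts only, and it will recursively pass in nested dicts, so you don't have to recurse into nested dicts yourself. I don't think I would convert Unicode strings to numbers like Wells shows. If it's a Unicode string, it was quoted as a string in the JSON file, so it is supposed to be a string (or the file is bad).

Also, I'd try to avoid doing something like str(val) on a unicode object. You should use value.encode(encoding) with a valid encoding, depending on what your external library expects.

So, for example: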

def _decode_list(data):
    rv = []
    for item in data:
        if isinstance(item, unicode):
            item = item.encode('utf-8')
        elif isinstance(item, list):
            item = _decode_list(item)
        elif isinstance(item, dict):
            item = _decode_dict(item)
        rv.append(item)
    return rv

def _decode_dict(data):
    rv = {}
    for key, value in data.iteritems():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        elif isinstance(value, list):
            value = _decode_list(value)
        elif isinstance(value, dict):
            value = _decode_dict(value)
        rv[key] = value
    return rv

obj = json.loads(s, object_hook=_decode_dict)
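The advice to prefer value.encode(encoding) over str(val) can be illustrated with a short sketch. On Python 2, str() on a unicode object encodes implicitly with the default codec (normally ASCII), so the explicit ASCII encode below stands in for that behaviour; the sample value is made up.

```python
value = u"caf\u00e9"

# Equivalent of what Python 2's str(value) attempts implicitly.
try:
    value.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Explicit encodings always succeed for this value, but yield different bytes,
# so the right choice depends on what the receiving library expects.
as_utf8 = value.encode("utf-8")
as_latin1 = value.encode("latin-1")
```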

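Solution 5:

That's because JSON makes no distinction between string objects and Unicode objects. They're all strings in JavaScript.

I think JSON is right to return Unicode objects. In fact, I wouldn't accept anything less, since JavaScript strings are in fact unicode objects (i.e., JSON (JavaScript) strings can store any kind of Unicode character), so it makes sense to create unicode objects when translating strings from JSON. Plain strings just wouldn't fit, since the library would have to guess the encoding you want.

It's better to use unicode string objects everywhere. So your best option is to update your libraries so they can deal with Unicode objects.

But if you really want byte strings, just encode the results to the encoding of your choice: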

>>> nl = json.loads(js)
>>> nl
[u'a', u'b']
>>> nl = [s.encode('utf-8') for s in nl]
>>> nl
['a', 'b']

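Solution 6:

There exists an easy workaround.

TL;DR - Use ast.literal_eval() instead of json.loads(). Both ast and json are in the standard library.

While not a "perfect" answer, it gets one pretty far if your plan is to ignore Unicode altogether. In Python 2.7: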

import json, ast
d = { 'field' : 'value' }
print "JSON Fail: ", json.loads(json.dumps(d))
print "AST Win:", ast.literal_eval(json.dumps(d))

gives:

JSON Fail:  {u'field': u'value'}
AST Win: {'field': 'value'}

This gets more hairy when some objects are really Unicode strings. The full answer gets hairy quickly.
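One more limit is worth checking before relying on this trick: ast.literal_eval parses Python literals, not JSON, so the JSON keywords true, false and null are rejected. The sketch below shows this with a made-up payload; literal_eval_or_none is a hypothetical helper written only for the demonstration.

```python
import ast
import json

payload = json.dumps({"field": "value", "active": True, "parent": None})


def literal_eval_or_none(text):
    """Return the evaluated literal, or None if the text is not a Python literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


result_plain = literal_eval_or_none(json.dumps({"field": "value"}))
result_with_keywords = literal_eval_or_none(payload)
result_json = json.loads(payload)
```

So the workaround only holds for documents made of strings, numbers, lists and objects.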

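Solution 7:

Mike Brennan's answer is close, but there isn't any reason to retraverse the entire structure. If you use the object_pairs_hook (Python 2.7+) parameter:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.

With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion: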

def deunicodify_hook(pairs):
    new_pairs = []
    for key, value in pairs:
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        new_pairs.append((key, value))
    return dict(new_pairs)

In [52]: open('test.json').read()
Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'

In [53]: json.load(open('test.json'))
Out[53]:
{u'1': u'hello',
 u'abc': [1, 2, 3],
 u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
 u'def': {u'hi': u'mom'}}

In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
Out[54]:
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

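Notice that I never have to call the hook recursively, since every object will get handed to the hook when you use object_pairs_hook. You do have to care about lists, but as you can see, an object within a list will be properly converted, and you don't have to recurse to make it happen.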
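On the point about lists: the decoder hands dicts nested inside a list to the hook, but plain strings that sit directly inside a list are not key/value pairs, so a hook that only looks at each pair's key and value leaves them alone. Below is a minimal sketch of a variant that also walks list values. It is an illustration rather than the answer's code; text_type, _encode and deunicodify_pairs are names introduced for it.

```python
import json

# Assumption for this sketch: text_type is `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def _encode(value):
    """Encode text, descend into lists, leave everything else as it is."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    if isinstance(value, list):
        return [_encode(item) for item in value]
    # Dicts reaching this point were already converted by the hook.
    return value


def deunicodify_pairs(pairs):
    return dict((_encode(key), _encode(value)) for key, value in pairs)


doc = '{"boo": [1, "hi", ["moo"], {"5": "some"}]}'
result = json.loads(doc, object_pairs_hook=deunicodify_pairs)
```

The recursion here is bounded by how deeply lists nest inside lists, not by the depth of the whole document. A document whose top level is a list would still need one extra _encode call on the result.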

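A coworker pointed out that Python 2.6 doesn't have object_pairs_hook. You can still use this with Python 2.6 by making a very small change. In the hook above, change:

for key, value in pairs:

to

for key, value in pairs.iteritems():

Then use object_hook instead of object_pairs_hook: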

In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
Out[66]:
{'1': 'hello',
 'abc': [1, 2, 3],
 'boo': [1, 'hi', 'moo', {'5': 'some'}],
 'def': {'hi': 'mom'}}

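Using object_pairs_hook results in one less dictionary being instantiated for each object in the JSON document, which might be worthwhile if you are parsing a huge document.

Solution 8:

There's no way to make this happen automatically in the simplejson library, I'm afraid.

The scanner and decoder in simplejson are designed to produce Unicode text. To do this, the library uses a function called c_scanstring (if it's available, for speed), or py_scanstring if the C version is not available. The scanstring function is called several times by nearly every routine that simplejson has for decoding a structure that might contain text. You'd have to either monkey patch the scanstring value in simplejson.decoder, or subclass JSONDecoder and provide pretty much your own entire implementation of anything that might contain text.

The reason that simplejson outputs Unicode, however, is that the JSON specification specifically mentions that "A string is a collection of zero or more Unicode characters"... support for Unicode is assumed as part of the format itself. simplejson's scanstring implementation goes so far as to scan and interpret Unicode escapes (even error-checking for malformed multi-byte charset representations), so the only way it can reliably return the value to you is as Unicode.

If you have an aged library that needs a str, I recommend you either laboriously search the nested data structure after parsing (which I acknowledge is what you explicitly said you wanted to avoid... sorry), or perhaps wrap your libraries in some sort of facade where you can massage the input parameters at a more granular level. The second approach might be more manageable than the first if your data structures are indeed deeply nested.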
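The facade idea could look roughly like the sketch below: a wrapper that encodes text arguments just before they reach the old library, so the rest of the program can keep working with Unicode. Everything here is hypothetical; legacy_length merely stands in for a library call that rejects text objects, and bytes_facade and to_bytes are names invented for the example.

```python
import functools

# Assumption for this sketch: text_type is `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes; leave every other value untouched."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def bytes_facade(func, encoding="utf-8"):
    """Wrap func so that text arguments are encoded before the call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        new_args = [to_bytes(arg, encoding) for arg in args]
        new_kwargs = dict((k, to_bytes(v, encoding)) for k, v in kwargs.items())
        return func(*new_args, **new_kwargs)
    return wrapper


def legacy_length(name):
    # Hypothetical stand-in for an old library function that rejects text objects.
    if isinstance(name, text_type):
        raise TypeError("byte string required")
    return len(name)


safe_length = bytes_facade(legacy_length)
length = safe_length(u"\u00d6sterreich")
```

Only top-level arguments are converted in this sketch; arguments that are themselves nested containers would still need one of the recursive converters from the other answers.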

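Solution 9:

As Mark (Amery) correctly notes: using PyYAML's deserializer on a JSON dump works only if you have ASCII only. At least out of the box.

Two quick comments on the PyYAML approach:

Never use yaml.load() on data from the field.
You can make it work for non-ASCII as well via this: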

 def to_utf8(loader, node):
     return loader.construct_scalar(node).encode('utf-8')
 yaml.add_constructor(u'tag:yaml.org,2002:str', to_utf8)

But performance-wise it is not comparable to Mark Amery's answer:

Throwing some deeply nested sample dicts at the two approaches, I get this (with dt[j] = the time delta of json.loads(json.dumps(m))):

     dt[yaml.safe_load(json.dumps(m))] =~ 100 * dt[j]
     dt[byteify recursion(Mark Amery)] =~   5 * dt[j]

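So deserialization, including fully walking the tree and encoding, is well within the order of magnitude of JSON's C-based implementation. I find this remarkably fast, and it is also more robust than the yaml load for deeply nested structures. And it is less prone to security errors, looking at yaml.load.

=> While I would appreciate a pointer to a C-only converter, the byteify function should be the default answer.

This holds especially true if your JSON structure comes from the field and contains user input. Because then you probably need to walk over your structure anyway, independent of your desired internal data structures ("unicode sandwich" or byte strings only).

Why?

Unicode normalisation. For the unaware: take a painkiller and read this.

So using the byteify recursion you kill two birds with one stone:

Get your byte strings from nested JSON dumps
Get user input values normalised, so that you find the stuff in your storage.

In my tests it turned out that replacing input.encode('utf-8') with unicodedata.normalize('NFC', input).encode('utf-8') was even faster than without NFC, but I guess that depends heavily on the sample data.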
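To make the normalisation point concrete, here is a small sketch of why two visually identical inputs can produce different byte strings, and how NFC evens them out before encoding. The helper normalize_and_encode and the sample name are illustrative only.

```python
import unicodedata

# Assumption for this sketch: text_type is `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def normalize_and_encode(value, encoding="utf-8"):
    """NFC-normalise text and encode it; leave non-text values untouched."""
    if isinstance(value, text_type):
        return unicodedata.normalize("NFC", value).encode(encoding)
    return value


decomposed = u"Jose\u0301"   # 'e' followed by a combining acute accent
precomposed = u"Jos\u00e9"   # a single precomposed character

raw_decomposed = decomposed.encode("utf-8")
raw_precomposed = precomposed.encode("utf-8")
normalized = normalize_and_encode(decomposed)
```

Without normalisation the two spellings are stored as different byte strings, so a lookup by one would miss a record saved under the other.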

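Solution 10:

The gotcha is that simplejson and json are two different modules, at least in the manner they deal with Unicode. You have json in Python 2.6+, and this gives you Unicode values, whereas simplejson returns string objects.

Just try easy_install-ing simplejson in your environment and see if that works. It did for me.

Solution 11:

Just use pickle instead of json for dump and load, like so: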

    import json
    import pickle

    d = { 'field1': 'value1', 'field2': 2, }

    json.dump(d,open("testjson.txt","w"))

    print json.load(open("testjson.txt","r"))

    pickle.dump(d,open("testpickle.txt","w"))

    print pickle.load(open("testpickle.txt","r"))

The output it produces is (strings and integers are handled correctly):

    {u'field2': 2, u'field1': u'value1'}
    {'field2': 2, 'field1': 'value1'}

Solution 12:

I had a JSON dict as a string. The keys and values were Unicode objects, as in the following example:

myStringDict = "{u'key':u'value'}"

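I could use the byteify function suggested above by first converting the string to a dict object using ast.literal_eval(myStringDict).

Solution 13:

So, I've run into the same problem.

Because I need to pass all data to PyGTK, Unicode strings aren't very useful to me either. So I have another recursive conversion method. It is actually also needed for type-safe JSON conversion: json.dump() would bail on any non-literals, like Python objects. It doesn't convert dict keys, though.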

# removes any objects, turns Unicode back into str
def filter_data(obj):
        if type(obj) in (int, float, str, bool):
                return obj
        elif type(obj) == unicode:
                return str(obj)
        elif type(obj) in (list, tuple, set):
                obj = list(obj)
                for i,v in enumerate(obj):
                        obj[i] = filter_data(v)
        elif type(obj) == dict:
                for i,v in obj.iteritems():
                        obj[i] = filter_data(v)
        else:
                print "invalid object in data, converting to string"
                obj = str(obj)
        return obj

Solution 14:

Support Python 2 and 3 using a hook (from Mirec Miskuf's answer):

import requests
import six
from six import iteritems

requests.packages.urllib3.disable_warnings()  # @UndefinedVariable
r = requests.get("http://echo.jsontest.com/key/value/one/two/three", verify=False)

def _byteify(data):
    # If this is a Unicode string, return its string representation
    if isinstance(data, six.string_types):
        return str(data.encode('utf-8').decode())

    # If this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [ _byteify(item) for item in data ]

    # If this is a dictionary, return dictionary of byteified keys and values,
    # but only if we haven't already byteified it
    if isinstance(data, dict):
        return {
            _byteify(key): _byteify(value) for key, value in iteritems(data)
        }
    # If it's anything else, return it in its original form
    return data

w = r.json(object_hook=_byteify)
print(w)

 {'three': '', 'key': 'value', 'one': 'two'}

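Solution 15:

With Python 3.6, sometimes I still run into this problem. For example, when getting a response from a REST API and loading the response text to JSON, I still get the Unicode strings. I found a simple solution using json.dumps().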

response_message = json.loads(json.dumps(response.text))
print(response_message)

Solution 16:

I built this recursive caster. It works for my needs and I think it's relatively complete.

def _parseJSON(self, obj):
    newobj = {}

    for key, value in obj.iteritems():
        key = str(key)

        if isinstance(value, dict):
            newobj[key] = self._parseJSON(value)
        elif isinstance(value, list):
            if key not in newobj:
                newobj[key] = []
                for i in value:
                    newobj[key].append(self._parseJSON(i))
        elif isinstance(value, unicode):
            val = str(value)
            if val.isdigit():
                val = int(val)
            else:
                try:
                    val = float(val)
                except ValueError:
                    val = str(val)
            newobj[key] = val

    return newobj

Just pass it a JSON object like so:

obj = json.loads(content, parse_float=float, parse_int=int)
obj = _parseJSON(obj)

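I have it as a private member of a class, but you can repurpose the method as you see fit.

Solution 17:

I rewrote Wells's _parse_json() to handle cases where the json object itself is an array (my use case).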

def _parseJSON(self, obj):
    if isinstance(obj, dict):
        newobj = {}
        for key, value in obj.iteritems():
            key = str(key)
            newobj[key] = self._parseJSON(value)
    elif isinstance(obj, list):
        newobj = []
        for value in obj:
            newobj.append(self._parseJSON(value))
    elif isinstance(obj, unicode):
        newobj = str(obj)
    else:
        newobj = obj
    return newobj

Solution 18:

Here is a recursive encoder written in C:
https://github.com/axiros/nested_encode

The performance overhead for "average" structures is around 10% compared to json.loads().

python speed.py
  json loads            [0.16sec]: {u'a': [{u'b': [[1, 2, [u'\xd6ster..
  json loads + encoding [0.18sec]: {'a': [{'b': [[1, 2, ['\xc3\x96ster.
  time overhead in percent: 9%

using this test structure:

import json, nested_encode, time

s = """
{
  "firstName": "Jos\\u0301",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "\\u00d6sterreich",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null,
  "a": [{"b": [[1, 2, ["\\u00d6sterreich"]]]}]
}
"""


t1 = time.time()
for i in xrange(10000):
    u = json.loads(s)
dt_json = time.time() - t1

t1 = time.time()
for i in xrange(10000):
    b = nested_encode.encode_nested(json.loads(s))
dt_json_enc = time.time() - t1

print "json loads            [%.2fsec]: %s..." % (dt_json, str(u)[:20])
print "json loads + encoding [%.2fsec]: %s..." % (dt_json_enc, str(b)[:20])

print "time overhead in percent: %i%%"  % (100 * (dt_json_enc - dt_json)/dt_json)

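Solution 19:

I ran into this problem too, and having to deal with JSON, I came up with a small loop that converts the Unicode keys to strings. (simplejson on GAE does not return string keys.)

obj is the object decoded from JSON: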

if NAME_CLASS_MAP.has_key(cls):
    kwargs = {}
    for i in obj.keys():
        kwargs[str(i)] = obj[i]
    o = NAME_CLASS_MAP[cls](**kwargs)
    o.save()

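kwargs is what I pass to the constructor of the GAE application (which does not like Unicode keys in **kwargs).

It is not as robust as the solution from Wells, but much smaller.

Solution 20:

I've adapted the code from Mark Amery's answer, particularly in order to get rid of isinstance in favour of duck typing.

The encoding is done manually and ensure_ascii is disabled. The Python documentation for json.dump says:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences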
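The effect of the flag is easy to check directly. The sketch below uses the first word of the test text; the variable names are just for illustration.

```python
import json

text = u"T\u00fcsk\u00e9sh\u00e1t\u00fa"

# Default: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(text)

# With ensure_ascii=False the characters are written as they are.
readable = json.dumps(text, ensure_ascii=False)

# Both forms decode back to the same text.
round_trip_escaped = json.loads(escaped)
round_trip_readable = json.loads(readable)
```

Both outputs are valid JSON for the same value; the difference is only in how readable the file is for a human.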

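Disclaimer: in the doctest I used the Hungarian language. Some notable Hungarian-related character encodings are: cp852, the IBM/OEM encoding used e.g. in DOS (sometimes referred to as ASCII, incorrectly I think, as it depends on the code page setting); Windows-1250, used e.g. in Windows (sometimes referred to as ANSI, depending on the locale settings); and ISO 8859-1, sometimes used on HTTP servers.

The test text Tüskéshátú kígyóbűvölő is attributed to Koltai László (native personal name form) and is from Wikipedia.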

# coding: utf-8
"""
This file should be encoded correctly with utf-8.
"""
import json

def encode_items(input, encoding='utf-8'):
    u"""original from: https://stackoverflow.com/a/13101776/611007
    adapted by SO/u/611007 (20150623)
    >>>
    >>> ## run this with `python -m doctest <this file>.py` from command line
    >>>
    >>> txt = u"Tüskéshátú kígyóbűvölő"
    >>> txt2 = u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"
    >>> txt3 = u"uúuutifu"
    >>> txt4 = b'u\\xfauutifu'
    >>> # txt4 shouldn't be 'u\\xc3\\xbauutifu', string content needs double backslash for doctest:
    >>> assert u'\\u0102' not in b'u\\xfauutifu'.decode('cp1250')
    >>> txt4u = txt4.decode('cp1250')
    >>> assert txt4u == u'u\\xfauutifu', repr(txt4u)
    >>> txt5 = b"u\\xc3\\xbauutifu"
    >>> txt5u = txt5.decode('utf-8')
    >>> txt6 = u"u\\u251c\\u2551uutifu"
    >>> there_and_back_again = lambda t: encode_items(t, encoding='utf-8').decode('utf-8')
    >>> assert txt == there_and_back_again(txt)
    >>> assert txt == there_and_back_again(txt2)
    >>> assert txt3 == there_and_back_again(txt3)
    >>> assert txt3.encode('cp852') == there_and_back_again(txt4u).encode('cp852')
    >>> assert txt3 == txt4u,(txt3,txt4u)
    >>> assert txt3 == there_and_back_again(txt5)
    >>> assert txt3 == there_and_back_again(txt5u)
    >>> assert txt3 == there_and_back_again(txt4u)
    >>> assert txt3.encode('cp1250') == encode_items(txt4, encoding='utf-8')
    >>> assert txt3.encode('utf-8') == encode_items(txt5, encoding='utf-8')
    >>> assert txt2.encode('utf-8') == encode_items(txt, encoding='utf-8')
    >>> assert {'a':txt2.encode('utf-8')} == encode_items({'a':txt}, encoding='utf-8')
    >>> assert [txt2.encode('utf-8')] == encode_items([txt], encoding='utf-8')
    >>> assert [[txt2.encode('utf-8')]] == encode_items([[txt]], encoding='utf-8')
    >>> assert [{'a':txt2.encode('utf-8')}] == encode_items([{'a':txt}], encoding='utf-8')
    >>> assert {'b':{'a':txt2.encode('utf-8')}} == encode_items({'b':{'a':txt}}, encoding='utf-8')
    """
    try:
        input.iteritems
        return {encode_items(k): encode_items(v) for (k,v) in input.iteritems()}
    except AttributeError:
        if isinstance(input, unicode):
            return input.encode(encoding)
        elif isinstance(input, str):
            return input
        try:
            iter(input)
            return [encode_items(e) for e in input]
        except TypeError:
            return input

def alt_dumps(obj, **kwargs):
    """
    >>> alt_dumps({'a': u"T\\u00fcsk\\u00e9sh\\u00e1t\\u00fa k\\u00edgy\\u00f3b\\u0171v\\u00f6l\\u0151"})
    '{"a": "T\\xc3\\xbcsk\\xc3\\xa9sh\\xc3\\xa1t\\xc3\\xba k\\xc3\\xadgy\\xc3\\xb3b\\xc5\\xb1v\\xc3\\xb6l\\xc5\\x91"}'
    """
    if 'ensure_ascii' in kwargs:
        del kwargs['ensure_ascii']
    return json.dumps(encode_items(obj), ensure_ascii=False, **kwargs)

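I'd also like to highlight the answer of Jarret Hardie, which references the JSON spec, quoting:

A string is a collection of zero or more Unicode characters

In my use case, I had files with JSON content. They are UTF-8 encoded files. ensure_ascii results in properly escaped but not very readable JSON files, and that is why I've adapted Mark Amery's answer to fit my needs.

The doctest is not particularly thoughtful, but I share the code in the hope that it will be useful for someone.

Solution 21:

Check out this answer to a similar question, which states that

The u- prefix just means that you have a Unicode string. When you really use the string, it won't appear in your data. Don't be thrown by the printed output.

For example, try this: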

print mail_accounts[0]["i"]

You won't see a u.
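A small sketch of the same point, using a made-up list instead of mail_accounts: the prefix belongs to the repr of the value (what the interactive prompt and printed containers show), not to the text itself.

```python
import json

first = json.loads('["a", "b"]')[0]

# repr() is what the prompt shows: u'a' on Python 2, 'a' on Python 3.
as_repr = repr(first)

# Formatting or printing the value itself gives just the text.
as_text = "%s" % first
```

This only addresses how the value is displayed. A library that checks for the str type on Python 2 will still reject the value, which is the situation the question describes.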