How do I implement custom indentation when pretty-printing with the json module?
- 2025-02-27 09:07:00
- admin (original post)
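Problem description:
I'm using Python 2.7 and the json module to encode the following data structure:
'layer1': {
    'layer2': {
        'layer3_1': [ long_list_of_stuff ],
        'layer3_2': 'string'
    }
}
My problem is that I pretty-print everything, like this:
json.dumps(data_structure, indent=2)
That works fine, but I want to indent everything except the content of "layer3_1". It is a massive structure listing coordinates, so putting each value on its own line makes the pretty-printed file thousands of lines long. For example: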
{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}
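What I really want is something like this:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x":1,"y":7},{"x":0,"y":4},{"x":5,"y":3},{"x":6,"y":9}],
      "layer3_2": "string"
    }
  }
}
I've heard it's possible to extend the json module. Can it be set up to turn off indentation only while inside the "layer3_1" object? If so, could someone show me how?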
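Solution 1:
(Note: the code in this answer only works with json.dumps(), which returns a JSON-formatted string, not with json.dump(), which writes directly to file-like objects. My answer to the question "Write two-dimensional list to JSON file" has a modified version that works with both.)
Update
Below is a version of my original answer that has been revised several times. I posted the original only to show how to get the first idea in J.F. Sebastian's answer to work, and, like his, it returned a non-indented string representation of the object. The latest version instead returns the wrapped Python object formatted as JSON in isolation.
The keys of each coordinate dict appear in sorted order, as requested in one of the OP's comments, but only if sort_keys=True is passed to the initial json.dumps() call that drives the process. The encoder also no longer changes the object's type to a string along the way. In other words, the actual type of the "wrapped" object is now preserved.
I think that not understanding the original intent of my post led a number of people to downvote it, so, mainly for that reason, I have "fixed" and improved my answer several times. The current version is a hybrid of my original answer and some of the ideas @Erik Allik used in his answer, plus useful feedback from other users in the comments below this answer.
The following code appears to work unchanged in both Python 2.7.16 and 3.7.4.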
from _ctypes import PyObj_FromPtr  # CPython-specific: fetch an object by its id().
import json
import re
class NoIndent(object):
    """ Value wrapper. """
    def __init__(self, value):
        self.value = value
class MyEncoder(json.JSONEncoder):
    FORMAT_SPEC = '@@{}@@'
    regex = re.compile(FORMAT_SPEC.format(r'(\d+)'))
    def __init__(self, **kwargs):
        # Save copy of any keyword argument values needed for use here.
        self.__sort_keys = kwargs.get('sort_keys', None)
        super(MyEncoder, self).__init__(**kwargs)
    def default(self, obj):
        return (self.FORMAT_SPEC.format(id(obj)) if isinstance(obj, NoIndent)
                else super(MyEncoder, self).default(obj))
    def encode(self, obj):
        format_spec = self.FORMAT_SPEC  # Local var to expedite access.
        json_repr = super(MyEncoder, self).encode(obj)  # Default JSON.
        # Replace any marked-up object ids in the JSON repr with the
        # value returned from the json.dumps() of the corresponding
        # wrapped Python object.
        for match in self.regex.finditer(json_repr):
            # see https://stackoverflow.com/a/15012814/355230
            id = int(match.group(1))
            no_indent = PyObj_FromPtr(id)
            json_obj_repr = json.dumps(no_indent.value, sort_keys=self.__sort_keys)
            # Replace the matched id string with json formatted representation
            # of the corresponding Python object.
            json_repr = json_repr.replace(
                            '"{}"'.format(format_spec.format(id)), json_obj_repr)
        return json_repr
if __name__ == '__main__':
    from string import ascii_lowercase as letters
    data_structure = {
        'layer1': {
            'layer2': {
                'layer3_1': NoIndent([{"x":1,"y":7}, {"x":0,"y":4}, {"x":5,"y":3},
                                      {"x":6,"y":9},
                                      {k: v for v, k in enumerate(letters)}]),
                'layer3_2': 'string',
                'layer3_3': NoIndent([{"x":2,"y":8,"z":3}, {"x":1,"y":5,"z":4},
                                      {"x":6,"y":9,"z":8}]),
                'layer3_4': NoIndent(list(range(20))),
            }
        }
    }
    print(json.dumps(data_structure, cls=MyEncoder, sort_keys=True, indent=2))
Output:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}, {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "h": 7, "i": 8, "j": 9, "k": 10, "l": 11, "m": 12, "n": 13, "o": 14, "p": 15, "q": 16, "r": 17, "s": 18, "t": 19, "u": 20, "v": 21, "w": 22, "x": 23, "y": 24, "z": 25}],
      "layer3_2": "string",
      "layer3_3": [{"x": 2, "y": 8, "z": 3}, {"x": 1, "y": 5, "z": 4}, {"x": 6, "y": 9, "z": 8}],
      "layer3_4": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
  }
}
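The note at the top of this answer says the encoder only works with json.dumps(). One possible workaround, sketched below under the assumption that the whole document fits in memory, is to build the string with json.dumps() and write it to the file yourself. The helper name dump_via_dumps is hypothetical and not part of the json module.

```python
import io
import json


def dump_via_dumps(obj, fp, **kwargs):
    """Write obj to the file-like fp via json.dumps() rather than json.dump().

    Hypothetical helper: json.dump() drives the encoder through iterencode(),
    so an encoder that only overrides encode() is bypassed. Building the whole
    string first keeps encode() in the loop, at the cost of memory.
    """
    fp.write(json.dumps(obj, **kwargs))


if __name__ == '__main__':
    buffer = io.StringIO()
    dump_via_dumps({'a': [1, 2, 3]}, buffer, indent=2)
    print(buffer.getvalue())
```

Any keyword argument accepted by json.dumps(), including cls=, can be passed through unchanged.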
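Solution 2:
It's a bit of a kludge, but once you have the string from dumps(), you can run a regular-expression substitution on it, provided you are sure of the format of its contents. Something along these lines:
s = json.dumps(data_structure, indent=2)
s = re.sub(r'\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*', r'{"\1":\2,"\3":\4}\5', s)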
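To make the substitution concrete, here is a small self-contained sketch that applies the same idea to the question's sample data. It assumes every coordinate object has exactly two single-character keys with non-negative integer values; the name COORD is only for illustration, and the braces are escaped for clarity.

```python
import json
import re

data_structure = {
    'layer1': {
        'layer2': {
            'layer3_1': [{'x': 1, 'y': 7}, {'x': 0, 'y': 4},
                         {'x': 5, 'y': 3}, {'x': 6, 'y': 9}],
            'layer3_2': 'string',
        }
    }
}

# Matches one pretty-printed {"<char>": <int>, "<char>": <int>} object together
# with the whitespace around it; only valid for this exact shape of data.
COORD = re.compile(r'\s*\{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*\}(,?)\s*')

pretty = json.dumps(data_structure, indent=2, sort_keys=True)
# Rebuild each matched object without whitespace, keeping its trailing comma.
compact = COORD.sub(r'{"\1":\2,"\3":\4}\5', pretty)
print(compact)
```

Because the pattern knows nothing about JSON structure, any object that does not match this shape is left fully indented.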
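Solution 3:
The following solution seems to work correctly on Python 2.7.x. It uses a workaround taken from "Custom JSON encoder in Python 2.7 to insert plain JavaScript code": a UUID-based replacement scheme keeps custom-encoded objects from ending up as JSON strings in the output.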
import json
import uuid
class NoIndent(object):
    def __init__(self, value):
        self.value = value
class NoIndentEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super(NoIndentEncoder, self).__init__(*args, **kwargs)
        # Keep every option except indent for the nested json.dumps() calls.
        self.kwargs = dict(kwargs)
        self.kwargs.pop('indent', None)
        self._replacement_map = {}
    def default(self, o):
        if isinstance(o, NoIndent):
            # Emit a unique placeholder and remember its compact JSON form.
            key = uuid.uuid4().hex
            self._replacement_map[key] = json.dumps(o.value, **self.kwargs)
            return "@@%s@@" % (key,)
        else:
            return super(NoIndentEncoder, self).default(o)
    def encode(self, o):
        result = super(NoIndentEncoder, self).encode(o)
        # Swap each quoted placeholder for the compact JSON it stands for.
        for k, v in self._replacement_map.iteritems():
            result = result.replace('"@@%s@@"' % (k,), v)
        return result
With that in place, this:
obj = {
    "layer1": {
        "layer2": {
            "layer3_2": "string",
            "layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}])
        }
    }
}
print(json.dumps(obj, indent=2, cls=NoIndentEncoder))
produces the following output:
{
  "layer1": {
    "layer2": {
      "layer3_2": "string",
      "layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]
    }
  }
}
It also correctly passes all options (except indent), such as sort_keys=True, down to the nested json.dumps call:
obj = {
    "layer1": {
        "layer2": {
            "layer3_1": NoIndent([{"y": 7, "x": 1, }, {"y": 4, "x": 0}, {"y": 3, "x": 5, }, {"y": 9, "x": 6}]),
            "layer3_2": "string",
        }
    }
}
print(json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder))
correctly outputs:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}],
      "layer3_2": "string"
    }
  }
}
It can also be combined with, for example, collections.OrderedDict:
from collections import OrderedDict
obj = {
    "layer1": {
        "layer2": {
            "layer3_2": "string",
            "layer3_3": NoIndent(OrderedDict([("b", 1), ("a", 2)]))
        }
    }
}
print(json.dumps(obj, indent=2, cls=NoIndentEncoder))
Output:
{
  "layer1": {
    "layer2": {
      "layer3_3": {"b": 1, "a": 2},
      "layer3_2": "string"
    }
  }
}
Update: Python 3 has no iteritems. You can replace encode with the following:
    def encode(self, o):
        result = super(NoIndentEncoder, self).encode(o)
        for k, v in iter(self._replacement_map.items()):
            result = result.replace('"@@%s@@"' % (k,), v)
        return result
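For readers on Python 3 only, the pieces of this answer can be put together roughly as follows. This is a consolidated sketch rather than the author's exact code: it uses the zero-argument super() form and drops indent with pop() so that the encoder also tolerates calls that never passed indent.

```python
import json
import uuid


class NoIndent:
    """Marks a value that should be serialized on a single line."""
    def __init__(self, value):
        self.value = value


class NoIndentEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Reuse the caller's options for the nested dumps() calls, minus indent.
        self._kwargs = dict(kwargs)
        self._kwargs.pop('indent', None)
        self._replacements = {}

    def default(self, o):
        if isinstance(o, NoIndent):
            # Emit a unique placeholder string and remember the compact JSON.
            key = uuid.uuid4().hex
            self._replacements[key] = json.dumps(o.value, **self._kwargs)
            return '@@%s@@' % key
        return super().default(o)

    def encode(self, o):
        result = super().encode(o)
        # Replace each quoted placeholder with the compact JSON it stands for.
        for key, compact in self._replacements.items():
            result = result.replace('"@@%s@@"' % key, compact)
        return result


if __name__ == '__main__':
    obj = {'layer1': {'layer2': {
        'layer3_1': NoIndent([{'y': 7, 'x': 1}, {'y': 4, 'x': 0}]),
        'layer3_2': 'string',
    }}}
    print(json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder))
```

As with the original, this relies on encode() being called, so it is meant for json.dumps() rather than json.dump().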
Solution 4:
This yields the OP's expected result:
import json
class MyJSONEncoder(json.JSONEncoder):
    def iterencode(self, o, _one_shot=False):
        list_lvl = 0  # How many lists we are currently nested inside.
        for s in super(MyJSONEncoder, self).iterencode(o, _one_shot=_one_shot):
            if s.startswith('['):
                list_lvl += 1
                s = s.replace('\n', '').rstrip()
            elif 0 < list_lvl:
                # Inside a list: drop the newline and indentation of each chunk.
                s = s.replace('\n', '').rstrip()
                if s and s[-1] == ',':
                    s = s[:-1] + self.item_separator
                elif s and s[-1] == ':':
                    s = s[:-1] + self.key_separator
            if s.endswith(']'):
                list_lvl -= 1
            yield s
o = {
    "layer1":{
        "layer2":{
            "layer3_1":[{"y":7,"x":1},{"y":4,"x":0},{"y":3,"x":5},{"y":9,"x":6}],
            "layer3_2":"string",
            "layer3_3":["aaa\nbbb","ccc\nddd",{"aaa\nbbb":"ccc\nddd"}],
            "layer3_4":"aaa\nbbb",
        }
    }
}
jsonstr = json.dumps(o, indent=2, separators=(',', ':'), sort_keys=True,
                     cls=MyJSONEncoder)
print(jsonstr)
o2 = json.loads(jsonstr)
print('identical objects: {}'.format((o == o2)))
Solution 5:
An answer for myself and other Python 3 users:
import re
def jsonIndentLimit(jsonString, indent, limit):
    # Remove every newline followed by more than `limit` indent units, or by
    # exactly `limit` units when the line is a closing bracket.
    regexPattern = re.compile(f'\n({indent}){{{limit}}}(({indent})+|(?=(}}|])))')
    return regexPattern.sub('', jsonString)
if __name__ == '__main__':
    jsonString = '''{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}'''
    print(jsonIndentLimit(jsonString, '  ', 3))
'''print
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}'''
Solution 6:
You could try:
Mark the lists that shouldn't be indented by replacing them with NoIndentList:
class NoIndentList(list):
    pass
Override the json.JSONEncoder.default method to produce a non-indented string representation for NoIndentList. You could just cast it back to a list and call json.dumps() without indent to get a single line.
It seems that the above approach doesn't work with the json module:
import json
import sys
class NoIndent(object):
    def __init__(self, value):
        self.value = value
def default(o, encoder=json.JSONEncoder()):
    if isinstance(o, NoIndent):
        return json.dumps(o.value)
    return encoder.default(o)
L = [dict(x=x, y=y) for x in range(1) for y in range(2)]
obj = [NoIndent(L), L]
json.dump(obj, sys.stdout, default=default, indent=4)
It produces invalid output (the list is serialized as a string):
[
    "[{\"y\": 0, \"x\": 0}, {\"y\": 1, \"x\": 0}]",
    [
        {
            "y": 0,
            "x": 0
        },
        {
            "y": 1,
            "x": 0
        }
    ]
]
If you can use yaml, then the approach works:
import sys
import yaml
class NoIndentList(list):
    pass
def noindent_list_presenter(dumper, data):
    return dumper.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                     flow_style=True)
yaml.add_representer(NoIndentList, noindent_list_presenter)
obj = [
    [dict(x=x, y=y) for x in range(2) for y in range(1)],
    [dict(x=x, y=y) for x in range(1) for y in range(2)],
]
obj[0] = NoIndentList(obj[0])
yaml.dump(obj, stream=sys.stdout, indent=4)
It produces:
- [{x: 0, y: 0}, {x: 1, y: 0}]
-   - {x: 0, y: 0}
    - {x: 0, y: 1}
That is, the first list is serialized using [] with all items on one line, while the second list uses one line per item.
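Solution 7:
If too many different types of objects contribute to your JSON to attempt the JSONEncoder approach, and the types vary too much to use a regex, you can use the following post-processing solution. This function collapses whitespace beyond a specified indent level, without needing to know anything about the data itself.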
def collapse_json(text, indent=12):
    """Compacts a string of json data by collapsing whitespace after the
    specified indent level
    NOTE: will not produce correct results when indent level is not a multiple
    of the json indent level
    """
    initial = " " * indent
    out = []  # final json output
    sublevel = []  # accumulation list for sublevel entries
    pending = None  # holder for consecutive entries at exact indent level
    for line in text.splitlines():
        if line.startswith(initial):
            if line[indent] == " ":
                # found a line indented further than the indent level, so add
                # it to the sublevel list
                if pending:
                    # the first item in the sublevel will be the pending item
                    # that was the previous line in the json
                    sublevel.append(pending)
                    pending = None
                item = line.strip()
                sublevel.append(item)
                if item.endswith(","):
                    sublevel.append(" ")
            elif sublevel:
                # found a line at the exact indent level *and* we have sublevel
                # items. This means the sublevel items have come to an end
                sublevel.append(line.strip())
                out.append("".join(sublevel))
                sublevel = []
            else:
                # found a line at the exact indent level but no items indented
                # further, so possibly start a new sub-level
                if pending:
                    # if there is already a pending item, it means that
                    # consecutive entries in the json had the exact same
                    # indentation and that last pending item was not the start
                    # of a new sublevel.
                    out.append(pending)
                pending = line.rstrip()
        else:
            if pending:
                # it's possible that an item will be pending but not added to
                # the output yet, so make sure it's not forgotten.
                out.append(pending)
                pending = None
            if sublevel:
                out.append("".join(sublevel))
            out.append(line)
    return "\n".join(out)
For example, using this structure as input to json.dumps with an indent level of 4:
text = json.dumps({"zero": ["first", {"second": 2, "third": 3, "fourth": 4, "items": [[1,2,3,4], [5,6,7,8], 9, 10, [11, [12, [13, [14, 15]]]]]}]}, indent=4)
Here is the function's output at various indent levels:
>>> print collapse_json(text, indent=0)
{"zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]}
>>> print collapse_json(text, indent=4)
{
    "zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]
}
>>> print collapse_json(text, indent=8)
{
    "zero": [
        "first",
        {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}
    ]
}
>>> print collapse_json(text, indent=12)
{
    "zero": [
        "first",
        {
            "items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
>>> print collapse_json(text, indent=16)
{
    "zero": [
        "first",
        {
            "items": [
                [1, 2, 3, 4],
                [5, 6, 7, 8],
                9,
                10,
                [11, [12, [13, [14, 15]]]]
            ],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
Solution 8:
Best-performing code (about 1 s for 10 MB of text):
import json
def dumps_json(data, indent=2, depth=2):
    assert depth > 0
    space = ' '*indent
    s = json.dumps(data, indent=indent)
    lines = s.splitlines()
    N = len(lines)
    # determine which lines to be shortened
    is_over_depth_line = lambda i: i in range(N) and lines[i].startswith(space*(depth+1))
    is_open_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i+1)
    is_close_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i-1)
    #
    def shorten_line(line_index):
        if not is_open_bracket_line(line_index):
            return lines[line_index]
        # shorten over-depth lines
        start = line_index
        end = start
        while not is_close_bracket_line(end):
            end += 1
        has_trailing_comma = lines[end][-1] == ','
        _lines = [lines[start][-1], *lines[start+1:end], lines[end].replace(',','')]
        d = json.dumps(json.loads(' '.join(_lines)))
        return lines[line_index][:-1] + d + (',' if has_trailing_comma else '')
    #
    s = '\n'.join([
        shorten_line(i)
        for i in range(N) if not is_over_depth_line(i) and not is_close_bracket_line(i)
    ])
    #
    return s
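Update: here is my explanation:
First, we use json.dumps to get an already-indented JSON string. For example:
>>> print(json.dumps({'0':{'1a':{'2a':None,'2b':None},'1b':{'2':None}}}, indent=2))
[0]  {
[1]    "0": {
[2]      "1a": {
[3]        "2a": null,
[4]        "2b": null
[5]      },
[6]      "1b": {
[7]        "2": null
[8]      }
[9]    }
[10] }
If we set indent=2 and depth=2, the over-depth lines are those that start with 6 spaces.
There are four types of lines:
1. Normal line
2. Open-bracket line (2, 6)
3. Over-depth line (3, 4, 7)
4. Close-bracket line (5, 8)
We try to merge each sequence of lines (types 2 + 3 + 4) into a single line. Example:
[2]      "1a": {
[3]        "2a": null,
[4]        "2b": null
[5]      },
is merged into:
[2]      "1a": {"2a": null, "2b": null},
Note: a close-bracket line may have a trailing comma.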
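The same depth-limited result can also be produced without post-processing the indented string. The sketch below is a different technique from the line-merging one explained above: it recurses over the data and switches to compact json.dumps() output once max_depth is reached. The name dumps_limited is hypothetical, and the sketch assumes all dict keys are strings.

```python
import json


def dumps_limited(obj, indent=2, max_depth=2, _level=0):
    """Pretty-print obj down to max_depth; deeper containers go on one line.

    Hypothetical helper, not a json API. Assumes dict keys are strings.
    """
    # Scalars, empty containers, and anything at the depth limit are emitted
    # in json.dumps()'s default compact form.
    if _level >= max_depth or not isinstance(obj, (dict, list)) or not obj:
        return json.dumps(obj)
    inner = ' ' * (indent * (_level + 1))  # indentation of the children
    outer = ' ' * (indent * _level)        # indentation of the closing bracket
    if isinstance(obj, dict):
        parts = [
            inner + json.dumps(key) + ': '
            + dumps_limited(value, indent, max_depth, _level + 1)
            for key, value in obj.items()
        ]
        opener, closer = '{', '}'
    else:
        parts = [
            inner + dumps_limited(value, indent, max_depth, _level + 1)
            for value in obj
        ]
        opener, closer = '[', ']'
    return opener + '\n' + ',\n'.join(parts) + '\n' + outer + closer


if __name__ == '__main__':
    sample = {'0': {'1a': {'2a': None, '2b': None}, '1b': {'2': None}}}
    print(dumps_limited(sample, indent=2, max_depth=2))
```

For the sample above this prints the "1a" and "1b" entries on one line each, matching the merged form shown in the explanation.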
Solution 9:
I know this question is quite old in terms of both time and Python version, but while searching for a similar problem I came across compact-json, and it just works...
> compact-json -l 80 sample.txt
{
    "layer1": {
        "layer2": {
            "layer3_1": [ {"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9} ],
            "layer3_2": "string"
        }
    }
}
It can be run from a script just as easily:
import json
from compact_json import Formatter
text = """
{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}"""
data = json.loads(text)  # Parse the JSON text into Python objects first.
print(Formatter().serialize(data))  # same result as above
Solution 10:
Indeed, this is one of the things YAML does better than JSON.
I couldn't get NoIndentEncoder to work... but I can use a regex on the JSON string...
import re
def collapse_json(text, list_length=5):
    for length in range(list_length):
        # Match a list of exactly `length` items spread over several lines.
        re_pattern = r'\[' + (r'\s*(.+)\s*,' * length)[:-1] + r'\]'
        re_repl = r'[' + ''.join(r'\{}, '.format(i+1) for i in range(length))[:-2] + r']'
        text = re.sub(re_pattern, re_repl, text)
    return text
The question is, how do I do this on a nested list?
Before:
[
  0,
  "any",
  [
    2,
    3
  ]
]
After:
[0, "any", [2, 3]]
Solution 11:
If you want arrays to be indented differently, here is another approach:
import json
# Should be unique and never appear in the input
REPLACE_MARK = "#$ONE_LINE_ARRAY_{0}$#"
example_json = {
    "test_int": 3,
    "test_str": "Test",
    "test_arr": [ "An", "Array" ],
    "test_obj": {
        "nested_str": "string",
        "nested_arr": [{"id": 1},{"id": 2}]
    }
}
# Replace all arrays with the indexed markers.
a = example_json["test_arr"]
b = example_json["test_obj"]["nested_arr"]
example_json["test_arr"] = REPLACE_MARK.format("a")
example_json["test_obj"]["nested_arr"] = REPLACE_MARK.format("b")
# Generate the JSON without any arrays using your pretty print.
json_data = json.dumps(example_json, indent=4)
# Generate the JSON arrays without pretty print.
json_data_a = json.dumps(a)
json_data_b = json.dumps(b)
# Insert the flat JSON strings into the parent at the indexed marks.
json_data = json_data.replace(f"\"{REPLACE_MARK.format('a')}\"", json_data_a)
json_data = json_data.replace(f"\"{REPLACE_MARK.format('b')}\"", json_data_b)
print(json_data)
You can generalize this into a function that walks every element of the JSON object, scans for arrays, and performs the replacement dynamically.
Pros:
Simple and extensible
No regular expressions
No custom JSON encoder
Cons:
Take care that user input never contains the replacement placeholder.
It may not perform well on JSON structures that contain a large number of arrays.
This can be optimized by replacing the strings with an array of strings.
The motivation for this solution was fixed-format animation frame generation, where each element of the array is an integer index. This solution worked well for me and was easy to adjust.
Here is a more general and optimized version:
import json
import copy
REPLACE_MARK = "#$ONE_LINE_ARRAY_$#"
def dump_arrays_single_line(json_data):
    # Deep copy prevent modifying original data.
    json_data = copy.deepcopy(json_data)
    # Walk the dictionary, putting every JSON array into arr.
    def walk(node, arr):
        for key, item in node.items():
            if type(item) is dict:
                walk(item, arr)
            elif type(item) is list:
                arr.append(item)
                node[key] = REPLACE_MARK
            else:
                pass
    arr = []
    walk(json_data, arr)
    # Pretty format but keep arrays on single line.
    # Need to escape '{' and '}' to use 'str.format()'
    # Each array is serialized with json.dumps() so the result stays valid JSON.
    json_data = json.dumps(json_data, indent=4).replace('{', '{{').replace('}', '}}').replace(f'"{REPLACE_MARK}"', "{}", len(arr)).format(*(json.dumps(a) for a in arr))
    return json_data
example_json = {
    "test_int": 3,
    "test_str": "Test",
    "test_arr": [ "An", "Array" ],
    "test_obj": {
        "nested_str": "string",
        "nested_arr": [{"id": 1},{"id": 2}]
    }
}
print(dump_arrays_single_line(example_json))
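Solution 12:
I found none of the other answers on this page good enough. They either require changes to the source data object (such as adding NoIndent wrappers) or use a static wrapping strategy (such as never wrapping any list, or never wrapping certain keys). @Thell has the best general solution, which wraps each field dynamically based on output length. Unfortunately, its performance is poor.
compact_json is great and solves the general problem exactly the way it should be solved. Lines are wrapped or merged based on line length and many other configurable criteria, such as object complexity (think nested dict/list levels). This is the right approach. But its performance is terrible.
Performance
The main problem is that compact_json is slow: about 50 times slower than stdlib json encoding. Here is a test with a 4 MB JSON file: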
# stdlib json
> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", 'r')) ; fmt = compact_json.Formatter ()' -c 'json.dumps (data)'
2 loops, best of 5: 164 msec per loop
# compact_json
> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", 'r')) ; fmt = compact_json.Formatter ()' -c 'fmt.serialize (data)'
1 loop, best of 5: 7.85 sec per loop
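compact_json takes 8 seconds to dump 4 MB, versus about 165 ms for stdlib json. If your data is larger than toy-sized, take a nap; it will be a while. For applications with a lot of data, compact_json will not work.
Solution
I found a better-performing solution: CompactJSONEncoder. It has far fewer features than compact_json, but it solves the wrapping problem and is much faster.
Usage is simple. Just pass it as the cls argument to the stdlib call: json.dumps (data, cls = CompactJSONEncoder). Here is the same 4 MB test: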
# CompactJSONEncoder
> python3 -m timeit -s 'import json ; data = json.load (open ("test.json", 'r'))' -c 'json.dumps (data, cls = CompactJSONEncoder)'
1 loop, best of 5: 1.58 sec per loop
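That is only about 10 times slower than stdlib json. This is the naive implementation; with optimization it could probably be brought down to 5x or less. And no external library is needed: just one short class and stdlib json.
Code
This is the CompactJSONEncoder class linked above, slightly modified. The regular version only flattens a list/dict when it contains no nested containers. That gives good results. But the OP wants the whole layer3_1 entry on one line. To get that, just remove the _primitives_only test as I did below, and any object that is no more than MAX_WIDTH characters wide will be flattened.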
import json
class CompactJSONEncoder (json.JSONEncoder) :
    '''A JSON Encoder that puts small containers on single lines.'''
    CONTAINER_TYPES = (list, tuple, dict)
    '''Container datatypes include primitives or other containers.'''
    MAX_WIDTH = 70
    '''Maximum width of a container that might be put on a single line.'''
    MAX_ITEMS = 12
    '''Maximum number of items in container that might be put on single line.'''
    def __init__ (me, *args, **kwargs) :
        super ().__init__ (*args, **kwargs)
        me.indentation_level = 0
    def encode (me, o) :
        '''Encode JSON object *o* with respect to single line lists.'''
        if isinstance (o, (list, tuple)) :
            return me._encode_list (o)
        if isinstance (o, dict) :
            return me._encode_object (o)
        if isinstance (o, float) :  # Use scientific notation for floats
            return format (o, 'g')
        return json.dumps (
            o,
            skipkeys = me.skipkeys,
            ensure_ascii = me.ensure_ascii,
            check_circular = me.check_circular,
            allow_nan = me.allow_nan,
            sort_keys = me.sort_keys,
            indent = me.indent,
            separators = (me.item_separator, me.key_separator),
            default = me.default if hasattr (me, 'default') else None,
        )
    def _encode_list (me, o) :
        if me._put_on_single_line (o) :
            return '[' + ', '.join (me.encode (el) for el in o) + ']'
        me.indentation_level += 1
        output = [me.indent_str + me.encode (el) for el in o]
        me.indentation_level -= 1
        return '[\n' + ',\n'.join (output) + '\n' + me.indent_str + ']'
    def _encode_object (me, o) :
        if not o :
            return '{}'
        # ensure keys are converted to strings
        o = {str (k) if k is not None else 'null' : v for k, v in o.items ()}
        if me.sort_keys :
            o = dict (sorted (o.items (), key=lambda x : x[0]))
        if me._put_on_single_line (o) :
            return ('{ ' +
                ', '.join (f'{json.dumps (k)} : {me.encode (el)}' for k, el in o.items ())
                + ' }'
            )
        me.indentation_level += 1
        output = [
            f'{me.indent_str}{json.dumps (k)} : {me.encode (v)}' for k, v in o.items ()
        ]
        me.indentation_level -= 1
        return '{\n' + ',\n'.join (output) + '\n' + me.indent_str + '}'
    def iterencode (me, o, **kwargs) :
        '''Required to also work with `json.dump`.'''
        return me.encode (o)
    def _put_on_single_line (me, o) :
        return (
            #me._primitives_only (o) and ## changed for OP's requirements
            len (o) <= me.MAX_ITEMS
            and len (str (o)) - 2 <= me.MAX_WIDTH
        )
    #def _primitives_only (me, o : list | tuple | dict) : # remove useless type annotations
    def _primitives_only (me, o) :
        if isinstance (o, (list, tuple)) :
            return not any (isinstance (el, me.CONTAINER_TYPES) for el in o)
        elif isinstance (o, dict) :
            return not any (isinstance (el, me.CONTAINER_TYPES) for el in o.values ())
    @property
    def indent_str (me) -> str :
        if isinstance (me.indent, int) :
            return ' ' * (me.indentation_level * me.indent)
        elif isinstance (me.indent, str) :
            return me.indentation_level * me.indent
        else :
            raise ValueError (
                f'indent must either be of type int or str (is : {type (me.indent)})'
            )
Solution 13:
This solution is not as elegant or as general as the others, and you won't learn much from it, but it is quick and simple. Note that it uses Python 2 print statements and prints a plain-text outline rather than JSON.
def custom_print(data_structure, indent):
    for key, value in data_structure.items():
        print "\n%s%s:" % (' '*indent,str(key)),
        if isinstance(value, dict):
            custom_print(value, indent+1)
        else:
            print "%s" % (str(value)),
Usage and output:
>>> custom_print(data_structure,1)
 layer1:
  layer2:
   layer3_2: string
   layer3_1: [{'y': 7, 'x': 1}, {'y': 4, 'x': 0}, {'y': 3, 'x': 5}, {'y': 9, 'x': 6}]
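Solution 14:
As a side note, this website has built-in JavaScript that avoids line breaks in JSON strings when a line is shorter than 70 characters:
http://www.csvjson.com/json_beautifier
(It is implemented using a modified version of JSON-js.)
Select "Inline short arrays".
It is great for quickly viewing data that you have in the copy buffer.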
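Solution 15:
This is a fairly old question, but here is a solution that indents JSON up to a maximum nesting depth. Anything nested deeper than indent_max_depth is output as flat JSON.
The code is a modification of the cpython/Lib/json/encoder.py file. Sorry that it is a bit long.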
import json
from json.encoder import encode_basestring, encode_basestring_ascii, INFINITY
class JSONMaxDepthEncoder(json.JSONEncoder):
    def __init__(
        self,
        *,
        skipkeys: bool=False,
        ensure_ascii: bool=True,
        check_circular: bool=True,
        allow_nan: bool=True,
        sort_keys: bool=False,
        indent: int|str=None,
        separators: tuple[str,str]=None,
        default: callable=None,
        indent_max_depth: int=3
    ) -> None:
        """
        JSON encoder that indents upto indent_max_depth.
        """
        super().__init__(
            skipkeys=skipkeys,
            ensure_ascii=ensure_ascii,
            check_circular=check_circular,
            allow_nan=allow_nan,
            sort_keys=sort_keys,
            indent=indent,
            separators=separators,
            default=default,
        )
        self.indent_max_depth = indent_max_depth
        self._level = 0
    def iterencode(self, o, _one_shot=False):
        """Encode the given object and yield each string
        representation as available.
        For example::
            for chunk in JSONEncoder().iterencode(bigobject):
                mysocket.write(chunk)
        """
        if self.check_circular:
            markers = {}
        else:
            markers = None
        if self.ensure_ascii:
            _encoder = encode_basestring_ascii
        else:
            _encoder = encode_basestring
        def floatstr(o, allow_nan=self.allow_nan,
                _repr=float.__repr__, _inf=INFINITY, _neginf=-INFINITY):
            # Check for specials.  Note that this type of test is processor
            # and/or platform-specific, so do tests which don't depend on the
            # internals.
            if o != o:
                text = 'NaN'
            elif o == _inf:
                text = 'Infinity'
            elif o == _neginf:
                text = '-Infinity'
            else:
                return _repr(o)
            if not allow_nan:
                raise ValueError(
                    "Out of range float values are not JSON compliant: " +
                    repr(o))
            return text
        _iterencode = _make_iterencode(
            markers, self.default, _encoder, self.indent, floatstr,
            self.key_separator, self.item_separator, self.sort_keys,
            self.skipkeys, _one_shot, self.indent_max_depth)
        return _iterencode(o, 0)
def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
        _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
        indent_max_depth,
        ## HACK: hand-optimized bytecode; turn globals into locals
        ValueError=ValueError,
        dict=dict,
        float=float,
        id=id,
        int=int,
        isinstance=isinstance,
        list=list,
        str=str,
        tuple=tuple,
        _intstr=int.__repr__,
    ):
    if _indent is not None and not isinstance(_indent, str):
        _indent = ' ' * _indent
    def _iterencode_list(lst, current_indent_level, indent_max_depth):
        if not lst:
            yield '[]'
            return
        if markers is not None:
            markerid = id(lst)
            if markerid in markers:
                raise ValueError("Circular reference detected")
            markers[markerid] = lst
        buf = '['
        if _indent is not None:
            current_indent_level += 1
            # Only break the line while we are within the maximum depth.
            newline_indent = (
                '\n' + _indent * current_indent_level
                if current_indent_level <= indent_max_depth
                else ''
            )
            separator = _item_separator + newline_indent
            buf += newline_indent
        else:
            newline_indent = None
            separator = _item_separator
        first = True
        for value in lst:
            if first:
                first = False
            else:
                buf = separator
            if isinstance(value, str):
                yield buf + _encoder(value)
            elif value is None:
                yield buf + 'null'
            elif value is True:
                yield buf + 'true'
            elif value is False:
                yield buf + 'false'
            elif isinstance(value, int):
                yield buf + _intstr(value)
            elif isinstance(value, float):
                yield buf + _floatstr(value)
            else:
                yield buf
                if isinstance(value, (list, tuple)):
                    chunks = _iterencode_list(value, current_indent_level, indent_max_depth)
                elif isinstance(value, dict):
                    chunks = _iterencode_dict(value, current_indent_level, indent_max_depth)
                else:
                    chunks = _iterencode(value, current_indent_level, indent_max_depth)
                yield from chunks
        if newline_indent is not None:
            current_indent_level -= 1
            if current_indent_level < indent_max_depth:
                yield '\n' + _indent * current_indent_level
        yield ']'
        if markers is not None:
            del markers[markerid]
    def _iterencode_dict(dct, current_indent_level, indent_max_depth):
        if not dct:
            yield '{}'
            return
        if markers is not None:
            markerid = id(dct)
            if markerid in markers:
                raise ValueError("Circular reference detected")
            markers[markerid] = dct
        yield '{'
        if _indent is not None:
            current_indent_level += 1
            # Only break the line while we are within the maximum depth.
            newline_indent = (
                '\n' + _indent * current_indent_level
                if current_indent_level <= indent_max_depth
                else ''
            )
            item_separator = _item_separator + newline_indent
            yield newline_indent
        else:
            newline_indent = None
            item_separator = _item_separator
        first = True
        if _sort_keys:
            items = sorted(dct.items())
        else:
            items = dct.items()
        for key, value in items:
            if isinstance(key, str):
                pass
            elif isinstance(key, float):
                key = _floatstr(key)
            elif key is True:
                key = 'true'
            elif key is False:
                key = 'false'
            elif key is None:
                key = 'null'
            elif isinstance(key, int):
                key = _intstr(key)
            elif _skipkeys:
                continue
            else:
                raise TypeError(f'keys must be str, int, float, bool or None, '
                                f'not {key.__class__.__name__}')
            if first:
                first = False
            else:
                yield item_separator
            yield _encoder(key)
            yield _key_separator
            if isinstance(value, str):
                yield _encoder(value)
            elif value is None:
                yield 'null'
            elif value is True:
                yield 'true'
            elif value is False:
                yield 'false'
            elif isinstance(value, int):
                yield _intstr(value)
            elif isinstance(value, float):
                yield _floatstr(value)
            else:
                if isinstance(value, (list, tuple)):
                    chunks = _iterencode_list(value, current_indent_level, indent_max_depth)
                elif isinstance(value, dict):
                    chunks = _iterencode_dict(value, current_indent_level, indent_max_depth)
                else:
                    chunks = _iterencode(value, current_indent_level, indent_max_depth)
                yield from chunks
        if newline_indent is not None:
            current_indent_level -= 1
            if current_indent_level < indent_max_depth:
                yield '\n' + _indent * current_indent_level
        yield '}'
        if markers is not None:
            del markers[markerid]
    def _iterencode(o, current_indent_level, indent_max_depth=indent_max_depth):
        if isinstance(o, str):
            yield _encoder(o)
        elif o is None:
            yield 'null'
        elif o is True:
            yield 'true'
        elif o is False:
            yield 'false'
        elif isinstance(o, int):
            yield _intstr(o)
        elif isinstance(o, float):
            yield _floatstr(o)
        elif isinstance(o, (list, tuple)):
            yield from _iterencode_list(o, current_indent_level, indent_max_depth)
        elif isinstance(o, dict):
            yield from _iterencode_dict(o, current_indent_level, indent_max_depth)
        else:
            if markers is not None:
                markerid = id(o)
                if markerid in markers:
                    raise ValueError("Circular reference detected")
                markers[markerid] = o
            o = _default(o)
            yield from _iterencode(o, current_indent_level, indent_max_depth)
            if markers is not None:
                del markers[markerid]
    return _iterencode
Usage:
data = {
    'layer1': {
        'layer2': {
            'layer3_1': [
                {'x': 1, 'y': 7},
                {'x': 0, 'y': 4},
                {'x': 5, 'y': 3},
                {'x': 6, 'y': 9}
            ],
            'layer3_2': 'string'
        }
    }
}
encoder = JSONMaxDepthEncoder(indent=2, indent_max_depth=3)
print(encoder.encode(data))
# prints:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}
To write directly to a file:
with open('data.json', 'w') as fp:
    for chunk in encoder.iterencode(data):
        fp.write(chunk)