How do I convert JSON to CSV?
- 2024-11-28 08:38:00
- admin (original post)
Problem description:
I have a JSON file that I want to convert to a CSV file. How can I do this with Python?
I tried:
import json
import csv
f = open('data.json')
data = json.load(f)
f.close()
f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)
f.close()
But it doesn't work. I'm using Django, and the error I get is:
`'file' object has no attribute 'writerow'`
Then I tried the following:
import json
import csv
f = open('data.json')
data = json.load(f)
f.close()
f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)  # ← changed
f.close()
Then I get the error:
`sequence expected`
Sample JSON file:
[
{
"pk": 22,
"model": "auth.permission",
"fields": {
"codename": "add_logentry",
"name": "Can add log entry",
"content_type": 8
}
},
{
"pk": 23,
"model": "auth.permission",
"fields": {
"codename": "change_logentry",
"name": "Can change log entry",
"content_type": 8
}
},
{
"pk": 24,
"model": "auth.permission",
"fields": {
"codename": "delete_logentry",
"name": "Can delete log entry",
"content_type": 8
}
},
{
"pk": 4,
"model": "auth.permission",
"fields": {
"codename": "add_group",
"name": "Can add group",
"content_type": 2
}
},
{
"pk": 10,
"model": "auth.permission",
"fields": {
"codename": "add_message",
"name": "Can add message",
"content_type": 4
}
}
]
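For reference, both errors above come from how the writer and the file are used: csv.writer(f) returns a writer object, its writerow method expects a sequence (list or tuple) of field values rather than a dict, and the output file has to be opened for writing. A minimal Python 3 sketch under those assumptions follows; the function name json_to_csv and the hard-coded column list are illustrative choices that assume the record shape of the sample above (pk, model and a nested fields object).

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    # Load the list of records; each record is a dict.
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # Open the output for *writing*; newline="" lets the csv module control line endings.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # call writerow on the writer object, not on the file
        writer.writerow(["pk", "model", "codename", "name", "content_type"])
        for item in data:
            fields = item["fields"]
            # writerow needs a sequence of values, not a dict
            writer.writerow([item["pk"], item["model"],
                             fields["codename"], fields["name"], fields["content_type"]])
```

Usage: json_to_csv("data.json", "data.csv").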
Solution 1:
With the pandas library, this is as simple as two commands!
df = pd.read_json()
read_json converts a JSON string into a pandas object (a Series or a DataFrame). Then:
df.to_csv()
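It can either return a string or write directly to a CSV file. See the documentation for to_csv.
Given how verbose the earlier answers are, we should all thank pandas for this shortcut.
Note that for nested JSON such as the sample above, read_json keeps "fields" as a single column of dicts. For unstructured JSON, see this answer.
EDIT: Someone asked for a minimal working example: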
import pandas as pd
with open('jsonfile.json', encoding='utf-8') as inputfile:
    df = pd.read_json(inputfile)
df.to_csv('csvfile.csv', encoding='utf-8', index=False)
Solution 2:
First, your JSON contains nested objects, so it normally cannot be converted directly to CSV. You need to change it to something like this:
[
{
"pk": 22,
"model": "auth.permission",
"codename": "add_logentry",
"content_type": 8,
"name": "Can add log entry"
},
......
]
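One way to do this reshaping in code is sketched below. It assumes that every record has exactly one nested object called "fields" and that its keys do not collide with the top-level keys; flatten_record and records_to_csv are hypothetical helper names, and the CSV is built in memory only to keep the example short.

```python
import csv
import io


def flatten_record(item):
    # Copy the top-level keys except "fields", then merge the nested dict in.
    flat = {k: v for k, v in item.items() if k != "fields"}
    flat.update(item.get("fields", {}))
    return flat


def records_to_csv(records):
    rows = [flatten_record(r) for r in records]
    buf = io.StringIO()
    # Column order follows the keys of the first flattened record.
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because csv.DictWriter looks values up by key, the flattened dicts can be written as they are, without building each row list by hand.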
Here is the code I used to generate the CSV:
import csv
import json
x = """[
{
"pk": 22,
"model": "auth.permission",
"fields": {
"codename": "add_logentry",
"name": "Can add log entry",
"content_type": 8
}
},
{
"pk": 23,
"model": "auth.permission",
"fields": {
"codename": "change_logentry",
"name": "Can change log entry",
"content_type": 8
}
},
{
"pk": 24,
"model": "auth.permission",
"fields": {
"codename": "delete_logentry",
"name": "Can delete log entry",
"content_type": 8
}
}
]"""
x = json.loads(x)
# Python 3: open in text mode with newline=""; the original "wb+" mode only works on Python 2
with open("test.csv", "w", newline="") as out:
    f = csv.writer(out)
    # Write the CSV header; if you don't need it, remove this line
    f.writerow(["pk", "model", "codename", "name", "content_type"])
    for item in x:
        f.writerow([item["pk"],
                    item["model"],
                    item["fields"]["codename"],
                    item["fields"]["name"],
                    item["fields"]["content_type"]])
You will get the following output:
pk,model,codename,name,content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
Solution 3:
I'll assume your JSON file decodes to a list of dictionaries. First, we need a function that flattens a JSON object:
def flattenjson(b, delim):
    val = {}
    for i in b.keys():
        if isinstance(b[i], dict):
            get = flattenjson(b[i], delim)
            for j in get.keys():
                val[i + delim + j] = get[j]
        else:
            val[i] = b[i]
    return val
Running this on a JSON object:
flattenjson({
"pk": 22,
"model": "auth.permission",
"fields": {
"codename": "add_message",
"name": "Can add message",
"content_type": 8
}
}, "__")
gives:
{
"pk": 22,
"model": "auth.permission",
"fields__codename": "add_message",
"fields__name": "Can add message",
"fields__content_type": 8
}
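After applying this function to each dict in the input array of JSON objects:
input = [flattenjson(x, "__") for x in input]  # a list, not map(): in Python 3 a map object can only be iterated once
and finding the relevant column names: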
columns = [x for row in input for x in row.keys()]
columns = list(set(columns))
running this through the csv module is straightforward:
with open(fname, 'w', newline='') as out_file:  # Python 3; on Python 2 use 'wb'
    csv_w = csv.writer(out_file)
    csv_w.writerow(columns)
    for i_r in input:
        csv_w.writerow([i_r.get(x, "") for x in columns])
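Putting the pieces together, a self-contained Python 3 sketch of the whole pipeline, assuming the input is already a list of dicts. write_flat_csv is a hypothetical name, and the columns are sorted here only to make their order deterministic (a set has no defined order). Two details matter on Python 3: the flattened rows are kept in a list, because a map object can be iterated only once, and the file is opened in text mode with newline=''.

```python
import csv


def flattenjson(b, delim):
    # Recursively flatten nested dicts, joining parent and child keys with delim.
    val = {}
    for key, value in b.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flattenjson(value, delim).items():
                val[key + delim + sub_key] = sub_value
        else:
            val[key] = value
    return val


def write_flat_csv(records, fname, delim="__"):
    rows = [flattenjson(r, delim) for r in records]  # list, so it can be read twice
    # Union of the keys of all rows, sorted for a stable column order.
    columns = sorted({key for row in rows for key in row})
    with open(fname, "w", newline="", encoding="utf-8") as out_file:
        csv_w = csv.writer(out_file)
        csv_w.writerow(columns)
        for row in rows:
            # Rows that lack a column get an empty cell.
            csv_w.writerow([row.get(c, "") for c in columns])
```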
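Solution 4:
Use json_normalize from pandas:
- Use the sample data from the OP in a file named test.json.
- encoding='utf-8' is used here, but may not be necessary in other cases.
- The following code uses the pathlib library; .open is a pathlib method, and it also works with non-Windows paths.
- Use pandas.to_csv(...) to save the data to a CSV file.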
import pandas as pd
from pathlib import Path
import json
# set path to file
p = Path(r'c:\some_path_to_file\test.json')
# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())
# create dataframe
df = pd.json_normalize(data)
# dataframe view:
#  pk            model  fields.codename           fields.name  fields.content_type
#  22  auth.permission     add_logentry     Can add log entry                    8
#  23  auth.permission  change_logentry  Can change log entry                    8
#  24  auth.permission  delete_logentry  Can delete log entry                    8
#   4  auth.permission        add_group         Can add group                    2
#  10  auth.permission      add_message       Can add message                    4
# save to csv
df.to_csv('test.csv', index=False, encoding='utf-8')
CSV output:
pk,model,fields.codename,fields.name,fields.content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
4,auth.permission,add_group,Can add group,2
10,auth.permission,add_message,Can add message,4
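More resources on nested JSON objects:
Stack Overflow answers:
Flatten a JSON array with Python
How to recursively flatten nested JSON with flatten_json
How to json_normalize a column containing NaN
Split/explode a column of dictionaries into separate columns with pandas
See the json-normalize tag for other related questions.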
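Solution 5:
JSON can represent a wide variety of data structures: a JS "object" is roughly like a Python dict (with string keys), a JS "array" is roughly like a Python list, and you can nest them as long as the final "leaf" elements are numbers or strings.
CSV can essentially represent only a 2-D table, optionally with a first row of "headers", i.e. "column names", which can make the table interpretable as a list of dicts instead of the normal interpretation, a list of lists (again, the "leaf" elements can be numbers or strings).
So, in the general case, you cannot translate an arbitrary JSON structure to CSV. In a few special cases you can (an array of arrays with no further nesting; an array of objects that all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you have. Given the astonishing fact that you don't even mention which one applies, I suspect you may not have considered the constraint, that neither usable case in fact applies, and that your problem is impossible to solve. But please do clarify!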
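To make the two special cases concrete, here is a small sketch that checks which one, if either, a parsed JSON value falls into. The function name csv_shape is hypothetical, and treating booleans and null as leaf values alongside numbers and strings is an assumption.

```python
def csv_shape(data):
    """Return "rows", "records" or None depending on which CSV-friendly case data matches."""
    def is_leaf(v):
        # Leaves: strings and numbers (plus, by assumption, booleans and None).
        return v is None or isinstance(v, (str, int, float, bool))

    if not isinstance(data, list):
        return None
    # Case 1: an array of arrays whose elements are all leaves (no deeper nesting).
    if all(isinstance(row, list) and all(is_leaf(v) for v in row) for row in data):
        return "rows"
    # Case 2: an array of objects that all have exactly the same keys and leaf values.
    if data and all(isinstance(row, dict) for row in data):
        keys = set(data[0])
        if all(set(row) == keys and all(is_leaf(v) for v in row.values()) for row in data):
            return "records"
    return None
```

The sample from the question returns None, because each object nests a "fields" object; that is why the other answers flatten it first.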
Solution 6:
A general solution that converts any JSON list of flat objects to CSV.
Pass the input.json file as the first argument on the command line.
import csv, json, sys
input = open(sys.argv[1])
data = json.load(input)
input.close()
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
    output.writerow(row.values())
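This relies on every object listing its keys in the same order as the first one, because row.values() is written positionally. A sketch that looks each value up by header name instead, assuming the first object's keys define the columns (write_flat_json_list is a hypothetical name):

```python
import csv
import sys


def write_flat_json_list(data, out=None):
    out = out if out is not None else sys.stdout
    header = list(data[0].keys())  # column order taken from the first object
    writer = csv.writer(out)
    writer.writerow(header)
    for row in data:
        # Look values up by column name, so a different key order in a later
        # object cannot shift its values into the wrong columns.
        writer.writerow([row.get(key, "") for key in header])
```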
Solution 7:
Assuming your JSON data is in a file named data.json, this code should work for you.
import json
import csv
with open("data.json") as file:
    data = json.load(file)
with open("data.csv", "w", newline="") as file:
    csv_file = csv.writer(file)
    for item in data:
        fields = list(item['fields'].values())
        csv_file.writerow([item['pk'], item['model']] + fields)
Solution 8:
Using csv.DictWriter() is simple; a concrete implementation:
import csv
import json

def read_json(filename):
    with open(filename) as f:
        return json.load(f)

def write_csv(data, filename):
    with open(filename, 'w', newline='') as outf:
        writer = csv.DictWriter(outf, data[0].keys())
        writer.writeheader()
        for row in data:
            writer.writerow(row)

# implement
write_csv(read_json('test.json'), 'output.csv')
Note that this assumes all JSON objects have the same fields.
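If the objects do not all have the same fields, csv.DictWriter needs every key to appear in fieldnames: by default (extrasaction='raise') a row with an unknown key raises ValueError, while missing keys are filled with restval. A sketch that collects the union of keys first, keeping first-seen order (dicts_to_csv is a hypothetical name, and it writes to an in-memory buffer only to keep the example short):

```python
import csv
import io


def dicts_to_csv(rows):
    # Every key that appears in any row, in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buf = io.StringIO()
    # restval fills the cells of columns that a row does not have.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```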
Here is a reference that may help you.
Solution 9:
I was confused by the solution Dan proposed, but this worked for me:
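import json
import csv
with open('test.json') as f:
    data = json.load(f)
with open('test.csv', 'w', newline='') as out:
    f = csv.writer(out)
    for item in data:
        # list(...) is needed in Python 3, where .values() returns a view, not a list
        f.writerow([item['pk'], item['model']] + list(item['fields'].values()))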
where "test.json" contains the following:
[
{"pk": 22, "model": "auth.permission", "fields":
{"codename": "add_logentry", "name": "Can add log entry", "content_type": 8 } },
{"pk": 23, "model": "auth.permission", "fields":
{"codename": "change_logentry", "name": "Can change log entry", "content_type": 8 } }, {"pk": 24, "model": "auth.permission", "fields":
{"codename": "delete_logentry", "name": "Can delete log entry", "content_type": 8 } }
]
Solution 10:
This is a modification of @MikeRepass's answer. This version writes the CSV to a file and works with both Python 2 and Python 3.
import csv,json
input_file="data.json"
output_file="data.csv"
with open(input_file) as f:
    content = json.load(f)
try:
    context = open(output_file, 'w', newline='')  # Python 3
except TypeError:
    context = open(output_file, 'wb')  # Python 2
with context as file:
    writer = csv.writer(file)
    writer.writerow(content[0].keys())  # header row
    for row in content:
        writer.writerow(row.values())
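Solution 11:
Alec's answer is great, but it doesn't work when there are multiple levels of nesting. Here is a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested objects already specify their own key (e.g. Firebase Analytics / BigTable / BigQuery data):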
"""Converts JSON with nested fields into a flattened CSV file.
"""
import sys
import json
import csv
import os
import jsonlines  # third-party: reads JSON Lines (one JSON object per line)
from orderedset import OrderedSet  # third-party ordered set

# from https://stackoverflow.com/a/28246154/473201
def flattenjson(b, prefix='', delim='/', val=None):
    if val is None:
        val = {}
    if isinstance(b, dict):
        for j in b.keys():
            flattenjson(b[j], prefix + delim + j, delim, val)
    elif isinstance(b, list):
        get = b
        for j in range(len(get)):
            key = str(j)
            # If the nested data contains its own key, use that as the header instead.
            if isinstance(get[j], dict):
                if 'key' in get[j]:
                    key = get[j]['key']
            flattenjson(get[j], prefix + delim + key, delim, val)
    else:
        val[prefix] = b
    return val

def main(argv):
    if len(argv) < 2:
        raise SystemExit('Please specify a JSON file to parse')
    print("Loading and Flattening...")
    filename = argv[1]
    allRows = []
    fieldnames = OrderedSet()
    with jsonlines.open(filename) as reader:
        for obj in reader:
            flattened = flattenjson(obj)
            fieldnames.update(flattened.keys())
            allRows.append(flattened)
    print("Exporting to CSV...")
    outfilename = filename + '.csv'
    count = 0
    with open(outfilename, 'w', newline='') as file:
        csvwriter = csv.DictWriter(file, fieldnames=fieldnames)
        csvwriter.writeheader()
        for obj in allRows:
            csvwriter.writerow(obj)
            count += 1
    print("Wrote %d rows" % count)

if __name__ == '__main__':
    main(sys.argv)
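Solution 12:
As mentioned in the previous answers, the difficulty in converting JSON to CSV is that a JSON file can contain nested dictionaries and is therefore a multidimensional data structure, whereas CSV is a two-dimensional one. However, a good way to turn a multidimensional structure into CSV is to use several CSVs tied together by primary keys.
In your example, the first CSV output has the columns "pk", "model" and "fields". The values of "pk" and "model" are easy to get, but because the "fields" column contains a dictionary, it should be its own CSV; since "codename" appears to be the primary key, you can use it as the value of "fields" to complete the first CSV. The second CSV contains the dictionary from the "fields" column, with codename as the primary key that ties the two CSVs together.
Here is a solution for your JSON file that converts the nested dictionaries into 2 CSVs.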
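The core idea can be sketched in a few lines before the full script: split each record into a parent row (the top-level scalars plus the linking key) and a child row (the nested dict). It assumes, as the answer does, that codename is unique and can serve as the linking key; split_tables and to_csv_text are hypothetical names.

```python
import csv
import io


def split_tables(records, key="codename", nested="fields"):
    parents, children = [], []
    for rec in records:
        # Parent row: everything except the nested dict, plus the linking key.
        parent = {k: v for k, v in rec.items() if k != nested}
        parent[key] = rec[nested][key]
        parents.append(parent)
        # Child row: the nested dict itself, which already contains the key.
        children.append(dict(rec[nested]))
    return parents, children


def to_csv_text(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```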
import csv
import json

def readAndWrite(inputFileName, primaryKey=""):
    with open(inputFileName + ".json") as input_file:
        data = json.load(input_file)
    header = set()
    if primaryKey != "":
        outputFileName = inputFileName + "-" + primaryKey
        if inputFileName == "data":
            for i in data:
                for j in i["fields"].keys():
                    header.add(j)
    else:
        outputFileName = inputFileName
        for i in data:
            for j in i.keys():
                header.add(j)
    # Python 3: text mode with newline=''; values are written as str, not encoded bytes
    with open(outputFileName + ".csv", 'w', newline='', encoding='utf-8') as output_file:
        fieldnames = list(header)
        writer = csv.DictWriter(output_file, fieldnames, delimiter=',', quotechar='"')
        writer.writeheader()
        for x in data:
            row_value = {}
            if primaryKey == "":
                for y in x.keys():
                    yValue = x.get(y)
                    if isinstance(yValue, dict):
                        # replace the nested dict by its primary key
                        if inputFileName == "data":
                            row_value[y] = yValue["codename"]
                    else:
                        row_value[y] = str(yValue)
                writer.writerow(row_value)
            elif primaryKey == "codename":
                for y in x["fields"].keys():
                    yValue = x["fields"].get(y)
                    if not isinstance(yValue, dict):
                        row_value[y] = str(yValue)
                writer.writerow(row_value)
    # write the nested dicts to their own CSV once (the original re-wrote it for every row)
    if primaryKey == "" and inputFileName == "data":
        readAndWrite(inputFileName, primaryKey="codename")

readAndWrite("data")
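Solution 13:
I know it has been a long time since this question was asked, but I thought I'd add to the other answers and share a blog post that I think explains the solution very concisely.
Here is the link
import csv
# emp_data is assumed to be the list of dicts parsed from the JSON file (e.g. with json.load)
# open a file for writing
employ_data = open('/tmp/EmployData.csv', 'w', newline='')
# create the csv writer object
csvwriter = csv.writer(employ_data)
count = 0
for emp in emp_data:
    if count == 0:
        header = emp.keys()
        csvwriter.writerow(header)
        count += 1
    csvwriter.writerow(emp.values())
# make sure to close the file to save the contents
employ_data.close()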
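Solution 14:
This is not a very clever approach, but I had the same problem and this worked for me:
import csv
import json

with open('data.json') as f:
    data = json.load(f)

new_data = []
for i in data:
    flat = {}
    names = i.keys()
    for n in names:
        try:
            if len(i[n].keys()) > 0:
                for ii in i[n].keys():
                    flat[n + "_" + ii] = i[n][ii]
        except AttributeError:  # i[n] is not a dict (no .keys()), so keep it as is
            flat[n] = i[n]
    new_data.append(flat)

filename = 'data.csv'  # output path (name assumed)
with open(filename, "w", newline='') as f:  # "w": a file opened with "r" cannot be written to
    writer = csv.DictWriter(f, new_data[0].keys())
    writer.writeheader()
    for row in new_data:
        writer.writerow(row)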
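Solution 15:
Surprisingly, I found that none of the answers posted so far correctly handle all the possible cases (e.g. nested dicts, nested lists, None values, etc.).
This solution should work for all of these scenarios: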
def flatten_json(json):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys + [str(idx)], v, flattened)
        else:
            flattened['__'.join(keys)] = value

    flattened = {}
    for key in json.keys():
        process_value([key], json[key], flattened)
    return flattened
Solution 16:
My simple way to solve this:
Create a new Python file, e.g. json_to_csv.py
Add the following code:
import csv, json, sys

# check that both the input file and the output file were passed
if len(sys.argv) >= 3:
    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]
    # if your files are not UTF-8, change the encoding here
    with open(fileInput, encoding='utf-8') as inputFile:
        data = json.load(inputFile)
    with open(fileOutput, 'w', newline='', encoding='utf-8') as outputFile:
        output = csv.writer(outputFile)
        output.writerow(data[0].keys())  # header row
        for row in data:
            output.writerow(row.values())
After adding this code, save the file and run in the terminal:
python json_to_csv.py input.txt output.csv
I hope this helps.
Bye!
Solution 17:
This code works for any JSON file that contains a list of flat objects sharing the same keys.
# -*- coding: utf-8 -*-
"""
Created on Mon Jun 17 20:35:35 2019
author: Ram
"""
import json
import csv
with open("file1.json") as file:
    data = json.load(file)

# create the csv writer object
pt_data1 = open('pt_data1.csv', 'w', newline='')
csvwriter = csv.writer(pt_data1)
count = 0
for pt in data:
    if count == 0:
        header = pt.keys()
        csvwriter.writerow(header)
        count += 1
    csvwriter.writerow(pt.values())
pt_data1.close()
Solution 18:
Consider the following example of converting a JSON file to a CSV file.
{
"item_data" : [
{
"item": "10023456",
"class": "100",
"subclass": "123"
}
]
}
The code below converts the JSON file (data3.json) to a CSV file (data3.csv).
import json
import csv
with open("/Users/Desktop/json/data3.json") as file:
    data = json.load(file)
print(data)
fname = "/Users/Desktop/json/data3.csv"
with open(fname, "w", newline='') as file:
    csv_file = csv.writer(file)
    csv_file.writerow(['item',
                       'class',
                       'subclass'])
    for item in data["item_data"]:
        # each entry of "item_data" is already the inner dict
        csv_file.writerow([item.get('item'),
                           item.get('class'),
                           item.get('subclass')])
The code above was run in a locally installed PyCharm and successfully converted the JSON file to a CSV file. I hope this helps with converting files.
Solution 19:
It works quite well: it flattens the JSON and prints it as semicolon-separated CSV. Nested elements are handled :)
This is for Python 3.
import json
o = json.loads('your json string')  # Be careful, o must be a list; each of its objects becomes one line of the CSV.
def flatten(o, k='/'):
    global l, c_line
    if isinstance(o, dict):
        for key, value in o.items():
            flatten(value, k + '/' + key)
    elif isinstance(o, list):
        for ov in o:
            flatten(ov, '')
    else:
        # leaf value; numbers and None are converted to text as well
        o = str(o).replace('\r', ' ').replace('\n', ' ').replace(';', ',')
        if k not in l:
            l[k] = {}
        l[k][c_line] = o
def render_csv(l):
    # header line: one column per flattened key
    print(''.join('%s;' % k for k in l))
    # one line per input object (c_line holds the number of objects)
    for i in range(c_line):
        for k in l:
            print('%s;' % l[k].get(i, ''), end='')
        print()
def json_to_csv(object_list):
    global l, c_line
    l = {}
    c_line = 0
    for ov in object_list:  # Assumes json is a list of objects
        flatten(ov)
        c_line += 1
    render_csv(l)
json_to_csv(o)
Enjoy.
Solution 20:
A modification of Alec McGail's answer to support JSON that contains lists
def flattenjson(mp, delim="|"):
    ret = []
    if isinstance(mp, dict):
        for k in mp.keys():
            csvs = flattenjson(mp[k], delim)
            for item in csvs:
                ret.append(k + delim + item)
    elif isinstance(mp, list):
        for k in mp:
            csvs = flattenjson(k, delim)
            for item in csvs:
                ret.append(item)
    else:
        ret.append(str(mp))  # str() so that numbers can be joined with the keys
    return ret
Thanks!
Solution 21:
import json, csv

json_data = []
data = None
write_header = True
item_keys = []
try:
    with open('kk.json') as json_file:
        json_data = json_file.read()
    data = json.loads(json_data)
except Exception as e:
    print(e)

with open('bar.csv', 'at', newline='') as csv_file:
    writer = csv.writer(csv_file)  # , quoting=csv.QUOTE_MINIMAL)
    for item in data:
        item_values = []
        for key in item:
            if write_header:
                item_keys.append(key)
            # Python 3 strings are already Unicode; encoding them would write b'...' into the CSV
            item_values.append(item.get(key, ''))
        if write_header:
            writer.writerow(item_keys)
            write_header = False
        writer.writerow(item_values)
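Solution 22:
Since the data appears to be in dictionary format, it seems you should use csv.DictWriter() to actually output the rows with the appropriate header information. That should make the conversion somewhat easier to handle. The fieldnames parameter then sets the order properly, and writing the first line as the header allows the file to be read and processed later by csv.DictReader().
For example, Mike Repass uses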
output = csv.writer(sys.stdout)
output.writerow(data[0].keys()) # header row
for row in data:
    output.writerow(row.values())
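But simply change the initial setup to output = csv.DictWriter(filesetting, fieldnames=data[0].keys())
Note that because the order of elements in a dictionary is undefined (in Python versions before 3.7), you may have to create the fieldnames entries explicitly. Once you do that, writerow will work, and the writes then work as originally shown.
Solution 23:
Unfortunately, I don't have enough reputation to make a small contribution to the excellent @Alec McGail answer. I'm using Python 3, and I needed to convert the map to a list, as per @Alexis R's comment.
In addition, I found that the csv writer added an extra CR to the file (I got an empty line after every row of data in the CSV file). The solution is very simple; follow @Jason R. Coombs's answer to this thread:
CSV in Python adding an extra carriage return
You just need to add the lineterminator='\n' argument to csv.writer. It becomes: `csv_w = csv.writer( out_file, lineterminator='\n' )`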
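Why the blank lines appear: csv.writer ends each row with "\r\n" by default, and a file opened in text mode on Windows translates the "\n" again, producing "\r\r\n". The csv module documentation recommends opening the file with newline='' instead, which keeps the default "\r\n" endings; lineterminator='\n' also avoids the blank lines but changes the line ending. A small sketch showing exactly what the writer emits (render is a hypothetical helper; io.StringIO is used because it applies no newline translation when writing):

```python
import csv
import io


def render(rows, **writer_kwargs):
    # Write rows to an in-memory text buffer and return the exact text produced.
    buf = io.StringIO()
    csv.writer(buf, **writer_kwargs).writerows(rows)
    return buf.getvalue()


rows = [["pk", "model"], [22, "auth.permission"]]
print(repr(render(rows)))                       # 'pk,model\r\n22,auth.permission\r\n'
print(repr(render(rows, lineterminator="\n")))  # 'pk,model\n22,auth.permission\n'
```

For a real file, the usual pattern is open('out.csv', 'w', newline='') combined with the default lineterminator.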
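Solution 24:
You can use this code to convert a JSON file to a CSV file. After reading the file, I convert the objects to a pandas DataFrame and then save it as a CSV file. Note that it parses one JSON object per line (JSON Lines).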
import os
import pandas as pd
import json
import numpy as np
data = []
os.chdir('D:\\Your_directory\\folder')
with open('file_name.json', encoding="utf8") as data_file:
    for line in data_file:
        data.append(json.loads(line))
dataframe = pd.DataFrame(data)
## Saving the dataframe to a csv file
dataframe.to_csv("filename.csv", encoding='utf-8',index= False)
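Because this code parses the file line by line, it expects JSON Lines (one JSON object per line) rather than a single JSON array; with pandas, pd.read_json(path, lines=True) reads the same format directly. If pandas is not available, a standard-library sketch of the same conversion, assuming flat objects whose keys may differ between lines (jsonl_to_csv is a hypothetical name):

```python
import csv
import json


def jsonl_to_csv(jsonl_path, csv_path):
    # Each non-blank line of a JSON Lines file is one JSON object.
    with open(jsonl_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    # Union of all keys in first-seen order; missing values become empty cells.
    fieldnames = list(dict.fromkeys(k for r in records for k in r))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
```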
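Solution 25:
I tried many of the suggested solutions (pandas also failed to normalize my JSON correctly), but the really good one, which parses the JSON data correctly, is from Max Berman.
I wrote an improvement that avoids adding new columns for each row during parsing and puts the values into the existing columns instead. It also stores a value as a string if only one value is present, and creates a list if the column has more values.
It takes the input.json file as input and outputs output.csv.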
import json
import pandas as pd

def same_length(flattened: dict):
    max = 0
    for key in flattened.keys():
        if isinstance(flattened[key], list):
            if len(flattened[key]) > max:
                max = len(flattened[key])
    for key in flattened.keys():
        if isinstance(flattened[key], list):
            if len(flattened[key]) < max:
                for i in range(max - len(flattened[key])):
                    flattened[key].append(None)
    return flattened

def process_value(keys, value, flattened):
    if isinstance(value, dict):
        for key in value.keys():
            process_value(keys + [key], value[key], flattened)
    elif isinstance(value, list):
        for idx, v in enumerate(value):
            process_value(keys, v, flattened)
    else:
        jkey = '__'.join(keys)
        if not flattened.get(jkey) is None:
            if isinstance(flattened[jkey], list):
                flattened[jkey] = flattened[jkey] + [value]
            else:
                flattened[jkey] = [flattened[jkey]] + [value]
        else:
            flattened[jkey] = value

def flatten_json(json):
    flattened_result = {}
    json_list = []
    if isinstance(json, dict):
        json_list.append(json)
    elif isinstance(json, list):
        json_list = json
    else:
        print("JSON object must be a dict or list instance, but is type " + str(type(json)))
        return {}
    for j in json_list:
        for key in j.keys():
            process_value([key], j[key], flattened_result)
    return flattened_result

try:
    f = open("input.json", "r")
except (FileNotFoundError, PermissionError, OSError):
    print("Error opening file")
    exit(1)

y = json.loads(f.read())
flat = flatten_json(y)
df = pd.DataFrame.from_dict(same_length(flat), orient='columns')
df.to_csv('output.csv', index=False, encoding='utf-8')
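Solution 26:
I may be late to the party, but I think I've dealt with a similar problem. I have a JSON file like this
I only wanted to extract some keys/values from these JSON files, so I wrote the following code to extract them.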
"""json_to_csv.py
This script reads the JSON files in a folder, extracts certain data from each file and writes it to a CSV file.
The folder contains the Python script (json_to_csv.py), output.csv and another folder, descriptions, containing all the JSON files.
"""
import os
import json
import csv


def get_list_of_json_files():
    """Returns the list of filenames of all the JSON files present in the folder

    Parameter
    ---------
    directory : str
        'descriptions' in this case

    Returns
    -------
    list_of_files: list
        List of the filenames of all the JSON files
    """
    list_of_files = os.listdir('descriptions')  # creates a list of all the files in the folder
    return list_of_files


def create_list_from_json(jsonfile):
    """Returns a list of the items extracted from the JSON file, in the order we need them.

    Parameter
    ---------
    jsonfile : json
        The JSON file containing the data

    Returns
    -------
    one_sample_list : list
        The list of the extracted items needed for the final CSV
    """
    with open(jsonfile) as f:
        data = json.load(f)

    data_list = []  # create an empty list

    # append the items to the list in the same order.
    data_list.append(data['_id'])
    data_list.append(data['_modelType'])
    data_list.append(data['creator']['_id'])
    data_list.append(data['creator']['name'])
    data_list.append(data['dataset']['_accessLevel'])
    data_list.append(data['dataset']['_id'])
    data_list.append(data['dataset']['description'])
    data_list.append(data['dataset']['name'])
    data_list.append(data['meta']['acquisition']['image_type'])
    data_list.append(data['meta']['acquisition']['pixelsX'])
    data_list.append(data['meta']['acquisition']['pixelsY'])
    data_list.append(data['meta']['clinical']['age_approx'])
    data_list.append(data['meta']['clinical']['benign_malignant'])
    data_list.append(data['meta']['clinical']['diagnosis'])
    data_list.append(data['meta']['clinical']['diagnosis_confirm_type'])
    data_list.append(data['meta']['clinical']['melanocytic'])
    data_list.append(data['meta']['clinical']['sex'])
    data_list.append(data['meta']['unstructured']['diagnosis'])
    # In a few JSON files the race was missing, so catch KeyError and add '' in its place
    try:
        data_list.append(data['meta']['unstructured']['race'])
    except KeyError:
        data_list.append("")  # adds an empty string in case race is missing
    data_list.append(data['name'])

    return data_list


def write_csv():
    """Creates the desired CSV file

    Parameters
    ----------
    list_of_files : file
        The list created by get_list_of_json_files() method
    result.csv : csv
        The CSV file containing the header only

    Returns
    -------
    result.csv : csv
        The desired CSV file
    """
    list_of_files = get_list_of_json_files()
    for file in list_of_files:
        row = create_list_from_json(f'descriptions/{file}')  # create the row to be added to the CSV for each JSON file
        with open('output.csv', 'a', newline='') as c:
            writer = csv.writer(c)
            writer.writerow(row)


if __name__ == '__main__':
    write_csv()
I hope this helps. For details on how this code works, you can check here.
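The hard-coded lookups above raise KeyError whenever a level is missing, which is why race needs a try/except. A sketch of a more uniform variant, assuming a hypothetical get_path helper and a hypothetical COLUMNS list of (header, key path) pairs standing in for the real column list; it also writes a header row once and reads only files ending in .json:

```python
import csv
import json
from pathlib import Path


def get_path(data, path, default=""):
    # Walk a sequence of keys; return default as soon as a level is missing.
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


# Hypothetical column list: (header name, key path) pairs.
COLUMNS = [
    ("id", ("_id",)),
    ("creator_name", ("creator", "name")),
    ("race", ("meta", "unstructured", "race")),
]


def folder_to_csv(folder, csv_path, columns=COLUMNS):
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow([name for name, _ in columns])  # header row, written once
        for path in sorted(Path(folder).glob("*.json")):  # skips non-JSON files
            data = json.loads(path.read_text(encoding="utf-8"))
            writer.writerow([get_path(data, keys) for _, keys in columns])
```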