我有一个JSON文件,我想转换为CSV文件。我如何用Python做到这一点?

我试着:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)

f.close()

然而,这并没有起作用。我正在使用Django和我收到的错误是:

`file' object has no attribute 'writerow'`

然后我尝试了以下方法:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)  # ← changed

f.close()

然后得到错误:

`sequence expected`

样本json文件:

[{
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]

当前回答

JSON可以表示各种各样的数据结构——JS的“对象”大致类似于Python的dict(带有字符串键),JS的“数组”大致类似于Python列表,只要最后的“叶子”元素是数字或字符串,你就可以嵌套它们。

CSV本质上只能表示一个2-D表——可选的第一行是“标题”,即“列名”,这可以使表可解释为字典列表,而不是正常的解释,一个列表的列表(同样,“叶子”元素可以是数字或字符串)。

So, in the general case, you can't translate an arbitrary JSON structure to a CSV. In a few special cases you can (array of arrays with no further nesting; arrays of objects which all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you do have. Given the astonishing fact that you don't even mention which one applies, I suspect you may not have considered the constraint, neither usable case in fact applies, and your problem is impossible to solve. But please do clarify!

其他回答

我可能迟到了,但我想,我已经处理过类似的问题。我有一个json文件,看起来像这样

我只想从这些json文件中提取一些键/值。因此,我编写了下面的代码来提取相同的内容。

    """json_to_csv.py
    This script reads n numbers of json files present in a folder and then extract certain data from each file and write in a csv file.
    The folder contains the python script i.e. json_to_csv.py, output.csv and another folder descriptions containing all the json files.
"""

import os
import json
import csv


def get_list_of_json_files():
    """Returns the list of filenames of all the Json files present in the folder
    Parameter
    ---------
    directory : str
        'descriptions' in this case
    Returns
    -------
    list_of_files: list
        List of the filenames of all the json files
    """

    list_of_files = os.listdir('descriptions')  # creates list of all the files in the folder

    return list_of_files


def create_list_from_json(jsonfile):
    """Returns a list of the extracted items from json file in the same order we need it.
    Parameter
    _________
    jsonfile : json
        The json file containing the data
    Returns
    -------
    one_sample_list : list
        The list of the extracted items needed for the final csv
    """

    with open(jsonfile) as f:
        data = json.load(f)

    data_list = []  # create an empty list

    # append the items to the list in the same order.
    data_list.append(data['_id'])
    data_list.append(data['_modelType'])
    data_list.append(data['creator']['_id'])
    data_list.append(data['creator']['name'])
    data_list.append(data['dataset']['_accessLevel'])
    data_list.append(data['dataset']['_id'])
    data_list.append(data['dataset']['description'])
    data_list.append(data['dataset']['name'])
    data_list.append(data['meta']['acquisition']['image_type'])
    data_list.append(data['meta']['acquisition']['pixelsX'])
    data_list.append(data['meta']['acquisition']['pixelsY'])
    data_list.append(data['meta']['clinical']['age_approx'])
    data_list.append(data['meta']['clinical']['benign_malignant'])
    data_list.append(data['meta']['clinical']['diagnosis'])
    data_list.append(data['meta']['clinical']['diagnosis_confirm_type'])
    data_list.append(data['meta']['clinical']['melanocytic'])
    data_list.append(data['meta']['clinical']['sex'])
    data_list.append(data['meta']['unstructured']['diagnosis'])
    # In few json files, the race was not there so using KeyError exception to add '' at the place
    try:
        data_list.append(data['meta']['unstructured']['race'])
    except KeyError:
        data_list.append("")  # will add an empty string in case race is not there.
    data_list.append(data['name'])

    return data_list


def write_csv():
    """Creates the desired csv file
    Parameters
    __________
    list_of_files : file
        The list created by get_list_of_json_files() method
    result.csv : csv
        The csv file containing the header only
    Returns
    _______
    result.csv : csv
        The desired csv file
    """

    list_of_files = get_list_of_json_files()
    for file in list_of_files:
        row = create_list_from_json(f'descriptions/{file}')  # create the row to be added to csv for each file (json-file)
        with open('output.csv', 'a') as c:
            writer = csv.writer(c)
            writer.writerow(row)
        c.close()


if __name__ == '__main__':
    write_csv()

我希望这能有所帮助。有关此代码如何工作的详细信息,请查看这里

正如在前面的回答中提到的,将json转换为csv的困难在于json文件可以包含嵌套字典,因此是多维数据结构,而csv是2D数据结构。但是,将多维结构转换为csv的一个好方法是使用多个主键连接在一起的csv。

在你的例子中,第一个csv输出的列是“pk”,“model”,“fields”。“pk”和“model”的值很容易获得,但因为“fields”列包含一个字典,它应该是它自己的csv,因为“codename”似乎是主键,你可以使用作为“fields”的输入来完成第一个csv。第二个csv包含来自“fields”列的字典,以codename作为主键,可用于将两个csv绑定在一起。

这是一个解决方案,为您的json文件转换嵌套字典2 csv。

import csv
import json

def readAndWrite(inputFileName, primaryKey=""):
    input = open(inputFileName+".json")
    data = json.load(input)
    input.close()

    header = set()

    if primaryKey != "":
        outputFileName = inputFileName+"-"+primaryKey
        if inputFileName == "data":
            for i in data:
                for j in i["fields"].keys():
                    if j not in header:
                        header.add(j)
    else:
        outputFileName = inputFileName
        for i in data:
            for j in i.keys():
                if j not in header:
                    header.add(j)

    with open(outputFileName+".csv", 'wb') as output_file:
        fieldnames = list(header)
        writer = csv.DictWriter(output_file, fieldnames, delimiter=',', quotechar='"')
        writer.writeheader()
        for x in data:
            row_value = {}
            if primaryKey == "":
                for y in x.keys():
                    yValue = x.get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue).encode('utf8')
                    elif type(yValue) != dict:
                        row_value[y] = yValue.encode('utf8')
                    else:
                        if inputFileName == "data":
                            row_value[y] = yValue["codename"].encode('utf8')
                            readAndWrite(inputFileName, primaryKey="codename")
                writer.writerow(row_value)
            elif primaryKey == "codename":
                for y in x["fields"].keys():
                    yValue = x["fields"].get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue).encode('utf8')
                    elif type(yValue) != dict:
                        row_value[y] = yValue.encode('utf8')
                writer.writerow(row_value)

readAndWrite("data")

JSON可以表示各种各样的数据结构——JS的“对象”大致类似于Python的dict(带有字符串键),JS的“数组”大致类似于Python列表,只要最后的“叶子”元素是数字或字符串,你就可以嵌套它们。

CSV本质上只能表示一个2-D表——可选的第一行是“标题”,即“列名”,这可以使表可解释为字典列表,而不是正常的解释,一个列表的列表(同样,“叶子”元素可以是数字或字符串)。

So, in the general case, you can't translate an arbitrary JSON structure to a CSV. In a few special cases you can (array of arrays with no further nesting; arrays of objects which all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you do have. Given the astonishing fact that you don't even mention which one applies, I suspect you may not have considered the constraint, neither usable case in fact applies, and your problem is impossible to solve. But please do clarify!

修改了Alec McGail的答案,以支持包含列表的JSON

    def flattenjson(self, mp, delim="|"):
            ret = []
            if isinstance(mp, dict):
                    for k in mp.keys():
                            csvs = self.flattenjson(mp[k], delim)
                            for csv in csvs:
                                    ret.append(k + delim + csv)
            elif isinstance(mp, list):
                    for k in mp:
                            csvs = self.flattenjson(k, delim)
                            for csv in csvs:
                                    ret.append(csv)
            else:
                    ret.append(mp)

            return ret

谢谢!

一个通用的解决方案,将任何json列表的平面对象转换为csv。

传递输入。Json文件作为命令行的第一个参数。

import csv, json, sys

input = open(sys.argv[1])
data = json.load(input)
input.close()

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
    output.writerow(row.values())