我有一个Excel文件,其中有一些西班牙字符(波浪号等),我需要将其转换为CSV文件作为导入文件使用。然而,当我将另存为CSV时,它会破坏不是ASCII字符的“特殊”西班牙字符。它似乎也这样做的左右引号和长破折号,似乎是来自最初的用户在Mac中创建Excel文件。

由于CSV只是一个文本文件,我确信它可以处理UTF8编码,所以我猜这是Excel的限制,但我正在寻找一种方法,从Excel到CSV,并保持非ascii字符完整。


当前回答

使用Powershell怎么样?

Get-Content 'C:\my.csv' | Out-File 'C:\my_utf8.csv' -Encoding UTF8

其他回答

我发现OpenOffice的电子表格应用程序Calc非常擅长处理CSV数据。

在“另存为…”对话框中,单击“格式选项”可获得CSV的不同编码。LibreOffice的工作原理与AFAIK相同。

I needed to automate this process on my Mac. I originally tried using catdoc/xls2csv as suggested by mpowered, but xls2csv had trouble detecting the original encoding of the document and not all documents were the same. What I ended up doing was setting the default webpage output encoding to be UTF-8 and then providing the files to Apple's Automator, applying the Convert Format of Excel Files action to convert to Web Page (HTML). Then using PHP, DOMDocument and XPath, I queried the documents and formatted them to CSV.

这是PHP脚本(process.php):

<?php
$pi = pathinfo($argv[1]);
$file = $pi['dirname'] . '/' . $pi['filename'] . '.csv';
$fp = fopen($file,'w+');
$doc = new DOMDocument;
$doc->loadHTMLFile($argv[1]);
$xpath = new DOMXPath($doc);
$table = [];
foreach($xpath->query('//tr') as $row){
    $_r = [];
    foreach($xpath->query('td',$row) as $col){
        $_r[] = trim($col->textContent);
    }
    fputcsv($fp,$_r);
}
fclose($fp);
?>

这是我用来将HTML文档转换为csv的shell命令:

find . -name '*.htm' | xargs -I{} php ./process.php {}

这是一种非常非常迂回的方法,但这是我发现的最可靠的方法。

我写了一个小的Python脚本,可以导出UTF-8格式的工作表。

您只需要提供Excel文件作为第一个参数,然后是要导出的表。如果不提供工作表,脚本将导出Excel文件中存在的所有工作表。

#!/usr/bin/env python

# export data sheets from xlsx to csv

from openpyxl import load_workbook
import csv
from os import sys

reload(sys)
sys.setdefaultencoding('utf-8')

def get_all_sheets(excel_file):
    sheets = []
    workbook = load_workbook(excel_file,use_iterators=True,data_only=True)
    all_worksheets = workbook.get_sheet_names()
    for worksheet_name in all_worksheets:
        sheets.append(worksheet_name)
    return sheets

def csv_from_excel(excel_file, sheets):
    workbook = load_workbook(excel_file,use_iterators=True,data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")

        try:
            worksheet = workbook.get_sheet_by_name(worksheet_name)
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)

        your_csv_file = open(''.join([worksheet_name,'.csv']), 'wb')
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for row in worksheet.iter_rows():
            lrow = []
            for cell in row:
                lrow.append(cell.value)
            wr.writerow(lrow)
        print(" ... done")
    your_csv_file.close()

if not 2 <= len(sys.argv) <= 3:
    print("Call with " + sys.argv[0] + " <xlxs file> [comma separated list of sheets to export]")
    sys.exit(1)
else:
    sheets = []
    if len(sys.argv) == 3:
        sheets = list(sys.argv[2].split(','))
    else:
        sheets = get_all_sheets(sys.argv[1])
    assert(sheets != None and len(sheets) > 0)
    csv_from_excel(sys.argv[1], sheets)

保存对话框>工具按钮> Web选项>编码选项卡

Easy way to do it: download open office (here), load the spreadsheet and open the excel file (.xls or .xlsx). Then just save it as a text CSV file and a window opens asking to keep the current format or to save as a .ODF format. select "keep the current format" and in the new window select the option that works better for you, according with the language that your file is been written on. For Spanish language select Western Europe (Windows-1252/ WinLatin 1) and the file works just fine. If you select Unicode (UTF-8), it is not going to work with the spanish characters.