我需要在c++中加载和使用CSV文件数据。在这一点上,它实际上只是一个以逗号分隔的解析器(即不用担心转义新行和逗号)。主要需要的是逐行解析器,它将在每次调用方法时为下一行返回一个向量。

我发现这篇文章看起来很有前途: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp

我从未使用过Boost's Spirit,但我愿意尝试一下。但前提是我忽略了一个更直接的解决方案。


当前回答

这是一个旧线程,但它仍然在搜索结果的顶部,所以我添加我的解决方案使用std::stringstream和一个简单的字符串替换方法由Yves Baumes我在这里找到。

下面的例子将逐行读取文件,忽略以//开头的注释行,并将其他行解析为字符串、int和double的组合。Stringstream进行解析,但希望字段由空格分隔,因此我使用stringreplace首先将逗号转换为空格。它可以处理制表符,但不处理带引号的字符串。

坏的或丢失的输入被简单地忽略,这可能是好事,也可能不是好事,这取决于您的情况。

#include <string>
#include <sstream>
#include <fstream>

void StringReplace(std::string& str, const std::string& oldStr, const std::string& newStr)
// code by  Yves Baumes
// http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string
{
  size_t pos = 0;
  while((pos = str.find(oldStr, pos)) != std::string::npos)
  {
     str.replace(pos, oldStr.length(), newStr);
     pos += newStr.length();
  }
}

void LoadCSV(std::string &filename) {
   std::ifstream stream(filename);
   std::string in_line;
   std::string Field;
   std::string Chan;
   int ChanType;
   double Scale;
   int Import;
   while (std::getline(stream, in_line)) {
      StringReplace(in_line, ",", " ");
      std::stringstream line(in_line);
      line >> Field >> Chan >> ChanType >> Scale >> Import;
      if (Field.substr(0,2)!="//") {
         // do your stuff 
         // this is CBuilder code for demonstration, sorry
         ShowMessage((String)Field.c_str() + "\n" + Chan.c_str() + "\n" + IntToStr(ChanType) + "\n" +FloatToStr(Scale) + "\n" +IntToStr(Import));
      }
   }
}

其他回答

不好意思,但是为了隐藏几行代码,这似乎是非常复杂的语法。

为什么不这样呢:

/**

  Read line from a CSV file

  @param[in] fp file pointer to open file
  @param[in] vls reference to vector of strings to hold next line

  */
void readCSV( FILE *fp, std::vector<std::string>& vls )
{
    vls.clear();
    if( ! fp )
        return;
    char buf[10000];
    if( ! fgets( buf,999,fp) )
        return;
    std::string s = buf;
    int p,q;
    q = -1;
    // loop over columns
    while( 1 ) {
        p = q;
        q = s.find_first_of(",\n",p+1);
        if( q == -1 ) 
            break;
        vls.push_back( s.substr(p+1,q-p-1) );
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    std::vector<std::string> vls;
    FILE * fp = fopen( argv[1], "r" );
    if( ! fp )
        return 1;
    readCSV( fp, vls );
    readCSV( fp, vls );
    readCSV( fp, vls );
    std::cout << "row 3, col 4 is " << vls[3].c_str() << "\n";

    return 0;
}

使用Spirit来解析csv并不过分。Spirit非常适合微解析任务。例如,使用Spirit 2.1,它就像:

bool r = phrase_parse(first, last,

    //  Begin grammar
    (
        double_ % ','
    )
    ,
    //  End grammar

    space, v);

向量v被值填满了。在刚刚与Boost 1.41一起发布的新的Spirit 2.1文档中,有一系列教程涉及到这一点。

本教程从简单到复杂。CSV解析器呈现在中间的某个位置,并涉及使用Spirit的各种技术。生成的代码与手写代码一样紧凑。检查生成的汇编程序!

我写了一个很好的解析CSV文件的方法,我认为我应该把它作为一个答案:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <stdlib.h>
#include <stdio.h>

struct CSVDict
{
  std::vector< std::string > inputImages;
  std::vector< double > inputLabels;
};

/**
\brief Splits the string

\param str String to split
\param delim Delimiter on the basis of which splitting is to be done
\return results Output in the form of vector of strings
*/
std::vector<std::string> stringSplit( const std::string &str, const std::string &delim )
{
  std::vector<std::string> results;

  for (size_t i = 0; i < str.length(); i++)
  {
    std::string tempString = "";
    while ((str[i] != *delim.c_str()) && (i < str.length()))
    {
      tempString += str[i];
      i++;
    }
    results.push_back(tempString);
  }

  return results;
}

/**
\brief Parse the supplied CSV File and obtain Row and Column information. 

Assumptions:
1. Header information is in first row
2. Delimiters are only used to differentiate cell members

\param csvFileName The full path of the file to parse
\param inputColumns The string of input columns which contain the data to be used for further processing
\param inputLabels The string of input labels based on which further processing is to be done
\param delim The delimiters used in inputColumns and inputLabels
\return Vector of Vector of strings: Collection of rows and columns
*/
std::vector< CSVDict > parseCSVFile( const std::string &csvFileName, const std::string &inputColumns, const std::string &inputLabels, const std::string &delim )
{
  std::vector< CSVDict > return_CSVDict;
  std::vector< std::string > inputColumnsVec = stringSplit(inputColumns, delim), inputLabelsVec = stringSplit(inputLabels, delim);
  std::vector< std::vector< std::string > > returnVector;
  std::ifstream inFile(csvFileName.c_str());
  int row = 0;
  std::vector< size_t > inputColumnIndeces, inputLabelIndeces;
  for (std::string line; std::getline(inFile, line, '\n');)
  {
    CSVDict tempDict;
    std::vector< std::string > rowVec;
    line.erase(std::remove(line.begin(), line.end(), '"'), line.end());
    rowVec = stringSplit(line, delim);

    // for the first row, record the indeces of the inputColumns and inputLabels
    if (row == 0)
    {
      for (size_t i = 0; i < rowVec.size(); i++)
      {
        for (size_t j = 0; j < inputColumnsVec.size(); j++)
        {
          if (rowVec[i] == inputColumnsVec[j])
          {
            inputColumnIndeces.push_back(i);
          }
        }
        for (size_t j = 0; j < inputLabelsVec.size(); j++)
        {
          if (rowVec[i] == inputLabelsVec[j])
          {
            inputLabelIndeces.push_back(i);
          }
        }
      }
    }
    else
    {
      for (size_t i = 0; i < inputColumnIndeces.size(); i++)
      {
        tempDict.inputImages.push_back(rowVec[inputColumnIndeces[i]]);
      }
      for (size_t i = 0; i < inputLabelIndeces.size(); i++)
      {
        double test = std::atof(rowVec[inputLabelIndeces[i]].c_str());
        tempDict.inputLabels.push_back(std::atof(rowVec[inputLabelIndeces[i]].c_str()));
      }
      return_CSVDict.push_back(tempDict);
    }
    row++;
  }

  return return_CSVDict;
}

您可以使用fopen,fscanf函数打开和读取.csv文件,但重要的是解析数据。使用分隔符解析数据的最简单方法。对于.csv,分隔符为','。

假设你的data1.csv文件如下所示:

A,45,76,01
B,77,67,02
C,63,76,03
D,65,44,04

您可以标记数据并存储在字符数组中,然后使用atoi()等函数进行适当的转换

FILE *fp;
char str1[10], str2[10], str3[10], str4[10];

fp = fopen("G:\\data1.csv", "r");
if(NULL == fp)
{
    printf("\nError in opening file.");
    return 0;
}
while(EOF != fscanf(fp, " %[^,], %[^,], %[^,], %s, %s, %s, %s ", str1, str2, str3, str4))
{
    printf("\n%s %s %s %s", str1, str2, str3, str4);
}
fclose(fp);

[^,], ^ -它颠倒了逻辑,意思是匹配任何不包含逗号的字符串,然后最后,表示匹配终止前一个字符串的逗号。

使用Boost Tokenizer的解决方案:

std::vector<std::string> vec;
using namespace boost;
tokenizer<escaped_list_separator<char> > tk(
   line, escaped_list_separator<char>('\\', ',', '\"'));
for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin());
   i!=tk.end();++i) 
{
   vec.push_back(*i);
}