I need to load and use CSV file data in C++. At this point it can really just be a comma-delimited parser (i.e. don't worry about escaping new lines and commas). The main need is a line-by-line parser that will return a vector for the next line each time the method is called.
I found this article which looked promising: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp
I've never used Boost's Spirit, but I'm willing to try it — as long as there isn't a more straightforward solution I'm overlooking.
Current answer
Using Spirit to parse CSV is not overkill. Spirit is well suited for micro parsing tasks. For instance, with Spirit 2.1, it is as easy as:
bool r = phrase_parse(first, last,
    // Begin grammar
    (
        double_ % ','
    )
    ,
    // End grammar
    space, v);
The vector v gets filled with the values. There is a series of tutorials touching on this in the new Spirit 2.1 documentation that has just been released with Boost 1.41.
The tutorials progress from simple to complex. The CSV parsers are presented somewhere in the middle and touch on various techniques of using Spirit. The generated code is as tight as hand-written code. Check out the generated assembler!
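For readers who want to try the snippet above as-is, here is a minimal self-contained sketch of the same call, assuming Boost.Spirit Qi headers are available; the sample input string, the main() wrapper, and the iterator setup are illustrative additions, not part of the original answer:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    namespace qi = boost::spirit::qi;

    std::string line = "1.5, 2.0, 3.25";   // sample input, purely illustrative
    std::vector<double> v;

    std::string::const_iterator first = line.begin();
    std::string::const_iterator last  = line.end();

    // double_ % ',' parses a comma-separated list of doubles into v
    bool r = qi::phrase_parse(first, last,
                              qi::double_ % ',',
                              boost::spirit::ascii::space, v);

    if (r && first == last)
        for (double d : v)
            std::cout << d << '\n';
    return 0;
}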
Other answers
Another solution, similar to Loki Astari's answer, in C++11. Rows here are std::tuples of a given set of types. The code scans one line, then scans up to each delimiter, converts the value and dumps it directly into the tuple (with a bit of template code).
for (auto row : csv<std::string, int, float>(file, ',')) {
    std::cout << "first col: " << std::get<0>(row) << std::endl;
}
Advantages:
Quite clean and simple to use, C++11 only. Automatic type conversion into std::tuple<t1, ...> via operator>>.
What is missing:
Escaping and quoting. No error handling in case of malformed CSV.
The main code:
#include <iterator>
#include <istream>      // std::istream, std::getline
#include <sstream>
#include <string>
#include <tuple>        // std::tuple, std::tuple_size, std::get
#include <type_traits>  // std::enable_if

namespace csvtools {
    /// Read the last element of the tuple without calling recursively
    template <std::size_t idx, class... fields>
    typename std::enable_if<idx >= std::tuple_size<std::tuple<fields...>>::value - 1>::type
    read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
        std::string cell;
        std::getline(in, cell, delimiter);
        std::stringstream cell_stream(cell);
        cell_stream >> std::get<idx>(out);
    }

    /// Read the @p idx-th element of the tuple and then calls itself with @p idx + 1 to
    /// read the next element of the tuple. Automatically falls in the previous case when
    /// reaches the last element of the tuple thanks to enable_if
    template <std::size_t idx, class... fields>
    typename std::enable_if<idx < std::tuple_size<std::tuple<fields...>>::value - 1>::type
    read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
        std::string cell;
        std::getline(in, cell, delimiter);
        std::stringstream cell_stream(cell);
        cell_stream >> std::get<idx>(out);
        read_tuple<idx + 1, fields...>(in, out, delimiter);
    }
}
/// Iterable csv wrapper around a stream. @p fields the list of types that form up a row.
template <class... fields>
class csv {
    std::istream &_in;
    const char _delim;
public:
    typedef std::tuple<fields...> value_type;
    class iterator;

    /// Construct from a stream.
    inline csv(std::istream &in, const char delim) : _in(in), _delim(delim) {}

    /// Status of the underlying stream
    /// @{
    inline bool good() const {
        return _in.good();
    }
    inline const std::istream &underlying_stream() const {
        return _in;
    }
    /// @}

    inline iterator begin();
    inline iterator end();

private:
    /// Reads a line into a stringstream, and then reads the line into a tuple, that is returned
    inline value_type read_row() {
        std::string line;
        std::getline(_in, line);
        std::stringstream line_stream(line);
        std::tuple<fields...> retval;
        csvtools::read_tuple<0, fields...>(line_stream, retval, _delim);
        return retval;
    }
};
/// Iterator; just calls recursively @ref csv::read_row and stores the result.
template <class... fields>
class csv<fields...>::iterator {
    csv::value_type _row;
    csv *_parent;
public:
    typedef std::input_iterator_tag iterator_category;
    typedef csv::value_type value_type;
    typedef std::size_t difference_type;
    typedef csv::value_type * pointer;
    typedef csv::value_type & reference;

    /// Construct an empty/end iterator
    inline iterator() : _parent(nullptr) {}
    /// Construct an iterator at the beginning of the @p parent csv object.
    inline iterator(csv &parent) : _parent(parent.good() ? &parent : nullptr) {
        ++(*this);
    }

    /// Read one row, if possible. Set to end if parent is not good anymore.
    inline iterator &operator++() {
        if (_parent != nullptr) {
            _row = _parent->read_row();
            if (!_parent->good()) {
                _parent = nullptr;
            }
        }
        return *this;
    }
    inline iterator operator++(int) {
        iterator copy = *this;
        ++(*this);
        return copy;
    }

    inline csv::value_type const &operator*() const {
        return _row;
    }
    inline csv::value_type const *operator->() const {
        return &_row;
    }

    bool operator==(iterator const &other) {
        return (this == &other) or (_parent == nullptr and other._parent == nullptr);
    }
    bool operator!=(iterator const &other) {
        return not (*this == other);
    }
};

template <class... fields>
typename csv<fields...>::iterator csv<fields...>::begin() {
    return iterator(*this);
}

template <class... fields>
typename csv<fields...>::iterator csv<fields...>::end() {
    return iterator();
}
I put a small working example on GitHub; I've been using it for parsing some numerical data and it served its purpose.
This is an old thread, but it's still at the top of the search results, so I'm adding my solution using std::stringstream and a simple string-replace method by Yves Baumes that I found here.
The following example reads a file line by line, ignores comment lines starting with //, and parses the remaining lines into a combination of strings, ints and doubles. The stringstream does the parsing, but expects the fields to be delimited by whitespace, so I use StringReplace to turn the commas into spaces first. It handles tabs fine, but does not deal with quoted strings.
Bad or missing input is simply ignored, which may or may not be a good thing depending on your situation.
#include <string>
#include <sstream>
#include <fstream>

void StringReplace(std::string& str, const std::string& oldStr, const std::string& newStr)
// code by Yves Baumes
// http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string
{
    size_t pos = 0;
    while((pos = str.find(oldStr, pos)) != std::string::npos)
    {
        str.replace(pos, oldStr.length(), newStr);
        pos += newStr.length();
    }
}
void LoadCSV(std::string &filename) {
    std::ifstream stream(filename);
    std::string in_line;
    std::string Field;
    std::string Chan;
    int ChanType;
    double Scale;
    int Import;
    while (std::getline(stream, in_line)) {
        StringReplace(in_line, ",", " ");
        std::stringstream line(in_line);
        line >> Field >> Chan >> ChanType >> Scale >> Import;
        if (Field.substr(0,2)!="//") {
            // do your stuff
            // this is CBuilder code for demonstration, sorry
            ShowMessage((String)Field.c_str() + "\n" + Chan.c_str() + "\n" + IntToStr(ChanType) + "\n" +FloatToStr(Scale) + "\n" +IntToStr(Import));
        }
    }
}
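For completeness, a minimal calling sketch follows; the file name is a placeholder, and outside of C++Builder the ShowMessage line above would be swapped for something like std::cout:
#include <string>
// assumes StringReplace and LoadCSV from above are visible in this translation unit

int main()
{
    std::string filename = "channels.csv";  // hypothetical input file
    LoadCSV(filename);                       // parses and handles each row internally
    return 0;
}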
If you're using Visual Studio / MFC, the solution below may make your job easier. It supports Unicode and MBCS, is commented, has no dependencies other than CString, and works fine for me. It doesn't support newlines embedded in quoted strings, but I don't care about that as long as it doesn't crash in that case, and it doesn't.
The overall strategy is to treat quoted strings and empty strings as special cases, and use Tokenize for everything else. For quoted strings, the strategy is to find the real closing quote while keeping track of whether any pairs of consecutive quotes were encountered along the way; if so, Replace is used to convert the pairs to singles. There are doubtless more efficient methods, but in my case performance wasn't significant enough to justify further optimization.
class CParseCSV {
public:
    // Construction
    CParseCSV(const CString& sLine);
    // Attributes
    bool GetString(CString& sDest);

protected:
    CString m_sLine; // line to extract tokens from
    int m_nLen;      // line length in characters
    int m_iPos;      // index of current position
};

CParseCSV::CParseCSV(const CString& sLine) : m_sLine(sLine)
{
    m_nLen = m_sLine.GetLength();
    m_iPos = 0;
}

bool CParseCSV::GetString(CString& sDest)
{
    if (m_iPos < 0 || m_iPos > m_nLen) // if position out of range
        return false;
    if (m_iPos == m_nLen) { // if at end of string
        sDest.Empty();      // return empty token
        m_iPos = -1;        // really done now
        return true;
    }
    if (m_sLine[m_iPos] == '\"') { // if current char is double quote
        m_iPos++;                  // advance to next char
        int iTokenStart = m_iPos;
        bool bHasEmbeddedQuotes = false;
        while (m_iPos < m_nLen) { // while more chars to parse
            if (m_sLine[m_iPos] == '\"') { // if current char is double quote
                // if next char exists and is also double quote
                if (m_iPos < m_nLen - 1 && m_sLine[m_iPos + 1] == '\"') {
                    // found pair of consecutive double quotes
                    bHasEmbeddedQuotes = true; // request conversion
                    m_iPos++;                  // skip first quote in pair
                } else   // next char doesn't exist or is normal
                    break; // found closing quote; exit loop
            }
            m_iPos++; // advance to next char
        }
        sDest = m_sLine.Mid(iTokenStart, m_iPos - iTokenStart);
        if (bHasEmbeddedQuotes) // if string contains embedded quote pairs
            sDest.Replace(_T("\"\""), _T("\"")); // convert pairs to singles
        m_iPos += 2; // skip closing quote and trailing delimiter if any
    } else if (m_sLine[m_iPos] == ',') { // else if char is comma
        sDest.Empty(); // return empty token
        m_iPos++;      // advance to next char
    } else { // else get next comma-delimited token
        sDest = m_sLine.Tokenize(_T(","), m_iPos);
    }
    return true;
}
// calling code should look something like this:
CStdioFile fIn(pszPath, CFile::modeRead);
CString sLine, sToken;
while (fIn.ReadString(sLine)) { // for each line of input file
    if (!sLine.IsEmpty()) {     // ignore blank lines
        CParseCSV csv(sLine);
        while (csv.GetString(sToken)) {
            // do something with sToken here
        }
    }
}
A solution using Boost Tokenizer:
std::vector<std::string> vec;
using namespace boost;
tokenizer<escaped_list_separator<char> > tk(
    line, escaped_list_separator<char>('\\', ',', '\"'));
for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin());
     i != tk.end(); ++i)
{
    vec.push_back(*i);
}
You can use Boost Tokenizer with escaped_list_separator.
escaped_list_separator parses a superset of CSV.
Boost::tokenizer only uses the tokenizer header files; no linking to Boost libraries is required.
Here is an example (see Parse CSV File With Boost Tokenizer In C++ or Boost::tokenizer for details):
#include <iostream>  // cout, endl
#include <fstream>   // fstream
#include <vector>
#include <string>
#include <algorithm> // copy
#include <iterator>  // ostream_iterator
#include <boost/tokenizer.hpp>

int main()
{
    using namespace std;
    using namespace boost;

    string data("data.csv");
    ifstream in(data.c_str());
    if (!in.is_open()) return 1;

    typedef tokenizer< escaped_list_separator<char> > Tokenizer;

    vector< string > vec;
    string line;

    while (getline(in, line))
    {
        Tokenizer tok(line);
        vec.assign(tok.begin(), tok.end());

        // vector now contains strings from one row, output to cout here
        copy(vec.begin(), vec.end(), ostream_iterator<string>(cout, "|"));

        cout << "\n----------------------" << endl;
    }
}