如何迭代由空格分隔的单词组成的字符串中的单词?

注意,我对C字符串函数或那种字符操作/访问不感兴趣。比起效率,我更喜欢优雅。我当前的解决方案:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

当前回答

我有一种与其他解决方案非常不同的方法,它提供了很多其他解决方案所缺乏的价值,但当然也有其缺点。这是一个工作实现,示例是在单词周围放置<tag></tag>。

首先,这个问题可以通过一个循环解决,不需要额外的内存,只需考虑四种逻辑情况。从概念上讲,我们对边界感兴趣。我们的代码应该反映出这一点:让我们遍历字符串,一次查看两个字符,记住字符串的开头和结尾都有特殊情况。

缺点是我们必须编写实现,这有点冗长,但大多是方便的样板。

好处是我们编写了实现,因此很容易根据特定的需要定制它,例如区分左和写单词边界,使用任何一组分隔符,或处理其他情况,例如无边界或错误位置。

using namespace std;

#include <iostream>
#include <string>

#include <cctype>

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;
} boundary_t;

bool is_delim_char(int c) {
    return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !is_delim_char(c);
}

boundary_t maybe_word_boundary(string str, int pos) {
    int len = str.length();
    if (pos < 0 || pos >= len) {
        return (boundary_t){.type = E_BOUNDARY_TYPE_ERROR};
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos};
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos + 1};
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        }
        return (boundary_t){.type = E_BOUNDARY_TYPE_NONE};
    }
}

int main() {
    string str;
    getline(cin, str);

    int len = str.length();
    for (int i = 0; i < len; i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            // whatever
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            // whatever
        }
    }
}

正如您所看到的,代码非常容易理解和微调,代码的实际使用非常简短和简单。使用C++不应阻止我们编写最简单、最容易定制的代码,即使这意味着不使用STL。我认为这是Linus Torvalds所说的“品味”的一个例子,因为我们已经消除了所有不需要的逻辑,而写作风格自然允许在需要处理的时候处理更多的案件。

可以改进此代码的可能是使用enum类,在maybe_word_boundary中接受指向is_word_char的函数指针,而不是直接调用is_word_char,并传递lambda。

其他回答

我用这个分隔符分隔字符串。第一个将结果放入预先构建的向量中,第二个返回新向量。

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

请注意,此解决方案不会跳过空令牌,因此下面将找到4项,其中一项为空:

std::vector<std::string> x = split("one:two::three", ':');

LazyString拆分器:

#include <string>
#include <algorithm>
#include <unordered_set>

using namespace std;

class LazyStringSplitter
{
    string::const_iterator start, finish;
    unordered_set<char> chop;

public:

    // Empty Constructor
    explicit LazyStringSplitter()
    {}

    explicit LazyStringSplitter (const string cstr, const string delims)
        : start(cstr.begin())
        , finish(cstr.end())
        , chop(delims.begin(), delims.end())
    {}

    void operator () (const string cstr, const string delims)
    {
        chop.insert(delims.begin(), delims.end());
        start = cstr.begin();
        finish = cstr.end();
    }

    bool empty() const { return (start >= finish); }

    string next()
    {
        // return empty string
        // if ran out of characters
        if (empty())
            return string("");

        auto runner = find_if(start, finish, [&](char c) {
            return chop.count(c) == 1;
        });

        // construct next string
        string ret(start, runner);
        start = runner + 1;

        // Never return empty string
        // + tail recursion makes this method efficient
        return !ret.empty() ? ret : next();
    }
};

我将此方法称为LazyStringSplitter是因为一个原因——它不会一次性拆分字符串。本质上,它的行为类似于python生成器它公开了一个名为next的方法,该方法返回从原始字符串拆分的下一个字符串我使用了c++11STL中的无序集,因此查找分隔符的速度要快得多下面是它的工作原理

测试程序

#include <iostream>
using namespace std;

int main()
{
    LazyStringSplitter splitter;

    // split at the characters ' ', '!', '.', ','
    splitter("This, is a string. And here is another string! Let's test and see how well this does.", " !.,");

    while (!splitter.empty())
        cout << splitter.next() << endl;
    return 0;
}

输出,输出

This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does

改进这一点的下一个计划是实施开始和结束方法,以便可以执行以下操作:

vector<string> split_string(splitter.begin(), splitter.end());

这是另一种方法。。

void split_string(string text,vector<string>& words)
{
  int i=0;
  char ch;
  string word;

  while(ch=text[i++])
  {
    if (isspace(ch))
    {
      if (!word.empty())
      {
        words.push_back(word);
      }
      word = "";
    }
    else
    {
      word += ch;
    }
  }
  if (!word.empty())
  {
    words.push_back(word);
  }
}

这是我最喜欢的遍历字符串的方法。每个词你都可以做你想做的事。

string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while( iss >> word )     
{
    // Do something on `word` here...
}

我使用以下代码:

namespace Core
{
    typedef std::wstring String;

    void SplitString(const Core::String& input, const Core::String& splitter, std::list<Core::String>& output)
    {
        if (splitter.empty())
        {
            throw std::invalid_argument(); // for example
        }

        std::list<Core::String> lines;

        Core::String::size_type offset = 0;

        for (;;)
        {
            Core::String::size_type splitterPos = input.find(splitter, offset);

            if (splitterPos != Core::String::npos)
            {
                lines.push_back(input.substr(offset, splitterPos - offset));
                offset = splitterPos + splitter.size();
            }
            else
            {
                lines.push_back(input.substr(offset));
                break;
            }
        }

        lines.swap(output);
    }
}

// gtest:

class SplitStringTest: public testing::Test
{
};

TEST_F(SplitStringTest, EmptyStringAndSplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(Core::String(), Core::String(), result));
}

TEST_F(SplitStringTest, NonEmptyStringAndEmptySplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(L"xy", Core::String(), result));
}

TEST_F(SplitStringTest, EmptyStringAndNonEmptySplitter)
{
    std::list<Core::String> result;
    Core::SplitString(Core::String(), Core::String(L","), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
}

TEST_F(SplitStringTest, OneCharSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,y", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());

    Core::SplitString(L",xy", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L"xy", *result.rbegin());

    Core::SplitString(L"xy,", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"xy", *result.begin());
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, TwoCharsSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,.y,z", L",.", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());

    Core::SplitString(L"x,,y,z", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());
}

TEST_F(SplitStringTest, RecursiveSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L",,,", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L",", *result.rbegin());

    Core::SplitString(L",.,.,", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L".,", *result.rbegin());

    Core::SplitString(L"x,.,.,y", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L".,y", *result.rbegin());

    Core::SplitString(L",.,,.,", L",.,", result);
    ASSERT_EQ(3, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(Core::String(), *(++result.begin()));
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, NullTerminators)
{
    std::list<Core::String> result;

    Core::SplitString(L"xy", Core::String(L"\0", 1), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(L"xy", *result.begin());

    Core::SplitString(Core::String(L"x\0y", 3), Core::String(L"\0", 1), result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());
}