如何迭代字符串的单词？

如何迭代由空格分隔的单词组成的字符串中的单词？

注意，我对C字符串函数或那种字符操作/访问不感兴趣。比起效率，我更喜欢优雅。我当前的解决方案：

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

当前回答

我们可以在c++中使用strtok，

#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    char str[]="Mickey M;12034;911416313;M;01a;9001;NULL;0;13;12;0;CPP,C;MSC,3D;FEND,BEND,SEC;";
    char *pch = strtok (str,";,");
    while (pch != NULL)
    {
        cout<<pch<<"\n";
        pch = strtok (NULL, ";,");
    }
    return 0;
}

2015-06-03 12:23:03

其他回答

最小的解决方案是一个函数，它将std:：字符串和一组分隔符（作为std:：string）作为输入，并返回std:：：字符串的std:：向量。

#include <string>
#include <vector>

std::vector<std::string>
tokenize(const std::string& str, const std::string& delimiters)
{
  using ssize_t = std::string::size_type;
  const ssize_t str_ln = str.length();
  ssize_t last_pos = 0;

  // container for the extracted tokens
  std::vector<std::string> tokens;

  while (last_pos < str_ln) {
      // find the position of the next delimiter
      ssize_t pos = str.find_first_of(delimiters, last_pos);

      // if no delimiters found, set the position to the length of string
      if (pos == std::string::npos)
         pos = str_ln;

      // if the substring is nonempty, store it in the container
      if (pos != last_pos)
         tokens.emplace_back(str.substr(last_pos, pos - last_pos));

      // scan past the previous substring
      last_pos = pos + 1;
  }

  return tokens;
}

用法示例：

#include <iostream>

int main()
{
    std::string input_str = "one + two * (three - four)!!---! ";
    const char* delimiters = "! +- (*)";
    std::vector<std::string> tokens = tokenize(input_str, delimiters);

    std::cout << "input = '" << input_str << "'\n"
              << "delimiters = '" << delimiters << "'\n"
              << "nr of tokens found = " << tokens.size() << std::endl;
    for (const std::string& tk : tokens) {
        std::cout << "token = '" << tk << "'\n";
    }

  return 0;
}

2021-10-22 12:15:34

这里有一个只使用标准正则表达式库的简单解决方案

#include <regex>
#include <string>
#include <vector>

std::vector<string> Tokenize( const string str, const std::regex regex )
{
    using namespace std;

    std::vector<string> result;

    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;

    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) //token could be empty:check
            result.emplace_back( it->str() );
    }

    return result;
}

正则表达式参数允许检查多个参数（空格、逗号等）

我通常只选中空格和逗号分隔，所以我也有这个默认函数：

std::vector<string> TokenizeDefault( const string str )
{
    using namespace std;

    regex re( "[\\s,]+" );

    return Tokenize( str, re );
}

“[\\s，]+”检查空格（\\s）和逗号（，）。

注意，如果要拆分wstring而不是string，

将所有std:：regex更改为std:：wregex将所有sregex_token_iterator更改为wsregex_token_idterator

注意，根据编译器的不同，您可能还希望引用字符串参数。

2014-05-06 05:49:21

对于一个大得离谱而且可能是冗余的版本，可以尝试很多For循环。

string stringlist[10];
int count = 0;

for (int i = 0; i < sequence.length(); i++)
{
    if (sequence[i] == ' ')
    {
        stringlist[count] = sequence.substr(0, i);
        sequence.erase(0, i+1);
        i = 0;
        count++;
    }
    else if (i == sequence.length()-1)  // Last word
    {
        stringlist[count] = sequence.substr(0, i+1);
    }
}

它并不漂亮，但总的来说（除了标点符号和一系列其他错误）它是有效的！

2008-10-25 09:34:36

这是我的条目：

template <typename Container, typename InputIter, typename ForwardIter>
Container
split(InputIter first, InputIter last,
      ForwardIter s_first, ForwardIter s_last)
{
    Container output;

    while (true) {
        auto pos = std::find_first_of(first, last, s_first, s_last);
        output.emplace_back(first, pos);
        if (pos == last) {
            break;
        }

        first = ++pos;
    }

    return output;
}

template <typename Output = std::vector<std::string>,
          typename Input = std::string,
          typename Delims = std::string>
Output
split(const Input& input, const Delims& delims = " ")
{
    using std::cbegin;
    using std::cend;
    return split<Output>(cbegin(input), cend(input),
                         cbegin(delims), cend(delims));
}

auto vec = split("Mary had a little lamb");

第一个定义是采用两对迭代器的STL样式泛型函数。第二个是一个方便的函数，可以让你不用自己做所有的开始和结束。例如，如果要使用列表，还可以将输出容器类型指定为模板参数。

它之所以优雅（IMO），是因为与其他大多数答案不同，它不限于字符串，而是可以与任何STL兼容的容器一起使用。在不更改上述代码的情况下，您可以说：

using vec_of_vecs_t = std::vector<std::vector<int>>;

std::vector<int> v{1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9};
auto r = split<vec_of_vecs_t>(v, std::initializer_list<int>{0, 2});

这将在每次遇到0或2时将向量v分割成单独的向量。

（还有一个额外的好处，即使用字符串，这个实现比基于strtok（）和getline（）的版本更快，至少在我的系统上是这样。）

2015-10-03 08:06:45

使用Boost的可能解决方案可能是：

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

这种方法可能比字符串流方法更快。由于这是一个通用模板函数，因此可以使用各种分隔符拆分其他类型的字符串（wchar等或UTF-8）。

有关详细信息，请参阅文档。

2008-10-25 20:28:20

如何迭代字符串的单词？

推荐文章

最新文章

标签