Merge bitcoin/bitcoin#22953: refactor: introduce single-separator split helper (boost::split replacement)

a62e84438d fuzz: add `SplitString` fuzz target (MarcoFalke)
4fad7e46d9 test: add unit tests for `SplitString` helper (Kiminuo)
9cc8e876e4 refactor: introduce single-separator split helper `SplitString` (Sebastian Falbesoner)

Pull request description:

  This PR adds a simple string split helper `SplitString` that takes use of the spanparsing `Split` function that was first introduced in #13697 (commit fe8a7dcd78). This enables to replace most calls to `boost::split`, in the cases where only a single separator character is used. Note that while previous attempts to replace `boost::split` were controversial (e.g. #13751), this one has a trivial implementation: it merely uses an internal helper (that is unit tested and in regular use with output descriptiors) and converts its result from spans to strings. As a drawback though, not all `boost::split` instances can be tackled.

  As a possible optimization, one could return a vector of `std::string_view`s (available since C++17) instead of strings, to avoid copies. This would need more carefulness on the caller sites though, to avoid potential lifetime issues, and it's probably not worth it, considering that none of the places where strings are split are really performance-critical.

ACKs for top commit:
  martinus:
    Code review ACK a62e84438d. Ran all tests. I also like that with `boost::split` it was not obvious that the resulting container was cleared, and with `SplitString` API that's obvious.

Tree-SHA512: 10cb22619ebe46831b1f8e83584a89381a036b54c88701484ac00743e2a62cfe52c9f3ecdbb2d0815e536c99034558277cc263600ec3f3588b291c07eef8ed24
This commit is contained in:
fanquake
2022-04-26 09:30:34 +01:00
14 changed files with 97 additions and 71 deletions

View File

@@ -48,20 +48,4 @@ Span<const char> Expr(Span<const char>& sp)
return ret;
}
std::vector<Span<const char>> Split(const Span<const char>& sp, char sep)
{
std::vector<Span<const char>> ret;
auto it = sp.begin();
auto start = it;
while (it != sp.end()) {
if (*it == sep) {
ret.emplace_back(start, it);
start = it + 1;
}
++it;
}
ret.emplace_back(start, it);
return ret;
}
} // namespace spanparsing

View File

@@ -43,7 +43,22 @@ Span<const char> Expr(Span<const char>& sp);
* Note that this function does not care about braces, so splitting
* "foo(bar(1),2),3) on ',' will return {"foo(bar(1)", "2)", "3)"}.
*/
std::vector<Span<const char>> Split(const Span<const char>& sp, char sep);
template <typename T = Span<const char>>
std::vector<T> Split(const Span<const char>& sp, char sep)
{
std::vector<T> ret;
auto it = sp.begin();
auto start = it;
while (it != sp.end()) {
if (*it == sep) {
ret.emplace_back(start, it);
start = it + 1;
}
++it;
}
ret.emplace_back(start, it);
return ret;
}
} // namespace spanparsing

View File

@@ -6,6 +6,7 @@
#define BITCOIN_UTIL_STRING_H
#include <attributes.h>
#include <util/spanparsing.h>
#include <algorithm>
#include <array>
@@ -15,6 +16,11 @@
#include <string>
#include <vector>
[[nodiscard]] inline std::vector<std::string> SplitString(std::string_view str, char sep)
{
return spanparsing::Split<std::string>(str, sep);
}
[[nodiscard]] inline std::string TrimString(const std::string& str, const std::string& pattern = " \f\n\r\t\v")
{
std::string::size_type front = str.find_first_not_of(pattern);