bench: add fluent API for untimed setup steps in nanobench

Some benchmarks need per-epoch state reset so every measured run does the same work.
Add `Bench::setup(...).run(...)` for untimed per-epoch setup.
The existing `run()` now delegates to `runImpl()` with an empty setup lambda, keeping the old API unchanged.
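
As a rough illustration of the pattern (not the vendored code itself), the sketch below models the fluent call chain with hypothetical stand-ins: `MiniBench`, `MiniSetupRunner` and `traceSetupThenRun` are names invented here, and the fixed 2 epochs of 3 iterations are an assumption replacing nanobench's adaptive iteration logic. It only records call order, so it shows where setup would sit relative to the timed iterations without measuring anything.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class MiniBench; // hypothetical stand-in for nanobench's Bench

// Proxy returned by MiniBench::setup(): it stores the setup callable and
// forwards to MiniBench::runImpl() once run() is called.
template <typename SetupOp>
class MiniSetupRunner {
public:
    MiniSetupRunner(SetupOp setupOp, MiniBench& bench)
        : mSetupOp(std::move(setupOp)), mBench(bench) {}

    template <typename Op>
    MiniBench& run(Op&& op);

private:
    SetupOp mSetupOp;
    MiniBench& mBench;
};

class MiniBench {
public:
    template <typename SetupOp>
    MiniSetupRunner<SetupOp> setup(SetupOp setupOp) {
        return MiniSetupRunner<SetupOp>(std::move(setupOp), *this);
    }

    // The plain run() keeps working: it passes a no-op setup.
    template <typename Op>
    MiniBench& run(Op&& op) {
        auto noSetup = [] {};
        return runImpl(noSetup, std::forward<Op>(op));
    }

private:
    template <typename SetupOp>
    friend class MiniSetupRunner;

    template <typename SetupOp, typename Op>
    MiniBench& runImpl(SetupOp& setupOp, Op&& op) {
        // Assumption: 2 epochs of 3 iterations; nanobench picks these adaptively.
        for (int epoch = 0; epoch < 2; ++epoch) {
            setupOp(); // would run before the clock starts
            for (int i = 0; i < 3; ++i) {
                op(); // the only part a real benchmark would time
            }
        }
        return *this;
    }
};

// Defined after MiniBench is complete because it calls runImpl().
template <typename SetupOp>
template <typename Op>
MiniBench& MiniSetupRunner<SetupOp>::run(Op&& op) {
    return mBench.runImpl(mSetupOp, std::forward<Op>(op));
}

// Records the call order produced by setup(...).run(...).
std::vector<std::string> traceSetupThenRun() {
    std::vector<std::string> trace;
    MiniBench bench;
    bench.setup([&] { trace.push_back("setup"); })
         .run([&] { trace.push_back("op"); });
    assert(trace.size() == 8); // 2 epochs * (1 setup + 3 ops)
    return trace;
}
```

In this model the trace is one `setup` entry followed by three `op` entries, repeated per epoch. Setup runs once per epoch rather than once per iteration, so a benchmark whose op consumes its state presumably still needs an epoch of one iteration, or an op that tolerates repeated calls.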

This vendors the upstream change from `martinus/nanobench`.
The upstream PR also adds tests that verify setup is excluded from timing and runs once before each epoch's iterations.
Those tests are not copied here because wiring them into `src/bench/nanobench.h` outside the benchmarking setup would be awkward.
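
For reference, the two properties those tests cover can be sketched against a self-contained miniature of the measurement loop. This is not the upstream test code: `EpochStats`, `runEpochs`, `spinFor` and `setupIsUntimedAndPerEpoch` are hypothetical names, and the 50ms setup cost and the 3 epochs of 10 iterations are arbitrary assumptions chosen so the difference is easy to see.

```cpp
#include <cassert>
#include <chrono>

struct EpochStats {
    int setupCalls = 0;
    int opCalls = 0;
    std::chrono::steady_clock::duration measured{};
};

// Hypothetical miniature of the measurement loop: setup runs once per epoch
// before the clock starts, so only the op() iterations are timed.
template <typename SetupOp, typename Op>
EpochStats runEpochs(int epochs, int itersPerEpoch, SetupOp setupOp, Op op) {
    assert(epochs > 0 && itersPerEpoch > 0);
    EpochStats stats;
    for (int e = 0; e < epochs; ++e) {
        setupOp();
        ++stats.setupCalls;
        auto const before = std::chrono::steady_clock::now();
        for (int i = 0; i < itersPerEpoch; ++i) {
            op();
            ++stats.opCalls;
        }
        stats.measured += std::chrono::steady_clock::now() - before;
    }
    return stats;
}

// Busy-waits for the given duration to simulate an expensive setup step.
inline void spinFor(std::chrono::steady_clock::duration d) {
    auto const until = std::chrono::steady_clock::now() + d;
    while (std::chrono::steady_clock::now() < until) {
    }
}

// Returns true when both properties hold for a slow setup and a cheap op.
bool setupIsUntimedAndPerEpoch() {
    auto const setupCost = std::chrono::milliseconds(50);
    int counter = 0;
    EpochStats const stats = runEpochs(
        3, 10,
        [&] { spinFor(setupCost); },
        [&] { ++counter; });
    bool const oncePerEpoch =
        stats.setupCalls == 3 && stats.opCalls == 30 && counter == 30;
    // If the three setup calls were timed, measured would be at least 150ms.
    bool const untimed = stats.measured < setupCost;
    return oncePerEpoch && untimed;
}
```

The timing check compares against a single setup call's cost rather than an exact value, which keeps it tolerant of scheduler noise.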

The `Default is 1ms, so we are mostly relying ...` comment update matches current upstream `nanobench` master.

-------

Running a few benchmarks several times showcases the spread between runs (these benchmarks will be migrated to the new setup method in the next commit):

./build/bin/bench_bitcoin -filter='^(BnBExhaustion|AddrManAddThenGood|DeserializeBlockTest|DeserializeAndCheckBlockTest|CheckBlockTest|LoadExternalBlockFile|FindByte|WalletCreatePlain|WalletCreateEncrypted|WalletLoadingDescriptors)$'

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       26,400,542.00 |               37.88 |    0.4% |      0.29 | `AddrManAddThenGood`
|          189,075.00 |            5,288.91 |    0.4% |      0.01 | `BnBExhaustion`

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        1,237,000.00 |              808.41 |    2.4% |      0.01 | `DeserializeAndCheckBlockTest`
|          893,333.00 |            1,119.40 |    0.6% |      0.01 | `DeserializeBlockTest`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               31.62 |       31,622,370.70 |    0.2% |      0.01 | `FindByte`
|        5,506,875.00 |              181.59 |    1.4% |      0.06 | `LoadExternalBlockFile`
|      593,480,333.00 |                1.68 |    0.4% |      6.53 | `WalletCreateEncrypted`
|      174,305,167.00 |                5.74 |    0.7% |      1.93 | `WalletCreatePlain`
|      160,833,875.00 |                6.22 |    0.2% |      0.80 | `WalletLoadingDescriptors`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       26,005,125.00 |               38.45 |    1.3% |      0.29 | `AddrManAddThenGood`
|          181,909.67 |            5,497.23 |    0.1% |      0.01 | `BnBExhaustion`

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        1,223,000.00 |              817.66 |    2.8% |      0.01 | `DeserializeAndCheckBlockTest`
|          892,917.00 |            1,119.92 |    0.7% |      0.01 | `DeserializeBlockTest`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               31.58 |       31,660,608.70 |    0.5% |      0.01 | `FindByte`
|        5,612,750.00 |              178.17 |    1.1% |      0.06 | `LoadExternalBlockFile`
|      594,012,250.00 |                1.68 |    0.2% |      6.53 | `WalletCreateEncrypted`
|      174,668,334.00 |                5.73 |    0.8% |      1.92 | `WalletCreatePlain`
|      158,494,375.00 |                6.31 |    0.3% |      0.79 | `WalletLoadingDescriptors`
Lőrinc
2026-01-03 15:38:35 +01:00
parent c7a3ea2483
commit 83b8528ddb

@@ -137,6 +137,11 @@ class Result;
class Rng;
class BigO;
namespace detail {
template <typename SetupOp>
class SetupRunner;
} // namespace detail
/**
* @brief Renders output from a mustache-like template and benchmark results.
*
@@ -819,7 +824,7 @@ public:
/**
* @brief Minimum time each epoch should take.
*
- * Default is zero, so we are fully relying on clockResolutionMultiple(). In most cases this is exactly what you want. If you see
+ * Default is 1ms, so we are mostly relying on clockResolutionMultiple(). In most cases this is exactly what you want. If you see
* that the evaluation is unreliable with a high `err%`, you can increase either minEpochTime() or minEpochIterations().
*
* @see maxEpochTime, minEpochIterations
@@ -1007,7 +1012,21 @@ public:
Bench& config(Config const& benchmarkConfig);
ANKERL_NANOBENCH(NODISCARD) Config const& config() const noexcept;
/**
* @brief Configure an untimed setup step per epoch (fluent API).
*
* Example: `bench.setup(...).run(...);`
*/
template <typename SetupOp>
detail::SetupRunner<SetupOp> setup(SetupOp setupOp);
private:
template <typename SetupOp, typename Op>
Bench& runImpl(SetupOp& setupOp, Op&& op);
template <typename SetupOp>
friend class detail::SetupRunner;
Config mConfig{};
std::vector<Result> mResults{};
};
@@ -1207,14 +1226,44 @@ constexpr uint64_t Rng::rotl(uint64_t x, unsigned k) noexcept {
return (x << k) | (x >> (64U - k));
}
namespace detail {
template <typename SetupOp>
class SetupRunner {
public:
explicit SetupRunner(SetupOp setupOp, Bench& bench)
: mSetupOp(std::move(setupOp))
, mBench(bench) {}
template <typename Op>
ANKERL_NANOBENCH_NO_SANITIZE("integer")
Bench& run(Op&& op) {
return mBench.runImpl(mSetupOp, std::forward<Op>(op));
}
private:
SetupOp mSetupOp;
Bench& mBench;
};
} // namespace detail
template <typename Op>
ANKERL_NANOBENCH_NO_SANITIZE("integer")
Bench& Bench::run(Op&& op) {
auto setupOp = [] {};
return runImpl(setupOp, std::forward<Op>(op));
}
template <typename SetupOp, typename Op>
ANKERL_NANOBENCH_NO_SANITIZE("integer")
Bench& Bench::runImpl(SetupOp& setupOp, Op&& op) {
// It is important that this method is kept short so the compiler can do better optimizations/ inlining of op()
detail::IterationLogic iterationLogic(*this);
auto& pc = detail::performanceCounters();
while (auto n = iterationLogic.numIters()) {
setupOp();
pc.beginMeasure();
Clock::time_point const before = Clock::now();
while (n-- > 0) {
@@ -1229,6 +1278,11 @@ Bench& Bench::run(Op&& op) {
return *this;
}
template <typename SetupOp>
detail::SetupRunner<SetupOp> Bench::setup(SetupOp setupOp) {
return detail::SetupRunner<SetupOp>(std::move(setupOp), *this);
}
// Performs all evaluations.
template <typename Op>
Bench& Bench::run(char const* benchmarkName, Op&& op) {