01960c53c7d71c70792abe19413315768dc2275a fuzz: make FuzzedDataProvider usage deterministic (Martin Leitner-Ankerl)
Pull request description:
There exist many usages of `fuzzed_data_provider` where it is evaluated directly in the function call.
Unfortunately, [the order of evaluation of function arguments is unspecified](https://en.cppreference.com/w/cpp/language/eval_order), and a simple example shows that it can differ e.g. between clang++ and g++: https://godbolt.org/z/jooMezWWY
When the evaluation order is not consistent, the same fuzzing/random input will produce different output, which is bad for coverage/reproducibility. This PR fixes all these cases I have found where unspecified evaluation order could be a problem.
Finding these has been manual work; I grepped the sourcecode for these patterns, and looked at each usage individually. So there is a chance I missed some.
* `fuzzed_data_provider`
* `.Consume`
* `>Consume`
* `.rand`
I first discovered this in https://github.com/bitcoin/bitcoin/pull/29013#discussion_r1420236394. Note that there is a possibility that due to this fix the evaluation order is now different in many cases than when the fuzzing corpus has been created. If that is the case, the fuzzing corpus will have worse coverage than before.
Update: In list-initialization the order of evaluation is well defined, so e.g. usages in `initializer_list` or constructors that use `{...}` is ok.
ACKs for top commit:
achow101:
ACK 01960c53c7d71c70792abe19413315768dc2275a
vasild:
ACK 01960c53c7d71c70792abe19413315768dc2275a
ismaelsadeeq:
ACK 01960c53c7d71c70792abe19413315768dc2275a
Tree-SHA512: e56d087f6f4bf79c90b972a5f0c6908d1784b3cfbb8130b6b450d5ca7d116c5a791df506b869a23bce930b2a6977558e1fb5115bb4e061969cc40f568077a1ad
In the process of doing so, refactor `ConsumeNetAddr()` to generate the
addresses from IPv4, IPv6, Tor, I2P and CJDNS networks in the same way -
by preparing some random stream and deserializing from it. Similar code
was already found in `RandAddr()`.
There exist many usages of `fuzzed_data_provider` where it is evaluated directly in the function call.
Unfortunately, the order of evaluation of function arguments is unspecified. This means it can differ
between compilers/version/optimization levels etc. But when the evaluation order changes, the same
fuzzing input will produce different output, which is bad for coverage/reproducibility.
This PR fixes all these cases where by moving multiple calls to `fuzzed_data_provider` out of the
function arguments.
This also cleans up the addrman (de)serialization code paths to only
allow `Disk` serialization. Some unit tests previously forced a
`Network` serialization, which does not make sense, because Bitcoin Core
in production will always `Disk` serialize.
This cleanup idea was suggested by Pieter Wuille and implemented by Anthony
Towns.
Co-authored-by: Pieter Wuille <pieter@wuille.net>
Co-authored-by: Anthony Towns <aj@erisian.com.au>
025fda0a76893d09d19ec9a6c0ba86ad11c466f7 fuzz: addrman, avoid `ConsumeDeserializable` when possible (brunoerg)
Pull request description:
Using specific functions like `ConsumeService`, `ConsumeAddress` and `ConsumeNetAddr` may be more effective than using `ConsumeDeserializable`. They always return some value while `ConsumeDeserializable` may return `std::nullopt`.
E.g.: In this part of the code, if `op_net_addr` is `std::nullopt`, we basically generated the addresses (if so) unnecessarily, because we won't be able to use them:
```cpp
std::vector<CAddress> addresses;
LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), 10000) {
const std::optional<CAddress> opt_address = ConsumeDeserializable<CAddress>(fuzzed_data_provider);
if (!opt_address) {
break;
}
addresses.push_back(*opt_address);
}
const std::optional<CNetAddr> opt_net_addr = ConsumeDeserializable<CNetAddr>(fuzzed_data_provider);
if (opt_net_addr) {
addr_man.Add(addresses, *opt_net_addr, std::chrono::seconds{ConsumeTime(fuzzed_data_provider, 0, 100000000)});
}
```
Also, if we are not calling `Add` effectively, it would also be affect other functions that may "depend" on it.
ACKs for top commit:
dergoegge:
Code review ACK 025fda0a76893d09d19ec9a6c0ba86ad11c466f7
Tree-SHA512: 02450bec0b084c15ba0cd1cbdfbac067c8fea4ccf27be0c86d54e020f029a6c749a16d8e0558f9d6d35a7ca9db8916f180c872f09474702b5591129e9be0d192
7d92b1430a6fd42c4438810640576830d0ff8d13 refactor: use Span for SipHash::Write (Sebastian Falbesoner)
Pull request description:
This simple refactoring PR changes the interface for the `SipHash` arbitrary-data `Write` method to take a `Span<unsigned char>` instead of having to pass data and length. (`Span<std::byte>` seems to be more modern, but vectors of `unsigned char` are still used prety much everywhere where SipHash is called, and I didn't find it very appealing having to clutter the code with `Make(Writable)ByteSpan` helpers).
ACKs for top commit:
sipa:
utACK 7d92b1430a6fd42c4438810640576830d0ff8d13
MarcoFalke:
lgtm ACK 7d92b1430a6fd42c4438810640576830d0ff8d13
achow101:
ACK 7d92b1430a6fd42c4438810640576830d0ff8d13
Tree-SHA512: f17a27013c942aead4b09f5a64e0c3ff8dbc7e83fe63eb9a2e3ace8be9921c9cbba3ec67e3e83fbe3332ca941c42370efd059e702c060f9b508307e9657c66f2
fa6dfaaf45bde465969fa7d8fa6b6574a497a6ca scripted-diff: Use new FUZZ_TARGET macro everywhere (MarcoFalke)
fa36ad8b091c70190491280dcf0794e94e34a9ed fuzz: Accept options in FUZZ_TARGET macro (MarcoFalke)
Pull request description:
The `FUZZ_TARGET` macros have many issues:
* The developer will have to pick the right macro to pass the wanted option.
* Adding a new option requires doubling the number of existing macros in the worst case.
Fix all issues by using only a single macro.
This refactor does not change behavior.
ACKs for top commit:
dergoegge:
ACK fa6dfaaf45bde465969fa7d8fa6b6574a497a6ca
Tree-SHA512: 49a34553867a1734ce89e616b2d7c29b784a67cd8990db6573f0c7b18957636ef0c81d3d0d444a04c12cdc98bc4c4aa7a2ec94e6232dc363620a746e28416444
35a2175ad8bec92b409ae2202c124e39b2f3f838 fuzz: addrman, add coverage for `network` field in `Select()`, `Size()` and `GetAddr()` (brunoerg)
Pull request description:
This PR adds fuzz coverage for `network` field in `Select()`, `Size()` and `GetAddr()`, there was only call to them without passing a network.
https://marcofalke.github.io/b-c-cov/fuzz.coverage/src/addrman.cpp.gcov.html
ACKs for top commit:
amitiuttarwar:
for the record, ACK 35a2175ad8bec92b409ae2202c124e39b2f3f838 - only small changes from the version (previously) proposed in 27213
achow101:
ACK 35a2175ad8bec92b409ae2202c124e39b2f3f838
mzumsande:
Code Review ACK 35a2175ad8bec92b409ae2202c124e39b2f3f838, haven't tested this yet, but I will let the fuzzer run for a while now.
Tree-SHA512: dddb8322298d6c373c8e68d57538470b11825a9a310a355828c351d5c0b19ff6779d024a800e3ea90126d0c050e86f71fd22cd23d1a306c784cef0f82c45e3ca
`ConsumeDeserializable` may return `std::nullopt`, prefer
to call specific functions such as `ConsumeService`and
`ConsumeNetAddr` which always return a value.
This commit effectively moves the definition of these constants
out of the chainparamsbase to their own file.
Using the ChainType enums provides better type safety compared to
passing around strings.
The commit is part of an ongoing effort to decouple the libbitcoinkernel
library from the ArgsManager and other functionality that should not be
part of the kernel library.
This is an extraction of ArgsManager related functions from util/system
into their own common file.
Config file related functions are moved to common/config.cpp.
The background of this commit is an ongoing effort to decouple the
libbitcoinkernel library from the ArgsManager. The ArgsManager belongs
into the common library, since the kernel library should not depend on
it. See doc/design/libraries.md for more information on this rationale.
The functionality of the old size() is covered by the new Size()
when no arguments are specified, so this does not change behavior.
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Make it possible to override from the command line (without recompiling)
the addrman check ratio in addrman fuzz tests instead of hardcoding it
to 0:
```
FUZZ=addrman ./src/test/fuzz/fuzz --checkaddrman=5
```
Blindly chose a cap of 10000 iterations for every loop, except for
the two in script_ops.cpp and scriptnum_ops.cpp which appeared to
(sometimes) be deserializing individual bytes; capped those to one
million to ensure that sometimes we try working with massive scripts.
There was also one fuzzer-controlled loop in timedata.cpp which was
already capped, so I left that alone.
git grep 'while (fuzz' should now run clean except for timedata.cpp
Use a (reference) parameter instead of a data member of
CAddrManDeterministic. This will allow us to make Fill() a free function
in a later commit.
Also remove CAddrManDeterministic.m_fuzzed_data_provider since it's no
longer used.
Introduce the pimpl pattern for CAddrMan to separate the implementation details
from the externally used object representation. This reduces compile-time
dependencies and conceptually clarifies AddrMan's interface from the
implementation specifics.
Since the unit & fuzz tests currently rely on accessing CAddrMan internals, this
commit introduces addrman_impl.h, which is exclusively imported by addrman.cpp
and test files.
Review hint: git diff --color-moved=dimmed-zebra
--color-moved-ws=ignore-all-space
Just use unique_ptr<CAddrMan>s and reset the pointer if a frest addrman is required.
Also make CAddrMan::Clear() private to ensure that no call sites are missed.
Currently addrman consistency checks are a compile time option, and are not
enabled in our CI. It's unlikely anyone is running these consistency checks.
Make them a runtime option instead, where users can enable addrman
consistency checks every n operations (similar to mempool tests). Update
the addrman unit tests to do internal consistency checks every 100
operations (checking on every operations causes the test runtime to
increase by several seconds).
Also assert on a failed addrman consistency check to terminate program
execution.
fae108ceb53f61d7338ba205873623ede3c1d3be Fix incorrect whitespace in addrman (MarcoFalke)
fa32024d51c098441623e246f304a80f011e29d1 Add missing GUARDED_BY to CAddrMan::insecure_rand (MarcoFalke)
fab755b77f88873f01cbd988051de7ad3f0150de fuzz: Actually use const addrman (MarcoFalke)
fae0c79351ce34186249d44af0c5c9c7521f4b6c refactor: Mark CAddrMan::GetAddr const (MarcoFalke)
fa02934c8c9d290ea4d12683e8680c70967a4d3a refactor: Mark CAddrMan::Select const (MarcoFalke)
Pull request description:
To clarify that a call to this only changes the random state and nothing else.
ACKs for top commit:
jnewbery:
Code review ACK fae108ceb53f61d7338ba205873623ede3c1d3be
theStack:
re-ACK fae108ceb53f61d7338ba205873623ede3c1d3be 🍦
Tree-SHA512: 3ffb211d4715cc3daeb3bfcdb3fcc6b108ca96df5fa565510436fac0e8da86c93b30c9c4aad0563e27d84f615fcd729481072009a4e2360c8b3d40787ab6506a