51f7668d31 addrman: change nid_type from int to int64_t (Martin Zumsande)
051ba3290e addrman, refactor: introduce user-defined type for internal nId (Martin Zumsande)
Pull request description:
With `nIdCount` being incremented for each addr received, an attacker could cause an overflow in the past, see https://bitcoincore.org/en/2024/07/31/disclose-addrman-int-overflow/
Even though that attack was made infeasible indirectly by addr rate-limiting (PR #22387), to be on the safe side and prevent any regressions change the `nId`s used internally to `int64_t`.
This is being done by first introducing a user-defined type for `nId`s in the first commit, and then updating it to `int64_t` (thanks sipa for help with this!).
Note that `nId` is only used internally, it is not part of the serialization, so `peers.dat` should not be affected by this.
I assume that the only reason this was not done in the past is to not draw attention to this previously undisclosed issue.
ACKs for top commit:
naumenkogs:
ACK 51f7668d31
stratospher:
ACK 51f7668d31. I think it's a good change to make the nId space large(64 bits) so that the nId values are distinct.
achow101:
ACK 51f7668d31
Tree-SHA512: 68d4b8b0269a01a9544bedfa7c1348ffde00a288537e4c8bf2b88372ac7d96c4566a44dd6b06285f2fcf31b4f9336761e3bca7253fbc20db5e0d04e887156224
01960c53c7 fuzz: make FuzzedDataProvider usage deterministic (Martin Leitner-Ankerl)
Pull request description:
There exist many usages of `fuzzed_data_provider` where it is evaluated directly in the function call.
Unfortunately, [the order of evaluation of function arguments is unspecified](https://en.cppreference.com/w/cpp/language/eval_order), and a simple example shows that it can differ e.g. between clang++ and g++: https://godbolt.org/z/jooMezWWY
When the evaluation order is not consistent, the same fuzzing/random input will produce different output, which is bad for coverage/reproducibility. This PR fixes all these cases I have found where unspecified evaluation order could be a problem.
Finding these has been manual work; I grepped the sourcecode for these patterns, and looked at each usage individually. So there is a chance I missed some.
* `fuzzed_data_provider`
* `.Consume`
* `>Consume`
* `.rand`
I first discovered this in https://github.com/bitcoin/bitcoin/pull/29013#discussion_r1420236394. Note that there is a possibility that due to this fix the evaluation order is now different in many cases than when the fuzzing corpus has been created. If that is the case, the fuzzing corpus will have worse coverage than before.
Update: In list-initialization the order of evaluation is well defined, so e.g. usages in `initializer_list` or constructors that use `{...}` is ok.
ACKs for top commit:
achow101:
ACK 01960c53c7
vasild:
ACK 01960c53c7
ismaelsadeeq:
ACK 01960c53c7
Tree-SHA512: e56d087f6f4bf79c90b972a5f0c6908d1784b3cfbb8130b6b450d5ca7d116c5a791df506b869a23bce930b2a6977558e1fb5115bb4e061969cc40f568077a1ad
This makes it easier to track which spots refer to an nId
(as opposed to, for example, bucket index etc. which also use int)
Co-authored-by: Pieter Wuille <pieter@wuille.net>
In the process of doing so, refactor `ConsumeNetAddr()` to generate the
addresses from IPv4, IPv6, Tor, I2P and CJDNS networks in the same way -
by preparing some random stream and deserializing from it. Similar code
was already found in `RandAddr()`.
There exist many usages of `fuzzed_data_provider` where it is evaluated directly in the function call.
Unfortunately, the order of evaluation of function arguments is unspecified. This means it can differ
between compilers/version/optimization levels etc. But when the evaluation order changes, the same
fuzzing input will produce different output, which is bad for coverage/reproducibility.
This PR fixes all these cases where by moving multiple calls to `fuzzed_data_provider` out of the
function arguments.
This also cleans up the addrman (de)serialization code paths to only
allow `Disk` serialization. Some unit tests previously forced a
`Network` serialization, which does not make sense, because Bitcoin Core
in production will always `Disk` serialize.
This cleanup idea was suggested by Pieter Wuille and implemented by Anthony
Towns.
Co-authored-by: Pieter Wuille <pieter@wuille.net>
Co-authored-by: Anthony Towns <aj@erisian.com.au>
025fda0a76 fuzz: addrman, avoid `ConsumeDeserializable` when possible (brunoerg)
Pull request description:
Using specific functions like `ConsumeService`, `ConsumeAddress` and `ConsumeNetAddr` may be more effective than using `ConsumeDeserializable`. They always return some value while `ConsumeDeserializable` may return `std::nullopt`.
E.g.: In this part of the code, if `op_net_addr` is `std::nullopt`, we basically generated the addresses (if so) unnecessarily, because we won't be able to use them:
```cpp
std::vector<CAddress> addresses;
LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), 10000) {
const std::optional<CAddress> opt_address = ConsumeDeserializable<CAddress>(fuzzed_data_provider);
if (!opt_address) {
break;
}
addresses.push_back(*opt_address);
}
const std::optional<CNetAddr> opt_net_addr = ConsumeDeserializable<CNetAddr>(fuzzed_data_provider);
if (opt_net_addr) {
addr_man.Add(addresses, *opt_net_addr, std::chrono::seconds{ConsumeTime(fuzzed_data_provider, 0, 100000000)});
}
```
Also, if we are not calling `Add` effectively, it would also be affect other functions that may "depend" on it.
ACKs for top commit:
dergoegge:
Code review ACK 025fda0a76
Tree-SHA512: 02450bec0b084c15ba0cd1cbdfbac067c8fea4ccf27be0c86d54e020f029a6c749a16d8e0558f9d6d35a7ca9db8916f180c872f09474702b5591129e9be0d192
7d92b1430a refactor: use Span for SipHash::Write (Sebastian Falbesoner)
Pull request description:
This simple refactoring PR changes the interface for the `SipHash` arbitrary-data `Write` method to take a `Span<unsigned char>` instead of having to pass data and length. (`Span<std::byte>` seems to be more modern, but vectors of `unsigned char` are still used prety much everywhere where SipHash is called, and I didn't find it very appealing having to clutter the code with `Make(Writable)ByteSpan` helpers).
ACKs for top commit:
sipa:
utACK 7d92b1430a
MarcoFalke:
lgtm ACK 7d92b1430a
achow101:
ACK 7d92b1430a
Tree-SHA512: f17a27013c942aead4b09f5a64e0c3ff8dbc7e83fe63eb9a2e3ace8be9921c9cbba3ec67e3e83fbe3332ca941c42370efd059e702c060f9b508307e9657c66f2
fa6dfaaf45 scripted-diff: Use new FUZZ_TARGET macro everywhere (MarcoFalke)
fa36ad8b09 fuzz: Accept options in FUZZ_TARGET macro (MarcoFalke)
Pull request description:
The `FUZZ_TARGET` macros have many issues:
* The developer will have to pick the right macro to pass the wanted option.
* Adding a new option requires doubling the number of existing macros in the worst case.
Fix all issues by using only a single macro.
This refactor does not change behavior.
ACKs for top commit:
dergoegge:
ACK fa6dfaaf45
Tree-SHA512: 49a34553867a1734ce89e616b2d7c29b784a67cd8990db6573f0c7b18957636ef0c81d3d0d444a04c12cdc98bc4c4aa7a2ec94e6232dc363620a746e28416444
35a2175ad8 fuzz: addrman, add coverage for `network` field in `Select()`, `Size()` and `GetAddr()` (brunoerg)
Pull request description:
This PR adds fuzz coverage for `network` field in `Select()`, `Size()` and `GetAddr()`, there was only call to them without passing a network.
https://marcofalke.github.io/b-c-cov/fuzz.coverage/src/addrman.cpp.gcov.html
ACKs for top commit:
amitiuttarwar:
for the record, ACK 35a2175ad8 - only small changes from the version (previously) proposed in 27213
achow101:
ACK 35a2175ad8
mzumsande:
Code Review ACK 35a2175ad8, haven't tested this yet, but I will let the fuzzer run for a while now.
Tree-SHA512: dddb8322298d6c373c8e68d57538470b11825a9a310a355828c351d5c0b19ff6779d024a800e3ea90126d0c050e86f71fd22cd23d1a306c784cef0f82c45e3ca
`ConsumeDeserializable` may return `std::nullopt`, prefer
to call specific functions such as `ConsumeService`and
`ConsumeNetAddr` which always return a value.
This commit effectively moves the definition of these constants
out of the chainparamsbase to their own file.
Using the ChainType enums provides better type safety compared to
passing around strings.
The commit is part of an ongoing effort to decouple the libbitcoinkernel
library from the ArgsManager and other functionality that should not be
part of the kernel library.
This is an extraction of ArgsManager related functions from util/system
into their own common file.
Config file related functions are moved to common/config.cpp.
The background of this commit is an ongoing effort to decouple the
libbitcoinkernel library from the ArgsManager. The ArgsManager belongs
into the common library, since the kernel library should not depend on
it. See doc/design/libraries.md for more information on this rationale.
The functionality of the old size() is covered by the new Size()
when no arguments are specified, so this does not change behavior.
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Make it possible to override from the command line (without recompiling)
the addrman check ratio in addrman fuzz tests instead of hardcoding it
to 0:
```
FUZZ=addrman ./src/test/fuzz/fuzz --checkaddrman=5
```
Blindly chose a cap of 10000 iterations for every loop, except for
the two in script_ops.cpp and scriptnum_ops.cpp which appeared to
(sometimes) be deserializing individual bytes; capped those to one
million to ensure that sometimes we try working with massive scripts.
There was also one fuzzer-controlled loop in timedata.cpp which was
already capped, so I left that alone.
git grep 'while (fuzz' should now run clean except for timedata.cpp
Use a (reference) parameter instead of a data member of
CAddrManDeterministic. This will allow us to make Fill() a free function
in a later commit.
Also remove CAddrManDeterministic.m_fuzzed_data_provider since it's no
longer used.
Introduce the pimpl pattern for CAddrMan to separate the implementation details
from the externally used object representation. This reduces compile-time
dependencies and conceptually clarifies AddrMan's interface from the
implementation specifics.
Since the unit & fuzz tests currently rely on accessing CAddrMan internals, this
commit introduces addrman_impl.h, which is exclusively imported by addrman.cpp
and test files.
Review hint: git diff --color-moved=dimmed-zebra
--color-moved-ws=ignore-all-space
Just use unique_ptr<CAddrMan>s and reset the pointer if a frest addrman is required.
Also make CAddrMan::Clear() private to ensure that no call sites are missed.