7f318e1dd0 test: Add better coverage for Autofile size() (Fabian Jahr)
b7af960eb8 refactor: Add AutoFile::size (Fabian Jahr)
ec0f75862e refactor: Modernize logging in util/asmap.cpp (Fabian Jahr)
606a251e0a tests: add unit test vectors for asmap interpreter (Pieter Wuille)
Pull request description:
This contains some commits from #28792 that can be reviewed and merged independently. I hope splitting them off helps this part move a bit faster and reduces the frequency of rebases needed for #28792.
The commits in order:
- Adds additional unit test vectors for the asmap interpreter (written by sipa). This helps ensure that the further refactors in #28792 don't change behavior.
- Modernizes the logging in `util/asmap.cpp`; I added this while touching the rest of the file anyway.
- Adds an `AutoFile::size` helper function, with additional test coverage in a separate commit.
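For illustration, here is a rough, standalone sketch of how such a size helper is commonly implemented on top of C stdio (this is not necessarily the exact `AutoFile::size` code; the function name and error handling are placeholders): remember the current position, seek to the end, read the position there, and seek back.

```cpp
#include <cstdio>
#include <optional>

// Hypothetical helper, for illustration only: report the size of an open file
// by seeking to its end and back, without disturbing the caller's position.
std::optional<long> FileSize(std::FILE* file)
{
    const long original_pos{std::ftell(file)};
    if (original_pos < 0) return std::nullopt;
    if (std::fseek(file, 0, SEEK_END) != 0) return std::nullopt;
    const long size{std::ftell(file)};
    std::fseek(file, original_pos, SEEK_SET); // restore the caller's position
    if (size < 0) return std::nullopt;
    return size;
}
```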
ACKs for top commit:
maflcko:
review ACK 7f318e1dd0🏀
hodlinator:
tACK 7f318e1dd0
laanwj:
Code review ACK 7f318e1dd0
Tree-SHA512: 45156b74e4bd9278a7ec24521dfdafe4dab1ba3384243c7d589ef17e16ca374ee2af7178c86b7229e80ca262dbe78c4d456d80a6ee742ec31d2ab5243dac8b57
This is meant to consolidate the usages in order to narrow the scope of the obfuscation optimization.
`Obfuscation::Xor` is mostly a code move.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Mechanical refactor of the low-level "xor" wording to signal the intent rather than the implementation used.
The renames are ordered with the heaviest-hitting substitutions first, and were constructed so that the code still compiles after each replacement.
-BEGIN VERIFY SCRIPT-
sed -i \
-e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
-e 's/\bxor_key\b/obfuscation/g' \
-e 's/\bxor_pat\b/obfuscation/g' \
-e 's/\bm_xor_key\b/m_obfuscation/g' \
-e 's/\bm_xor\b/m_obfuscation/g' \
-e 's/\bobfuscate_key\b/m_obfuscation/g' \
-e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
-e 's/\bSetXor(/SetObfuscation(/g' \
-e 's/\bdata_xor\b/obfuscation/g' \
-e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
-e 's/\bobfuscate key\b/obfuscation key/g' \
$(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
The two tests are doing different things. `xor_roundtrip_random_chunks` does black-box, property-based testing to validate that the core invariant holds: deobfuscating an obfuscated message yields the original message (it works at a higher level and doesn't need to know about the implementation details).
The `xor_bytes_reference` test makes sure that the optimized xor implementation behaves exactly like the simplest possible obfuscation in every scenario: random chunks, random alignment, random data, random key.
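For intuition, here is a standalone sketch of the roundtrip invariant the first test exercises (illustrative only, not the actual test code): applying the same repeating XOR key twice restores the original bytes, regardless of how the data is chunked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR `data` with a repeating `key`, starting at `key_offset` within the key.
void XorWithKey(std::vector<uint8_t>& data, const std::vector<uint8_t>& key, size_t key_offset = 0)
{
    for (size_t i{0}; i < data.size(); ++i) data[i] ^= key[(key_offset + i) % key.size()];
}

int main()
{
    const std::vector<uint8_t> key{0xde, 0xad, 0xbe, 0xef, 0x01, 0x02, 0x03, 0x04}; // 8-byte key, as in production
    std::vector<uint8_t> msg{'r', 'o', 'u', 'n', 'd', 't', 'r', 'i', 'p'};
    const auto original{msg};
    XorWithKey(msg, key); // obfuscate
    XorWithKey(msg, key); // deobfuscate
    assert(msg == original); // XOR is an involution, so the message is restored
}
```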
Since we're touching the file, other related small refactors were also applied:
* `nullpt` typo fixed;
* manual byte-by-byte xor key creations were replaced with `_hex` factories;
* since we only use 64-bit keys in production, smaller keys were changed to reflect real-world usage.
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Since 31-byte xor keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
There is no way to report a close error from the `AutoFile` destructor.
Such an error could be serious if the file has been written to, because
it may mean the file is now corrupted (same as when a write fails).
So, change all users of `AutoFile` that write data so that they
explicitly close the file and handle a possible error.
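As a plain-libc analogue of the caller-side pattern (this is not the `AutoFile` code itself; the function below is a standalone illustration): write, then close explicitly and check the result, rather than relying on a destructor that cannot report a failed close.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Standalone illustration: an explicit, checked close can surface errors
// (e.g. a deferred write failure flushed at close time) that a destructor
// would have to swallow silently.
void WriteAndClose(const std::string& path, const std::string& data)
{
    std::FILE* f{std::fopen(path.c_str(), "wb")};
    if (!f) throw std::runtime_error("open failed");
    if (std::fwrite(data.data(), 1, data.size(), f) != data.size()) {
        std::fclose(f);
        throw std::runtime_error("write failed");
    }
    if (std::fclose(f) != 0) throw std::runtime_error("close failed");
}
```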
The obfuscation (XOR) operations are currently done byte-by-byte during serialization. Buffering the reads will enable batching the obfuscation operations later.
Different operating systems handle file caching differently, so reading larger batches (and processing them from memory) is measurably faster, likely because of fewer native fread calls and reduced lock contention.
Note that `ReadRawBlock` doesn't need buffering since it already reads the whole block directly.
Unlike `ReadBlockUndo`, the new `ReadBlock` implementation delegates to `ReadRawBlock`, which uses more memory than a buffered alternative but results in slightly simpler code and a small performance increase (~0.4%). This approach also clearly documents that `ReadRawBlock` is a logical subset of `ReadBlock` functionality.
The current implementation, which iterates over a fixed-size buffer, provides a more general alternative to Cory Fields' solution of reading the entire block into memory in advance.
Buffer sizes were selected based on benchmarking to ensure the buffered reader produces performance similar to reading the whole block into memory. Smaller buffers were slower, while larger ones showed diminishing returns.
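A conceptual, standalone sketch of the batching idea (not the Bitcoin Core implementation; the buffer size below is a placeholder, whereas the real values were chosen by benchmarking): read the file in fixed-size chunks and deobfuscate each chunk in memory, instead of issuing a separate read and XOR per byte.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Read `total` obfuscated bytes from `file`, deobfuscating one chunk at a time.
std::vector<uint8_t> ReadObfuscated(std::FILE* file, size_t total, const std::vector<uint8_t>& key)
{
    constexpr size_t BUF_SIZE{4096}; // placeholder chunk size
    std::vector<uint8_t> out;
    out.reserve(total);
    std::vector<uint8_t> buf(BUF_SIZE);
    size_t key_pos{0};
    while (total > 0) {
        const size_t want{std::min(total, BUF_SIZE)};
        if (std::fread(buf.data(), 1, want, file) != want) break;                   // one fread per chunk, not per byte
        for (size_t i{0}; i < want; ++i) buf[i] ^= key[(key_pos + i) % key.size()]; // batched XOR in memory
        out.insert(out.end(), buf.begin(), buf.begin() + want);
        key_pos = (key_pos + want) % key.size();
        total -= want;
    }
    return out;
}
```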
------
> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=10000
Before:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2,271,441.67 | 440.25 | 0.1% | 11.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1,738,971.29 | 575.05 | 0.2% | 10.97 | `ReadBlockBench`
------
> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=20000
Before:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 6,895,987.11 | 145.01 | 0.0% | 71,055,269.86 | 23,977,374.37 | 2.963 | 5,074,828.78 | 0.4% | 22.00 | `ReadBlockBench`
After:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 5,771,882.71 | 173.25 | 0.0% | 65,741,889.82 | 20,453,232.33 | 3.214 | 3,971,321.75 | 0.3% | 22.01 | `ReadBlockBench`
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
Historically, the headers have been bumped some time after a file has
been touched. Do it now to avoid having to touch them again in the
future for that reason.
-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended 's;( 20[0-2][0-9])(-20[0-2][0-9])? The Bitcoin Core developers;\1-present The Bitcoin Core developers;g' $( git show --pretty="" --name-only HEAD~1 )
-END VERIFY SCRIPT-
The existing code provides two randomness mechanisms for test purposes:
- g_insecure_rand_ctx (with its wrappers InsecureRand*), which during tests is
initialized using either zeros (SeedRand::ZEROS), or using environment-provided
randomness (SeedRand::SEED).
- g_mock_deterministic_tests, which controls some (but not all) of the normal
randomness output if set, but then makes it extremely predictable (identical
output repeatedly).
Replace this with a single mechanism, which retains the SeedRand modes to control
all randomness. There is a new internal deterministic PRNG inside the random
module, which is used in GetRandBytes() when in test mode, and which is also used
to initialize g_insecure_rand_ctx. This means that during tests, all random numbers
are made deterministic. There is one exception, GetStrongRandBytes(), which even
in test mode still uses the normal PRNG state.
This probably opens the door to removing a lot of the ad-hoc "deterministic" mode
functions littered through the codebase (by simply running relevant tests in
SeedRand::ZEROS mode), but this isn't done yet.
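A conceptual, standalone sketch of the new arrangement (the names below are illustrative, not the actual random.cpp API): in test mode, ordinary randomness is drawn from a single deterministic PRNG, so seeding it (for example with zeros) makes a whole test run reproducible.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Illustrative only: one internal deterministic engine backs "normal"
// randomness while in test mode; outside of tests a non-deterministic
// source is used instead.
static bool g_test_mode{false};
static std::mt19937_64 g_deterministic_rng{0}; // seeded deterministically for tests

std::vector<uint8_t> RandBytes(size_t n)
{
    std::vector<uint8_t> out(n);
    if (g_test_mode) {
        for (auto& b : out) b = static_cast<uint8_t>(g_deterministic_rng());
    } else {
        std::random_device rd;
        for (auto& b : out) b = static_cast<uint8_t>(rd());
    }
    return out;
}
```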
On systems where `int8_t` is defined as `char`, the
`{S,Uns}erialize(Stream&, signed char)` functions become undefined.
This change resolves the issue by testing
`{S,Uns}erialize(Stream&, int8_t)` instead.
No behavior change on systems where `int8_t` is defined as
`signed char`, which is the case for most other systems.
fa79a881ce refactor: P2P transport without serialize version and type (MarcoFalke)
fa9b5f4fe3 refactor: NetMsg::Make() without nVersion (MarcoFalke)
66669da4a5 Remove unused Make() overload in netmessagemaker.h (MarcoFalke)
fa0ed07941 refactor: VectorWriter without nVersion (MarcoFalke)
Pull request description:
Now that the serialize framework ignores the serialize version and serialize type, everything related to them can be removed from the code.
This is the first step, removing dead code from the P2P stack. A different pull will remove it from the wallet and other parts.
ACKs for top commit:
ajtowns:
reACK fa79a881ce
Tree-SHA512: 785b413580d980f51f0d4f70ea5e0a99ce14cd12cb065393de2f5254891be94a14f4266110c8b87bd2dbc37467676655bce13bdb295ab139749fcd8b61bd5110
fac29a0ab1 Remove SER_GETHASH, hard-code client version in CKeyPool serialize (MarcoFalke)
fa72f09d6f Remove CHashWriter type (MarcoFalke)
fa4a9c0f43 Remove unused GetType() from OverrideStream, CVectorWriter, SpanReader (MarcoFalke)
Pull request description:
Removes a bunch of redundant, dead or duplicate code.
Picks up and finishes the idea from https://github.com/bitcoin/bitcoin/pull/28428 by theuni
ACKs for top commit:
ajtowns:
ACK fac29a0ab1
kevkevinpal:
added one nit but otherwise ACK fac29a0ab1
Tree-SHA512: cc805e2f38e73869a6691fdb5da09fa48524506b87fc93f05d32c336ad3033425a2d7608e317decd3141fde3f084403b8de280396c0c39132336fe0f7510af9e
This was only explicitly used in the tests, where it can be replaced by
wrapping the original raw file pointer into a CAutoFile on creation and
then calling CAutoFile::fclose().
Also, it was used in LoadExternalBlockFile(), where it can also be
replaced by the (implicit call to the) CAutoFile destructor after
wrapping the original raw file pointer in a CAutoFile.
Since the previous commit touched all constructors anyway, the class name
can be adjusted to comply with the style guide.
-BEGIN VERIFY SCRIPT-
sed -i 's/CBufferedFile/BufferedFile/g' $( git grep -l CBufferedFile )
-END VERIFY SCRIPT-
Avoid use of the expensive mod operator (%) when calculating the
buffer offset. No functional difference.
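A generic sketch of the technique (illustrative, not the exact code in this commit): when the offset advances by at most the buffer size per step, a compare-and-subtract keeps it in range without a modulo on every access.

```cpp
#include <cstddef>

// Advance a ring-buffer offset. Preconditions: offset < buf_size and
// step <= buf_size, so a single subtraction is enough to wrap around.
size_t AdvanceOffset(size_t offset, size_t step, size_t buf_size)
{
    offset += step;
    if (offset >= buf_size) offset -= buf_size; // cheaper than offset % buf_size
    return offset;
}
```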
Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
The fs.* files are already part of the libbitcoin_util library. With the
introduction of fs_helpers.*, it makes sense to move fs.* into the
util/ directory as well.
SkipTo() reads data from the file into the CBufferedFile object
(memory), but, unlike this object's read() method, SkipTo() doesn't
transfer data into a caller's memory buffer. This is useful because
after skipping forward in the stream in this way, the user can, if
needed, rewind the stream (SetPos()) and access the object's memory
buffer including ranges that were skipped over (without needing to
read from the disk file).
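A simplified usage sketch (illustrative fragment only; `payload_size` and `obj` are placeholders): skip past a range cheaply, then rewind and deserialize it from the in-memory buffer only if it turns out to be needed.

```cpp
// Assuming a CBufferedFile `blkdat`, as in LoadExternalBlockFile():
const uint64_t start{blkdat.GetPos()};
blkdat.SkipTo(start + payload_size); // pull the bytes into the rewind buffer, no copy to the caller
// ... later, if the skipped range is actually needed:
blkdat.SetPos(start);                // rewind within the buffered window
blkdat >> obj;                       // deserialize from memory, without another disk read
```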
Warning: Replacing fs::system_complete calls with fs::absolute calls
in this commit may cause minor changes in behaviour because fs::absolute
no longer strips trailing slashes; however, these changes are believed to be safe.
Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
This removes the ability to set an offset in the SpanReader constructor,
as the current code is broken. All call sites use pos=0, so it is actually
unused. If future call sites need it, SpanReader{a, b, c, d} is equivalent
to SpanReader{a, b, c.subspan(d)}.
It also removes the ability to deserialize directly in the SpanReader
constructor. This too is unused, and can be more idiomatically
simulated using (SpanReader{a, b, c} >> x >> y >> z) instead of
SpanReader{a, b, c, x, y, z}.
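To make the two equivalences above concrete, a simplified before/after fragment (illustrative only; `a`, `b`, `c`, `d`, `x`, `y`, `z` are the placeholders from the text):

```cpp
// Before (removed): an explicit offset d and in-constructor deserialization
//   SpanReader{a, b, c, d};
//   SpanReader{a, b, c, x, y, z};

// After: slice the span and use stream extraction instead
SpanReader reader{a, b, c.subspan(d)};
reader >> x >> y >> z;
```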