Commit Graph

83 Commits

Author SHA1 Message Date
merge-script
509dc91db1 Merge bitcoin/bitcoin#33026: test, refactor: Embedded ASMap [1/3]: Selected minor preparatory work
7f318e1dd0 test: Add better coverage for Autofile size() (Fabian Jahr)
b7af960eb8 refactor: Add AutoFile::size (Fabian Jahr)
ec0f75862e refactor: Modernize logging in util/asmap.cpp (Fabian Jahr)
606a251e0a tests: add unit test vectors for asmap interpreter (Pieter Wuille)

Pull request description:

  This contains some commits from #28792 that can be easily reviewed and merged independently. I hope splitting this change off can make this part move a bit faster and reduce frequency of needed rebases for #28792.

  The commits in order:
  - Add additional unit test vectors to the asmap interpreter (written by sipa). This helps to ensure that the further refactors in #28792 don't change behavior.
  - Modernizes the logging in `util/asmap.cpp`, I added this while touching the rest of the file all over anyway.
  - Adds an `AutoFile::size` helper function with some additional test coverage in a separate commit

ACKs for top commit:
  maflcko:
    review ACK 7f318e1dd0 🏀
  hodlinator:
    tACK 7f318e1dd0
  laanwj:
    Code review ACK 7f318e1dd0

Tree-SHA512: 45156b74e4bd9278a7ec24521dfdafe4dab1ba3384243c7d589ef17e16ca374ee2af7178c86b7229e80ca262dbe78c4d456d80a6ee742ec31d2ab5243dac8b57
2025-11-19 09:28:44 +00:00
Fabian Jahr
7f318e1dd0 test: Add better coverage for Autofile size()
The new test explicitly checks that the function does not change the current position.
2025-11-14 16:37:06 +02:00
Fabian Jahr
b7af960eb8 refactor: Add AutoFile::size 2025-11-14 01:17:38 +02:00
Lőrinc
2dea045425 test: make obfuscation_serialize more thorough
See: https://github.com/bitcoin/bitcoin/pull/31144#discussion_r2216849672

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-07-22 10:26:15 -07:00
Lőrinc
a17d8202c3 test: merge xor_roundtrip_random_chunks and xor_bytes_reference
Instead of a separate roundtrip test and a simplified xor reference test, we can merge the two and provide the same coverage

See: https://github.com/bitcoin/bitcoin/pull/31144#discussion_r2211205949

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-07-22 10:24:13 -07:00
Lőrinc
478d40afc6 refactor: encapsulate vector/array keys into Obfuscation 2025-07-16 14:33:07 -07:00
Lőrinc
377aab8e5a refactor: move util::Xor to Obfuscation().Xor
This is meant to focus the usages to narrow the scope of the obfuscation optimization.

`Obfuscation::Xor` is mostly a move.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-07-16 14:33:07 -07:00
Lőrinc
0b8bec8aa6 scripted-diff: unify xor-vs-obfuscation nomenclature
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.

-BEGIN VERIFY SCRIPT-
sed -i \
  -e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
  -e 's/\bxor_key\b/obfuscation/g' \
  -e 's/\bxor_pat\b/obfuscation/g' \
  -e 's/\bm_xor_key\b/m_obfuscation/g' \
  -e 's/\bm_xor\b/m_obfuscation/g' \
  -e 's/\bobfuscate_key\b/m_obfuscation/g' \
  -e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
  -e 's/\bSetXor(/SetObfuscation(/g' \
  -e 's/\bdata_xor\b/obfuscation/g' \
  -e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
  -e 's/\bobfuscate key\b/obfuscation key/g' \
  $(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
2025-07-16 14:32:01 -07:00
Lőrinc
618a30e326 test: compare util::Xor with randomized inputs against simple impl
The two tests are doing different things - `xor_roundtrip_random_chunks` does black-box style property-based testing to validate that certain invariants hold - that deobfuscating an obfuscation results in the original message (higher level, it doesn't have to know about the implementation details).

The `xor_bytes_reference` test makes sure the optimized xor implementation behaves in every imaginable scenario exactly as the simplest possible obfuscation - with random chunks, random alignment, random data, random key.

Since we're touching the file, other related small refactors were also applied:
* `nullpt` typo fixed;
* manual byte-by-byte xor key creations were replaced with `_hex` factories;
* since we're only using 64 bit keys in production, smaller keys were changed to reflect real-world usage;

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
2025-07-16 14:28:05 -07:00
Lőrinc
54ab0bd64c refactor: commit to 8 byte obfuscation keys
Since 31 byte xor-keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-07-16 13:19:18 -07:00
Vasil Dimov
8bb34f07df Explicitly close all AutoFiles that have been written
There is no way to report a close error from `AutoFile` destructor.
Such an error could be serious if the file has been written to because
it may mean the file is now corrupted (same as if write fails).

So, change all users of `AutoFile` that use it to write data to
explicitly close the file and handle a possible error.
2025-06-16 15:33:15 +02:00
Lőrinc
8d801e3efb optimization: bulk serialization writes in WriteBlockUndo and WriteBlock
Similarly to the serialization reads optimization, buffered writes will enable batched XOR calculations.
This is especially beneficial since the current implementation requires copying the write input's `std::span` to perform obfuscation.
Batching allows us to apply XOR operations on the internal buffer instead, reducing unnecessary data copying and improving performance.

------

> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=10000

Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        5,149,564.31 |              194.19 |    0.8% |     10.95 | `WriteBlockBench`

After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,990,564.63 |              334.39 |    1.5% |     11.27 | `WriteBlockBench`

------

> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=20000

Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        5,152,973.58 |              194.06 |    2.2% |   19,350,886.41 |    8,784,539.75 |  2.203 |   3,079,335.21 |    0.4% |     23.18 | `WriteBlockBench`

After:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        4,145,681.13 |              241.21 |    4.0% |   15,337,596.85 |    5,732,186.47 |  2.676 |   2,239,662.64 |    0.1% |     23.94 | `WriteBlockBench`

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
2025-04-14 12:04:06 +02:00
Lőrinc
520965e293 optimization: bulk serialization reads in UndoRead, ReadBlock
The obfuscation (XOR) operations are currently done byte-by-byte during serialization. Buffering the reads will enable batching the obfuscation operations later.

Different operating systems handle file caching differently, so reading larger batches (and processing them from memory) is measurably faster, likely because of fewer native fread calls and reduced lock contention.

Note that `ReadRawBlock` doesn't need buffering since it already reads the whole block directly.
Unlike `ReadBlockUndo`, the new `ReadBlock` implementation delegates to `ReadRawBlock`, which uses more memory than a buffered alternative but results in slightly simpler code and a small performance increase (~0.4%). This approach also clearly documents that `ReadRawBlock` is a logical subset of `ReadBlock` functionality.

The current implementation, which iterates over a fixed-size buffer, provides a more general alternative to Cory Fields' solution of reading the entire block size in advance.

Buffer sizes were selected based on benchmarking to ensure the buffered reader produces performance similar to reading the whole block into memory. Smaller buffers were slower, while larger ones showed diminishing returns.

------

> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=10000

Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,271,441.67 |              440.25 |    0.1% |     11.00 | `ReadBlockBench`

After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        1,738,971.29 |              575.05 |    0.2% |     10.97 | `ReadBlockBench`

------

> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=20000

Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        6,895,987.11 |              145.01 |    0.0% |   71,055,269.86 |   23,977,374.37 |  2.963 |   5,074,828.78 |    0.4% |     22.00 | `ReadBlockBench`

After:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        5,771,882.71 |              173.25 |    0.0% |   65,741,889.82 |   20,453,232.33 |  3.214 |   3,971,321.75 |    0.3% |     22.01 | `ReadBlockBench`

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
2025-04-14 12:04:06 +02:00
MarcoFalke
fa942332b4 scripted-diff: Bump copyright headers after std::span changes
Historically, the headers have been bumped some time after a file has
been touched. Do it now to avoid having to touch them again in the
future for that reason.

-BEGIN VERIFY SCRIPT-
 sed -i --regexp-extended 's;( 20[0-2][0-9])(-20[0-2][0-9])? The Bitcoin Core developers;\1-present The Bitcoin Core developers;g' $( git show --pretty="" --name-only HEAD~1 )
-END VERIFY SCRIPT-
2025-03-12 19:46:54 +01:00
MarcoFalke
fade0b5e5e scripted-diff: Use std::span over Span
-BEGIN VERIFY SCRIPT-

 ren() { sed -i "s!\<$1\>!$2!g" $( git grep -l "$1" -- "./src" ":(exclude)src/span.h" ":(exclude)src/leveldb/db/log_test.cc" ) ; }

 ren Span            std::span
 ren AsBytes         std::as_bytes
 ren AsWritableBytes std::as_writable_bytes

 sed -i 's!SpanPopBack(Span!SpanPopBack(std::span!g' ./src/span.h

-END VERIFY SCRIPT-
2025-03-12 19:45:37 +01:00
Pieter Wuille
e624a9bef1 streams: cache file position within AutoFile 2024-09-13 07:35:41 -04:00
MarcoFalke
fa0fe08eca scripted-diff: [test] Use g_rng/m_rng directly
-BEGIN VERIFY SCRIPT-

 # Use m_rng in unit test files
 ren() { sed -i "s:\<$1\>:$2:g" $( git grep -l "$1" src/test/*.cpp src/wallet/test/*.cpp src/test/util/setup_common.cpp ) ; }
 ren InsecureRand32                m_rng.rand32
 ren InsecureRand256               m_rng.rand256
 ren InsecureRandBits              m_rng.randbits
 ren InsecureRandRange             m_rng.randrange
 ren InsecureRandBool              m_rng.randbool
 ren g_insecure_rand_ctx           m_rng
 ren g_insecure_rand_ctx_temp_path g_rng_temp_path

-END VERIFY SCRIPT-
2024-08-26 11:19:52 +02:00
MarcoFalke
fa895c7283 mingw: Document mode wbx workaround 2024-07-26 17:31:15 +02:00
Pieter Wuille
810cdf6b4e tests: overhaul deterministic test randomness
The existing code provides two randomness mechanisms for test purposes:
- g_insecure_rand_ctx (with its wrappers InsecureRand*), which during tests is
  initialized using either zeros (SeedRand::ZEROS), or using environment-provided
  randomness (SeedRand::SEED).
- g_mock_deterministic_tests, which controls some (but not all) of the normal
  randomness output if set, but then makes it extremely predictable (identical
  output repeatedly).

Replace this with a single mechanism, which retains the SeedRand modes to control
all randomness. There is a new internal deterministic PRNG inside the random
module, which is used in GetRandBytes() when in test mode, and which is also used
to initialize g_insecure_rand_ctx. This means that during tests, all random numbers
are made deterministic. There is one exception, GetStrongRandBytes(), which even
in test mode still uses the normal PRNG state.

This probably opens the door to removing a lot of the ad-hoc "deterministic" mode
functions littered through the codebase (by simply running relevant tests in
SeedRand::ZEROS mode), but this isn't done yet.
2024-07-01 10:26:46 -04:00
Hennadii Stepanov
976e5d8f7b test: Fix test/streams_tests.cpp compilation on SunOS / illumos
On systems where `int8_t` is defined as `char`, the
`{S,Uns}erialize(Stream&, signed char)` functions become undefined.

This change resolves the issue by testing
`{S,Uns}erialize(Stream&, int8_t)` instead.

No behavior change on systems where `int8_t` is defined as
`signed char`, which is the case for most other systems.
2024-04-18 10:53:32 +01:00
Hennadii Stepanov
d2fe90571e test: Drop x modifier in fsbridge::fopen call for mingw builds
The MinGW-w64 toolchain links executables to the old msvcrt C Runtime
Library that does not support the `x` modifier for the _wfopen()
function.
2024-02-26 14:47:31 +00:00
MarcoFalke
fac39b56b7 refactor: SpanReader without nVersion
The field is unused, so remove it.

This is also required for future commits.
2023-11-28 12:42:07 +01:00
fanquake
c252a0fc0f Merge bitcoin/bitcoin#28892: refactor: P2P transport without serialize version and type
fa79a881ce refactor: P2P transport without serialize version and type (MarcoFalke)
fa9b5f4fe3 refactor: NetMsg::Make() without nVersion (MarcoFalke)
66669da4a5 Remove unused Make() overload in netmessagemaker.h (MarcoFalke)
fa0ed07941 refactor: VectorWriter without nVersion (MarcoFalke)

Pull request description:

  Now that the serialize framework ignores the serialize version and serialize type, everything related to it can be removed from the code.

  This is the first step, removing dead code from the P2P stack. A different pull will remove it from the wallet and other parts.

ACKs for top commit:
  ajtowns:
    reACK fa79a881ce

Tree-SHA512: 785b413580d980f51f0d4f70ea5e0a99ce14cd12cb065393de2f5254891be94a14f4266110c8b87bd2dbc37467676655bce13bdb295ab139749fcd8b61bd5110
2023-11-28 11:24:09 +00:00
Anthony Towns
cde9a4b137 refactor: switch from CAutoFile to AutoFile 2023-11-18 03:01:41 +10:00
Anthony Towns
e63f643079 streams: Base BufferedFile on AutoFile instead of CAutoFile 2023-11-18 00:15:22 +10:00
MarcoFalke
fa0ed07941 refactor: VectorWriter without nVersion
The field is unused, so remove it.

This is also required for future commits.
2023-11-17 14:38:26 +01:00
fanquake
48b8910d12 Merge bitcoin/bitcoin#28508: refactor: Remove SER_GETHASH, hard-code client version in CKeyPool serialize
fac29a0ab1 Remove SER_GETHASH, hard-code client version in CKeyPool serialize (MarcoFalke)
fa72f09d6f Remove CHashWriter type (MarcoFalke)
fa4a9c0f43 Remove unused GetType() from OverrideStream, CVectorWriter, SpanReader (MarcoFalke)

Pull request description:

  Removes a bunch of redundant, dead or duplicate code.

  Uses the idea from and finishes the idea https://github.com/bitcoin/bitcoin/pull/28428 by theuni

ACKs for top commit:
  ajtowns:
    ACK fac29a0ab1
  kevkevinpal:
    added one nit but otherwise ACK [fac29a0](fac29a0ab1)

Tree-SHA512: cc805e2f38e73869a6691fdb5da09fa48524506b87fc93f05d32c336ad3033425a2d7608e317decd3141fde3f084403b8de280396c0c39132336fe0f7510af9e
2023-10-02 12:33:54 +02:00
MarcoFalke
fa4a9c0f43 Remove unused GetType() from OverrideStream, CVectorWriter, SpanReader
GetType() is never called, so it is completely unused and can be
removed.
2023-09-19 14:19:57 +00:00
MarcoFalke
9999b89cd3 Make BufferedFile to be a CAutoFile wrapper
This refactor allows to forward some calls to the underlying CAutoFile,
instead of re-implementing the logic in the buffered file.
2023-09-15 14:34:17 +02:00
MarcoFalke
fa389d902f refactor: Drop unused fclose() from BufferedFile
This was only explicitly used in the tests, where it can be replaced by
wrapping the original raw file pointer into a CAutoFile on creation and
then calling CAutoFile::fclose().

Also, it was used in LoadExternalBlockFile(), where it can also be
replaced by the (implicit call to the) CAutoFile destructor after
wrapping the original raw file pointer in a CAutoFile.
2023-09-15 14:33:51 +02:00
MarcoFalke
fa19c914f7 scripted-diff: Rename CBufferedFile to BufferedFile
While touching all constructors in the previous commit, the class name
can be adjusted to comply with the style guide.

-BEGIN VERIFY SCRIPT-
 sed -i 's/CBufferedFile/BufferedFile/g' $( git grep -l CBufferedFile )
-END VERIFY SCRIPT-
2023-09-12 12:55:29 +02:00
MarcoFalke
fa2f2413b8 Remove unused GetType() from CBufferedFile and CAutoFile
GetType() is only called in tests, so it is unused and can be removed.
2023-09-12 12:35:13 +02:00
MarcoFalke
fa626af3ed Remove unused legacy CHashVerifier 2023-09-05 10:13:50 +02:00
MarcoFalke
fa633aa690 streams: Teach AutoFile how to XOR 2023-07-19 18:12:42 +02:00
Larry Ruane
72efc26439 util: improve streams.h:FindByte() performance
Avoid use of the expensive mod operator (%) when calculating the
buffer offset. No functional difference.

Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
2023-05-05 06:03:17 -06:00
TheCharlatan
00e9b97f37 refactor: Move fs.* to util/fs.*
The fs.* files are already part of the libbitcoin_util library. With the
introduction of the fs_helpers.* it makes sense to move fs.* into the
util/ directory as well.
2023-03-23 12:55:18 +01:00
Jon Atack
81f5ade2a3 Move random test util code from setup_common to random
as many of the unit tests don't use this code
2023-02-06 12:26:04 -08:00
MarcoFalke
fa29e73cda Use DataStream where possible 2023-01-26 10:44:05 +01:00
Martin Zumsande
da6c7aeca3 hash: add HashedSourceWriter
This class is the counterpart to CHashVerifier, in that it
writes data to an underlying source stream,
while keeping a hash of the written data.
2023-01-17 17:19:56 -05:00
Pasta
f2fc03ec85 refactor: use braced init for integer constants instead of c style casts 2023-01-03 19:31:29 -06:00
Hennadii Stepanov
306ccd4927 scripted-diff: Bump copyright headers
-BEGIN VERIFY SCRIPT-
./contrib/devtools/copyright_header.py update ./
-END VERIFY SCRIPT-

Commits of previous years:
- 2021: f47dda2c58
- 2020: fa0074e2d8
- 2019: aaaaad6ac9
2022-12-24 23:49:50 +00:00
Larry Ruane
c72de9990a util: add CBufferedFile::SkipTo() to move ahead in the stream
SkipTo() reads data from the file into the CBufferedFile object
(memory), but, unlike this object's read() method, SkipTo() doesn't
transfer data into a caller's memory buffer. This is useful because
after skipping forward in the stream in this way, the user can, if
needed, rewind the stream (SetPos()) and access the object's memory
buffer including ranges that were skipped over (without needing to
read from the disk file).
2022-10-24 13:02:37 -06:00
MarcoFalke
faee5f8dc2 test: Create fresh CDataStream each time
Can be reviewed with --ignore-all-space
2022-02-03 20:16:41 +01:00
MarcoFalke
fa71114926 test: Inline expected_xor 2022-02-03 20:16:39 +01:00
Kiminuo
41d7166c8a refactor: replace boost::filesystem with std::filesystem
Warning: Replacing fs::system_complete calls with fs::absolute calls
in this commit may cause minor changes in behaviour because fs::absolute
no longer strips trailing slashes; however these changes are believed to
be safe.

Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
2022-02-03 18:35:52 +08:00
MarcoFalke
faa630aa15 test: Fix sanitizer suppresions in streams_tests 2022-01-28 10:23:42 +01:00
MarcoFalke
fa24493d63 Use spans of std::byte in serialize
This switches .read() and .write() to take spans of bytes.
2022-01-02 11:40:31 +01:00
Hennadii Stepanov
f47dda2c58 scripted-diff: Bump copyright headers
-BEGIN VERIFY SCRIPT-
./contrib/devtools/copyright_header.py update ./
-END VERIFY SCRIPT-

Commits of previous years:
* 2020: fa0074e2d8
* 2019: aaaaad6ac9
2021-12-30 19:36:57 +02:00
Pieter Wuille
31ba1af74a Remove unused (and broken) functionality in SpanReader
This removes the ability to set an offset in the SpanReader constructor,
as the current code is broken. All call sites use pos=0, so it is actually
unused. If future call sites need it, SpanReader{a, b, c, d} is equivalent
to SpanReader{a, b, c.subspan(d)}.

It also removes the ability to deserialize from SpanReader directly from
the constructor. This too is unused, and can be more idiomatically
simulated using (SpanReader{a, b, c} >> x >> y >> z) instead of
SpanReader{a, b, c, x, y, z}.
2021-12-06 16:18:14 -05:00
Pieter Wuille
2c35a93b3c Generalize/simplify VectorReader into SpanReader 2021-12-02 14:47:17 -05:00