
Re: [tor-dev] [RFC] Proposal: A First Take at PoW Over Introduction Circuits



On 08 May, 21:53, tevador <tevador@xxxxxxxxx> wrote:
> In particular, the following parameters should be set differently from
> Monero:
>
>     RANDOMX_ARGON_SALT = "RandomX-TOR-v1"
>
> The unique RandomX salt means we do not need to use a separate salt as PoW
> input as specified in § 3.2.
>
>     RANDOMX_ARGON_ITERATIONS = 1
>     RANDOMX_CACHE_ACCESSES = 4
>     RANDOMX_DATASET_BASE_SIZE = 1073741824
>     RANDOMX_DATASET_EXTRA_SIZE = 16777216
>
> These 4 changes reduce the RandomX Dataset size to ~1 GiB, which allows
> the number of iterations (RANDOMX_CACHE_ACCESSES) to be reduced from 8 to
> 4. The combined effect of this is that Dataset initialization becomes 4
> times faster, which is needed due to more frequent updates of the seed
> (Monero updates once per ~3 days).
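As a rough cross-check of the ~1 GiB and 4x figures, here is a small C sketch. It assumes the stock RandomX configuration.h values for Monero (RANDOMX_DATASET_BASE_SIZE = 2147483648, RANDOMX_DATASET_EXTRA_SIZE = 33554368, RANDOMX_CACHE_ACCESSES = 8) and models Dataset initialization cost simply as (number of 64-byte Dataset items) x RANDOMX_CACHE_ACCESSES, ignoring the separate Argon2 cache fill. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DATASET_ITEM_SIZE 64u  /* bytes per RandomX Dataset item */

/* Monero (stock RandomX) values vs. the proposed RandomX-TOR-v1 values. */
#define MONERO_DATASET_BASE   2147483648ull
#define MONERO_DATASET_EXTRA  33554368ull
#define MONERO_CACHE_ACCESSES 8u
#define TOR_DATASET_BASE      1073741824ull
#define TOR_DATASET_EXTRA     16777216ull
#define TOR_CACHE_ACCESSES    4u

/* Total Dataset size (base + extra) in MiB. */
double dataset_mib(uint64_t base, uint64_t extra) {
    return (double)(base + extra) / (1024.0 * 1024.0);
}

/* Simplified cost model (assumption): every Dataset item is built from
 * `accesses` cache reads, so cost ~ items * accesses. */
uint64_t init_cost(uint64_t base, uint64_t extra, unsigned accesses) {
    assert(accesses > 0);
    return (base + extra) / DATASET_ITEM_SIZE * accesses;
}

/* How many times faster RandomX-TOR Dataset initialization is under that model. */
double init_speedup(void) {
    double monero = (double)init_cost(MONERO_DATASET_BASE, MONERO_DATASET_EXTRA,
                                      MONERO_CACHE_ACCESSES);
    double tor = (double)init_cost(TOR_DATASET_BASE, TOR_DATASET_EXTRA,
                                   TOR_CACHE_ACCESSES);
    return monero / tor;
}
```

With these inputs, dataset_mib() gives exactly 1040 MiB for RandomX-TOR, the same figure the benchmark reports for full memory mode, and init_speedup() comes out just under 4.0.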
>
>     RANDOMX_PROGRAM_COUNT = 2
>     RANDOMX_SCRATCHPAD_L3 = 1048576
>
> Additionally, reducing the number of programs from 8 to 2 makes the hash
> calculation about 4 times faster, while still providing resistance against
> program filtering strategies (see [REF_RANDOMX_PROGRAMS]). Since there are
> 4 times fewer writes, we also have to reduce the scratchpad size. I suggest
> using a 1 MiB scratchpad as a compromise between scratchpad write density
> and memory hardness. Most x86 CPUs will perform roughly the same with
> either a 512 KiB or a 1024 KiB scratchpad, while the larger size provides
> higher resistance against specialized hardware, at the cost of possible
> time-memory tradeoffs (see [REF_RANDOMX_TMTO] for details).
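The write density tradeoff can be put into numbers with a similar sketch. It assumes the scratchpad write volume per hash is proportional to the total number of program loop iterations, RANDOMX_PROGRAM_COUNT x RANDOMX_PROGRAM_ITERATIONS, with RANDOMX_PROGRAM_ITERATIONS left at its RandomX default of 2048 (it is not overridden above) and Monero's RANDOMX_SCRATCHPAD_L3 at its default of 2097152 (2 MiB). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed unchanged from the RandomX default; not overridden in the list above. */
#define PROGRAM_ITERATIONS 2048u

/* Relative write density: program loop iterations per hash divided by the
 * scratchpad size in MiB. Bytes written per iteration are the same in both
 * configurations, so only this ratio is compared. */
double write_density(unsigned program_count, uint64_t scratchpad_bytes) {
    assert(scratchpad_bytes > 0);
    double iterations = (double)program_count * PROGRAM_ITERATIONS;
    return iterations / ((double)scratchpad_bytes / (1024.0 * 1024.0));
}
```

Monero's write_density(8, 2097152) is 8192, while RandomX-TOR gives 4096 with a 1 MiB scratchpad and 8192 with 512 KiB. So 512 KiB would preserve Monero's write density, whereas 1 MiB halves it in exchange for doubling the memory that specialized hardware has to provide.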
>
> Lastly, we reduce the output of RandomX to just 8 bytes:
>
>    RANDOMX_HASH_SIZE = 8
>
> 64-bit preimage security is more than sufficient for proof-of-work and it
> allows the result to be treated as a little-endian encoded unsigned integer
> for easy effort calculation.
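A minimal sketch of that decoding step in C. The little-endian interpretation is from the text; the effort definition (UINT64_MAX / value, roughly the expected number of hashes needed to find such a small value) and the overflow-based acceptance check are assumptions for illustration, as are the function names. Decoding the benchmark result d947ceff08750300 shown further down this way gives 0x00037508ffce47d9 and an effort of 18956, matching the printed Effort line, which suggests but does not prove that the tor-pow branch computes effort the same way.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the 8-byte RandomX-TOR output as a little-endian uint64_t.
 * Assembling byte by byte works regardless of the host's endianness. */
uint64_t hash_to_u64(const uint8_t hash[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | hash[i];
    return v;
}

/* Hypothetical effort definition: about 2^64 / value.
 * A zero hash is treated as maximal effort. */
uint64_t effort_of(uint64_t value) {
    return value == 0 ? UINT64_MAX : UINT64_MAX / value;
}

/* Hypothetical acceptance rule: a request claiming `effort` is valid
 * if value * effort fits in 64 bits (effort 0 means no PoW required). */
int meets_effort(uint64_t value, uint64_t effort) {
    assert(effort == 0 || UINT64_MAX / effort > 0);
    return effort == 0 || value <= UINT64_MAX / effort;
}
```

The acceptance rule lets a verifier check a claimed effort with a single division and comparison, without needing 128-bit arithmetic.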

I have implemented this in the tor-pow branch of the RandomX repository:

    https://github.com/tevador/RandomX/tree/tor-pow

Namely, I have changed the API to return the hash value as a uint64_t and
made corresponding changes to the benchmark.

Benchmark example:

    ./randomx-benchmark --mine \
                        --avx2 \
                        --jit  \
                        --largePages \
                        --nonces 10000 \
                        --seed 1234 \
                        --init 1 \
                        --threads 1 \
                        --batch
    RandomX-TOR-v1 benchmark
     - Argon2 implementation: AVX2
     - full memory mode (1040 MiB)
     - JIT compiled mode
     - hardware AES mode
     - large pages mode
     - batch mode
    Initializing (1 thread) ...
    Memory initialized in 5.32855 s
    Initializing 1 virtual machine(s) ...
    Running benchmark (10000 nonces) ...
    Performance: 2535.43 hashes per second
    Best result:
      Nonce: 8bc3ded34d2dcdeed9000000f95cd20c
      Result: d947ceff08750300
      Effort: 18956
      Valid: 1

At the end, it prints out the nonce that gives the highest effort value and
validates it.

For the actual implementation in Tor, the RandomX validator should run in
a separate thread that does nothing except validate requests and move valid
ones into the Intro Queue. This way we can reach the maximum performance of
~2000 processed requests per second.
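A rough sketch of that thread layout with POSIX threads. Everything here is hypothetical: the request struct, the queue and the function names are not Tor's or RandomX's, and pow_is_valid() is a stand-in that checks a precomputed hash against the claimed effort with an assumed overflow rule, where a real validator would recompute the RandomX hash from the request's nonce and the current seed. The proposal's Intro Queue is meant to be ordered by effort; a plain FIFO is used here only for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 64

/* Hypothetical introduction request, reduced to what the validator needs. */
typedef struct {
    uint64_t pow_hash;  /* stand-in for the RandomX result of the client's nonce */
    uint64_t effort;    /* effort claimed by the client */
} IntroReq;

/* Bounded FIFO guarded by a mutex; used for the pending queue and the Intro Queue. */
typedef struct {
    IntroReq items[QUEUE_CAP];
    size_t head, len;
    bool closed;
    pthread_mutex_t mu;
    pthread_cond_t nonempty;
} ReqQueue;

void queue_init(ReqQueue *q) {
    q->head = q->len = 0;
    q->closed = false;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Appends a request; returns false (request dropped) if the queue is full. */
bool queue_push(ReqQueue *q, IntroReq r) {
    pthread_mutex_lock(&q->mu);
    bool ok = q->len < QUEUE_CAP;
    if (ok) {
        q->items[(q->head + q->len) % QUEUE_CAP] = r;
        q->len++;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->mu);
    return ok;
}

/* Blocks until a request is available; returns false once closed and empty. */
bool queue_pop(ReqQueue *q, IntroReq *out) {
    pthread_mutex_lock(&q->mu);
    while (q->len == 0 && !q->closed)
        pthread_cond_wait(&q->nonempty, &q->mu);
    bool ok = q->len > 0;
    if (ok) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->len--;
    }
    pthread_mutex_unlock(&q->mu);
    return ok;
}

/* Wakes the consumer so it can drain the remaining items and exit. */
void queue_close(ReqQueue *q) {
    pthread_mutex_lock(&q->mu);
    q->closed = true;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->mu);
}

/* Stand-in check (assumed rule): pow_hash * effort must fit in 64 bits.
 * A real validator would recompute the RandomX hash here. */
static bool pow_is_valid(const IntroReq *r) {
    return r->effort > 0 && r->pow_hash <= UINT64_MAX / r->effort;
}

typedef struct {
    ReqQueue *pending;  /* filled by the network-facing code */
    ReqQueue *intro;    /* the Intro Queue, consumed by the service */
} ValidatorArgs;

/* Body of the dedicated validator thread: verify, forward, nothing else. */
void *validator_main(void *arg) {
    ValidatorArgs *a = arg;
    IntroReq r;
    while (queue_pop(a->pending, &r)) {
        if (pow_is_valid(&r))
            queue_push(a->intro, r);  /* invalid requests are dropped */
    }
    return NULL;
}
```

The main thread would start validator_main() with pthread_create(), push incoming requests into the pending queue, and call queue_close() on shutdown; the validator then drains whatever is left and exits, so the rest of the service never spends CPU time on hashing.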

Finally, here are some disadvantages of RandomX-TOR:

 1) Fast verification requires ~1 GiB of memory. If we decide to use two
    overlapping seed epochs, each service will need to allocate >2 GiB of RAM
    just to verify the PoW. Alternatively, it is possible to use the slow
    mode, which requires only 256 MiB per seed, but runs 4x slower.
 2) The fast mode needs about 5 seconds to initialize every time the seed is
    changed (can be reduced to under 1 second using multiple threads). The
    slow mode needs about 0.1 seconds to initialize.
 3) RandomX includes a JIT compiler for maximum performance. The iOS operating
    system doesn't support JIT compilation, so RandomX runs about 10x slower
    there.
 4) The JIT compiler in RandomX is currently implemented only for x86-64 and
    ARM64 CPU architectures. Other architectures will run very slowly
    (especially 32-bit systems). However, the two supported architectures
    cover the vast majority of devices, so this should not be an issue.
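The memory and initialization figures in points 1 and 2 can be checked with a few lines of C. The per-seed sizes (1040 MiB in fast mode, 256 MiB in slow mode) are taken from the text above; the helper names are hypothetical, the memory figure ignores the temporary 256 MiB cache needed while each Dataset is being built, and the multi-threaded timing assumes perfectly linear scaling, which real hardware will not quite reach.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-seed memory from the text: fast mode keeps the full Dataset,
 * slow mode keeps only the cache. */
#define FAST_MODE_MIB 1040u
#define SLOW_MODE_MIB 256u

/* Verification memory for `epochs` simultaneously active seeds. */
unsigned verify_memory_mib(unsigned epochs, bool fast_mode) {
    assert(epochs > 0);
    return epochs * (fast_mode ? FAST_MODE_MIB : SLOW_MODE_MIB);
}

/* Dataset initialization time, assuming ideal linear scaling across threads. */
double init_seconds(double single_thread_s, unsigned threads) {
    assert(threads > 0);
    return single_thread_s / threads;
}
```

Two overlapping epochs in fast mode come to 2080 MiB, just above 2 GiB (2048 MiB), versus 512 MiB in slow mode. With ideal scaling, the measured 5.33 s single-thread initialization drops below 1 second at 6 threads (about 0.89 s).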
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev