An Introduction to RandomX and its Benchmarking Tool

This thread is about RandomX.

UPDATED 11/July/19: testing environment and Linux USB install benchmarking.
See here: An Introduction to RandomX and its Benchmarking Tool

Initially I am using the benchmarking tool on Windows, but I will go deep into its internals. Let’s start out slow.

Please see this GitHub issue regarding the bug I encountered. I closed it as “cannot reproduce”, but I will leave this warning up. Running more than one instance on a consumer-grade (non-NUMA) CPU is pointless and gives no benefit. Installing all the Windows patches does, though.

If you have accidentally run two instances and got the blue screen memory management issue, these steps completely resolved it for me.

How I managed to recover to a normal state

I had a number of dotnet files corrupted, specifically MSCORMMC.DLL, which makes sense.
This might have happened because I was missing two recent dotnet updates, or it might just be because something else was trying to use the MMC at the same time. I don’t know.

In order to get these updates I needed to install dotnet 4.8. This also appears to give a hashrate increase. I don’t know why.

An additional bonus of having all this up to date is that you get a significant hashrate increase. Version numbers to come. Windows 1903 (the spring 2019 update) gives me the best performance.

The details of running two instances were quite surprising. Now they are irrelevant. Don’t bother doing it. Just update to the latest versions of everything via Windows Update and run one optimised instance. I have left the stuff below to show how bad hashrate can be caused by missing Windows updates, and how running two instances with full updates is the same as running one instance.

For those who don’t or cannot mess with their Windows patch management, I suggest following the Ubuntu guide linked above.

Two concurrent benchmarking instances: not worth it. Left in anyway.

Notice the combined hashrate is the same as just one instance. It ran over 100000 nonces. I fell asleep whilst it was running and didn’t record the CPU usage. I’m guessing 100%; if it was, then more cooling might give more hashes. Or the CPU might have just allocated half of the resources to each instance, so I was only using the normal amount of CPU and gained nothing.

I will rerun this on Ryzen, where I have better cooling and a lot more control over the config.

I also did not fire off both instances at the same time. I waited for the first instance to finish initialising (11 seconds; not 100% sure why it was longer than normal, maybe something to do with unrelated background tasks).

FWIW, I’m pretty sure that the actual bug was a kernel exploit. When I go through my patching I will try to find the CVE for the bug. It may be an Intel-only bug, or a Windows bug, or something entirely different. It seems like a Windows dotnet bug to me at this point though.

On with the benchmarks.

benchmark --mine --init 4 --threads 2 --nonces 100000 --largePages --jit
RandomX benchmark
 - full memory mode (2080 MiB)
 - JIT compiled mode
 - hardware AES mode
 - large pages mode
Initializing (4 threads) ...
Memory initialized in 12.5623 s
Initializing 2 virtual machine(s) ...
Running benchmark (100000 nonces) ...
Calculated result: 9b22794882187000d62c6d2b228fab5e585767aaaa5eb74905b0c7c00fcbdad8
Performance: 500.141 hashes per second

benchmark --mine --init 4 --threads 2 --nonces 100000 --largePages --jit
RandomX benchmark
 - full memory mode (2080 MiB)
 - JIT compiled mode
 - hardware AES mode
 - large pages mode
Initializing (4 threads) ...
Memory initialized in 11.1539 s
Initializing 2 virtual machine(s) ...
Running benchmark (100000 nonces) ...
Calculated result: 9b22794882187000d62c6d2b228fab5e585767aaaa5eb74905b0c7c00fcbdad8
Performance: 478.394 hashes per second
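
To put a number on "the same as just one instance", here is a small Python sketch that adds up the two concurrent runs above and compares the total with the single-instance, 2-thread result from Table 2 (964.728 hashes per second). The function name is made up for illustration; the figures are copied from this post.

```python
def combined_vs_single(instance_rates, single_rate):
    """Return (combined hashrate, percent difference vs. the single run)."""
    combined = sum(instance_rates)
    diff_pct = (combined - single_rate) / single_rate * 100.0
    return combined, diff_pct

# Hashrates of the two concurrent instances, and of one instance with 2 threads.
two_instances = [500.141, 478.394]
single_instance = 964.728

combined, diff_pct = combined_vs_single(two_instances, single_instance)
print(f"combined: {combined:.3f} H/s, difference: {diff_pct:+.1f}%")
```

The difference comes out at under 2%, which is inside the roughly 5% run-to-run fluctuation mentioned further down, so the two-instance setup gains nothing measurable.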

Big update to hashrate. All tables have been updated with new info.

I love this kind of testing. :smiley:


Anyway the info.
The updates have given me a big increase in hashrate, and the benchmark program now scales more in line with the outlined specs.

Test i7 Laptop

CPU-Z TXT Report

Processors Information

Processor 1 ID = 0
Number of cores 2 (max 2)
Number of threads 4 (max 4)
Name Intel Core i7 3540M
Codename Ivy Bridge
Specification Intel® Core™ i7-3540M CPU @ 3.00GHz
Package (platform ID) Socket 1023 FCBGA (0x4)
CPUID 6.A.9
Extended CPUID 6.3A
Core Stepping E1/L1
Technology 22 nm
TDP Limit 35.0 Watts
Tjmax 105.0 °C
Core Speed 3487.9 MHz
Multiplier x Bus Speed 35.0 x 99.7 MHz
Stock frequency 3000 MHz
Instructions sets MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX
L1 Data cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache 2 x 256 KBytes, 8-way set associative, 64-byte line size
L3 cache 4 MBytes, 16-way set associative, 64-byte line size
FID/VID Control yes

Thread dumps

CPU Thread 0
APIC ID 0
Topology Processor ID 0, Core ID 0, Thread ID 0
Type 01020101h
Max CPUID level 0000000Dh
Max CPUID ext. level 80000008h
Cache descriptor Level 1, D, 32 KB, 2 thread(s)
Cache descriptor Level 1, I, 32 KB, 2 thread(s)
Cache descriptor Level 2, U, 256 KB, 2 thread(s)
Cache descriptor Level 3, U, 4 MB, 16 thread(s)

Chipset

Northbridge Intel Ivy Bridge rev. 09
Southbridge Intel QM77 rev. 04
Memory Type DDR3
Memory Size 16 GBytes
Channels Dual
Memory Frequency 797.3 MHz (1:6)
CAS# latency (CL) 11.0
RAS# to CAS# delay (tRCD) 11
RAS# Precharge (tRP) 11
Cycle Time (tRAS) 28
Command Rate (CR) 1T
Host Bridge 0x0154

MCHBAR I/O Base address 0x0FED10000
MCHBAR I/O Size 19456
MCHBAR registers

Memory SPD

DIMM # 1
SMBus address 0x50
Memory type DDR3
Module format SO-DIMM
Manufacturer (ID) Kingston (7F980000000000000000)
Size 8192 MBytes
Max bandwidth PC3-12800 (800 MHz)
Part number 9905428-422.A00LF
Serial number 0B36C2B8
Manufacturing date Week 18/Year 15
Number of banks 8
Nominal Voltage 1.50 Volts
EPP no
XMP no
AMP no
JEDEC timings table CL-tRCD-tRP-tRAS-tRC @ frequency
JEDEC #1 5.0-5-5-14-19 @ 380 MHz
JEDEC #2 6.0-6-6-16-22 @ 457 MHz
JEDEC #3 7.0-7-7-19-26 @ 533 MHz
JEDEC #4 8.0-8-8-22-30 @ 609 MHz
JEDEC #5 9.0-9-9-24-33 @ 685 MHz
JEDEC #6 10.0-10-10-27-37 @ 761 MHz
JEDEC #7 11.0-11-11-28-39 @ 800 MHz

DIMM # 2
SMBus address 0x52
Memory type DDR3
Module format SO-DIMM
Manufacturer (ID) Kingston (7F980000000000000000)
Size 8192 MBytes
Max bandwidth PC3-12800 (800 MHz)
Part number 9905428-417.A00LF
Serial number 593A5631
Manufacturing date Week 13/Year 15
Number of banks 8
Nominal Voltage 1.35 Volts
EPP no
XMP no
AMP no
JEDEC timings table CL-tRCD-tRP-tRAS-tRC @ frequency
JEDEC #1 5.0-5-5-14-19 @ 380 MHz
JEDEC #2 6.0-6-6-16-22 @ 457 MHz
JEDEC #3 7.0-7-7-19-26 @ 533 MHz
JEDEC #4 8.0-8-8-22-30 @ 609 MHz
JEDEC #5 9.0-9-9-24-33 @ 685 MHz
JEDEC #6 10.0-10-10-27-37 @ 761 MHz
JEDEC #7 11.0-11-11-28-39 @ 800 MHz

I have done some very basic CPU testing so far. From these results (they are averages) you will see the very basics of how to use the benchmark tool. I do intend to go much further with this - This is just some very basic stuff. More to follow.

According to the GitHub readme [link] you want, per mining thread:
16 KB of L1 cache
256 KB of L2 cache
2 MB of L3 cache

The basic output of the tool is like this:

This example is from a Dell Latitude E6230, i7-3540M - 2 cores/4 threads @ 3.5 GHz boost, 16 GB DDR3 [not 100% sure of the speed, heh]
L1 - 128 KB
L2 - 512 KB
L3 - 4 MB
Using power usage as displayed in Core Temp.

command:

benchmark --mine --init 4 --threads 2 --nonces 10000 --largePages --jit

This produces:

RandomX benchmark
 - full memory mode (2080 MiB)
 - JIT compiled mode
 - hardware AES mode
 - large pages mode
Initializing (4 threads) ...
Memory initialized in 8.70408 s
Initializing 2 virtual machine(s) ...
Running benchmark (10000 nonces) ...
Calculated result: c3f95625df8cfebe179a46dc640b52885c660d56e71cd19ac70f947a2fd401db
Performance: 998.182 hashes per second
  • line 1 is because we have the --mine flag
  • line 2 is because we have the --jit flag [This is essential to test an x86 CPU]
  • line 3 is the default because the CPU supports hardware AES
  • line 4 is large pages as opposed to small pages. This makes a big difference. See below for how to enable large pages in Windows.
  • line 5 is the number of threads specified by --init. More detail on this to come.
  • line 6 is the time taken to initialise the memory (the 2080 MiB dataset from line 1). This is directly related to core speed and the --init value.
  • line 7 is the number of mining VMs created; this is the --threads parameter.
  • line 8 shows that it is doing something, and how many nonces it will run. This does not update in real time.
  • line 9 is the resulting hash. For --seed 0 and --nonces 10000 everybody should get this exact hash regardless of hardware or settings.
  • line 10 speaks for itself. This number fluctuates by 5%, but this is probably more to do with my monitoring software.
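
If you want to collect these values across many runs instead of reading them off the screen, a minimal Python sketch along these lines could pull them out of the text. It assumes the output format shown above; the field names and the function name are my own, not part of the tool.

```python
import re

# Field name -> (regular expression, converter). The patterns follow the
# output format shown in this post; other versions of the tool may differ.
PATTERNS = {
    "init_threads": (r"Initializing \((\d+) threads?\)", int),
    "init_seconds": (r"Memory initialized in ([\d.]+) s", float),
    "vms": (r"Initializing (\d+) virtual machine", int),
    "nonces": (r"Running benchmark \((\d+) nonces\)", int),
    "result": (r"Calculated result: ([0-9a-f]{64})", str),
    "hashrate": (r"Performance: ([\d.]+) hashes per second", float),
}

def parse_benchmark_output(text):
    """Return a dict of the fields found in the benchmark output."""
    parsed = {}
    for name, (pattern, convert) in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            parsed[name] = convert(match.group(1))
    return parsed

sample = """\
RandomX benchmark
 - full memory mode (2080 MiB)
 - JIT compiled mode
 - hardware AES mode
 - large pages mode
Initializing (4 threads) ...
Memory initialized in 8.70408 s
Initializing 2 virtual machine(s) ...
Running benchmark (10000 nonces) ...
Calculated result: c3f95625df8cfebe179a46dc640b52885c660d56e71cd19ac70f947a2fd401db
Performance: 998.182 hashes per second
"""

print(parse_benchmark_output(sample))
```

Fields that are missing from the text are simply left out of the dict, so a truncated paste does not raise an error.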

How to enable large pages in Windows: [Vega Monero miners already do this]

Enabling largePages
  • Press start and type gpedit; this should bring up the Local Group Policy Editor
  • Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment
  • Now scroll down to Lock pages in memory
  • Double click it
  • Add the user which will be doing the testing. I normally just add Administrator
  • Click Check Names to make sure it is the correct name. If you sign in with an @example.com account then you need to add that.
  • Reboot.
  • Don’t forget: if you only added LOCAL-PC/Administrator you will need to run the command prompt as admin :slight_smile:

Basic overview of how to configure your test setup.

You want the --init parameter to be as close as possible to your total CPU thread count. This reduces the time to initialise the memory. I am still not 100% sure how often this needs to be redone.
You want to fiddle with the --threads parameter. A basic guide is 1 per 2 MB of L3, but there are some variances I have yet to work out.

A simple yet effective method to test this in Windows is to have the performance monitor up and set the CPU graph to Overall Utilization.
When the memory is initializing you want it to show 100% usage. (See Table 1 for time examples vs. usage.)

Table 1

Note: this is only over 1000 nonces so the hashrate will fluctuate a bit. A key takeaway is that --init values above 4 make no difference.
init 1 - 30% cpu @ 15 watts
init 2 - 60% cpu @ 19 watts
init 3 - 90% cpu @ 22 watts
init 4 - 100% cpu ??

This only impacts the initialisation time. It seems to have no impact on hashrate or on the CPU usage of the mining threads (VMs).

  • benchmark --mine --init 1 --threads 2 --nonces 1000 --largePages --jit

Initializing (1 thread)
Memory initialized in 21.187 s
Performance: 943.445 hashes per second

  • benchmark --mine --init 2 --threads 2 --nonces 1000 --largePages --jit

Initializing (2 threads)
Memory initialized in 11.595 s
Initializing 2 virtual machine(s) …
Performance: 920.051 hashes per second

  • benchmark --mine --init 3 --threads 2 --nonces 1000 --largePages --jit

Initializing (3 threads)
Memory initialized in 9.68857 s
Performance: 958.648 hashes per second

  • benchmark --mine --init 4 --threads 2 --nonces 1000 --largePages --jit

Initializing (4 threads)
Memory initialized in 8.6624 s
Performance: 980.156 hashes per second
Note the init time: even if you increase the --init value further, this number doesn’t change.

  • benchmark --mine --init 5 --threads 2 --nonces 1000 --largePages --jit

Initializing (5 threads)
Memory initialized in 8.69043 s
Performance: 963.246 hashes per second
Note: Perf seems a little high, but we are only checking 1000 nonces.

  • benchmark --mine --init 6 --threads 2 --nonces 1000 --largePages --jit

Initializing (6 threads)
Memory initialized in 8.67927 s
Performance: 932.245 hashes per second
Note: Perf still seems a little high, but we are only checking 1000 nonces.

  • benchmark --mine --init 7 --threads 2 --nonces 10000 --largePages --jit

Initializing (7 threads)
Memory initialized in 8.68968 s
Performance: 948.815 hashes per second

benchmark --mine --init 8 --threads 2 --nonces 10000 --largePages --jit
Initializing (8 threads)
Memory initialized in 8.73109 s
Performance: 995.622 hashes per second
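
To see the plateau at a glance, this short Python sketch turns the "Memory initialized in" times from the runs above into a speed-up relative to --init 1. The numbers are copied from this post; the function name is just for illustration.

```python
# --init value -> "Memory initialized in" seconds, copied from the runs above.
init_seconds = {
    1: 21.187, 2: 11.595, 3: 9.68857, 4: 8.6624,
    5: 8.69043, 6: 8.67927, 7: 8.68968, 8: 8.73109,
}

def speedups(times):
    """Speed-up of each --init value relative to --init 1."""
    base = times[1]
    return {init: base / seconds for init, seconds in times.items()}

for init, factor in speedups(init_seconds).items():
    print(f"--init {init}: {factor:.2f}x faster than --init 1")
```

On this 2 core / 4 thread CPU the factor stops improving once --init reaches 4, which matches the note that values above the hardware thread count make no difference.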

Threads work differently.

Summary:

You get a 30% increase in CPU usage with each extra thread (remember this is tested on the above i7) and, up to 2 threads, an increase in hashrate.

2 seems to be the sweet spot now. It gives somewhere between 950 and 1000 hashes, whereas 3 seems to give 900 - 950 but at an extra 30% CPU. This is tested over 100000 nonces. 4 thermal throttles down to 3.0 GHz a couple of times. I would try with additional cooling like I was going to before, but since the patches it seems unnecessary.

Table 2 contains info for threads 1, 2, 3 and 4.

Table 2
  • benchmark --mine --init 4 --threads 1 --nonces 100000 --largePages --jit

Initializing 1 virtual machine(s)
Running benchmark (100000 nonces)
Calculated result: 9b22794882187000d62c6d2b228fab5e585767aaaa5eb74905b0c7c00fcbdad8
Performance: 606.123 hashes per second

  • benchmark --mine --init 4 --threads 2 --nonces 100000 --largePages --jit

Initializing 2 virtual machine(s)
Running benchmark (100000 nonces)
Calculated result: 9b22794882187000d62c6d2b228fab5e585767aaaa5eb74905b0c7c00fcbdad8
Performance: 964.728 hashes per second

  • benchmark --mine --init 4 --threads 3 --nonces 100000 --largePages --jit

Initializing 3 virtual machine(s)
Running benchmark (100000 nonces)
Calculated result: 9b22794882187000d62c6d2b228fab5e585767aaaa5eb74905b0c7c00fcbdad8
Performance: 945.195 hashes per second

  • benchmark --mine --init 4 --threads 4 --nonces 100000 --largePages --jit

Initializing 4 virtual machine(s)
Running benchmark (100000 nonces)
Calculated result: 9b22794882187000d62c6d2b228fab5e585767aaaa5eb74905b0c7c00fcbdad8
Performance: 869.339 hashes per second
Note: This thermal throttled from 3.5 GHz to 3.0 GHz a number of times.
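
The Table 2 numbers can be summarised with a few lines of Python. This is only a sketch over the four results above; the helper name is made up.

```python
# --threads value -> hashes per second, copied from Table 2 (100000 nonces each).
table2 = {1: 606.123, 2: 964.728, 3: 945.195, 4: 869.339}

def summarise(rates):
    """Return the best thread count and the hashrate per mining thread."""
    best = max(rates, key=rates.get)
    per_thread = {threads: rate / threads for threads, rate in rates.items()}
    return best, per_thread

best, per_thread = summarise(table2)
print(f"best --threads value: {best}")
for threads, rate in per_thread.items():
    print(f"--threads {threads}: {rate:.1f} H/s per thread")
```

The total peaks at 2 threads, while the hashrate per thread falls with every thread added, which is what you would expect once the threads start sharing cores and cache.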

I have been doing much more testing on my Ryzen 2600 and from that I can make some basic inferences. I have not tested these thoroughly yet though.

  • Core speed is relevant, but not that relevant.
  • DDR4 memory frequency matters a lot! (I can get 4400+ hashes)
  • Memory timings do matter. I am not sure to what extent, but I think more than core frequency.
  • I have lots more testing to do.
  • I have had random lockups when running this tool on Ryzen, as in total system freeze - no data dumps, nothing - and I had to reset the BIOS to even get into it. I think this was due to my memory timings being really really tight.

Hope this helps someone.

Steve


@Sacsayhuaman

I’d love to help :slight_smile:

I have copied this from the ASIC thread to here, I think this is a better place to discuss this stuff.

When you say you are mining at the moment, what are you mining? I am pretty new to RandomX myself though.

If you can provide some more info on:

  • CPU Make/Model
  • Ram Frequency
  • Ram type (ddr3/4)
  • Ram Config (dual channel, etc)
  • Ram timings.
  • Ram Size
  • Bios version and MB version
  • CPU Microcode version.

With that info I can make a guess, and I can also give you some good parameters to get a rough idea with the benchmarking tool.

A lot of this, in fact all of it (bar the microcode, which is a strange one to get depending on the CPU/MB), you can get from CPU-Z.

If you open up CPU-Z, go to about, click on save as txt file.
Open the text file, then search for these headers

  • Processors Information
  • Thread dumps
  • Chipset
  • Memory SPD
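
If copying the sections out by hand gets tedious, a rough Python sketch like the one below could do it. It assumes each header sits alone on its own line in the saved text file, which is a guess about the report layout rather than something checked against every CPU-Z version. The function and variable names are hypothetical.

```python
WANTED = ["Processors Information", "Thread dumps", "Chipset", "Memory SPD"]

def extract_sections(report_text, wanted=WANTED):
    """Collect the lines that follow each wanted header.

    Everything up to the next wanted header is kept, so the last
    section may need trimming by hand.
    """
    sections = {}
    current = None
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped in wanted:
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}

# Tiny made-up excerpt in the same shape as the report in this thread.
report = """\
Processors Information
Number of cores 2 (max 2)
Number of threads 4 (max 4)

Chipset
Memory Type DDR3
Memory Size 16 GBytes
"""

for name, body in extract_sections(report).items():
    print(f"== {name} ==")
    print(body)
```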

You do not need to put all of the info from each section into your post. I have shown the relevant data below.

If you use [details] brackets for the text it becomes expandable.

Here is an example of the report from the laptop above I have done the testing with. If you cannot get this report don’t worry too much but try to answer the first set of questions.

Would you mind running a few tests?

Test i7 Laptop

CPU-Z TXT Report

Processors Information

Processor 1 ID = 0
Number of cores 2 (max 2)
Number of threads 4 (max 4)
Name Intel Core i7 3540M
Codename Ivy Bridge
Specification Intel® Core™ i7-3540M CPU @ 3.00GHz
Package (platform ID) Socket 1023 FCBGA (0x4)
CPUID 6.A.9
Extended CPUID 6.3A
Core Stepping E1/L1
Technology 22 nm
TDP Limit 35.0 Watts
Tjmax 105.0 °C
Core Speed 3487.9 MHz
Multiplier x Bus Speed 35.0 x 99.7 MHz
Stock frequency 3000 MHz
Instructions sets MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX
L1 Data cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache 2 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache 2 x 256 KBytes, 8-way set associative, 64-byte line size
L3 cache 4 MBytes, 16-way set associative, 64-byte line size
FID/VID Control yes

Thread dumps

CPU Thread 0
APIC ID 0
Topology Processor ID 0, Core ID 0, Thread ID 0
Type 01020101h
Max CPUID level 0000000Dh
Max CPUID ext. level 80000008h
Cache descriptor Level 1, D, 32 KB, 2 thread(s)
Cache descriptor Level 1, I, 32 KB, 2 thread(s)
Cache descriptor Level 2, U, 256 KB, 2 thread(s)
Cache descriptor Level 3, U, 4 MB, 16 thread(s)

Chipset

Northbridge Intel Ivy Bridge rev. 09
Southbridge Intel QM77 rev. 04
Memory Type DDR3
Memory Size 16 GBytes
Channels Dual
Memory Frequency 797.3 MHz (1:6)
CAS# latency (CL) 11.0
RAS# to CAS# delay (tRCD) 11
RAS# Precharge (tRP) 11
Cycle Time (tRAS) 28
Command Rate (CR) 1T
Host Bridge 0x0154

MCHBAR I/O Base address 0x0FED10000
MCHBAR I/O Size 19456
MCHBAR registers

Memory SPD

DIMM # 1
SMBus address 0x50
Memory type DDR3
Module format SO-DIMM
Manufacturer (ID) Kingston (7F980000000000000000)
Size 8192 MBytes
Max bandwidth PC3-12800 (800 MHz)
Part number 9905428-422.A00LF
Serial number 0B36C2B8
Manufacturing date Week 18/Year 15
Number of banks 8
Nominal Voltage 1.50 Volts
EPP no
XMP no
AMP no
JEDEC timings table CL-tRCD-tRP-tRAS-tRC @ frequency
JEDEC #1 5.0-5-5-14-19 @ 380 MHz
JEDEC #2 6.0-6-6-16-22 @ 457 MHz
JEDEC #3 7.0-7-7-19-26 @ 533 MHz
JEDEC #4 8.0-8-8-22-30 @ 609 MHz
JEDEC #5 9.0-9-9-24-33 @ 685 MHz
JEDEC #6 10.0-10-10-27-37 @ 761 MHz
JEDEC #7 11.0-11-11-28-39 @ 800 MHz

DIMM # 2
SMBus address 0x52
Memory type DDR3
Module format SO-DIMM
Manufacturer (ID) Kingston (7F980000000000000000)
Size 8192 MBytes
Max bandwidth PC3-12800 (800 MHz)
Part number 9905428-417.A00LF
Serial number 593A5631
Manufacturing date Week 13/Year 15
Number of banks 8
Nominal Voltage 1.35 Volts
EPP no
XMP no
AMP no
JEDEC timings table CL-tRCD-tRP-tRAS-tRC @ frequency
JEDEC #1 5.0-5-5-14-19 @ 380 MHz
JEDEC #2 6.0-6-6-16-22 @ 457 MHz
JEDEC #3 7.0-7-7-19-26 @ 533 MHz
JEDEC #4 8.0-8-8-22-30 @ 609 MHz
JEDEC #5 9.0-9-9-24-33 @ 685 MHz
JEDEC #6 10.0-10-10-27-37 @ 761 MHz
JEDEC #7 11.0-11-11-28-39 @ 800 MHz

I do see lots of potential avenues for hardware optimisation, along with hardsoftware (microcode - the stuff that defines what your CPU does, even L1/L2/L3 latencies). One person, currently very busy (not on this, unfortunately), has a lot of experience with writing microcode and L1/L2 cache optimisation & design.

Here is a long link on why microcode could make a massive difference for RandomX

Anyone serious about optimisation of RandomX mining hardware needs to read that. I will spare the link about Reverse Engineering it. (but that is linked in that article.)

Cheers.


A few people have been expressing an interest in helping me test.

I really appreciate it and welcome everyone. To help with this, and because I ran into that issue with dotnet (it is 100% a dotnet issue), I’m going to create a plan going forward, especially if other people are helping.

TL;DR
  • Thank you!
  • Update results for i7 (today)
  • Get full software requirements for testing, so others can join in without fear (1-2 days)
  • Write test methodology so we can all do the same tests and understand the results (1 maybe 2 days)
  • Do testing (1-2 days maybe?)
  • I like the details collapsible

I am going to update the OP with my new i7 results and Table 2. Installing the 4.8 framework really sped things up and changed a bit how things work on my i7. I can get around 950 hashes now with 2 threads @ 22 CPU watts. (this is on par with Vega!)

Then I am going to spend some time on destructive testing in VMs and some corner case tests. Then I will post a list of what version of what is required, how to check and how to update. This should take a day or so.

I will then be running through a full test methodology (test plan, test cases, formatting of test results, collating and analysis of the results) for my Ryzen system. I will be posting this too so you can follow along and see what is optimum for your setup based on the results you get from the tests. It would be great if you could follow my reporting templates.

Whilst doing the Ryzen testing I am sure I will think of more test cases to add. I am going to create a comprehensive list as I go and people can do them or not. I will not publish a test I have not done myself.

This should hopefully all be done by the end of the week. I was going to automate most of it, but because of the bluescreen, I want to do it manually at least twice.

Arg, I just realised I’m going to have to do some or all of this 3 times: CPUs, Vega, 1070 (they use different miners, lol)

GPU testing will be done after cpu.

Feel free to prod me if I haven’t updated in a while. It might be life, it might be a tech issue, or most likely it is taking longer than I thought.

Thanks for all the interest, it really helps motivate me.


Bumping thread to let people know the two instances bug is no more - please see the OP for more information. I will edit this message to be the test setup version number on windows.

Kinda dropped Windows. It is easier to make a more consistent test environment in Ubuntu. Please see the OP for Windows config. I will do Windows Ryzen testing and add that in here.

I will not be doing a standard windows build config. Just update everything and you will be fine.

Use the binaries, or if you want to compile from source (the recommended way), follow the Linux instructions below.


Bumping just to let people know that there will be some big updates very soon.

I have been working with two other forum members and we have ironed out a lot of different issues in different setups.

Including

  • NUMA in windows.
  • NUMA through VM’s
  • Basic issues and how to fix them
  • How to optimise results
  • DDR configuration

On top of this I am working on a small usb Linux distro that fires up the benchmarking tool, does a load of tests, collates the results, grabs the averages and timings.

There is also a new version of the tool which has more options and sanity checks. So I will be updating the OP with that info next, then editing my previous post with the setups (Windows desktop & Linux desktop - I will have a separate thread for server stuff and NUMA).

So I’m not being lazy, we are just getting a lot of stuff sorted out in private before we post.

And thanks to the two people helping. I won’t mention you by name (feel free to state it yourselves though). We are really getting somewhere.


Vosk Talks about RandomX https://youtu.be/WQ6aXXhiP4U


I’ve never really followed his channel. I watched the RandomX bit then stopped.

I think a key point he misses is that the idea is not that you cannot build an ASIC or FPGA; it is that you cannot build a single-chip version without that chip more or less resembling a CPU. Then you have to develop your own branch prediction or just not take that random branch that might happen.

The weakness, if there is one, will be in the verification/light mode. That looks very doable, even on cheap hardware. Are the trade-offs worth it? idk. That’s why I am looking into it.

The issue with getting this on an FPGA is L3 cache. You have to either buy something large and expensive (making a CPU the cheaper, more efficient option) or work out some way of dropping ‘off chip latency’ from >200ns to <40ns. This is way beyond my ability, but not that of MiStFGPA. But it is like herding cats at the moment. Everyone is so busy with ‘real’ jobs.

That is just one of the mechanisms behind the anti-FPGA side of RandomX; there are others (floating point being another).

Thanks for the link.

How long until RandomX activates on Monero? Will be interesting to see how the activation goes. Last fork I think it took a few hours before blocks started propagating again?

Hopefully everyone will be able to figure out how to run RandomX to keep the chain secure until the dust settles.

Erm, sometime October if all goes to plan, iirc. [code freeze is this month at some point]

Kinda the point of this thread :slight_smile: - it will let people know they can process transactions with rubbish 3rd world hardware and still be useful. I know I keep saying this, but I will update the test environment today, and the Ryzen stuff. (I’m just so absorbed in the server side of stuff, it is so much more interesting)

The desktop UMA (not NUMA) setup for win and Linux with my benchmarks and how to reproduce them yourselves.

To be fair though, I doubt it will make much difference. But if I can get all those old zec miners to dust off their CPUs for RandomX and GPUs for ProgPow, I’d be happy. I did what I could. (including the FPGA L3 cache latency solution, should it prove economically viable to produce one [a solution has been proposed, I need to mock up the idea - anyone who has worked in this type of stuff knows that datasheets and technical specs may all agree that you can do something, but 9 times out of 10 in reality you can never get those ideal conditions])


This assumes you are running from a Windows environment and want to build a clean(ish) Ubuntu setup on a bootable USB stick to test RandomX. (USB 2.0 is fine, just a bit slow. It does not impact the benchmark at all.)

A lot of this info is taken from different posts and the readme on the RandomX GitHub.
This post will be specifically dealing with desktop hardware. Light hardware, GPU, server and FPGA to follow (in that order, probably).

The main point is to get a standard test environment set up and run some basic tests for desktops. There will be further detail in the Ryzen post.

First we need to create a bootable Ubuntu install with persistence. This lets you play with overclocking and other stuff. It will keep files across reboots.

Download ubuntu desktop 18.04 from:
https://ubuntu.com/download/desktop

(19.04 will work too.)

Download unetbootin 661 from:

https://github.com/unetbootin/unetbootin/releases/download/661/unetbootin-windows-661.exe

  • Grab a USB drive (minimum 8 GB). USB 3 is highly recommended but not required; it runs fine from USB 2.0.
  • Format to FAT32.
  • Launch unetbootin
  • MAKE SURE YOU ARE WRITING TO THE CORRECT DRIVE

Configure like in the image. Note: set the persistence size to be over the size of the USB drive and it will use the entire disk.


Hit Go.
Wait quite a bit. (Creating the persistence file can take a little bit of time.)
Now reboot to the usb drive.

Note: I have had to manually add “persistent” when booting from UEFI but never when booting from the BIOS. YMMV - follow the instructions below.

Boot to USB from UEFI

If you are using UEFI you will probably have to use the Windows reboot feature.
Press the Windows key then type

change advanced startup options

Select restart now.
This will bring up an options screen, select boot from the new Ubuntu usb install.
When loading from UEFI press e on the options screen and make sure the kernel line ends with --- persistent

screenshots

This is the options screen

This is what you are looking for/need to add.

For the BIOS

Boot to USB from BIOS

When you restart your PC you should see something like press F12 for boot options.
Press this key when you see the option appear.
Select the USB device with the new ubuntu install
On the boot screen press the TAB key with the default option highlighted and make sure the kernel line ends with --- persistent

screenshot

This is the boot splash screen you need to press F6 on to check.

Once you have launched the persistent USB install, we need to get everything up to date, otherwise all sorts of strangeness happens. It is relatively easy and takes 10 - 15 minutes with a decent network connection.

You get a much greater hashrate with the latest updates, and they are needed for RandomX to function properly.
Because this is the desktop version, there is a lot of stuff that gets updated that we don’t need, but we might as well do it.

Bring up a terminal prompt:

ctrl+alt+t

Apply some updates and install a few tools.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install cmake git build-essential

Now we have a working Ubuntu install we need to get RandomX and its dependencies set up.

This next bit is taken straight from the GitHub readme. You do not need to be familiar with compiling code for this to work. It works as is.

This is also slightly easier and less reliant on someone else making binaries for you. Binaries are available on the release page, but I am going to deal with compiling from source. (It really is easier.)

git clone https://github.com/tevador/RandomX.git
cd RandomX
mkdir build && cd build
cmake -DARCH=native ..
make

This will produce a bunch of output (cut from post, due to length). You should not get any error messages. If you do, post a message below.

We should move the compiled benchmarking tool somewhere else for easier testing.

mkdir ~/rdx
mv ./randomx-benchmark ~/rdx/

Might as well run the tests while we are here and then clean up the build. (If you move the USB stick to a new PC with different hardware, you might as well compile it again.)

./randomx-tests
make clean

If any of the tests from running ./randomx-tests fail, please post the output below.

Now back to testing, to make sure everything is working.

cd ~/rdx/

Now a quick check that the miner is working. This will be slow and will not be representative of your actual hashrate; there are a few more tweaks to do after.

./randomx-benchmark --mine --jit

This will just run 1 thread on 1 core so it will take a while. We are not using largePages either; we will set that in a bit.

results of basic test
RandomX benchmark v1.0.4
 - full memory mode (2080 MiB)
 - JIT compiled mode
 - hardware AES mode
 - small pages mode
Initializing (1 thread) ...
Memory initialized in 24.825 s
Initializing 1 virtual machine(s) ...
Running benchmark (1000 nonces) ...
Calculated result: 38d47ea494480bff8d621189e8e92747288bb1da6c75dc401f2ab4b6807b6010
Reference result:  38d47ea494480bff8d621189e8e92747288bb1da6c75dc401f2ab4b6807b6010
Performance: 395.119 hashes per second

Post the error message below if you get one.

Check that the result it says you should get (Reference result) matches what you do get (Calculated result). If it doesn’t, something is very wrong. Reduce your overclock and post below.
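
The same check can be done on saved output with a couple of regular expressions. This is a minimal Python sketch that assumes the "Calculated result" and "Reference result" lines look like the v1.0.4 output above; the function name is made up.

```python
import re

def result_matches_reference(output):
    """True/False if both hashes are present, None if either line is missing."""
    calculated = re.search(r"Calculated result:\s*([0-9a-f]{64})", output)
    reference = re.search(r"Reference result:\s*([0-9a-f]{64})", output)
    if calculated is None or reference is None:
        return None
    return calculated.group(1) == reference.group(1)

sample = """\
Calculated result: 38d47ea494480bff8d621189e8e92747288bb1da6c75dc401f2ab4b6807b6010
Reference result:  38d47ea494480bff8d621189e8e92747288bb1da6c75dc401f2ab4b6807b6010
Performance: 395.119 hashes per second
"""

print(result_matches_reference(sample))
```

Returning None for missing lines keeps "the hashes differ" separate from "this output has no reference line", which is the case for the older outputs earlier in the thread.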

Nice, now it is time to configure it. The requirements are:

  • 16 KB of L1 cache
  • 256 KB of L2 cache
  • 2 MB of L3 cache

per mining thread.

To get this info, as well as rough guesses for the --init and --threads values, we can use dmidecode. This will produce a page or two of output; the really relevant parts I have posted below. You can scroll up through the terminal window to check all the values.

sudo dmidecode -t processor
The most relevant bits are:
# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x0053, DMI type 4, 42 bytes
Processor Information
	Socket Designation: SOCKET 0
	Signature: Type 0, Family 6, Model 58, Stepping 9
	Version: Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz

	Core Count: 2
	Core Enabled: 2
	Thread Count: 4

Now we look for the amount of L1, L2 and L3 cache the CPU has.

sudo dmidecode -t cache
cache values
Socket Designation: CPU Internal L2
	Configuration: Enabled, Not Socketed, Level 2
	Operational Mode: Write Through
	Location: Internal
	Installed Size: 512 kB
	Maximum Size: 512 kB


Socket Designation: CPU Internal L1
	Configuration: Enabled, Not Socketed, Level 1
	Operational Mode: Write Through
	Location: Internal
	Installed Size: 128 kB
	Maximum Size: 128 kB

Socket Designation: CPU Internal L3
	Configuration: Enabled, Not Socketed, Level 3
	Operational Mode: Write Back
	Location: Internal
	Installed Size: 4096 kB
	Maximum Size: 4096 kB
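
If you would rather not read the sizes off by eye, the cache output can be parsed with a few lines of Python. This is a sketch that assumes the "Configuration: ... Level N" and "Installed Size: N kB" lines look like the ones above; it does not handle outputs that report sizes in other units. The function name is hypothetical.

```python
import re

def parse_cache_sizes(dmidecode_text):
    """Map cache level (1, 2, 3) to the installed size in kB."""
    sizes = {}
    level = None
    for raw_line in dmidecode_text.splitlines():
        line = raw_line.strip()
        config = re.match(r"Configuration:.*Level (\d+)", line)
        if config:
            level = int(config.group(1))
            continue
        installed = re.match(r"Installed Size: (\d+) kB", line)
        if installed and level is not None:
            # Add up entries in case a level is listed more than once.
            sizes[level] = sizes.get(level, 0) + int(installed.group(1))
            level = None
    return sizes

sample = """\
Socket Designation: CPU Internal L2
    Configuration: Enabled, Not Socketed, Level 2
    Installed Size: 512 kB
Socket Designation: CPU Internal L1
    Configuration: Enabled, Not Socketed, Level 1
    Installed Size: 128 kB
Socket Designation: CPU Internal L3
    Configuration: Enabled, Not Socketed, Level 3
    Installed Size: 4096 kB
"""

print(parse_cache_sizes(sample))
```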

So, this CPU has

  • 2 cores (probably the maximum number of mining threads)
  • 4 threads (quicker init time)
  • 128 kB L1 (no limit)
  • 512 kB L2 (limits to two mining threads)
  • 4 MB L3 (limits to two mining threads)

So it should fit perfectly with

--threads 2 --init 4
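
The reasoning above can be written down as a small Python sketch: divide each cache size by the per-thread requirement (16 KB L1, 256 KB L2, 2 MB L3), take the smallest result as the starting point for --threads, and use the hardware thread count for --init. The function and its return format are my own illustration, and the result is only a starting guess to test from.

```python
# Per mining thread requirements in kB, as listed earlier in this post.
PER_THREAD_KB = {"L1": 16, "L2": 256, "L3": 2048}

def suggest_parameters(cache_kb, hardware_threads):
    """Suggest starting values for --threads and --init from cache sizes."""
    limits = {level: cache_kb[level] // need
              for level, need in PER_THREAD_KB.items()}
    mining_threads = max(1, min(min(limits.values()), hardware_threads))
    return {"limits": limits, "threads": mining_threads, "init": hardware_threads}

# Cache sizes of the i7-3540M as reported by dmidecode above.
suggestion = suggest_parameters({"L1": 128, "L2": 512, "L3": 4096},
                                hardware_threads=4)
print(suggestion)
print(f"--threads {suggestion['threads']} --init {suggestion['init']}")
```

For this laptop the L2 and L3 limits both come out at 2, which agrees with the sweet spot found in Table 2.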

Let’s try.

First enable largePages

sudo sysctl -w vm.nr_hugepages=1250
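
As a rough sanity check on that number, here is a Python sketch of the arithmetic. It assumes the default huge page size is 2 MiB (typical on x86-64; check Hugepagesize in /proc/meminfo on your system) and uses the 2080 MiB full memory mode figure from the benchmark output.

```python
import math

def hugepages_needed(memory_mib, page_mib=2):
    """Number of huge pages needed to back memory_mib of memory."""
    return math.ceil(memory_mib / page_mib)

minimum_pages = hugepages_needed(2080)   # full memory mode, from the output
reserved_mib = 1250 * 2                  # what vm.nr_hugepages=1250 reserves

print(f"minimum pages for 2080 MiB: {minimum_pages}")
print(f"1250 pages reserve {reserved_mib} MiB")
```

So 1250 pages covers the 2080 MiB with some headroom left over for whatever else the benchmark allocates in large pages.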

Now try the benchmark. (Note: 50000 nonces takes roughly 1 minute on this CPU. It will be different on yours; adjust so the benchmark takes about 1 minute.)

./randomx-benchmark --mine --jit --largePages --threads 2 --init 4 --nonces 50000
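
To pick a --nonces value for your own CPU, you can work backwards from the hashrate of a short first run. A minimal Python sketch, with a made-up helper name and a rounding step chosen only for tidy numbers:

```python
def nonces_for_duration(hashrate, seconds=60, step=1000):
    """Nonce count, rounded to a multiple of step, that runs for about `seconds`."""
    return max(step, round(hashrate * seconds / step) * step)

# 50000 nonces in roughly a minute implies about 833 hashes per second.
print(nonces_for_duration(833))
# The slow single-thread, small-pages run above managed 395.119 hashes per second.
print(nonces_for_duration(395.119))
```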

So there isn’t much point doing RAM testing on this hardware; it is CPU limited.

There is not much point in overclocking this laptop either, but we should give it a go. This post is long enough, so that will be tomorrow.

Now there is a consistent, stable test environment, I will do the Ryzen testing in this format, but with more RAM timings and overclocking, to try to confirm the speculation in the first post.

I have a template already, but I’m tired. I will continue this later.

I will also reorganise this thread to make it easier to read.