Prospectors, Miners and 49er's - Part 2: Dual GPU-CPU Mining on the XU4/MC1/HC1/HC2

GPU TUNING

Last month’s article introduced dual GPU-CPU mining on the Odroid XU4/MC1/HC1/HC2. This month we'll update the community on improvements to the original work and discuss some basic GPU tuning. The removal of all the AMD OpenCL dependencies and Intel assembler from the OpenCL kernels and crypto algorithms is now complete for sgminer-arm. Genesis Mining also recently released a new version, sgminer-gm 5.5.6. Those changes have been incorporated into the newly released sgminer 5.5.6-ARM-RC1. Here is a brief summary of the kernels and crypto algorithms that were modified for the Odroid, followed by the test results.

ocl/build_kernel.c

algorithm/cryptonight.c

- INTEL assembler optimizations

algorithm/neoscrypt.c

- AMD architecture optimizations

kernel/cryptonight.cl

- AMD OpenCL extensions

kernel/equihash.cl

- AMD OpenCL extensions and AMD architecture optimizations

kernel/ethash.cl

- AMD OpenCL extensions

kernel/ethash-genoil.cl

- AMD architecture optimizations

kernel/ethash-new.cl

- AMD architecture optimizations

kernel/lyra2re.cl

- AMD OpenCL extensions

kernel/lyra2rev2.cl

- AMD OpenCL extensions

kernel/whirlpoolx.cl

- AMD architecture optimizations

kernel/wolf-aes.cl

- AMD OpenCL extensions

kernel/wolf-skein.cl

- AMD OpenCL extensions

Choices had to be made about specific coin algorithms and OpenCL kernels that had architecture-specific settings (not AMD extensions), as indicated. 70% of the OpenCL kernels share one or more of the same AMD OpenCL extensions, which were modified and tested with the cryptonight kernel; that kernel also uses two OpenCL helper kernels (wolf-aes.cl and wolf-skein.cl). It appears that ethash-new.cl is not used for any coins, which would leave only two kernels unproven in any way: whirlpoolx.cl and ethash-genoil.cl. The others had only AMD and/or Nvidia architecture optimizations, which were removed. The most conservative approach possible was used in the modifications so that they would run on a wide range of current and future GPUs, but there is always room for technical and human error. The sgminer-arm implementation should be CPU and GPU agnostic, which raises the possibility of adding ARM-Mali optimizations based on specific architectures (ARMv7, ARMv8, Mali-T628, Mali-T860) in the future.

Tuning the GPU

When first trying to figure out what the settings should be for a coin you haven't mined, start very conservatively with all of the settings and work your way up by trial and error until the miner starts to fail or the performance starts to drop. Here are some settings that are a good place to start:

./sgminer -k algorithm -o stratum+tcp://your.pool.com:3333 -u user -p password -I 3 -w 32 -d 0,1 --thread-concurrency 8192
Keep in mind that on all ARM-Mali SoCs the GPU shares main memory with the CPU, so the two affect each other when you're dual mining. That is why you generally lose some performance when dual CPU/GPU mining compared to mining on the CPU only.
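The trial-and-error approach described above can be organized as a simple sweep over one setting at a time. The following is only a dry-run sketch: it prints the command line for each intensity step rather than launching the miner, and the variable names, the placeholder pool details and the upper bound of 7 are illustrative assumptions.

```shell
#!/bin/bash
# Dry-run sketch: print one sgminer command line per intensity step.
# ALGO, POOL, USER_NAME and PASSWORD are placeholders, not real values.
ALGO="algorithm"
POOL="stratum+tcp://your.pool.com:3333"
USER_NAME="user"
PASSWORD="password"

# Build the command line for a given intensity (-I); the other values
# are the conservative starting settings shown above.
build_cmd() {
    local intensity="$1"
    echo "./sgminer -k $ALGO -o $POOL -u $USER_NAME -p $PASSWORD" \
         "-I $intensity -w 32 -d 0,1 --thread-concurrency 8192"
}

# Step the intensity up one notch at a time, starting from 3.
for intensity in 3 4 5 6 7; do
    build_cmd "$intensity"
done
```

Run each printed command for a while, note the hash rate and any errors, and stop raising the value once the miner fails or the rate drops. The same pattern applies to the work size and thread concurrency.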

When you start sgminer-arm and the settings are wrong or too much for the GPU, the problem can manifest itself in many different ways. The OpenCL kernel can crash, hang or report various error messages. Below is a typical error for a GPU tuning problem, where the OpenCL kernel could not be created.

[19:28:21] Error -6: Creating Kernel from program.  (clCreateKernel)
[19:28:21] Failed to init GPU thread 1, disabling device 1
Another common error message is that the OpenCL kernel is trying to allocate more memory than is available. Both errors indicate that one or more of the GPU settings need to be reduced (intensity, work size, number of threads or thread concurrency).
[19:28:16] Maximum buffer memory device 0 supports says 522586112
[19:28:16] Your settings come to 536870912
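To see how far over the limit a configuration is, the two figures can be pulled out of the log and compared. This is a minimal sketch that assumes the log lines have exactly the wording shown above; `check_buffer` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: read sgminer log text on stdin and compare the two buffer figures.
check_buffer() {
    awk '
        /Maximum buffer memory device/ { max = $NF }   # last field: device limit in bytes
        /Your settings come to/        { want = $NF }  # last field: requested bytes
        END {
            if (max == "" || want == "") { print "buffer lines not found"; exit }
            if (want + 0 > max + 0)
                printf "over by %d bytes\n", want - max
            else
                printf "within limit, %d bytes spare\n", max - want
        }'
}

# Example using the two log lines quoted above.
check_buffer <<'EOF'
[19:28:16] Maximum buffer memory device 0 supports says 522586112
[19:28:16] Your settings come to 536870912
EOF
```

A small overage like this one suggests that a modest reduction of a single setting may be enough, though which setting to reduce still has to be found by testing.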
When dual mining, get the GPU tuned and running by itself, and then get the CPU tuned and running by itself. Then try to run both together, but expect to adjust them again accordingly (usually the CPU). Most CPU mining software is going to try to use every system resource it can. You may have to set parameters for the CPU miner manually instead of letting it choose. Likewise, there are situations where memory usage is so tight that any other normal process trying to start may cause a system problem (crash, errors, etc.). For CPU mining, none of the HK image releases use swap by default, so Huge Pages can't be enabled. In general, then, there should be similar performance regardless of the CPU miner software you're using.

Cooling, Power Utilization and System Monitoring

You have to monitor CPU temperatures while tuning until you know what your mining rig setup is capable of with the crypto-algorithms you're using. System damage can occur, and is likely to, without adequate cooling while mining! Generally speaking, OEM cooling is not sufficient without significantly reducing the CPU frequency. Inadequate cooling is one of many reasons a system can crash or reboot while single or dual mining. Even small ambient temperature changes can significantly impact your system and cause damage. Monitor the ambient and system temperatures on a regular basis while mining. Use watchtemp.sh if you don't have some other means.

watchtemp.sh
#!/bin/bash
# Log the cpu4-cpu7 frequencies and the thermal zone temperatures every 2 seconds.
z=0   # elapsed time in seconds
echo "T, Freq4,   Freq5,   Freq6,   Freq7,   T4, T5, T6, T7, TGPU"

while true
do
     # Current frequency of each core in kHz, as reported by cpufreq
     fa=$(cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq)
     fb=$(cat /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq)
     fc=$(cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq)
     fd=$(cat /sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq)
     # Thermal zones report millidegrees Celsius; convert to whole degrees
     s1=$(cat /sys/devices/virtual/thermal/thermal_zone0/temp)
     s1t=$((s1/1000))
     s2=$(cat /sys/devices/virtual/thermal/thermal_zone1/temp)
     s2t=$((s2/1000))
     s3=$(cat /sys/devices/virtual/thermal/thermal_zone2/temp)
     s3t=$((s3/1000))
     s4=$(cat /sys/devices/virtual/thermal/thermal_zone3/temp)
     s4t=$((s4/1000))
     # thermal_zone4 is printed in the TGPU column
     g1=$(cat /sys/devices/virtual/thermal/thermal_zone4/temp)
     g1t=$((g1/1000))

     echo "$z, $fa, $fb, $fc, $fd, $s1t, $s2t, $s3t, $s4t, $g1t"
     sleep 2
     (( z += 2 ))
done
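If the output of watchtemp.sh is redirected to a file, the peak temperature reached during a run can be extracted afterwards. The sketch below assumes the comma-separated layout that the script prints, with temperatures in columns 6 to 10; `peak_temps` and the file name `temps.csv` are hypothetical.

```shell
#!/bin/bash
# Sketch: report the highest value seen in each temperature column of a
# watchtemp.sh log read from stdin (columns 6-10: T4, T5, T6, T7, TGPU).
peak_temps() {
    awk -F', *' '
        NR == 1 { next }                        # skip the header line
        {
            for (c = 6; c <= 10; c++)
                if ($c + 0 > peak[c]) peak[c] = $c + 0
        }
        END {
            printf "T4=%d T5=%d T6=%d T7=%d TGPU=%d\n",
                   peak[6], peak[7], peak[8], peak[9], peak[10]
        }'
}
```

For example, start the monitor with `./watchtemp.sh > temps.csv`, stop it after the mining test, and then run `peak_temps < temps.csv`.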
No power utilization testing has been done, only thermal testing. See last month's article for a preliminary thermal test. Many resources are used simultaneously while dual mining, and some crypto-algorithms use considerably more power than others. For example, scrypt2 (VRM) mining uses approximately 20-25% more power than the cryptonight (Monero) algorithm. Monitor power usage or do a power utilization study for a better understanding of ARM-Mali power usage before or while dual mining different crypto-algorithms.

When dual mining, be conservative so that you allow the rest of the OS to function, keep an eye on temperature, and be aware of power usage until you prove out both the CPU and GPU configurations; then you can lean into it more. Dual mining pushes these systems to their limits. This is a new frontier for ARM SBCs, so keep in mind you are on the sharp edge of extreme system utilization.
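One way to keep an eye on temperature unattended is a small guard that reads the same thermal zones as watchtemp.sh. This is a sketch under assumptions: the function names are hypothetical, and the 85 degree default is an arbitrary illustrative limit, not a tested or recommended value.

```shell
#!/bin/bash
# Sketch: print the hottest thermal zone reading, in whole degrees Celsius.
# The sysfs directory is a parameter so the function can be pointed at a test tree.
hottest_c() {
    local base="${1:-/sys/devices/virtual/thermal}"
    local max=0 t f
    for f in "$base"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(( $(cat "$f") / 1000 ))     # sysfs reports millidegrees
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    echo "$max"
}

# Return success (0) when the hottest zone is at or above the limit.
# The default of 85 is an arbitrary illustrative value.
too_hot() {
    local limit="${1:-85}" base="$2"
    [ "$(hottest_c "$base")" -ge "$limit" ]
}
```

A loop such as `while sleep 10; do too_hot 85 && pkill sgminer; done` could then stop the miner when the limit is reached. Choose the limit from your own thermal testing.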

Cryptonight (Monero Coin) testing on an Odroid-XU4, GPU only:

sgminer 5.5.6-ARM-RC1 - Started: [2018-03-13 03:06:15] - [0 days 12:38:48]
--------------------------------------------------------------------------------
(5s):22.52 (avg):23.85h/s | A:700000  R:10000  HW:48  WU:0.187/m
ST: 1  SS: 0  NB: 421  LW: 48993  GF: 21  RF: 0
Connected to pool.supportxmr.com (stratum) diff 5K as user 49cbPdjG8RUFjWau2aR9gR1bU6fsP7eGBfaXVsQuFtLrPrZkGpC4AuCEJsuKX
Block: e368fdd9...  Diff:905.5  Started: [15:44:30]  Best share: 325K
--------------------------------------------------------------------------------
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
GPU 0:                |  13.41/ 13.40h/s | R:  2.6% HW:24 WU:0.103/m I: 7
GPU 1:                |  10.45/ 10.45h/s | R:  0.0% HW:24 WU:0.084/m I: 7
--------------------------------------------------------------------------------
[13:41:07] Accepted 035e18c0 Diff 19.5K/5K GPU 0
[13:54:00] Accepted 058d63f3 Diff 11.8K/5K GPU 0
[13:55:45] Accepted 0b3c0178 Diff 5.83K/5K GPU 0
[14:04:37] Accepted 054a51d3 Diff 12.4K/5K GPU 1
[14:15:02] Accepted 014287d0 Diff 52K/5K GPU 1
[14:19:56] Accepted 018036d4 Diff 43.7K/5K GPU 1
[14:20:16] pool.supportxmr.com stale share detected, submitting (user)
[14:20:16] Accepted 052596c8 Diff 12.7K/5K GPU 0
[14:36:38] Stratum connection to pool.supportxmr.com interrupted
[14:40:09] Stratum connection to pool.supportxmr.com interrupted
[14:42:19] Accepted 0b9ad8d2 Diff 5.65K/5K GPU 0
[14:44:23] Accepted 04b3a1c8 Diff 13.9K/5K GPU 0
[14:44:43] Accepted 011e41a0 Diff 58.6K/5K GPU 0
[14:54:26] pool.supportxmr.com stale share detected, submitting (user)
[14:54:26] Accepted 060914e6 Diff 10.9K/5K GPU 0
[14:57:14] pool.supportxmr.com stale share detected, submitting (user)
[14:57:14] Accepted 0a2b9f80 Diff 6.44K/5K GPU 1
[15:04:20] pool.supportxmr.com stale share detected, submitting (user)
[15:04:20] Accepted 076a4aeb Diff 8.84K/5K GPU 0
[15:05:37] Accepted 09d5465e Diff 6.66K/5K GPU 0
[15:10:32] Accepted 0a066760 Diff 6.54K/5K GPU 1
[15:16:15] pool.supportxmr.com stale share detected, submitting (user)
[15:16:16] Accepted 06082f75 Diff 10.9K/5K GPU 1
[15:18:06] pool.supportxmr.com stale share detected, submitting (user)
[15:18:06] Accepted 0ce5243a Diff 5.08K/5K GPU 1
[15:18:30] Accepted ccf47695 Diff 81.9K/5K GPU 1
[15:30:46] Accepted 0857db6a Diff 7.86K/5K GPU 0
[15:30:57] Accepted 090f5ea6 Diff 7.23K/5K GPU 1
[15:31:34] Accepted 071e4b0f Diff 9.21K/5K GPU 1
[15:38:34] Accepted 0a0c5007 Diff 6.52K/5K GPU 0
[15:44:56] Accepted 88acc80f Diff 123K/5K GPU 0
The test ran for more than 12 hours and everything ran smoothly. The summary shows 2 actual rejected shares. I have a relatively slow Internet connection, so Stratum server disconnects and purported stale shares are not unusual.
Summary of runtime statistics:

[15:46:02] Started at [2018-03-13 03:06:15]
[15:46:02] Pool: stratum+tcp://pool.supportxmr.com:3333
[15:46:02] Runtime: 12 hrs : 39 mins : 47 secs
[15:46:02] Average hashrate: 0.0 Kilohash/s
[15:46:02] Solved blocks: 0
[15:46:02] Best share difficulty: 325K
[15:46:02] Share submissions: 142
[15:46:02] Accepted shares: 140
[15:46:02] Rejected shares: 2
[15:46:02] Accepted difficulty shares: 700000
[15:46:02] Rejected difficulty shares: 10000
[15:46:02] Reject ratio: 1.4%
[15:46:02] Hardware errors: 48
[15:46:02] Utility (accepted shares / min): 0.18/min
[15:46:02] Work Utility (diff1 shares solved / min): 0.19/min

[15:46:02] Stale submissions discarded due to new blocks: 0
[15:46:02] Unable to get work from server occasions: 21
[15:46:02] Work items generated locally: 49055
[15:46:02] Submitting work remotely delay occasions: 0
[15:46:02] New blocks detected on network: 421

[15:46:02] Summary of per device statistics:

[15:46:02] GPU0                | (5s):13.48 (avg):13.40h/s | A:380000 R:10000 HW:24 WU:0.103/m
[15:46:02] GPU1                | (5s):10.45 (avg):10.45h/s | A:320000 R:0 HW:24 WU:0.084/m
[15:46:02]
Figure 1 - The pool results verify the summary results
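The reject ratio and utility figures in the summary can be reproduced from the raw counts. The sketch below assumes the reject ratio is rejected shares divided by total submissions and that utility is accepted shares per minute of runtime; `share_stats` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: recompute two summary figures from the raw counts in the log.
# Arguments: accepted shares, rejected shares, runtime in seconds.
share_stats() {
    awk -v a="$1" -v r="$2" -v secs="$3" 'BEGIN {
        printf "reject ratio: %.1f%%\n", 100 * r / (a + r)
        printf "utility: %.2f/min\n", a / (secs / 60)
    }'
}

# 12 hrs : 39 mins : 47 secs, with 140 accepted and 2 rejected shares
share_stats 140 2 $(( 12*3600 + 39*60 + 47 ))
```

With the numbers from this run, the script prints a reject ratio of 1.4% and a utility of 0.18/min, which agrees with the summary above.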

With the increasing modification of sgminer, a Git repository was set up for ease of use and future modification. Likewise, the installation process has changed and no longer requires any AMD SDK, only the ARM Computer Vision and Machine Learning library. Below is the new procedure.

Download and install the latest ARM Computer Vision and Machine Learning library from https://github.com/ARM-software/ComputeLibrary/releases. Note that the Linux and Android libraries are now packaged separately, so the download fits on an 8GB SD card. Use the following command to extract the files:

$ tar -xvzf <filename to extract>
Next, install the dependencies and copy the OpenCL headers:
$ apt-get install automake autoconf pkg-config libcurl4-openssl-dev libjansson-dev libssl-dev libgmp-dev make g++ git libncurses5-dev libtool opencl-headers mali-fbdev
$ cp ./arm_compute-v18.03-bin-linux/include/CL/* /usr/include/CL/
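Before compiling, it can help to confirm that the build tools from the dependency list are actually on the PATH. This is a minimal sketch; `missing_tools` is a hypothetical helper, and it checks only for commands, not for the development libraries or headers.

```shell
#!/bin/bash
# Sketch: list any of the given commands that are not on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands provided by packages in the dependency list above.
missing=$(missing_tools automake autoconf autoreconf pkg-config make g++ git)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```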
Download sgminer-5.5.6-ARM-RC1 with the following command:
$ git clone https://github.com/hominoids/sgminer-arm
Then, compile the source code:
$ cd sgminer-arm
$ git submodule init
$ git submodule update
$ autoreconf -fi
$ CFLAGS="-Os -Wall -march=native -std=gnu99 -mfpu=neon" ./configure --disable-git-version --disable-adl --disable-adl-checks
You can optionally use the following command to be more explicit as to where you placed the library and headers:
$ CFLAGS="-Os -Wall -march=native -std=gnu99 -mfpu=neon -I/opt/arm_compute-v18.03-bin-linux/include/CL" LDFLAGS="-L/opt/arm_compute-v18.03-bin-linux/lib/linux-armv7a-neon-cl" ./configure --disable-git-version --disable-adl --disable-adl-checks

$ make -j5
Here is the script and settings used for the testing of XMR-Monero coin using the cryptonight algorithm:
#!/bin/bash

export GPU_FORCE_64BIT_PTR=1
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100
export GPU_MAX_HEAP_SIZE=100

./sgminer -k cryptonight -o stratum+tcp://pool.supportxmr.com:3333 -u username -p password -I 7 -w 32 -d 0,1 --thread-concurrency 8192 --monero --pool-no-keepalive
The ODROID Community now has the only multi-algorithm Linux ARM-Mali OpenCL GPU miner that I'm aware of in the crypto community! Remember to check the forum for more information and updates at https://forum.odroid.com/viewtopic.php?f=98&t=29571.
