HowTo run PoW GPU Miner

See: HowTo_run_PoW_Miner

The GPU miner has been open-sourced:

https://github.com/VeriBlock/nodecore-pow-cuda-miner
A Community AMD miner: https://github.com/monkins1010/nodecore-pow-cuda-miner
A fork with multiple GPU support: https://github.com/michael-kernel-sanders/nodecore-pow-cuda-miner/releases/tag/v0.3.7-mks

Overview

Get the GPU Miner here: https://github.com/VeriBlock/nodecore-pow-cuda-miner

The GPU miner is much faster than the CPU miner. However, it requires an NVidia GPU card to run.

Community Contributions

[DISCLAIMER]: The VeriBlock Team has not verified these scripts; they are community contributions.

Scripts and readme

Community member Owelet created this script and package: Community_Resource_Contributions#Preset_GPU_Miner_script_and_readme

Automatically kill the miner every 30 minutes and restart it (Windows):

The timeout of 1800 seconds equals 30 minutes.

:BEGIN
Start "" /b VeriBlock_CUDA9.2 -u YOUR_ADDRESS_HERE -o POOL_IP_HERE:8501 -l false -d 0
timeout /T 1800 /nobreak >nul
taskkill /IM VeriBlock_CUDA9.2.exe /F
GOTO BEGIN
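
Where a batch file is inconvenient, the same kill-and-restart cycle can be sketched in Python. This is a minimal, unverified sketch: `run_with_restarts` is a hypothetical helper name, and the executable name and flags would be the placeholders from the batch script above.

```python
import subprocess


def run_with_restarts(cmd, interval_s, cycles):
    """Start cmd, let it run for up to interval_s seconds, kill it, and repeat.

    Returns the number of times the process was started.
    """
    starts = 0
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        starts += 1
        try:
            # Wait up to interval_s; if the miner exits early, move on at once.
            proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # forceful stop, comparable to "taskkill /F"
            proc.wait()  # reap the process before starting the next one
    return starts
```

To mirror the batch script, pass the miner command as a list (for example `["VeriBlock_CUDA9.2", "-u", "YOUR_ADDRESS_HERE", "-o", "POOL_IP_HERE:8501", "-l", "false", "-d", "0"]`) with `interval_s=1800`.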

Prerequisites

The GPU Miner requires an NVidia card (9xx, 10xx, Titan X, Titan XP, Titan V, or corresponding Quadro/Tesla).

It requires a recent version of the NVidia display drivers. It should work on any driver >=391.25, but driver version 397.64 is recommended (download from here: https://www.geforce.com/drivers).

Version 0.3.5 requires drivers >= 411.

Steps to run

Linux

unzip veriblock-nodecore-pow-cuda-0.2.7_1_16_linux.zip
cd veriblock-nodecore-pow-cuda-0.2.7_1_16_linux
chmod a+x nodecore_pow_cuda
./nodecore_pow_cuda -o 127.0.0.1:8501 -u YOUR_ADDRESS_HERE -d 0
(Run the last command in a screen session or similar. For multiple cards, run one instance per card, changing -d 0 to -d 1, etc.)
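
As an illustration of the one-instance-per-card rule, the sketch below builds one command line per GPU, differing only in the -d ordinal. The helper name `per_card_commands` is hypothetical; the binary name and the -o, -u, and -d flags are the ones shown above.

```python
def per_card_commands(pool, address, device_count, binary="./nodecore_pow_cuda"):
    """Return one argv list per GPU; only the -d ordinal changes between them."""
    return [
        [binary, "-o", pool, "-u", address, "-d", str(device)]
        for device in range(device_count)
    ]


# Print the commands for a machine with two cards.
for argv in per_card_commands("127.0.0.1:8501", "YOUR_ADDRESS_HERE", 2):
    print(" ".join(argv))
```

Each printed line can then be started in its own screen session.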

Windows

Setup:

Download and unzip: veriblock-nodecore-pow-cuda-0.2.9_1_16_windows.zip

This contains a detailed README.txt. Please read it, as it answers many initial questions. Following the instructions in the README.txt, pick which version you'll run (such as "nvml\cpu_shares\release\sm_50").

Update the run.bat (using Notepad, Notepad++, or some other basic text editor) to contain your address (such as V5bLSbCqj9VzQR3MNANqL13YC2tUep) rather than the dummy placeholder.

VeriBlock.Miner.PoW.exe -o 127.0.0.1:8501 -u V5bLSbCqj9VzQR3MNANqL13YC2tUep -d 0

Run:

Run NodeCore, catch up to the most recent block (See: HowTo_run_NodeCore)
Run NC_CLI and run the "startpool" command (See: NodeCore_CommandLine#startpool)
In a specific folder via instructions in the readme (such as "nodecore-cudapow\nvml\gpu_shares\release\sm_52"), execute "run.bat"
1. If that crashes, follow the readme instructions to pick another folder. Note: If repeated crashes occur, please submit a GitHub issue at https://github.com/VeriBlock/nodecore-releases/issues including your OS version (Windows 7, 8, 8.1, 10...) and any other information you deem relevant (run as administrator, etc.).

Upon first running, several startup messages will appear.

The miner will then switch to a new screen showing the shares being submitted. The text here is updated in place; it is not continually appended like a log file.

Help Section

There is a help screen. Command-line options let you disable or enable logging and verbose output, and let you tweak performance variables (threads per block and block size).
Required Arguments:
-o <poolAddress>           The pool address to mine to in the format host:port
-u <username>              The username (often an address) used at the pool
Optional Arguments:
-p <password>              The miner/worker password to use on the pool
-d <deviceNum>             The ordinal of the device to use (default 0)
-tpb <threadPerBlock>      The threads per block to use with the Blake kernel (default 1024)
-bs <blockSize>            The blocksize to use with the vBlake kernel (default 512)
-l <enableLogging>         Whether to log to a file (default true)
-v <enableVerboseOutput>   Whether to enable verbose output for debugging (default false)

Example command line:
VeriBlock-NodeCore-PoW-CUDA -u VHT36jJyoVFN7ap5Gu77Crua2BMv5j -o 94.130.64.18:8501 -l false
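
Since -o expects the pool address in host:port form, a wrapper script could check the value before launching the miner. The sketch below is only an illustration: `parse_pool_address` is a hypothetical helper, not part of the miner.

```python
def parse_pool_address(value):
    """Split a "host:port" string into (host, port); raise ValueError if malformed."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    port_number = int(port)
    if not 1 <= port_number <= 65535:
        raise ValueError(f"port out of range: {port_number}")
    return host, port_number


# The pool address from the example command line above.
print(parse_pool_address("94.130.64.18:8501"))
```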

FAQ

What is the performance for each type of GPU card?

See: GPU_Performance

Does the GPU miner accept domain names?

As of version 0.3.5, yes:

NodeCore-PoW-CUDA -o some-pool-domain.xyz:8501 -u YOUR_VBK_ADDRESS_HERE -d 0

Is there a Linux version?

Yes.

How can I run the miner in a loop?

The default run.bat and run.sh (on GitHub) run in a loop.

Does the GPU miner make the CPU miner obsolete?

In one sense, yes: the GPU miner may be 100x faster than a CPU miner. However, the CPU miner still serves as a reference implementation. It can still join a pool and submit shares.

What does the file structure mean?

There are many different versions of the miner available, which is explained in the README. Overall, three choices must be made:

1.) NVML, yes/no? NVML allows the program to query lots of device-specific information (clock speed, memory speed, temperature), as well as detailed information regarding your current NVidia driver. Try NVML software first, and if it fails try the non-NVML equivalent.

2.) CPU_SHARES vs GPU_SHARES? The CPU_SHARES version of the GPU miner submits shares at the same difficulty as the CPU miner. Because it is several orders of magnitude faster than the CPU miner, it finds these shares incredibly often. If you want to mine as fast as possible on a CPU-dominated testnet, you will need to run the CPU_SHARES version. However, if you want to see the true performance of the GPU miner in its "natural habitat" (pool software serving GPU-miner-level shares), run the GPU_SHARES version. In other words, the GPU_SHARES version reflects real-world mainnet performance of the GPU miner, while the CPU_SHARES version submits many "low difficulty" shares, which is in line with the current CPU-dominated behavior of the VeriBlock NodeCore testnet.

3.) Benchmark or Release? The benchmark version of the software will allow you to "mine" even if it cannot connect to a pool. Normally, mining software should not mine unless it can submit the results of mining ("shares") back to the pool and get rewarded. However, some users may wish to run the miner in isolation to test performance, without having to run an instance of NodeCore (or find a public pool). In short, if you are looking to mine VeriBlock testnet coins, use the release version. If you are interested in testing the performance of particular hardware without actually attempting real mining, use the benchmark version.

The file structure of the CUDA GPU miner is:

no_nvml
- cpu_shares
  - benchmark
    - sm_50
      - cudart64_91.dll
      - run.bat
      - VeriBlock.Miner.PoW.exe
    - sm_52
    - sm_60
    - sm_61
    - sm_70
  - release
- gpu_shares
  - ...
nvml
- ...
README.txt
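
The folder layout above maps directly onto the choices described earlier, plus the sm_XX folder. As a rough sketch (the helper name `miner_folder` is hypothetical, and the defaults follow the fallback combination recommended in the readme), the path could be assembled like this:

```python
VALID_CHOICES = {
    "nvml": ("nvml", "no_nvml"),
    "shares": ("cpu_shares", "gpu_shares"),
    "build": ("release", "benchmark"),
    "arch": ("sm_50", "sm_52", "sm_60", "sm_61", "sm_70"),
}


def miner_folder(nvml="no_nvml", shares="cpu_shares", build="release", arch="sm_50"):
    """Join the four choices into the relative folder holding run.bat."""
    chosen = {"nvml": nvml, "shares": shares, "build": build, "arch": arch}
    for level, value in chosen.items():
        if value not in VALID_CHOICES[level]:
            raise ValueError(f"unknown {level} choice: {value!r}")
    return "/".join([nvml, shares, build, arch])


print(miner_folder())
print(miner_folder(nvml="nvml", shares="gpu_shares", arch="sm_52"))
```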

Can the GPU miner run on a CPU pool?

Can I use Hive OS to mine?

See: HiveOS

How to create an address without needing a NodeCore full node

How to estimate PoW mining reward

DISCLAIMER: This is just an estimate, not a guarantee. Network hash could vary.

Community estimates of hash rates from various GPU cards here: https://github.com/VeriBlock/nodecore-pow-cuda-miner/wiki/Performance

One way to estimate mining rewards:

Get your current hashrate from your miner(s)
Get the current network hashrate
1. This is available on the explorer
2. This is also available through the API, which returns a JSON blob with "hashRate" in hashes/second, such as: https://wiki.veriblock.org/index.php?title=Dashboard_API#stats.2Fnetwork
Get the total POW rewards in a 24 hr period: This could be approximated as 2880 blocks per day (on average a block every 30 seconds).

Equation:

Projected PoW reward over 24 hours = (Your Hash) / (Network Hash) * (Network PoW Reward)

Example (your actual hash and network hash may vary!):

Your Hash = 800 MH/s = 800,000,000 H/s
Network Hash = 4 TH/s = 4,000,000,000,000 H/s
Network PoW Reward = 2880 blocks/day (1 block every 30 seconds) = 126000 tVBK/day

Projected PoW reward = (800,000,000 / 4,000,000,000,000) * 126000
Projected PoW reward = 25.2 tVBK/day
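
The equation can also be written as a small function. This is a sketch under stated assumptions: the function name is hypothetical, hash rates use decimal prefixes (1 MH/s = 10^6 H/s, 1 TH/s = 10^12 H/s), and the 126000 tVBK/day figure is taken from the example above. With those inputs it yields roughly 25.2 tVBK/day.

```python
MH = 10 ** 6   # hashes per second in 1 MH/s (decimal prefix, an assumption)
TH = 10 ** 12  # hashes per second in 1 TH/s (decimal prefix, an assumption)


def projected_daily_reward(your_hashrate, network_hashrate, network_daily_reward):
    """Your share of the network hash rate times the daily network PoW reward."""
    if network_hashrate <= 0:
        raise ValueError("network_hashrate must be positive")
    return your_hashrate / network_hashrate * network_daily_reward


estimate = projected_daily_reward(800 * MH, 4 * TH, 126000)
print(round(estimate, 2))
```

As with the worked example, this is only an estimate; the network hash rate changes over time.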

Readme.txt

This is copied from the readme.txt for convenience:

The VeriBlock CUDA GPU miner requires a compatible (9xx, 10xx, Titan X, Titan XP, Titan V, or corresponding Quadro/Tesla) NVidia card, as well as a minimum NVidia driver version of 391.35. The miner can only be compiled against the CUDA 9.1 toolkit, as previous CUDA versions had errors with PTX addition carry.

The GPU miner is run using a .bat file (or can be run directly from the command line). If running from the command line, make sure the required .dll files are present in your working directory.

This miner currently only mines on one GPU at a time. If you wish to mine on multiple GPUs simultaneously, you can run multiple instances of the miner (with a different -d flag set for each). By default, the miner mines on GPU 0.

-------------------------------------------------------------------------------------
| If unsure which version to choose (details below), the most sure-fire one is: |
| no_nvml/cpu_shares/release/sm_50 |
-------------------------------------------------------------------------------------

There are many different versions of the miner available:

First-level choice (don't know? choose "no_nvml"):
nvml: If on, uses the NVML library to query the current statistics of your device (clock speed, memory, temperature)
no_nvml: Does not attempt to query device stats.

Second-level choice (don't know? choose "cpu_shares"):
cpu_shares: Shares are submitted at the same difficulty as the CPU miner does (2^24)
gpu_shares: Shares are submitted at the GPU difficulty (2^32) **NOTE: A known display bug causes the gpu versions to still show 2^24 as the share difficulty in the logs, this can be ignored!** ALSO NOTE: this version will mine far less than expected as long as the difficulty stays below an equivalent 2^32 share target!

Third-level choice (don't know? choose "release"):
release: This version will only run when a pool connection is available, and will exit if the pool disconnects
benchmark: This version will attempt to connect to a pool, but if a pool is unavailable, it will mine with dummy header data, allowing benchmarking without requiring a pool

Fourth-level choice (don't know? choose "sm_50"):
sm_50: For: 750 Ti, 750, 950M, 960M, 930M, 8xxM, M10, K1200, K2200, K620, M2000M, M1000M, M6500M, K620M, M10 GPUs
sm_52: For: Titan X, 980 Ti, 980, 970, 960, 950, 980M, 970M, 965M, M6000, M5000, M4000, M2000, M5500, M5000M, M4000M, M3000M, M4, M40, M6, M60
sm_60: GP100, P100
sm_61: Titan Xp, Titan X, GTX 1080 Ti, 1080, 1070 Ti, 1070, 1060, 1050 GTX 1050, P6000, P5000, P4000, P2000, P1000, P600, P400, P5000M, P4000M, P3000M, P40, P6, P4
sm_70: Titan V, V100
**Note: you can also use a version lower than that of your device, for example a 1080 Ti can use sm_50, sm_52, sm_60, and sm_61! You can try out different versions if you would like.**
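
The note above (a device can run any build at or below its own sm level) can be expressed as a short lookup. This sketch is not part of the original readme; `usable_builds` is a hypothetical helper, and the device level (for example 61 for a 1080 Ti, per the list above) must be supplied by the user.

```python
AVAILABLE_BUILDS = (50, 52, 60, 61, 70)  # the sm_XX folders shipped with the miner


def usable_builds(device_sm):
    """List the sm_XX folders that a device at level device_sm can run."""
    return [f"sm_{build}" for build in AVAILABLE_BUILDS if build <= device_sm]


# A 1080 Ti is listed under sm_61.
print(usable_builds(61))
```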

Once traversing the appropriate folders to get to the version of the miner you wish to run, please edit the .bat file (input your own address, and optionally a different pool address if you are not solo-mining locally). In order to solo-mine, you will need a running instance of NodeCore, and will need to have typed "startpool" in the CLI to start the pool service. If you are mining to a pool operated by a third party, they will provide the appropriate connection (host, port, etc.).

A log file "cuda-miner.log" will be created when you run the GPU miner. If you have any issues with the GPU miner, please check that for information regarding the issue (and include relevant parts of the log if you submit a bug report or ask for help with troubleshooting).

FAQ:
1.) Does it support Linux?
The code itself does support Linux, but only Windows binaries are being provided at the moment. At launch, instructions for compiling and running on Ubuntu will be available.

2.) Why does it flash?
The code clears the terminal and re-prints it on each update.

3.) Why does the cpu_shares version flash far more than the gpu_shares version?
The cpu_shares version of the miner allows 256x the number of shares to be returned from the GPU as the gpu_shares version, and so it updates far more often.

4.) Why does the cpu_shares version find more shares than the gpu_shares version?
In PoW mining, the goal is to find a header for which the hash falls below a given target. Shares are solutions which fall below a higher target (and are thus found more often, and only very few shares are valid block solutions). The current pool implementation in NodeCore serves shares that are appropriate for CPU miners. If a GPU miner were to only search for gpu-difficulty shares on a pool serving CPU shares, it would be throwing away a lot of shares that the pool would accept as valid (and compensate the miner for). The gpu_shares version only searches for shares that are 256x more difficult than the CPU miner.

5.) Why does the cpu_shares version have a lower hashrate?
The cpu_shares version finds far more valid shares, and must return these from the GPU back to the system (and eventually, submit them back to the pool). This constant interruption means that the GPU is not fully utilized, but instead spends significant time idle while valid CPU-difficulty shares are copied back to the host system. Optimizations could be made to avoid this problem, but by the time GPU miners will need to run at their full speed, they will be mining to pools which share GPU-difficulty shares, making this problem irrelevant.

6.) Why does the gpu_shares version seem to earn far less than the cpu_shares version?
The gpu_shares version only submits shares that are at a difficulty 256x higher than that of CPU shares. As long as the network difficulty is less than a single GPU share, the gpu_shares version of the miner is throwing away perfectly good results.

7.) [GPU Miner is now full-speed, not throttled anymore]
Is this version throttled?
Yes, this version only produces shares ~0.4% of the time (which makes GPUs competitive with CPUs on testnet). This has been done to avoid significant difficulty swings on testnet, and to allow CPU miners to also participate in the testnet (mine coins to send around, etc.). Our final version will not be neutered.

8.) I keep getting "cudaDeviceSynchronize returned error code 4 after launching grindNonces!"?
This is generally associated with the miner requesting too many resources. Check whether other processes are running and using a significant portion of the GPU's resources. Also note that the startup parameters in the mining software are set to cater to cards normally purchased for mining, and so mid-range to low-range cards (like the GTX 1050, GTX 1030, GTX 950) may have insufficient resources. We may provide builds in the future with less aggressive launch parameters to target these cards.
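
FAQ items 3 to 6 all follow from the 256x gap between the two share difficulties (2^24 and 2^32). The sketch below is an illustration that is not part of the original readme. It assumes that each hash independently qualifies as a share with probability 1/difficulty, which is a simplification; the function name is hypothetical.

```python
CPU_SHARE_DIFFICULTY = 2 ** 24  # cpu_shares builds
GPU_SHARE_DIFFICULTY = 2 ** 32  # gpu_shares builds


def expected_shares_per_second(hashrate, share_difficulty):
    """Expected share rate, assuming a 1/share_difficulty chance per hash."""
    return hashrate / share_difficulty


# Same hash rate, two share difficulties: the ratio is the 256x from the FAQ.
hashrate = 800 * 10 ** 6  # 800 MH/s, an arbitrary illustrative value
cpu_rate = expected_shares_per_second(hashrate, CPU_SHARE_DIFFICULTY)
gpu_rate = expected_shares_per_second(hashrate, GPU_SHARE_DIFFICULTY)
print(cpu_rate / gpu_rate)
```

Under this assumption the cpu_shares build reports shares 256 times as often, which is why it updates the screen more frequently.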

Troubleshooting

Check the log file

The GPU miner outputs to a log file, such as "cuda-miner.log". If you encounter errors, this log file should have information to help troubleshoot.
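
A quick way to pull the relevant lines out of the log is to filter it for problem keywords. The sketch below is an assumption-laden illustration: the log format is not documented here, so it simply matches the words "error" and "failed", which appear in the error messages listed on this page. The helper name is hypothetical.

```python
def find_problem_lines(log_path, keywords=("error", "failed")):
    """Return (line_number, text) pairs for log lines containing any keyword."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        for number, line in enumerate(log_file, start=1):
            lowered = line.lower()
            if any(keyword in lowered for keyword in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, `find_problem_lines("cuda-miner.log")` returns the matching lines, which can be pasted into a bug report.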

Graphics Card is not working

Ensure that your system has an NVidia GPU card. On Windows (the only platform supported for the initial CUDA miner release), you should be able to see NVidia in your Device Manager and open the NVidia Control Panel (an item in "Control Panel"). If you cannot open the "NVidia Control Panel", or the system does not have at least a GTX 9xx or 10xx GPU (or a Titan, Quadro, or Tesla card using the Maxwell, Pascal, or Volta architecture) with driver version 397..., solve that first before proceeding.

If you get any of the following errors, then most likely your driver does not meet the criteria mentioned above.

addKernel launch failed: no kernel image is available for execution on the device
grindNonces failed!

CUDA encountered an error while setting the device to 0: 35
Last error: CUDA driver version is insufficient for CUDA runtime version
cudaSetDevice failed!  Do you have a CUDA-capable GPU installed?grindNonces failed!
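
Because these errors usually point to an outdated driver, it can help to compare the installed driver version against the required minimum numerically rather than as text. A minimal sketch, with hypothetical helper names; the version strings are the ones quoted on this page, and the installed version has to be read from the NVidia Control Panel or a similar tool.

```python
def parse_driver_version(text):
    """Turn a version string such as "397.64" into a tuple of ints: (397, 64)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, minimum):
    """True if the installed driver version is at least the minimum version."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)


print(driver_meets_minimum("397.64", "391.35"))
print(driver_meets_minimum("390.77", "391.35"))
```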

The GPU miner does not show up on a pool

Make sure you connect the GPU Miner to a GPU pool, not a CPU pool.

Appendix

TODO