I ran my model just on my CPU and it was taking ~570s per epoch.
On my home-built TensorFlow it was taking 735s (I only had two data points, and I don't think I had configured it properly).
With the GPU it was taking 1009s per epoch.
Running like that and viewing nvidia-smi I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| 8% 62C P2 44W / 151W | 1291MiB / 8116MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1322 G /usr/lib/xorg/Xorg 26MiB |
| 0 1364 G /usr/bin/gnome-shell 51MiB |
| 0 1636 G /usr/lib/xorg/Xorg 413MiB |
| 0 1755 G /usr/bin/gnome-shell 224MiB |
| 0 3112 G ...quest-channel-token=4965395672649938349 160MiB |
| 0 19665 G ...than/anaconda3/envs/p36TFGJT/bin/python 3MiB |
| 0 19782 C ...than/anaconda3/envs/p36TFGJT/bin/python 399MiB |
+-----------------------------------------------------------------------------+
htop gives me:
  1 [||||                         3.3%]    4 [|||||                        4.6%]
  2 [|||||||||||||||||||||||||   66.4%]    5 [||||                         3.3%]
  3 [||||||||||||||              38.5%]    6 [||||                         3.3%]
  Mem[|||||||||||||        5.82G/31.3G]    Tasks: 207, 1066 thr; 2 running
  Swp[                        0K/2.00G]    Load average: 1.00 2.10 3.08   Uptime: 19:21:32

    PID USER  PRI NI  VIRT   RES   SHR  S CPU%  MEM%  TIME+    Command
  19782 me     20  0 18.8G  1785M  519M S 105.   5.6  0:48.31  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
  19922 me     20  0 18.8G  1785M  519M R 99.0   5.6  0:40.12  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
   1755 me     20  0 4254M   435M 96436 S  3.9   1.4  5:13.84  /usr/bin/gnome-shell
  19665 me     20  0 3068M   233M  107M S  2.0   0.7  0:04.11  /anaconda3/envs/p36TFGJT/bin/python /home/jonathan/anaconda3/envs/p36TFGJT/bin/spyder
   1636 root   20  0  619M   170M 85348 S  1.3   0.5  3:09.35  /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -background none -noreset -keeptty -verbose 3
   2293 me     20  0  797M  46732 28624 S  1.3   0.1  0:05.04  /usr/lib/gnome-terminal/gnome-terminal-server
  19893 me     20  0 18.8G  1785M  519M S  1.3   5.6  0:00.31  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
  19898 me     20  0 18.8G  1785M  519M S  1.3   5.6  0:00.30  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
   3072 me     20  0 1526M   324M  120M S  0.7   1.0 10:02.91  /opt/google/chrome/chrome
  18139 me     20  0 41792   5956  3880 R  0.7   0.0  0:16.52  htop
   3112 me     20  0  786M   270M 99900 S  0.7   0.8  4:48.11  /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=4918414677377325872,16218879521054795833,131072 --gpu-preference
  19895 me     20  0 18.8G  1785M  519M S  0.7   5.6  0:00.29  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
  19896 me     20  0 18.8G  1785M  519M S  0.7   5.6  0:00.29  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
  19820 me     20  0 18.8G  1785M  519M S  0.7   5.6  0:00.22  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
  19897 me     20  0 18.8G  1785M  519M S  0.7   5.6  0:00.83  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
   1993 me     20  0  789M  38508 29340 S  0.7   0.1  3:39.50  psensor
  19894 me     20  0 18.8G  1785M  519M S  0.7   5.6  0:00.31  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
   3578 me     20  0  816M   163M 71896 S  0.7   0.5  2:19.69  /opt/google/chrome/chrome --type=renderer --field-trial-handle=4918414677377325872,16218879521054795833,131072 --service-pipe-toke
   1642 root   20  0  619M   170M 85348 S  0.7   0.5  0:04.79  /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -background none -noreset -keeptty -verbose 3
  19892 me     20  0 18.8G  1785M  519M S  0.7   5.6  0:00.07  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
  19891 me     20  0 18.8G  1785M  519M S  0.0   5.6  0:00.29  /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
   3240 me     20  0  763M   104M 64516 S  0.0   0.3  1:32.81  /opt/google/chrome/chrome --type=renderer --field-trial-handle=4918414677377325872,16218879521054795833,131072 --service-pipe-toke
  19785 me     20  0 3068M   233M  107M S  0.0   0.7  0:00.17  /anaconda3/envs/p36TFGJT/bin/python /home/jonathan/anaconda3/envs/p36TFGJT/bin/spyder
   3093 me     20  0 1526M   324M  120M S  0.0   1.0  3:25.67  /opt/google/chrome/chrome
I don't seem to be stressing my system particularly. I want to stress it; it can work harder, dammit.
I need to read this:
https://stackoverflow.com/questions/41948406/why-is-my-gpu-slower-than-cpu-when-training-lstm-rnn-models
But this one: https://medium.com/@joelognn/improving-cnn-training-times-in-keras-7405baa50e09
has a quick fix (I like quick fixes!).
In my .fit_generator call, add:
use_multiprocessing=True, workers=6,
So let's try that. (The number of workers recommended is actually higher, but that was on a Ryzen 7 with 16 threads available; I have an i5 with six threads.)
Let's try this.
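Concretely, the call ends up looking roughly like this (`model` and `training_generator` are placeholders for my actual model and generator; the other arguments stay whatever they already were):

# Sketch only: `model` and `training_generator` stand in for my real objects.
model.fit_generator(
    training_generator,
    steps_per_epoch=steps_per_epoch,   # unchanged
    epochs=epochs,                     # unchanged
    use_multiprocessing=True,          # feed batches from worker processes
    workers=6)                         # roughly one per available CPU thread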
nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| 8% 62C P2 47W / 151W | 1303MiB / 8116MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1322 G /usr/lib/xorg/Xorg 26MiB |
| 0 1364 G /usr/bin/gnome-shell 51MiB |
| 0 1636 G /usr/lib/xorg/Xorg 413MiB |
| 0 1755 G /usr/bin/gnome-shell 224MiB |
| 0 3112 G ...quest-channel-token=4965395672649938349 172MiB |
| 0 20260 G ...than/anaconda3/envs/p36TFGJT/bin/python 3MiB |
| 0 20376 C ...than/anaconda3/envs/p36TFGJT/bin/python 399MiB |
+-----------------------------------------------------------------------------+
It definitely seems to be using more of the GPU.
htop was hard to get a view of, because it was hammering all cores :) That's what I like to see.
Epoch time was 198s, down from 1009s.
However, Keras does give me a warning:
UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the`keras.utils.Sequence class.
UserWarning('Using a generator with `use_multiprocessing=True`'
Hmm, is this really faster, or is it just not doing as good a job?
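For reference, the duplicate-data risk is because a plain generator is simply pulled from by each worker, so two workers can end up yielding the same batch. The keras.utils.Sequence class the warning points at is indexed instead, so each worker fetches a distinct batch. A minimal sketch of what that could look like (the class name, the x/y arrays and the batch size are placeholders, not my actual pipeline):

import numpy as np
from keras.utils import Sequence

class BatchSequence(Sequence):
    # Placeholder name; wraps in-memory arrays x and y into indexed batches.
    def __init__(self, x, y, batch_size=32):
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # Each worker asks for a specific batch index, so no batch is
        # produced twice the way it can be with a plain generator.
        lo = idx * self.batch_size
        return self.x[lo:lo + self.batch_size], self.y[lo:lo + self.batch_size]

# Then pass BatchSequence(x_train, y_train) to fit_generator with the same
# use_multiprocessing=True, workers=6 arguments as above.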
Let's try a real run with a lot more epochs and see what results I get. 1¾ hours for 25 epochs. Not too bad.
I really do need to start tweaking my model now though. Something for later, I guess.