31 March 2019

Speeding up my CNN, on GPU

The story so far: I now have a new PC, a self-compiled GPU build of TensorFlow, and a working environment. I have some RGB LEDs in my PC and on my keyboard (they speed everything up, this is important).

I ran my model just on my CPU and it was taking ~570s per epoch.

On home-built TensorFlow it was taking ~735s (I only had two data points, and I don't think I had configured it properly).

With the GPU it was taking 1009s.

Running like that and viewing nvidia-smi, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  8%   62C    P2    44W / 151W |   1291MiB /  8116MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1322      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      1364      G   /usr/bin/gnome-shell                          51MiB |
|    0      1636      G   /usr/lib/xorg/Xorg                           413MiB |
|    0      1755      G   /usr/bin/gnome-shell                         224MiB |
|    0      3112      G   ...quest-channel-token=4965395672649938349   160MiB |
|    0     19665      G   ...than/anaconda3/envs/p36TFGJT/bin/python     3MiB |
|    0     19782      C   ...than/anaconda3/envs/p36TFGJT/bin/python   399MiB |
+-----------------------------------------------------------------------------+
htop gives me:
  1  [||||                                                                              3.3%]   4  [|||||                                                                             4.6%]
  2  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                       66.4%]   5  [||||                                                                              3.3%]
  3  [||||||||||||||||||||||||||||||||||                                               38.5%]   6  [||||                                                                              3.3%]
  Mem[||||||||||||||||||||||||||||||||                                           5.82G/31.3G]   Tasks: 207, 1066 thr; 2 running
  Swp[                                                                              0K/2.00G]   Load average: 1.00 2.10 3.08 
                                                                                                Uptime: 19:21:32

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
19782 me   20   0 18.8G 1785M  519M S 105.  5.6  0:48.31 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19922 me   20   0 18.8G 1785M  519M R 99.0  5.6  0:40.12 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 1755 me   20   0 4254M  435M 96436 S  3.9  1.4  5:13.84 /usr/bin/gnome-shell
19665 me   20   0 3068M  233M  107M S  2.0  0.7  0:04.11 /anaconda3/envs/p36TFGJT/bin/python /home/jonathan/anaconda3/envs/p36TFGJT/bin/spyder
 1636 root       20   0  619M  170M 85348 S  1.3  0.5  3:09.35 /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -background none -noreset -keeptty -verbose 3
 2293 me   20   0  797M 46732 28624 S  1.3  0.1  0:05.04 /usr/lib/gnome-terminal/gnome-terminal-server
19893 me   20   0 18.8G 1785M  519M S  1.3  5.6  0:00.31 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19898 me   20   0 18.8G 1785M  519M S  1.3  5.6  0:00.30 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 3072 me   20   0 1526M  324M  120M S  0.7  1.0 10:02.91 /opt/google/chrome/chrome
18139 me   20   0 41792  5956  3880 R  0.7  0.0  0:16.52 htop
 3112 me   20   0  786M  270M 99900 S  0.7  0.8  4:48.11 /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=4918414677377325872,16218879521054795833,131072 --gpu-preference
19895 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.29 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19896 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.29 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19820 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.22 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19897 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.83 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 1993 me   20   0  789M 38508 29340 S  0.7  0.1  3:39.50 psensor
19894 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.31 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 3578 me   20   0  816M  163M 71896 S  0.7  0.5  2:19.69 /opt/google/chrome/chrome --type=renderer --field-trial-handle=4918414677377325872,16218879521054795833,131072 --service-pipe-toke
 1642 root       20   0  619M  170M 85348 S  0.7  0.5  0:04.79 /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -background none -noreset -keeptty -verbose 3
19892 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.07 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19891 me   20   0 18.8G 1785M  519M S  0.0  5.6  0:00.29 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 3240 me   20   0  763M  104M 64516 S  0.0  0.3  1:32.81 /opt/google/chrome/chrome --type=renderer --field-trial-handle=4918414677377325872,16218879521054795833,131072 --service-pipe-toke
19785 me   20   0 3068M  233M  107M S  0.0  0.7  0:00.17 /anaconda3/envs/p36TFGJT/bin/python /home/jonathan/anaconda3/envs/p36TFGJT/bin/spyder
 3093 me   20   0 1526M  324M  120M S  0.0  1.0  3:25.67 /opt/google/chrome/chrome
F1Help  F2Setup F3SearchF4FilterF5Tree  F6SortByF7Nice -F8Nice +F9Kill  F10Quit


I don't seem to be stressing my system, particularly. I want to stress it, it can work harder, dammit.

I need to read this:
https://stackoverflow.com/questions/41948406/why-is-my-gpu-slower-than-cpu-when-training-lstm-rnn-models

But this one: https://medium.com/@joelognn/improving-cnn-training-times-in-keras-7405baa50e09
has a quick fix (I like quick fixes!).

In my .fit_generator call, add:


use_multiprocessing=True,
workers=6,

So let's try that. (The number of workers recommended in that post is actually higher, but that's on a Ryzen 7 with 16 threads available; I have an i5 with six threads available.)
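In context, the call ends up looking something like the sketch below (the model and generator names and the step counts are placeholders, not my actual code):

model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=25,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    use_multiprocessing=True,   # run the generator in separate worker processes
    workers=6)                  # roughly one worker per available CPU thread on my i5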


Let's try this.


nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  8%   62C    P2    47W / 151W |   1303MiB /  8116MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1322      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      1364      G   /usr/bin/gnome-shell                          51MiB |
|    0      1636      G   /usr/lib/xorg/Xorg                           413MiB |
|    0      1755      G   /usr/bin/gnome-shell                         224MiB |
|    0      3112      G   ...quest-channel-token=4965395672649938349   172MiB |
|    0     20260      G   ...than/anaconda3/envs/p36TFGJT/bin/python     3MiB |
|    0     20376      C   ...than/anaconda3/envs/p36TFGJT/bin/python   399MiB |
+-----------------------------------------------------------------------------+


It definitely seems to be using more of the GPU (utilisation up from 2% to 11%).
htop was hard to get a clean view of, because it was hammering all cores :) That's what I like to see.

Epoch time was 198s

However, Keras does give me a warning:

UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
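
For reference, the keras.utils.Sequence approach the warning points at would look roughly like the sketch below. This is a generic example with made-up array names, not my actual data pipeline; the point is that batches are fetched by index, so workers don't end up serving the same data twice.

import numpy as np
from keras.utils import Sequence

class BatchSequence(Sequence):
    """Index-based batching; safe to use with use_multiprocessing=True."""
    def __init__(self, x, y, batch_size=32):
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # return the idx-th batch
        start = idx * self.batch_size
        end = start + self.batch_size
        return self.x[start:end], self.y[start:end]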

Hmm, is this really faster, or is it just not doing as good a job?

Let's try a real run with a lot more epochs and see what results I get. 1 3/4 hours for 25 epochs. Not too bad.

I really do need to start tweaking my model now though. Something for later, I guess.
