15 May 2019

And now for a little Cthulhu

I decided to have a little play with word clouds...

I found a nice wordcloud library (https://github.com/amueller/word_cloud), and the complete works of Lovecraft. What could be better :)


I took some liberties (I did some text replacing), so this is a biased wordcloud.

The code is mostly copied from the examples for the library itself.

 from wordcloud import WordCloud
 import matplotlib.pyplot as plt
 from PIL import Image
 import numpy as np

 sFile = 'The Complete Works of H.P. Lovecraft.txt'
 iMap = 'Cthulhu.jpg'

 # My list of (find, replace) pairs - the entries below are just placeholders;
 # the real list is where the bias comes in
 lReplace = [('old one', 'old_one'), ('shoggoth', 'SHOGGOTH')]

 # read the text, dropping blank lines and lower-casing everything
 lDoc = []
 with open(sFile, 'r') as f:
   for s in f.readlines():
     if s.strip() != '':
       lDoc.append(s.lower())
 text = ' '.join(lDoc)
 for l in lReplace:
   text = text.replace(l[0], l[1])

 # use the image as a mask so the cloud takes its shape
 mask = np.array(Image.open(iMap))
 wc = WordCloud(background_color="white", max_words=2000, mask=mask,
         contour_width=3, contour_color='steelblue')
 # generate word cloud
 wc.generate(text)
 # store to file
 wc.to_file("cthulhu.png")
 # show the cloud, then the raw mask for comparison
 plt.imshow(wc, interpolation='bilinear')
 plt.axis("off")
 plt.figure()
 plt.imshow(mask, cmap=plt.cm.gray, interpolation='bilinear')
 plt.axis("off")
 plt.show()

6 April 2019

Thoughts on Ubuntu in 2019

I'm a Mac user at home, and have been since 10.3 / Panther. Before that, Windows (I worked mostly with NT4, and an early version of NT5 on DEC Alphas!). I also have some Sun SPARC experience; I once had a 'pizza box' SPARCstation as my home PC, thanks to a company that was retiring them and my wanting to try one out.

I've used various flavours of Unix in a work environment, but couldn't honestly describe the differences between the main branches. I used to know my way round NetWare reasonably well, I've used BSD on a DEC Alpha, and I was proficient with OS/400. I used to test software on Mac OS 8 (I hated pre-OS X Macs).

My first experience with Linux was SuSE 6, 6.0 I think, on an Intel home PC back around 1999. I kind of liked playing round with it, but... I preferred the SPARCstation once I got it, and to be honest Windows was a lot easier to run. I think I was using Win95 and NT4 at home at the time, and, well, Doom, Unreal &c.

So SuSE didn't have a long tenancy on my home machine.

I played with various versions over the years, but once I got an iBook, and then a PowerBook with a working version of OS X, that gave me enough of the Unix-style underpinnings, and almost annoyingly it did Just Work, mostly. Those were the days of rampant Windows viruses, which using OS X turned into an almost interesting spectator sport. As a contractor I was on site with a PowerBook when the virus du jour took down the whole Windows infrastructure - we ended up with Windows laptops networked through my Mac. It was kind of amusing; up till then everyone had scoffed at the Mac, but by Monday a lot of people had PowerBooks! I had a black Intel MacBook the day they were released - yes, I paid extra just to get one in black. I loved that machine. I've had a few Apple laptops since, but to be honest, up until the one I have now they haven't always seemed as usable, stable or solid as they should have been.

I've used Raspberry Pis for various home projects over the years, so have often had Debian variants around, but they have never been a significant part of my home computer usage.

Anyway - that's all to say: I'm not a Linux or Unix guru by any stretch, but I'm not a total noob, and I've got some experience of different OSes.

And here we are - a new PC on my desk, because my laptop is getting a bit creaky. I want to use tensorflow-gpu, for which Linux is recommended, and Ubuntu seems to be the typical home install. And that's what I've been using.

Previously, on dual-boot Win/Linux machines, I've ended up booting to Windows more often than not - just to get stuff done, rather than fiddling with configuration and rebuilding kernels to try and optimise for my machine.

(Hmm, I haven't built myself a new kernel specific to this machine; maybe I should.)

I've only been using this machine for a month or so, but I've only booted to Windows (after my shaky Ubuntu start) a few times - I think pretty much only when I've worked from home. Ubuntu (18.04 as of today) hasn't really needed much configuration. I installed (minimal install, in case you are wondering), added some extra bits and pieces and bells and whistles, but not much. I am running off a PCIe M.2 drive, and 'only' have 250GB for the main OS (most of my home directory is on a 1TB SATA SSD and a 3TB HDD, but some is on the fast M.2 drive), so it's got a bit of oomph to it.

Do I like it, though? I think it's unfair to say - it's only been a month on a new OS, and I don't know the ins and outs of it yet. With that proviso - I'm both impressed and a bit meh.

Impressed.
It works. I didn't have to go and hand-code my WiFi configuration. I had some graphics issues, but I think they were mainly my fault. I had to reinstall due to the aforementioned issues, but I managed to save most of my home directory - obviously the non-M.2 drive files were all safe, as a given. On install I could resize the M.2 partition, install to a new blank partition, then mount the old install to /home, and a bit of playful wrangling later it was mostly back to where I was.

I really do like that Ubuntu recognises all (well, mostly all) my hardware. That shouldn't be enough, but a) my early experience of Linux is that this wasn't a given, and b) the last time I built a Windows PC it didn't recognise my SATA HDD (SATA was new back then; I think I had to copy a patch onto the install medium, or something - it was quite a while ago now). Using Macs, especially laptops, hardware mostly just works, unless it doesn't, and then it can be a right pain. So - should everything working be a plus point, rather than something we should expect? I think a plus point.

Installing didn't ask me a million questions about partitions, and I didn't have to consider the size of a swap partition and where it should live - though that is playing on my mind a bit: right now I couldn't tell you whether this install actually set itself up with separate swap and just hid that from me. It plays on my mind, but I find myself struggling to care. Again - maybe this shouldn't be a plus so much as a minus for my previous experiences. Shrug. It's a plus.
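
(If I do ever start caring, the kernel will tell me; a trivial check - my own addition, not something the installer surfaces:)

# /proc/swaps lists every active swap device or file, so this settles
# the 'did I get a swap partition?' question in one line
print(open('/proc/swaps').read())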

This OS works well enough that I don't feel the need to drop back to Windows to do useful stuff, or just to browse time-wasting videos on YouTube.

I don't have any music loaded on this machine, but if I did I expect the sound would work fine. System sounds and video sound do, so I see no reason why music players wouldn't. But now that does interest me, so a brief interlude while I go and check the state of music players on Ubuntu. (See below for a slight annoyance on this.) Unsurprisingly VLC is the only media player I have, and it plays music files as expected. I just loaded Audacious and that seems to work fine too. These monitor speakers are really pants though.

Here's something I have no doubt I could do on Windows and OS X, but it's presented well in Ubuntu - custom key combinations. It's very, very straightforward to set up key combinations to do things: <win> + t opens a terminal for me, <win> + e a file explorer (muscle memory...), and similar. There are straightforward ways to do this in Windows, and I've never felt the need to in OS X, but still, I like this.
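
Under the hood this is just gsettings; here's a sketch of wiring <Super>t to a terminal from Python - my own reconstruction of what the keyboard settings panel writes, not how I actually set mine up (note this overwrites any existing custom bindings list, so you'd merge in practice):

import subprocess

base = 'org.gnome.settings-daemon.plugins.media-keys'
path = '/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/'

# register the custom binding slot, then fill in its name, command and key
subprocess.run(['gsettings', 'set', base, 'custom-keybindings', "['%s']" % path], check=True)
for key, val in [('name', 'Terminal'), ('command', 'gnome-terminal'), ('binding', '<Super>t')]:
    subprocess.run(['gsettings', 'set', '%s.custom-keybinding:%s' % (base, path), key, val], check=True)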


Meh
I connect to my NAS. The OS asks for a username / password and whether it should remember them; I enter them and tell it to remember. That only lasts one session, and I've no idea why. (I assume I could go and find out.)

Probably not an Ubuntu issue, but an Intel/Nvidia one - I was running one monitor off the Intel i5's built-in video, and one from the 1070. That just stopped working at some point (I believe the two drivers don't play nice). I'd like to be able to switch drivers, so that if I use the GPU for tensorflow I can turn all video output off on it. That's probably not a 'normal' thing to do, but whilst I suspect I could do it, and I can think of ways it may be achievable, I wouldn't really know where to start setting it up.

Once I ruled out the Intel graphics, I fed both monitors off the 1070 - one HDMI, one DisplayPort. For quite a while Ubuntu didn't like the DisplayPort monitor, which was annoying as that was my better monitor (the second monitor doesn't have DisplayPort, and my 1070 only has one HDMI out). I did fix this (I don't remember how, but I did).

While checking for music players I remembered that the machine seems to drop my sound output preference. I don't have speakers connected, but my main monitor has them (pretty rubbish ones, but good enough). On boot the system seems to switch output to HDMI, whereas the monitor with speakers is the DisplayPort one. I would guess the HDMI monitor registers first or something, and that may be fixable, but... it's just a slight annoyance.

This is the big one though, and it is down to me as much as Ubuntu (and maybe should be in the Impressed column): the OS doesn't seem to want me to customise it. On previous Linux installs there were always control panels with a million options to tweak. Settings in this version of Ubuntu is a very anaemic application. I did add the Tweaks app, but still. This is down to me, because the thing I like is that it mostly works without playing around, so why should I want to have to tweak it? I no doubt could go and play with the configuration files, and I want to. But the OS doesn't seem to want me to do that.

Hardware issues: everything mostly works, but not everything. One issue is definitely down to Asus rather than Ubuntu, and one may be Asus, Intel or Ubuntu.

The Asus issue: my motherboard has blinky lights, and a blinky light controller (Aura). The Windows app isn't particularly good. I like playing with RGB LEDs from an Arduino (mostly so-called NeoPixels), and am really quite comfortable building my own effects, triggers and such. Aura just has some built-in effects; it doesn't want to let me write my own as far as I can see. Aura does seem to run from the motherboard itself, because it remembers the settings in Ubuntu. Which brings me to: from Linux there appears to be no way to change the LED settings. That is entirely down to Asus, and I suspect if I put in some time and effort I could contribute to one of the projects trying to reverse-engineer an API. I suspect in future I may do this. But it is kind of annoying that on a modern motherboard that is built with tweaking in mind, on what I would call a mainstream OS, there is no support for changing settings.

The second issue, which again may be down to Asus, Intel (or even Be Quiet - the case manufacturer): when I plug an Arduino into a USB port it doesn't seem to get a usable virtual serial port. I am using the front case USB, hence it could be something odd about the front panel. It could be something odd about the Intel chipset (I've no idea if this even makes sense), it could be down to an Asus chipset (ditto), or it could be down to Ubuntu. Something I will look into in future. Given the motherboard doesn't want me to play with the built-in RGB LEDs, I was going to hook up an Arduino with some LEDs to give a visual CPU temperature indicator. I don't want the main RGBs just to show CPU temps, as mostly I like the fading I have set, and for all that I am playing with ML on this machine, I don't often stress the CPU. It is a quiet case, so I suspect it runs hotter than average, and I would like one set of lights to give me fair warning. For comparison: on a Windows machine USB serial ports need a bit of work, but it's a fairly painless process. On my Macs I've always, or mostly, just plugged the device in and got a port - that isn't the case on some boards as they use a weird driver, but still, I just installed that driver and was good. Thinking about it, that's possibly what I need to do here too. This is actually the thing that surprised me most, to be honest. (Reading up, I have various things to try, so all is not lost.)
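
One of the first things to try is checking whether the board enumerates at all before blaming drivers. A quick sketch using pyserial (pip install pyserial) - my own addition, not from the guides I was reading:

from serial.tools import list_ports

# list every serial device the kernel has created; an Arduino usually
# shows up as /dev/ttyACM0 or /dev/ttyUSB0 - if it's absent the problem
# is enumeration, if it's present it's probably just dialout permissions
for p in list_ports.comports():
    print(p.device, p.description, p.hwid)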

My keyboard has an app on Windows, but the key-combo situation on Ubuntu makes that pretty much irrelevant. The keyboard suffers the same complaint as I have with Aura (I can't build my own effects), but - it's a keyboard! It works fine. Ubuntu actually seems to play nicer with Windows keyboards; I previously used an Apple keyboard on Windows, and it was a bit of a pain - now I use a keyboard with Win keys, and it's all quite nice in Ubuntu. I know this is actually down to most keyboards being made for Windows users, and isn't a Linux thing as much as a Windows / OS X thing, but still.

I remember window managers having a plethora of options to play with, and being unreasonably tweakable. That doesn't seem to be the case any more. I do realise I can apparently quite easily change my window manager, and probably get all that tweakiness back. I also know that I used to tweak things beyond sensible limits and then have to work out how to reset everything :) So meh, but no great loss.

GRUB. I remember GRUB being very straightforward, and really quite reliable. On this machine, every few boots, rather than booting to the GRUB choice of OS it drops to a GRUB prompt which I need to exit from. Minor, but annoying. When I wanted to tweak my GRUB settings there was a whole hierarchy of files to look at, and none of them were particularly obvious as to what they did. An install added some rubbish to GRUB, and it was absolutely not clear to me how to get rid of it. I know this isn't a Linux thing per se, and is absolutely down to my lack of knowledge of GRUB. It used to be a lot more intuitive though - and probably less capable. I just need to get up to speed with it.

Most of the above are things which I could still do (I believe); they just aren't as obvious, or as needed, as they used to be, as far as I remember. So calling them meh, when the thing I appreciate most is that you don't need to do any of this stuff, is perverse at best.

I guess this really is a home-user OS now. Bravo - the year of the Linux desktop appears to be utterly plausible. From a non-nerdy perspective (and trying not to be a bit of a twat about it), I think this OS actually is better than Windows. Win10 is subjectively 'nicer' and probably easier to tweak safely, but I would actually recommend Ubuntu over it.

Other thoughts
I think some of the things that make Linux a lot easier now than when I played with it before are not necessarily down to Linux itself.

Playing music from my computer is mostly irrelevant now, as I have a couple of Sonos speakers and control them from my phone, using Apple Music as a source. I don't actually 'need' music on my computer, so it doesn't matter whether there are ALSA configuration issues or not (there don't appear to be; I don't even know if ALSA is still a thing). I would like a nice Sonos app for Linux, but then the app for Windows is horrible, and I'm not sure I've actually used the Mac app for a long time - I just use my phone to control the speakers. The last main home PC I used had a pretty good SoundBlaster surround sound card and speakers to match. Maybe if I was gaming that would still matter, but... I just don't feel the need for it. Besides, my motherboard, being a 'gamer' board, apparently has pretty good audio built in (8 channel, dual op-amps, 32-bit playback, 'premium' capacitors, shielding, blah blah blah - though not 8 channel and 32-bit at the same time).


If I want to play around composing music, I'd still just fire up Logic or GarageBand on my Mac (I used to play around with loop editors, but GarageBand scratches that itch, and in Logic I just sit playing with effects).

If I want to watch video, I do so on my smart TV, or on the TV via an Apple TV or Amazon box.

I don't really play computer games any more, but I do have a PS4 for the few games that do interest me (Horizon Zero Dawn and Assassin's Creed Origins, with a bit of Skyrim or The Witcher). Maybe that's the legacy of using a Mac, and if I used Windows I'd get into PC games (Rust looks fun, and I seem to have Portal in a Steam library). I may get a retro emulator going though - last time I did that was on a dedicated RPi, so I'm fairly confident it would work just as well on Ubuntu.

I got used to GIMP for image manipulation, and I *assume* that will work just as well on Ubuntu.

I got used to using Google Docs for the occasions I use spreadsheets at home. Mostly, though, anything I would once have used Excel to model or calculate, I now just do in Python.

Chrome is my browser of choice, and Chrome is available on Linux. I didn't think it was, though maybe that is just an RPi thing.

I have a small NAS, so file sharing isn't really an issue. I also use Dropbox, so again, no real need for file sharing. (It is annoying that Windows and Linux on the same machine won't happily and safely share a Dropbox directory, but that isn't a massive issue for me.)

Disk space is stupidly cheap, so having duplicate files (see Dropbox) isn't a huge deal. With around 2TB just for random docs and two operating systems, 3TB for general file storage, and room to add a lot more disks if I really want to, space isn't likely to be an issue any time soon. When it is, I'm more likely to drop bigger disks in my NAS and use the two drives from there in this PC as temporary, possibly non-safe, local storage. As above, this makes the whole install situation a lot more straightforward.

Filesystem support in Linux appears to be a lot better than it used to be. Annoyingly, accessing Mac volumes from Linux is the only problem I've had - NTFS and exFAT pretty much just work.

Flash doesn't seem to be used anywhere. Or much. And I'm not sure Chrome actually supports it any more anyway - I don't seem to have had to disable it for years :)

I have Hue lights, but I wrote Python scripts to control them, which should work just as well here as on my Mac. Though I don't really use the scripts any more; I just use a Hue switch, timers and my phone.

I think devices that work well with my Mac will work well here. I have a GPS for my bike, and it connects as a mass storage device, so it should work here. I suspect updating its maps won't work, but it is rare that I do that, and I do have Windows / my Mac if it's a problem. I have a small GPS tracker I wanted to use to track rides, but that didn't really like Windows or Mac - I wouldn't be that surprised if it worked on Linux, to be honest (I think the software I tried on my Mac was actually based on an open source Linux project).

The Apple devices I have (phone and watch) never connect to a computer anyway, so I basically never use iTunes. Which is a good thing, whatever operating system you are using.

I'm pretty sure that when I need to compile apps from source, the CPU / RAM I have will make that entirely painless. Compiling tensorflow was reasonably fast and painless - painless as far as the actual compile was concerned, anyway. I suspect a kernel compile would be pretty quick too, when I feel the urge to royally break my machine.

The world seems to have got less dependent on home PC operating systems, and devices have got more independent and smarter for the most part. It seems everything has a web interface nowadays.


So - I like Ubuntu, but it's a bit plain. Plain isn't bad though, when what you actually want is to do stuff, rather than play with an operating system and its tools.

31 March 2019

Speeding up my CNN, on GPU

The story so far: I now have a new PC, self-compiled tensorflow-gpu, and a working environment. I have some RGB LEDs in my PC and on my keyboard (they speed everything up; this is important).

I ran my model just on my CPU and it was taking ~570s per epoch.

On my home-built tensorflow it was taking 735s (I only had two data points, and I don't think I had configured it).

With the GPU it was taking 1009s.

Running like that, and viewing nvidia-smi, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  8%   62C    P2    44W / 151W |   1291MiB /  8116MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1322      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      1364      G   /usr/bin/gnome-shell                          51MiB |
|    0      1636      G   /usr/lib/xorg/Xorg                           413MiB |
|    0      1755      G   /usr/bin/gnome-shell                         224MiB |
|    0      3112      G   ...quest-channel-token=4965395672649938349   160MiB |
|    0     19665      G   ...than/anaconda3/envs/p36TFGJT/bin/python     3MiB |
|    0     19782      C   ...than/anaconda3/envs/p36TFGJT/bin/python   399MiB |
+-----------------------------------------------------------------------------+
htop gives me:
  1  [||||                                                                              3.3%]   4  [|||||                                                                             4.6%]
  2  [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                       66.4%]   5  [||||                                                                              3.3%]
  3  [||||||||||||||||||||||||||||||||||                                               38.5%]   6  [||||                                                                              3.3%]
  Mem[||||||||||||||||||||||||||||||||                                           5.82G/31.3G]   Tasks: 207, 1066 thr; 2 running
  Swp[                                                                              0K/2.00G]   Load average: 1.00 2.10 3.08 
                                                                                                Uptime: 19:21:32

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
19782 me   20   0 18.8G 1785M  519M S 105.  5.6  0:48.31 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19922 me   20   0 18.8G 1785M  519M R 99.0  5.6  0:40.12 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 1755 me   20   0 4254M  435M 96436 S  3.9  1.4  5:13.84 /usr/bin/gnome-shell
19665 me   20   0 3068M  233M  107M S  2.0  0.7  0:04.11 /anaconda3/envs/p36TFGJT/bin/python /home/jonathan/anaconda3/envs/p36TFGJT/bin/spyder
 1636 root       20   0  619M  170M 85348 S  1.3  0.5  3:09.35 /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -background none -noreset -keeptty -verbose 3
 2293 me   20   0  797M 46732 28624 S  1.3  0.1  0:05.04 /usr/lib/gnome-terminal/gnome-terminal-server
19893 me   20   0 18.8G 1785M  519M S  1.3  5.6  0:00.31 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19898 me   20   0 18.8G 1785M  519M S  1.3  5.6  0:00.30 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 3072 me   20   0 1526M  324M  120M S  0.7  1.0 10:02.91 /opt/google/chrome/chrome
18139 me   20   0 41792  5956  3880 R  0.7  0.0  0:16.52 htop
 3112 me   20   0  786M  270M 99900 S  0.7  0.8  4:48.11 /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=4918414677377325872,16218879521054795833,131072 --gpu-preference
19895 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.29 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19896 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.29 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19820 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.22 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19897 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.83 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 1993 me   20   0  789M 38508 29340 S  0.7  0.1  3:39.50 psensor
19894 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.31 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 3578 me   20   0  816M  163M 71896 S  0.7  0.5  2:19.69 /opt/google/chrome/chrome --type=renderer --field-trial-handle=4918414677377325872,16218879521054795833,131072 --service-pipe-toke
 1642 root       20   0  619M  170M 85348 S  0.7  0.5  0:04.79 /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -background none -noreset -keeptty -verbose 3
19892 me   20   0 18.8G 1785M  519M S  0.7  5.6  0:00.07 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
19891 me   20   0 18.8G 1785M  519M S  0.0  5.6  0:00.29 /anaconda3/envs/p36TFGJT/bin/python -m spyder_kernels.console -f /run/user/1000/jupyter/kernel-8283c73bc975.json
 3240 me   20   0  763M  104M 64516 S  0.0  0.3  1:32.81 /opt/google/chrome/chrome --type=renderer --field-trial-handle=4918414677377325872,16218879521054795833,131072 --service-pipe-toke
19785 me   20   0 3068M  233M  107M S  0.0  0.7  0:00.17 /anaconda3/envs/p36TFGJT/bin/python /home/jonathan/anaconda3/envs/p36TFGJT/bin/spyder
 3093 me   20   0 1526M  324M  120M S  0.0  1.0  3:25.67 /opt/google/chrome/chrome
F1Help  F2Setup F3SearchF4FilterF5Tree  F6SortByF7Nice -F8Nice +F9Kill  F10Quit


I don't seem to be stressing my system, particularly. I want to stress it, it can work harder, dammit.

I need to read this:
https://stackoverflow.com/questions/41948406/why-is-my-gpu-slower-than-cpu-when-training-lstm-rnn-models

But also, this: https://medium.com/@joelognn/improving-cnn-training-times-in-keras-7405baa50e09
has a quick fix (I like quick fixes!).

In my .fit_generator call I add:


use_multiprocessing=True,
workers = 6,

So let's try that. (The number of workers recommended in the post is actually higher, but that's on a Ryzen 7 with 16 threads available; I have an i5 with six threads available.)
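
In context, the call ends up looking like this (the same fit_generator call as in the 29 March entry, just with the two new arguments):

classifier.fit_generator(training_set,
                        steps_per_epoch=8000,
                        epochs=iEpochs,
                        validation_data=test_set,
                        validation_steps=2000,
                        callbacks=[tbCallBack],
                        use_multiprocessing=True,  # spawn worker processes for the generator
                        workers=6)                 # one per available thread on my i5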


Let's run it.


nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  8%   62C    P2    47W / 151W |   1303MiB /  8116MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1322      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      1364      G   /usr/bin/gnome-shell                          51MiB |
|    0      1636      G   /usr/lib/xorg/Xorg                           413MiB |
|    0      1755      G   /usr/bin/gnome-shell                         224MiB |
|    0      3112      G   ...quest-channel-token=4965395672649938349   172MiB |
|    0     20260      G   ...than/anaconda3/envs/p36TFGJT/bin/python     3MiB |
|    0     20376      C   ...than/anaconda3/envs/p36TFGJT/bin/python   399MiB |
+-----------------------------------------------------------------------------+


It definitely seems to be using more GPU. And htop was hard to get a view of, because it was hammering all cores :) That's what I like to see.

Epoch time was 198s.

However, keras does give me a warning:

UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
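
For reference, the Sequence class the warning points at is index-based rather than iterator-based, so each worker fetches distinct batches by index instead of all pulling from one shared generator. A minimal sketch (class and variable names are mine):

from keras.utils import Sequence
import numpy as np

class BatchSequence(Sequence):
    # Sequence only needs __len__ (number of batches) and __getitem__
    # (batch by index); because access is by index, multiple workers
    # can't accidentally serve the same data twice
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        return self.x[lo:lo + self.batch_size], self.y[lo:lo + self.batch_size]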

Hmm, is this really faster, or just doing not so great a job?

Let's try a real run with a lot more epochs, and see what results I get. 1 3/4 hours for 25 epochs. Not too bad.

I really do need to start tweaking my model now though. Something for later, I guess.

30 March 2019

And we (may) have lift off / Houston we have a problem.

When last I wrote (last night), I'd run a successful tensorflow-gpu build and installed it into a conda environment.

And on running it I had a driver version mismatch: the 418 GPU driver, with 410 used everywhere else.

I didn't really look at fixing that last night. I made a half-hearted attempt to install a default version again, just because I now knew I had all the supporting files, drivers and libraries.

None of that worked, obviously.

So I ate some junk food, watched some Expanse ("I'm that guy"!) and woodworking videos, went to bed. Got some sleep.

It's morning. The last workday of my week's vacation, and I have some stuff to do today that isn't computer or YouTube related.

I fired up my machine. Ensured I still had the same error, deleted some shoddy conda environments, did a little clean-up on the system and thought about it.

I need to fix the driver issue. Let's hit a search engine with the "failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination" error, and see what I see.

Going through the first hit (https://github.com/tensorflow/tensorflow/issues/19266), just following along for s&g, the responder asks for nvidia-smi output. Hmm, I wonder what that currently shows on my machine.

I run it, and... it errors. Wrong driver version. Oh FFS! Good start. Looks like some of my system is now on 410 and some on 418 (rather than just having some libs compiled against a different version).

Try reinstalling the 418 driver from nvidia. That errors too (an 'in use' error).

So I decide I may as well power off / power on, just to see if my machine actually still works.

I do, it does. Yay.

Run nvidia-smi and...
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   49C    P0    41W / 151W |    367MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1289      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      1331      G   /usr/bin/gnome-shell                          51MiB |
|    0      1602      G   /usr/lib/xorg/Xorg                           193MiB |
|    0      1719      G   /usr/bin/gnome-shell                          93MiB |
+-----------------------------------------------------------------------------+



So - it's using 410 now. Open NVIDIA X Server Settings: driver version 410.
My nvidia drivers now all appear to be 410. Interesting.

Let's fire up a clean Python 3.6 environment, install my tensorflow-gpu build, and see what happens!


tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)

Which outputs:

2019-03-29 10:54:29.739032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-03-29 10:54:29.739060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 10:54:29.739064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088]      0
2019-03-29 10:54:29.739067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0:   N
2019-03-29 10:54:29.739136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/device:GPU:0 with 7238 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)

Out[8]: True




My, that looks interesting. It seems to be GPU'ing.

I can't run my CNN script yet, as I don't have keras in that env. Let's get that set up, and see if I can execute the script without errors!


conda install -c conda-forge keras

The output includes the following:
The following NEW packages will be INSTALLED:

  <snip>
  tensorflow         conda-forge/linux-64::tensorflow-1.1.0-py36_0
  <snip>

So - I may have some further work to do...

Yeah, everything fails horribly. Let's try installing my own tensorflow-gpu build over the top. Just as well I copied it, as tmp has deleted the version that was there. So - where did I copy it to? Time to go hunting, I guess. Note to self: get some order around where I store self-built tensorflow wheels.

Bugger, bugger, bugger. When I run pip it just reports I already have tf, and does nothing.

Time to try --ignore-installed (on a cloned environment, obviously):

pip install --ignore-installed BackupTF/gpu_wheel/tf_nightly-1.13.1-cp36-cp36m-linux_x86_64.whl

Which in turn did a lot of churn on some things I know can be a bit... stroppy (protobuf, for example - I've had version issues with it before).

Well, it did run, so let's try it. The tensorflow-gpu test script I have seems to be happy. Yay.

My CNN script fails with a PIL.Image import issue (which I've seen recently anyway). Let's go and fix that.

Just because I can't resist it, anger being an energy and all that:



pip install pillow

A conda env switch (to force an activate in the env I'm in), and my tf-gpu tests still pass.


But my CNN script fails with the following. Still, it's progress!
UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[loss/mul/_91]]


Let's try and fix that next, shall we?


Long Python scripts - how do I know when I'm done?

When running a long process, you probably aren't sitting there watching it - you're off watching a film / eating dinner / visiting friends (and hence only checking on it when you get back). It would be nice to be told when it finishes.


My take on it was to use IFTTT and its Maker (now Webhooks) channel.

My approach is to write a small Python script I can import or copy-paste into my longer-running scripts. What I'll do is send a message once a script starts, and another when it ends. You could do whatever you like within the confines of what IFTTT offers and the services you have linked, but I'm going to send a message to Slack.

My Python code is:

import requests

sEventName = '<myeventname>'
sKey = '<mysecretkey>'

a = 'String1'
b = 'String2'
c = 'String3'


def IFTTT(sKey, sEventName, first, second, third):
    # IFTTT's Maker/Webhooks trigger accepts up to three values,
    # which it passes through to whatever action the applet runs
    report = {"value1": first, "value2": second, "value3": third}
    sURL = 'https://maker.ifttt.com/trigger/' + sEventName + '/with/key/' + sKey
    # print(report); print(sURL)  # debug - careful, the URL contains the key
    requests.post(sURL, data=report)


IFTTT(sKey, sEventName, a, b, c)


sEventName is the event name configured in IFTTT; sKey is my secret key (which obviously shouldn't be shared, so don't check it in to a public git repo...).
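
A simple way to keep the key out of the script entirely is to read it from an environment variable (the variable name here is just my choice):

import os

# export IFTTT_KEY='<mysecretkey>' in the shell, then:
sKey = os.environ['IFTTT_KEY']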


And when it runs, Slack gets this message:
The event named "PythonNotification" occurred on the Maker Webhooks service String1 String2 String3 March 30 2019 at 10:46AM

You have quite some latitude for what is posted, and you would obviously change the strings.

Something like:

a = 'Started my CNN run'
b = 'Number of epochs: ' + str(numEpochs)
c = 'for model My Little CNN Model'
IFTTT(sKey, sEventName, a, b, c)

Then on completion:

a = 'Completed my CNN run'
IFTTT(sKey, sEventName, a, b, c)



I think I first realised IFTTT had this capability after reading this: https://anthscomputercave.com/tutorials/ifttt/using_ifttt_web_request_email.html - that page goes into detail on setting up IFTTT, so I don't feel the need to.



29 March 2019

Failed to get convolution algorithm

Running a CNN script for object classification (a learning exercise on pictures of cats and dogs) - which is what drove me to build a new PC in the first place - errors.



classifier.fit_generator(training_set,
                        steps_per_epoch=8000,
                        epochs=iEpochs,
                        validation_data=test_set,
                        validation_steps=2000,
                        callbacks = [tbCallBack]) # Last param is a late addition

Gives this:
Epoch 1/2
2019-03-29 12:33:09.921019: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-03-29 12:33:09.921056: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.104.0
2019-03-29 12:33:09.921062: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-03-29 12:33:09.921070: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.104.0

And a stack trace, and then this:

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_2/convolution}}]]
[[loss_1/mul/_185]]

So - my job for today is working out, like, wtf dude, to try next. As it were.

(Yes - it is only two epochs. That's just while I am checking to see if it works; once it's running for real that goes way up, which is why I've parameterised the number.)



Let's start looking here:
https://github.com/tensorflow/tensorflow/issues/24828

There seems to be a suggestion to add:


from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
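
Since I'm driving tensorflow through keras, the equivalent move is to hand the configured session to the Keras backend - a minimal sketch of that pattern (standard Keras 2.x API; my addition, not something from the issue thread):

import tensorflow as tf
from keras import backend as K

# same allow_growth config, but registered as the session Keras will use
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))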

I'm using tf through keras, so I'm not sure this will help, but let's try.
Still doesn't work.



(p36TFGJT) jonathan@Wednesday:~$ nvidia-smi
Fri Mar 29 12:58:16 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   46C    P8    17W / 151W |   1280MiB /  8116MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1289      G   /usr/lib/xorg/Xorg                            26MiB |
|    0      1331      G   /usr/bin/gnome-shell                          51MiB |
|    0      1602      G   /usr/lib/xorg/Xorg                           497MiB |
|    0      1719      G   /usr/bin/gnome-shell                         203MiB |
|    0      3783      G   ...equest-channel-token=798233842220658081   197MiB |
|    0      5478      G   /usr/bin/nvidia-settings                       0MiB |
|    0      5925      G   ...-token=A5215F1CE4347817C36139407E5E1125    58MiB |
|    0     11552      G   ...than/anaconda3/envs/p36TFGJT/bin/python     3MiB |
|    0     11675      C   ...than/anaconda3/envs/p36TFGJT/bin/python   235MiB |
+-----------------------------------------------------------------------------+
(p36TFGJT) jonathan@Wednesday:~$ dpkg -l | grep -i cudnn
ii  libcudnn7    7.5.0.56-1+cuda10.0    amd64        cuDNN runtime libraries
ii  libcudnn7-dev    7.5.0.56-1+cuda10.0    amd64        cuDNN development libraries and headers
(p36TFGJT) jonathan@Wednesday:~$ 

In case that's of interest (it kind of is to me).


This dude has what sounds like the same problem as me:
https://github.com/tensorflow/tensorflow/issues/22056#issuecomment-470749095
https://github.com/tensorflow/tensorflow/issues/22056#issuecomment-471091775

So - the solution there was (quoting the comment):

  "I found this, after all, to work for me: from Software & Updates > Additional Drivers, choose nvidia-418 (downloads and installs it), reboot PC. As a result I got upgraded to cuda-10.1 (from 10.0). It works for now!"

I'm not sure I want to try this though... First off, let's do a full backup (using Deja Dup; Back In Time didn't really work out for me).

But let's do it. The backup completed, the update ran. I'm back on the 418 drivers.

Restart, run the GPU tests, close / reopen Spyder.


Results are in:

Using TensorFlow backend.
2019-03-29 14:32:24.335488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-03-29 14:32:24.335517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 14:32:24.335521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088]      0
2019-03-29 14:32:24.335524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0:   N
2019-03-29 14:32:24.335761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7262 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)

About to start model training
        At: (2019, 3, 29, 14, 32)
Epoch 1/2
2019-03-29 14:32:32.916800: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-03-29 14:32:33.184104: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
8000/8000 [==============================] - 1009s 126ms/step - loss: 0.4253 - acc: 0.7970 - val_loss: 0.6266 - val_acc: 0.7519
Epoch 2/2
8000/8000 [==============================] - 1012s 126ms/step - loss: 0.1642 - acc: 0.9349 - val_loss: 1.0975 - val_acc: 0.7356


Finished in 2028.7929067611694
Finished at: (2019, 3, 29, 15, 6)

        Model time was  2021.7227563858032

Yay, it ran on the GPU.

Hmm, it was far slower than my CPU (~1010s per epoch on the GPU vs ~570s on the CPU). Time to ponder what's up.


Tensorflow GPU Checks

Once we have, think we have, suspect we have, hope we have, or are otherwise inclined to think we may have tensorflow GPU support, it's time to check. Below are some snippets that may help:




# One GitHub issue indicated this may be required. I lost the issue ID.
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow
import tensorflow as tf

# Check no.1 - log device placement when creating a session
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Check no.2 - explicitly pin a small matmul to the GPU
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.Session() as sess:
    print(sess.run(c))

# Check no.3 - built in; this should probably be check no.1
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)


Results I get are:
Check no.1
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-03-28 20:54:39.026230: I tensorflow/core/common_runtime/direct_session.cc:316] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

Hmm, CPU not GPU. Fail. (On this machine we expect all three checks to fail, but the output is vaguely interesting.)

Check no.2
<long stack trace>
InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <ipython-input-7-5d3b23a68111>:4) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
[[MatMul]]

Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 a (defined at <ipython-input-7-5d3b23a68111>:2)
 b (defined at <ipython-input-7-5d3b23a68111>:3)


Check no.3
False

So - for me at least, tensorflow isn't using the GPU. Pah!

I actually already knew this, as I get a driver mismatch error. This is just here in case it's useful.


Added later
Once my home-built tensorflow-gpu libs were built and up and running, this is what I get, if you wish to compare:


import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


import tensorflow as tf

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print (sess.run(c))

tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)


Returns:

Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1
[[22. 28.]
[49. 64.]]

2019-03-29 12:28:30.622272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-03-29 12:28:30.622303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 12:28:30.622308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088] 0
2019-03-29 12:28:30.622311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0: N
2019-03-29 12:28:30.622403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6722 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-03-29 12:28:30.622532: I tensorflow/core/common_runtime/direct_session.cc:316] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1

2019-03-29 12:28:30.625908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-03-29 12:28:30.625938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 12:28:30.625943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088] 0
2019-03-29 12:28:30.625946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0: N
2019-03-29 12:28:30.626034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6722 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-03-29 12:28:30.645547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-03-29 12:28:30.645575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 12:28:30.645579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088] 0
2019-03-29 12:28:30.645582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0: N
2019-03-29 12:28:30.645645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/device:GPU:0 with 6722 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
Out[10]: True


The lines to look for are the 'Created TensorFlow device ... GeForce GTX 1070' ones (bolded in the original post). While I'm in a musical mood - it turns out I have found what I was looking for.
