Playing with machine learning: Getting tensorflow working

A week later, we get into tensorflow again.

Before updating anything I look for a reasonable backup solution. I tried 'Back In Time', which looked like it should do what I want, but... it throws a bunch of errors for me. Possibly that's because most people want to back up their docs, whereas I'm okay leaving those out (the ones I want are already backed up to two other locations); I just want a record of my OS. Back In Time doesn't seem to really like that.

No matter, I have a week stretching in front of me, with not much I *have* to do, so I decide to go for it. (A lot more carefully than last time.)

First - conda environments. I have been very cavalier about virtual envs previously. Not now.

conda create --name python36 python=3.6

(There is a good conda cheat sheet pdf linked to from here, which I found quite helpful while I got into the swing of things.)

From there a tensorflow (CPU only) and keras install on that environment.

All still works. Times are good (for CPU) again. I do get informed that there are a few options my CPU supports that the binary I'm using wasn't compiled to use. They would maybe speed things up a bit, and now that I have virtual environments sorted out, maybe building tensorflow from source is a good next step on the way to getting a working GPU build!

Building tensorflow.
Basically *all* you need to do is follow these instructions: https://www.tensorflow.org/install/source
How hard can it be? Well, not that hard. I mean, be careful when you check out the source, as someone may have recently checked in something that has problems.

Yeah, I got that. It's frankly sub-optimal.

Things to take into account:
If you don't already have bazel set up, you'll need to get that. It's not complicated or anything, it's just one more thing to get up and running.
Tensorflow / keras currently supports Python 3.6 but NOT Python 3.7. I did know this before the build, which is why I'd prepped a 3.6 conda env, and why I've been using that for my standard install checks.
Before you install your own tensorflow build, it's probably worth cloning your existing env, or just creating a new one.
If you run the build under a Python 3.7 install, everything will work until you try to pip install the result in your Python 3.6 environment, at which point you'll be told where to go with your dodgy build. (The clue is the cp indicator in the pip wheel name, in case you hit a similar issue with some future version incompatibility.) Yeah yeah, I knew I needed to run under Python 3.6; I just had a bit of a brain freeze, and didn't actually think about having to build under 3.6. Note for future self: build with the Python version you are going to run.
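
That cp indicator can be checked before pip gets a chance to complain. Here's a minimal sketch, assuming the usual wheel naming scheme (name-version-pythontag-abitag-platform.whl). The function names and the example filename are made up for illustration; they are not something the tensorflow build gave me.

```python
import sys


def wheel_python_tag(wheel_filename):
    """Return the Python tag (e.g. 'cp36') from a wheel filename."""
    stem = wheel_filename
    if stem.endswith(".whl"):
        stem = stem[: -len(".whl")]
    parts = stem.split("-")
    # name, version, [optional build], python tag, abi tag, platform tag
    if len(parts) < 5:
        raise ValueError("does not look like a wheel name: " + wheel_filename)
    return parts[-3]


def interpreter_tag(version_info=None):
    """Build the CPython tag (e.g. 'cp36') for a (major, minor, ...) tuple."""
    v = version_info or sys.version_info
    return "cp{}{}".format(v[0], v[1])


def matches_interpreter(wheel_filename, version_info=None):
    """True if the wheel's Python tag matches the given interpreter version."""
    # A wheel can carry several tags joined by dots, e.g. 'py2.py3'.
    tags = wheel_python_tag(wheel_filename).split(".")
    return interpreter_tag(version_info) in tags


if __name__ == "__main__":
    # Hypothetical filename, standing in for whatever the build produced.
    wheel = "tensorflow-1.13.1-cp37-cp37m-linux_x86_64.whl"
    print(wheel_python_tag(wheel), "vs", interpreter_tag())
    print("usable in this env:", matches_interpreter(wheel))
```

Run it with the python from the env you intend to install into; if the two tags differ, the wheel was built under the wrong interpreter.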

The initial build went okay, but the build-pip steps failed. It turns out that there was some new code that didn't include the required import to work. It was fairly easy to fix, and whilst I *really* wanted to commit (ha, spell check just wanted to correct that to vomit, how apropos) a fix to git,
a) I would need to go through a whole approval process (I think) but most importantly
b) someone beat me to it.

The build for me took somewhere between 15 minutes and an hour and a half (I went and watched a film while it was running).

Anyway, all built, it should now work. Fire up spyder, run my cnn script and... it failed.
Install keras, refresh the environment (I suspect I don't need to do that, but it makes me feel better to activate a different one, and then activate my actual one again), and retry.
Nope, keras now wants to use theano.

<anaconda3>/envs/<env name>/etc/keras/load_config.py

Check what the backend is set to in that file. I think I've seen a couple of variations of it, one of which would give tensorflow to Macs and theano to Linux. Just make sure you end up with tensorflow, especially on Linux if the file happens to contain an if.
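
For checking which backend keras will actually pick, here's a rough sketch of the lookup order as I understand it for multi-backend keras: a "backend" entry in a keras.json config file, overridden by the KERAS_BACKEND environment variable, defaulting to tensorflow. The function name is hypothetical, and this does not read the conda load_config.py above; it only mimics the rule so you can reason about it.

```python
import json
import os


def resolve_keras_backend(config_path, environ=None):
    """Guess the backend keras would use (assumed lookup order, see above)."""
    environ = os.environ if environ is None else environ
    backend = "tensorflow"  # assumed default when nothing else is set
    if os.path.exists(config_path):
        with open(config_path) as f:
            backend = json.load(f).get("backend", backend)
    # The environment variable, if present, wins over the config file.
    return environ.get("KERAS_BACKEND", backend)


if __name__ == "__main__":
    # ~/.keras/keras.json is the usual per-user location (an assumption here).
    path = os.path.expanduser(os.path.join("~", ".keras", "keras.json"))
    print(resolve_keras_backend(path))
```

If this prints theano, something (a config file or a script setting KERAS_BACKEND) is still pointing keras at the wrong backend.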

Everything is now hopefully fixed, and I deactivate / reactivate my environment, fire up spyder and run my script.

Yay, it works. Though it is actually slower now, as it is missing some XLA settings (not much slower, but slower regardless).

Regardless - I rebuilt tensorflow from scratch, including writing a fix to a buggy checkin, and fixed my env and it eventually works. That's enough for me in one evening.

Though...

I have the instructions for building with GPU support open. And I've got the build environment all warmed up (yeah, I know it doesn't work like that), and there is probably another film I could go and watch... hey, it's not like it's late. By now it's actually really, really early, but I'm not working tomorrow. Let's go for it.

I followed the same principle: follow the instructions at https://www.tensorflow.org/install/gpu as closely as I can, don't improvise. What could go wrong?

Follow the instructions to get the various nvidia binaries I need. I actually already have these from my first foray a few weeks ago, but I go to the downloads and repeat the process just to check it would be the same files (as soon as I get asked whether I should overwrite, I kill the download).

I choose cuda 10.1 and cudnn 7.5 for cuda 10.1 (when I first went through this, I didn't check I was getting the correct cudnn, so I got the lib for 10.0).

Put everything in its place, run the installs, clean bazel. Run the config and... it doesn't find libcudnn.
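
A quick way to see where (or whether) the library actually landed is to go looking for it. A minimal sketch, assuming the file is named libcudnn.so with some version suffix; the directories listed are only guesses at common install locations, not where the config script itself looks.

```python
import glob
import os


def find_libcudnn(search_dirs):
    """Return every libcudnn.so* file found directly in the given directories."""
    hits = []
    for directory in search_dirs:
        pattern = os.path.join(directory, "libcudnn.so*")
        hits.extend(sorted(glob.glob(pattern)))
    return hits


if __name__ == "__main__":
    # Assumed candidate locations; adjust to wherever you unpacked cudnn.
    candidates = ["/usr/local/cuda/lib64", "/usr/lib/x86_64-linux-gnu"]
    for path in find_libcudnn(candidates) or ["nothing found"]:
        print(path)
```

An empty result at least tells you the problem is where the files are, rather than how the config is reading them.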

At this point I decide maybe it's time to get some sleep.

We'll go for a tensorflow-gpu build tomorrow, in Part 3.

29 March 2019

Getting tensorflow working - Part 2