My Self Driving Car on a difficult track at speed

Well, in a simulator 🙂

(Sorry about the bad filming – it was the easiest way to get even a semi-decent quality)

Code is here:

I basically follow the NVIDIA End to End Learning for Self-Driving Cars paper with a few tweaks: added dropout, slightly larger Dense layers to compensate, and an additional convolution layer to handle the different input image resolution.  It runs in real time on a GTX 980 Ti.

Tips on machine learning with TensorFlow / Keras

I see people new to machine learning make the same sort of mistakes: mistakes that I’ve made myself and then painstakingly tried to fix.  So here are some hints that I’ve had to work out the hard way, along with some code.

As an example, I’ve used my self driving car code, which I’ll put up on the web.

  • Plot your input and expected output data!  For the training, validation and test data, individually.  It’s too easy to accidentally shuffle your data in the wrong place or to have unbalanced data etc.


    def showDrivingAngles(samples, title="samples"):
        plt.hist([sample.driving_angle for sample in samples ], 16)
        plt.title("Driving angle distribution in " + title)
  • Make sure your data is balanced, by over-sampling, under-sampling or generating fake data.

    Example to over-sample:  (Suggestions on making this better are welcome!)

    def duplicateSamplesToRebalanceByDrivingAngle(samples):
        # Bin the data, returning an array of which bin each sample is in
        num_bins = 16
        indexes = np.digitize([sample.driving_angle for sample in samples], np.arange(-1, 1, 2/num_bins))
        # Find how many samples are in the largest bin
        largest_bin_count = np.max(np.bincount(indexes))
        rebalanced_samples = []
        for j in range(num_bins):
            bin_indexes = np.where(indexes==(j+1))[0]
            if len(bin_indexes) == 0:
                continue  # Skip empty bins, otherwise the modulo below divides by zero
            for i in range(largest_bin_count):
                rebalanced_samples.append(samples[bin_indexes[i % len(bin_indexes)]])
        return rebalanced_samples

    (And of course, plot your data afterwards like above, to check that it worked correctly)

  • View random samples of your data after preprocessing.  It’s worthwhile to get a bit fancy here.  Show the label for the data here.  For example, I draw an arrow directly on the image to indicate which direction the label says that it should go in:
    image = self.getImage()
    s = image.shape
    line_len = s[0]//2
    angle = self.driving_angle / 360 * np.pi * 100 # Times 100 just to make it more visible
    line_y, line_x = int(line_len * np.cos(angle)), int(line_len * np.sin(angle))
    rr,cc = draw.line(s[0]-1, s[1]//2, s[0]-1-line_y, s[1]//2 + line_x)
    image[rr,cc,:] = 255
    plt.title("{} degrees".format(float(self.driving_angle)))

    view from right driving straight

  • Check your colorspace! Notice that in the image above, the colors are all wrong. This was totally on purpose, to demonstrate another point: make sure you understand what colorspace your data is in! cv2.imread() returns BGR, while almost everything else is going to use RGB. I highly recommend converting to YUV immediately, and using that everywhere. Make sure you do this in both the code that trains the model and the code that uses the model.
  • You can do preprocessing in the model itself.  This is extremely useful if you want to use your model in a program and don’t want to copy the preprocessing code.

    Example to crop and normalize image:

    inputs = keras.layers.Input(shape=(160,320,3))
    output = keras.layers.convolutional.Cropping2D(cropping=((55, 25), (0, 0)))(inputs)
    output = Lambda(lambda x: x/127.5 - 1., input_shape=imageshape, output_shape=imageshape)(output)
  • If you do preprocess in the model, preview the output.  You can do this by creating a temporary model and using predict.

    def showLayerOutput(sample):
        model = Model(inputs=inputs, outputs=output)
        croppedimage = model.predict(np.array([sample.getImage()]))[0]
  • Use asserts on the shape of your model.  For example:
    assert output.shape[1:] == (1,31,64)
  • Print out some outputs!  This is really important to keep an eye on what the predictions are.  Are they never negative when you want them to be?  Are they never larger than one when you want them to be?  (You’re using the wrong activation function on the last layer). Is it always outputting zero or similar (learning rate too high?)
    In keras you do this with a callback.


    class DebugCallback(keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs=None):
            samples = train_samples[:5]
            print("Should be: ", [sample.driving_angle for sample in samples])
            # Print the predicted driving angle for the same five samples
            predictions = self.model.predict(np.array([sample.getImage() for sample in samples]))
            print("Predicted: ", [x[0] for x in predictions])
    debugCallback = DebugCallback()
    model.fit_generator(train_generator, epochs=200, steps_per_epoch=len(train_samples)//batch_size, validation_data=validation_generator, validation_steps=len(validation_samples)//batch_size, callbacks=[tbCallBack, checkpointCallback, reduce_lr, debugCallback])
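
The balancing tip above also mentions under-sampling. Here is a minimal sketch of that alternative, assuming the driving angles are normalized to [-1, 1] as in the over-sampling example. The function name, the `get_angle` accessor and the fixed random seed are illustrative choices, not part of my original code:

```python
import random
from collections import defaultdict

def undersample_by_angle(samples, get_angle, num_bins=16, seed=0):
    """Randomly drop samples so every non-empty bin keeps the same number."""
    bins = defaultdict(list)
    for sample in samples:
        # Map an angle in [-1, 1] to a bin index in [0, num_bins - 1]
        index = int((get_angle(sample) + 1) / 2 * num_bins)
        index = max(0, min(index, num_bins - 1))
        bins[index].append(sample)
    if not bins:
        return []
    # Every non-empty bin is cut down to the size of the smallest one
    target = min(len(members) for members in bins.values())
    rng = random.Random(seed)
    balanced = []
    for members in bins.values():
        balanced.extend(rng.sample(members, target))
    return balanced

# Toy data: plain floats stand in for samples, so the accessor is the identity
angles = [0.0] * 10 + [0.5] * 3 + [-0.5] * 2
balanced = undersample_by_angle(angles, get_angle=lambda a: a)
print(len(angles), "->", len(balanced))
```

With real samples the accessor would be something like `lambda s: s.driving_angle`. Under-sampling throws data away, so it probably only makes sense when the smallest bin is still reasonably large.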



Opinion polls

This is a response to a political post, but this post itself isn’t political: it is purely about the statistics.  A Lords Labour MP recently wrote:

We often read that there is a plus or minus 2 or 3% statistical margin of error in a poll. But what we are rarely reminded of is that this error applies to each party’s vote. So if a poll shows the Tories on 40% and Labour on 34%, this could mean that the real situation is Tory 43%, Labour 31% – a 12 point lead. Or it could mean both Tory and Labour are on 37%, neck and neck.

But is this true, mathematically?

When we say “Tories on 40% ± 3%”  we mean:


A Normal Distribution with mean 40, and standard deviation of 3/1.96

Let’s plot both on the same graph:


Which was achieved in Wolfram Alpha with:

Plot[{PDF[NormalDistribution[40, 3/1.96], x], 
      PDF[NormalDistribution[34, 3/1.96], x]},
      {x, 30, 44}]

Now, could Labour and Tory really be neck and neck, within our 95% confidence?

If they are not correlated at all, then no:

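To subtract two independent normal distributions, you add their variances:

\sigma^2_{x-y} = \sigma^2_x + \sigma^2_y

Both 95% margins are the standard deviation scaled by the same factor of 1.96, so the same rule applies to the margins directly:

\sqrt{3^2 + 3^2} \approx 4.2

So, at 95% confidence, the lead is:  6 points ± 4.2.  As a plot: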


The Neck-and-neck 0 point lead  and the 12 point lead are really unlikely outcomes! (0.3% in fact)
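
The numbers above can be checked with a few lines of Python. This is only a sketch under the same assumptions as the text (normally distributed, uncorrelated errors, and a ±3 point margin at 95% confidence); the helper `normal_cdf` and the variable names are my own:

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

margin = 3.0                  # quoted 95% margin for each party, in points
sd_party = margin / 1.96      # standard deviation of each party's share
# Uncorrelated errors: the variances add when subtracting
sd_lead = math.sqrt(sd_party ** 2 + sd_party ** 2)
lead = 40.0 - 34.0

margin_lead = 1.96 * sd_lead                              # 95% margin on the lead
p_neck_and_neck = normal_cdf(0.0, lead, sd_lead)          # P(lead <= 0)
p_twelve_or_more = 1.0 - normal_cdf(12.0, lead, sd_lead)  # P(lead >= 12)

print("lead = {:.0f} +/- {:.1f} points".format(lead, margin_lead))
print("P(lead <= 0)  = {:.4f}".format(p_neck_and_neck))
print("P(lead >= 12) = {:.4f}".format(p_twelve_or_more))
```

The two tail probabilities are equal by symmetry, since 0 and 12 are both 6 points away from the estimated lead.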

(Caveat:  Of course this all depends on the polls being accurate normal samples on the population).

But of course they are correlated… somewhat

If this was a two party system, with the total adding up to 100%, then the errors would be completely anti-correlated.  And if it was a many party system, with the total adding up to much less than 100%, then we’d expect the errors to have a very weak correlation.  But with the two shares adding up to roughly 70%, we’re stuck in an awkward half-correlated stage.  Is there anything better that we can do?
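
One possible answer, as a sketch rather than a definitive treatment: if we assume the poll is a simple random sample of n people (a multinomial sample), then the covariance between two parties' shares is -p1·p2/n, so the variance of the lead is (p1(1-p1) + p2(1-p2) + 2·p1·p2)/n. The sample size is not stated in the article; n = 1000 below is my assumption, chosen because it gives roughly ±3 points per party. The function name is also my own.

```python
import math

def lead_margin(p1, p2, n, z=1.96):
    """95% margin on (p1 - p2) for two shares from one multinomial sample of size n."""
    # Var(p1 - p2) = Var(p1) + Var(p2) - 2*Cov(p1, p2), with Cov(p1, p2) = -p1*p2/n
    var_lead = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n
    return z * math.sqrt(var_lead)

n = 1000                     # ASSUMED sample size, not given in the article
tory, labour = 0.40, 0.34

margin_party = 1.96 * math.sqrt(tory * (1 - tory) / n)   # margin on one party's share
margin_lead = lead_margin(tory, labour, n)

print("single party margin: +/- {:.1f} points".format(100 * margin_party))
print("margin on the lead:  +/- {:.1f} points".format(100 * margin_lead))
```

Under these assumptions the margin on the lead lands between the uncorrelated case (about 4.2 points) and the fully anti-correlated two party case (twice the single party margin), which matches the "half-correlated" intuition.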

Edit: Response from the Lords MP

I received a message from the MP:

I take your point but don’t entirely agree. The errors are associated ie high Tory is likely to mean low Labour and vice versa so these are linked contingencies.
But thanks for writing. And at least even if your point was accepted entirely it wouldn’t make any material difference to the conclusions of my article.


My self driving car

(This post is a work in progress, sorry. I’ll write it up better a bit later, honest!)

After a weekend of hacking around, I got a 95% accuracy rate on the validation and testing sets for recognizing road signs.  Here are 5 random Google images for ‘road sign photos’, including one that fails:


My neural network got 4 out of 5 of the signs correct, incorrectly thinking the first image was 120 km/h instead of 20 km/h.  But it gets it correct on its second guess.

This was done by just adapting the LeNet neural network for recognizing digits.  I did add two dropout layers with a 50% probability, and increased the number of classes to 30 and input size to 32x32x3, and that was about it.

I had 35,000 training images, but increased this to 60,000 images by rotating and zooming the images.  My network does overfit badly though (reaching 99% accuracy on the training set, after 10 epochs).

Erasing background from an image

I have two opaque images –  one with an object and a background, and another with just the background.  Like:

I want to subtract the background from the image so that the alpha blended result is visually identical, but the foreground is as transparent as possible.



Desired output (All images under Reuse With Modification license)

I’m sure that this must have been done before, but I couldn’t find a single correct way of doing this!

I asked a developer from the GIMP image editor team, and they replied that the standard way is to create an alpha mask on the front image from the difference between the two images.  i.e. for each pixel in both layers, subtract the rgb values, average that difference between the three channels, and then use that as an alpha.

But this is clearly not correct.  Imagine the foreground has a green piece of semi-transparent glass against a red background.  Just using an alpha mask is clearly not going to subtract the background because you need to actually modify the rgb values in the top layer image to remove all the red.

So what is the correct solution?  Let’s do the calculations.

For a solid background with a semi-transparent foreground layer that is alpha blended on top, the final visual color is:

out_{rgb} = src_{rgb} \cdot src_{alpha} + dst_{rgb} \cdot (1-src_{alpha})

We want the visual result to be the same, so we know the value of out_{rgb}: that’s our original foreground+background image.  And we know dst_{rgb}: that’s our background image.  We want to now create a new foreground image, src_{rgb}, with the minimum value of src_{alpha}, i.e. the maximum transparency.

So to restate this again: I want to know how to change the top layer src so that it is as transparent as possible (has the smallest possible alpha) without changing the final visual image at all.  I.e. remove as much of the background as possible from our foreground+background image.

Note that we also have the constraint, for each color channel, that src_{rgb} \le 1, since each rgb pixel value is between 0 and 1.  Solving the blend equation for src_{rgb} and applying this bound gives:

src_{alpha} \ge (out_{rgb} - dst_{rgb})/(1-dst_{rgb})


src_{alpha} = Max((out_r - dst_r)/(1-dst_r), (out_g - dst_g)/(1-dst_g), (out_b - dst_b)/(1-dst_b))\\ src_{rgb} = (out_{rgb} - dst_{rgb} \cdot (1-src_{alpha}))/src_{alpha}
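
Here is a minimal per-pixel sketch of the calculation, with colors as tuples of floats between 0 and 1. It picks the smallest alpha that keeps every channel of the new foreground color in range. The function names are my own, and the extra bound for channels that are darker than the background (from src_{rgb} \ge 0) is an addition that the derivation above does not cover:

```python
def erase_background(out_rgb, dst_rgb, eps=1e-9):
    """Return (src_rgb, src_alpha): the most transparent foreground pixel
    that still blends over dst_rgb to give exactly out_rgb."""
    alpha = 0.0
    for out, dst in zip(out_rgb, dst_rgb):
        if out > dst:
            # src <= 1  requires  alpha >= (out - dst) / (1 - dst)
            alpha = max(alpha, (out - dst) / (1.0 - dst))
        elif out < dst:
            # src >= 0  requires  alpha >= (dst - out) / dst   (added bound)
            alpha = max(alpha, (dst - out) / dst)
    if alpha < eps:
        # Pixel is identical to the background: fully transparent
        return (0.0, 0.0, 0.0), 0.0
    # Solve the blend equation for the foreground color
    src = tuple((out - dst * (1.0 - alpha)) / alpha
                for out, dst in zip(out_rgb, dst_rgb))
    return src, alpha

def blend(src_rgb, alpha, dst_rgb):
    """Standard alpha blend of src over an opaque dst."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src_rgb, dst_rgb))

# Greenish glass seen against a pure red background
background = (1.0, 0.0, 0.0)
observed = (0.5, 0.4, 0.0)
src, alpha = erase_background(observed, background)
print(src, alpha)
print(blend(src, alpha, background))  # should reproduce `observed`
```

Note how the red is removed from the recovered foreground color itself, which is exactly what a plain alpha mask cannot do.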


My suggestion: add an option for the GIMP eraser tool to ‘remove layers underneath’, which grabs the rgb value of the layer underneath and applies the formula, using the alpha in the brush as a normal erasure would, but bounding the alpha by the equation above and modifying the rgb values accordingly.


I showed this to the GIMP team, and they found a way to do this with the latest version in git.  Open the two images as layers.  For the top layer do: Layer->Transparency->Add Alpha Channel.  Select the Clone tool.  On the background layer, ctrl click anywhere to set the Clone source.  In the Clone tool options, choose Default and Color erase, and set alignment to Registered.  Make the size large, select the top layer again, and click on it to erase everything.

Result is:


When the background is a very different color, it works great – the sky was very nicely erased.  But when the colors are too similar, it goes completely wrong.

Overall..  a failure.  But interesting.

Practical Deep Neural Network in a few lines of code with TensorFlow

This is using the very latest TensorFlow tf.contrib.learn API.  The documentation is extremely thin, and as of writing there are no tutorials out there similar to this.  The API may change, and I don’t actually know if I’m doing things correctly.  This is just trial and error 🙂

So let’s say you want to:

  1. Train a Deep Neural Network by using some existing data, and training it against known labels.  e.g. labeled images.
  2. Use that trained network in your app to predict y given x.  E.g. guess the best label for a given image

    And as a bonus:

  3. Have some nice graphs of how well the training is doing

I assume that the data you have is in a plain array-of-arrays or numpy 2D array.  It should be easy to change this to use csv or pandas etc. data. A row is a single example, and a column is a particular feature (e.g. house price, or pixel intensity at a specific location).

So, without further ado, here’s the function that will be shared between your training and your app:

import numpy as np
import tensorflow as tf

def get_tf_model():
    """ Setup our Deep Neural Network and LOAD any existing model
        Returns:
          ( Our (maybe trained) model, a helper input function)
    """
    # FEATURES is a short description for each column
    # of your data.  Change this to match your data
    FEATURES = ["x1", "x2"]
    # Set this to describe your columns.  If they are all real values,
    # you don't need to change it.
    feature_columns = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]

    # Build 3 layer DNN.  You can change this however you want, or use a
    # linear regressor, or use a classifier etc.
    # NOTE:
    #   This will LOAD any existing model in the "model_dir" directory!
    # The documentation fails to mention this point as of writing
    regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                              hidden_units=[128, 128, 128],
                                              model_dir="tf_model")  # directory the model is saved to and loaded from

    def input_fn(x_data, y_data = None):
        # Note the 'shape' parameter is to suppress a very noisy warning.
        # You can probably remove this parameter in a month or two.
        feature_cols = {k: tf.constant(x_data[:,i], shape=[len(x_data[:,i]), 1])
                        for i, k in enumerate(FEATURES)}
        if y_data is None:
            return feature_cols
        labels = tf.constant(y_data)
        return feature_cols, labels

    return regressor, input_fn

Now the code for training. Note that it’s absolutely fine to carry on training a model that is already trained, if you have some new data for it.

x_test = None
y_test = None
regressor, input_fn = get_tf_model()

def train(training_data_y, training_data_x):
    global x_test, y_test, regressor, input_fn

    if x_test is None:
        # First call: hold back the last 20 examples as a fixed test set
        x_train = np.array(training_data_x[:-20], dtype=np.float32)
        x_test = np.array(training_data_x[-20:], dtype=np.float32)
        y_train = np.array(training_data_y[:-20], dtype=np.float32)
        y_test = np.array(training_data_y[-20:], dtype=np.float32)
    else:
        # Later calls: train on all of the new data
        x_train = np.array(training_data_x, dtype=np.float32)
        y_train = np.array(training_data_y, dtype=np.float32)

    print("Training model ...")
    # Fit model.
    regressor.fit(input_fn=lambda: input_fn(x_train, y_train), steps=2000)

    ev = regressor.evaluate(input_fn=lambda: input_fn(x_test, y_test), steps=1)
    print('  -  Trained Loss: {0:f}'.format(ev["loss"]))

And now finally, the code to use this model in the app. Note that no explicit loading is needed, because it’s loading it from the model_dir

regressor, input_fn = get_tf_model()
def predict(x_data):
    return regressor.predict(input_fn=lambda:input_fn(x_data))

Isn’t that simple?

We can also view a graph of the loss etc with:

tensorboard --logdir tf_model  # or whatever you set model_dir to

Then navigate in the browser to the address that tensorboard prints.

Worst/Trickiest code I have ever seen

It’s easy to write bad code, but it takes a real genius to produce truly terrible code.  And the guys who wrote the Python library hyperopt were clearly very clever.

Have a look at this function:  (don’t worry about what it is doing) from

# These produce conditional estimators for various prior distributions
def ap_uniform_sampler(obs, prior_weight, low, high, size=(), rng=None):
    prior_mu = 0.5 * (high + low)
    prior_sigma = 1.0 * (high - low)
    weights, mus, sigmas = scope.adaptive_parzen_normal(obs,
        prior_weight, prior_mu, prior_sigma)
    return scope.GMM1(weights, mus, sigmas, low=low, high=high, q=None,
        size=size, rng=rng)

The details don’t matter here, but clearly it’s calling some function “adaptive_parzen_normal”  which returns three values, then it passes that to another function called “GMM1”  and returns the result.

Pretty straight forward?  With me so far?  Great.

Now here is some code that calls this function:

fn = adaptive_parzen_samplers[]
named_args = [[kw, memo[arg]] for (kw, arg) in node.named_args]
a_args = [obs_above, prior_weight] + aa
a_post = fn(*a_args, **dict(named_args))

Okay this is getting quite messy, but with a bit of thinking we can understand it.  It’s just calling the  ‘ap_uniform_sampler’  function, whatever that does, but letting us pass in parameters in some funky way.

So a_post is basically whatever “GMM1” returns  (which is a list of numbers, fwiw)

Okay, let’s continue!

fn_lpdf = getattr(scope, + '_lpdf')
a_kwargs = dict([(n, a) for n, a in a_post.named_args if n not in ('rng', 'size')])
above_llik = fn_lpdf(*([b_post] + a_post.pos_args), **a_kwargs)

and that’s it.  There’s no more code using a_post.

It took me a whole day to figure out what on earth is going on.  But I’ll give you, the reader, a hint.  This is not running any algorithm: it’s constructing an Abstract Syntax Tree and manipulating it.
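
To make the hint a bit more concrete, here is a toy sketch of the general idea of "calls that build a tree instead of running". It is not hyperopt's code: the classes `Apply` and `Scope` are hypothetical stand-ins that I made up for illustration, and only the attribute names `pos_args` and `named_args` are borrowed from the snippets above.

```python
class Apply:
    """A node that records a call instead of performing it (hypothetical)."""
    def __init__(self, name, pos_args, named_args):
        self.name = name
        self.pos_args = list(pos_args)
        self.named_args = list(named_args)  # list of (keyword, value) pairs

class Scope:
    """Any attribute looks like a function, but calling it only builds a node."""
    def __getattr__(self, name):
        def build(*args, **kwargs):
            return Apply(name, args, sorted(kwargs.items()))
        return build

scope = Scope()

# This looks like a computation, but nothing is sampled: we just get a node back
node = scope.GMM1([0.5, 0.5], [0.0, 1.0], [1.0, 1.0], low=0, high=1, rng=None)
print(node.name, node.pos_args, node.named_args)

# Because the call is only data, it can be taken apart and rebuilt as a different call
kwargs = {k: v for k, v in node.named_args if k not in ("rng", "size")}
lpdf_node = getattr(scope, node.name + "_lpdf")("x", *node.pos_args, **kwargs)
print(lpdf_node.name, lpdf_node.pos_args, lpdf_node.named_args)
```

Once a "function call" returns a description of the call rather than a result, code that reads its arguments back out and feeds them to another function starts to make sense.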

If you want, try and see if you can figure out what it’s doing.

Answer: Continue reading