Box for cat

Just a light hearted post today.  My daughter had a great idea to build a felt house for her teddy cat, so we did it together.

It was a fun project, and simple to make.  The wood is just children’s blocks glued together, and the felt is stapled on.  The strap was cut from an ecobag.  A piece of velcro was added to hold the door open.

Advertisements

Robot face

I played about with making a robot face.  I combined a Ultrasonic Sensor (HC-SR04) and NeoPixel leds.  The leds reflect the distance – one pixel per 10 of cm.

I’m wondering if I can do SLAM – Simultaneous Location And Mapping  with a single (or a few) very-cheap ultrasonic sensor. Well, the answer is almost certainly no, but I’m very curious to see how far I can get.

face2

I don’t like PID control

 

I’ve implemented PID control far too many times now, and I’ve never been that happy with the results.

It is simple to implement, but tricky to tune.  My thick control theory book says that 95% of commercial in-use PID systems are sub-optimally tuned.  And I can believe it.

I’ve been recently playing with other systems, and currently I really like Model Predictive Control.  It does require some significant compute power, which was a big problem when it was invented, but not so much any more.   (As a side point, I’d like to implement MPC in tensorflow.  That would be cool, and could probably be done in a weekend).

I wrote a PID control system for a self driving car.  It is dumbly auto-tuned, so to speak, by a simple twiddle method.  It tests out a particular set of PID values for one minute, resets the track, then picks a new set of PID values based on the best seen values, changed slightly.

The results are…  very underwhelming.

 

The same driving but using Model Predictive Control, below, starts to look a lot nicer.  For reference, the PID control took a total of two days to implement, and the MPC took a total of three days.   The MPC version also has an additional complication – there’s a simulated delay of 100ms.

But I’m very pleased with the results.  It’s a bit jerky here due to random latency variations, made worse by the screen recording software.  I’ve got some ideas on trying to solve that by timestamping, but I don’t think I’ll have time to work on it.

I would love to try applying this MPC system to a Quadcopter.  I’ve written quite a bit previously about latency problems with PID in a quadcopter:  (In this video, the ufo-shaped quadcopter is attempting to move to, and maintain, a fixed position above the chair.  Ignore the blue man 🙂 )

 

 

Why Kalman Filters are broken, and how to fix

Okay, so clickbait title, because I’m referring to the most common case of using a single Gaussian distribution to represent the current estimated position (for example) and for the measurement.

Here’s my problem with that:

They don’t take into account that sometimes things go wrong.

Now you may say ‘duh’, but it can cause some really serious problems if something does go wrong.

Say you have a self driving car.  And the following happens:

  • Prior: You currently estimate the position of a person in front of you to be at position:  x = 2m ± 0.5m     aka   X ~ N(2,0.5)
  • Data:  Your instruments return a new measure:  the person is now measured to be at position: x = 10m ± 0.5m   aka    X ~ N(10,0.5)
  • Posterior: You apply your Kalman Filter and it concludes that the position of the person is now at:  x = 6m ± 0.25m  aka   X ~ N(6,0.25)

gaussian1

  • You are now extremely confident that the person is at x=6m despite you previously thinking that they were at a x=2m and your measurement data says that they are at x=10m.  You, the self driving car, now happily drive at x=10m.

 

How confident would you, as a, human, really be that the person is at x=6m?  Barely at all, right?  So something has gone wrong here.

Here’s an example in a simulator.  A car is driving from right to left and we are trying to track its true position with a Extended Kalman Filter.  ◯ The red circles indicate data from the simulated LASER, ◔ blue circle circles are RADAR, and ▲ green triangles are the position estimated by a Kalman filter.  At the bottom of the arc, a single bad LASER data point is injected.

Screenshot_20170801_170644

Credit to Udacity for the simulator.

The result is that the predicted position goes way off, and takes many times steps to recover.

The blame all lies on Sherlock Holmes:

sherlock

The product of N(2,0.5) * N(10,0.5) results in a ridiculous improbable scenerio – we have to scale up the result by 10^{14} .  i.e. treating the highly improbable as truth.  Perhaps what Sherlock should have said is that it is far more likely that you were simply mistaken in elimination!  i.e. your prior or data was wrong!

Fixing the basic Kalman Filter

Here’s a small change we could make.  We can add a small probability of ‘wtf’ to our system.

  • Prior: You currently estimate the position of a person in front of you to be at position:  x_{prior} = 2 \pm 0.5   plus a W_{prior} chance of ‘wtf’:   aka:X \sim (1-W_{prior}) \cdot N(2,0.5) + W_{prior} \cdot \textrm{WTF}
  • Data:  Your instruments return a new measure:  the person is now measured to be at position: x_{data} = 10 \pm 0.5   plus a W_{data} chance of ‘wtf’:   aka:
    X \sim (1-W_{data}) \cdot N(10,0.5) + W_{data} \cdot \textrm{WTF}

  • Posterior: You apply your bayes rule.  What do we get now?  We need the product of these two random variables, and then normalize, by setting C such that the total integral is 1.\displaystyle{ \left( (1-W_{prior}) \cdot N(2,0.5) + W_{prior} \cdot \textrm{WTF} \right) \cdot \left(  (1-W_{data}) \cdot N(10,0.5) + W_{data} \cdot \textrm{WTF} \right) / C}(note that N(10,0.5) \cdot N(2,0.5) is proportionate to a new Gaussian scaled by a factor S)= ( (1-W_{prior}) \cdot (1-W_{data}) \cdot N(6,0.25) \cdot S + (1-W_{prior}) \cdot W_{data} \cdot N(2,0.5) +
    W_{prior} \cdot (1-W_{data}) \cdot N(10,0.5) + W_{prior} \cdot W_{data} \cdot \textrm{WTF}^2 ) / C We know that the area must equal 1 for a PDF, so we can determine the scale factor C.  See Appendix below for any technical details, including the formula for S and C.So we have a solution, but the solution is a gaussian mixture model – a sum of a gaussian for the combination, a gaussian for the prior, and a gaussian for the data.  So let’s simply pick the one with the highest probability, and roll the other two into the WTF weight.

Here’s the result as an gif – the posterior is where the posterior would normally be, and the shaded blue area is our ‘wtf’ version of the posterior as a weighted mixture model:

anim.gif

Various different priors and data.  There’s no updating step here – just showing different mean values for the prior and data.  The posterior is kept centered to make it easier to see.

It looks pretty reasonable!  If the prior and data agree to within their uncertainties, then the mixture model just returns the posterior.  And when they disagree strongly, we return a mixture of them.

Let’s now make an approximation of that mixture model to a single gaussian, by choosing a gaussian that minimizes the KL divergence.

Screenshot_20170726_015120

Which results in the red line ‘Mixture Model Approximated’:

animapprox.gif

Different priors and data, with the red line showing this new method’s output.  As above, there’s no update step here – it’s not a time series.

Now this red line looks very reasonable!  When the prior and data disagree with each other, we have a prediction that is wide and covers both of them.  But when they agree with each other to within their uncertainties, we get a result that properly combines both and gives us a posterior result that is more certain than either – i.e. we are adding information.

In total, the code to just take a prior and data and return a posterior is this:
(Note: I’ve assumed there are no covariances – i.e. no correlation in the error terms)

# prior input
mu1 = 5
variance1 = 0.5
# data input
mu2 = 6
variance2 = 0.5
# This is what you would normally do, in a standard Kalman Filter:
variance3 = 1/(1/variance1 + 1/variance2)
mu3 = (mu1/variance1 + mu2/variance2)*variance3

#But instead we will do:
S = 1/(np.sqrt(2*np.pi*variance1*variance2/variance3)) * np.exp(-0.5 * (mu1 - mu2)**2 * variance3/(variance1*variance2))
Wprior = 0.01 # 1% chance our prior is wrong
Wdata = 0.01 # 1% chance our data is wrong
C = (1-Wprior) * (1-Wdata) * S + (1-Wprior) * Wdata + Wprior * (1-Wdata)
weights = [
Wprior * (1-Wdata) /C, # weight of prior
(1-Wprior) * Wdata /C, # weight of data
(1-Wprior) * (1-Wdata) * S/C ] #weight of posterior
mu4 = weights[0]*mu1 + weights[1]*mu2 + weights[2] * mu3
variance4 = weights[0]*(variance1 + (mu1 - mu4)**2) + \
weights[1]*(variance2 + (mu2 - mu4)**2) + \
weights[2]*(variance3 + (mu3 - mu4)**2)

So at mu3 and variance3  is the green curve in our graphs and is the usual approach, while mu4 and variance4 is my new approach and is a better estimation of posterior.

The full ipython notebook code for everything produced here, is in git, here:

https://github.com/johnflux/Improved-Kalman-Filter/blob/master/kalman.ipynb

Results: Testing out in Simulator

Unfortunately, in my simulator my kalman filter has a velocity component, which I don’t actually take into in my analysis yet.  My math completely ignores any dependent variables.  The problem is that when we increase the uncertainty of the position then we also need to increase the uncertainty of the velocity.

To salvage this, I used just a small part of the code:

<div>
<div>double Scale = 1/(sqrt(2*M_PI*variance1*variance2/variance3)) * exp(-0.5 * (mu1 – mu2)*(mu1 – mu2) * variance3/(variance1*variance2));</div>
<div>
<div>
<div>if (Scale < 0.00000000001)
return; /* Just completely ignore this data point */</div>
</div>
</div>
</div>

The result is this:

Screenshot_20170802_012822

Appendix:

Just some technical notes, for completeness.

  • We can set WTF = 1/(2A) where it can be anywhere between A and -A.  We can take A to be large or even take the limit to infinity.
  • It is a Probability Density Function since the area is 1:  \int^A_{-A} 1/(2A) dx  =  1 , and can set the Cauchy principal value of the integral to be 1 to keep it a PDF in the limit.
  • \int^A_{-A} \textrm{WTF}^2(x) dx \to 0 since:  \int^A_{-A} 1/(2A)^2 dx  =  2A/(4A^2) = 1/(2A)   which tends to 0 as A  \to \infty.
  • The scale factor S is simple but tedious to calculate:
    Screenshot_20170725_224207
  • The scale factor C can then be calculated by just setting the integral to equal 1 since it’s a PDF:  C = (1-W_{prior}) \cdot (1-W_{data}) \cdot S + (1-W_{prior}) \cdot W_{data} + W_{prior} \cdot (1-W_{data})

Python heat map image

For my self-driving car, I detect cars in the image by scanning over the image in windows of varying sizes, and use the detection results to build up a ‘heat map’ of the most likely areas for where the cars are.

For visualization I came up with this idea:

heatmap_integrated_4

In code:

heat_img = plt.cm.hot(heat/np.max(heat))
heat_img[heat&lt;1,3] = 0 image2 = np.copy(image) image2[heat&gt;1,:] = image2[heat&gt;1,:]/3 + (heat_img[heat&gt;1,0:3]*256*2/3).astype(np.uint8)
plt.imshow(image2)

My Self Driving Car on a difficult track at speed

Well, in a simulator 🙂

(Sorry about the bad filming – it was the easiest way to get even a semi-decent quality)

Code is here: https://github.com/johnflux/CarND-Behavioral-Cloning-P3

I basically follow the NVidia End to End Learning for Self-Driving Cars paper with a few tweaks to add in dropout, slightly larger Dense layers to compensate, and an additional convolution layer to handle the different input image resolution.  It runs in real time on a GTX 980 Ti.

Tips on machine learning with TensorFlow / Keras

I see people new to machine learning make the same sort of mistakes.  Mistakes that I’ve made myself and then painstakingly tried to fix.  So here’s some hints that I’ve had to work out the hard way, along with some code.

As an example, I’ve used my self driving car code, which I’ll put up on the web.

  • Plot your input and expected output data!  For the training, validation and test data, individually.  It’s too easy to accidentally shuffle your data in the wrong place or to have unbalanced data etc.

    Example:

    def showDrivingAngles(samples, title="samples"):
        plt.hist([sample.driving_angle for sample in samples ], 16)
        plt.title("Driving angle distribution in " + title)
        plt.show()
    showDrivingAngles(samples)
    
  • Make sure your data is balanced, by over-sampling, under-sampling or generating fake data.

    Example to over-sample:  (Suggestions on making this better are welcome!)

    def duplicateSamplesToRebalanceByDrivingAngle(samples):
        # Bin the data, returning an array of which bin each sample is in
        num_bins = 16
        indexes = np.digitize([sample.driving_angle for sample in samples], np.arange(-1, 1, 2/num_bins))
        # Find how many samples are in the largest bin
        largest_bin_count = np.max(np.bincount(indexes))
        print(largest_bin_count)
        rebalanced_samples = []
        for j in range(num_bins):
            bin_indexes = np.where(indexes==(j+1))[0]
                for i in range(largest_bin_count):
                    rebalanced_samples.append(samples[bin_indexes[i % len(bin_indexes)]])
        random.shuffle(rebalanced_samples)
        return rebalanced_samples
    

    (And of course, plot your data afterwards like above, to check that it worked correctly)

  • View random samples of your data after preprocessing.  It’s worthwhile to get a bit fancy here.  Show the label for the data here.  For example, I draw an arrow directly on the image to indicate which direction the label says that it should go in:
    image = self.getImage()
     s = image.shape
     line_len = s[0]//2
     angle = self.driving_angle / 360 * np.pi * 100 # Times 100 just to make it more visible
     line_y, line_x = int(line_len * np.cos(angle)), int(line_len * np.sin(angle))
     rr,cc = draw.line(s[0]-1, s[1]//2, s[0]-1-line_y, s[1]//2 + line_x)
     image[rr,cc,:] = 255
     plt.imshow(image)
     plt.title("{} degrees".format(float(self.driving_angle)))
     plt.show()
    

    view from right driving straight

  • Check your colorspace! Notice that in the image above, the colors are all wrong. This was totally on purpose, to demonstrate another point – Make sure you understand what colorspace your data is in! cv2.imread() return BGR. While almost everything else is going to use RGB. I highly recommend converting to YUV immediately, and using that everywhere. Make sure you do this in both the code that trains the model, and code that uses the model.
  • You can do preprocessing in the model itself.  This is extremely useful if you want to use your model in a program and don’t want to copy the preprocessing code.

    Example to crop and normalize image:

    inputs = keras.layers.Input(shape=(160,320,3))
    output = keras.layers.convolutional.Cropping2D(cropping=((55, 25), (0, 0)))(inputs)
    output = Lambda(lambda x: x/127.5 - 1., input_shape=imageshape, output_shape=imageshape)(output)
    
  • If you do preprocess in the model, preview the output.  You can do this by creating a temporary model and using predict.
    Example:

    def showLayerOutput(sample):
        model = Model(inputs=inputs, outputs=output)
        sample.showImage()
        croppedimage = model.predict(np.array([sample.getImage()]))[0]
        print(croppedimage)
        plt.imshow(croppedimage)
        plt.show()
    showLayerOutput(train_samples[0])
    
  • Use asserts on the shape of your model.  For example:
    assert output.shape[1:] == (1,31,64)
    
  • Print out some outputs!  This is really important to keep an eye on what the predictions are.  Are they never negative when you want them to be?  Are they never larger than one when you want them to be?  (You’re using the wrong activation function on the last layer). Is it always outputting zero or similar (learning rate too high?)
    In keras you do this with a callback.

    Example:

    class DebugCallback(keras.callbacks.Callback):
        def on_epoch_end(self, batch, logs={}):
            samples = train_samples[:5]
            print()
            print(logs)
            print("Should be: ", [sample.driving_angle for sample in samples])
            print("Predicted: ", [x[0] for x in model.predict(np.array([sample.getImage() for sample in samples]))] ) #Print predicted driving angle for first example
    
    debugCallback = DebugCallback()
    model.fit_generator(train_generator, epochs=200, steps_per_epoch= len(train_samples)/batch_size, validation_data=validation_generator, validation_steps =len(validation_samples)/batch_size, callbacks=[tbCallBack, checkpointCallback, reduce_lr, debugCallback])