Erasing background from an image

I have two opaque images –  one with an object and a background, and another with just the background.  Like:

I want to subtract the background from the image so that the alpha blended result is visually identical, but the foreground is as transparent as possible.

E.g:

lion

Desired output (All images under Reuse With Modification license)

I’m sure that this must have been, but I couldn’t find a single correct way of doing this!

I asked a developer from the image editor gimp team, and they replied that the standard way is to create an alpha mask on the front image from the difference between the two images.  i.e. for each pixel in both layers, subtract the rgb values, average that difference between the three channels, and then use that as an alpha.

But this is clearly not correct.  Imagine the foreground has a green piece of semi-transparent glass against a red background.  Just using an alpha mask is clearly not going to subtract the background because you need to actually modify the rgb values in the top layer image to remove all the red.

So what is the correct solution?  Let’s do the calculations.

If we have a solution, the for a solid background with a semi-transparent foreground layer that is alpha blended on top, the final visual color is:

out_{rgb} = src_{rgb} * src_{alpha} + dst_{rgb} \cdot (1-src_{alpha})

We want the visual result to be the same, so we know the value of out_{rgb} – that’s our original foreground+background image.  And we know dst_{rgb} – that’s our background image.  We want to now create a new foreground image, src_{rgb}, with the maximum value of src_{alpha}.

So to restate this again – I want to know how to change the top layer src so that I can have the maximum possible alpha without changing the final visual image at all.  I.e. remove as much of the background as possible from our foreground+background image.

Note that we also have the constraint that for each color channel, that src_{rgb} \le 1 since each rgb pixel value is between 0 and 1.  So:

src_{alpha} \le (out_{rgb} - dst_{rgb})/(1-dst_{rgb})

So:

src_{alpha} = Min((out_r - dst_r)/(1-dst_r), out_g - dst_g)/(1-dst_g), out_b - dst_b)/(1-dst_b))\\ src_{rgb} = (dst_{rgb} \cdot (1-src_{alpha}) - out_{rgb})/src_{alpha}

Proposal

Add an option for the gimp eraser tool to ‘remove layers underneath’, which grabs the rgb value of the layer underneath and applies the formula using the alpha in the brush as a normal erasure would, but bounding the alpha to be no more than the equation above, and modifying the rgb values accordingly.

Result

I showed this to the Gimp team, and they found a way to do this with the latest version in git.  Open the two images as layers.  For the top layer do: Layer->Transparency->Add Alpha Channel.  Select the Clone tool.  On the background layer, ctrl click anywhere to set the Clone source.  In the Clone tool options, choose Default and Color erase, and set alignment to Registered.  Make the size large, select the top layer again, and click on it to erase everything.

Result is:

Screenshot_20170307_133539.png

When the background is a very different color, it works great – the sky was very nicely erased.  But when the colors are too similar, it goes completely wrong.

Overall..  a failure.  But interesting.

Advertisements

Second order back propagation

This is a random idea that I’ve been thinking about. A reader messaged me to say that this look similar to online l-bgfs. To my inexperienced eyes, I can’t see this myself, but I’m still somewhat a beginner.

photo

Yes, I chose a cat example simply so that I had an excuse to add cats to my blog.

Say we are training a neural network to take images of animals and classify the image as being an image of a cat or not a cat.

You would train the network to output, say, y_{cat} = 1 if the image is that of a cat.

To do so, we can gather some training data (images labelled by humans), and for each image we see what our network predicts (e.g. “I’m 40% sure it’s a cat”).  We compare that against what a human says (“I’m 100% sure it’s a cat”)  find the squared error (“We’re off by 0.6, so our squared error is 0.6^2”) and adjust each parameter, w_i, in the network so that it slightly decreases the error (\Delta w_i  = -\alpha \partial E/\partial w_i).  And then repeat.

It’s this adjustment of each parameter that I want to rethink.  The above procedure is Stochastic Gradient Descent (SGD) – we adjust w_i to reduce the error for our test set (I’m glossing over overfitting, minibatches, etc).

Key Idea

This means that we are also trying to look for a local minimum. i.e. that once trained, we want the property that if we varied any of the parameters w_i by a small amount then it should increase the expected squared error E

My idea is to encode this into the SGD update. To find a local minima for a particular test image we want:

\dfrac{\partial y_{cat}}{\partial w_i} = 0

\dfrac{\partial^2y_{cat}}{\partial w_i^2}  < 0 (or if it equals 0, we need to consider the third differential etc).

Let’s concentrate on just the first criteria for the moment. Since we’ve already used the letter E to mean the half squared error of y, we’ll use F to be the half squared error of \dfrac{\partial y_{cat}}{\partial w_i}.

So we want to minimize the half squared error F:

F = \dfrac{1}{2}\left(\dfrac{\partial y_{cat}}{\partial w_i}\right)^2

So to minimize we need the gradient of this error:

\dfrac{\partial F}{\partial w_i} = \dfrac{1}{2} \dfrac{\partial}{\partial w_i} \left(\dfrac{\partial y_{cat}}{\partial w_i}\right)^2 = 0

Applying the chain rule:

\dfrac{\partial F}{\partial w_i} = \dfrac{\partial y_{cat}}{\partial w_i} \dfrac{\partial^2 y_{cat}}{\partial w_i^2} = 0

SGD update rule

And so we can modify our SGD update rule to:

\Delta w_i = -\alpha \partial E/\partial w_i - \beta \dfrac{\partial y_{cat}}{\partial w_i} \dfrac{\partial^2 y_{cat}}{\partial w_i^2}

Where \alpha and \beta are learning rate hyperparameters.

Conclusion

We finished with a new SGD update rule. I have no idea if this actually will be any better, and the only way to find out is to actually test. This is left as an exercise for the reader 😀

Best-fitting to a Cumulative distribution function in python TensorFlow

I wanted to find a best fit curve for some data points when I know that the true curve that I’m predicting is a parameter free Cumulative Distribution Function.  I could just do a linear regression on the points, but the resulting function, y might not have the properties that we desire from a CDF, such as:

  • Monotonically increasing
  • y(x) \xrightarrow{x\to\infty} 1      i.e. That it tends to 1 as x approaches positive infinity
  • y(x) \xrightarrow{x\to-\infty} 0      i.e. That it tends to 0 as x approaches negative infinity

First, I take the second two points.  To deal with this, I use the sigmoid function to create a new parameter, x’:

x'(x) = \textrm{sigmoid}(x)

This has the nice property that x'(x) \xrightarrow{x\to\infty} 1 and x'(x) \xrightarrow{x\to-\infty} 0

So now we can find a best fit polynomial on x'.

So:

y(x') = a + bx' + cx'^2 + dx'^3 + .. + zx'^n

But with the boundary conditions that:

y(0) = 0 and y(1) = 1

Which, solving for the boundary conditions, gives us:

y(0) = 0 = a
b = (1-c-d-e-...-z)

Which simplifies our function to:

y(x') = (1-c-d-e-..-z)x' + cx'^2 + dx'^3 + ex'^4 .. + zx'^n

Implementing this in tensorflow using stochastic gradient descent (code is below) we get:

figure_1-2

20 data samples

figure_1-1

100 data samples

(The graph title equation is wrong.  It should be   x' = sigmoid(x) then y(x') = ax' +...  I was too lazy to update the graphs sorry)

Unfortunately this curve doesn’t have one main property that we’d like – it’s not monotonically increasing – it goes above 1. I thought about it for a few hours and couldn’t think of a nice solution.

I’m sure all of this has been solved properly before, but with a quick google I couldn’t find anything.

Edit: I’m updating this 12 days later. I have been thinking in the background about how to enforce that I want my function to be monotonic. (i.e. always increasing/decreasing) I was certain that I was just being stupid and that there was a simple answer. But by complete coincidence, I watched the youtube video Breakthroughs in Machine Learning – Google I/O 2016 in which the last speaker mentioned this restriction.  It turns out that this a very difficult problem – with the first solutions having a time complexity of O(N^4):

I didn’t understand what exactly google’s solution for this is, but they reference their paper on it:  Monotonic Calibrated Interpolated Look-up Tables, JMLR 2016.  It seems that I have some light reading to do!

#!/usr/bin/env python

import tensorflow as tf

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from scipy.stats import norm

# Create some random data as a binned normal function
plt.ion()
n_observations = 20
fig, ax = plt.subplots(1, 1)
xs = np.linspace(-3, 3, n_observations)

ys = norm.pdf(xs) * (1 + np.random.uniform(-1, 1, n_observations))

ys = np.cumsum(ys)

ax.scatter(xs, ys)
fig.show()
plt.draw()

highest_order_polynomial = 3

# We have our data now, so on with the tensorflow
# Setup the model graph

# Our input is an arbitrary number of data points (That's what the 'None dimension means)
# and each input has just a single value which is a float

X = tf.placeholder(tf.float32, [None])
Y = tf.placeholder(tf.float32, [None])

# Now, we know data fits a CDF function, so we know that
# ys(-inf) = 0   and ys(+inf) = 1

# So let's set:

X2 = tf.sigmoid(X)

# So now X2 is between [0,1]

# Let's now fit a polynomial like:
#
#   Y = a + b*(X2) + c*(X2)^2 + d*(X2)^3 + .. + z(X2)^(highest_order_polynomial)
#
# to those points.  But we know that since it's a CDF:
#   Y(0) = 0 and y(1) = 1
#
# So solving for this:
#   Y(0) = 0 = a
#   b = (1-c-d-e-...-z)
#
# This simplifies our function to:
#
#   y = (1-c-d-e-..-z)x + cx^2 + dx^3 + ex^4 .. + zx^(highest_order_polynomial)
#
# i.e. we have highest_order_polynomial-2  number of weights
W = tf.Variable(tf.zeros([highest_order_polynomial-1]))

# Now set up our equation:
Y_pred = tf.Variable(0.0)
b = (1 - tf.reduce_sum(W))
Y_pred = tf.add(Y_pred, b * tf.sigmoid(X2))

for n in xrange(2, highest_order_polynomial+1):
    Y_pred = tf.add(Y_pred, tf.mul(tf.pow(X2, n), W[n-2]))

# Loss function measure the distance between our observations
# and predictions and average over them.
cost = tf.reduce_sum(tf.pow(Y_pred - Y, 2)) / (n_observations - 1)

# Regularization if we want it.  Only needed if we have a large value for the highest_order_polynomial and are worried about overfitting
# It's also not clear to me if regularization is going to be doing what we want here
#cost = tf.add(cost, regularization_value * (b + tf.reduce_sum(W)))

# Use stochastic gradient descent to optimize W
# using adaptive learning rate
optimizer = tf.train.AdagradOptimizer(100.0).minimize(cost)

n_epochs = 10000
with tf.Session() as sess:
    # Here we tell tensorflow that we want to initialize all
    # the variables in the graph so we can use them
    sess.run(tf.initialize_all_variables())

    # Fit all training data
    prev_training_cost = 0.0
    for epoch_i in range(n_epochs):
        sess.run(optimizer, feed_dict={X: xs, Y: ys})

        training_cost = sess.run(
            cost, feed_dict={X: xs, Y: ys})

        if epoch_i % 100 == 0 or epoch_i == n_epochs-1:
            print(training_cost)

        # Allow the training to quit if we've reached a minimum
        if np.abs(prev_training_cost - training_cost) &amp;lt; 0.0000001:
            break
        prev_training_cost = training_cost
    print &quot;Training cost: &quot;, training_cost, &quot; after epoch:&quot;, epoch_i
    w = W.eval()
    b = 1 - np.sum(w)
    equation = &quot;x' = sigmoid(x), y = &quot;+str(b)+&quot;x' +&quot; + (&quot; + &quot;.join(&quot;{}x'^{}&quot;.format(val, idx) for idx, val in enumerate(w, start=2))) + &quot;)&quot;;
    print &quot;For function:&quot;, equation
    pred_values = Y_pred.eval(feed_dict={X: xs}, session=sess)
    ax.plot(xs, pred_values, 'r')
    plt.title(equation)

fig.show()
plt.draw()
#ax.set_ylim([-3, 3])
plt.waitforbuttonpress()

Woodworking – Make a bridge

I wanted to make a bridge out of wooden bricks, with no glue or connectors, held up by gravity only.

The bridge is an inverted catenary:

Which follows the equation:

y = a \cdot \cosh(x/a)

Now, turning this into IPython code:

L = 300.0 # 300mm wide bridge.  All lengths in millimetres
A = 100.0 # Scaling factor

def y(x):
    return -A*( math.cosh(x/A) - math.cosh((L/2)/A))

X = linspace(-L/2, L/2)
Y = [y(x) for x in X]

fig, ax = subplots(1,1)
fig.set_size_inches(14,7)
ax.plot(X,Y,'r')
xlabel('x')
ylabel('height')
title('Bridge curve')
show()

Which produces:

bridge

This is looking good.

Now, I want to build this out of wood, so we won’t actually have this curve, but a ‘blocky’ version of this.

class Block:
    '''A wooden block'''
    def __init__(self, start_x, end_x):
        self.start_x = start_x
        self.start_y = y(start_x)
        self.end_x = end_x
        self.end_y = y(end_x)
        self.length_x = end_x - start_x
        self.midpoint_x = start_x + self.length_x/2.0
        self.tangent = (self.end_y - self.start_y)  / self.length_x        
        self.midpoint_y = self.start_y + (self.end_y - self.start_y)/2.0
        self.coords = [] # filled in later
    def angle(self):
        return math.atan(self.tangent)

step = 60.0

def plotTangent(block):
    xstart = block.midpoint_x-block.length_x/2
    xend = block.midpoint_x+block.length_x/2
    ystart = block.midpoint_y + block.tangent * (xstart - block.midpoint_x)
    yend = block.midpoint_y + block.tangent * (xend - block.midpoint_x)
    ax.plot([block.start_x, block.end_x], [block.start_y, block.end_y], 'k-', lw=1)

blocks = []

for start_x in linspace(-L/2, L/2, L/step, endpoint=False):
    block = Block(start_x, start_x+step)
    blocks.append(block)
    plotTangent(block)
fig

Which produces:

bridge2
Now we have to decide how to cut the wood. To produce the arch, the angle of each piece of wood (the tangent lines), relative to the horizontal, is just: \tan(\theta) = tangent. So the angle between two adjacent pieces of wood is the difference between their angles.

The wood will be parallel with those tangent lines, so we can draw on the normal lines for those tangent lines, of the thickness of the wood:

def dy(x):
    '''return an approximation of dy/dx'''
    return (y(x+0.05) - y(x-0.05)) / 0.1

def normalAngle(x):
    if x == -L/2:
        return 0
    elif x == L/2:
        return math.pi
    return math.atan2(-1.0, dy(x));

WoodThickness = 50.0 # in millimeters

def getNormalCoords(x, ymidpoint, lineLength):
    tangent = dy(x)
    if tangent != 0:
        if x == -L/2 or x == L/2:
            normalSlope = 0
        else:
            normalSlope = -1 / dy(x)
        xlength = lineLength * math.cos(normalAngle(x))
        xstart = x-xlength/2
        xend = x+xlength/2
        ystart = ymidpoint + normalSlope * (xstart - x)
        yend = ymidpoint + normalSlope * (xend - x)
    else:
        xstart = x
        xend = x
        ystart = ymidpoint + lineLength/2
        yend = ymidpoint - lineLength/2
    return (xstart, ystart, xend, yend)

for block in blocks:
    (xstart, ystart, xend, yend) = getNormalCoords(block.midpoint_x, block.midpoint_y, WoodThickness)
    ax.plot([xstart, xend], [ystart, yend], 'b-', lw=1)
fig

bridge3

Now, the red curve is going to represent the line of force. My reasoning is that if there was any force normal to the red curve, then the string/bridge/weight would thus accelerate and change position.

So to minimize the lateral forces where the wood meets, we want the wood cuts to be normal to the red line.  We add that, then draw the whole block:

def getWoodCorners(block, x):
    ''' Return the top and bottom of the piece of wood whose cut passes through x,y(x), and whose middle is at xmid '''
    adjusted_thickness = WoodThickness / math.cos(normalAngle(x) - normalAngle(block.midpoint_x))
    return getNormalCoords(x, y(x), adjusted_thickness)

def drawPolygon(coords, linewidth):
    xcoords,ycoords = zip(*coords)
    xcoords = xcoords + (xcoords[0],)
    ycoords = ycoords + (ycoords[0],)
    ax.plot(xcoords, ycoords, 'k-', lw=linewidth)

#Draw all the blocks
for block in blocks:
    (xstart0, ystart0, xend0, yend0) = getWoodCorners(block, block.start_x) # Left side of block
    (xstart1, ystart1, xend1, yend1) = getWoodCorners(block, block.end_x) # Right side of block
    block.coords = [ (xstart0, ystart0), (xstart1, ystart1), (xend1, yend1), (xend0, yend0) ]
    drawPolygon(block.coords, 2) # Draw block

fig

bridge4

There’s a few things to note here:

  • The code appears to do some pretty redundant calculations when calculating the angles – but much of this is to make the signs of the angles correct.  It’s easier and nicer to let atan2 handle the signs for the quadrants, than to try to simplify the code and handle this ourselves.
  • The blocks aren’t actually exactly the same size at the points where they meet.  The differences are on the order of 1%, and not noticeable in these drawings however.

Now we need to draw and label these blocks in a way that makes it easiest to cut:

fig, ax = subplots(1,1)
fig.set_size_inches(17,7)
axis('off')

def rotatePolygon(polygon,theta):    
    """Rotates the given polygon which consists of corners represented as (x,y), around the ORIGIN, clock-wise, theta radians"""
    return [(x*math.cos(theta)-y*math.sin(theta) , x*math.sin(theta)+y*math.cos(theta)) for (x,y) in polygon]

def translatePolygon(polygon, xshift,yshift):
    """Shifts the polygon by the given amount"""
    return [ (x+xshift, y+yshift) for (x,y) in polygon]

sawThickness = 3.0 # add 3 mm gap between blocks for saw thickess
#Draw all the blocks
xshift = 10.0 # Start at 10mm to make it easier to cut by hand
topCutCoords = []
bottomCutCoords = []
for block in blocks:
    coords = translatePolygon(block.coords, -block.midpoint_x, -block.midpoint_y)
    coords = rotatePolygon(coords, -block.angle())
    coords = translatePolygon(coords, xshift - coords[0][0], -coords[3][1])
    xshift = coords[1][0] + sawThickness
    drawPolygon(coords,1)
    (topLeft, topRight, bottomRight, bottomLeft) = coords
    itopLeft = int(round(topLeft[0]))
    itopRight = int(round(topRight[0]))
    ibottomLeft = int(round(bottomLeft[0]))
    ibottomRight = int(round(bottomRight[0]))
    topCutCoords.append(itopLeft)
    topCutCoords.append(itopRight)
    bottomCutCoords.append(ibottomLeft)
    bottomCutCoords.append(ibottomRight)
    ax.text(topLeft[0], topLeft[1], itopLeft)
    ax.text(topRight[0], topRight[1], itopRight, horizontalalignment='right')
    ax.text(bottomLeft[0], bottomLeft[1], ibottomLeft, verticalalignment='top', horizontalalignment='center')
    ax.text(bottomRight[0], bottomRight[1], ibottomRight, verticalalignment='top', horizontalalignment='center')

print "Top coordinates:", topCutCoords
print "Bottom Coordinates:", bottomCutCoords

Which produces:

Top coordinates: [10, 141, 144, 228, 231, 307, 310, 394, 397, 528]
Bottom Coordinates: [43, 131, 156, 214, 247, 291, 324, 382, 407, 495]

bridge5

Now to cut this!

Flight aerodynamics simulator

I wanted to have a modern aerodynamics simulator, to test out my flight control hardware and software.

Apologies in advance for a very terse post.  I spent a lot of hours in a very short timespan to do this as a quick experiment.

So I used Unreal Engine 4 for the graphics engine, and built on top of that.

The main forces for a low speed aircraft that I want to model:

  1. The lift force due to the wing
  2. The drag force due to the wind
  3. Gravity
  4. Any horizontal forces from propellers in aircraft-style propulsion
  5. Any vertical forces from propellers in quadcopter-style propulsion

Lift force due to the wing

The equation for the lift force is:

L = C_l \cdot A \cdot .5 \cdot r \cdot V^2

I created a function for the lift coefficient, C_l, based on the angle, by calculating it theoretically.  To get proper results, I would need to actually measure this in a wind tunnel, but this is good enough for a first approximation:

micron lift coefficients

The horizontal axis is the angle of attack, in degrees.  When the aircraft is flying “straight”, the angle of attack is not usually 0, but around 5 to 10 degrees, thus still providing an upward force.

I repeat this for each force in turn.

3D Model

To visualize it nicely, I modelled the craft in blender, manually set the texture space, and painted the texture in gimp.  As you can tell from the texture space, there are several horrible problems with the geometry ‘loops’.  But it took a whole day to get the top and bottom looking decent, and it was close enough for my purposes.

 

I imported the model into Unreal Engine 4, and used a high-resolution render for the version in the top right, and used a low-resolution version for the game. screenshot2

textured_micron_screenshot

Next, here’s the underneath view.  You can see jagged edges in the model here, because at the time I didn’t understand how normal smoothing worked.  After a quick google, I selected those vertexes in blender and enabled normal smoothing on them and fixed that.

ue4_2(1)

and then finally, testing it out on the real thing:

Screenshot_20180626_232400

Screenshot_20180626_232350

Architecture

The architecture is a fairly standard Hardware-in-loop system.

The key modules are:

simulator

  • The Flight Controller which controls the craft.  This is not used if we are connected to external real hardware, such as the my controller.
  • The Communication Module to the real hardware, receiving information about the desired thrust of the engines and the Radio Control inputs from the user, and sending information about the current simulated position and speed.
  • The Physics Simulator which calculates the physical forces on the craft and applies them.
  • The User Interface which displays a lot of information about the craft as well as the internal controller.

The low level flight controller and network communication code is written in C++.  Much of the high level logic is written in a visual language called ‘Blueprint’.

User Interface

From the GUI User Interface, you can control:

  • The aerodynamic forces on the wings
  • The height above ground to air pressure curve
  • The wing span and aerofoil chord length
  • The moment of inertia
  • The thrust of the turbines
  • The placement of the turbines
  • The PID values for the internal controller

Result

It worked pretty well.

It uses Hardware-In-Loop to allow the real hardware to control this, including a RC-Transmitter.
In this video I am allowing the PID algorithms to control the roll, pitch, yaw and height, which I then add to in order to control it.

 

 

PID Tuning

I implemented an auto-tuner for the PID algorithm, which you can tune and trigger from the GUI.

 Hardware

I used two arduinos to control the system:

overview

And set up the Arduinos as so:  (Btw, I used the Fritzing software for this – it’s pretty cool).

neva arduino_bb(1)

 

And putting together the hardware for testing (sorry for the mess).  (I’m using QGroundControl to test it out).

And then mounting the hardware on a bit of wood as a base to keep it all together and make it more tidy:

20140822_202401.jpg

I will hopefully later make a post about the software controlling this.

ESC Motor Controller Delay

I was particularly worried about the delay that the controller introduces.

I modified the program and used a basic UFO style quadcopter, then added in a 50ms buffer, to test the reaction.

For reference, here’s a photo of the real thing that I work on:

The quadcopter is programmed to try to hover over the chair.

 

I also tested with different latencies:

 

50ms really is the bare minimum that you can get away with.

These are manually tuned PIDs.

I did also simulate this system in 1D in Matlab’s Simulink:

simulink_roll

A graph of the amplitude:

matlab estimated transfer function sin

And finally, various bode plots etc.  Just click on any for a larger image.  Again, apologies for the awful terseness.

Nokia 6110 Part 3 – Algorithms

This is a four part series:

We need an AI that can play a perfect game of Snake.  To collect the food without crashing into itself.

But it needs to run in real time on an 8Mhz processor, and use around 1 KB of memory.  The extreme memory and speed restrictions means that we do not want to allocate any heap memory, and should run everything on the stack.

None of the conventional path finding algorithms would work here, so we need a new approach.  The trick here is to realise that in the end part of the game, the snake will be following a planar Hamiltonian Cycle of a 2D array:

Hamiltonian cycle

So what we can do is precompute a random hamiltonian cycle at the start of each game, then have the snake follow that cycle.  It will thus pass through every point, without risk of crashing, and eventually win the game.

There are various way to create a hamiltonian cycle, but one way is to use Prim’s Algorithm to generate a maze of half the width and half the height.  Then walk the maze by always turning left when possible.  The resulting path will be twice the width and height and will be a hamiltonian cycle.

Now, we could have the snake just follow the cycle the entire time, but the result is very boring to watch because the snake does not follow the food, but instead just follows the preset path.

To solve this, I invented a new algorithm which I call the pertubated hamiltonian cycle.  First, we imagine the cycle as being a 1D path.  On this path we have the tail, the body, the head, and (wrapping back round again) the tail again.

The key is realising that as long as we always enforce this order, we won’t crash.  As long as all parts of the body are between tail and the head, and as long as our snake does not grow long enough for the head to catch up with the tail, we will not crash.

This insight drastically simplifies the snake AI, because now we only need to worry about the position of the head and the tail, and we can trivially compute (in O(1)) the distance between the head and tail since that’s just the difference in position in the linear 1D array:

maze layout

Now, this 1D array is actually embedded in a 2D array, and the head can actually move in any of 3 directions at any time.  This is equivalent to taking shortcuts through the 1D array.  If we want to stick to our conditions, then any shortcut must result in the head not overtaking the tail, and it shouldn’t overtake the food, if it hasn’t already.  We must also leave sufficient room between the head and tail for the amount that we expect to grow.

Now we can use a standard shortest-path finding algorithm to find the optimal shortcut to the food.  For example, an A* shortest path algorithm.

However I opted for the very simplest algorithm of just using a greedy search.  It doesn’t produce an optimal path, but visually it was sufficiently good.  I additionally disable the shortcuts when the snake takes up 50% of the board.

To test, I started with a simple non-random zig-zag cycle, but with shortcuts:

What you can see here is the snake following the zig-zag cycle, but occasionally taking shortcuts.

We can now switch to a random maze:

And once it’s working on the PC, get it working on the phone:

Code

The code is remarkably simple.  First, the code to pre-generate the hamiltonian circuit at the start of each game and provide some helper functions:


UTourNumber tourToNumber[ARENA_SIZE];

/* Take an x,y coordinate, and turn it into an index in the tour */
TourNumber getPathNumber(Coord x, Coord y) {
  return tourToNumber[x + ARENA_WIDTH*y];
}

Distance path_distance(Coord a, Coord b) {
  if(a<b)
    return b-a-1;
  return b-a-1+ARENA_SIZE;
}

struct Maze {
  struct Node {
    bool visited:1;
    bool canGoRight:1;
    bool canGoDown:1;
  };
  Node nodes[ARENA_SIZE/4];
  void markVisited(Coord x, Coord y) {
    nodes[x+y*ARENA_WIDTH/2].visited = true;
  }
  void markCanGoRight(Coord x, Coord y) {
    nodes[x+y*ARENA_WIDTH/2].canGoRight = true;
  }
  void markCanGoDown(Coord x, Coord y) {
    nodes[x+y*ARENA_WIDTH/2].canGoDown = true;
  }
  bool canGoRight(Coord x, Coord y) {
    return nodes[x+y*ARENA_WIDTH/2].canGoRight;;
  }
  bool canGoDown(Coord x, Coord y) {
    return nodes[x+y*ARENA_WIDTH/2].canGoDown;
  }
  bool canGoLeft(Coord x, Coord y) {
    if(x==0) return false;
    return nodes[(x-1)+y*ARENA_WIDTH/2].canGoRight;
  }

  bool canGoUp(Coord x, Coord y) {
    if(y==0) return false;
    return nodes[x+(y-1)*ARENA_WIDTH/2].canGoDown;
  }

  bool isVisited(Coord x, Coord y) {
    return nodes[x+y*ARENA_WIDTH/2].visited;
  }

  void generate() {
    memset(nodes, 0, sizeof(nodes));
    generate_r(-1,-1,0,0);
    generateTourNumber();
#ifdef LOG_TO_FILE
    writeMazeToFile();
    writeTourToFile();
#endif
  }
  void generate_r(Coord fromx, Coord fromy, Coord x, Coord y) {
    if(x < 0 || y < 0 || x >= ARENA_WIDTH/2 || y >= ARENA_HEIGHT/2)
      return;
    if(isVisited(x,y))
      return;
    markVisited(x,y);

    if(fromx != -1) {
      if(fromx < x)
        markCanGoRight(fromx, fromy);
      else if(fromx > x)
        markCanGoRight(x,y);
      else if(fromy < y)
        markCanGoDown(fromx, fromy);
      else if(fromy > y)
        markCanGoDown(x,y);

      //Remove wall between fromx and fromy
    }

    /* We want to visit the four connected nodes randomly,
     * so we just visit two randomly (maybe already visited)
     * then just visit them all non-randomly.  It's okay to
     * visit the same node twice */
    for(int i = 0; i < 2; i++) {
      int r = rand()%4;
      switch(r) {
        case 0: generate_r(x, y, x-1, y); break;
        case 1: generate_r(x, y, x+1, y); break;
        case 2: generate_r(x, y, x, y-1); break;
        case 3: generate_r(x, y, x, y+1); break;
      }
    }
    generate_r(x, y, x-1, y);
    generate_r(x, y, x+1, y);
    generate_r(x, y, x, y+1);
    generate_r(x, y, x, y-1);
  }

  SnakeDirection findNextDir(Coord x, Coord y, SnakeDirection dir) {
    if(dir == Right) {
      if(canGoUp(x,y))
          return Up;
      if(canGoRight(x,y))
        return Right;
      if(canGoDown(x,y))
        return Down;
      return Left;
    } else if(dir == Down) {
      if(canGoRight(x,y))
          return Right;
      if(canGoDown(x,y))
        return Down;
      if(canGoLeft(x,y))
        return Left;
      return Up;
    } else if(dir == Left) {
      if(canGoDown(x,y))
        return Down;
      if(canGoLeft(x,y))
        return Left;
      if(canGoUp(x,y))
          return Up;
      return Right;
    } else if(dir == Up) {
      if(canGoLeft(x,y))
        return Left;
      if(canGoUp(x,y))
          return Up;
      if(canGoRight(x,y))
        return Right;
      return Down;
    }
    return (SnakeDirection)-1; //Unreachable
  }
  void setTourNumber(Coord x, Coord y, TourNumber number) {
    if(getPathNumber(x,y) != 0)
      return; /* Back to the starting node */
    tourToNumber[x + ARENA_WIDTH*y] = number;
  }

  void generateTourNumber() {
    const Coord start_x = 0;
    const Coord start_y = 0;
    Coord x = start_x;
    Coord y = start_y;
    const SnakeDirection start_dir = canGoDown(x,y)?Up:Left;
    SnakeDirection dir = start_dir;
    TourNumber number = 0;
    do {
      SnakeDirection nextDir = findNextDir(x,y,dir);
      switch(dir) {
        case Right:
          setTourNumber(x*2,y*2,number++);
          if(nextDir == dir || nextDir == Down || nextDir == Left)
            setTourNumber(x*2+1,y*2,number++);
          if(nextDir == Down || nextDir == Left)
            setTourNumber(x*2+1,y*2+1,number++);
          if(nextDir == Left)
            setTourNumber(x*2,y*2+1,number++);
          break;
        case Down:
          setTourNumber(x*2+1,y*2,number++);
          if(nextDir == dir || nextDir == Left || nextDir == Up)
            setTourNumber(x*2+1,y*2+1,number++);
          if(nextDir == Left || nextDir == Up)
            setTourNumber(x*2,y*2+1,number++);
          if(nextDir == Up)
            setTourNumber(x*2,y*2,number++);
          break;
        case Left:
          setTourNumber(x*2+1,y*2+1,number++);
          if(nextDir == dir || nextDir == Up || nextDir == Right)
            setTourNumber(x*2,y*2+1,number++);
          if(nextDir == Up || nextDir == Right)
            setTourNumber(x*2,y*2,number++);
          if(nextDir == Right)
            setTourNumber(x*2+1,y*2,number++);
          break;
        case Up:
          setTourNumber(x*2,y*2+1,number++);
          if(nextDir == dir || nextDir == Right || nextDir == Down)
            setTourNumber(x*2,y*2,number++);
          if(nextDir == Right || nextDir == Down)
            setTourNumber(x*2+1,y*2,number++);
          if(nextDir == Down)
            setTourNumber(x*2+1,y*2+1,number++);
          break;
      }
      dir = nextDir;

      switch(nextDir) {
        case Right: ++x; break;
        case Left: --x; break;
        case Down: ++y; break;
        case Up: --y; break;
      }

    } while(number != ARENA_SIZE); //Loop until we return to the start
  }
#ifdef LOG_TO_FILE
  void writeTourToFile() {
    FILE *f = fopen("maps.txt", "w+");
    for(Coord y = 0; y < ARENA_HEIGHT; ++y) {
      for(Coord x = 0; x < ARENA_WIDTH; ++x)
        fprintf(f, "%4d", getPathNumber(x,y));
      fprintf(f, "\n");
    }
    fclose(f);
  }
  void writeMazeToFile() {
    FILE *f = fopen("maze.txt", "w+");
    for(Coord y = 0; y < ARENA_HEIGHT/2; ++y) {
      fprintf(f, "#");
      for(Coord x = 0; x < ARENA_WIDTH/2; ++x)
        if(canGoRight(x,y) && canGoDown(x,y))
          fprintf(f, "+");
        else if(canGoRight(x,y))
          fprintf(f, "-");
        else if(canGoDown(x,y))
          fprintf(f, "|");
        else
          fprintf(f, " ");
      fprintf(f, "#\n");
    }
    fclose(f);
  }
#endif
};

void aiInit() {
  Maze maze;
  maze.generate();
}

And now the AI code:


SnakeDirection aiGetNewSnakeDirection(Coord x, Coord y) {
  const TourNumber pathNumber = getPathNumber(x,y);
  const Distance distanceToFood = path_distance(pathNumber, getPathNumber(food.x, food.y));
  const Distance distanceToTail = path_distance(pathNumber, getPathNumber(snake.tail_x, snake.tail_y));
  Distance cuttingAmountAvailable = distanceToTail - snake.growth_length - 3 /* Allow a small buffer */;
  const Distance numEmptySquaresOnBoard = ARENA_SIZE - snake.drawn_length - snake.growth_length - food.value;
  // If we don't have much space (i.e. snake is 75% of board) then don't take any shortcuts */
  if (numEmptySquaresOnBoard < ARENA_SIZE / 2)
    cuttingAmountAvailable = 0;
  else if(distanceToFood < distanceToTail) { /* We will eat the food on the way to the tail, so take that into account */
    cuttingAmountAvailable -= food.value;
    /* Once we ate that food, we might end up with another food suddenly appearing in front of us */
    if ((distanceToTail - distanceToFood) * 4 > numEmptySquaresOnBoard) /* 25% chance of another number appearing */
      cuttingAmountAvailable -= 10;
  }
  Distance cuttingAmountDesired = distanceToFood;
  if(cuttingAmountDesired < cuttingAmountAvailable)
    cuttingAmountAvailable = cuttingAmountDesired;
  if(cuttingAmountAvailable < 0)
    cuttingAmountAvailable = 0;
  // cuttingAmountAvailable is now the maximum amount that we can cut by

  bool canGoRight = !check_for_collision(x+1, y);
  bool canGoLeft =  !check_for_collision(x-1, y);
  bool canGoDown =  !check_for_collision(x, y+1);
  bool canGoUp =    !check_for_collision(x, y-1);

  SnakeDirection bestDir;
  int bestDist = -1;
  if(canGoRight) {
    Distance dist = path_distance(pathNumber, getPathNumber(x+1, y));
    if(dist <= cuttingAmountAvailable && dist > bestDist) {
      bestDir = Right;
      bestDist = dist;
    }
  }
  if(canGoLeft) {
    Distance dist = path_distance(pathNumber, getPathNumber(x-1, y));
    if(dist <= cuttingAmountAvailable && dist > bestDist) {
      bestDir = Left;
      bestDist = dist;
    }
  }
  if(canGoDown) {
    Distance dist = path_distance(pathNumber, getPathNumber(x, y+1));
    if(dist <= cuttingAmountAvailable && dist > bestDist) {
      bestDir = Down;
      bestDist = dist;
    }
  }
  if(canGoUp) {
    Distance dist = path_distance(pathNumber, getPathNumber(x, y-1));
    if(dist <= cuttingAmountAvailable && dist > bestDist) {
      bestDir = Up;
      bestDist = dist;
    }
  }
  if (bestDist >= 0)
    return bestDir;

  if (canGoUp)
    return Up;
  if (canGoLeft)
    return Left;
  if (canGoDown)
    return Down;
  if (canGoRight)
    return Right;
  return Right;
}

Self Balancing Robot

This was my first electronics project – a simple self-balancing robot.

The wheels and chassis are from a toy truck.  Onto that I salotaped a breadboard with two Light-Dependant-Resistors, attached in series.  On one end I feed 5V, on the other end, ground.  This gives a voltage divider.  And in the middle I connect that to a resistor and a capacitor in parallel.  This gives me a Proportional Derivative signal (out of a PID controller).

This means that we can read off the current ratio of brightness that the two LDRs see, then add on to that signal the rate at which they are changing.  This helps prevent driving the motors too fast when we already moving rapidly towards the balancing point.

We then read this analog voltage on the Arduino, and use it to control the direction of the motors.

There is a huge amount of gear slop which is why it can only barely balance.