# Laplace Transform – visualized

The Laplace Transform is a standard tool in mathematics, science, and engineering, and there is no shortage of books and web pages about it.

And yet I cannot find a single decent visualization of it! Not a single person that I can find appears to have tried to actually visualize what it is doing. There are plenty of animations of the Fourier Transform, like:

But nothing for Laplace Transform that I can find.

So, I will attempt to fill that gap.

# What is the Laplace Transform?

It’s a way to represent a function that is 0 for time < 0 (typically) as a sum of many exponentially growing or decaying sinusoidal waves that look more like:

Note that what I just said isn’t *entirely* true, because there’s an imaginary component here too, and we’re actually integrating rather than summing. So take this as a crude idea just to get started, and let’s move on to the math to get a better idea:

# Math

The goal of this is to visualize how the Laplace Transform works:

To do this, we need to look at the definition of the inverse Laplace Transform:

$$f(t) = \frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} F(s)\, e^{st}\, ds$$

While pretty, it’s not so nice to work with, so let’s make the substitution:

$$s = \gamma + i\omega$$

so that our new limits are just $-\infty$ to $+\infty$ (and $ds = i\,d\omega$), giving:

$$f(t) = \frac{e^{\gamma t}}{2\pi} \int_{-\infty}^{\infty} F(\gamma + i\omega)\, e^{i\omega t}\, d\omega$$

Which we will now **approximate** as:

$$f(t) \approx \frac{e^{\gamma t}}{2\pi} \sum_{n=-N}^{N} F(\gamma + i\omega_n)\, e^{i\omega_n t}\, \Delta\omega$$
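To make the approximation concrete, here’s a quick numerical sanity check (my own snippet, separate from the notebook linked below; the function name and cutoffs are arbitrary). It sums the integrand along the line $\mathrm{Re}(s) = \gamma$ for $F(s) = s/(s^2+1)$, the Laplace transform of $\cos(t)$:

```python
import numpy as np

def inverse_laplace_approx(F, t, gamma=1.0, w_max=500.0, n=500_001):
    # f(t) ~ e^(gamma*t)/(2*pi) * sum of F(gamma + i*w) e^(i*w*t) dw
    # over w in [-w_max, w_max] (a plain Riemann sum)
    w = np.linspace(-w_max, w_max, n)
    dw = w[1] - w[0]
    s = gamma + 1j * w          # points on the Bromwich line Re(s) = gamma
    return float(np.real(np.sum(F(s) * np.exp(s * t)) * dw) / (2 * np.pi))

F = lambda s: s / (s**2 + 1)           # Laplace transform of cos(t)
print(inverse_laplace_approx(F, 1.0))  # close to cos(1) ≈ 0.5403
```

Widening `w_max` and shrinking the step makes the sum match $\cos(t)$ ever more closely for $t > 0$.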

# Code

The code turned out to be a bit too large for a blog post, so I’ve put it here:

https://github.com/johnflux/laplace-transform-visualized/blob/master/Laplace%20Transform.ipynb

# Results

**Note:** The graphs say “Next frequency to add: … where …”, but really it should say “Next two frequencies to add: … where …”, since we are adding two frequencies at a time, in such a way that their imaginary parts cancel out, allowing us to keep everything real in the plots. I fixed this comment in the code, but didn’t want to rerender all the videos.

A cubic polynomial:

A cosine wave:

Now a square wave. This has infinities at every discontinuity, so it’s not technically possible to plot exactly. But I tried anyway, and it seems to visually work:

Now something that it absolutely can’t handle, like the Dirac delta $\delta(t)$ (a function that is 0 everywhere, except a sharp peak at exactly time = 0). In the S domain, this is a constant, meaning that we never converge. But visually it’s still cool.

Note that this never ‘settles down’ (converges) because the frequency is constantly increasing while the magnitude remains constant.

There is visual ‘aliasing’ (like how a wheel can appear to go backwards as its speed increases). This is not “real” – it is an artifact of trying to render high frequency waves. If we rendered (and played back) the video at a higher resolution, the effect would disappear.

At the very end, it appears as if the wave is just about to converge. This is not a coincidence and it isn’t real. It happens because the frequency of the waves becomes too high so that we just don’t see them, making the line appear to go smooth, when in reality the waves are just too close together to see.

The code automatically calculates this point and sets our time step such that the breakdown only happens at the very end of the video. If we made the time step smaller, this effect would disappear.

And a simple step function:

A sawtooth:

# Erasing background from an image

I have two opaque images – one with an object and a background, and another with just the background. Like:

I want to subtract the background from the image so that the alpha blended result is **visually identical**, but the foreground is as transparent as possible.

E.g:

I’m sure that this must have been done before, but I couldn’t find a single correct way of doing it!

I asked a developer on the GIMP team, and they replied that the standard way is to create an **alpha mask** on the front image from the **difference** between the two images: i.e. for each pixel in both layers, subtract the RGB values, average that difference across the three channels, and then use that as the alpha.

But this is clearly not correct. Imagine the foreground has a green piece of semi-transparent glass against a red background. Just using an alpha mask is clearly not going to subtract the background because you need to actually modify the rgb values in the top layer image to remove all the red.

So what is the correct solution? Let’s do the calculations.

If we alpha-blend a semi-transparent foreground layer $F$, with opacity $a$, over a solid background $B$, the final visual color $V$ is:

$$V = a\,F + (1 - a)\,B$$

We want the visual result to be the same, so we know the value of $V$ – that’s our original foreground+background image. And we know $B$ – that’s our background image. We want to now create a new foreground image, $F'$, with the maximum possible transparency (i.e. the smallest opacity $a$) such that blending $F'$ over $B$ still gives $V$.

So to restate this again – I want to know how to change the top layer so that it is as transparent as possible without changing the final visual image at all. I.e. remove as much of the background as possible from our foreground+background image.

Note that we also have the constraint that, for each color channel, $0 \le F' \le 1$, since each RGB pixel value is between 0 and 1. Rearranging $V = aF' + (1-a)B$ to $F' = \frac{V - (1-a)B}{a}$ and applying this constraint gives, per channel:

$$a \ge \frac{V - B}{1 - B} \qquad\text{and}\qquad a \ge \frac{B - V}{B}$$

So:

$$a = \max_{\text{channels}}\left(\frac{V - B}{1 - B},\ \frac{B - V}{B}\right), \qquad F' = \frac{V - (1 - a)B}{a}$$
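This can be sketched in NumPy as follows (my own illustration – the function name and the `eps` division guards are mine, not GIMP’s). It returns the new foreground and the smallest per-pixel opacity that keeps the blend identical:

```python
import numpy as np

def subtract_background(V, B, eps=1e-6):
    """V, B: float arrays in [0, 1] with shape (..., 3) (RGB).
    Returns (F, a) such that a*F + (1-a)*B reproduces V,
    with per-pixel opacity a as small as possible."""
    # Lower bounds on the opacity from requiring 0 <= F <= 1 per channel
    a_hi = (V - B) / np.maximum(1.0 - B, eps)  # binds where V > B
    a_lo = (B - V) / np.maximum(B, eps)        # binds where V < B
    a = np.max(np.maximum(a_hi, a_lo), axis=-1, keepdims=True)
    a = np.clip(a, 0.0, 1.0)
    # Solve V = a*F + (1-a)*B for the new foreground F
    F = np.where(a > eps, (V - (1.0 - a) * B) / np.maximum(a, eps), 0.0)
    return np.clip(F, 0.0, 1.0), a

# Semi-transparent glass-like pixel over a red-ish background:
V = np.array([[0.5, 0.2, 0.1]])   # combined foreground+background pixel
B = np.array([[0.9, 0.2, 0.1]])   # background pixel
F, a = subtract_background(V, B)
print(F, a)   # opacity ~0.444, and a*F + (1-a)*B reproduces V
```

Note that the minimum opacity is taken as the maximum of the per-channel bounds, so the reconstruction is exact in every channel.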

# Proposal

Add an option for the GIMP eraser tool to ‘remove layers underneath’, which grabs the RGB value of the layer underneath and applies the formula above, using the alpha in the brush as a normal erasure would, but bounding the alpha to be no more than the equation above allows, and modifying the RGB values accordingly.

# Result

I showed this to the GIMP team, and they found a way to do this with the latest version in git:

1. Open the two images as layers.
2. For the top layer do: Layer → Transparency → Add Alpha Channel.
3. Select the Clone tool. On the background layer, Ctrl+click anywhere to set the Clone source.
4. In the Clone tool options, choose Default and Color erase, and set Alignment to Registered.
5. Make the brush size large, select the top layer again, and click on it to erase everything.

Result is:

When the background is a very different color, it works great – the sky was very nicely erased. But when the colors are too similar, it goes completely wrong.

Overall.. a failure. But interesting.

# Second order back propagation

This is a random idea that I’ve been thinking about. A reader messaged me to say that this looks similar to online L-BFGS. To my inexperienced eyes, I can’t see the similarity myself, but I’m still somewhat of a beginner.

Say we are training a neural network to take images of animals and classify the image as being an image of a cat or not a cat.

You would train the network to output, say, 1 if the image is that of a cat, and 0 otherwise.

To do so, we can gather some training data (images labelled by humans), and for each image we see what our network predicts (e.g. “I’m 40% sure it’s a cat”). We compare that against what a human says (“I’m 100% sure it’s a cat”), find the squared error (“We’re off by 0.6, so our squared error is 0.6² = 0.36”), and adjust each parameter, $w_i$, in the network so that it slightly decreases the error ($w_i \leftarrow w_i - \lambda\,\partial E/\partial w_i$). And then repeat.

It’s this adjustment of each parameter that I want to rethink. The above procedure is Stochastic Gradient Descent (SGD) – we adjust $w_i$ to reduce the error for our training set (I’m glossing over overfitting, minibatches, etc).

> **Key Idea:** At a local minimum of the error, the gradient is zero. This means that we are also trying to look for a point where $\partial E/\partial w_i = 0$ for every parameter.

My idea is to encode this into the SGD update. To find a local minimum for a particular test image we want:

$$\frac{\partial E}{\partial w_i} = 0 \qquad\text{and}\qquad \frac{\partial^2 E}{\partial w_i^2} > 0$$

(or if the second derivative equals 0, we need to consider the third derivative, etc).

Let’s concentrate on just the first criterion for the moment. Since we’ve already used the letter $E$ to mean the half squared error of $y$, we’ll use $G$ to be the half squared error of $\partial E/\partial w_i$.

So we want to minimize the half squared error $G$:

$$G = \frac{1}{2}\left(\frac{\partial E}{\partial w_i}\right)^2$$

So to minimize we need the gradient of this error:

$$\frac{\partial G}{\partial w_i}$$

Applying the chain rule:

$$\frac{\partial G}{\partial w_i} = \frac{\partial E}{\partial w_i}\,\frac{\partial^2 E}{\partial w_i^2}$$

**SGD update rule**

And so we can modify our SGD update rule to:

$$w_i \leftarrow w_i - \lambda\,\frac{\partial E}{\partial w_i} - \mu\,\frac{\partial E}{\partial w_i}\,\frac{\partial^2 E}{\partial w_i^2}$$

where $\lambda$ and $\mu$ are learning rate hyperparameters.
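As a toy sketch of the modified update (mine, not from any tested implementation): a one-parameter “network” $y = wx$, with the derivatives estimated numerically, and the two learning rate hyperparameters chosen arbitrarily:

```python
def grad(f, w, h=1e-5):
    # central-difference estimate of df/dw
    return (f(w + h) - f(w - h)) / (2 * h)

x, y = 2.0, 3.0                      # a single training example
E = lambda w: 0.5 * (w * x - y)**2   # half squared error
lam, mu = 0.05, 0.01                 # learning rate hyperparameters
w = 0.0
for _ in range(200):
    dE  = grad(E, w)
    d2E = grad(lambda v: grad(E, v), w)  # second derivative, numerically
    w -= lam * dE + mu * dE * d2E        # standard term + second-order term
print(w)   # converges to 1.5, where E(w) = 0
```

For this convex toy problem both terms push in the same direction, so it just converges a little faster; whether it helps on a real network is exactly the open question above.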

# Conclusion

We finished with a new SGD update rule. I have no idea if this actually will be any better, and the only way to find out is to actually test. This is left as an exercise for the reader 😀

# Best-fitting to a Cumulative distribution function in python TensorFlow

I wanted to find a best-fit curve for some data points, when I know that the true curve that I’m predicting is a parameter-free Cumulative Distribution Function (CDF). I could just do a linear regression on the points, but the resulting function might not have the properties that we desire from a CDF, such as:

- Monotonically increasing
- $y \to 1$ as $x$ approaches positive infinity
- $y \to 0$ as $x$ approaches negative infinity

First, let’s deal with the last two properties. To do so, I use the sigmoid function to create a new parameter, $x'$:

$$x' = \operatorname{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

This has the nice property that $x' \to 1$ as $x \to +\infty$, and $x' \to 0$ as $x \to -\infty$.

So now we can find a best-fit polynomial on $x'$.

So:

$$y = a + b x' + c x'^2 + d x'^3 + \dots + z x'^n$$

But with the boundary conditions that:

$$y(x' = 0) = 0$$

and

$$y(x' = 1) = 1$$

Which, solving for the boundary conditions, gives us:

$$a = 0, \qquad b = 1 - c - d - \dots - z$$

Which simplifies our function to:

$$y = (1 - c - d - \dots - z)\,x' + c x'^2 + d x'^3 + \dots + z x'^n$$
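A quick numerical check of this simplification (my own snippet; `c` and `d` are illustrative cubic weights): with $b = 1 - c - d$, the boundary conditions hold by construction for any weights:

```python
def cdf_poly(xp, c, d):
    # y = (1 - c - d)*x' + c*x'^2 + d*x'^3, so y(0) = 0 and y(1) = 1
    b = 1 - c - d
    return b * xp + c * xp**2 + d * xp**3

for c, d in [(0.3, -0.2), (1.5, 0.7)]:
    assert cdf_poly(0.0, c, d) == 0.0
    assert abs(cdf_poly(1.0, c, d) - 1.0) < 1e-12
print("y(0) = 0 and y(1) = 1 for any weights")
```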

Implementing this in tensorflow using stochastic gradient descent (code is below) we get:

(The graph title equation is wrong; I was too lazy to re-render the graphs, sorry.)

Unfortunately this curve doesn’t have one main property that we’d like – it’s not monotonically increasing – it goes above 1. I thought about it for a few hours and couldn’t think of a nice solution.

I’m sure all of this has been solved properly before, but with a quick google I couldn’t find anything.

**Edit:** I’m updating this 12 days later. I have been thinking in the background about how to enforce that my function is monotonic (i.e. always increasing/decreasing). I was certain that I was just being stupid and that there was a simple answer. But by complete coincidence, I watched the YouTube video Breakthroughs in Machine Learning – Google I/O 2016, in which the last speaker mentioned this restriction. It turns out that this is a very difficult problem, with the first solutions having a time complexity of:

I didn’t understand what exactly Google’s solution for this is, but they reference their paper on it: Monotonic Calibrated Interpolated Look-up Tables, JMLR 2016. It seems that I have some light reading to do!

```python
#!/usr/bin/env python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Create some random data as a binned normal function
plt.ion()
n_observations = 20
fig, ax = plt.subplots(1, 1)
xs = np.linspace(-3, 3, n_observations)
ys = norm.pdf(xs) * (1 + np.random.uniform(-1, 1, n_observations))
ys = np.cumsum(ys)
ax.scatter(xs, ys)
fig.show()
plt.draw()

highest_order_polynomial = 3

# We have our data now, so on with the tensorflow
# Setup the model graph

# Our input is an arbitrary number of data points (that's what the 'None'
# dimension means) and each input has just a single value which is a float
X = tf.placeholder(tf.float32, [None])
Y = tf.placeholder(tf.float32, [None])

# Now, we know the data fits a CDF function, so we know that
#   ys(-inf) = 0   and   ys(+inf) = 1
# So let's set:
X2 = tf.sigmoid(X)
# So now X2 is between [0,1]

# Let's now fit a polynomial like:
#
#   Y = a + b*(X2) + c*(X2)^2 + d*(X2)^3 + .. + z*(X2)^(highest_order_polynomial)
#
# to those points.  But we know that since it's a CDF:
#   Y(0) = 0   and   Y(1) = 1
#
# So solving for this:
#   Y(0) = 0 = a
#   b = (1-c-d-e-...-z)
#
# This simplifies our function to:
#
#   y = (1-c-d-e-..-z)x + cx^2 + dx^3 + ex^4 .. + zx^(highest_order_polynomial)
#
# i.e. we have highest_order_polynomial-1 number of weights
W = tf.Variable(tf.zeros([highest_order_polynomial-1]))

# Now set up our equation:
Y_pred = tf.Variable(0.0)
b = (1 - tf.reduce_sum(W))
Y_pred = tf.add(Y_pred, b * X2)
for n in xrange(2, highest_order_polynomial+1):
    Y_pred = tf.add(Y_pred, tf.mul(tf.pow(X2, n), W[n-2]))

# Loss function measures the distance between our observations
# and predictions, averaged over them.
cost = tf.reduce_sum(tf.pow(Y_pred - Y, 2)) / (n_observations - 1)

# Regularization if we want it.  Only needed if we have a large value for
# highest_order_polynomial and are worried about overfitting.  It's also not
# clear to me if regularization is going to be doing what we want here.
#cost = tf.add(cost, regularization_value * (b + tf.reduce_sum(W)))

# Use stochastic gradient descent to optimize W,
# using an adaptive learning rate
optimizer = tf.train.AdagradOptimizer(100.0).minimize(cost)

n_epochs = 10000
with tf.Session() as sess:
    # Here we tell tensorflow that we want to initialize all
    # the variables in the graph so we can use them
    sess.run(tf.initialize_all_variables())

    # Fit all training data
    prev_training_cost = 0.0
    for epoch_i in range(n_epochs):
        sess.run(optimizer, feed_dict={X: xs, Y: ys})
        training_cost = sess.run(cost, feed_dict={X: xs, Y: ys})
        if epoch_i % 100 == 0 or epoch_i == n_epochs-1:
            print(training_cost)
        # Allow the training to quit if we've reached a minimum
        if np.abs(prev_training_cost - training_cost) < 0.0000001:
            break
        prev_training_cost = training_cost
    print "Training cost: ", training_cost, " after epoch:", epoch_i
    w = W.eval()
    b = 1 - np.sum(w)
    equation = "x' = sigmoid(x), y = " + str(b) + "x' + " + " + ".join(
        "{}x'^{}".format(val, idx) for idx, val in enumerate(w, start=2))
    print "For function:", equation
    pred_values = Y_pred.eval(feed_dict={X: xs}, session=sess)
    ax.plot(xs, pred_values, 'r')
    plt.title(equation)
    fig.show()
    plt.draw()
    #ax.set_ylim([-3, 3])
    plt.waitforbuttonpress()
```

# Four point gradient as a GLSL shader

For a client I needed a way to draw a gradient, generated from four points, in realtime on a mobile phone – producing an image like this, where the four gradient points can be moved in realtime:

I wanted to implement this using a shader, but after googling for a while I couldn’t find an existing implementation.

The problem is that I have the desired colours at four points, and need to calculate the colour at any other arbitrary point.

We write the position as a bilinear blend of the four points:

$$\mathbf{x}(u, t) = \mathbf{A} + \mathbf{B}u + \mathbf{C}t + \mathbf{D}ut$$

Plugging in the corners $\mathbf{x}(1,0) = \mathbf{P_1}$, $\mathbf{x}(0,1) = \mathbf{P_2}$ and $\mathbf{x}(1,1) = \mathbf{P_3}$, we have three simultaneous equations (vectors in bold):

$$\mathbf{P_1} = \mathbf{A} + \mathbf{B} \qquad \mathbf{P_2} = \mathbf{A} + \mathbf{C} \qquad \mathbf{P_3} = \mathbf{A} + \mathbf{B} + \mathbf{C} + \mathbf{D}$$

The key to solving this is to reduce the number of freedoms by setting $\mathbf{A} = \mathbf{P_0}$, since the patch must pass through $\mathbf{P_0}$ at $u = t = 0$.

Solving for $\mathbf{B}$ gives us:

$$\mathbf{B} = \mathbf{P_1} - \mathbf{P_0}$$

And solving for $\mathbf{C}$ and $\mathbf{D}$:

$$\mathbf{C} = \mathbf{P_2} - \mathbf{P_0} \qquad \mathbf{D} = \mathbf{P_0} - \mathbf{P_1} - \mathbf{P_2} + \mathbf{P_3}$$

so:

$$\mathbf{x} = \mathbf{P_0} + (\mathbf{P_1} - \mathbf{P_0})u + (\mathbf{P_2} - \mathbf{P_0})t + (\mathbf{P_0} - \mathbf{P_1} - \mathbf{P_2} + \mathbf{P_3})ut$$

Defining:

$$\mathbf{Q} = \mathbf{P_0} - \mathbf{P_2} \qquad \mathbf{R} = \mathbf{P_1} - \mathbf{P_0} \qquad \mathbf{S} = \mathbf{R} + \mathbf{P_2} - \mathbf{P_3} \qquad \mathbf{T} = \mathbf{P_0} - \mathbf{x}$$

lets us now rewrite this as:

$$\mathbf{T} + u\mathbf{R} = t\,(\mathbf{Q} + u\mathbf{S})$$

Which written out for $x$ and $y$ is:

$$T_x + uR_x = t\,(Q_x + uS_x)$$

$$T_y + uR_y = t\,(Q_y + uS_y)$$

Now we have to consider the various different cases. If $Q_x = S_x = 0$ then we have:

$$u = -\frac{T_x}{R_x} \qquad t = \frac{T_y + uR_y}{Q_y + uS_y}$$

Likewise if $Q_y = S_y = 0$ then we have:

$$u = -\frac{T_y}{R_y} \qquad t = \frac{T_x + uR_x}{Q_x + uS_x}$$

Otherwise we are safe to divide by $(Q_y + uS_y)$, and so substituting $t = \frac{T_y + uR_y}{Q_y + uS_y}$ into the $x$ equation and multiplying out gives a quadratic in $u$:

$$Au^2 + Bu + C = 0$$

with:

$$A = S_x R_y - R_x S_y \qquad B = S_x T_y - T_x S_y + Q_x R_y - R_x Q_y \qquad C = Q_x T_y - T_x Q_y$$

Which can be solved with the quadratic formula when $A$ is not zero (note that only the positive solution makes physical sense for us, and we can assume the discriminant is non-negative for points inside the gradient):

$$u = \frac{-B + \sqrt{B^2 - 4AC}}{2A}$$

and when $A$ is zero (or, for computational rounding purposes, close to zero):

$$u = -\frac{C}{B}$$

# Math Summary

We can determine $u$ and $t$ with:

$$u = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \quad\text{(or } u = -C/B \text{ when } A \approx 0\text{)}, \qquad t = \frac{T_y + uR_y}{Q_y + uS_y}$$
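As a sanity check of this summary, here is a Python transliteration of the formulas (mine, not the client shader; it handles only the general quadratic case, not the two degenerate axis cases). It maps a known $(t, u)$ forward through the bilinear blend, then recovers it:

```python
import numpy as np

def bilinear(t, u, P0, P1, P2, P3):
    # forward map: blend the four gradient points
    return (1 - t) * ((1 - u) * P0 + u * P1) + t * ((1 - u) * P2 + u * P3)

def inv_bilinear(p, P0, P1, P2, P3, eps=1e-4):
    # recover (t, u) for a point p inside the quad, per the formulas above
    Q = P0 - P2
    R = P1 - P0
    S = R + P2 - P3
    T = P0 - p
    A = S[0] * R[1] - R[0] * S[1]
    B = S[0] * T[1] - T[0] * S[1] + Q[0] * R[1] - R[0] * Q[1]
    C = Q[0] * T[1] - T[0] * Q[1]
    u = -C / B if abs(A) < eps else (-B + np.sqrt(B * B - 4 * A * C)) / (2 * A)
    t = (T[1] + u * R[1]) / (Q[1] + u * S[1])
    return t, u

P0, P1, P2, P3 = map(np.array, [(0.31, 0.3), (0.7, 0.32), (0.28, 0.71), (0.72, 0.75)])
p = bilinear(0.4, 0.6, P0, P1, P2, P3)
t, u = inv_bilinear(p, P0, P1, P2, P3)
print(round(t, 6), round(u, 6))   # → 0.4 0.6
```

The example points are the same ones used in the shader below, so the round trip exercises the same quadratic branch.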

# Implementation as a shader:

This can be implemented as a shader as follows (the following can be run directly on https://www.shadertoy.com, for example):

```glsl
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // Example colors.  In practice these would be passed in as parameters
    vec4 color0 = vec4(0.918, 0.824, 0.573, 1.0); // EAD292
    vec4 color1 = vec4(0.494, 0.694, 0.659, 1.0); // 7EB1A8
    vec4 color2 = vec4(0.992, 0.671, 0.537, 1.0); // FDAB89
    vec4 color3 = vec4(0.859, 0.047, 0.212, 1.0); // DB0C36

    vec2 uv = fragCoord.xy / iResolution.xy;

    // Example coordinates.  In practice these would be passed in as parameters
    vec2 P0 = vec2(0.31, 0.3);
    vec2 P1 = vec2(0.7, 0.32);
    vec2 P2 = vec2(0.28, 0.71);
    vec2 P3 = vec2(0.72, 0.75);

    vec2 Q = P0 - P2;
    vec2 R = P1 - P0;
    vec2 S = R + P2 - P3;
    vec2 T = P0 - uv;

    float u;
    float t;

    if (Q.x == 0.0 && S.x == 0.0) {
        u = -T.x / R.x;
        t = (T.y + u * R.y) / (Q.y + u * S.y);
    } else if (Q.y == 0.0 && S.y == 0.0) {
        u = -T.y / R.y;
        t = (T.x + u * R.x) / (Q.x + u * S.x);
    } else {
        float A = S.x * R.y - R.x * S.y;
        float B = S.x * T.y - T.x * S.y + Q.x * R.y - R.x * Q.y;
        float C = Q.x * T.y - T.x * Q.y;

        // Solve Au^2 + Bu + C = 0
        if (abs(A) < 0.0001)
            u = -C / B;
        else
            u = (-B + sqrt(B * B - 4.0 * A * C)) / (2.0 * A);
        t = (T.y + u * R.y) / (Q.y + u * S.y);
    }

    u = clamp(u, 0.0, 1.0);
    t = clamp(t, 0.0, 1.0);

    // These two lines smooth out t and u to avoid visual 'lines' at the
    // boundaries.  They can be removed to improve performance at the cost
    // of graphics quality.
    t = smoothstep(0.0, 1.0, t);
    u = smoothstep(0.0, 1.0, u);

    vec4 colorA = mix(color0, color1, u);
    vec4 colorB = mix(color2, color3, u);
    fragColor = mix(colorA, colorB, t);
}
```

And the same code again, but formatted as a more typical shader and with parameters:

```glsl
uniform lowp vec4 color0;
uniform lowp vec4 color1;
uniform lowp vec4 color2;
uniform lowp vec4 color3;
uniform lowp vec2 p0;
uniform lowp vec2 p1;
uniform lowp vec2 p2;
uniform lowp vec2 p3;
varying lowp vec2 coord;

void main()
{
    lowp vec2 Q = p0 - p2;
    lowp vec2 R = p1 - p0;
    lowp vec2 S = R + p2 - p3;
    lowp vec2 T = p0 - coord;

    lowp float u;
    lowp float t;

    if (Q.x == 0.0 && S.x == 0.0) {
        u = -T.x / R.x;
        t = (T.y + u * R.y) / (Q.y + u * S.y);
    } else if (Q.y == 0.0 && S.y == 0.0) {
        u = -T.y / R.y;
        t = (T.x + u * R.x) / (Q.x + u * S.x);
    } else {
        float A = S.x * R.y - R.x * S.y;
        float B = S.x * T.y - T.x * S.y + Q.x * R.y - R.x * Q.y;
        float C = Q.x * T.y - T.x * Q.y;

        // Solve Au^2 + Bu + C = 0
        if (abs(A) < 0.0001)
            u = -C / B;
        else
            u = (-B + sqrt(B * B - 4.0 * A * C)) / (2.0 * A);
        t = (T.y + u * R.y) / (Q.y + u * S.y);
    }

    u = clamp(u, 0.0, 1.0);
    t = clamp(t, 0.0, 1.0);

    // These two lines smooth out t and u to avoid visual 'lines' at the
    // boundaries.  They can be removed to improve performance at the cost
    // of graphics quality.
    t = smoothstep(0.0, 1.0, t);
    u = smoothstep(0.0, 1.0, u);

    lowp vec4 colorA = mix(color0, color1, u);
    lowp vec4 colorB = mix(color2, color3, u);
    gl_FragColor = mix(colorA, colorB, t);
}
```

# Result

On the left is the result using smoothstep to smooth t and u, and on the right is the result without it. Although the non-linear smoothstep is computationally expensive and a hack, I wasn’t happy with the visual lines that resulted from not using it.