I wanted to find a best fit curve for some data points when I know that the true curve that I’m predicting is a parameter free Cumulative Distribution Function. I could just do a linear regression on the points, but the resulting function, might not have the properties that we desire from a CDF, such as:
- Monotonically increasing
i.e. That it tends to 1 as x approaches positive infinity
i.e. That it tends to 0 as x approaches negative infinity
First, I take the second two points. To deal with this, I use the sigmoid function to create a new parameter, x’:
This has the nice property that and
So now we can find a best fit polynomial on .
So:
But with the boundary conditions that:
and
Which, solving for the boundary conditions, gives us:
Which simplifies our function to:
Implementing this in tensorflow using stochastic gradient descent (code is below) we get:

20 data samples

100 data samples
(The graph title equation is wrong. It should be then
. I was too lazy to update the graphs sorry)
Unfortunately this curve doesn’t have one main property that we’d like – it’s not monotonically increasing – it goes above 1. I thought about it for a few hours and couldn’t think of a nice solution.
I’m sure all of this has been solved properly before, but with a quick google I couldn’t find anything.
Edit: I’m updating this 12 days later. I have been thinking in the background about how to enforce that I want my function to be monotonic. (i.e. always increasing/decreasing) I was certain that I was just being stupid and that there was a simple answer. But by complete coincidence, I watched the youtube video Breakthroughs in Machine Learning – Google I/O 2016 in which the last speaker mentioned this restriction. It turns out that this a very difficult problem – with the first solutions having a time complexity of :

Screenshot from Breakthroughs in Machine Learning – Google I/O 2016 at 24:18
I didn’t understand what exactly google’s solution for this is, but they reference their paper on it: Monotonic Calibrated Interpolated Look-up Tables, JMLR 2016. It seems that I have some light reading to do!
#!/usr/bin/env python import tensorflow as tf import numpy as np import tensorflow as tf import matplotlib.pyplot as plt from scipy.stats import norm # Create some random data as a binned normal function plt.ion() n_observations = 20 fig, ax = plt.subplots(1, 1) xs = np.linspace(-3, 3, n_observations) ys = norm.pdf(xs) * (1 + np.random.uniform(-1, 1, n_observations)) ys = np.cumsum(ys) ax.scatter(xs, ys) fig.show() plt.draw() highest_order_polynomial = 3 # We have our data now, so on with the tensorflow # Setup the model graph # Our input is an arbitrary number of data points (That's what the 'None dimension means) # and each input has just a single value which is a float X = tf.placeholder(tf.float32, [None]) Y = tf.placeholder(tf.float32, [None]) # Now, we know data fits a CDF function, so we know that # ys(-inf) = 0 and ys(+inf) = 1 # So let's set: X2 = tf.sigmoid(X) # So now X2 is between [0,1] # Let's now fit a polynomial like: # # Y = a + b*(X2) + c*(X2)^2 + d*(X2)^3 + .. + z(X2)^(highest_order_polynomial) # # to those points. But we know that since it's a CDF: # Y(0) = 0 and y(1) = 1 # # So solving for this: # Y(0) = 0 = a # b = (1-c-d-e-...-z) # # This simplifies our function to: # # y = (1-c-d-e-..-z)x + cx^2 + dx^3 + ex^4 .. + zx^(highest_order_polynomial) # # i.e. we have highest_order_polynomial-2 number of weights W = tf.Variable(tf.zeros([highest_order_polynomial-1])) # Now set up our equation: Y_pred = tf.Variable(0.0) b = (1 - tf.reduce_sum(W)) Y_pred = tf.add(Y_pred, b * tf.sigmoid(X2)) for n in xrange(2, highest_order_polynomial+1): Y_pred = tf.add(Y_pred, tf.mul(tf.pow(X2, n), W[n-2])) # Loss function measure the distance between our observations # and predictions and average over them. cost = tf.reduce_sum(tf.pow(Y_pred - Y, 2)) / (n_observations - 1) # Regularization if we want it. Only needed if we have a large value for the highest_order_polynomial and are worried about overfitting # It's also not clear to me if regularization is going to be doing what we want here #cost = tf.add(cost, regularization_value * (b + tf.reduce_sum(W))) # Use stochastic gradient descent to optimize W # using adaptive learning rate optimizer = tf.train.AdagradOptimizer(100.0).minimize(cost) n_epochs = 10000 with tf.Session() as sess: # Here we tell tensorflow that we want to initialize all # the variables in the graph so we can use them sess.run(tf.initialize_all_variables()) # Fit all training data prev_training_cost = 0.0 for epoch_i in range(n_epochs): sess.run(optimizer, feed_dict={X: xs, Y: ys}) training_cost = sess.run( cost, feed_dict={X: xs, Y: ys}) if epoch_i % 100 == 0 or epoch_i == n_epochs-1: print(training_cost) # Allow the training to quit if we've reached a minimum if np.abs(prev_training_cost - training_cost) < 0.0000001: break prev_training_cost = training_cost print "Training cost: ", training_cost, " after epoch:", epoch_i w = W.eval() b = 1 - np.sum(w) equation = "x' = sigmoid(x), y = "+str(b)+"x' +" + (" + ".join("{}x'^{}".format(val, idx) for idx, val in enumerate(w, start=2))) + ")"; print "For function:", equation pred_values = Y_pred.eval(feed_dict={X: xs}, session=sess) ax.plot(xs, pred_values, 'r') plt.title(equation) fig.show() plt.draw() #ax.set_ylim([-3, 3]) plt.waitforbuttonpress()