Learning Machine Learning

Heh, funny title.

I became interested in machine learning while working for Nokia.  I worked on Nokia’s Z Launcher application for Android.  You can scribble a letter (or several), and it recognizes it and searches for it.  The app is available for download in the Play Store.


I worked on the Nokia Z Launcher’s handwriting recognition

Specifically, I was tasked with optimizing the speed of the recognition.  I don’t know if I can state any specifics on how the character recognition was done, but I will say that I managed to increase the speed of the recognition a hundredfold.

This recognition was actually a relatively simple task compared to modern deep neural networks, but it really whetted my appetite to understand more.

When AlphaGo beat Lee Sedol, I knew that I simply had to understand deep neural networks.

Below is my journey in understanding, along with my reflective thoughts:

  1. I started off naively implementing an LSTM neural network without any training.
    I wanted to have a crude appreciation of the problems before I read about the solutions.  My results are documented in my previous post here, but looking back it’s quite embarrassing to read.  I don’t regret doing this at all however.
  2. Next I did the Andrew Ng Coursera Machine Learning course.
    This is an 11-week course in the fundamentals.  I completed it along with all the coursework, but by the end I felt that my knowledge was about 20 years out of date.  It was really nice to get the fundamentals, but none of the modern discoveries were discussed at all.  Nothing about LSTMs, Markov chains, dropout, etc.
    The exercises are also all done in Matlab/Octave, which has fallen out of favour, and they rely on a lot of support code.  I certainly didn’t feel comfortable implementing a neural network system from scratch after finishing the course.


    Passed the Coursera Machine Learning course with a 97.6% score.

    The lecturer, Andrew Ng, was absolutely awesome.  My complaints really boil down to wishing that the course was twice as long, so that I could learn more from him!  I now help out in a machine learning chat group and find that most of the questions people ask about TensorFlow, Theano, etc. are actually basics that are answered very well by Andrew Ng’s course.  I constantly direct people to it.

  3. Next, Reinforcement Learning.  I did the 4-week Udacity course UD820.
    This was taught by a pair of teachers who are constantly joking with each other.  At first I thought it would annoy me, but I actually really enjoyed the course – they work really well together.  They are a lot more experienced and knowledgeable than they pretend to be.  (They take it in turns to play the ignorant student.)
  4. I really wanted to learn about TensorFlow, so I did the 4-week Udacity course UD730.
    Again I did all the coursework for it.  I thought the course was pretty good, but it really annoyed me that each video was only about 2 minutes long, resulting in a 30-second pause every 2 minutes while the next video loaded.  Most frustrating.
  5. At this point, I started reading papers and joined up for the Visual Doom AI competition.
    I have come quite far in my own Visual Doom AI implementation, but the vast majority of the work is the ‘setup’ required.  For example, I had to fix bugs in their Doom engine port to get the built-in AI to work.  And it was a fair amount of work to get the game to run nicely with TensorFlow, with mini-batch training, testing and verification stages.
    I believe I understand how to properly implement a good AI for this, with the key coming from guided policy search, in a method similar to that pioneered by Google for robotic control.  (Link is to a keynote at the International Conference on Learning Representations 2016.)  The idea is to hack the engine to give me accurate internal data on the positions of enemies, walls, health, etc. that I can use to train a very simple ‘teacher’.  Then use that teacher to supervise a neural network that has only visual information, thus allowing us to train a deep neural network with back-propagation.  By alternating between teacher and student, we can converge upon perfect solutions.  (A minimal sketch of this teacher-and-student idea is below, after this list.)  I hope to write more about this in a proper blog post.
  6. The International Conference on Learning Representations (ICLR) 2016
    The videos were absolutely fascinating, and surprisingly easy to follow with the above preparation.
  7. I listened to the entire past two years of The Talking Machines podcast.  It introduced many areas that I was completely unfamiliar with, and highlighted many things that I knew about but just didn’t realise were important.
  8. I did the Hinton Coursera course on Neural Networks for Machine Learning, which perfectly complemented the Andrew Ng Coursera course.  I recommend these two courses the most for the foundations.  It is about 5 years out of date, but it is all about the fundamentals.
  9. I did the Computational Neuroscience course.  The first half was interesting and was about, well, neuroscience.  But the math lectures were delivered in such a slow monotone that they put me straight to sleep.  The second half of the course was just a copy of the Andrew Ng course (they even said so), so I skipped all the lectures and did the exams with no problems.  I really liked that they let you do the homework in Python.  It is easier in Matlab, but I really wanted to improve my Python data-science skills.  The exams took way, way longer than their predicted times: they would have 20 questions, requiring you to download, run and modify 4 programs, and claim that you should be able to do it all in an hour!
  10. I completed the IBM Artificial Intelligence class.  This is the second most expensive AI class that I’ve done, at $1600 plus $50 for the book.  I was really not at all impressed by it.  Most of the videos are a minute long or less, which means that I spent half my time waiting for the next video to load.  The main lecturer gets on my nerves – he wears a brightly colored Google Glass for no apparent reason other than to show it off, and speaks in the most patronizing way.  You get praised continually for signing up to the course at the start.  It’s overly specialized: you use their specific libraries, which aren’t at all production ready, and you use toy data with a tiny number of training examples (e.g. trying to train the gesture ‘toy’ with a single training example!).  Contrast this with the Google Self-Driving course:
  11. The Google Self-Driving course, which is $2400.  This is much better than the IBM course.  The main difference is that, although the theme is self-driving cars, you do it all in TensorFlow and you learn generic techniques that could be applied to any machine learning field.  You quickly produce code that could easily be made production ready.  You work with large, realistic data and large, realistic neural networks, and they teach you to use Amazon AWS servers to train on.  The result is code that can be (and literally is!) deployed to a real car.
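To sketch the teacher-and-student idea mentioned in point 5: a ‘teacher’ that cheats by reading privileged internal game state generates action labels, and a vision-only ‘student’ is trained to imitate it.  Below is a toy illustration of mine, with a fake environment and ridge regression standing in for the deep network – not the actual ViZDoom or guided policy search code:

import numpy as np

rng = np.random.RandomState(0)

class FakeDoomEnv:
    """Stand-in environment: the privileged state is the enemy's x offset,
    and the 'image' is a noisy 1-D render that encodes that offset."""
    def observe(self):
        enemy_dx = rng.uniform(-1, 1)              # privileged internal game data
        image = np.zeros(32)
        image[int((enemy_dx + 1) / 2 * 31)] = 1.0  # a bright pixel where the enemy is
        image += 0.05 * rng.randn(32)              # camera noise
        return enemy_dx, image

def teacher_action(enemy_dx):
    # Hand-written 'teacher' that cheats by using internal data: turn towards the enemy.
    return np.sign(enemy_dx)

# 1. Collect (image, teacher action) pairs using the privileged teacher.
env = FakeDoomEnv()
images, actions = [], []
for _ in range(2000):
    enemy_dx, image = env.observe()
    images.append(image)
    actions.append(teacher_action(enemy_dx))
X, y = np.array(images), np.array(actions)

# 2. Train a vision-only 'student' to imitate the teacher.  Ridge regression
#    stands in here for a deep network trained with back-propagation.
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)

def student_action(image):
    return np.sign(image @ W)

# 3. In the full scheme you alternate: let the student play, have the teacher
#    relabel the states the student actually visits, retrain, and repeat.
accuracy = np.mean([student_action(img) == act for img, act in zip(X, y)])
print("student agrees with the teacher on %.1f%% of frames" % (100 * accuracy))

The important part is step 3: by alternating, the teacher ends up labelling the states that the student actually visits, rather than only the states the teacher itself would visit.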

I worked on mitigating the CPU ‘Meltdown’ bug, 5 years before it was known about

Maybe a clickbait title, sorry, but I couldn’t think of a better one.

The CPU ‘Meltdown’ bug affects Intel CPUs.  From Wikipedia:

Since many operating systems map physical memory, kernel processes, and other running user space processes into the address space of every process, Meltdown effectively makes it possible for a rogue process to read any physical, kernel or other processes’ mapped memory—regardless of whether it should be able to do so. Defenses against Meltdown would require avoiding the use of memory mapping in a manner vulnerable to such exploits (i.e. a software-based solution) or avoidance of the underlying race condition (i.e. a modification to the CPUs’ microcode and/or execution path).

This separation of user and kernel memory space is exactly what I worked on from 2012 to 2014, on behalf of Deutsche Telekom, using the L4 hypervisor:

[Architecture diagram]

The idea was to give each service its own separate memory space, designed under the assumption that the main OS has been compromised and is not trustworthy (e.g. because of the Meltdown bug). I personally worked on the graphics driver – splitting the kernel graphics driver into two parts: one side that the app talks to, which has to be considered compromised, and one side that actually talks to the hardware.

Here’s my work in action:

[Photos: the SIMKo 3 secure phone running my work]

Yes, I did actually use Angry Birds as my test. Fully hardware accelerated too 🙂

Unfortunately, the problem was that it took too long to port to each phone.  It took me a year to port the graphics driver changes across, and a similar time for my colleagues to do the other drivers.  And then another year for it to actually hit the market.  The result is that the phone was always over 2 years out of date by the time it hit the market, which is a long time in mobile phone terms.

Still, our software would be immune to this type of bug, and that’s kinda cool.  Even if it did fail commercially 😉

TypeScript + lodash map and filter

I love TypeScript.  I use it whenever I can.  That said, sometimes it can be…  interesting.  Today, out of the blue, I got this TypeScript error in code that used to work:

[06:53:30]  typescript: src/mycode.ts, line: 57 
            Property 'video' does not exist on type 'number | (<U>(callbackfn: (value: Page, index: number, 
            array: Page[]) => U, thisA...'. Property 'video' does not exist on type 'number'. 

 

The code looks like:

return _.chain(pages)
        .filter((s, sIdx) => s.video || s.videoEmbedded)
        .map((s, sIdx) => {
            if (s.video) { ... }

Can you spot the ‘error’?

The problem is that s.video || s.videoEmbedded isn’t returning a boolean. It’s returning a truthy value, but not a boolean. And the lodash TypeScript developers made a change 1 month ago that means that filter() only accepts booleans, not just any truthy value. And the lodash TypeScript developers are finding that fixing this gets very complicated. See the full conversation here:

https://github.com/DefinitelyTyped/DefinitelyTyped/issues/21485

(Open issue at time of writing. Please leave me feedback or message me if you see this bug get resolved)

The workaround/fix is to just make sure it’s a boolean, e.g. use !!, Boolean(..), or:

return _.chain(pages)
        .filter((s, sIdx) => s.video !== null || s.videoEmbedded !== null )
        .map((s, sIdx) => {
            if (s.video) { ... }

Worst/Trickiest code I have ever seen

It’s easy to write bad code, but it takes a real genius to produce truly terrible code.  And the guys who wrote the Python program hyperopt were clearly very clever.

Have a look at this function (don’t worry about what it is doing), from tpe.py:

# These produce conditional estimators for various prior distributions
@adaptive_parzen_sampler('uniform')
def ap_uniform_sampler(obs, prior_weight, low, high, size=(), rng=None):
    prior_mu = 0.5 * (high + low)
    prior_sigma = 1.0 * (high - low)
    weights, mus, sigmas = scope.adaptive_parzen_normal(obs,
        prior_weight, prior_mu, prior_sigma)
    return scope.GMM1(weights, mus, sigmas, low=low, high=high, q=None,
                      size=size, rng=rng)

The details don’t matter here, but clearly it’s calling some function “adaptive_parzen_normal”, which returns three values, then it passes those to another function called “GMM1” and returns the result.

Pretty straightforward?  With me so far?  Great.

Now here is some code that calls this function:

fn = adaptive_parzen_samplers[node.name]
named_args = [[kw, memo[arg]] for (kw, arg) in node.named_args]
a_args = [obs_above, prior_weight] + aa
a_post = fn(*a_args, **dict(named_args))

Okay, this is getting quite messy, but with a bit of thinking we can understand it.  It’s just calling the ‘ap_uniform_sampler’ function, whatever that does, but letting us pass in parameters in some funky way.
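(As an aside, the ‘funky way’ is just Python argument unpacking.  A minimal illustration of mine, unrelated to hyperopt itself:)

def f(a, b, c=0, d=0):
    return (a, b, c, d)

pos_args = [1, 2]
named_args = [['c', 3], ['d', 4]]   # same shape as the [[kw, value], ...] lists above

# f(*pos_args, **dict(named_args)) is exactly f(1, 2, c=3, d=4)
print(f(*pos_args, **dict(named_args)))   # prints (1, 2, 3, 4)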

So a_post is basically whatever “GMM1” returns  (which is a list of numbers, fwiw)

Okay, let’s continue!

fn_lpdf = getattr(scope, a_post.name + '_lpdf')
a_kwargs = dict([(n, a) for n, a in a_post.named_args if n not in ('rng', 'size')])
above_llik = fn_lpdf(*([b_post] + a_post.pos_args), **a_kwargs)

and that’s it.  There’s no more code using a_post.

It took me a whole day to figure out what on earth is going on here.  But I’ll give you, the reader, a hint: this is not running any algorithm – it’s constructing an Abstract Syntax Tree and manipulating it.
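To make that hint concrete, here is a minimal sketch (my own illustration, not hyperopt’s actual classes) of how calling methods on a ‘scope’ object can record graph nodes instead of executing anything:

class Node:
    """A recorded function call: a node in the graph, not a computed value."""
    def __init__(self, name, pos_args, named_args):
        self.name = name
        self.pos_args = pos_args
        self.named_args = named_args
    def __repr__(self):
        args = [repr(a) for a in self.pos_args]
        args += ["%s=%r" % (k, v) for k, v in self.named_args]
        return "%s(%s)" % (self.name, ", ".join(args))

class Scope:
    """Anything you call on 'scope' just builds a Node describing the call."""
    def __getattr__(self, name):
        def make_node(*args, **kwargs):
            return Node(name, list(args), sorted(kwargs.items()))
        return make_node

scope = Scope()
a_post = scope.GMM1(scope.adaptive_parzen_normal('obs', 1.0), low=-1, high=1)
print(a_post)       # GMM1(adaptive_parzen_normal('obs', 1.0), high=1, low=-1)
print(a_post.name)  # 'GMM1' -- which is why the real code can later look up
                    # getattr(scope, a_post.name + '_lpdf')

Once the whole computation is a data structure like this, the code above can walk it, rename it, swap arguments around, and only evaluate it right at the end.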

If you want, try and see if you can figure out what it’s doing.


Tensorflow for Neurobiologists

I couldn’t find anyone else who has done this, so I made this really quick guide.  It uses TensorFlow, which is complete overkill for this specific problem, but I figure that a simple example is much easier to follow.

Install and run the Python 3 notebook and TensorFlow.  On Linux, as a user, without using sudo:

$ pip3 install --upgrade --user ipython[all] tensorflow matplotlib
$ ipython3  notebook

Then in the notebook window, do New->Python 3

Here’s an example I made earlier. You can download the latest version on github here: https://github.com/johnflux/Spike-Triggered-Average-in-TensorFlow

Spike Triggered Average in TensorFlow

The data is an experimentally recorded set of spikes recorded from the famous H1 motion-sensitive neuron of the fly (Calliphora vicina) from the lab of Dr Robert de Ruyter van Steveninck.

This is a complete rewrite of non-tensorflow code in the Coursera course Computational Neuroscience by University of Washington. I am thoroughly enjoying this course!

Here we use TensorFlow to find out how the neuron is reacting to the data, to see what causes the neuron to trigger.

%matplotlib inline
import pickle
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()

FILENAME = 'data.pickle'

with open(FILENAME, 'rb') as f:
    data = pickle.load(f)

stim = tf.constant(data['stim'])
rho = tf.constant(data['rho'])
sampling_period = 2 # The data was sampled at 500 Hz, i.e. every 2 ms
window_size = 150   # Use a 300 ms sliding window = 300 / sampling_period samples

We now have our data loaded into tensorflow as a constant, which means that we can easily ‘run’ our tensorflow graph. For example, let’s examine stim and rho:

print("Spike-train time-series =", rho.eval(),
      "\nStimulus time-series     =", stim.eval())
Spike-train time-series = [0 0 0 ..., 0 0 0] 
Stimulus time-series    = [-111.94824219  -81.80664062 
    10.21972656 ...,  9.78515625 24.11132812 50.25390625]

rho is a binary array where a 1 indicates a spike. Let’s turn that into an array of indices where the value is 1, but ignoring the first window_size elements.

Note: We can use the [] and + operations on a TensorFlow tensor, and it correctly adds those operations to the graph. This is equivalent to using the tf.slice and tf.add operations.

spike_times = tf.where(tf.not_equal(rho[window_size:-1], 0)) + window_size
print("Time indices where there is a spike:\n", spike_times.eval())
Time indices where there is a spike:
 [[   158]
 [   160]
 [   162]
 ..., 
 [599936]
 [599941]
 [599947]]
def getStimWindow(index):
    i = tf.cast(index, tf.int32)
    return stim[i-window_size+1:i+1]
stim_windows = tf.map_fn(lambda x: getStimWindow(x[0]), spike_times, dtype=tf.float64)
spike_triggered_average = tf.reduce_mean(stim_windows, 0).eval()
print("Spike triggered averaged is:", spike_triggered_average[0:5], "(truncated)")
Spike triggered averaged is: [-0.33083048 -0.29083503 -0.23076012 -0.24636984 -0.10962767] (truncated)

Now let’s plot this!

time = (np.arange(-window_size, 0) + 1) * sampling_period
plt.plot(time, spike_triggered_average)
plt.xlabel('Time (ms)')
plt.ylabel('Stimulus')
plt.title('Spike-Triggered Average')

plt.show()

[Plot of the spike-triggered average]

It’s… beautiful!

What we are looking at here is that we’ve discovered that our neuron is doing a leaky integration of the stimulus, and when that integration adds up to a certain value, it triggers.
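For intuition, here’s a toy leaky integrate-and-fire model of that idea – purely illustrative numbers, not fitted to this data:

import numpy as np

def leaky_integrate_and_fire(stimulus, dt=2.0, tau=20.0, threshold=5.0):
    """v leaks towards zero with time constant tau (ms) while integrating
    the stimulus; when v crosses the threshold, the neuron 'spikes'."""
    v = 0.0
    spike_times = []
    for t, s in enumerate(stimulus):
        v += dt * (-v / tau + s)
        if v > threshold:
            spike_times.append(t)
            v = 0.0               # reset after a spike
    return spike_times

print("spike count:", len(leaky_integrate_and_fire(np.random.randn(1000))))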

See the GitHub repo for the full source: https://github.com/johnflux/Spike-Triggered-Average-in-TensorFlow

Update: I was curious how much noise there was.  Here’s a plot with 1 standard deviation shown in light blue:

mean, var = tf.nn.moments(stim_windows,axes=[0])
plt.errorbar(time, spike_triggered_average, yerr=tf.sqrt(var).eval(), ecolor="#0000ff33")

[Plot: spike-triggered average with ±1 standard deviation]

Yikes!  This is why the input signal MUST be Gaussian, and why we need lots of data to average over.  Here, we’re averaging over 53,583 windows.
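The light blue band is the per-window standard deviation; the uncertainty in the average itself is much smaller, shrinking with the square root of the number of windows.  Continuing from the variables above, a quick sanity check of my own:

# Standard error of the mean = std / sqrt(number of windows)
n_windows = tf.cast(tf.shape(stim_windows)[0], tf.float64)
standard_error = tf.sqrt(var / n_windows)
print("Largest standard error of the mean:", tf.reduce_max(standard_error).eval())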

Biped Robot

I’ve always wanted to make a walking robot.  I wanted to make something fairly rapidly and cheaply that I could try to get walking.

And so, 24 hours of hardware and software hacking later:

[Animation of the robot waving]

He’s waving only by a small amount because otherwise he falls over 🙂  It took a day and a half to do, so overall I’m pretty pleased with it.  It uses 17 MG996R servos and a Chinese RT Robot 32-channel servo controller board.

Reverse Engineering Servo board

The controller board amazingly comes with INCOMPLETE instructions.  The result is that anyone trying to use this board will find that it just does not work, because the board completely ignores the commands that are supposed to work.

I downloaded the example software that they provide, which does work, and ran it through strace like so:

$ strace  ./ServoController 2>&1 | tee dump.txt

Searching in dump.txt for ttyACM0 reveals the hidden initialization protocol.  They do:

open("/dev/ttyACM0", O_RDWR|O_NOCTTY|O_NONBLOCK) = 9
write(9, "~RT", 3)                      = 3
read(9, "RT", 2)                        = 2
read(9, "\27", 1)                       = 1
ioctl(9, TCSBRK, 1)                     = 0
write(9, "~OL", 3)                      = 3
write(9, "#1P1539\r\n", 9)              = 9

(The TCSBRK ioctl basically just blocks until nothing is left to be sent.)  Translating this into Python, we get:


#!/usr/bin/python
import serial
from time import sleep

ser = serial.Serial('/dev/ttyACM0', 9600)
ser.write(b'~RT')          # handshake: the board echoes "RT" plus one more byte
print(repr(ser.read(3)))
ser.write(b'~OL')          # the undocumented 'online' command found via strace
ser.flush()
ser.write(b"#1P2000\r\n")  # move motor 1 to position 2000
sleep(1)
ser.write(b"#1P1000\r\n")  # move motor 1 to position 1000
print("done")

(Looking at the strace more, over multiple runs, sometimes it writes “~OL” and sometimes “OL”.  I don’t know why, but it didn’t seem to make a difference.  That’s the capital letter O, btw.)

Feedback

I wanted to have a crude sensor measurement of which way up it is.  After all, how can it stand up if it doesn’t know where up is?  On other projects I’ve used an accelerometer + gyro + magnetometer and fused the data with a Kalman filter or similar, but honestly it’s a huge amount of work to get right, especially if you want to calibrate them (the magnetometer in particular).  So I wanted to skip all that.

Two possible ideas:

  1. There’s a really quick hack that I’ve used before – simply place the robot underneath a ceiling light, and use a photosensitive diode to detect the light (see my Self Balancing Robot), so that its resistance is at its lowest when the robot is upright 🙂   (More specifically, make a voltage divider with it and then measure the voltage with an Arduino – see the quick calculation after this list.)  It’s extremely crude, but the nice thing about it is that it’s dead cheap, insensitive to vibrational noise, and still surprisingly sensitive.  It’s also as fast as your ADC.
  2. Use an Android phone.
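For what it’s worth, here’s the voltage-divider arithmetic behind option 1, with purely illustrative resistor values (a real light sensor would need measuring):

# Voltage divider: Vcc -> light sensor -> (ADC pin) -> fixed resistor -> GND.
# The sensor resistances below are assumptions for illustration only.
def divider_voltage(r_sensor_ohms, r_fixed_ohms=10000, vcc=5.0):
    return vcc * r_fixed_ohms / (r_sensor_ohms + r_fixed_ohms)

print(divider_voltage(5000))    # bright (low resistance)  -> ~3.3 V at the ADC
print(divider_voltage(50000))   # dim (high resistance)    -> ~0.8 V at the ADC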

I want to move quickly on this project, so I decided to give the second way a go.  Before dealing with vibration etc., I first wanted to know whether it could work, and what the latency would be if I just transmitted the Android fused orientation data over wifi (UDP) to my router, then to my laptop, which talks via USB to the serial board, which finally moves the servo.

So I transmitted the data and used the phone’s tilt to control two of the servos on the arm, and recorded with the same phone’s camera at the same time.  The result is:

I used a video editor (OpenShot) to load up the video, then measured the time between when the camera moved and when the arm moved.  I took 6 such measurements, and found 6 or 7 frames of delay each time – so between 200 ms and 233 ms.
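(Those millisecond figures just come from the frame count, assuming the phone records at 30 frames per second:)

fps = 30.0  # assumed camera frame rate
for frames in (6, 7):
    print(frames, "frames =", round(frames / fps * 1000), "ms")
# 6 frames = 200 ms
# 7 frames = 233 ms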

That is…  exactly what TowerKing quotes as the latency of the servo itself (under 0.2 s).  Which means that I’m unable to measure any latency due to the network setup.  That’s really promising!

I do wonder whether 200 ms is going to be a low enough latency (more expensive hobby servos go down to 100 ms), but it should be enough.  I previously did quite extensive experimental tests on latency in the stabilization of a PID-controlled quadcopter in my own simulator, where a 200 ms delay was found to be controllable, but not ideal; 50 ms was far better.  But I have no idea how well that lesson will transfer to robot stabilization.

But it is good enough for this quick and dirty project.  This was done in about 0.5 days, bringing the total so far up to 2 full days of work.

Cost and Time Breakdown so far

Parts cost:
Metal skeleton: $99 USD
17x MG996R servo motors: $49 USD
RT Robot 32ch servo control board: $25 USD
Delivery from China: $40 USD
USB cable: $2 USD
Android phone: (used my own phone)
Total: $215 USD

For tools, I used nothing more than some screwdrivers, needle-nosed pliers and a bench power supply – around $120 in total.  I could have gotten 17x MG995 servos for a total of $45, but I wanted the metal gears that the MG996R provides.

Time breakdown:
Mechanical build: 1 day
Reverse engineering the servo board: 0.5 days
Hooking up the Android phone + writing some visualization code: 0.5 days
Blogging about it: 0.5 days 🙂
Total: 2.5 days

Future Plans – Q Learning

My plan is to hang him loosely upright by a piece of string, and then make a neural network in TensorFlow to control him, to try to get him to stand fully upright without having to deal with recovering from a collapsed, lying-down position.

Specifically, I want to get him to balance upright using Q-learning.  One thing I’m worried about is the sheer amount of time required to physically do each test.  When you have a scenario where each test takes a long time compared to the available compute power, that just screams out for Bayesian learning.  So…  Bayesian Q-parameter estimation?  Is there such a thing?  A 10-second Google search doesn’t find anything.  Or Bayesian policy network tuning?  I need to have a think about it 🙂
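For reference, the vanilla tabular Q-learning update I’d be starting from looks like this (a toy sketch of mine; the real robot would need a function approximator over continuous joint angles rather than a table):

import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] -> estimated return
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def choose_action(state, actions):
    if random.random() < epsilon:                       # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])    # otherwise exploit

def q_update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

The worry above is exactly that each call to q_update costs a real-world physical trial, which is why something more sample-efficient (Bayesian or model-based) is appealing.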