Custom-trained GPT-3 / GPT-4

Say you have a bunch of company-specific documents/pages/pdfs that you want a user to be able to query with GPT-3.

Without actually pre-training GPT-3, you can get an effect that is pretty close by using embeddings.

Note: Code is available on github:

Here’s how:

  1. Ahead of time, generate an embedding vector for each of your company-specific document / page / pdf . You can even break up large documents and generate an embedding vector for each chunk individually.

    For example for a medical system:

    Gastroesophageal reflux disease (GERD) occurs when stomach acid repeatedly flows back into the tube connecting your mouth and stomach (esophagus). This backwash (acid reflux) can irritate the lining of your esophagus.
    Many people experience acid reflux from time to time. However, when acid reflux happens repeatedly over time, it can cause GERD.
    Most people are able to manage the discomfort of GERD with lifestyle changes and medications. And though it’s uncommon, some may need surgery to ease symptoms.

    [0.22, 0.43, 0.21, 0.54, 0.32……]
  2. When a user asks a question, generate an embedding vector for that too:
    “I’m getting a lot of heartburn.” or
    “I’m feeling a burning feeling behind my breast bone”
    “I keep waking up with a bitter tasting liquid in my mouth”

    [0.25, 0.38, 0.24, 0.55, 0.31……]
  3. Compare the query vector against all the document vectors and find which document vector is the closest (cosine or manhatten is fine). Show that to the user:

    User: I'm getting a lot of heartburn.
    Document: Gastroesophageal reflux disease (GERD) occurs when stomach acid repeatedly flows back into the tube connecting your mouth and stomach (esophagus). This backwash (acid reflux) can irritate the lining of your esophagus.
    Many people experience acid reflux from time to time. However, when acid reflux happens repeatedly over time, it can cause GERD.
    Most people are able to manage the discomfort of GERD with lifestyle changes and medications. And though it's uncommon, some may need surgery to ease symptoms.

    Pass that document and query to GPT-3, and ask it to reword it to fit the question:

Implementation code

Note: Code is available on github:

Generate embeddings

Here’s an example in node javascript:

const { Configuration, OpenAIApi } = require('openai');
const fs = require('fs');

const configuration = new Configuration({
  apiKey: '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',
const openai = new OpenAIApi(configuration);

/** Generates embeddings for all guides in the database
 *  Then run:
 *   node generate_embeddings.js ./prod_backup_20230119.json > embeddings.json
 *   node embeddings_search.js ./embeddings.json

var filenames = process.argv.slice(2);

if (filenames.length === 0) {
  console.log('Usage: node generate_embeddings_from_txt_files.js *.txt > embeddings.json');

async function run() {
  const embeddings = [];
  for (const filename of filenames) {
    const input = fs.readFileSync(filename, 'utf8');

    try {
      const response = await openai.createEmbedding({ input, model: 'text-embedding-ada-002' });
      const output = { embedding:[0].embedding };
      console.error('Success:', filename);
    } catch (e) {
      console.error('Failed:', filename);
  console.log(JSON.stringify(embeddings, null, 2));


This outputs a file like:

{"filename": "gerd.txt", "embedding": [-0.021665672, 0.00097308296, 0.027932819, -0.027959095,....<snipped>]},

Do search

const { Configuration, OpenAIApi } = require('openai');

const configuration = new Configuration({
  apiKey: 'YOUR-API-KEY',
const openai = new OpenAIApi(configuration);

var jsonfilename = './embeddings.json';
const searchTerm = process.argv.slice(2).join(' ');

if (!searchTerm) {
  console.log('Usage: node embeddings_search.js diabetes referral guide');

const data = require('./' + jsonfilename);

async function run() {
  const response = await openai.createEmbedding({ input: searchTerm, model: 'text-embedding-ada-002' });
  const embedding =[0].embedding;
  const results = getScores(data, embedding);

  console.log( => `${a.score.toFixed(2)}: ${a.filename}]`).join('\n'));

function getScores(data, embedding) {
  const results = data
    .map((doc) => ({ score: cosinesim(doc.embedding, embedding), doc }))
    .sort((a, b) => b.score - a.score)
    .filter((doc, index) => index < 3 || (index < 10 && doc.score > 0.7));

  const titles = => doc.doc.title);
  const results_uniq = results.filter((x, index) => titles.indexOf(x.doc.title) === index);
  return results_uniq;

function cosinesim(A, B) {
  var dotproduct = 0;
  var mA = 0;
  var mB = 0;
  for (let i = 0; i < A.length; i++) {
    dotproduct += A[i] * B[i];
    mA += A[i] * A[i];
    mB += B[i] * B[i];
  mA = Math.sqrt(mA);
  mB = Math.sqrt(mB);
  var similarity = dotproduct / (mA * mB);
  return similarity;

You can now run like:

./node embeddings_search.js I am getting a lot of heartburn.

And it will output the filename of the closest document. You can display this directly the user user, or pass it to GPT-3 to fine tune the output.


ChatGPT clone: Open Assistant + RWKV

I’ve been working on an open source ChatGPT clone, as a small part of a large group.

Open Assistant is, very roughly speaking, one possible front end and data collection, and RWKV is one possible back end. The two projects can work together.

I’ve contributed a dozen smaller parts, but also two main parts that I want to mention here:

  • React UI for comparing the outputs of different models, to compare them. I call it: Open Assistant Model Comparer!
  • Different decoding schemes in javascript for RWKV-web – a way to run RWKV in the web browser – doing the actual model inference in the browser.

Open Assistant Model Comparer

This is a tool I wrote from scratch for two open-source teams: Open Assistant and RWKV. Behold its prettiness!

It’s hosted here: and github code here:

You pass it the urls of json files that are in a specific format, each containing the inference output of a model for various prompts. It then collates these, and presents them in a way to let you easily compare them. You can also drag-and-drop local files for comparison.

Update: Now with syntax highlighting, math support, markdown support,, url-linking and so much more:

And latex:

and recipes:

Javascript RWKV inference

This is an especially cool project. RWKV is a RNN LLM. A guy in the team, josephrocca, got it running in the browser, as in doing the actual inference in the browser, by running the python code in Wasm (WebAssembly).

I worked on cleaning up the code, and making it a library suitable for other projects to use.

Project is here:

Then I looked inference decoding:

When we run inference on a model, at each step the model is providing confidence values for each token, and from those confidence values we pick a particular token, before repeating for the next token.

We could pick token with the highest confidence, aka greedy search. But this has various downsides – we will never pick tokens that are also valid (and perhaps more rarer and more interesting), and in smaller LLM results in very repetitive output.

I implemented some alternatives:

  • multinomial sampling. Take the top few more confident outputs, apply a softmax and normalize to produce probabilities, then treat them as probabilities and sample from them. We can apply a temperature, to make lower-probable results more or less likely.
  • Repetitive penalty. We can explicitly look back at the tokens that we’ve generated so far, and penalize similar tokens to make them less likely to appear again.

The result:

The screenshots are from a pretty small model. Here’s a comparison from a larger RWKV (RL LLM) model:

Greedy (no penalty):

"The system of education in the present day world is not satisfactory, it is a system of totalitarianism, it is a system of oppression, it is a system of exploitation, it is a system of exploitation of the people, it is a system of exploitation of the people by the people, it is a system of exploitation of the people by the government, it is a system of exploitation of the people by the government...(repeated text)"

Greedy+Repetition penalty:

"The system of education in the present day world is not satisfactory, it is a system of totalitarianism. The system of modern education is based on the theory of class struggle and the aim of this theory is to control and enslave people. The purpose of this system is to make people submissive and docile. It makes them accept all kinds of injustices, crimes and atrocities as normal conditions. It forces people to accept their own degradation as a result of their own acts. This kind of mentality makes it impossible for any person to think independently, for him to be critical about his own condition, his environment or his culture. "

Sampling top-p=0.8 temperature=1:

"and say something about schools and their role in today's world. I will try to show there are shortcomings in both the system and people. Some flaws of the system are as follows: There are many unruly and disobedient people, especially males, who cannot be controlled. It is impossible for those who lack common sense and insight to appreciate the significance of any decision made with a view to the overall good, such as the deployment of new technologies and machinery. Uneducated persons cannot make the right decisions. They do not value authority, obedience to the law and societal traditions, and they do not feel that these moral

Most people preferred the Greedy + Repetition penalty output.

Acrylic Bender

I wanted to laser cut acrylic plastic, and then bend it into shapes.

So I mocked something up:

(The pink glow was my attempt to render a glow hot wire)

It’s extremely simple. It’s just a base and a channel along one edge with a nichrome wire (the wire used in e-cigarettes).

I actually built 5 of these, in different styles, methods and sizes.

And here’s a version built:

Here you can see how the wire is secured. It wraps around two bolts, and held in place with nuts. One end is connected to a spring because the wire shrinks as the wire gets hot.

I apply 12V across the wire, wait a minute for it to become hot, then place a piece of acrylic across the wire, wait a few seconds for it to become soft, then bend.

Folding version

Out of the five I built, three of them used a second piece with a hinge, like below, and also used an actual metal channel. My thinking is that I would heat up the plastic, hold it in place, then bend it smoothly with the second piece. However I found that I never used the hinge – it was far easier to just bend against the table or bend it around a piece of wood etc


The main use was to bend the casing for my wire bender and mars rover, but these aren’t done yet.

So here’s a cat’s water bowl holder that I made, by bending acrylic around a 2×4 block. I laser cut a circle first to allow the water bowl to fit.

Fast development

The real key to fast development is outsourcing the hard work:

Wire bender

I wanted to bend a large amount of wire for another project.

So I made this, a phone controlled wire bender. You plug it, establish a Bluetooth connection to it, and use the nifty android app I made to make it bend wire.


I had an idea that an 3d printer’s extruder could also be used to extrude wire. So mocked something up:

And then laser cut it.


I decided to mount everything to top acrylic, except for the power connector.

Also, I didn’t do much wire management 🙂

The “project box” is actually a flower pot 🙂

One thing I didn’t foresee with mounting everything upside down is that one of the heatsinks on the motor controller fell off. I had to add an acrylic plate on top to hold them in place. Also, I think I need some active cooling. I haven’t had any actual problems yet, despite bending a lot of wire, but I’m sure I’m doing the controllers and motors no favors.

Previous iterations

I actually went through quite a few iterations. Here was one of the first designs, before I realized that I needed the wire bending part to be much further away from the extruder:

I went through a few different iterations. The set of 11 feeder ball-bearings are there to straighten the wire. It’s not obvious, but they actually converge at approximately a 2 degree angle, and I find this works best. So when the wire is initially fed in, the large spaced bearings smooth out the large kinks, and then the closer spaced bearings smooth out the small kinks. Try trying to do it all in one pass doesn’t work because the friction ends up being too high.

I replaced the extruder feeder with one with a much more ‘grippy’ surface. The grooved metal needs to be harder than the wire you’re feeding into it, so that it can grip it well. This did result in marks in the metal, but that was okay for my purpose. Using two feeder motors could help with this.


The algorithm to turn an arbitrary shape into a set of motor controls was actually pretty interesting, and a large part of the project. Because you have to bend the wire further than the angle you actually want, because it springs back. I plan to write this part up properly later.

Software control

For computer control, I connect the stepper motors to a stepper motor driver, which I connect to an Arduino, which communicates over bluetooth serial to an android app. For prototyping I actually connected it to my laptop, and wrote a program in python to control it.

Both programs were pretty basic, but the android app has a lot more code for UI, bluetooth communication etc. The python code is lot easier to understand:

#!/usr/bin/env python3

import serial
import time
from termcolor import colored
from typing import Union
    import gnureadline as readline
except ImportError:
    import readline

readline.parse_and_bind('tab: complete')
baud=9600 # We override Arduino/libraries/grbl/config.h to change to 9600
# because that's the default of the bluetooth module

    s = serial.Serial('/dev/ttyUSB0',baud)
    print("Connected to /dev/ttyUSB0")
    s = serial.Serial('/dev/ttyUSB1',baud)
    print("Connected to /dev/ttyUSB1")

# Wake up grbl
time.sleep(2)   # Wait for grbl to initialize
s.flushInput()  # Flush startup text in serial input

def readLineFromSerial():
    grbl_out: bytes = s.readline() # Wait for grbl response with carriage return
    print(colored(grbl_out.strip().decode('latin1'), 'green'))

def readAtLeastOneLineFromSerial():
    while (s.inWaiting() > 0):

def runCommand(cmd: Union[str, bytes]):
    if isinstance(cmd, str):
        cmd = cmd.encode('latin1')
    cmd = cmd.strip() # Strip all EOL characters for consistency
    print('>', cmd.decode('latin1'))
    s.write(cmd + b'\n') # Send g-code block to grbl

motor_angle: float = 0.0
MICROSTEPS: int = 16
YSCALE: float = 1000.0

def sign(x: float):
    return 1 if x >= 0 else -1

def motorYDeltaAngleToValue(delta_angle: float):
    return delta_angle / YSCALE

def motorXLengthToValue(delta_x: float):
    return delta_x

def rotateMotorY_noFeed(new_angle: float):
    global motor_angle
    delta_angle = new_angle - motor_angle
    runCommand(f"G1 Y{motorYDeltaAngleToValue(delta_angle):.3f}")
    motor_angle = new_angle

def rotateMotorY_feed(new_angle: float):
    global motor_angle
    delta_angle = new_angle - motor_angle
    motor_angle = new_angle
    Y = motorYDeltaAngleToValue(delta_angle)

    wire_bend_angle = 30 # fixme
    bend_radius = 3
    wire_length_needed = 3.1415 * bend_radius * bend_radius * wire_bend_angle / 360
    X = motorXLengthToValue(wire_length_needed)
    runCommand(f"G1 X{X:.3f} Y{Y:.3f}")

def rotateMotorY(new_angle: float):
    print(colored(f'{motor_angle}°→{new_angle}°', 'cyan'))
    if new_angle == motor_angle:

    if sign(new_angle) != sign(motor_angle):
        # We are switching from one side to the other side.
        if abs(motor_angle) > 45:
            # First step is to move to 45 on the initial side, feeding the wire
            rotateMotorY_feed(sign(motor_angle) * 45)
        if abs(new_angle) > 45:
            rotateMotorY_noFeed(sign(new_angle) * 45)
        if abs(motor_angle) < 45 and abs(new_angle) < 45:
            # both start and end are less than 45, so no feeding needed
        elif abs(motor_angle) < 45:
            rotateMotorY_noFeed(sign(motor_angle) * 45)
        elif abs(new_angle) < 45:
            rotateMotorY_feed(sign(motor_angle) * 45)
        else: # both new and old angle are >45, so feed

def feed(delta_x: float):
    X = motorXLengthToValue(delta_x)
    runCommand(f"G1 X{X:.3f}")

def zigzag():
    for i in range(3):

def s_shape():
    for i in range(6):
    for i in range(6):

def paperclip():


runCommand('F32000') # Feed rate - affects X and Y
runCommand('G21')  # millimeters
runCommand(f'$100={6.4375 * MICROSTEPS}') # Number of steps per mm for X
runCommand(f'$101={YSCALE * 0.5555 * MICROSTEPS}') # Number of steps per YSCALE degrees for Y
while True:
    line = input('> ("stop" to quit): ').upper()
    if line == 'STOP':
    if len(line) == 0:
    cmd = line[0]
    if cmd == 'R':
        val = int(line[1:])
    elif cmd == 'F':
        val = int(line[1:])

runCommand('G4P0') # Wait for pending commands to finish



My daughter loved this ‘Flamingo’ song on youtube, so I laser cut out the ‘shrimp’ in wood, acrylic and pink paper, then mounted them together with an android tablet all inside of a frame. The result was pretty cool imho. (Unfortunately I can’t show the full thing for copyright reasons.)

Laser cutting a map

I laser cut a map of where I lived in Tokyo. I made so many mistakes, but I was happy with the final result.

I thought it would be easy – download nasa height map data, turn it into an svg, and laser cut. But when I downloaded the height map data for the area, I found there was missing data. The contour lines weren’t close loops, but open loops. I had to painstakingly manually correct, the manually fix the train lines, removing overlapping lines that would causing the laser cutter to burn too deeply, and simplify some terrain. I even had to write a python program to help fix the data file. But I was happy with the final result.

Un-hardwrap a text file

I made a python script that un-hardwraps text, and put it up on github:

Take a txt file that has been hard-wrapped and remove the hardwrapping.

Use like:

./ < example_in.txt > example_out.txt

This takes text, for example: (line numbers added for clarity)

1. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
2. incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
3. nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
4. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
5. fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
6. culpa qui officia deserunt mollit anim id est laborum.

And produces output without the hardwrapped newlines, like: (line numbers added for clarity)

1. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
2. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

The rules are:

  • Two adjacent lines are considered to have been hardwrapped and should be remerged, if the first line ends with a letter, comma or hyphen.
  • But don’t merge if the second line starts with a space or utf8 opening quote.
  • Everything between utf8 speechmarks “..” will be treated as one line.

Mars Rover Robot

This is an ongoing multi-year project.  I need to write this up properly, but I’ve got so much stuff that I’m going to attack this in stages.

Current Status

14 motors, controlled by a raspberry pi.  Arm doesn’t quite work yet.  Everything else works.

Front end – I wrote this in ReactJS, and it communicates with the robot via a websocket to RobotOS, using ‘rosbridge’.  Sorry it’s so pink – it uses infrared LEDs to work at night, but they are kinda overpowering.  Also sorry for the mess – I have two children…

React frontend

In the top left, the green circle is a ‘nipple’ – to let you move the rover about via webpage. Either via mouse or finger press.

In the top right, the xbox controller image shows what buttons are being pressed on an xbox controller, to control the robot. The xbox controller can either be connected to the rover, or connected to the PC / mobile phone – the webpage relays controller commands through to the rover.

Beneath the xbox controller is the list of running processes (“ros nodes”), with cpu and mem information.

Below that is the error messages etc.

Console UI

I’m strong believer in trying to make my projects ‘transparent’ – as in, when you connect, it should be obvious what is going on, and how to do things.

With that in mind, I always create nice colored scripts that show the current status.

Below is the output when I ssh into the raspberry pi.  It shows:

– Wifi status and IP address
– Currently running ROS Node processes, and their mem and cpu status.  Colored Green for good, Red for stopped/crashed.
– Show that ‘rosmon’ is running on ‘screen’ and how to connect to it.
– The command line shows the current git status


I put this script in git in two parts – the first is a bash script to source from bash, and the second is a python script to use ros to get the current status.

See those links for the source code.  I think that every ROS project can benefit from these.

Boogie Rocker Design

Ultrasonic Sensor

I played about with making a robot face.  I combined a Ultrasonic Sensor (HC-SR04) and NeoPixel leds.  The leds reflect the distance – one pixel per 10 of cm.

I’m wondering if I can do SLAM – Simultaneous Location And Mapping  with a single (or a few) very-cheap ultrasonic sensor. Well, the answer is almost certainly no, but I’m very curious to see how far I can get.

NOTE: This is an old design – from more than two years ago


6DOF head tracking in python (Part 2 of Samurai Robot)

Samurai Robot Series:

  1. Part 1 – Hardware
  2. Part 2 – 6DOF head tracking in python


I wanted a simple way to track the movement and orientation of an object (In this case, my Samurai robot).  TrackIR’s SDK is not public, and has no python support as far as I know.

I looked at the symbols in dll that is distributed with the trackir app, and with some googling around on the function names, I could piece together how to use this API.

Here’s the result – my console program running in a window in the middle.  I have pressed ctrl+c after 70ms, to cancel, to give a better screenshot:trackir_screenshot

Implementation Details

Python lets you call functions in any .dll library pretty easily (although the documentation for doing so is pretty awful).  The entire code is approximately 500 lines, with half of that being comments, and I’ve put on github here:

But here’s a high overview of the implementation:

  1. Query the Windows registry for where the TrackIR software is installed.
  2. Load the NPClient64.dll in that TrackIR software folder, using python ctypes.WinDLL (Note, I’m using 64 bit python, so I want the 64 bit library)
  3. Expose each function in the dll that we want to use.  This can get a bit cryptic, and took a bit of messing about to get right.  For example, there is a function in the dll ‘NP_StopCursor’, that in C would look like:
    int NP_StopCursor(void)

    This translates in python to:

    NP_StopCursor = ctypes.WINFUNCTYPE(ctypes.c_int)(
    ("NP_StopCursor", trackIrDll), ())

    where ‘NP_StopCursor’ is now a function that can then be called with:  NP_StopCursor().  See the file for more, including on how to pass parameters which are data structures etc.

    Note: Python ctypes supports a way to mark a parameter as an ‘output’, but I didn’t fully understand what it was doing.  It seemed to just work, and appeared to automatically allocate the memory for you, but the documentation didn’t explain it, and I didn’t entirely trust that I wasn’t just corrupting random memory, so I stuck to making all the parameters input parameters and ‘new’ing the data structure myself.

  4. Create a graphical window and get its handle (hWnd), and call the functions in the TrackIR  .dll to setup the polling, and pass the handle along.  The TrackIR software will poll every 10 seconds ish to check that the given graphic window is still alive, and shutdown the connection if not.
  5. Poll the DLL at 120hz to get the 6DOF position and orientation information.

ROS Integration

I wanted to get this to stream 6DOF data to ROS as well, to open up possibilities such as being able to determine the position of a robot, or to control the robot with your head etc.

The easiest way is to run a websocket server on the linux ROS server, and send the data via ROS message via the websocket.

A good place to start: