This page is a feed of all my #projects posts in reverse chronological order. You can subscribe to this feed in your favourite feed reader through the icon above. You can also get a weekly digest of all of my posts via email by subscribing here:
I've had a problem for a while around organising articles and bookmarks that I collect. I try not to bookmark things arbitrarily, and instead be mindful of the purpose of doing so, but I still have thousands of bookmarks, and they grow faster than I can process them.
I've tried automated approaches (for example summarising the text of these webpages and clustering vector embeddings of these) with limited success so far. I realised that maybe I should simply eat the frog and work my way through these, then develop a system for automatically categorising any new/inbound bookmark on the spot so they stay organised in the future.
A new problem was born: how can I efficiently manually organise my bookmarks? The hardest step in my opinion is having a good enough overview of the kinds of things I bookmark such that I can holistically create a hierarchy of categories, rather than the greedy approach where I tag things on the fly.
I decided to first focus on bookmarks that I would categorise as "tools", which are products or services that I currently use, may use in the future, want to look at to see if they're worth using, or may want to recommend to others in the future if they express a particular need. These are a bit more manageable as they're a small subset; the bigger part of my bookmarks are general knowledge resources (articles etc).
At the moment, I rely on my memory for the above use cases. Often I don't remember the name of a tool, but I can usually find it with a substring search of the summaries. Often I don't remember tools in the first place, and am surprised to find that I bookmarked something I wish I had remembered existed.
Eventually, I landed on a small script to convert all my notes into files, then used different file browsers to drag and drop the files into the right place. This was still very cumbersome.
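For illustration, the script amounted to something like this (a toy version of the idea; it assumes the bookmarks live in a JSON export with title/url/summary fields, which isn't necessarily my actual note format):

```js
// Toy sketch: one file per bookmark, so any file browser can sort them.
// Assumes a hypothetical bookmarks.json export, not my real note format.
const fs = require('fs');

fs.mkdirSync('bookmarks', { recursive: true });
const bookmarks = JSON.parse(fs.readFileSync('bookmarks.json', 'utf8'));

for (const b of bookmarks) {
  // Slugify the title so it's a safe filename
  const name = b.title.replace(/[^a-z0-9]+/gi, '-').slice(0, 80);
  fs.writeFileSync(
    `bookmarks/${name}.md`,
    `# ${b.title}\n\n${b.url}\n\n${b.summary ?? ''}\n`
  );
}
```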
On the front page of my public notes I have two different visualisations for browsing these notes. I find them quite useful for getting an overview. I thought it might be quite useful to use the circles view for organisation too. So I thought I should make a minimal file browser that displays files in this way, for easy organisation.
Originally, I took this as an excuse to try Tauri (a lighter Electron equivalent built on Rust that uses native WebViews instead of bundled Chromium), and last month I did get an MVP working, but then I realised that I'm making things hard on myself, especially since the development workflow for Tauri apps wasn't very smooth with my setup.
So instead, I decided to write this as an Obsidian plugin, since Obsidian is my main PKM tool. Below is a video demo of how far I got.
You can:
Unlike the visualisation on my front page, which uses word count for node size, this version uses file size. So far, it helps with organisation, although I would like to work on a few quality-of-life things to make this properly useful.
Today was the "Build a Website in an Hour" IndieWeb event (more info here). I went in not quite knowing what I wanted to do. Then, right as we began, I remembered learning about Gemini and Astrobotany from Jo. I thought this would be the perfect opportunity to explore Gemini, and build a Gemini website!
Gemini is a simple protocol somewhere between HTTP and Gopher. It runs on top of TLS and is deliberately quite minimal. You normally need a Gemini client/browser in order to view `gemini://` pages, but there's an HTTP proxy here.
I spent the first chunk of the hour trying to compile Gemini clients on my weird setup. Unfortunately, this proved to be quite tricky on arm64 (I also can't use snap or flatpak because of reasons that aren't important now). I eventually managed to install a terminal client called Amfora and could browse the Geminispace!
Then, I tried to get a server running. I started in Python because I thought this was going to be hard as-is, and I didn't want to take more risks than needed, but it turned out to be actually kind of easy (you only need `socket` and `ssl`). Once I had a server working in Python, I realised I would actually prefer to run this off of the same server that this website (yousefamar.com) uses. Most of this website is static, but there's a small Node server that helps with rebuilding, wiki pages, and testimonial submission.
So for the next chunk of time, I implemented the server in Node. You can find the code for that here. I used the `tls` library to start a server and read/write text directly from/to a socket.
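The core of a Gemini server in Node really is tiny. Here's a minimal sketch (not the exact code in the repo; it assumes self-signed `cert.pem`/`key.pem` files generated with openssl): a request is just a URL terminated by CRLF, and a response is a status line followed by the gemtext body.

```js
// Minimal Gemini server sketch, assuming self-signed cert.pem/key.pem
const tls = require('tls');
const fs = require('fs');

const server = tls.createServer({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
}, (socket) => {
  socket.setEncoding('utf8');
  socket.once('data', (line) => {
    // A Gemini request is a single URL terminated by CRLF
    const url = line.trim();
    console.log('Request for', url);
    // Status 20 = success, followed by the MIME type and the body
    socket.write('20 text/gemini\r\n');
    socket.write('# Hello from Geminispace!\r\n');
    socket.end(); // forgetting this makes requests hang until timeout
  });
});

server.listen(1965); // Gemini's default port
```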
Everything worked fine on localhost with self-signed certificates that I generated with openssl, but for yousefamar.com I needed to piggyback off of the certificates I already have for that domain (Let's Encrypt via Caddy). I struggled with this for most of the remaining time. I also had an issue where I forgot to end the socket after writing, which caused requests to time out.
I thought I might have to throw in the towel, but I fixed it just as the call was about to end, after everyone had shown their websites. My Gemini page now lives at gemini://yousefamar.com/ and you can visit it through the HTTP proxy here!
I found some Markdown to Gemini converters, and I considered having all my public pages as a capsule in Geminispace, but I think many of them wouldn't quite work under those constraints. So instead, in the future I might simply have a `gemini/` directory in the root of my notes or similar, and have a little capsule there separate from my normal web stuff.
I'm quite pleased with this. It's not a big deal, but feels a bit like playing with the internet when it was really new (not that I'm old enough to have done that, but I imagine this is what it must have felt like).
A while ago I wrote about discovering a long-forgotten project from 2014 called Mini Conquest. As the kind of person who likes to try a lot of different things all the time, over my short 30 years on this earth I have forgotten about most of the things I've tried. It can therefore be quite fun to forensically piece together what my past self was doing. I thought I had gotten to the bottom of this project and figured it out: an old Java gamedev project that let me play around with the 2.5D mechanic.
Well, it turns out that's not where it ended. Apparently, I had ported the whole thing to Javascript shortly after, meaning it actually runs in the browser, even today somehow! I had renamed it to "Conquest" by then. As was my style back then, I had 0 dependencies and wanted to write everything from scratch.
If you've read what I wrote about the last one, you might be wondering why the little Link character is no longer there, and what the deal with that house and the stick figure is. Well, turns out I decided to change genres too! It was no longer a MOBA/RTS but more like a civilisation simulator / god game.
The player can place buildings, but the units have their own AI. The house, when placed, automatically spawns a "Settler". I imagine I envisioned these settlers mining and gathering resources on their own, with which you could decide what to build next, and eventually fight other players with combat units. To be totally honest though, I no longer remember what my vision was. This forgetfulness is why I write everything down now!
The way I found out about this evolution of Mini Conquest was also kind of weird. On the 24th of January 2023, a GitHub user called markeetox forked my repo, and added continuous deployment to it via Vercel. The only evidence I have of this today is the notification email from Vercel's bot; all traces of this repo/deployment disappeared shortly after. Maybe he was just curious what it was.
I frankly don't quite understand how this works. The notification came from a thread on his repo related to a commit that is apparently authored by me (since I authored the original commit?), and I seem to have been automatically subscribed to it in his fork. Odd!
Around two months ago, I was talking to a friend about these games that involve programming in some form (mainly RTSs where you program your units). Some examples of these are:
Needless to say, I'm a fan of these games. However, during that conversation, I suddenly remembered: I made a game like this once! I had completely forgotten that for a game jam, I had made a simple game called Homebound. You can learn more about it at that link!
At the time, you could host static websites off of Dropbox by simply putting your files in the `Public` folder. That no longer works, so the link was broken. I dug everywhere for the project files, but just couldn't find them. I was trying to remember whether I was using git or mercurial back then, and where I could have put them. I think it's likely I didn't bother, because it was something small to put out in a couple of hours.
Eventually, in the depths of an old hard drive, I found a backup of my old Dropbox folder, and in that, the Homebound source code! Surprisingly, it still worked perfectly in modern browsers (except for a small CSS tweak) and it now lives on GitHub pages.
Then, I forgot about this again (like this project is the Silence), until I saw the VisionScript project by James, which reminded me of making programming languages! So I decided to create a project page for Homebound here.
I doubt I will revisit this project in the future, but I might play with this mechanic again in the context of other projects. In that case, I might add to this devlog to reference that. I figured I should put it out there for posterity regardless!
I made some small changes to the Miniverse project. It still feels a bit boring, but I'm trying different experiments, and I think I want to try a different strategy, similar to Voyager for Minecraft. Instead of putting all the responsibility on the LLM to decide what to do each step of the simulation, I want to instead allow it to modify its own imperative code to change its behaviour when need be. Unlike the evolutionary algos of old, this would be like intelligent design, except the intelligence is the LLM, rather than the LLM controlling the agents directly.
Before I do this however, I decided to clean the codebase up a little, and make the GitHub repo public, as multiple people have asked me for the code. It could use a bit more cleanup and documentation, but at least there's a separation into files now, rather than my original approach of letting the code flow through me into a single file:
I also added some more UI to the front end so you can see when someone's talking and what they're saying, along with some quality-of-life changes, like loading spinners when things are loading.
There's still a lot that I can try here, and the code will probably shift drastically as I do, but feel free to use any of it. You need to set the `OPENAI_KEY` environment variable, and the fly.io config is available too if you want to deploy there (which I'm doing). The main area of interest is probably NPC.js, which is where the NPC prompt is built up.
(Skip to the end for the conclusion on how this has affected my website, or continue reading for the backstory).
For as long as I can remember, I've been almost consistently engaged in some form of education or mentorship. Going back to my grandparents, and potentially great-grandparents, my family on both sides have all been teachers, professors, and even a headmaster, so perhaps it's something in my blood. I started off teaching in university as a TA (and teaching at least one module outright where the lecturer couldn't be bothered). Later, I taught part-time in a pretty rough school (which was quite exhausting), and even later at a much fancier private school (which wasn't as exhausting, but much less fulfilling), and finally I went into tutoring and also ran a related company. I wound this business up when covid started.
Over the years I found that, naturally, the smaller the class, the more disproportionate an impact you can have when teaching. I also found that that personal impact goes up exponentially not when I teach directly, but when I zoom out, find out what the student actually needs (especially adult students), and help them unblock those problems for themselves. As the proverb goes,
"Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime."
There's also a law of diminishing returns at play here. By far the biggest impact you can have when guiding someone to reach their goals (academic or otherwise) comes at the very start. This immediate impact has gotten bigger and bigger over time as I've learned more and more myself. Sometimes it's a case of simply reorienting a person, and sending them on their way, rather than holding their hand throughout their whole journey.
This is how I got into mentoring. I focused mainly on supporting budding entrepreneurs and developers from underprivileged groups, mostly organically through real-life communities, but also through platforms like Underdog Devs, ADPList, Muslamic Makers and a handful of others. If you do this for free (which I was), you can only really do it on the side, with a limited amount of time. Paradoxically enough, I wasn't very interested in helping people who could actually afford to pay me for my time...
I decided recently that there ought to be an optimal middle ground that maximises impact. 1:1 mentoring just doesn't scale, and large workshop series aren't effective. I wanted to test a pipeline of smaller cohorts and mix peer-based support with standard coaching. I have friends who I've worked with before who are willing to help with this, and I think I can set up a system that would be very, very cheap and economically accessible to the people I care about helping.
Anyway, I've started planning a funnel, and building a landing page. Of course, any landing page worth its salt ought to have social proof. So I took to LinkedIn. I never post to LinkedIn (in fact, this might actually have been my very first post in the ~15 years I've been on there). I found a great tool for collecting testimonials in my toolbox called Famewall, set up the form/page, and asked my LinkedIn network to leave me testimonials.
There were a handful of people that I thought would probably respond, but I was surprised that it was instead people I completely hadn't expected who responded. In some cases, people I genuinely didn't remember, and in other cases people where I hadn't realised just how much of an impact I had on them. This was definitely an enlightening experience!
I immediately hit the free tier limit of Famewall and had to upgrade to a premium tier to access newer testimonials that were rolling in. It's not cheap, and I'm only using a tiny fraction of the features, but the founder is a fellow indie hacker building it as a solo project and doing a great job, and we chatted a bit, so I figured I should support him.
I cancelled my subscription a few days later when I got around to re-implementing the part that I needed on my own site. That's why this post is under the Website project; the review link (https://amar.io/review) now redirects to a bog-standard form for capturing testimonials (with a nice Lottie success animation at the end, similar to Famewall), and in the back end it simply writes the data to disk and notifies me that there's a new testimonial to review. If it's OK, I tweak the testimonial JSON and trigger an eleventy rebuild (this is a static site). In the future, I might delegate this task to Sentinel!
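For illustration, the back end boils down to something like this (a minimal sketch, not my actual server code; field names, paths, and the notification hook are hypothetical):

```js
// Sketch: accept a testimonial, write it to disk, flag it for review.
const express = require('express');
const fs = require('fs/promises');

const app = express();
app.use(express.json());

app.post('/review', async (req, res) => {
  const { name, text } = req.body; // hypothetical field names
  const entry = {
    name,
    text,
    approved: false,
    date: new Date().toISOString(),
  };
  // Append to a pending file; approved entries get merged by hand
  // into the JSON that the eleventy build reads.
  await fs.appendFile('testimonials-pending.jsonl', JSON.stringify(entry) + '\n');
  // ...then something pings me that there's a new testimonial to review
  res.json({ ok: true });
});

app.listen(3000);
```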
The testimonials then show up on this page, or any other page onto which I include `testimonials.njk` (like the future mentoring landing page). For the layout, I use a library called Colcade, which is a lighter alternative to Masonry, recommended to me by ChatGPT when I asked for alternatives after Masonry was giving me some grief. It works beautifully!
Amarbot no longer has a WhatsApp number. This number now belongs to Sentinel, the custodian of Sanctum.
This number was originally wired up directly to Sanctum functions, as well as Amarbot's brain: a fine-tuned GPT-J model trained on my chat history. Since this wiring was through Matrix, it became cumbersome to have to use multiple Matrix bridges for various WhatsApp instances. I eventually decided to use that model on my actual personal number instead, which left Amarbot's WhatsApp number free.
Whenever Amarbot responds on my behalf, there's a small disclaimer. This is to make it obvious to other people whether it's actually me responding or not, but also so when I retrain, I can filter out artificial messages from the training data.
I mentioned recently that I've been using OpenAI's new functions API in the context of personal automation, which is something I've explored before without the API. The idea is that this tech can short-circuit going from a natural language command, to an actuation, with nothing else needed in the middle.
The natural language command can come from speech, or text chat, but almost universally, we're using conversation as an interface, which is probably the most natural medium for complex human interaction. I decided to use chat in the first instance.
Introducing: Sentinel, the custodian of Sanctum.
No longer does Sanctum process commands directly; it is now under the purview of Sentinel. If I get early access to Lakera (the creators of Gandalf), Sentinel would also certainly make my setup far more secure than it currently is.
I repurposed the WhatsApp number that originally belonged to Amarbot. Why WhatsApp rather than Matrix? So others can more easily message him -- he's not just my direct assistant, but like a personal secretary too, so e.g. people can ask him for info if/when I'm busy. The downside is that he can't hang out with the other Matrix bots in my Neurodrome channel.
A set of WhatsApp nodes for Node-RED was recently published that behaves similarly to the main Matrix bridge for WhatsApp, without all the extra Matrix stuff in the way, so I used it to connect Sentinel to my existing setup directly. The flow so far looks like this:
The two main branches are for messages that are either from me, or from others. When they're from others, their name and relationship to me are injected into the prompt (this is currently just a huge array that I hard-coded manually into the function node). When it's me, the prompt is given a set of functions that it can invoke.
If it decides that a function should be invoked, the `switchResponse` node redirects the message to the right place. So far, there are only three possible outcomes: (1) doing nothing, (2) adding information to a list, and (3) responding normally like ChatGPT. I therefore sometimes use Sentinel as a quicker way to ask ChatGPT one-shot questions.
The `addToList` function is defined like this:
```js
{
  name: "addToList",
  description: "Adds a string to a list",
  parameters: {
    type: "object",
    properties: {
      text: {
        type: "string",
        description: "The item to add to a list",
      },
      listName: {
        type: "string",
        description: "The name of the list to which the item should be added",
        enum: [
          "movies",
          "books",
          "groceries",
        ],
      },
    },
    required: ["text", "listName"],
  },
}
```
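For context, here's roughly how a definition like that gets used with the functions API (a sketch assuming the definition above is assigned to a variable `addToList`; in my actual flow the dispatch plumbing lives in Node-RED nodes instead):

```js
// Pass the definition in the `functions` array, then dispatch on the
// `function_call` that comes back.
async function handleMessage(text) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo-0613',
      messages: [{ role: 'user', content: text }],
      functions: [addToList], // the definition above
    }),
  });

  const message = (await res.json()).choices[0].message;
  if (message.function_call?.name === 'addToList') {
    // `arguments` comes back as a JSON string,
    // e.g. {"text":"Succession","listName":"movies"}
    const args = JSON.parse(message.function_call.arguments);
    // ...route to the branch that appends to the right list
  } else {
    // ...respond normally like ChatGPT
  }
}
```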
I don't actually have a groceries list, but for the other two (movies and books), my current workflow for noting down a movie to watch or a book to read is usually opening the Obsidian app on my phone and actually adding a bullet point to a text file note. This is hardly as smooth as texting Sentinel "Add Succession to my movies list". Of course, Sentinel is quite smart, so I could also say "I want to watch the first Harry Potter movie" and he responds "Added "Harry Potter and the Sorcerer's Stone" to the movies list!".
The actual code for adding these items to my lists literally appends a bullet point to their respective files (I have endpoints for this), which are synced to all my devices via the excellent Syncthing. In the future, I could probably make this fancier, e.g. query information about the movie/book and include a poster/cover and metadata, and also potentially publish these lists.
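A minimal sketch of what that amounts to (the directory path is hypothetical; the real logic sits behind my server's endpoints):

```js
// Sketch: each list is a markdown file in a Syncthing-synced folder,
// and adding an item is just appending a bullet point.
const fs = require('fs/promises');
const path = require('path');

const LISTS_DIR = '/data/notes/lists'; // hypothetical path

async function addToList(text, listName) {
  const file = path.join(LISTS_DIR, `${listName}.md`);
  // Syncthing propagates the change to every device automatically
  await fs.appendFile(file, `- ${text}\n`);
}
```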
I've been experimenting with OpenAI's new functions API recently, mostly in the context of personal automation, which is something I've explored before without the API (more on that in the future). However, something else I thought might be interesting would be to give NPCs in a virtual world a more robust brain, like the recent Stanford paper. This came in part from the thinking from yesterday's post.
The Stanford approach had many layers of complexity and they were attempting to create something that is close to real human behaviour. I'm less interested in that, and would instead like to design an environment with much higher constraints based on simple rules. I think finding the right balance there leads to the most interesting emergent results.
So my first goal was to create a very tightly scoped environment. I decided to start with a 32x32 grid, made of emojis, with 5 agents randomly spawned. The edge of the grid is made of walls so they don't fall off.
Agents after they had walked towards each other for a chat
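Concretely, the world setup amounts to something like the following toy sketch (emoji and names are illustrative, not the actual code):

```js
// Toy version of the described setup: a 32x32 emoji grid with wall
// edges and 5 randomly spawned agents.
const SIZE = 32;
const grid = Array.from({ length: SIZE }, (_, y) =>
  Array.from({ length: SIZE }, (_, x) =>
    (x === 0 || y === 0 || x === SIZE - 1 || y === SIZE - 1) ? '🧱' : '🟩'));

const agents = [];
while (agents.length < 5) {
  const x = 1 + Math.floor(Math.random() * (SIZE - 2));
  const y = 1 + Math.floor(Math.random() * (SIZE - 2));
  if (grid[y][x] === '🟩') {
    agents.push({ x, y });
    grid[y][x] = '🧑';
  }
}
```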
When I was originally scoping this out, I thought I would add mechanisms for interacting with items too. These items could perhaps be summoned in some way. I built a small API for getting the nearest emoji to item text as well, which is still up at e.g. https://gen.amar.io/emoji/green apple (replace "green apple" with whatever). It also caches the emojis, so it's overall not expensive to run.
I also explored various models for generating emoji-like images, for the more fantastical items, and landed on emoji diffusion. It was at this point that I quickly realised I'm losing control of the scope, and decided to focus on NPCs only, and no items.
Each simulation step (tick) would iterate over all agents and compute their actions. I planned for these possible actions:
I wanted the response from OpenAI to only be function calls, which unfortunately you can't control, so I had to add `You MUST perform a single function in response to the above information` to the prompt. If I get any bad response, we either retry or fall back to "do nothing", depending.
The prompt contained some basic info, a goal, the agent's surroundings, the agent's memory (a truncated list of "facts"), and information on events of the past round. I found that I couldn't quite rely on OpenAI to make good choices, so I selectively build the list of an agent's capabilities on the fly each tick. E.g. if there's nobody in speaking distance, we don't even give the agent the ability to speak. If there's a wall ahead of the agent, we don't even give the agent the chance to step forward. And if the agent just spoke, they lose the ability to speak again for the following round or else they speak over each other.
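In code, that gating looks roughly like this (a sketch; the function names and world helpers are illustrative, not the repo's actual API):

```js
// Illustrative function definitions, trimmed for brevity
const defs = {
  speak:       { name: 'speak',       parameters: { type: 'object', properties: { text: { type: 'string' } } } },
  stepForward: { name: 'stepForward', parameters: { type: 'object', properties: {} } },
  doNothing:   { name: 'doNothing',   parameters: { type: 'object', properties: {} } },
};

// Only offer the capabilities that make sense for this agent this tick
function availableFunctions(agent, world) {
  const fns = [defs.doNothing];
  if (world.someoneInSpeakingDistance(agent) && !agent.spokeLastTick) {
    fns.push(defs.speak);
  }
  if (!world.wallAhead(agent)) {
    fns.push(defs.stepForward);
  }
  return fns; // becomes the `functions` array for this agent's API call
}
```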
I had a lot of little problems like that. Overall, the more complicated the prompt, the more off the rails it goes. Originally, I tried `The message from God is: "Make friends"`, as I envisioned interaction from the user coming in the form of divine intervention. But then some of the agents tried speaking to God and such, so I replaced that with `Your goal is: "Make friends"`, and later `Your goal is: "Walk to someone and have interesting conversations"` so they don't just walk randomly forever.
They would also feel compelled to try and remember a lot. Often the facts they remembered were quite useless, like the goal, or their current position. The memory was small, so I tried prompt engineering to force them to treat memory as more precious, but it didn't quite work. Similarly, they would sometimes go into endless loops remembering the same useless fact over and over. I originally had all information in their memory (like their name), but I didn't want them to forget their name, so I put the permanent facts outside.
Eventually, I removed the `remember` action, because it really wasn't helping. They could have good conversations, but everything else seemed a bit stupid, like I might as well program it procedurally instead of with LLMs.
I did, however, focus a lot on having a very robust architecture for this project, and made all the different parts easy to build on. The server does the simulation (in the future, asynchronously, but today, through the "tick" button) and stores world state in a big JSON object that I write to disk so I can rewind through past states. There is no DB; we simply read/write from/to JSON files as the world state changes. The structure of the data is flexible enough that I don't need to modify the schemas, and it can remain pretty forward-compatible as I make additions, so I can run the server off of older states and it picks them up gracefully.
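A sketch of that tick-and-persist loop (the shapes and paths here are illustrative, not the repo's actual code):

```js
// The whole world is one JSON blob, written out each tick, so past
// states can be rewound or used to boot an older server.
const fs = require('fs/promises');

async function tick(world) {
  // ...iterate over agents and compute their actions here...
  world.tick += 1;
  await fs.mkdir('states', { recursive: true });
  await fs.writeFile(`states/${world.tick}.json`, JSON.stringify(world, null, 2));
}

async function loadState(tickNumber) {
  // Older states load fine as long as new fields have sensible defaults
  return JSON.parse(await fs.readFile(`states/${tickNumber}.json`, 'utf8'));
}
```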
Anyway, I'll be experimenting some more and writing up more details on the different parts as they develop!
Ever since I discovered the Discord server for AI music generation, I knew I needed to train a model to make my voice a great singer. It took some figuring out, but now I'm having a lot of fun making myself sing every kind of song. I've tried dozens now, but here are some ones that are particularly notable or fun (I find it funniest when things glitch out especially around the high notes):
When people visit my website, it's not very clear to them what it is I actually do. It used to be that my website doubled as my CV, but after a while that became sort of useless as I no longer needed to apply to jobs, and I stopped maintaining it.
Recently, I began revamping the landing page of my website (yet again) and started off by cleaning up the hero section. I had a bunch of icons that represented links, and thought I was being clever and language-agnostic by using icons instead of text, but then realised that even I was forgetting what icon meant what, so I couldn't begin to hope that others would know what they meant. So I added text. I also made the WebGL avatar fall back to an image in case the browser doesn't support WebGL. Similarly, I froze the parallax effects under the same conditions, as those rely on hardware acceleration.
Then, I decided to create a project grid right under the hero. I wanted to treat this as a showcase of the things I am involved in, or have been involved in, loosely in order of importance. This has been inspired by:
I began by adding the most important stuff for now, and might add some more over time. I keep pretty good notes of everything I do, so this list could become very very long, as I've worked on a lot of things over time. I realised that a few, cool, recent things are more meaningful than many, old, arbitrary things, so I'll try not to make this grid too large, and link to the full directory of projects (at least the ones that have made their way online) in the last tile.
It would also have been quite boring, I think, if I had listed all my publications, as they're largely related. The same holds true for every weekend project, hackathon, game jam, utility script, game mod, etc. The projects I worked on during university are, I think, similarly just too old now. I also left out my volunteering work, because it felt a bit too vain to include, and I'm not sure it would actually spur any meaningful conversation. I also left out the things where I don't have significant enough involvement.
Please let me know your thoughts on the above and/or on how to improve this!
My Red Maple and Wisteria seeds haven't sprouted yet, but I was left with all this extra soil! So I decided that I ought to plant the other species too. The remaining seeds I have are for Black Pine, Cherry Blossom, and Japanese Cedar. This is what they look like respectively:
I only had three Cherry Blossom seeds, and unlike the Red Maple, I decided to only plant one seed in that pot. Besides that, I've largely only used half of the seeds I have of each species so far, and I'm thinking that even that is unnecessary, but let's see!
As I was soaking them for 48 hours, they kind of got mixed up a bit, and I had a bit of a challenge separating the Black Pine from the Cherry Blossom, but I think I got there in the end. To better keep track of everything, as I was really starting to forget which is which, I put in some little wooden sticks:
The soil had dried quite a bit, so I made it wetter, maybe even a little too wet, as it soaked the cotton on the bottom and created some condensation on the plastic. I also used tap water, which I didn't do for the first two, as it's pretty hard / rich in calcium. For my tomato plant, the effects of this were soon obvious, as calcium residue was visible on the top of the soil and around the edges where it meets the pot. I didn't want the same for these plants, but it should hopefully all be OK.
If you'd like to learn some more about each species, here are their sections in my little book:
So now we have 5 different pots stratifying -- let's see which sprout first!
My bonsai seeds have soaked for 48 hours and I'm ready to move on to the next step! The reveal: I picked Wisteria and Red Maple. They ticked all the right boxes for me as my first try.
The Wisteria seeds are the small ones and the Red Maple are the two big ones. I used half of the seeds that I had of each species.
I assembled the "Auto Irrigation Growing Pot" and tried to ignore the conflicting instructions. I think you're not meant to fill the reservoir with any water at all until after the Stratification step (which I'll explain in a sec), and it's ambiguous how deep the seeds should go beyond "same depth as the size" (the size of what, the seeds?), so I just used my best judgement.
It turns out that I actually have a lot of soil. I didn't even use up a full peat disc so far. I have three more pots, so I'm considering getting some more seedlings started in the meantime and increase the chances of success...
At any rate, I sowed the current seeds and sprinkled a tiny bit of water into the soil to keep it moist, as it had dried out a bit in the meantime. I don't think the instructions should have the soil bit as step 1 if you're then going to soak the seeds for 48 hours after that; it should really be the second step.
And now that they're sown, I put them in the fridge. In the fridge, the one on the left is the Red Maple (this is more of a note to myself -- I should label them really; there are little wooden sticks for that in the kit). Putting them in the fridge is the first part of the Stratification step, which is meant to simulate winter conditions, then spring, so that they can germinate as they would in nature.
I'll be checking on them every few days and keeping the soil damp. Hopefully in two or three weeks they will start sprouting and I can remove them from the fridge. I set some calendar events. So now we wait!
I finally decided to start on my bonsai project. To read more about what this is all about, check out the project page. I haven't written anything about the tomato project, or any of the other (failed) horticulture projects, but I will eventually, since documenting failures is important too! This is the first log of what is probably going to be rather perennial chronicles.
The kit that I'm using to get a start with bonsai is really quite neat. It comes with 5 different species of seeds: Japanese Wisteria, Cherry Blossom, Japanese Cedar, Red Maple Tree, and Black Pine Tree.
This is a great set of tools in such a small package and I'm quite excited! The instruction booklet goes into a decent amount of detail, though I already know a bunch from YouTube and other places as I had a general interest in bonsai before deciding to try myself.
It came with two peat pucks that you put in some water and watch as they slowly grow while they absorb the water.
I decided to do both of them, as I wanted to try multiple species at the same time, and they grow to about 3x their original size! It's actually quite a lot of soil.
I then decided on two species that I wanted to grow. The next step was to put some of the seeds in warm water for 48 hours, such that they can soften, which makes it easier for the seedling to break through the shell. The two that I picked had seeds that looked very distinct from each other!
If you would like to know what species I picked, check back in 48 hours when I document the sowing process! I'll give you a hint: I didn't pick the mainstream choice (Black Pine).
Some weeks ago I built the "Muslim ChatGPT". From user feedback, I very quickly realised that this is one use case that absolutely won't work with generative AI. Thinking about it some more, I came to a soft conclusion that at the moment there are a set of use cases that are overall not well suited.
There's a class of computational problems with NP complexity. What this means is not important except that these are hard to solve but easy to verify. For example, it's hard to solve a Sudoku puzzle, but easy to check that it's correct.
Similarly, I think that there's a space of GPT use cases where the results can be verified with variable difficulty, and where having correct results is of variable importance. Here's an attempt to illustrate what some of these could be:
The top right here (high difficulty to verify, but important that the results are correct) is a "danger zone", and also where deen.ai lives. I think that as large language models become more reliable, the risks will be mitigated somewhat, but in general not enough, as they can still be confidently wrong.
In the bottom, the use cases are much less risky, because you can easily check them, but the product might still be pretty useless if the answers are consistently wrong. For example, we know that ChatGPT still tends to be pretty bad at maths and things that require multiple steps of thought, but crucially: we can tell.
The top left is kind of a weird area. I can't really think of use cases where the results are difficult to verify, but also you don't really care if they're super correct or not. The closest use case I could think of was just doing some exploratory research about a field you know nothing about, to make different parts of it more concrete, such that you can then go and google the right words to find out more from sources with high verifiability.
I think most viable use cases today live in the bottom and towards the left, but the most exciting use cases live in the top right.
Another important spectrum is whether your use case relies more on recall versus synthesis. Asking for the capital of France is recall, while generating a poem is synthesis. Generating a poem using the names of all cities in France is somewhere in between.
At the moment, LLMs are clearly better at synthesis than recall, and it makes sense when you consider how they work. Indeed, most of the downfalls come from when they're a bit too loose with making stuff up.
Personally, I think that recall use cases are very under-explored at the moment, and have a lot of potential. This contrast is painted quite well when comparing two recent posts on HN. The first is about someone who trained nanoGPT on their personal journal here and the output was not great. Similarly, Projects/amarbot used GPT-J fine-tuning and the results were also hit and miss.
The second uses GPT-3 Embeddings for searching a knowledge base, combined with completion to have a conversational interface with it here. This is brilliant! It solves the issues around needing the results to be as correct as possible, while still assisting you with them (e.g. if you wanted to ask for the nearest restaurants, they better actually exist)!
Somebody in the comments linked gpt_index so you can do this yourself, and I really think that this kind of architecture is the real magic dust that will revolutionise both search and discovery, and give search engines a run for their money.
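To make the pattern concrete, here's a minimal sketch of retrieve-then-answer (the embeddings endpoint and model are OpenAI's real ones; the chunk shape and prompt glue are illustrative):

```js
// Embed the question, rank knowledge-base chunks by cosine similarity,
// and answer only from the closest chunks.
async function embed(text) {
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'text-embedding-ada-002', input: text }),
  });
  return (await res.json()).data[0].embedding;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// chunks: [{ text, embedding }] pre-computed from the knowledge base
async function buildPrompt(question, chunks) {
  const q = await embed(question);
  const context = [...chunks]
    .sort((a, b) => cosine(q, b.embedding) - cosine(q, a.embedding))
    .slice(0, 3)
    .map(c => c.text)
    .join('\n---\n');
  // The completion model is told to answer *only* from this context,
  // which keeps recall grounded in things that actually exist.
  return `Answer using only the context below.\n\n${context}\n\nQ: ${question}\nA:`;
}
```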
Welp, looks like I'm a month late for the N-O-D-E Christmas Giveaway. You might be thinking "duh, Christmas is long gone", and I also found it weird that the deadline was the 31st of January, but it turns out that that was a mistake in the video and he corrected it in the comments.
Since I keep up with YouTube via RSS, I didn't see that comment until it was too late. I only thought to check again when my submission email bounced.
Oh well! At least it gave me a reason to finally write up my smart home setup! This also wasn't the first time that participating in N-O-D-E events really didn't work out for me -- in 2018 I participated in the N-O-D-E Secret Santa and sent some goodies over to the US, and I remember really putting some effort into it. Unfortunately, I never got anything back, which was a little disappointing, but hey, maybe next time!
I've been planning to start this project for a while, as well as document the journey, but never really got around to it. I had a calendar reminder that tomorrow the N-O-D-E Christmas Giveaway closes, which finally gave me the kick in the butt needed to start this one! I also want to use this as an opportunity to create short-form videos on TikTok to learn more about it (in this case, documenting the journey). The project page is here.
Last weekend I built a small AI product: https://deen.ai. Over the course of the week I've been gathering feedback from friends and family (Muslim and non-Muslim). In the process I learned a bunch and made things that will be quite useful for future projects too. More info here!
Amarbot was using GPT-J (fine-tuned on my chat history) in order to talk like me. It's not easy to do this if you follow the instructions in the main repo, plus you need a beefy GPU. I managed to do my training in the cloud for quite cheap using Forefront. I had a few issues (some billing-related, some privacy-related) but it seems to be a small startup, and the founder himself helped me resolve these issues on Discord. As far as I could see, this was the cheapest and easiest way out there to train GPT-J models.
Unfortunately, they're shutting down.
As of today, their APIs are still running, but the founder says they're winding down as soon as they send all customers their requested checkpoints (I'm still waiting for mine). This means Amarbot might soon be without AI responses for a while, until I find a different way to run the model.
As for fine-tuning, there no longer seems to be an easy way to do this (unless Forefront open sources their code, which they might, but even then someone has to host it). maybe#6742 on Discord has made a colab notebook that fine-tunes GPT-J in 8-bit and kindly sent it to me.
I've always thought that serverless GPUs would be the holy grail of the whole microservices paradigm, and we might be close; hopefully that will make fine-tuning easy and accessible again.
It's official — Amarbot has his own number. I did this because I was using him to send some monitoring messages to WhatsApp group chats, but since it was through my personal account, it would mark everything before those messages as read, even though I hadn't actually read them.
My phone allows me to have several separate instances of WhatsApp out of the box, so all I needed was another number. I went for Fanytel to get a virtual number and set up a second WhatsApp bridge for Matrix. Then I also put my profile picture through Stable Diffusion a few times to make him his own profile picture, and presto: Amarbot now has his own number!
In case the profile picture is not clear enough, the status message also says that he's not real. I have notifications turned off for this number, so if you interact with him, don't expect a human to ever reply!
As of today, if you react to a message you send me on WhatsApp with a robot emoji (🤖), Amarbot will respond instead of me. As people in the past have complained about not knowing when I'm me and when I'm a bot, I added a very clear disclaimer to the bottom of all bot messages. This is also so I can filter them out later if/when I want to retrain the model (similar to how DALL-E 2 has the little rainbow watermark).
The reason I was able to get this to work quite easily is thanks to my existing Node-RED setup. I'll talk more about this in the future, but essentially I have my WhatsApp connected to Matrix, and Node-RED also connected to Matrix. I watch for message reactions, but because those events only tell you the ID of the message that was reacted to, not its actual text, I store a small window of past messages to check against. Then I query the Amarbot worker with the body of that message, and format and respond with the reply.
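Inside the Node-RED function node, that boils down to something like this (the payload field names are illustrative; the real Matrix events are messier):

```js
// Keep a rolling window of recent messages so a reaction event's ID
// can be resolved back to the text that was reacted to.
const recent = context.get('recent') || []; // [{ id, body }]

if (msg.payload.type === 'm.reaction' && msg.payload.key === '🤖') {
  const target = recent.find(m => m.id === msg.payload.relatesTo);
  if (target) {
    msg.payload = target.body; // forward to the Amarbot worker
    return msg;
  }
} else {
  recent.push({ id: msg.payload.id, body: msg.payload.body });
  context.set('recent', recent.slice(-50)); // keep the window small
}
return null;
```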
This integrates quite seamlessly with other existing logic I had, like what happens if you ask me to tell you a joke!
Amarbot has been trained on the entirety of my WhatsApp chat logs since the beginning of 2016, which I think is when I first installed it. There are a handful of days of logs missing here and there as I've had mishaps with backing up and moving to new phones. It was challenging to extract my chat logs from my phone, so I wrote an article about this.