This page is a feed of all my #projects posts in reverse chronological order. You can subscribe to this feed in your favourite feed reader through the icon above. You can also get a weekly digest of all of my posts via email by subscribing here:
Some weeks ago I built the "Muslim ChatGPT". From user feedback, I very quickly realised that this is one use case that absolutely won't work with generative AI. Thinking about it some more, I came to the tentative conclusion that, at the moment, there is a whole set of use cases that generative AI is not well suited to.
There's a class of computational problems known as NP. What exactly this means isn't important here, except that many of these problems (the NP-complete ones) are believed to be hard to solve but are easy to verify. For example, it's hard to solve a Sudoku puzzle, but easy to check that a filled-in grid is correct.
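To make "easy to verify" concrete: checking a completed 9x9 grid only takes one pass over its rows, columns and 3x3 boxes, whereas finding a solution in the first place is the hard part. A minimal Python sketch of such a checker (the function name is just illustrative):

```python
def is_valid_sudoku_solution(grid: list[list[int]]) -> bool:
    """Return True if a completed 9x9 grid is a valid Sudoku solution."""
    digits = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    # Every row, column and 3x3 box must contain exactly the digits 1-9.
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    return all(group == digits for group in rows + cols + boxes)


# A known-valid grid built from shifted rows, and a copy with one cell broken.
solved = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
broken = [row[:] for row in solved]
broken[0][0] = broken[0][1]  # duplicate digit in the first row

print(is_valid_sudoku_solution(solved))  # True
print(is_valid_sudoku_solution(broken))  # False
```

Verification is a fixed amount of work per grid; there is no comparably cheap recipe for producing the solution.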
Similarly, I think there's a space of GPT use cases where the results vary in how difficult they are to verify, and in how important it is that they're correct. Here's an attempt to illustrate what some of these could be:
The top right (hard to verify, but important to get right) is a "danger zone", and it's also where deen.ai lives. I think that as large language models become more reliable, the risks will be mitigated somewhat, but in general not enough, as they can still be confidently wrong.
At the bottom, use cases are much less risky because you can easily check the results, but the product might still be pretty useless if the answers are consistently wrong. For example, we know that ChatGPT still tends to be pretty bad at maths and at things that require multiple steps of reasoning, but crucially: we can tell.
The top left is kind of a weird area. I can't really think of use cases where the results are difficult to verify but you don't really care whether they're correct. The closest I could think of is exploratory research into a field you know nothing about: it makes different parts of the field more concrete, so that you can then google the right terms and learn more from sources that are easier to verify.
I think most viable use cases today live at the bottom and towards the left, but the most exciting ones live in the top right.
Another important spectrum is whether your use case relies more on recall or on synthesis. Asking for the capital of France is recall, while generating a poem is synthesis. Generating a poem using the names of all the cities in France is somewhere in between.
At the moment, LLMs are clearly better at synthesis than at recall, which makes sense when you consider how they work: they generate plausible text rather than look facts up. Indeed, most of their failures come from being a bit too loose with making stuff up.
Personally, I think that recall use cases are very under-explored at the moment and have a lot of potential. The contrast is illustrated quite well by two recent posts on HN. The first is about someone who trained nanoGPT on their personal journal here, and the output was not great. Similarly, Amarbot used GPT-J fine-tuning and the results were also hit and miss.
The second, here, uses GPT-3 embeddings to search a knowledge base, combined with completion to provide a conversational interface over it. This is brilliant! It addresses the need for results to be as correct as possible, because the answers are grounded in real documents, while still having the model help you with them (e.g. if you ask for the nearest restaurants, they had better actually exist)!
Somebody in the comments linked gpt_index so you can do this yourself, and I really think that this kind of architecture is the real magic dust that will revolutionise both search and discovery, and give search engines a run for their money.
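The embeddings-plus-completion architecture from the second post can be sketched in a few lines: embed the documents and the question, retrieve the closest documents by cosine similarity, and put only those into the prompt so that the model answers from real data rather than from memory. This is a hedged, self-contained sketch, not gpt_index's API or the linked project's code. To keep it runnable, a toy bag-of-words vector stands in for GPT-3 embeddings (an assumption), and the completion call itself is left out. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = toy_embed(query)
    return sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Ground the completion in retrieved documents instead of the model's memory."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


docs = [
    "Luigi's is an Italian restaurant on Main Street.",
    "The library opens at 9am on weekdays.",
    "Sakura is a sushi restaurant near the station.",
]
print(build_prompt("Which restaurant serves sushi?", docs, k=1))
```

In practice you would swap `toy_embed` for a real embedding model and cache the document vectors. Telling the model to say it doesn't know is one way to discourage it from making up restaurants that don't exist.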
Welp, looks like I'm a month late for the N-O-D-E Christmas Giveaway. You might be thinking "duh, Christmas is long gone", and I also found it weird that the deadline was the 31st of January, but it turns out that that was a mistake in the video and he corrected it in the comments.
Since I keep up with YouTube via RSS, I didn't see that comment until it was too late. I only thought to check again when my submission email bounced.
Oh well! At least it gave me a reason to finally write up my smart home setup! This also wasn't the first time that taking part in a N-O-D-E event didn't work out for me: in 2018 I joined the N-O-D-E Secret Santa and sent some goodies over to the US, and I remember really putting some effort into it. Unfortunately I never got anything back, which was a little disappointing, but hey, maybe next time!
I've been planning to start this project for a while, as well as to document the journey, but never really got around to it. I had a calendar reminder that the N-O-D-E Christmas Giveaway closes tomorrow, which finally gave me the kick in the butt I needed to start this one! I also want to use this as an opportunity to make short-form videos on TikTok and learn more about the format (in this case, by documenting the journey). The project page is here.
Last weekend I built a small AI product: https://deen.ai. Over the course of the week I've been gathering feedback from friends and family (Muslim and non-Muslim). In the process I learned a bunch and made things that will be quite useful for future projects too. More info here!
Amarbot was using GPT-J (fine-tuned on my chat history) in order to talk like me. Fine-tuning GPT-J isn't easy if you follow the instructions in the main repo, plus you need a beefy GPU. I managed to do my training in the cloud quite cheaply using Forefront. I had a few issues (some billing-related, some privacy-related), but it seems to be a small startup, and the founder himself helped me resolve them on Discord. As far as I could see, this was the cheapest and easiest way out there to train GPT-J models.
Unfortunately, they're shutting down.
As of today, their APIs are still running, but the founder says they're winding down as soon as they've sent all customers their requested checkpoints (I'm still waiting for mine). This means that soon Amarbot might be without AI responses for a while, until I find a different way to run the model.
As for fine-tuning, there no longer seems to be an easy way to do it (unless Forefront open-sources their code, which they might, but even then someone has to host it). The Discord user maybe#6742 has made a Colab notebook that fine-tunes GPT-J in 8-bit, and kindly sent it to me.
I've always thought that serverless GPUs would be the holy grail of the whole microservices paradigm, and we might be close. Hopefully that would also make fine-tuning easy and accessible again.
It's official — Amarbot has his own number. I did this because I was using him to send some monitoring messages to WhatsApp group chats, but since it was through my personal account, it would mark everything before those messages as read, even though I hadn't actually read them.
My phone allows me to have several separate instances of WhatsApp out of the box, so all I needed was another number. I went with Fanytel to get a virtual number and set up a second WhatsApp bridge for Matrix. Then I put my own profile picture through Stable Diffusion a few times to give him one of his own, and presto: Amarbot now has his own number!
In case the profile picture is not clear enough, the status message also says that he's not real. I have notifications turned off for this number, so if you interact with him, don't expect a human to ever reply!
As of today, if you react to a message you send me on WhatsApp with a robot emoji (🤖), Amarbot will respond instead of me. As people have complained in the past about not knowing when it's me and when it's the bot, I added a very clear disclaimer to the bottom of all bot messages. This also lets me filter them out later if/when I want to retrain the model (similar to how DALL-E 2 adds the little rainbow watermark).
The reason I was able to get this to work quite easily is my existing Node-RED setup. I'll talk more about this in the future, but essentially I have my WhatsApp connected to Matrix, and Node-RED also connected to Matrix. I watch for message reactions, but because those events only include the ID of the message that was reacted to, not its text, I store a small window of past messages to look it up in. Then I query the Amarbot worker with the body of that message, and format and send back the reply.
This integrates quite seamlessly with other existing logic I had, like what happens if you ask me to tell you a joke!
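The reaction flow above lives in Node-RED; what follows is a rough Python translation of the same idea, not the actual flow. It keeps a bounded window of recent messages keyed by event ID, resolves a 🤖 reaction back to the message text (a Matrix `m.reaction` event's `m.relates_to` carries only the target `event_id` and the emoji `key`), asks the bot, and appends a disclaimer so bot messages can be filtered out before any retraining. `RecentMessages`, `handle_reaction`, `strip_bot_messages`, the `ask_bot` callback (standing in for the Amarbot worker) and the disclaimer wording are all hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable

ROBOT = "🤖"
DISCLAIMER = "\n\n(Automated reply from Amarbot)"  # hypothetical wording


class RecentMessages:
    """Keeps the last `maxlen` messages so a reaction's target ID can be resolved to text."""

    def __init__(self, maxlen: int = 100) -> None:
        self.maxlen = maxlen
        self._by_id: OrderedDict[str, str] = OrderedDict()

    def add(self, event_id: str, body: str) -> None:
        self._by_id[event_id] = body
        self._by_id.move_to_end(event_id)
        while len(self._by_id) > self.maxlen:
            self._by_id.popitem(last=False)  # evict the oldest message

    def get(self, event_id: str) -> str | None:
        return self._by_id.get(event_id)


def handle_reaction(
    window: RecentMessages, event: dict, ask_bot: Callable[[str], str]
) -> str | None:
    """Reply as the bot if `event` is a 🤖 reaction to a message still in the window."""
    if event.get("type") != "m.reaction":
        return None
    relates = event.get("content", {}).get("m.relates_to", {})
    if relates.get("rel_type") != "m.annotation" or relates.get("key") != ROBOT:
        return None  # not a robot-emoji reaction
    body = window.get(relates.get("event_id", ""))
    if body is None:
        return None  # the reacted-to message has already fallen out of the window
    return ask_bot(body) + DISCLAIMER


def strip_bot_messages(messages: list[str]) -> list[str]:
    """Drop bot-authored messages (identified by the disclaimer) before retraining."""
    return [m for m in messages if not m.endswith(DISCLAIMER)]


window = RecentMessages(maxlen=100)
window.add("$msg1", "Tell me something interesting")
reaction = {
    "type": "m.reaction",
    "content": {"m.relates_to": {"rel_type": "m.annotation", "event_id": "$msg1", "key": ROBOT}},
}
print(handle_reaction(window, reaction, ask_bot=lambda text: f"(bot reply to: {text})"))
```

An `OrderedDict` gives cheap inserts and oldest-first eviction. Reactions to messages older than the window simply go unanswered, which seems an acceptable trade-off for a small personal bot.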
Amarbot has been trained on the entirety of my WhatsApp chat logs since the beginning of 2016, which I think is when I first installed it. There are a handful of days of logs missing here and there as I've had mishaps with backing up and moving to new phones. It was challenging to extract my chat logs from my phone, so I wrote an article about this.