This page is a feed of all my posts in reverse chronological order. You can subscribe to this feed in your favourite feed reader through the icon above.
The ficus has been a bit thirsty, as you can tell by the leaves that have shrivelled slightly:
So I thought it was high time to clean up the dead leaves and also add water to the terrarium. The moss too seemed very dry (and I trimmed some of the long freaky moss prongs). I accidentally cut a little leaf, so I put it back into the soil; however, it seems to be immune to decomposition:
This was all mid-September, and I was a tiny bit freaked out by these additional weird prongs:
Then, a little over a week ago, it started looking even more alien. The leaves were ok by then (some browning?), but these prongs looked sort of fungal. Don't like it.
A photo from today; that alien prong has grown further and is reaching the ground. I think I need to snip it before it consumes the terrarium!
This post documents my participation in a hackathon 2 weeks ago. I started writing this a week ago and finished it now, so it's a little mixed. Scroll to the bottom for the final result!
Rowan reached out asking if I'm going to the hackathon, and I wasn't sure which one he meant. Although I submitted a project to the recent Gemini hackathon, it was more of an excuse to migrate an expensive GPT-based project ($100s per month) to Gemini (worse, but I had free credits). I never really join these things to actually compete, and there was no chance a project like that could win anyway. What's the opposite of a shoo-in? A shoo-out?
So it turns out Rowan meant the Mistral x a16z hackathon. This was more of a traditional "weekend crunch" type of hackathon. I felt like my days of pizza and Red Bull all-nighters were at least a decade behind me, but I thought it might be fun to check out that scene and work on something with Rowan and Nour. It also looked like a lot of people I know were going. So we applied and got accepted. It seems like this is another one where they only let seasoned developers in, as some of my less technical friends told me they did not get in.
Anyway, we rock up there with zero clue what to build. Rowan wants to win, so he's strategising on how we can optimise for the judging criteria, researching the backgrounds of the judges (the ones we know about), and using the sponsors' tech. I nabbed us a really nice corner in the "Monitor Room" and we spent a bit of time snacking and chatting to people. The monitors weren't that great for working (TV monitors, awful latency) but the area was nice.
Since my backup was a pair of XReal glasses, a lot of people approached me out of curiosity, and I spent a lot of time chatting about the ergonomics of it, instead of hacking. I also ran into friends I didn't know would be there: Martin (Krew CTO and winner of the Anthropic hackathon) was there, but not working on anything, just chilling. He intro'd me to some other people. Rod (AI professor / influencer, Cura advisor) was also there to document and filmed us with a small, really intriguing looking camera.
We eventually landed on gamifying drug discovery. I should caveat that what we planned to build (and eventually built) has very little scientific merit, but that's the goal of a hackathon; you're building for the sake of building and learning new tools etc. Roughly speaking, we split up the work as follows: I built the model and some endpoints (and integrated a lib for visualising molecules), Nour did the heavy lifting on frontend and deploying, and Rowan did product design, demo video, pitch, anything involving comms (including getting us SSH access to a server with a nice and roomy H100), and also integrated the Brave API to get info about a molecule (Brave was one of the sponsors).
You're probably wondering what we actually built though. Well, after some false starts looking at protein folding, I did some quick research on drug discovery (aided by some coincidental prior knowledge I had about this space). There's a string format for representing molecules called SMILES which is pretty human-readable and pervasive. I downloaded a dataset of 16k drug molecules and bodged together a model that generates new ones. It's just multi-shot prompting (no fine-tuning, even though the judges might have been looking for that, as I was pretty certain it would not improve the results at all), plus some checking afterwards that the molecule is physically possible (I try up to 10 times to generate one per call, which is usually overkill). I also get some metadata about molecules from a government API.
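For the curious, the generate-and-check loop is only a few lines. Here's a minimal sketch, assuming RDKit is available for the validity check; `generate_smiles_candidate` is a hypothetical stand-in for the multi-shot LLM call, not the exact code we shipped.

```python
from rdkit import Chem

def generate_valid_smiles(generate_smiles_candidate, max_attempts=10):
    """Return the first candidate SMILES string that RDKit can parse, or None."""
    for _ in range(max_attempts):
        candidate = generate_smiles_candidate()  # hypothetical multi-shot LLM call
        mol = Chem.MolFromSmiles(candidate)      # None if the string isn't chemically valid
        if mol is not None:
            return Chem.MolToSmiles(mol)         # canonicalised SMILES
    return None
```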
On the H100, I ran one of those Stable Diffusion LoRAs. It takes a diagram of a molecule (easy to generate from a SMILES string) and then does img2img on it. No prompt, so it usually dreams up a picture of glasses or jewellery. We thought this was kind of interesting, so left it in. We could sort of justify it as a mnemonic device for education.
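Roughly, that pipeline looks like the sketch below. The RDKit drawing part is straightforward; the diffusers side is hedged -- the base model ID is a placeholder and the LoRA we actually loaded isn't shown.

```python
import torch
from rdkit import Chem
from rdkit.Chem import Draw
from diffusers import StableDiffusionImg2ImgPipeline

# Render a 2D molecule diagram from a SMILES string (aspirin here, as an example).
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
diagram = Draw.MolToImage(mol, size=(512, 512))

# Run img2img on the diagram with an empty prompt, so the model "dreams" freely.
# The base model below is a placeholder; the LoRA weights we used aren't shown.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
result = pipe(prompt="", image=diagram, strength=0.75).images[0]
result.save("molecule_dream.png")
```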
Finally, I added another endpoint that takes two molecules and creates a new one out of the combination of the two. This was for the "synthesise" journey, and was inspired by those games where you combine elements to form new things.
Throughout the hackathon, we communicated over Discord, even though on the Saturday, Rowan and I sat next to each other. Nour was in Coventry throughout. It was actually easier to talk over Discord, even with Rowan, as it was a bit noisy there. Towards the end of Saturday, my age started to show and I decided to sleep in my own bed at home. Rowan stayed at the hackathon place, but we did do some work late into the night together remotely, while Nour was offline. The next day, I was quite busy, so Rowan and Nour tied up the rest of the work, while I only did some minor support with deployment (literally via SSH on my phone as I was out).
Finally, Rowan submitted our entry before getting some well-deserved rest. He put up some links to everything here, including the demo video.
Not long after the contest was over, they killed the H100 machines and seem to have invalidated some of the API keys too, so the app is not quite working anymore, but overall it was quite fun! We did not end up winning (unsurprisingly), but I feel like I achieved my own goals. Rowan and Nour are very driven builders who I enjoyed working with, and CodeNode is a great venue for this sort of thing. The next week, Rowan came over to my office to co-work and also dropped off some hackathon merch. I ended up passing the merch on to other people, as I felt a bit like I might be past my hackathon years now!
I dream of closed, self-sufficient systems. A few weeks ago, I visited the London Permaculture Festival as I'm very, very interested in permaculture. There, Hendon Soap had a stand. They were selling a DIY Lip Balm Kit.
I use lip balm regularly, so it occurred to me I ought to learn how it's made, and maybe make my own in the long run! Today I tried my hand at making it. It's not as easy as it looks! In theory you toss everything into a saucepan, add 60g of oil (I used vegetable oil), and once everything is melted, you pour it into the tins and put those in the freezer as fast as possible so they cool rapidly (this is to avoid a grainy texture). You can also add some edible essential oils if you want a scent.
Long story short: I made a huge mess. And the stuff is quite difficult to clean. I managed to do one tin approximately right.
The rest I was mostly able to salvage and store in a little jar. When I finish that first tin, I'll re-melt that and maybe this time use a little pot with a pouring tip to avoid spilling, as well as some kind of tool for moving the very hot tins into the freezer properly! It was so hot it melted the ice in my freezer away entirely, creating a nice little indentation to hold it.
Please don't judge the ungodly amount of ice cream -- there's a heatwave coming!
Just wanted to post a quick update on the ant farm. They're thriving and having all sorts of adventures!
It's frightening how close I keep getting to killing my ficus -- they're supposed to be hardy! An update is long overdue.
First off, I moved the UV LEDs into an IKEA greenhouse. This made things much tidier and I have room for the upcoming project. The ficus had some strong growth up and to the left, all but touching the glass.
I also got these wooden drawers that you can kind of see in the back, with each one dedicated to a different project. The bottom one (taking up the whole width) is the horticulture one as that needs the most space.
Early May, I noticed that the leaves were getting brown. I thought it might be because the humidity was too high, so I thought I'd air it out a bit. Overnight it shrivelled up completely.
I was heartbroken and went to /r/Bonsai for advice. The only response I got was "it dried out and died". I did my best to try and get the humidity right and crossed my fingers. Almost two weeks later, I was delighted to see that it was still alive and had sprouted a new leaf! Can you see it?
And now, two months later, look at all this foliage!
I do plan to remove the dead leaves, but I wanted to wait for it to be a little stronger before I remove the glass again and whack things out of balance.
This is a post I wrote a while ago, and recently decided may be worth publishing. It's a combination of anecdotal retelling and reflection. At the time, I had been reflecting on a paradox in my behaviour. I often go to extreme lengths to enact justice, sometimes with full knowledge that doing so is not worth it.
Most recently, I ordered something off Etsy from Spain and paid sales tax. The total value was less than £135, so I shouldn't have had to pay import tax. So I was surprised to find that my parcel was held at customs pending a payment of gift tax. Gift tax applies to gifts over £39, but of course this was not a gift. HMRC wanted £23.45 import VAT plus an £8 handling fee, which I had to pay to get the parcel no matter what.
The only way to dispute this sort of thing is to pay, get the parcel, then fill in a form, send that and the parcel label to Border Force (Coventry) by snail mail, wait for a human to review it, and finally they might decide that you should get a refund, or they might not. Things like this really get under my skin. Part of the reason I don't return to Germany is to escape this exact kind of ridiculous bureaucracy.
When things like this happen, I'm unable to let them go. The knowledge that there's something wrong in the world will live in my head rent-free, like an OCD urge, until I can resolve it. I need to actively prevent myself from getting too involved with other people's injustices (and I don't follow world news) or else I will feel like this all the time. However, when I have personally been wronged in some way, it's in my face and I can't let it go.
Now obviously, if you consider the Lime bike I rented to get to the post office, the cost of the padded brown envelope and postage, not to mention the time I spent researching the law and queuing in the post office, it's not worth the potential money I could recover from Border Force. But if I didn't do this, it would weigh on me probably forever. Glad to report that I got the refund at least!
The silliest thing I've had this itch for: my local dentist owing me £5 from when their card machine was broken and I paid cash but they had no change. I had moved away and it was not worth the trip to Fulham for them to give it back to me. I was eventually able to let it go when they promised to donate it to a specific charity. Whether they did or not, I don't care, I could wash my hands of it on the back of their promise, and erase the "weight" of that injustice from my mind.
The most serious thing I've had this itch for: recovering the deposit of a flat I once lived in. Long story short, I filed an N208 claim to take my landlord to court for the maximum legal penalty of 3x the deposit amount for this particular offence. From there, things escalated very rapidly and I negotiated an out of court settlement with the director of his agency for my full deposit (and my brother's) plus court fees. Here too I was extremely annoyed at how difficult it is to do this sort of thing in the UK, and that I needed to physically go to a specific, hard-to-reach county court to file my big binder of post-it-laced documents (painstakingly printed at my nearest library) as evidence. Trees had to die and the printer at my local library had to waste ink!
I remember struggling to decide if I should even bother negotiating at that stage, or should instead drag them to court out of spite. I felt that very few others would have the freedom or ability (financial or otherwise) to go to the lengths I did, so I had some kind of duty to teach them a lesson, so they would think twice before deciding to mess with a poor clueless student in the future. I wasted hours queuing in the rain at those free first-come-first-served legal clinics to make sure I did things right.
Is this some kind of hero complex? Did I grow up watching too much Disney? Having held the metaphorical knife to their throat, I was able to release myself from this duty through the settlement, and erase the weight from my mind. But why did that weight persist for so long?
I have many examples like this, but perhaps the last one I'll mention for now is another David vs Goliath tale, starring me as David again, and British Gas as Goliath. What's notable about this is that it's a whole system/institution rather than specific incompetent/malicious individuals. This was another long saga that started when British Gas incorrectly classified my flat as a business, and had me on commercial rates rather than domestic rates.
There was never an individual person I could direct my rage at. British Gas is a giant lumbering machine inadvertently squishing things, the cogs in the machine unaware of its emergent behaviour. I did yell at some of the thickest of cogs -- British Gas employees and debt collector agents -- but overall, would you be angry at a transistor for a bug in your program?
The Energy Ombudsman eventually made them fix everything and also write me an apology letter. At the time this felt like vindication. I was ready to frame the letter; another notch on my belt in the struggle for justice. Who exactly did I beat though? I don't know the lady that signed the apology letter. I just looked her up on LinkedIn. She seems like a nice person. British Gas lumbers on.
There's not much of a conclusion to this post. I suppose it's a recognition of my sometimes irrational stubbornness. Sometimes I just can't let things go. My anger and frustration seem to spike around broken systems, rather than the actors that take advantage of those systems (the court rather than the landlord). I would love to build systems with few cogs that don't squish things.
I just got back[1] from a stretch of travel (Germany, Greece), and thought that I would do some writing while I was there. I didn't in the end. It seems like there are mainly two points of friction for me:
I spend a lot of time communicating with people within my chat client. There are a lot of people I need to keep up with, so I've become quite efficient at it. It makes sense to incorporate writing blog posts into the same workflows. I intend to have Sentinel be my post poster. Maybe I write a message, tag it with some emoji reactions, and Sentinel takes it from there.
I did actually do quite a bit of writing since my last post, but for the second reason, I decided not to publish any of that. I might try to do so, but part of me would rather go fully stealth and cut off online presence where I can, and limit it where I can't (e.g. for work).
I wonder if there's a way to use LLMs to build "unit tests" for different personas reading my blog, every time I'm about to publish something. It could also try to make inferences from posts to mitigate against any future personas that I cannot yet predict. Maybe a tool like this could be useful for people who are not very politically well-versed and want to make sure they don't make strong claims that could land them in hot water later.
For now maybe I'll simply go down my unpublished posts and reconsider publishing things I've already written. It helps me to write down complex thoughts to provide them with structure, and communicate them to others without rambling or getting lost. I do that over text too unfortunately, but even more when speaking! There is one post I plan to write like that soon.
As a rule of thumb, when I repeat myself more than twice on a topic (e.g. when explaining something to someone), I decide to write a post about it, as that's enough evidence for me that other people might benefit from it, and I can point the next person who asks me the same question to the post I've already written. There's a post like that coming soon too.
At least for those two types of posts I don't need to overthink much. I've also decided to start writing an autobiography, at different levels of granularity. I do not intend to publish this (maybe only individual anecdotes), but I think it will help me put my life in perspective and reflect on it. I tend to easily forget events as well, so it will help me remember.
I've also recently finished a huge migration from AWS to GCP. AWS has been my workhorse for over a decade, but has recently gotten too expensive, as the last of my AWS startup credits dried up. I have since founded another company that would be eligible for credits, but I can't redeem them on the same account either way, so a migration was inevitable. Why GCP? I have $350k of credits on there for the next two years. This website now lives on GCP too, which feels like a new beginning.
That's a lie; I got back around 2 weeks ago, wrote most of this post, didn't publish it. ↩︎
A few days ago I turned 31. Birthdays are not something I look forward to. However, Veronica gifted me an ant farm kit, which is exciting. I have always been fascinated by ants and the emergent behaviour of ant colonies, but never had the chance to observe that IRL through glass panes. In our flat in Egypt, ants were very common, so I would sometimes intentionally leave a bit of bread out on the kitchen counter just to watch them slowly take it apart and carry crumbs in a line back into the walls from whence they came.
When I was little, I also remember seeing ant farms that use a special type of transparent gel rather than sand, so you could see everything the ants were doing. I think I saw it on ThinkGeek, a now-defunct online store, and remember wishing I could have it.
Well, today is the day that I can say I have a proper ant farm right here in my room!
The kit was surprisingly compact.
The fledgling colony made the long trip from Spain in a little test tube. Her majesty, Queen Gina, and her children, had water soaking in through the cotton on the left, some seeds as snacks while travelling, and took care of the larvae in the back of their little temporary home.
Before they could move in to their new home, I first had to make it nice and cosy for them. I started off by slowly soaking the sand over the course of the day, then decorating the top of their home. I also added their little sugar water dispenser and seed bowl. Finally, I dug a hole for them in the centre as a starting point.
I then set them free in their new home, but they stayed in the test tube for a while to build up the courage. One little brave ant ventured out, but kept bringing back blue pebbles and clogging up the test tube once more. She really loved her pebbles. I later read that this is normal behaviour, in order to make the queen feel more comfortable.
The moment I stopped watching they very quickly moved into their new home. I came to find the test tube empty and removed it. They had started digging and also moving the sesame seeds that Veronica had left them.
And then they started digging a LOT. I can see the beginnings of little chambers, and I read that the one at the very bottom is for storing seeds, as it's the least humid area, and they take advantage of that to prevent seeds from sprouting.
I'm excited to see how this colony develops! It's already developing much faster than I expected. I'm awaiting a new set of mixed bird seeds in the mail today, as I don't think they like sesame seeds that much, and Veronica also left them some broken-up spaghetti.
I recently wrote about my ficus terrarium and how some kind of mould defeated my fittonia. Since then, I've learned a lot more about terrariums (including learning about ones where you keep frogs and reptiles, from a very friendly pet shop employee). The ficus however, which is meant to be quite hardy, did eventually have the same mould problem.
By early January, it was getting quite dire. No matter how much I tried to control the humidity, the leaves were all dying and there was thread-like mould forming like spider webs. I don't think I took a photo of this.
I did some more research and found a potential solution: little creatures called springtails (this was later confirmed to me by friendly pet shop guy). I ordered live Springtails off of Amazon (I didn't know you could do that) and they arrived quite quickly.
They arrived in some soil. The cardboard box they arrived in was smaller than the plastic container, so the act of squishing the plastic container into that box had caused it to open. It was kind of a messy situation. Regardless, I introduced them into the terrarium, slowly adding more over time. They're kind of hard to see with the naked eye, but if you really focus you can see them crawling around. They can also jump, which to me looks like they just disappear.
Fast-forward to today... it worked! They cleaned up the mould on the glass as well as the white threads that were attacking the plant. They seem to fit well into this ecosystem! I still have the container with the rest of them, which I might have a future use for (hint: my ant farm I will write about soon).
I am glad to report that new leaves have sprouted, and the health of the ficus looks like it's slowly improving!
Stay tuned to know more about my horticulture adventures, including a new plant project that I will write about soon! Hint:
Do you remember when you could get CD-ROMs with a ton of old games on them? A few months ago, I remembered playing one such game that was on one of those CDs. I can't 100% remember where I got it, but I was maybe around 10 years old, and my family was struggling financially. A family friend, Aunt Wafaa (who has since unfortunately passed away from breast cancer), took me to a MediaMarkt (consumer electronics store) and said I could pick out any game for my birthday. I probably thought I could get 50 or 100 or however many games were on that "collection" CD for the price of one.
What stuck with me about this particular game I remembered was the soundtrack. They were all MIDI tracks, and I could remember every note. For the life of me, I couldn't remember the name of this game, but I found a subreddit called /r/tipofmyjoystick where you can post about games you don't remember the name of, and very often the community manages to find it. Indeed, when I made a post titled [PC][00s]Top-down rpg/puzzle game
, Reddit user Vellidragon cracked it in no time! Here is my post (warning: spoilers marked in the description):
I asked him how he knew, and he said:
I had the first Cult on a shareware CD way back, wondered if that could be it but found videos of the sequel when searching and realised it must be that.
I was delighted to have found it again and after some digging, found that the original website was still up, and you could still download the game for free from there. I also read that there might be an Android port now, but I couldn't quite find it. I was also surprised to find that the website contained a stitched-together map of the game (warning: contains spoilers), a view I had never seen before.
I decided to play through the game once more. I did so in three or four sessions together with Veronica. I only remembered bits and pieces about the plot so it was as if I had never played it before. Truly a blast from the past.
I also remembered the frustrating bits that time had softened my memory of. This was a typical adventure RPG of the time, where you talk to NPCs and do small tasks for them, which may earn you items that you then use for other tasks, etc. One annoying mechanic, however, was the way progression was controlled using doors and switches. Sometimes doors are opened by NPCs and it's obvious where you need to go, but other times a switch you press opens a door on the other side of the map, with no logical connection, so you spend a lot of time running around the map checking if any new doors have opened. It's easy to get stuck and either have to scroll through the walkthroughs, or try to use every item in your inventory on every character on the off chance that it does something.
Anyway, after replaying the game, I took another look at that map. I couldn't quite put the feeling into words. It's like visiting a familiar place that you've moved on from, but with completely new eyes. It somehow contains this whole world that I spent hours in, in a single picture. It also makes me want to build my own worlds in this way, where the individual components are so simple, but the emergent complexity and richness of the stories within stick with you for decades.
In April 2023, I came across this article and found it very inspiring. I had already been experimenting with ways of visualising a digital library with Shelfiecat. The writer used t-SNE to "flatten" a higher-dimensional coordinate space into 2D in a coherent way. He noticed that certain genres would cluster together spatially. I experimented with these sorts of techniques a lot during my PhD and find them very cool. I just really love this area of trying to make complicated things tangible to people in a way that allows us to manipulate them in new ways.
My use case was my thousands of bookmarks. I always felt overwhelmed by the task of trying to make sense of them. I might as well have not bookmarked them, as they all just sat there in piles (e.g. my Inoreader "read later" list, my starred GitHub repos, my historic Pocket bookmarks, etc). I had built a database of text summaries of a thousand or so of these bookmarks using Url summariser, and vector embeddings of these that I dumped into a big CSV file, which at the time cost me approximately $5 of OpenAI usage. This might seem steep, but I think at the time nobody had access to GPT-4 yet, and the pricing also wasn't as low as it is now. I also accidentally had some full-length online books bookmarked, and my strategy of recursively summarising long text (e.g. YouTube transcripts) didn't have an upper limit, so I had some book summaries as well.
Anyway, I then proceeded to tinker with t-SNE and some basic clustering (using sklearn
which is perfect for this sort of experimentation). I wanted to keep my data small until I found something that sort of worked, as processing can sometimes take a while, which isn't conducive to iterative experimentation! My first attempt was relatively disappointing:
Here, each dot is a bookmark. The red dots are not centroids like you would get from e.g. k-means clustering, but rather can be described as "the bookmark most representative of that cluster". I used BanditPAM for this, after reading about it via this HackerNews link, and thinking that it would be more beneficial for this use case.
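The medoid step, roughly, looked like the sketch below -- this assumes the banditpam package's Python interface (which may differ between versions), and the CSV filename is made up.

```python
import numpy as np
from banditpam import KMedoids  # BanditPAM's Python bindings; API may vary by version

# One row per bookmark: the embedding vectors dumped into a CSV earlier.
embeddings = np.loadtxt("bookmark_embeddings.csv", delimiter=",")

kmed = KMedoids(n_medoids=10, algorithm="BanditPAM")
kmed.fit(embeddings, "L2")

# Indices of the bookmarks that best represent each cluster (the red dots).
print(kmed.medoids)
```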
I was using OpenAI's Ada-2 for embeddings, which outputs vectors with 1536 dimensions, and I figured the step from 1536 to 2 was too much for t-SNE to give anything useful. I thought that maybe I needed to apply some cleverer dimensionality reduction first (e.g. PCA) to get rid of the less useful dimensions before trying to visualise. This would also speed up processing, as t-SNE does not scale well with the number of dimensions. Reduced to 50, I started seeing some clusters form:
Then 10:
Then 5:
Five dimensions wasn't much better than 10, so I stuck with 10. I figured my bookmarks weren't that varied anyway, so 10 dimensions are probably enough to capture their variance. The strongest component is probably "how related to AI is this bookmark", and I expect to see a big AI cluster.
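In code, that reduce-then-project pipeline is just a few lines of sklearn. This is an illustrative sketch (the parameters and filename are made up) rather than my exact notebook:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# One 1536-dim Ada-2 embedding per bookmark.
embeddings = np.loadtxt("bookmark_embeddings.csv", delimiter=",")

# First knock the dimensions down (here to 10) so t-SNE has less noise to chew
# through and runs faster, then project the rest of the way down to 2D.
reduced = PCA(n_components=10).fit_transform(embeddings)
points = TSNE(n_components=2, perplexity=30).fit_transform(reduced)

plt.scatter(points[:, 0], points[:, 1], s=5)
plt.show()
```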
I then had a thought that maybe I should use truncated SVD instead of PCA, as that's better for sparse data, and I was picturing this space in my mind as really quite sparse. The results looked a bit cleaner:
Now let's actually look at colouring these dots based on the cluster they're in. Remember that clustering and visualising are two separate things. So you can cluster and label before reducing dimensions for visualising. When I do the clustering over 1500+ dimensions, and colour them based on label, the visualisation is quite pointless:
When I reduce the dimensions first, then we get some clear segments, but the actual quality of the labelling is likely not as good:
And as expected, no dimension reduction at all gives complete chaos:
I started looking at the actual content of the clusters and came to a stark realisation: this is not how I would organise these bookmarks at all. Sure, the clusters were semantically related in some sense, but I did not want an AI learning resource to be grouped with an AI tool. In fact, did I want a top-level category to be "learning resources" and then that to be broken down by topic? Or did I want the topic "AI" to be top-level and then broken down into "learning resources", "tools", etc.?
I realised I hadn't actually thought that much about what I wanted out of this (and this is also the main reason why I limited the scope of Machete to just bookmarks of products/tools). I realised that I would first need to define that, then probably look at other forms of clustering.
I started a fresh notebook, and ignored the page summaries. Instead, I took the page descriptions (from alt tags or title tags) which seemed in my case to be much more likely to say what the link is and not just what the content is about. This time using SentenceTransformer (all-MiniLM-L6-v2
) as Ada-2 would not have been a good choice here, and frankly, was probably a bad choice before too.
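Swapping in the new embeddings is essentially a one-liner with the sentence-transformers library (the example descriptions below are made up):

```python
from sentence_transformers import SentenceTransformer

# Made-up examples of the short page descriptions being embedded.
descriptions = [
    "A minimal self-hosted RSS reader",
    "Obsidian plugin for visualising your vault as nested circles",
    "CLI tool for bulk-renaming files",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(descriptions)  # shape: (len(descriptions), 384)
```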
I knew that I wanted any given leaf category (say, /products/tools/development/frontend/
) to have no more than 10 bookmarks or so. If it passed that threshold, maybe it was time to go another level deeper and further split up those leaves. This means that my hierarchy "tree" would not be very balanced, as I didn't want directories full of hundreds of bookmarks.
I started experimenting with Agglomerative Clustering, and visualising the results of that with a dendrogram:
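One way to produce that plot, sketched here with scipy's hierarchical clustering (not necessarily the exact code I used); `embeddings` is the array from the sentence-transformers snippet above:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# `embeddings` is the (n_bookmarks, 384) array from the snippet above.
# Ward linkage builds the merge tree bottom-up; the dendrogram shows how the
# bookmarks group at every level and where a cut would give sensibly-sized leaves.
merge_tree = linkage(embeddings, method="ward")
dendrogram(merge_tree, no_labels=True)
plt.show()
```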
Looking at where bookmarks ended up, I still wasn't quite satisfied. Not to mention, there would need to be some LLM passes to actually decide what the "directories" should be called. It was at this point that I thought that maybe I need to re-evaluate my approach. I was inadvertently conflating two separate problems: (1) coming up with a sensible hierarchy of categories in the first place, and (2) deciding where each bookmark belongs within that hierarchy.
There's a hidden third problem as well: potentially adjusting the tree every time you add a new bookmark. E.g. what if I suddenly started a fishing hobby? My historical bookmarks won't have that as a category.
I thought that perhaps (1) isn't strictly something I need to automate. I could just go through the one-time pain of skimming through my bookmarks and trying to come up with a relatively ok categorisation schema (that I could always readjust later) maybe based on some existing system like Johnny•Decimal. I could also ask GPT to come up with a sane structure given a sample of files.
As time went on, I also started to spot some auto-categorisers in the wild for messy filesystems that do the GPT prompting thing, and then also ask GPT where the files should go, then move them there. Most notably, this.
That seems to me so much easier and more reliable! So my next approach is probably going to be having each bookmark use GPT as a sort of "travel guide" as it makes its way down the tree. "I'm a bookmark about X, which one of these folders should I move myself into next?" over and over until it reaches the final level. And when a directory gets too big, we ask GPT to divide it into two.
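A rough sketch of what that loop could look like; `ask_llm` is a hypothetical helper for the GPT call, and the folder tree is just nested dicts of folder names:

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT call (e.g. via the OpenAI client)."""
    raise NotImplementedError

def place_bookmark(summary: str, tree: dict, max_depth: int = 10) -> list[str]:
    """Walk a bookmark down the folder tree, one LLM decision per level."""
    path = []
    node = tree
    for _ in range(max_depth):
        options = list(node.keys())
        if not options:
            break  # reached a leaf folder
        choice = ask_llm(
            f"I'm a bookmark about: {summary}\n"
            f"Which one of these folders should I move myself into next? {options}\n"
            "Answer with exactly one folder name."
        ).strip()
        if choice not in options:
            break  # the model punted; stop at the current level
        path.append(choice)
        node = node[choice]
    return path

# When a leaf folder grows past ~10 bookmarks, a similar prompt can ask the
# model to propose a split of that folder into two subfolders.
```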
The LLM hammer seems to maybe win out here -- subject to further experimentation!
Chat as an interface has always been something I thought about a lot. After all, it's a close analogue to spoken conversation, our most natural form of communication.
The most basic chat interface is an input box and a chronological history of messages, so you can follow the conversation. Messages are often augmented with emojis etc. to fill in the gaps for intonation and body language. If you need higher communication bandwidth, voice messages sometimes do it too. An advantage over face-to-face conversation is that text-based conversations can be asynchronous and much longer-lived, potentially even pen-pal-style correspondences.
The moment you start thinking about group conversations, some problems begin to be unearthed. One problem is that it can get quite crowded, as you're sharing a linear chat history. It's hard to follow multiple threads of conversation that have been flattened into one, when in real life conversations can branch off and diverge.
This is a problem in video conferences too. While groups at a social event naturally divide into smaller clusters of people, each in their own bubble of conversation, in video conferences this has to be done explicitly through breakout rooms and similar mechanics. Otherwise all the attention and spotlight is aimed at the person currently talking, which can throw off the dynamics.
I first noticed this phenomenon when I was running the Duolingo German language events in London. It's already common for people who don't know the language well to be shy about speaking up, but when covid started and we switched to Zoom, it was very easy for whoever was speaking to get stage fright, even if they're not normally shy. What then ends up happening is that two or three people will engage in conversation while the rest watch, unless the host (me in that case) takes control of the group dynamics. This was much easier to do in person, especially where I could see a person's face and gauge how comfortable they are, so I don't put them on the spot (e.g. by bringing them into the conversation with simple yes/no questions).
Anyway, during covid I became quite interested in products that try to solve these problems by capturing aspects of real-life communication through product design. I remember imagining a 2D virtual environment with spatial audio in the context of my PhD research. It turned out somebody was already building something similar: a fellow named Almas, and I remember having a call with him about SpatialChat (a conversation full of lovely StarCraft metaphors). This was an environment that allowed you to replicate the act of physically leaving a huddle and moving to a different cluster to talk. You could only be heard by those in "earshot".
A 2D game called Manyland did something similar with text-only, where text would appear above the head of your character as you were typing. This created interesting new dynamics and etiquette around "interrupting" each other, as well as things like awkward silences, which don't exist when you're waiting for someone to type. There was even an odd fashion around typing speed at one point.
Interestingly, you're not occupying space in the chat log by chatting; you're filling the space above your head, so you just need to find a good place to perch. Two people can respond to the same thing at the same time. However, one person can't quite multi-task their responses / threads without jumping back and forth between people, but after all that's how it works in real life too, no?
I won't go over the history of different chat platforms and apps, but we've seen a lot of patterns that try and create some structure around communication, here in order from more ephemeral to less ephemeral:
I like to imagine conversations as trees, where branches can sprout and end just as fast. Have you ever been in an argument where someone rattles off a bunch of bad points, but you can only counter them in series? Each of your responses may in turn trigger several additional responses, and you get this exponentially growing tree and eventually you're writing essays dismantling every point one by one.
In real life, it's often hard to remember everything that was said, so you focus on just the most important parts. Or you deliberately prune the branches so the conversation doesn't become unwieldy. Some people like to muddy the waters and go off topic, and it's up to you to steer the conversation back to the main trunk.
But not everything is a debate. A friend of mine figured that this tree, in 2D, ought to be the way to hold conversations. Big internet conversations (he used social media as an example) are all adding nodes to far-off branches of a huge tree. I quite like that picture. It would certainly allow conversations to happen in parallel, as you can hop back and forth between branches.
ChatGPT made the choice that chats should be linear, but you can start a new chat with a new instance of the AI at any time, and go back to old chats through the history tab. This seems to make sense for chatting with an AI assistant, but an anti-pattern emerges...
Have you ever gone down a conversation with ChatGPT only to realise that it's dug itself into a hole, so you scroll up to just after the last "good" message and edit the message to create a new, better timeline? I do this a lot, and it reminded me of undo/redo in text editors.
Undo and redo are normally linear, and if you go back in time and make a change, suddenly the old "future" is no longer accessible to you. We've all done it where we accidentally typed something after pressing undo a bunch of times to check a past version, losing our work.
Someone made a plugin for vim that allows you to navigate a tree of timelines that you create by undo-ing, sort of like automatic git branching and checkout. I feel like this ought to be a UI for interacting with ChatGPT too! This kind of branching is already being used to get better responses, and I feel like there must have been attempts at creating a UI like this, but I haven't seen one that does it elegantly.
This has been kind of a stream-of-thought post, inspired by my post on resetting my AI assistant's chat memory, so I'm not entirely sure what the point I'm trying to make is. I think I'm mainly trying to narrow down the ergonomics of an ideal chat interface, or chat in a virtual environment.
I think you would probably have some set of "seed" nodes -- the stem cells to your threads -- which are defined by individuals (i.e. DMs), or groups with a commonality, or topics. These would somehow all capture the nuances of real-life communication, but improve on that with the ability to optionally create ephemeral threads out of reply branches. I'm not yet sure what the UI would physically look like though.
Sentinel, my AI personal assistant, has evolved a bit since I last wrote about him. I realised I hadn't written about that project in a while when it came up in conversation and the latest reference I had was from ages ago. The Node-RED logic looks like this now:
As he's diverged from simply being an interface to my smart home stuff (as well as from amarbot, which is meant to replace me), I decided to start a new project log just for Sentinel-related posts.
Edit: this post inspired me to write more at length about chat as an interface here.
I was recently offered condolences when someone found out about the unfortunate fate that befell my Bonsai plant. There's a small update!
A little over a month ago, Veronica and I went to a terrarium workshop and put together a little home for a little ficus. They're very hardy.
I picked everything out, down to the colours of the sand layers, and we carefully placed and aligned everything. It's not just pretty; everything has a function. For example, the moss tells you when it's time to add (distilled!) water, and the large rocks provide a surface for that water to evaporate from (rather than get absorbed by the moss). We also added two dinosaurs, one patting the head of the other.
As hardy as the ficus is, unfortunately the fittonia in the back was a lot more temperamental. I thought it maybe couldn't handle the humidity (the guides said you shouldn't let too much condensation build up on the glass) or it needed more light. It unfortunately started getting these strands of mould and eventually became a gloop.
I called the workshop instructor and he said there may be a number of reasons for this, asked for photos to troubleshoot, and he offered to replace it if I came by, but Shoreditch is kind of a trek from here. He said I should take it out so that it doesn't damage the ficus. In the meantime, I also got some grow lights so I can have a bit more control over the environment, rather than be at the mercy of UK winter weather!
I've been participating in the Indieweb Carnival for a while. Usually I'll take the themes on by freewriting. These past two months, for the themes Self-Care and Routine and Community and belonging, I wrote approx 2000 words each, but I never published these.
The themes are often deep and elicit introspection and vulnerability, especially these past two. I tend to knock them out on long train rides or flights where I'm offline (most recently for the ones for Oct, Nov, Dec: Birmingham, Manchester, Cairo, Ismailia, Riyadh). While tackling these themes is a useful exercise for me, I realised that sharing the output with the public is not.
Prior to the past two themes, I published my Indieweb Carnival writing, but kept those posts unlisted on my website. You can only read them via webmention links or if they're linked to by Indieweb Carnival organisers. The reason for this is that it's not unusual for me to write about something personal, and the target reader in my mind is a complete stranger. I'm oddly ok with sharing certain things with strangers, but the issue is that not only strangers visit my website.
Sometimes I will reread what I wrote, but imagine a different persona reading. I'm comfortable sharing different things with different people, e.g. family, friends, colleagues, clients, etc. These categories are not "nested" in the sense that my inner circle ought to have access to everything, with permissions tightening as you move outward through the circles. Instead, they are usually non-overlapping and dependent on the content itself.
When I re-read something from the perspective of a different group, or even individual, inevitably I will want to modify what I wrote. The more I share, the harder it is to juggle these personas in my mind and release something that I'm happy sharing with the widest group of all: everyone.
For the past two topics, I found that I went deep enough that I could no longer imagine what future inferences could be made from that personal information, who might read it one day after I've forgotten about it or changed my mind on what I wrote, or how the current people in my life might view these things.
In short: the less I think about what I write, the more I overthink who reads, and vice versa. That's no good if you want to freewrite. So I decided to stop publishing these posts. Or rather, restricting them to an audience of one: me[1].
I suppose my website might be a little less colourful as a result, but maybe one day I can figure out some kind of access control system, or create a new identity separate from my main one.
This is not the most restrictive audience, since I assume that anything I write may possibly one day be read by others (e.g. through a data breach), and I also draw a distinction between different versions of myself across time, and AI (think Roko's Basilisk), but that's a topic for another time. ↩︎
I've had a problem for a while around organising articles and bookmarks that I collect. I try not to bookmark things arbitrarily, and instead be mindful of the purpose of doing so, but I still have thousands of bookmarks, and they grow faster than I can process them.
I've tried automated approaches (for example summarising the text of these webpages and clustering vector embeddings of these) with limited success so far. I realised that maybe I should simply eat the frog and work my way through these, then develop a system for automatically categorising any new/inbound bookmark on the spot so they stay organised in the future.
A new problem was born: how can I efficiently manually organise my bookmarks? The hardest step in my opinion is having a good enough overview of the kinds of things I bookmark such that I can holistically create a hierarchy of categories, rather than the greedy approach where I tag things on the fly.
I decided to first focus on bookmarks that I would categorise as "tools", which are products or services that I currently use, may use in the future, want to look at to see if they're worth using, or may want to recommend to others in the future if they express a particular need. These are a bit more manageable as they're a small subset; the bigger part of my bookmarks are general knowledge resources (articles etc).
At the moment, I rely on my memory for the above use cases. Often I don't remember the name of a tool, but I can usually find it with a substring search of the summaries. Often I don't remember tools in the first place, and am surprised to find that I bookmarked something that I wish I would have remembered existed.
Eventually, I landed on a small script to convert all my notes into files, and then using different file browsers to drag and drop files into the right place. This was still very cumbersome.
On the front page of my public notes I have two different visualisations for browsing these notes. I find them quite useful for getting an overview. I thought it might be quite useful to use the circles view for organisation too. So I thought I should make a minimal file browser that displays files in this way, for easy organisation.
Originally, I took this as an excuse to try Tauri (a lighter Electron equivalent built on Rust that uses native WebViews instead of bundled Chromium), and last month I did get an MVP working, but then I realised that I'm making things hard on myself, especially since the development workflow for Tauri apps wasn't very smooth with my setup.
So instead, I decided to write this as an Obsidian plugin, since Obsidian is my main PKM tool. Below is a video demo of how far I got.
You can:
Unlike the visualisation on my front page, which uses word count for node size, this version uses file size. So far, it helps with organisation, although I would like to work on a few quality-of-life things to make this properly useful.
Towards the end of August, I went to visit my mum in Germany. Veronica would take care of my bonsai plant in my absence. One day she woke up to a gruesome murder scene (warning, graphic images to follow).
Veronica thought it got too top-heavy and broke, but Black Pine saplings don't just explode like that. I knew it was our cat Jinn. In a bunch of further images, I noticed what look like nibble marks too; I know she likes to nibble on grass. I insisted that Veronica immediately tell the cat off and show her what she did wrong. Cats are not that smart and won't be able to connect consequences to their actions unless they're tightly associated. In my opinion, Veronica is consistently too soft on the cat.
Veronica felt bad about the death of bonsai buddy (although she insists it can still be saved) but it seemed to me a 10 year journey was cut prematurely short. There's not much that can be done in my opinion except starting over. Since I've been back, I've been seeing if it can be saved, with not much success yet. This is what it looks like today:
When I came back, Jinn was very misbehaved for a bit. She knows she's not allowed to jump on my desk -- twice I caught her doing so and sniffing the bonsai corpse while she thought I was asleep. She knows that certain other areas are out of bounds for her, and she was pushing those boundaries. It took her a week or two of me being back for her to go back to normal and for us to be friends again.
R.I.P. bonsai buddy
In UI design, skeuomorphism is when UI elements look like their physical counterparts. For example, a button might have highlights/shadows, you might adjust time through slot-machine-like dials, or hear a shutter sound when you take a photo. I quite like skeuomorphic design.
I pay special attention to icons. My younger sister is young enough to have never used a floppy disk and therefore only knows this symbol 💾 to mean "save" but not why. You see it everywhere: icons (like a phone handset), language (like an "inbox"), and other tools (like the dodge and burn tools in photo editors, which stem from physical retouching on film).
Sometimes, words have gone through several layers of this, where they're borrowed over and over again. For me, one area where I see this a lot is in networks. In the days of radio and analogue electronics, we got a lot of new words that were borrowed from other things that people were already familiar with. Once computer networks came along, suddenly "bandwidth" adopted a different meaning.
The key here is this idea of familiarity. When something is new, it needs to be rooted in something old, in order for people to be able to embrace it, let alone understand it. Once they do, only then do you see design trends cut the fat (for example, the shiny Web 2.0 style made way for the more flat design we have today). If a time traveller from 20 years ago were to visit, of course they would find modern design and affordances confusing.
Take this a step further however: what about the things that never had a physical counterpart or couldn't quite be connected to one? Well, it seems we latch on to the closest available concept or symbol! For example, what exactly is "the cloud"? It never substituted water vapour in the sky; it was something new. Why is this ☰ the symbol for a hamburger menu? Because it sort of looks like items in a menu. Not to mention, why did we call it a hamburger menu? Because the symbol sort of looks like a hamburger.[1]
Anyway, why do I bring all this up? Because I noticed new words and icons showing up in the AI space, as AI becomes more ubiquitous. AI assistants built into tools are becoming "copilots". The symbol for "apply AI" is becoming magic sparkles that look a bit like this ✨. I find this very interesting -- people seem not to have a previous concept to connect AI to other than "magic", and the robot emoji might be a little too intimidating 🤖 (maybe I should change the Amarbot triggering reaction to sparkles instead).
A couple days ago, this was trending on HackerNews, and sparked some conversation in my circles. As you might know, I have some interest in this space. It seemed to have some overlap with gather.town, a 2D virtual environment for work. This category really took off during covid. This product in particular has some big name backers (though not a16z ironically enough).
This got me thinking... AI agents would truly be first-class citizens in environments like these. You would interact with them the same way you interact with a human colleague. You could tell them "go tell Bob to have the reports ready by 2pm" and the agent would walk over to Bob's virtual desk, and tell them using the same chat / voice interface that a human would use.
How would agents interact with the outside world? LLMs already have an understanding of human concepts baked in. Why hack a language model to execute code (Code Interpreter) when you could use the same skeuomorphism that humans are good at, in an environment like this? If there's a big red button in the corner of your virtual office called "server restart button", a human as well as an AI agent can easily interact with that. Neither may ever know that this magic button does something in a parallel universe.
It might be some ways off before we're all working out of the metaverse, but I believe that the only way for that to happen is if it becomes more ergonomic than real life. It just so happens that this is great for humans as well as AI agents! There is already a class of tools that make you more productive in AR/VR than on a normal monitor (think 3D CAD). However, when it comes to day-to-day working, organising your thoughts, communicating, etc., we still have some ways to go. To cross that bridge, we most likely need to embrace skeuomorphic design in the first instance.
What might that look like? Certainly storing information in space. Your desk top (and I don't mean "desktop", I mean literally the surface of your desk) can go 3D, and you can perhaps visualise directory trees in ways you couldn't before. Humans have excellent spatial reasoning (and memory) as my friend working on virtual mind palaces will tell you.
You could of course have physical objects map 1:1 to their virtual counterparts, e.g. you could see a server rack that represents your actual servers. However, instead of red and green dots on a dashboard, maybe the server can catch literal fire if it's unhealthy? That's one way to receive information and monitor systems! A human as well as an AI agent can understand that fire is bad. Similarly, interactions with things can be physical, e.g. maybe you toss a book into a virtual basket, which orders a physical version of it. Maybe uploading a photo to the cloud is an actual photo flying up to a cloud?
Or maybe this virtual world becomes another layer for AI (think Black Mirror "White Christmas" episode), where humans only chat with a single representative that supervises all these virtual objects/agents, and talks in the human's ear? Humans dodge the metaverse apocalypse and can live in the real world like Humane wants?
Humans are social creatures and great at interacting with other humans. Sure, they can learn to drive a car and no longer have to think about the individual actions, rather the intent, but nothing is more natural than conversation. LLMs are great at conversation too, of course (it's in the name), and this validates a belief I've had for a long time: that conversation may be the most widely applicable and ergonomic interaction interface.
What if my server was a person in my virtual workspace? A member of my team like any other? What if it cried if server health was bad? What if it explained to me what's wrong instead of me trawling through logs on the command line? I'm not sure what to call this. Is this reverse-skeuomorphism? Skeuomorphic datavis?
I might have a fleet of AI coworkers, each specialised in some way, or representing something. Already Sentinel is a personification of my smart home systems. Is this the beginning of an exocortex? Is there a day where I can simply utter my desires and an army of agents communicate with each other and interact with the world to make these a reality?
(Most) humans are great at reading faces (human faces, that is, the same way zebras can tell each other apart). This concept has been explored in data visualisation before, via Chernoff faces. There are reasons why it didn't catch on, but I find it very interesting. I was first introduced to this concept by the sci-fi novel Blindsight. In it, a vampire visualises statistical data through an array of tortured faces, as vampire brains in this story are excellent at seeing the nuance in that. You can read the whole novel for free online like other Peter Watts novels, but I'll leave the quote here for good measure:
A sea of tortured faces, rotating in slow orbits around my vampire commander.
"My God, what is this?"
"Statistics." Sarasti seemed focused on a flayed Asian child. "Rorschach's growth allometry over a two-week period."
"They're faces…"
He nodded, turning his attention to a woman with no eyes. "Skull diameter scales to total mass. Mandible length scales to EM transparency at one Angstrom. One hundred thirteen facial dimensions, each presenting a different variable. Principle-component combinations present as multifeature aspect ratios." He turned to face me, his naked gleaming eyes just slightly sidecast. "You'd be surprised how much gray matter is dedicated to the analysis of facial imagery. Shame to waste it on anything as—counterintuitive as residual plots or contingency tables."
I felt my jaw clenching. "And the expressions? What do they represent?"
"Software customizes output for user."
There are so many parallels between language and programming. For example, Toki Pona (a spoken language with a vocabulary of only 120 words) is like the RISC of linguistics. You need to compose more words together to convey the same meaning, but it's quite elegant how you can still do that with so few words. It seems like languages don't need that large a vocabulary to be "Turing complete" and able to express any idea. Or maybe because language and thought are so tightly coupled, we're just not able to conceive of ideas that we don't have the linguistic tools to express in the first place.
You can create subroutines, functions, macros in a program. You can reuse the same code at a higher level of abstraction. Similarly, we can invent new words and symbols that carry a lot more meaning, at the cost of making our language more terse. A language like Toki Pona is verbose because ideas are expressed from elementary building blocks and are context-dependent.
I imagine a day where abstractions layered on top of abstractions disconnect us from the underlying magic. You see a symbol like the Bluetooth icon and it has no other meaning to you except Bluetooth. In your virtual world, you interact with curious artefacts that have no bearing on your reality. You read arcane symbols as if they were ancient runes. You cast spells by speaking commands to underlings and ambient listeners that understand what you mean. Somewhere along the way, we can no longer explain how this has become a reality; how the effects we see actually connect to the ones and zeros firing. Is that not magic? ✨
This is sometimes called a drawer menu too, but the point still stands, as it slides out like a drawer. Other forms of navigation have physical counterparts too, like "tabs" coming from physical folders. Once you start noticing these you can't stop! ↩︎
Today was the "Build a Website in an Hour" IndieWeb event (more info here). I went in not quite knowing what I wanted to do. Then, right as we began, I remembered learning about Gemini and Astrobotany from Jo. I thought this would be the perfect opportunity to explore Gemini, and build a Gemini website!
Gemini is a simple protocol somewhere between HTTP and Gopher. It runs on top of TLS and is deliberately quite minimal. You normally need a Gemini client/browser in order to view gemini:// pages, but there's an HTTP proxy here.
I spent the first chunk of the hour trying to compile Gemini clients on my weird setup. Unfortunately, this proved to be quite tricky on arm64 (I also can't use snap or flatpak because of reasons that aren't important now). I eventually managed to install a terminal client called Amfora and could browse the Geminispace!
Then, I tried to get a server running. I started in Python because I thought this was going to be hard as-is, and I didn't want to take more risks than needed, but then I found that it's actually kind of easy (you only need socket and ssl). Once I had a server working in Python, I thought that I would actually prefer to run this off of the same server that this website (yousefamar.com) uses. Most of this website is static, but there's a small Node server that helps with rebuilding, wiki pages, and testimonial submission.
So for the next chunk of time, I implemented the server in Node. You can find the code for that here. I used the tls library to start a server and read/write text directly from/to a socket.
Everything worked fine on localhost with self-signed certificates that I generated with openssl, but for yousefamar.com I needed to piggyback off of the certificates I already have for that domain (LetsEncrypt over Caddy). I struggled with this for most of the rest of the time. I also had an issue where I forgot to end the socket after writing, causing requests to time out.
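For reference, a Gemini server in Node really is only a handful of lines. This is a minimal sketch, not my actual code: the cert file names and response body are placeholders, but the tls usage and the "20 text/gemini" status line are how the protocol works.

const tls = require('tls');
const fs = require('fs');

// Assumption: key.pem/cert.pem are self-signed certs like the ones I generated with openssl
const server = tls.createServer({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
}, (socket) => {
  socket.once('data', (data) => {
    // A Gemini request is just a URL followed by CRLF
    const url = data.toString('utf8').trim();
    // Status 20 = success, then the MIME type, then the body
    socket.write('20 text/gemini\r\n');
    socket.write('# Hello Geminispace!\n=> / Home\n');
    socket.end(); // forgetting this is exactly what caused my timeouts
  });
});

server.listen(1965); // the default Gemini port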
I thought I might have to throw in the towel, but I fixed it just as the call was about to end, after everyone had shown their websites. My Gemini page now lives at gemini://yousefamar.com/ and you can visit it through the HTTP proxy here!
I found some Markdown to Gemini converters, and I considered having all my public pages as a capsule in Geminispace, but I think many of them wouldn't quite work under those constraints. So instead, in the future I might simply have a gemini/ directory in the root of my notes or similar, and have a little capsule there separate from my normal web stuff.
I'm quite pleased with this. It's not a big deal, but feels a bit like playing with the internet when it was really new (not that I'm old enough to have done that, but I imagine this is what it must have felt like).
A while ago I wrote about discovering a long-forgotten project from 2014 I had worked on in the past called Mini Conquest. As the kind of person who likes to try a lot of different things all the time, over my short 30 years on this earth I have forgotten about most of the things I've tried. It can therefore be quite fun to forensically try and piece together what my past self was doing. I thought I had gotten to the bottom of this project and figured it out: an old Java gamedev project that allowed me to play around with the 2.5D mechanic.
Well, it turns out that's not where it ended. Apparently, I had ported the whole thing to JavaScript shortly after, meaning it actually runs in the browser, even today somehow! I had renamed it to "Conquest" by then. As was my style back then, I had 0 dependencies and wanted to write everything from scratch.
If you've read what I wrote about the last one, you might be wondering why the little Link character is no longer there, and what the deal with that house and the stick figure is. Well, turns out I decided to change genres too! It was no longer a MOBA/RTS but more like a civilisation simulator / god game.
The player can place buildings, but the units have their own AI. The house, when placed, automatically spawns a "Settler". I imagine that I probably envisioned these settlers mining and gathering resources on their own, with which you can decide what things to build next, and eventually fight other players with combat units. To be totally honest though, I no longer remember what my vision was. This forgetfulness is why I write everything down now!
The way I found out about this evolution of Mini Conquest was also kind of weird. On the 24th of January 2023, a GitHub user called markeetox forked my repo, and added continuous deployment to it via Vercel. The only evidence I have of this today is the notification email from Vercel's bot; all traces of this repo/deployment disappeared shortly after. Maybe he was just curious what this is.
I frankly don't quite understand how this works. The notification came from his repo, on a thread related to a commit that is apparently authored by me (since I authored the original commit?), and I seem to have been automatically subscribed to it in his fork. Odd!
I was sent two small puzzles by two separate friends recently. The first was from a friend who is currently visiting Gothenburg, Sweden, and spotted one of the programming recruitment puzzles. I might have seen this exact one through an ARG community, but couldn't quite remember.
We were chatting on the phone at the time, so I tried solving it in my head talking through it out loud, and got the right URL first try! If you can't be bothered to solve it, the URL takes you here, and as expected, it's a recruitment funnel.
Then, a more traditional puzzle:
After scratching my head for a bit, my solution was as follows (spoilers!):
Recently, this password game was trending (it was also on the front page of HN). It goes quite deep, and I really like the variety of the challenges. I ragequit after my chicken died of overfeeding (you'll know what I mean when you reach it), especially because the game is different every time you play it, so you can't just copy-paste but have to restart from scratch.
Anyway, coincidentally, as I was sorting through some old bookmarks, I came across this link: https://www.troyhunt.com/partnerships/. I was really confused as to why I had bookmarked this. The automatic summariser's summary was very innocuous.
troyhunt.com seeks partnerships for cross-platform solutions and value-added web services. To start the process, an account must be created.
Troy Hunt is the guy who made "Have I Been Pwned". My confusion ended once I actually tried to make an account. I remembered that he made this page to send scammers to and have them waste time trying to satiate impossible password requirements!
I've mentioned many times that I like these sorts of puzzles (and I still need to write about my own various attempts at similar ones), but some of these password ones are really quite clever and a great inspiration!
Around two months ago, I was talking to a friend about these games that involve programming in some form (mainly RTSs where you program your units). Some examples of these are:
Needless to say, I'm a fan of these games. However, during that conversation, I suddenly remembered: I made a game like this once! I had completely forgotten that for a game jam, I had made a simple game called Homebound. You can learn more about it at that link!
At the time, you could host static websites off of Dropbox by simply putting your files in the Public folder. That no longer worked, so the link was broken. I dug everywhere for the project files, but just couldn't find them. I was trying to think if I was using git or mercurial back then and where I could have put them. I think it's likely I didn't bother because it was something small to put out in a couple of hours.
Eventually, in the depths of an old hard drive, I found a backup of my old Dropbox folder, and in that, the Homebound source code! Surprisingly, it still worked perfectly in modern browsers (except for a small CSS tweak) and it now lives on GitHub pages.
Then, I forgot about this again (like this project is the Silence), until I saw the VisionScript project by James, which reminded me of making programming languages! So I decided to create a project page for Homebound here.
I doubt I will revisit this project in the future, but I might play with this mechanic again in the context of other projects. In that case, I might add to this devlog to reference that. I figured I should put it out there for posterity regardless!
I made some small changes to the Miniverse project. It still feels a bit boring, but I'm trying different experiments, and I think I want to try a different strategy, similar to Voyager for Minecraft. Instead of putting all the responsibility on the LLM to decide what to do each step of the simulation, I want to instead allow it to modify its own imperative code to change its behaviour when need be. Unlike the evolutionary algos of old, this would be like intelligent design, except the intelligence is the LLM, rather than the LLM controlling the agents directly.
Before I do this however, I decided to clean the codebase up a little, and make the GitHub repo public, as multiple people have asked me for the code. It could use a bit more cleanup and documentation, but at least there's a separation into files now, rather than my original approach of letting the code flow through me into a single file:
I also added some more UI to the front end so you can see when someone's talking and what they're saying, and some quality of life changes, like loading spinners when things are loading.
There's still a lot that I can try here, and the code will probably shift drastically as I do, but feel free to use any of it. You need to set the OPENAI_KEY environment variable, and the fly.io config is available too if you want to deploy there (which I'm doing). The main area of interest is probably NPC.js, which is where the NPC prompt is built up.
(Skip to the end for the conclusion on how this has affected my website, or continue reading for the backstory).
For as long as I can remember, I've been almost consistently engaged in some form of education or mentorship. Going back to my grandparents, and potentially great-grandparents, my family on both sides has been full of teachers, professors, and even a headmaster, so perhaps it's something in my blood. I started off teaching in university as a TA (and taught at least one module outright where the lecturer couldn't be bothered). Later, I taught part-time in a pretty rough school (which was quite exhausting), and even later at a much fancier private school (which wasn't as exhausting, but much less fulfilling), and finally I went into tutoring and also ran a related company. I wound this business up when covid started.
Over the years I found that, naturally, the smaller the class, the more disproportionate an impact you can have when teaching. I also found that that personal impact goes up exponentially not when I teach directly, but when I zoom out and find out what it is the student actually needs (especially with adult students), and help them unblock those problems for themselves. As the proverb goes,
"Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime."
There's also a law of diminishing returns at play here. By far the biggest impact you can have when guiding someone to reach their goals (academic or otherwise) comes at the very start. This immediate impact has gotten bigger and bigger over time as I've learned more and more myself. Sometimes it's a case of simply reorienting a person, and sending them on their way, rather than holding their hand throughout their whole journey.
This is how I got into mentoring. I focused mainly on supporting budding entrepreneurs and developers from underprivileged groups, mainly organically through real-life communities, but also through platforms like Underdog Devs, ADPList, Muslamic Makers and a handful of others. If you do this for free (which I was), you can only really do it on the side, with a limited amount of time. I wasn't very interested in helping people who could actually afford paying me for my time, paradoxically enough...
I decided recently that there ought to be an optimal middle ground that maximises impact. 1:1 mentoring just doesn't scale, and large workshop series aren't effective. I wanted to test a pipeline of smaller cohorts and mix peer-based support with standard coaching. I have friends who I've worked with before who are willing to help with this, and I think I can set up a system that would be very, very cheap and economically accessible to the people I care about helping.
Anyway, I've started planning a funnel, and building a landing page. Of course, any landing page worth its salt ought to have social proof. So I took to LinkedIn. I never post to LinkedIn (in fact, this might actually have been my very first post in the ~15 years I've been on there). I found a great tool for collecting testimonials in my toolbox called Famewall, set up the form/page, and asked my LinkedIn network to leave me testimonials.
There were a handful of people that I thought would probably respond, but I was surprised that it was instead other people, ones I completely hadn't expected, who responded. In some cases, these were people that I genuinely didn't remember, and in other cases people where I didn't realise just how much of an impact I had on them. This was definitely an enlightening experience!
I immediately hit the free tier limit of Famewall and had to upgrade to a premium tier to access newer testimonials that were rolling in. It's not cheap, and I'm only using a tiny fraction of the features, but the founder is a fellow indie hacker building it as a solo project and doing a great job, and we chatted a bit, so I figured I should support him.
I cancelled my subscription a few days later when I got around to re-implementing the part that I needed on my own site. That's why this post is under the Website project; the review link (https://amar.io/review) now redirects to a bog standard form for capturing testimonials (with a nice Lottie success animation at the end, similar to Famewall) and in the back end it simply writes the data to disk, and notifies me that there's a new testimonial to review. If it's ok, I tweak the testimonial JSON and trigger an eleventy rebuild (this is a static site). In the future, I might delegate this task to Sentinel!
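That back end really is as simple as it sounds. Here's roughly the shape of it as a sketch using Express; the route, field names, and file paths are assumptions for illustration, not my actual code.

const express = require('express');
const fs = require('fs');

const app = express();
app.use(express.json());

// Testimonial submissions are just written to disk for me to review later
app.post('/api/testimonials', (req, res) => {
  const { name, role, text } = req.body;
  const entry = { name, role, text, date: new Date().toISOString(), approved: false };
  fs.writeFileSync(`testimonials/pending-${Date.now()}.json`, JSON.stringify(entry, null, 2));
  // ...then something notifies me that there's a new testimonial to review,
  // and approved ones get merged into the site's testimonial JSON before an eleventy rebuild
  res.json({ ok: true });
});

app.listen(3000);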
The testimonials then show up on this page, or any other page onto which I include testimonials.njk (like the future mentoring landing page). For the layout, I use a library called Colcade, which is a lighter alternative to Masonry, recommended to me by ChatGPT when I asked for alternatives after Masonry was giving me some grief. It works beautifully!
Amarbot no longer has a WhatsApp number. This number now belongs to Sentinel, the custodian of Sanctum.
This number was originally wired up directly to Sanctum functions, as well as Amarbot's brain: a fine-tuned GPT-J model trained on my chat history. Since this wiring was through Matrix, it became cumbersome to have to use multiple Matrix bridges for various WhatsApp instances. I eventually decided to use that model on my actual personal number instead, which left Amarbot's WhatsApp number free.
Whenever Amarbot responds on my behalf, there's a small disclaimer. This is to make it obvious to other people whether it's actually me responding or not, but also so when I retrain, I can filter out artificial messages from the training data.
I mentioned recently that I've been using OpenAI's new functions API in the context of personal automation, which is something I've explored before without the API. The idea is that this tech can short-circuit going from a natural language command, to an actuation, with nothing else needed in the middle.
The natural language command can come from speech, or text chat, but almost universally, we're using conversation as an interface, which is probably the most natural medium for complex human interaction. I decided to use chat in the first instance.
Introducing: Sentinel, the custodian of Sanctum.
No longer does Sanctum process commands directly, but rather is under the purview of Sentinel. If I get early access to Lakera (the creators of Gandalf), he would also certainly make my setup far more secure than it currently is.
I repurposed the WhatsApp number that originally belonged to Amarbot. Why WhatsApp rather than Matrix? So others can more easily message him -- he's not just my direct assistant, but like a personal secretary too, so e.g. people can ask him for info if/when I'm busy. The downside is that he can't hang out with the other Matrix bots in my Neurodrome channel.
A set of WhatsApp nodes for Node-RED were recently published that behave similarly to the main Matrix bridge for WhatsApp, without all the extra Matrix stuff in the way, so I used that to connect Sentinel to my existing setup directly. The flow so far looks like this:
The two main branches are for messages that are either from me, or from others. When they're from others, their name and relationship to me are injected into the prompt (this is currently just a huge array that I hard-coded manually into the function node). When it's me, the prompt is given a set of functions that it can invoke.
If it decides that a function should be invoked, the switchResponse node redirects the message to the right place. So far, there are only three possible outcomes: (1) doing nothing, (2) adding information to a list, and (3) responding normally like ChatGPT. I therefore sometimes use Sentinel as a quicker way to ask ChatGPT one-shot questions.
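For the curious, the logic in that node is roughly like this. This is a simplified sketch in Node-RED function-node style; the property names on msg are assumptions based on my description, not the exact flow.

// msg.payload holds the OpenAI chat completion response
const choice = msg.payload.choices[0].message;

if (choice.function_call) {
  // Outcomes 1 and 2: a function was chosen (e.g. addToList), route to the function branch
  msg.func = {
    name: choice.function_call.name,
    args: JSON.parse(choice.function_call.arguments),
  };
  return [msg, null];
}

// Outcome 3: no function call, just respond normally like ChatGPT
msg.reply = choice.content;
return [null, msg];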
The addToList function is defined like this:
{
  name: "addToList",
  description: "Adds a string to a list",
  parameters: {
    type: "object",
    properties: {
      text: {
        type: "string",
        description: "The item to add to a list",
      },
      listName: {
        type: "string",
        description: "The name of the list to which the item should be added",
        enum: [
          "movies",
          "books",
          "groceries",
        ]
      },
    },
    required: ["text", "listName"],
  },
}
I don't actually have a groceries list, but for the other two (movies and books), my current workflow for noting down a movie to watch or a book to read is usually opening the Obsidian app on my phone and actually adding a bullet point to a text file note. This is hardly as smooth as texting Sentinel "Add Succession to my movies list". Of course, Sentinel is quite smart, so I could also say "I want to watch the first Harry Potter movie" and he responds "Added "Harry Potter and the Sorcerer's Stone" to the movies list!".
The actual adding of these items to my lists works by literally appending a bullet point to their respective files (I have endpoints for this), which are synced to all my devices via the excellent Syncthing. In the future, I could probably make this fancier, e.g. query information about the movie/book and include a poster/cover and metadata, and also potentially publish these lists.
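That part is as boring as it sounds. A minimal sketch of the gist (the directory path is made up; the real files live in my notes and are synced via Syncthing):

const fs = require('fs');
const path = require('path');

const LIST_DIR = '/data/notes/lists'; // hypothetical path for illustration

function addToList({ text, listName }) {
  // Literally just append a new bullet point to the list's Markdown file
  fs.appendFileSync(path.join(LIST_DIR, `${listName}.md`), `- ${text}\n`);
  return `Added "${text}" to the ${listName} list!`;
}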
I've been experimenting with OpenAI's new functions API recently, mostly in the context of personal automation, which is something I've explored before without the API (more on that in the future). However, something else I thought might be interesting would be to give NPCs in a virtual world a more robust brain, like the recent Stanford paper. This came in part from the thinking from yesterday's post.
The Stanford approach had many layers of complexity and they were attempting to create something that is close to real human behaviour. I'm less interested in that, and would instead like to design an environment with much higher constraints based on simple rules. I think finding the right balance there leads to the most interesting emergent results.
So my first goal was to create a very tightly scoped environment. I decided to start with a 32x32 grid, made of emojis, with 5 agents randomly spawned. The edge of the grid is made of walls so they don't fall off.
Agents after they had walked towards each other for a chat
When I was originally scoping this out, I thought I would add mechanisms for interacting with items too. These items could be summoned in some way perhaps. I built a small API for getting the nearest emoji to item text as well, which is still up at e.g. https://gen.amar.io/emoji/green apple (replace "green apple" with whatever). It also caches the emojis, so it's overall not expensive to run.
I also explored various models for generating emoji-like images, for the more fantastical items, and landed on emoji diffusion. It was at this point that I quickly realised I'm losing control of the scope, and decided to focus on NPCs only, and no items.
Each simulation step (tick) would iterate over all agents and compute their actions. I planned for these possible actions:
I wanted the response from OpenAI to only be function calls, which unfortunately you can't control, so I had to add to the prompt: You MUST perform a single function in response to the above information. If I get any bad response, we either retry or fall back to "do nothing", depending.
The prompt contained some basic info, a goal, the agent's surroundings, the agent's memory (a truncated list of "facts"), and information on events of the past round. I found that I couldn't quite rely on OpenAI to make good choices, so I selectively build the list of an agent's capabilities on the fly each tick. E.g. if there's nobody in speaking distance, we don't even give the agent the ability to speak. If there's a wall ahead of the agent, we don't even give the agent the chance to step forward. And if the agent just spoke, they lose the ability to speak again for the following round or else they speak over each other.
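In code, building that capability list looks roughly like this. It's a sketch only: the helper functions, grid representation, and schemas are assumptions standing in for whatever the real NPC.js does.

// Build the set of functions an agent is allowed to call this tick
function buildFunctions(agent, world) {
  const functions = [
    { name: 'doNothing', description: 'Do nothing this tick', parameters: { type: 'object', properties: {} } },
  ];

  // Only offer stepping forward if the agent isn't facing a wall
  if (!wallAhead(agent, world)) {
    functions.push({ name: 'stepForward', description: 'Take one step forward', parameters: { type: 'object', properties: {} } });
  }

  // Only offer speaking if someone is in earshot and the agent didn't just speak
  if (agentsInSpeakingDistance(agent, world).length > 0 && !agent.spokeLastTick) {
    functions.push({
      name: 'speak',
      description: 'Say something out loud',
      parameters: {
        type: 'object',
        properties: { text: { type: 'string', description: 'What to say' } },
        required: ['text'],
      },
    });
  }

  return functions;
}

// Hypothetical helpers, assuming a 2D emoji grid and agents with x/y/facing
function wallAhead(agent, world) {
  const [dx, dy] = agent.facing; // e.g. [0, 1]
  return world.grid[agent.y + dy][agent.x + dx] === '🧱';
}

function agentsInSpeakingDistance(agent, world) {
  return world.agents.filter(a =>
    a !== agent &&
    Math.abs(a.x - agent.x) + Math.abs(a.y - agent.y) <= 3 // arbitrary radius
  );
}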
I had a lot of little problems like that. Overall, the more complicated the prompt, the more off the rails it goes. Originally, I tried The message from God is: "Make friends", as I envisioned interaction from the user coming in the form of divine intervention. But then some of the agents tried speaking to God and such, so I replaced that with Your goal is: "Make friends", and later Your goal is: "Walk to someone and have interesting conversations", so they don't just walk randomly forever.
They would also feel compelled to try and remember a lot. Often the facts they remembered were quite useless, like the goal, or their current position. The memory was small, so I tried prompt engineering to force them to treat memory as more precious, but it didn't quite work. Similarly, they would sometimes go into endless loops, remembering the same useless fact over and over. I originally had all of an agent's information in their memory (like their name), but I didn't want them to forget their name, so I put the permanent facts outside of it.
Eventually, I removed the remember action, because it really wasn't helping. They could have good conversations, but everything else seemed a bit stupid, like I might as well program it procedurally instead of with LLMs.
I did however focus a lot on having a very robust architecture for this project, and made all the different parts easy to build on. The server runs the simulation (in the future asynchronously, but today through the "tick" button) and stores world state in a big JSON object that I write to disk, so I can rewind through past states. There is no DB; we simply read/write from/to JSON files as the world state changes. The structure of the data is flexible enough that I don't need to modify schemas, and it remains pretty forward-compatible as I make additions, so I can run the server off of older states and it picks them up gracefully.
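Concretely, the persistence layer is about as simple as it gets; something along these lines (a sketch, the file layout is an assumption):

const fs = require('fs');

// Every tick, snapshot the whole world state to its own JSON file so I can rewind later
function saveState(state, tick) {
  const name = `states/${String(tick).padStart(6, '0')}.json`;
  fs.writeFileSync(name, JSON.stringify(state, null, 2));
}

// On startup, pick up wherever the last run left off (older states load fine too)
function loadLatestState() {
  const files = fs.readdirSync('states').filter(f => f.endsWith('.json')).sort();
  if (files.length === 0) return null;
  return JSON.parse(fs.readFileSync(`states/${files[files.length - 1]}`, 'utf8'));
}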
Anyway, I'll be experimenting some more and writing up more details on the different parts as they develop!
I love cracking hard problems. So it follows that I love CTFs, ARGs, even treasure hunts, and other puzzles of the sort.
I even tried my hand at creating these for other people. I'll talk about these more in the future, but one puzzle that I made came up in conversation, where I programmed a sage character in Manyland to give you hints for a key word that he confirms, which is needed for the next stage of the puzzle.
We were talking about how much more fun these puzzles can now be with the rise of LLMs. Back then, it was all quite procedural. But today, you could prompt an LLM to be the sage and not reveal the passphrase until the player has met certain conditions.
A week later my friend brought my attention to AI Gandalf. I LOVE this. I managed to make it through the main seven levels in around 20 minutes and get myself on the leaderboard, although my prompts weren't very creative. My friend had some much more creative prompts. If you haven't tried this, try it and let me know if you need any tips!
Now I'm stuck on the extra level 8 ("Gandalf the White"). This seems to be the ultra-hardened version that the developers have built from all the data they've gathered. I figured it must be possible, since there's a leaderboard, but it seems like they've actually been updating it on the fly whenever a new loophole is found.
It's driving me crazy! If anyone can come up with a solution, please give me a tip!
Ever since I discovered the Discord server for AI music generation, I knew I needed to train a model to make my voice a great singer. It took some figuring out, but now I'm having a lot of fun making myself sing every kind of song. I've tried dozens now, but here are some ones that are particularly notable or fun (I find it funniest when things glitch out especially around the high notes):
When people visit my website, it's not very clear to them what it is I actually do. It used to be that my website doubled as my CV, but after a while that became sort of useless as I no longer needed to apply to jobs, and I stopped maintaining it.
Recently, I began revamping the landing page of my website (yet again) and started off by cleaning up the hero section. I had a bunch of icons that represented links, and thought I was being clever and language-agnostic by using icons instead of text, but then realised that even I was forgetting what icon meant what, so I couldn't begin to hope that others would know what they meant. So I added text. And I made the WebGL avatar have an image fallback in case the browser doesn't support WebGL. Similarly, I froze the parallax effects under the same conditions as those rely on hardware acceleration.
Then, I decided to create a project grid right under the hero. I wanted to treat this as a showcase of the things I am involved in, or have been involved in, loosely in order of importance. This has been inspired by:
I began by adding the most important stuff for now, and might add some more over time. I keep pretty good notes of everything I do, so this list could become very very long, as I've worked on a lot of things over time. I realised that a few, cool, recent things are more meaningful than many, old, arbitrary things, so I'll try not to make this grid too large, and link to the full directory of projects (at least the ones that have made their way online) in the last tile.
It would have also been quite boring I think if I listed all my publications, as they're largely related. The same holds true for every weekend project, hackathon, game jam, utility script, game mod, etc. The projects I worked on during university I think are similarly just too old now. I also left out my volunteering work because it felt a bit too vain to include, and I'm not sure it would actually spur on any meaningful conversation. I also left out the things where I don't have significant enough involvement.
Please let me know your thoughts on the above and/or on how to improve this!
Some days ago, OpenAI released the code and models for Shap-E, which lets you do text-to-3D, and download a mesh you can use in any 3D software or for game development (rather than NeRFs with extra steps like previous models, and many papers that never released their code). This is very exciting, as the quality is reasonably good, and previously I would try to get that through various hacks.
There is already a HuggingFace space to try it, but no model on HuggingFace that you can easily use with their Inference API. You can fork the space and turn it into an API with some extra work, but I wasn't able to easily figure this out, and since running your own space is expensive anyway, I decided to take the easy way out and put a Flask server on top of OpenAI's existing code.
My server implementation is here. Since generating a new model only takes a couple of seconds, I decided to design the interface as a "just-in-time" download link. You request /models/cat.ply, and if it's been generated before, it will download right away, but if not, then it's generated on the fly while the request blocks.
I ran this on vast.ai, on an A10 instance, but I'm probably not going to keep it up for long as it's a bit expensive. I used the default pytorch image, and tweaked the config to expose the server on port 8081, which is a Docker port that vast.ai then maps to a port on the host. I added a proxy to that on my Model Compositor project which you can try here for free.
My Red Maple and Wisteria seeds haven't sprouted yet, but I was left with all this extra soil! So I decided that I ought to plant the other species too. The remaining seeds I have are for Black Pine, Cherry Blossom, and Japanese Cedar. This is what they look like respectively:
I only had three Cherry Blossom seeds, and unlike the Red Maple, I decided to only plant one seed in that pot. Besides that, I've largely only used half of the seeds I have of each species so far, and I'm thinking that even that is unnecessary, but let's see!
As I was soaking them for 48 hours, they kind of got mixed up a bit, and I had a bit of a challenge separating the Black Pine from the Cherry Blossom, but I think I got there in the end. To better keep track of everything, as I was really starting to forget which is which, I put in some little wooden sticks:
The soil had dried quite a bit, so I made it wetter, maybe even a little too wet, as it was soaking the cotton on the bottom and creating some condensation on the plastic. I also used tap water, which I didn't do for the first two, as it's pretty hard / rich in calcium. For my tomato plant, the effects of this were soon obvious, as calcium residue was visible on top of the soil and along the edges where it meets the pot. I didn't want the same for these plants, but it should hopefully all be OK.
If you'd like to learn some more about each species, here are their sections in my little book:
So now we have 5 different pots stratifying -- let's see which sprout first!
Dolly 2.0, a recently released open-source LLM, is a big deal, not because there's anything new or special about it, but specifically because it's legally airtight. Unlike the models that were fine-tuned based on the leaked LLaMA model (many of which Meta takes down), it's based on the EleutherAI pythia model family. The training data is CC licensed and was crowd-sourced from 5000 Databricks employees.
Since it's on Hugging Face Hub, you can use it with Langchain, and I expect that it will become the Stable Diffusion of LLMs. I think especially companies that legally can't send their data off to the US will flock to using Dolly.
I kind of like how there's still this theme of woolly hooved animals (in this case a sheep), but still a divergence from the LLaMA strain of models (llama, alpaca, vicuna). I don't like how it sounds too similar to "DALL-E 2" when spoken though.
My bonsai seeds have soaked for 48 hours and I'm ready to move on to the next step! The reveal: I picked Wisteria and Red Maple. They ticked all the right boxes for me as my first try.
The Wisteria seeds are the small ones and the Red Maple are the two big ones. I used half of the seeds that I had of each species.
I assembled the "Auto Irrigation Growing Pot" and tried to ignore the conflicting instructions. I think you're not meant to fill the reservoir with any water at all until after the Stratification step (which I'll explain in a sec), and it's ambiguous how deep the seeds should go beyond "same depth as the size" (the size of what, the seeds?), so I just used my best judgement.
It turns out that I actually have a lot of soil. I didn't even use up a full peat disc so far. I have three more pots, so I'm considering getting some more seedlings started in the meantime and increasing the chances of success...
At any rate, I sowed the current seeds and sprinkled a tiny bit of water into the soil to keep it moist, as it had dried out a bit in the meantime. I don't think the instructions should have the soil bit as step 1 if you're then going to soak the seeds for 48 hours after that; it should really be the second step.
And now that they're sown, I put them in the fridge. In the fridge, the one on the left is the Red Maple (this is more of a note to myself -- I should label them really; there are little wooden sticks for that in the kit). Putting them in the fridge is the first part of the Stratification step, which is meant to simulate winter conditions, then spring, so that they can germinate as they would in nature.
I'll be checking on them every few days and keeping the soil damp. Hopefully in two or three weeks they will start sprouting and I can remove them from the fridge. I set some calendar events. So now we wait!
I'm in the process of organising a big pile of bookmarks, the current batch dating to 2019. I realised that while the ones from 2022 are still relevant, I really don't know why I bookmarked some of the things that I did in 2019. Some of them are kind of interesting articles, but I no longer remember what my intention was.
Was it to read them later? I think they're mostly just not that interesting to me anymore. Was it to use them somehow, or keep them as a resource? If so, I don't see how, as they've usually lost their relevance.
I have already noticed that the rate at which I bookmark links is much higher than the rate at which I triage them. Part of this is because the triage process is still too high-friction for me. Most of the time, I want to be able to very easily categorise and store a resource in the right place for later search, or append it to the scratch file of a relevant project.
I always knew that I need to take measures to ensure that the "service rate" of these lists is higher than the rate at which they grow, but now I also think that there's a certain time cutoff after which the lists might as well be self-pruning. After all, if something's been in the "backlog" for so long, surely it can't be that important? I need a Stale Bot for bookmarks!
I finally decided to start on my bonsai project. To read more about what this is all about, check out the project page. I haven't written anything about the tomato project, or any of the other (failed) horticulture projects, but I will eventually, since documenting failures is important too! This is the first log of what is probably going to be rather perennial chronicles.
The kit that I'm using to get a start with bonsai is really quite neat. It comes with 5 different species of seeds: Japanese Wisteria, Cherry Blossom, Japanese Cedar, Red Maple Tree, and Black Pine Tree.
This is a great set of tools in such a small package and I'm quite excited! The instruction booklet goes into a decent amount of detail, though I already know a bunch from YouTube and other places as I had a general interest in bonsai before deciding to try myself.
It came with two peat pucks that you put in some water and watch as they slowly grow while they absorb the water.
I decided to do both of them, as I wanted to try multiple species at the same time, and they grow to about 3x their original size! It's actually quite a lot of soil.
I then decided on two species that I wanted to grow. The next step was to put some of the seeds in warm water for 48 hours, such that they can soften, which makes it easier for the seedling to break through the shell. The two that I picked had seeds that looked very distinct from each other!
If you would like to know what species I picked, check back in 48 hours when I document the sowing process! I'll give you a hint: I didn't pick the mainstream choice (Black Pine).
A common way - if not the most common way (looking at WordPress dominance) - to do i18n is using the "Portable Object" format. You start off with a template file (.pot) and normally you would load that into Poedit, type in your translations for every string (including plurals), then download a pair of files: the .po and the "compiled" Machine Object .mo file.
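If you've never seen one, a .po file is just pairs of source strings and translations, plus plural forms. A made-up example (German translations here are purely illustrative):

msgid "Save"
msgstr "Speichern"

msgid "One new message"
msgid_plural "%d new messages"
msgstr[0] "Eine neue Nachricht"
msgstr[1] "%d neue Nachrichten"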
Unfortunately, my daily driver is an arm64 device (my phone actually -- a lot of people think this is insane, but I'll write it up eventually and explain myself). I can't run Poedit without some crazy hacks. You could also copy the .pot file to a .po file, then just edit that directly (it's a straightforward text file), and there are some tools to convert .po to .mo, but that's all a bit of a hassle.
As luck would have it, there's a great free tool online that does everything I need called Loco. You load whatever file in, do your translations (with optional filtering), and download the files. You can save it to a cloud account (which is I think how they make money) but I had no need of that.
I figured this all out after being given access to a WordPress deployment to help an organisation out with some things. Previously, I only had access to the WP dashboard, and changed some text for them via CSS. Now that I had FTP access, I could just change everything in one fell swoop by modifying the English strings for the site, and I deleted the hacky CSS. Once you copy the files back over, everything is automatically updated.
A few days ago, I wrote a post where I reminisced about the online forums of yesteryear. I mention tracking down and reaching out to a webmaster of a forum that meant a lot to me. Well, I have an update: she responded, and it was indeed her! I considered asking her if she might be open to digging up any backups she might have had, but then I thought about it, and I figured that perhaps some things are better left in the past. I don't remember the contents of those posts, but I do remember the positive emotions, and I think that's enough.
This leads me onto a topic that I've wanted to write about for a while: data-hoarding. I personally struggle with the concept of entropy in general. This manifests itself in many ways, but a clear one is information loss. If I were to leave this unchecked, I could see myself easily becoming a data-hoarder. The impulse is much stronger for unique data that I created (personal data), and in fact this is probably a strong motivator for my note-taking, as I see writing as a form of "backing up my brain". I've barely scratched the surface, but I reckon with enough data, I could even be resurrectable.
In my personal notes, I have a directory called "maxims", where I reason about a set of principles that I live my life by. There are however a certain set of mental tools that help me cope with life in general, but aren't quite at the level of certainty of a "maxim". I decided to start writing these down, and for now I'm putting them in a separate folder called "meditations" (kind of inspired by Marcus Aurelius' writing) until I come up with a better name.
The reason I bring this up is because there's a useful tool that helps me mitigate this urge to hoard personal data, namely picturing the notion that, for all we know, physics seems to indicate that our universe is time-reversible (except when you're dealing with black holes, but let's not get into that). In other words, if you know the end state of a system, and its evolution laws, you can simulate it backwards and determine a previous state, regardless of how chaotic it may be. Of course, it's one thing to calculate where a thrown ball originated from, another to un-burn a book, but physically it's all the same.
Similarly, relativity seems to indicate that if you travel at the speed of light, then all of time can exist at once, bringing up the concept of a Block Universe, and our perception of past and present is more of a side-effect of our mode of existence. To that end, I like to imagine that if certain data has existed at some point, then it is "stored", and theoretically retrievable, in the past, or the Akashic Records to use the term I learned from Ra. If not through "time-travel", then if someone were to take a perfect snapshot of our universe and simulate physics in reverse.
If you find this kind of thing interesting, I recommend Sabine Hossenfelder's book "Existential Physics", which she signed for me after a talk at the Royal Institute!
Before I get too carried away, let me write down one more story. When I was little, we had a Win 98 computer. I knew that machine inside and out. I remember all the games I used to play on it with my brothers (anyone remember the Worms games?) and I remember making little games with the old versions of PowerPoint. We made a mouse-maze game at one point, called "The A-maze-ing Maze", and in my head I can still hear the voice-over recording that I asked my brother to make for the instruction slide of the game, and his inflection of the word "maze".
I kept it in good working condition probably until I went off to university. Some people are amazed at how well-kept my projects from the early 2000s are, but that computer truly had even the earliest projects I ever worked on on it.
My mother didn't like that I always had electronics and hardware lying around in my room. She often threatened to throw my things out. I told her that this old computer was especially important to me, and I put it in my closet so that it's not in the way.
At some point, I probably came back home from uni to visit, and the computer was gone. My mother told me that she had it scrapped, and it was long gone at that point. I can't remember how I reacted, but I often remember the feeling, and I'm not sure if I'll ever get over that loss, as silly as that may sound. It truly felt like losing a part of myself.
I have a good relationship with my mother, and I've brought this up several times since then, but I don't think she quite understands what it meant to me, and she never apologised. Usually, she says that she assumed that I had already pulled out the hard drive, as I had a very tall stack of what she assumed were hard drives (they were actually CD-ROM drives).
Anyway, I'm not writing this to roast my mum! In fact, allow me to add one more anecdote (I lied, sorry) to offset the above story somewhat. One of my earliest memories of losing progress that hadn't saved was when playing the game "Amazon Trail" (a somewhat more modern spin-off of the well-known Oregon Trail). I made hours of progress on that game, and lost everything to a crash. My mother was the one who was there to comfort me as I cried.
I'm sharing this to put into words a different kind of loss, and a means of managing it. Like with the death of a person, you can imagine that they exist on in your memories. I like to think that the things I lose exist in a much more concrete way, in space-time itself, and that the loss was deterministically inevitable.
While that might not yet enable me to let go of certain losses, I can at least avoid obsessing over hoarding other data, and allowing certain things to be forgotten. Perhaps that can help someone else too!
I need to make a new update post on all the AI stuff. Things move so fast that I often just can't be bothered! I'm making this post mostly for myself and people who ask about something very specific.
LangChain recently announced classes for creating custom agents (I think they had some standard Agents before that too though). Haystack has Agents too, although it seems that their definition explicitly involves looping until the output is deemed ok, as most implementations need to do this anyway.
The way I understand this and see it implemented is that it's essentially an abstraction that allows LLMs (or rather, a pipeline of LLM functions) to use "tools". A tool could for example be a calculator, a search engine, a webpage retriever, etc. The Agent has a prompt where it can reason about which tool it's supposed to use, actually use these, and make observations, which it can then output.
It also allows for the decomposition of a task and taking it step by step, which can make the system much more powerful. It's a lot closer to how a human might reason. An example of this general idea taken to the extreme is Auto-GPT which you can send on its merry way to achieve some high level goal for you and hope it doesn't cost you an arm and a leg. Anyone remember HustleGPT btw?
There's something called the ReAct framework (Reasoning + Acting -- I know, unfortunate name) which is the common "prompt engineering" part of this, and prompts using this framework are usually built in to these higher-level libraries like LangChain and Haystack. You might also see the acronym MRKL (Modular Reasoning, Knowledge and Language, pronounced "miracle") being used. This comes from this older paper (lol, last year is now "old"), and it seems that ReAct is basically a type of MRKL system that is also able to "reason". They might be used interchangeably though and people are often confused about where they differ. The ReAct paper has much clearer examples.
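To make that concrete, a ReAct-style prompt trace looks roughly like this. This is illustrative only; the exact wording and tool names vary between libraries.

Question: How old was Ada Lovelace when the Analytical Engine was proposed?
Thought: I should find out when the Analytical Engine was proposed and when Ada Lovelace was born.
Action: search
Action Input: "Analytical Engine proposed year"
Observation: The Analytical Engine was first described by Charles Babbage in 1837.
Thought: Ada Lovelace was born in 1815, so I can compute the difference.
Action: calculator
Action Input: 1837 - 1815
Observation: 22
Thought: I now know the final answer.
Final Answer: She was about 22 years old.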
A common tool is now, of course, embeddings search, which you can then chain to completion / chat. You might remember two months ago when I said at the bottom of my post about GPT use cases that this is where I think the gold dust lies. Back then, I had linked gpt_index; it's now called llama_index and has become relatively popular. It lets you pick what models you want to use (including the OpenAI ones still, unlike what the rename might suggest), what vector store you want to use (including none at all if you don't have loads of data), and has a lot of useful functionality, like automatically chopping up PDFs for your embeddings.
Not too long ago, OpenAI released their own plugin for this, that has a lot of the same conveniences. One surprising thing: OpenAI's plugin supports milvus.io as a vector store (an open-source, self-hosted version of the managed pinecone.io) while llama_index doesn't. I don't think it's worth messing around with that though tbh, and I think pinecone has one of those one-click installers on the AWS marketplace. If you're using Supabase, they support the pgvector extension for PostgreSQL, so you can just store your embeddings there, but from what I hear, it's not as good.
Of course, if you're subject to EU data regulations, you're going to use llama_index rather than send your internal docs off to the US. I say internal docs, because it seems everyone and their mother is trying to enter the organisational knowledge retrieval/assistant SaaS space with this, some even raising huge rounds with no defensibility (not even first-mover advantage). It's legitimately blowing my mind, and hopefully we don't see a huge pendulum swing in AI as we did with crypto. We probably will tbh.
The only defensibility that may make sense is if you have a data advantage. Data is the gold right now. A friend's company has financial data that is very difficult to get a hold of, and is using llama_index, which is the perfect use case. Another potential example: the UK government's business support hotline is sitting on a treasure trove of chat data right now too. Wouldn't it be cool to have an actually really good AI business advisor at your beck and call? Turn that into an Agent tool, and that's more juice towards just letting it run the business for you outright. Accelerando autonomous corporation vibes, but I digress!
Personally, I would quite like an Obsidian plugin to help me draw connections between notes in my personal knowledge base, help me organise things, and generally allow me to have a conversation with my "memory". It's a matter of time!
Veronica recently sent me this puzzle:
In English: the grid is a city and you must place buildings in the cells. Buildings can have a height between 1 and 5 floors inclusive. Rows and columns have Sudoku rules; you can't have a building of the same height be on the same row/column. The numbers on the edges are how many buildings are visible from that vantage point.
So for example, for a row of 13254, the number on the right would be 2 (you can only see the buildings 4 and 5) and the number on the left would be 3 (you can only see 1, 3, and 5).
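If you prefer it in code, the visibility rule is just counting new maxima as you scan along a row. A quick sketch:

// A building is visible if it's taller than everything before it along the line of sight
function visibleBuildings(row) {
  let tallest = 0, count = 0;
  for (const height of row) {
    if (height > tallest) {
      tallest = height;
      count++;
    }
  }
  return count;
}

visibleBuildings([1, 3, 2, 5, 4]);                   // 3, seen from the left (1, 3, 5)
visibleBuildings([1, 3, 2, 5, 4].slice().reverse()); // 2, seen from the right (4, 5)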
Give it a go then check against my solution!
2 | 4 | 1 | 3 | 5 |
3 | 5 | 2 | 4 | 1 |
5 | 2 | 4 | 1 | 3 |
4 | 1 | 3 | 5 | 2 |
1 | 3 | 5 | 2 | 4 |
When I was much younger, after the internet had already picked up mainstream steam, but before social media, I spent a lot of time on online forums. The communities were small (and even if they were big, the "regulars" were a small community) and everyone knew everyone. Most of these no longer exist, and the chances that I can reconnect with the friends I made are very slim, which is a shame, as they've strongly influenced who I am today. I hear that similar communities exist on Mastodon and in some pockets of the internet, but I already know it just won't be the same.
I'd get home from school, sit at the family computer, and check for new posts in threads I was part of. For the smaller ones, I would check all new posts on the entire forum and I engaged in a lot of discussions. I remember runescape-tip.com with waves of nostalgia as I just looked it up on the Wayback Machine (we're talking mid-2000s). I also remember the mutual support between friends on teenforum.tv, under the administration of the 21 year old webmaster and MySpace-layout-maker Nora, who seemed so old and wise at the time. Some friendships survived the forums' demise, but somewhere between transitions from MSN to Skype and beyond, it all fizzled away.
One forum that survived to this day however is dreamviews.com, although it looks quite different to how I remembered it. In typical vBulletin fashion, every year I get a birthday email from there, and every year I remember the friends I made on there. We obviously spoke a lot about lucid dreaming, a topic I was very interested in (though I barely had any), but also a lot of off-topic stuff. Judging by how far back the emails go, I was active on there "only" as far back as 2010, which is later than the others.
I remember checking DreamViews one day, a long time ago, after I had already been inactive for quite some time (probably as I was at uni and busy with life), and I saw a post from one of the more veteran users on the site, reminiscing about all the "old" active users that had gone inactive. He listed usernames that I recognised, and mine was among them! That was the first time I considered that I too might have had an impact on all these people who had an impact on me, and that I wasn't just a random internet stranger to them. By the time I saw that post, the veteran user had also already moved on, so I decided to leave it there, and preserve these memories in my journaling as best I could.
I once tracked Nora down on a different platform, in 2014 (6 years after teenforum.tv died) and she had responded, according to my email notifications. I don't know what that response was anymore, because even that different platform (some kind of design community) is now dead too. Today, I did some sleuthing, and tracked her down on LinkedIn. I messaged her 5 minutes ago, and who knows, maybe she still has a backup of those forums and I can reminisce over the conversations? If anything comes of it, I'll be sure to post!
I can't imagine I'm the first to try this, but new hobby acquired:
I ran the ones below on the spot and it was quite fun. Before this, whenever I visited the British Museum (a few times a year), I didn't really give most of those statues a second glance.
An exercise for the reader (this one's interesting because they put a reference of what it could have looked like if it were complete, based on a different statue):
And another bust of good old Caesar (might be interesting as there's so much reference material, and it's so broken):
Try it and have fun! I'll try another batch the next time I go.
I wrote a short article about a trick for editing the text in HTML text nodes with only CSS. This is one of those articles where the goal is just to share something that I learned or discovered, that someone might benefit from, and the primary mode of finding this content is through a search engine.
It doesn't quite make sense for this to be an "article" in the way that I use that word (a long-form post bound in time that people follow/subscribe to) so I might eventually turn all these guide-type posts into wiki-notes, so they can exist as non-time-bound living documents.
For a long time I've been interested in the idea of creating a digital twin of yourself. I've tried this in the past with prompt completion trained on many years of my chat data, but it was always just a bit too inaccurate and hollow.
I also take a lot of notes, and have been taking more and more recently (a subset of these are public, like this post you're reading right now). I mentioned recently that I really think that prompt completion on top of embeddings is going to be a game-changer here.
You probably already know about prompt completion (you give it some text and it continues it like auto-complete on steroids) which underpins GPT-3, ChatGPT, etc. However, it turns out that a lot of people aren't familiar with embeddings. In a nutshell, you can turn blocks of text into high-dimensional vectors. You can then do interesting things in this vector space, for example find the distance between two vectors to reason about their similarity. CohereAI wrote an ELI5 thread about embeddings if you want to learn more.
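As a toy illustration of what "distance between vectors" means here (the numbers below are made up; real embeddings have hundreds or thousands of dimensions and come from a model):

// Cosine similarity: close to 1 means pointing the same way, close to 0 means unrelated
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const cat = [0.12, 0.98, 0.05];
const dog = [0.10, 0.95, 0.11];
const car = [0.91, 0.03, 0.40];

console.log(cosineSimilarity(cat, dog)); // close to 1: semantically similar
console.log(cosineSimilarity(cat, car)); // much lower: semantically distant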
None of this is particularly new -- you might remember StyleGAN some years ago, which is what first made this concept of a latent space really click for me, because it's so visual. You could generate a random vector that gets decoded to a random face or other random things, and you could "morph" between faces by moving through this high-dimensional space. You could also find "directions" in this space (think PCA), to e.g. make a slider that increases your age when you move in that direction, while keeping other features relatively unchanged, or you could find the "femininity" direction and make someone masculine look more feminine, or a "smiling amount" direction, etc.
Embedding text into a latent space is the equivalent of taking an image and hill-climbing to find the vector that generates the closest possible image to it (which you can then manipulate). I experimented with this using my profile picture (this was in August 2021, things have gotten much better since!):
Today, I discovered two new projects in this space. The first was specifically for using embeddings for search which is not that interesting but, to be fair, is what it's for. In the comments of that project on HackerNews, the second project was posted by its creator which goes a step further and puts a chat interface on top of the search, which is the exact approach I talked about before and think has a lot of potential!
Soon, I would like to be able to have a conversation with myself to organise my thoughts and maybe even engage in some self-therapy. If the conversational part of the pipeline was also fine-tuned on personal data, this could be the true starting point to creating digital twins that replace us and even outlive us!
Some weeks ago I built the "Muslim ChatGPT". From user feedback, I very quickly realised that this is one use case that absolutely won't work with generative AI. Thinking about it some more, I came to a soft conclusion that at the moment there is a set of use cases that are, overall, not well suited to generative AI.
There's a class of computational problems known as NP-complete. The details aren't important here, except that these problems are (believed to be) hard to solve but easy to verify. For example, it's hard to solve a Sudoku puzzle, but easy to check that a completed one is correct.
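To make the "easy to verify" part concrete, a quick sketch: checking a filled-in Sudoku grid is just a handful of set comparisons, even though finding the solution in the first place is the hard part.

```python
def is_valid_sudoku(grid: list[list[int]]) -> bool:
    """Verify a completed 9x9 Sudoku: every row, column, and 3x3 box
    must contain exactly the digits 1-9."""
    digits = set(range(1, 10))
    rows = all(set(row) == digits for row in grid)
    cols = all(set(col) == digits for col in zip(*grid))
    boxes = all(
        {grid[r + dr][c + dc] for dr in range(3) for dc in range(3)} == digits
        for r in range(0, 9, 3)
        for c in range(0, 9, 3)
    )
    return rows and cols and boxes
```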
Similarly, I think that there's a space of GPT use cases where the results can be verified with variable difficulty, and where having correct results is of variable importance. Here's an attempt to illustrate what some of these could be:
The top right here (high difficulty to verify, but important that the results are correct) is a "danger zone", and also where deen.ai lives. I think that as large language models become more reliable, the risks will be mitigated somewhat, but in general not enough, as they can still be confidently wrong.
In the bottom, the use cases are much less risky, because you can easily check them, but the product might still be pretty useless if the answers are consistently wrong. For example, we know that ChatGPT still tends to be pretty bad at maths and things that require multiple steps of thought, but crucially: we can tell.
The top left is kind of a weird area. I can't really think of use cases where the results are difficult to verify, but also you don't really care if they're super correct or not. The closest use case I could think of was just doing some exploratory research about a field you know nothing about, to make different parts of it more concrete, such that you can then go and google the right words to find out more from sources with high verifiability.
I think most viable use cases today live in the bottom and towards the left, but the most exciting use cases live in the top right.
Another important spectrum is whether your use case relies more on recall or on synthesis. Asking for the capital of France is recall, while generating a poem is synthesis. Generating a poem using the names of all cities in France is somewhere in between.
At the moment, LLMs are clearly better at synthesis than recall, and it makes sense when you consider how they work. Indeed, most of their failures come from being a bit too loose with making stuff up.
Personally, I think that recall use cases are very under-explored at the moment, and have a lot of potential. This contrast is painted quite well when comparing two recent posts on HN. The first is about someone who trained nanoGPT on their personal journal here and the output was not great. Similarly, Projects/amarbot used GPT-J fine-tuning and the results were also hit and miss.
The second uses GPT-3 Embeddings for searching a knowledge base, combined with completion to have a conversational interface with it here. This is brilliant! It solves the issues around needing the results to be as correct as possible, while still assisting you with them (e.g. if you wanted to ask for the nearest restaurants, they better actually exist)!
Somebody in the comments linked gpt_index so you can do this yourself, and I really think that this kind of architecture is the real magic dust that will revolutionise both search and discovery, and give search engines a run for their money.
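A minimal sketch of that architecture: embed your notes, retrieve the closest ones to a question, and let the completion model answer using only that context. The model names and prompt below are just placeholders for illustration, not whatever gpt_index does internally:

```python
import numpy as np
import openai

openai.api_key = "sk-..."

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

notes = ["Watered the ficus terrarium in September.",
         "Amarbot is trained on my WhatsApp history.",
         "Twilio refunded us £847 after the toll fraud incident."]
note_vectors = [embed(n) for n in notes]

def ask(question, k=2):
    qv = embed(question)
    # Rank notes by cosine similarity and keep the top-k as context
    scores = [np.dot(qv, v) / (np.linalg.norm(qv) * np.linalg.norm(v)) for v in note_vectors]
    context = "\n".join(notes[i] for i in np.argsort(scores)[::-1][:k])
    prompt = ("Answer using ONLY the notes below. If the answer isn't there, say so.\n\n"
              f"Notes:\n{context}\n\nQuestion: {question}\nAnswer:")
    resp = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=200)
    return resp["choices"][0]["text"].strip()

print(ask("How much did Twilio refund?"))
```

The point is that the model is only ever asked to rephrase material you retrieved, so the "making stuff up" failure mode is much more contained than with pure recall.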
Welp, looks like I'm a month late for the N-O-D-E Christmas Giveaway. You might be thinking "duh, Christmas is long gone", and I also found it weird that the deadline was the 31st of January, but it turns out that that was a mistake in the video and he corrected it in the comments.
Since I keep up with YouTube via RSS, I didn't see that comment until it was too late. I only thought to check again when my submission email bounced.
Oh well! At least it gave me a reason to finally write up my smart home setup! This also wasn't the first time that participating in N-O-D-E events really didn't work out for me -- in 2018 I participated in the N-O-D-E Secret Santa and sent some goodies over to the US, and really put some effort into it I remember. Unfortunately I never got anything back which was a little disappointing, but hey, maybe next time!
I've been planning to start this project for a while, as well as document the journey, but never really got around to it. I had a calendar reminder that tomorrow the N-O-D-E Christmas Giveaway closes, which finally gave me the kick in the butt needed to start this one! I also want to use this as an opportunity to create short-form videos on TikTok to learn more about it (in this case, documenting the journey). The project page is here.
Recently, people whose work I admire made me have to confront the "art not artist" dilemma once more. In this case, Nick Bostrom with racism, and Justin Roiland with domestic abuse.
Thinking about it, more generally, I guess it comes down to:
However, it makes me think about the question: what if an AI were to be in a similar situation? Done something good and also done something bad. The current vibe seems to be that AI is a "tool" and "guns don't kill people, people kill people". But once you assign agency to AI, it starts opening up unexplored questions I think.
For example, what if you clone an AI state, one goes on to kill, the other goes on to save lives, in what way is the other liable? It's a bit like the entanglement experiment that won the 2022 Nobel physics prize -- you're entangling across space (two forks of a mind) vs time (old "good" version of a celebrity vs new "bad" version of a celebrity) where all versions are equally capable of bad in theory. To what extent are versions of people connected, and their potential?
It also reminds me of the sci-fi story Accelerando by Charles Stross (which I recommend, and you can read online for free here) where different forks of humans can be liable for debts incurred by their forks.
On a related note, I was recently reading a section in Existential Physics by Sabine Hossenfelder titled "Free Will and Morals". Forgive the awful photos, but give it a read:
So it doesn't even have to be AI. If someone is criminally insane, they are no longer agents responsible for their own actions, but rather chaotic systems to be managed, just like you don't "blame" the weather for being bad, or a small child for making mistakes.
Then, what if in a sufficiently advanced society we could simply alter our memories or reprogram criminal intent away? Are we killing the undesirable version? The main reasons for punishment are retribution, incapacitation, deterrence, and rehabilitation, but is there research out there that has really thought about how this applies to AI?
There's a fifth reason that applies only to AI: Roko's Basilisk (warning: infohazard) but it's all connected, as I wonder what majority beliefs we hold today that future cultures will find morally reprehensible. It might be things like consuming animals or the treatment of non-human intelligence that is equivalent to or greater than humans by some metric. At least we can say that racism and domestic violence are pretty obviously bad though.
Twilio used to be a cool and trustworthy company. I remember when I was in uni, some CS students (I was not a CS student) built little SMS conversation trees like it was nothing, and suddenly SMS became something you could build things with as a hobby.
Over the past month, my view of Twilio has completely changed.
Ten days ago (Jan 19th) at around 7am UTC, I woke up to large charges to our business account from Twilio, as well as a series of auto-recharge emails and finally an account suspension email. These charges happened in the span of 3 minutes just before 5am UTC. My reaction at this point was confusion. We were part of Twilio's startup programme and I didn't expect any of our usage to surpass our startup credits at this stage.
I checked the Twilio dashboard and saw that there was a large influx of OTP verification requests from Myanmar numbers that were clearly automated. I could tell they were automated because they came basically all at once, and mostly from the same IP address (in Palestine). At this point, I realised it was an attack. I could also see that this was some kind of app automation (rather than spamming the underlying API endpoint) as we were also getting app navigation events.
After we were suspended, the verifications failed, so the attack stopped. The attacker seemed to have manually tried a California IP some hours later, probably to check if they'd been IP blocked, and it probably wasn't a physical phone (Android 7). Then they stopped.
I also saw that our account balance was more than £1.5k in the red (in addition to the charges to our bank account) and our account was suspended until we zero that balance. The timing could not have been worse as we were scheduled to have an important pitch to partners at a tier 1 VC firm. They could be trying the app out already and unable to get in as phone verification was confirmed broken.
We're on the lowest tier (as a startup) which means our support is limited to email. I immediately opened a ticket to inform Twilio that we were victims of a clear attack, and to ask Twilio for help in blocking these area codes, as we needed our account to be un-suspended ASAP. They took quite a long time to respond, so after some hours I went ahead and paid off the £1.5k balance in order for our account to be un-suspended, with the hope that they can refund us later.
I was scratching my head at what the possible motive of such an attack could be. I thought it must be denial of service, but couldn't think of a motive. We're not big enough for competitors to want to sabotage us, so I was expecting an email at any point from someone asking for bitcoin to stop attacking us, or a dodgy security company coming in and asking for money to prevent it. But Twilio sent an email saying that this is a case of toll fraud.
I recommend reading that article, but in essence, those numbers are premium numbers owned by the attacker, and every time Twilio sends them a verification SMS, they make money, and we foot the bill.
Twilio seemed to follow a set playbook that they use for these situations. Their documentation names a set of countries where toll-fraud numbers most likely come from and recommends blocking them (I suppose it's easy to get premium numbers there): Bangladesh, Sri Lanka, Myanmar, Pakistan, Uzbekistan, Azerbaijan, Kyrgyzstan, and Nigeria.
I immediately went and blocked those area codes from our side, though Twilio also automatically blocked all countries except the US and the UK anyway, so it didn't really make a difference. Also, the attacker tried again using Indonesian numbers after that, so clearly a blocklist like that is not enough. Later I went and one by one selectively allowed only countries we actually serve.
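For anyone wanting to run the allow-list on their own side of the fence, here's roughly the shape of what we ended up with. This is an app-side check before an OTP request ever reaches the SMS provider, not a Twilio feature; the country lookup uses the `phonenumbers` package:

```python
import phonenumbers  # pip install phonenumbers

# Only countries we actually serve; everything else gets rejected before
# an OTP request ever reaches the SMS provider.
ALLOWED_REGIONS = {"GB", "US"}

def can_send_otp(raw_number: str) -> bool:
    try:
        number = phonenumbers.parse(raw_number, None)
    except phonenumbers.NumberParseException:
        return False
    if not phonenumbers.is_valid_number(number):
        return False
    return phonenumbers.region_code_for_number(number) in ALLOWED_REGIONS

print(can_send_otp("+447911123456"))  # UK mobile -> allowed
print(can_send_otp("+959123456789"))  # Myanmar -> rejected
```

On its own this won't stop a determined attacker (they'll just rotate countries, as ours did), but combined with rate limiting it at least caps the damage.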
Beyond this, Twilio's response was to try and do everything to blame this on us. They wash their hands of the responsibility to secure their own APIs, and instead the onus is on us to implement our own unreasonable security measures.
I told a friend about this, and through that friend found out that this is actually a very common problem that people have been having with Twilio, because Twilio dropped the ball. Apparently, out of all of those cases, we got pretty lucky (some people lost 6 figures). For me, the main issues are:
Their email was incredibly patronising, like others have reported, and they acted like they're doing us a huge favour by blessing us with a partial refund in account credits (not even real money). But we need to explain to them first how we promise to be better and not make a silly mistake like this again!
Twilio tries to push you into agreeing not to dispute the bank charges (see the link above for why they do this). I refused to agree to this, and first wanted to know exactly how much they would refund us, and if they would refund us in real money, not account credits (they agreed to "prioritize" this).
They told us that their finance team is who decides the refund amount, based on the information we provide on how we'll do better and a breakdown of the charges. I told them exactly what we did to combat this, and what the charges were. We had lost a few hundred in startup credits, then just over £2k in real money.
Instead of telling me how much they would refund (remember, I still haven't agreed not to dispute the charges, which they "required" in order to issue a refund), they went ahead and refunded us £847 and some change immediately.
I believe this to be a ploy to try and prevent us from disputing the original charges, because if we dispute now, we would end up getting back more than they charged us.
I sought some advice, with mixed opinions, but it seems quite clear that if we dispute these charges, at the very least it would mean that we can no longer use Twilio for SMS anymore (which I don't want to anyway). But, this means switching to a different provider before disputing.
It would be relatively easy to switch, as they all tend to work the same way anyway, but would still require:
This is not difficult, but time and effort that I don't have right now, as well as a distraction from our actual core product. I don't know if £1.1k is worth that "labour", or any extra stress that may come if Twilio decides to make a stink about this and pass us on to collections etc.
All I know is: Twilio, never again. I will advise people to not use Twilio for the rest of my life and longer depending on how that advice may spread and how long this article survives.
My brother's in New York and I was reminded of a scam we fell for there once. This wasn't the typical Times Square Elmo-league stuff, but seemed quite legitimate! I wanted to recount the story in case it might help someone.
We were planning to visit the Empire State building (which by the way, wasn't that great, especially that foggy day) and when we arrived there we were shocked to see a queue going all around the block and across several streets. We were approached by a man named DeShawn Cassidy selling the New York Pass.
"You can leave. Your Wallet. At home," he says. "You can laugh at aaaaall these people," as he points to the massive queue, telling us we can skip it with the glorious New York Pass. It's fast-lane entry and cheaper tickets into the Empire State building and a bunch of other attractions around New York within a certain time period.
He was a very convincing and charismatic salesman. We asked him why the people in the queue aren't cleaning him out if it's so good. He threw his hands up and said, "It behooves me!" misunderstanding what that word means.
We paid him $80 for 5 passes I believe, which seemed like a great deal. He rubbed his hands like a fly about to have a meal as we were taking the money out, and gave us a receipt, staking his name and reputation on it, "DeShawn Cassidy", and telling us we could call him at any time if we needed anything.
Of course, you know how the rest of the story goes. DeShawn was all but erased from existence, and we didn't have the opportunity to "laugh at all these people" as the security made us queue like everyone else. The special entrances were only for people who actually worked in the building.
We thought that maybe there's a faster queue inside, after clearing the building queue, and at least we don't need to get new tickets. Wrong again! The man at the till took one look at our little plastic cards, and in the strongest New York accent that still rings in my mind to this day, said the infamous words:
New York Pass? Don't do nothin'!
Yesterday evening I had a call with three founders looking for some advice on specific things. Something that came up was how to make a proper pitch deck. My advice is usually to go to Slidebean and check out the pitch decks of some well-known companies. There's a clear pattern to how these are structured, depending on who the target of the deck is.
But recently, a different founder sent me a pitch deck asking for feedback and he used a platform called Tome[1]. His slides were pretty cool, and when viewed on that platform they could even have little video bubbles where he explains each slide. At first I thought this was a GPT-3-based slide generator (similar to ChatBA (formerly ChatBCG)) but it seems to be more than that and looks like it could be a great tool for putting together a pitch deck on a whim!
Referral link, not sponsored ↩︎
Great article on some ways to interact with ChatGPT: https://oneusefulthing.substack.com/p/how-to-use-chatgpt-to-boost-your. I find it funny that so many people speak to ChatGPT politely (I do too). I wonder if post-singularity we'll be looked upon more favourably than the impolite humans.
Last weekend I built a small AI product: https://deen.ai. Over the course of the week I've been gathering feedback from friends and family (Muslim and non-Muslim). In the process I learned a bunch and made things that will be quite useful for future projects too. More info here!
A while ago I dug into my DNA via a number of services. I had the uncommon opportunity of being able to compare the results of two services (while only really paying for one). Now I finally got around to writing this up and might update it over time as I do more genealogy-related things. https://yousefamar.com/memo/notes/my/dna/
In my previous post I made a little block diagram. Here's the workflow for how I did that: https://yousefamar.com/memo/articles/writing/graphviz/
If you happen to have checked my main feed page in the past few days, you might have noticed I've added a box to subscribe to a newsletter. This is meant to be a weekly digest of the posts I made the previous week, delivered to your email inbox.
I think I'm getting close to figuring out a good system for content pipelines, though I still think about it a lot. As such, this newsletter part will mostly be an experiment for now. It won't be an automated email that summarises my posts, but rather I'm going to write it myself to begin with. I'd like to follow a style like the TLDR newsletter, which I've been following since they launched. This means e.g. a summary of cool products I might have bookmarked throughout the week, which might also give me the opportunity/excuse to review and organise them.
I'm not convinced that the medium of newsletters is the right way to consume content. I for one am a religious user of kill-the-newsletter to turn newsletters into Atom feeds. A lot of people consume content via their email inboxes though, and it seems easier to go from that to the feed format, rather than the other way around at the moment. At any rate, I want to create these various ways of consuming content. The pipeline for this content might look like this:
The other consideration is visibility of my audience. I don't actually know if anyone reads what I write unless they tell me (hi James!), and unless I put tracking pixels and such in my posts, but is it really that important? With email, you have a list of subscribers, which probably gives you slightly more data over feed readers polling for updates to your feed, but again, I don't really want to be responsible for a list of emails, and I don't like being at the mercy of the big email providers' spam filters if I want to send email from my own domain (yes, this is despite SPF/DKIM and all that, based on some voodoo you can still reach people's junk folder).
So I'm thinking for now I probably don't even really care who reads what I write, and if it becomes relevant (e.g. if I want to find out what people would like to see more of), I can publish a poll.
Not too long ago I mentioned that the search engines will need to add ChatGPT-like functionality in order to stay relevant, that there's already a browser extension that does this for Google, and that Google has declared code red. Right on schedule, yesterday Microsoft announced that they're adding ChatGPT to Bing. (If you're not aware, Microsoft is a 10-figure investor in OpenAI, and OpenAI has granted an exclusive license to Microsoft, but let's not get into how "open" OpenAI is).
I heard about this via this HackerNews post and someone in the comments (can't find it now) was saying that this will kill original content as we know it because traffic won't go to people's websites anymore. After all, why click through to websites, all with different UIs and trackers and ads, when the chat bot can just give you the answers you're looking for as it's already scraped all that content. To be honest, if this were the case, I'm not so sure if it's such a bad thing. Let me explain!
First of all, have you seen the first page of Google these days? It's all listicles, content marketing, and SEO hacks. I was not surprised to hear that more and more people use TikTok as a search engine. I personally add "site:reddit.com" to my searches when I'm trying to compare products for example, to try and get some kind of real human opinions, but even that might not be viable soon. You just can't easily find what you need anymore these days without wading through ads and spam.
Monetising content through ads never really seemed like the correct approach to me (and I'm not just saying that as a consistent user of extensions that block ads and skip sponsored segments in YouTube videos). It reminds me a lot of The Fable of the Dragon-Tyrant. I recommend reading it as it's a useful metaphor, and here's why it reminds me (skip the rest of this paragraph if you don't want spoilers): there's a dragon that needs to be fed humans or it would kill everyone. Entire industries spring up around the efficient feeding of the dragon. When humans finally figured out how to kill it, there was huge resistance, as among other things, "[t]he dragon-administration provided many jobs that would be lost if the dragon was slaughtered".
I feel like content creators should not have to rely on ads in the first place in order to be able to create that content. I couldn't tell you what the ideal model is, but I really prefer the Patreon kind of model, which goes back to the ancient world through art patronage. While this doesn't make as much money as ads, I feel like there will come a point where creating content and expressing yourself is so much easier/cheaper/faster than it is today, that you won't have high costs to maintain it on average (just look at TikTok). From the other side, I feel like discovery will become so smooth and accurate, that all you need to do is create something genuinely in demand and it will be discovered on its own, without trying to employ growth hacks and shouting louder than others. I think this will have the effect that attention will not be such a fiery commodity. People will create art primarily for the sake of art, and not to make money. Companies will create good products, rather than try to market worthless cruft. At least that's my ideal world.
So how does ChatGPT as a search engine affect this? I would say that this should not affect any kinds of social communication. I don't just mean social media, but also a large subset of blogs and similar. I think people will continue to want to follow other people, even the Twitter influencer that posts business tips, rather than ask ChatGPT "give me the top 5 business tips". I believe this for one important reason: search and discovery are two different things. With search, there is intent: I know what I don't know, and I'm trying to find out. With discovery, there isn't: I don't know what I don't know, but I loiter in places where things I would find interesting might appear, and stumble upon them by chance.
Then there's the big question of having a "knowledge engine" skipping the sources. Let's ignore the problem of inaccurate information[1] for now. I would say that disseminating knowledge at the moment is an unsolved problem, even through peer-reviewed, scientific journal papers and conference proceedings (this is a whole different topic that I might write about some day, but I don't think it's a controversial view that peer-review and scientific publishing is very, very broken).
I do not believe that the inability to trace the source of a certain bit of knowledge is necessarily the problem. I also don't believe that it's necessarily impossible, but let's pretend that it is. It would be very silly I think to cite ChatGPT for some fact. I would bet that you could actually get a list of references to any argument you like ("Hey ChatGPT, give me 10 journal citations that climate change is not man-made").
I think the biggest use cases of ChatGPT will be to search for narrowly defined information ("what is the ffmpeg command to scale a video to 16:9?") and to discover information and vocabulary on topics that you know little about, in order to get a broad overview of a certain landscape.
However, I don't see ChatGPT-powered search killing informative articles written by humans. I see AI-generated articles killing articles generated by humans. "Killing" in the sense that they will be very difficult to find. And hey, if ChatGPT could actually do serious research, making novel contributions to the state-of-the-art, while citing prior work, then why shouldn't that work be of equal or greater value to the human equivalent?
In the case of AI-generated garbage drowning out good human articles just by sheer quantity though, what's the solution? I think there are a number of things that would help:
Overall I think that ChatGPT as the default means of finding information is a net positive thing and may kill business models that were flawed from the start, making way for something better.
I've had this problem with normal Google before (the information cards that try to answer your questions). For a long time (even after I reported it), if you searched something like "webrtc connection limit", you would get the wrong answer. Google got this answer from a StackOverflow answer that was a complete guess as far as I could tell. Fortunately, the person who asked the question eventually marked my answer as the correct one (it already had 3x more upvotes than the wrong one) although the new answer never showed up in a Google search card as far as I can tell. ↩︎
I finally wrote an article on my thoughts about ChatGPT after a lot of repeated questions/answers from/to people: https://yousefamar.com/memo/articles/ai/chatgpt/
This is one of those things where I'm not sure it should really be an "article" but instead something more akin to a living document that I update continuously, maybe with a chronological log included. At the same time, a lot of the content is temporally bound and will probably lose relevance quite fast. Something to figure out in the future!
Amarbot was using GPT-J (fine-tuned on my chat history) in order to talk like me. It's not easy to do this if you follow the instructions in the main repo, plus you need a beefy GPU. I managed to do my training in the cloud for quite cheap using Forefront. I had a few issues (some billing-related, some privacy-related) but it seems to be a small startup, and the founder himself helped me resolve these issues on Discord. As far as I could see, this was the cheapest and easiest way out there to train GPT-J models.
Unfortunately, they're shutting down.
As of today, their APIs are still running, but the founder says they're winding down as soon as they send all customers their requested checkpoints (still waiting for mine). This means Amarbot might soon be without AI responses for a while, until I find a different way to run the model.
As for fine-tuning, there no longer seems to be an easy way to do this (unless Forefront open sources their code, which they might, but even then someone has to host it). maybe#6742 on Discord has made a colab notebook that fine-tunes GPT-J in 8-bit and kindly sent it to me.
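In the meantime, just running the model (as opposed to fine-tuning it) in 8-bit is reasonably accessible. A rough sketch with the Hugging Face stack, assuming bitsandbytes and accelerate are installed and you have somewhere around 8 GB of VRAM to spare (model id and numbers from memory, so treat as approximate):

```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place the layers on the GPU
    load_in_8bit=True,   # quantise weights to 8-bit via bitsandbytes
)

prompt = "Me: tell me a joke\nAmarbot:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```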
I've always thought that serverless GPUs would be the holy grail of the whole microservices paradigm, and we might be getting close. Hopefully that will make fine-tuning easy and accessible again.
My friend Selvan sent me this puzzle:
Feel free to give it a try before revealing my thought process and solution! Also, in case you're wondering, the sticks do have to have marshmallows on both ends, and they're straight, and marshmallows can't be in the same position or at infinity. Also, the sticks can cross (this doesn't violate the "2D" requirement). None of this was obvious to me!
At first, I looked at this as a graph. The graph is undirected and the vertices unlabelled. There are two possible edge weights, and the graph is not allowed to violate the triangle inequality. Intuitively, whenever edge weights are involved, I think of force-directed graphs (like a spring system with different length springs) that relax into a configuration where there's no tension in the springs.
Anyway, if you think about it as a graph, you'll realise that topologically, the first configuration is exactly the same as a square with an X in it. In fact, it's not possible for any other configuration to exist, as a graph with 4 vertices and 6 edges is completely connected. This means that we can't play around with topology, only the edge weights (or rather, move the vertices around, if you think of it that way).
There is no alternative layout where a fourth vertex is inside a triangle like the example, so the vertices *must* be in a quadrilateral layout. If you then build a trapezium using three long sticks and one short stick, you'll quickly see that there's a layout at which the shorter ones are all the same length. I made a visualisation to help illustrate this:
Afterwards, Selvan prompted me to realise that the distance between the bottom left corner and the point of intersection in the middle of the X should be the same as the red line distance, answering at which point exactly the vertices along the red lines are equidistant from each other!
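If you want to poke at this yourself, a tiny helper is handy: it just computes the six pairwise distances of a candidate layout and tells you how many distinct stick lengths it needs. The square-with-an-X configuration below is the one from the example; the goal is to find other layouts that also come out with exactly two lengths.

```python
import itertools
import math

def stick_lengths(points, decimals=6):
    """Return the set of distinct pairwise distances between four 2D points."""
    return {
        round(math.dist(a, b), decimals)
        for a, b in itertools.combinations(points, 2)
    }

# The "square with an X in it": four sides of length 1, two diagonals of length sqrt(2)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(stick_lengths(square))  # two distinct lengths: {1.0, 1.414214}
```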
Obsidian Canvas was released today and I find this very exciting! As you might know, I'm a very visual thinker and try to organise my thoughts in ways that are more intuitive to me. I've always thought that an infinite canvas that you can place nested collapsible components and primitives on makes much more sense than a directory tree. I've used other tools for this, but the separation from my PKM tool (Obsidian) has always been a big barrier.
Obsidian keeps getting better over time! It seems the canvas format is relatively simple, so I reckon I could even make these publishable. More importantly though, I think it will be quite useful for organising my thoughts internally. Currently I use a combination of whiteboard wallpaper, actual paper, and Samsung Notes on my S22 Ultra (the only not-bad Android note-taking app with good stylus support), but frustratingly it doesn't let you scroll the page infinitely in the horizontal direction!
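For reference, a canvas is just a JSON file listing nodes and edges. The field names below are from poking at a .canvas file rather than any official spec, so treat them as approximate; a sketch that generates one:

```python
import json

# Rough shape of a .canvas file as far as I can tell: text cards with
# positions and sizes, plus edges connecting them by node id.
canvas = {
    "nodes": [
        {"id": "idea", "type": "text", "text": "Digital twin", "x": 0, "y": 0, "width": 250, "height": 60},
        {"id": "data", "type": "text", "text": "Chat logs + notes", "x": 0, "y": 160, "width": 250, "height": 60},
    ],
    "edges": [
        {"id": "e1", "fromNode": "data", "toNode": "idea", "label": "feeds"},
    ],
}

with open("ideas.canvas", "w") as f:
    json.dump(canvas, f, indent=2)
```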
It can be a bit frustrating to try and manipulate a canvas without over-reliance on a mouse, but I don't think there are any ergonomic ways to interact well with these besides a touch screen, and at least the keyboard shortcuts for Canvas seem good. When AR becomes low-friction, I hope to very soon be able to use 3D spaces to organise documents and assets, in a true mind palace. For now, Obsidian Canvas will do nicely though!
/u/dismantlemars created a colab to run OpenAI's new Point-E model that you can use here. My first few experiments were interesting though not very usable yet! Supposedly it's thousands of times faster than DreamFusion though (the most well known crack at this). It took me about 30 secs to generate models, and converting the point cloud to a mesh was instant.
I tried to first turn my profile picture into 3D, which came out all Cronenberg'd. To be fair, the example images are all really clean renderings of 3D models, rather than a headshot of a human.
Then I tried the text prompt "a pink unicorn" which came out as an uninteresting pink blob vaguely in the shape of a rocking horse. Simply "unicorn" looked a bit more like a little dinosaur.
And finally, "horse" looked like a goat-like horse in the end.
The repo does say that the text to point cloud model, compared to the image to point cloud model is "small, worse quality [...]. This model's capabilities are limited, but it does understand some simple categories and colors."
I still find it very exciting that this is even possible in the first place. Probably less than a year ago, I spoke to the anything.world team, and truly AI-generated models seemed so far out of reach. Now I feel like it won't be much longer before we can populate entire virtual worlds just by speaking!
On a related note, I recommend that you join the Luma waitlist for an API over DreamFusion.
There are APIs out there for translating natural language to actions that a machine can take. An example from wit.ai is the IoT thermostat use case.
But why not instead use GPT-3? It ought to be quite good at this. And as I suspected, the results were quite good! The green highlighted text is AI-generated (so were the closing braces, but for some reason it didn't highlight those).
I think there's a lot here that can be expanded! E.g. you could define a schema beforehand rather than just give it some examples like I have, but I quite like this test-driven approach of defining what I actually want.
I did some tweaks to teach it that I want it to put words in my mouth as it were. It invented a new intent that I hadn't defined, so it would probably be useful to define an array of valid intents at the top. It did however manage to sweet-talk my "wife"!
I think this could work quite well in conjunction with other "modules", e.g. a prompt that takes a recipient and a list of people I know (and what their relationship is to me), and outputs a phone number, for example.
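For reproducibility, here's roughly the shape of the thing as an API call rather than in the playground. The few-shot examples and intent names are placeholders for whatever you'd define, not the ones I actually used:

```python
import openai

openai.api_key = "sk-..."

FEW_SHOT = """\
Turn the request into a JSON intent.

Request: set the living room to 21 degrees
Intent: {"intent": "set_temperature", "room": "living room", "value": 21}

Request: turn off the hallway lights
Intent: {"intent": "set_lights", "room": "hallway", "state": "off"}

Request: """

def parse_intent(request: str) -> str:
    # Few-shot completion: the model continues the pattern with a JSON intent
    prompt = FEW_SHOT + request + "\nIntent:"
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=100,
        temperature=0,   # deterministic structured output
        stop=["\n\n"],
    )
    return resp["choices"][0]["text"].strip()

print(parse_intent("make it a bit warmer in the bedroom"))
```

Constraining it with an explicit list of valid intents at the top of the prompt would address the invented-intent problem I mentioned.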
Amazon's creating AI-generated animated bedtime stories (story arc, images, and accompanying music) with customisable setting, tone, characters, and score. I believe that procedurally generated virtual worlds will be one of the prime use cases for these large models, and this is one example that I expect to see more of!
I think the most difficult part here will be to craft truly compelling and engaging stories, though this is probably soon to be solved. My brother and I attempted a similar project (AI-generated children's books) and the quality overall was not good enough at the time, but at the speed these things move I expect that to be a thing of the past in a matter of months!
Yesterday GitHub Copilot engineers borked production and I felt like someone had suddenly turned the lights off.
I hadn't realised how accustomed I had become to using it until this happened. I would make my intent clear in the code, then wait for it to do its thing, then it just wouldn't. Y'all got any more of them AIs?
At the same time, the next time you deploy a bad build to production, remember that even the big guys do it!
I wrote an article on bruteforcing Tailscale domain names (code included!): https://yousefamar.com/memo/articles/hacks/tailnet-name/
I'm letting day-nft.com expire. This was an experiment with 3 other people where we minted simple NFTs that each correspond to a different date going back something like 10 years. The technical part was relatively straightforward, but we realised that the whole thing is just one big hype game, and in order for it to succeed we would need to do things that we weren't comfortable with morally, so we abandoned the project. At that point I had already done some research and analysis on NFT marketplaces (which I intend to publish at some point) that helped me cement the current views I hold about this space.
Seems like GPT-4 is just around the corner! I'm really looking forward to it and not just the improvement on GPT-3, but the multi-modal inputs. I really think GPT-4 and models like it will be central to our future.
Nvidia's new diffusion model is really pushing the envelope. A lot of exciting capabilities!
I'm certain that the market for GPT-3-based spreadsheet plugins/add-ons is far riper for sales than libraries that target developers, like cerebrate.ai. I've seen a general-purpose add-on for Google Sheets here, but I think that crafting these prompts to do specific things and wrapping them in higher-level functions has much more potential.
More Stable Diffusion resource links: https://rentry.org/sdupdates2
It's official — Amarbot has his own number. I did this because I was using him to send some monitoring messages to WhatsApp group chats, but since it was through my personal account, it would mark everything before those messages as read, even though I hadn't actually read them.
My phone allows me to have several separate instances of WhatsApp out of the box, so all I needed was another number. I went for Fanytel to get a virtual number and set up a second WhatsApp bridge for Matrix. Then I also put my profile picture through Stable Diffusion a few times to make him his own profile picture, and presto: Amarbot now has his own number!
In case the profile picture is not clear enough, the status message also says that he's not real. I have notifications turned off for this number, so if you interact with him, don't expect a human to ever reply!
Some of my HNS domains are expiring soon and I don't think I'll renew them. While the concept is super cool, unless Chrome and Safari adopt HNS, it'll never go anywhere. I now think it's very unlikely that they ever will.
Almost exactly 6 years ago, I ate too many Pringles, as reminded by my photo app throwback. My brother won a contest where the prize was crates of Pringles and he gave me all the sour cream and onion ones. I ate too many of them in too short a time and since then I kind of lost my taste for them. The same thing happened to me with peanuts — I used to love them and now I basically never eat them.
When I was a student, I got an oyster photocard for commuting with a discount. Eventually I also had my railcard added to this (though IIRC, the discounts aren't cumulative). I had it renewed right at the last possible moment before expiry and ageing out, and the new card was meant to expire on the 31st of Jan 2020. It never did and I've been using it since — maybe the expiry only applied to the discount?
Eventually the outermost plastic layers peeled off (the layer with my name and photo on it) leaving an ominous blank card.
The card number was also peeled off, so when I had an incomplete trip one day, while getting that sorted, a friendly TFL employee let me know what it was on a receipt of my past few journeys. Only then did I really think about what the point of using an oyster card is (since I'm not getting discounts anymore) over a contactless credit card.
It seems there isn't really much of a benefit for me, so I'll probably just let it run out and stop using it. I might draw a little picture in that empty spot.
I had a normal oyster card many many years ago (before the first photocard) that I at some point added to the online dashboard with 60p still on it. I had given this oyster card to a homeless lady thinking there was more than that on it and she probably tossed it. I reckon if I plan my last trip in such a way that the balance goes to -60p, then never top it up again, then my overall balance with TFL should be... well, balanced!
Hello twitter! This post was syndicated using Bridgy.
As of today, if you react to a message you send me on WhatsApp with a robot emoji (🤖), Amarbot will respond instead of me. As people in the past have complained about not knowing when I'm me and when I'm a bot, I added a very clear disclaimer to the bottom of all bot messages. This is also so I can filter them out later if/when I want to retrain the model (similar to how DALL-E 2 has the little rainbow watermark).
The reason I was able to get this to work quite easily is thanks to my existing Node-RED setup. I'll talk more about this in the future, but essentially I have my WhatsApp connected to Matrix, and Node-RED also connected to Matrix. I watch for message reactions, but because those events only tell you the ID of the message that was reacted to, not its actual text, I store a small window of past messages to check against. Then I query the Amarbot worker with the body of that message, and format and respond with the reply.
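In Python-ish pseudocode the flow looks roughly like this. This is a sketch of the logic rather than the actual Node-RED nodes, and the worker URL and event field names are made up for illustration:

```python
from collections import deque
import requests

DISCLAIMER = "\n\n🤖 This reply was generated by Amarbot, not a human."
AMARBOT_WORKER = "http://localhost:8000/generate"  # hypothetical worker endpoint

# Reaction events only carry the ID of the reacted-to message,
# so keep a small rolling window of recent messages to look the text up in.
recent_messages = deque(maxlen=200)

def on_message(event):
    recent_messages.append({"id": event["event_id"], "body": event["body"]})

def on_reaction(event):
    if event["key"] != "🤖":
        return
    original = next((m for m in recent_messages if m["id"] == event["relates_to"]), None)
    if original is None:
        return  # message too old, fell out of the window
    reply = requests.post(AMARBOT_WORKER, json={"prompt": original["body"]}).json()["text"]
    send_reply(event["room_id"], reply + DISCLAIMER)

def send_reply(room_id, text):
    # In the real setup this goes back out through the Matrix/WhatsApp bridge
    print(f"[{room_id}] {text}")
```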
This integrates quite seamlessly with other existing logic I had, like what happens if you ask me to tell you a joke!
Amarbot has been trained on the entirety of my WhatsApp chat logs since the beginning of 2016, which I think is when I first installed it. There are a handful of days of logs missing here and there as I've had mishaps with backing up and moving to new phones. It was challenging to extract my chat logs from my phone, so I wrote an article about this.