Spatial Queries x LLMs: Spatial Awareness
Some of the limitations mentioned above can be addressed when you bring language model interaction out of the playground and into a more dynamic 2D environment. Clicking buttons can be a little clunky, but something humans love to do, both online and off, is move and organize things. Whether it be clicking and dragging, grouping, or rearranging things, this behavior is ingrained into how we work and think. The concept itself is a tool for thought; we manipulate the physical or digital world to encode information in its very state, like alphabetized books or a browser window of tabs you have open to "read later". We naturally do this when working in spatial canvases too, and we realized that yes-anding this behavior by allowing it to interact with language model operations is key to building truly special interactions.
To do that, we use spatial querying: the idea that, within a 2D (or 3D) space, elements can get information from other elements at specific points or areas within that space. When you combine spatial querying with language models, you start to get some *cool* interactions. Tools that pair spatial querying with the pseudo-cognitive abilities of language models are what we call "spatially aware".
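If you like code, here's a rough sketch of the spatial-querying half on its own. The names (`Rect`, `CanvasElement`, `queryArea`) are ours, purely for illustration, not from any particular canvas library; the point is just that a spatial query means "give me every element whose bounds overlap this region of the canvas":

```typescript
// Hypothetical shapes for things on the canvas (illustrative only).
interface Rect {
  x: number;      // top-left x
  y: number;      // top-left y
  width: number;
  height: number;
}

interface CanvasElement {
  id: string;
  bounds: Rect;
  text?: string;  // present for text blocks
}

// True when two axis-aligned rectangles overlap.
function intersects(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    a.x + a.width > b.x &&
    a.y < b.y + b.height &&
    a.y + a.height > b.y
  );
}

// The spatial query itself: every element whose bounds fall inside a region.
function queryArea(elements: CanvasElement[], area: Rect): CanvasElement[] {
  return elements.filter((el) => intersects(el.bounds, area));
}
```

Here's an example: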
Here, the "sentiment analyzer" block queries the space around it, performs a basic sentiment analysis operation using GPT-3 on any text block that enters its area of effect, and colors that block depending on its sentiment. What we’re focusing on here is not the specific operation (sentiment analysis), but the interaction the user has with this system. What this creates is a user-manipulable zone of sentiment analysis, so to speak, around the block itself, which can be moved freely around the document, painting other elements accordingly. This gives the user the power to perform language model operations around their canvas just by moving or organizing the information they have stored there, making it a much more natural part of their workflow and cognitive processes.
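Stripping away the canvas specifics, the analyzer boils down to roughly this loop. This is a sketch reusing `Rect`, `CanvasElement`, and `queryArea` from above; `classifySentiment` and `setElementColor` are hypothetical stand-ins for the GPT-3 call and the canvas recoloring hook, not real APIs:

```typescript
type Sentiment = "positive" | "neutral" | "negative";

const SENTIMENT_COLORS: Record<Sentiment, string> = {
  positive: "#7bd389", // green-ish
  neutral: "#d3d3d3",  // grey
  negative: "#e07a7a", // red-ish
};

// Stub: the real block would prompt GPT-3 for a label; this keyword check
// just keeps the sketch runnable on its own.
async function classifySentiment(text: string): Promise<Sentiment> {
  if (/great|love|happy/i.test(text)) return "positive";
  if (/bad|hate|awful/i.test(text)) return "negative";
  return "neutral";
}

// Hypothetical hook into the canvas for recoloring an element.
function setElementColor(id: string, color: string): void {
  console.log(`recolor ${id} -> ${color}`);
}

// Run whenever the analyzer moves or a block enters its area of effect:
// query the space around the analyzer, classify each text block it finds,
// and paint that block to match.
async function runSentimentAnalyzer(
  areaOfEffect: Rect,
  elements: CanvasElement[]
): Promise<void> {
  const textBlocks = queryArea(elements, areaOfEffect).filter((el) => el.text);
  for (const el of textBlocks) {
    const sentiment = await classifySentiment(el.text!);
    setElementColor(el.id, SENTIMENT_COLORS[sentiment]);
  }
}
```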
Here's another example of this kind of tool. We call it the vibe clusterer:
This interaction is similar, but it acts on *all* the pieces of text around it, in this case finding a single word that can describe all of them. An interesting way of thinking about this is that there is, essentially, a hidden layer beneath the canvas, populated at each point with words that represent the "common vibe" of the concepts on the visible layer. So, if there are two blocks near each other in the canvas labeled "panther" and "gazelle", then around that point the hidden layer is populated by words such as "animal", "mammal", or "fast". The "vibe clusterer" block then makes that hidden layer visible: it drags that language up into the main canvas for you to see. By moving the main block around, you're moving where you're inspecting the hidden layer, and by moving the text blocks, you're changing the actual makeup of the hidden layer. For example, if you then drag a "racecar" block near "panther" and "gazelle", the hidden layer at that location shifts toward things like "fast", "speed", or "quick". And before we forget why we're here: it's the analytical power of the language model that populates this hidden semantic layer by doing the work of finding similar vibes.
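The clusterer follows the same pattern as the analyzer sketch above, except it gathers every text block in its radius and asks the model for one word that ties them all together. Again, `findCommonVibe` is a hypothetical stand-in for the real GPT-3 prompt, and the helpers come from the earlier sketch:

```typescript
// Stub: the real block would send this prompt to GPT-3 and return its
// one-word answer.
async function findCommonVibe(texts: string[]): Promise<string> {
  const prompt =
    "Give a single word that describes all of the following:\n" +
    texts.map((t) => `- ${t}`).join("\n");
  console.log(prompt);
  return "(model's one-word answer)";
}

// Inspect the "hidden layer" at the clusterer's position: gather every text
// block in its radius and surface the word that ties them together.
async function runVibeClusterer(
  areaOfEffect: Rect,
  elements: CanvasElement[]
): Promise<string> {
  const texts = queryArea(elements, areaOfEffect)
    .filter((el) => el.text)
    .map((el) => el.text!);
  return texts.length > 0 ? findCommonVibe(texts) : "";
}
```

Dragging a "racecar" block into the radius changes the list of texts handed to the model, which is exactly what shifts the hidden layer at that spot from "animal" toward "fast".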
This "vibe clusterer" is one of the simplest implementations of this idea of spatial awareness, but we've already entered territory that absolutely would not be possible were we using a linear, text-forward interface. The combined power and language models and spatial canvases is really quite cool, and we’re honestly just scratching the surface. On we go.