The arrangement of things
is more important than the
things themselves.

AI Part 6: Investment - Foundation Models

Foundation model companies such as OpenAI and Anthropic have something of a problem. While they are producing amazing new technologies and look like a solid investment compared to "AI pins", they possess a number of crucial weaknesses that undermine their attractiveness.

The most important is that they have no "moat", which is to say no built-in protection against competitors. At first glance, the impressive level of technology they are building would seem to insulate them from competitors, but this advantage is somewhat illusory.

Consider SpaceX. Whatever one might think of its founder, it's impossible to regard its achievements as anything short of revolutionary.

Based on its 2015 reusable-rocket technology, SpaceX lowered the cost of putting objects into space by a factor of 20, from $54,500/kg to $2,720/kg, and is on track to lower it dramatically again, to $100-$200/kg. If you want to change the world, make something cheaper, and it rarely gets better than 99.7% cheaper.
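The arithmetic behind those figures is easy to check. A quick sketch (the $150/kg value is simply the midpoint of the projected $100-$200 range, chosen here for illustration):

```python
# Sanity-checking the launch-cost figures quoted above.
old_cost = 54_500   # $/kg, pre-reusable-rocket era
current = 2_720     # $/kg, with reusable boosters
target = 150        # $/kg, midpoint of the projected $100-$200 range

factor_now = old_cost / current              # ~20x reduction so far
pct_target = (1 - target / old_cost) * 100   # ~99.7% cheaper if the target is hit

print(f"Current reduction: {factor_now:.0f}x")
print(f"Projected reduction: {pct_target:.1f}% cheaper")
```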

A decade on, they have no competitors, not even state-level competitors like China or India, and not for lack of demand. The market for launching payloads is robust, with a long line of companies, universities, and countries looking to expand their operations and reduce their costs. So why is SpaceX so singular? Why can no one compete?

Read more

AI Notes: LLM Inference

Here is a copy of my notes from when I was learning about LLM structure. They're pretty raw, but I look back at them from time to time, so I've posted them here in the hope that other people might find them useful. No attempt has been made to make them accessible to the lay person.

Read more

AI Part 5: Practical Implementation

Neural AI can be thought of as semi-controlled programming using simulated neurons inspired by nature, rather than pure symbolic constructs like logic, rules, and ideas. It’s also a kind of tacit admission that Mother Nature is way better at solving some kinds of problems than programmers are.

We can combine these neurons into patterns that accomplish various goals, such as:

  • Computer vision: Drone inspection of solar panel damage
  • Time-series models: Identifying patterns in sequential data, such as temperature, vibration data, or circuit voltage to detect problems
  • Natural language processing: Document classification, sentiment analysis, computer code generation
  • Object Detection and segmentation: Automated meter reading, vegetation management, safety compliance monitoring
  • Resolution enhancement: Improving low quality images, such as old data or satellite data
  • Optical Character Recognition: Loading old hand-written records into a database, equipment name-plate reading
  • Reinforcement Learning Agents: Electrical grid balancing, storage optimization, inventory management

Neural AI should be used in cases that play to its strengths, where the chances and costs of failure are low (if the AI misses a broken solar panel, it will likely catch it tomorrow and the cost is negligible; less so in medical contexts or power-grid operations, to say nothing of large-scale weapon systems).

The best way to look at neural AI is as a programming technique, like the wealth of symbolic AI that came before it and the even larger corpus of ordinary programming techniques.

Each of these brings its own strengths and weaknesses, and none of them is a silver bullet. Neural AI can do some things much faster than the others, and indeed can do some things that are basically impossible with the others.

Lost in the hype is that the reverse is also true. There is no circumstance in which it makes sense to write, say, a file system with neural AI. File systems must work 100% of the time, quickly and efficiently. This is antithetical to neural AI. The technologies are complementary and any successful implementation of any solution will likely involve some mix of these approaches.

Read more

AI Part 4: Large Language Models

Since LLMs like ChatGPT are the subject of most of the current hype and misinformation, we should examine what they are and what they are not. LLMs are an incomplete implementation of the heuristic/mid-brain; they work on the basis of lossy data storage and pattern matching.

Lossy is a technical term from the world of data compression: compression takes a large file and makes it smaller, that is, stores it more efficiently. Lossless compression throws no information away, so you can get back exactly what you put in. Lossy compression throws out information deemed unimportant by some metric; you can't get back exactly what you put in, but you can get a reasonable facsimile.

Movies are a good example of lossy compression; we get a great end-product by throwing away information we don't think the viewer will notice, resulting in video files that are 200x smaller and therefore 200x cheaper to distribute.
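The distinction can be shown in a few lines of Python. This is a toy illustration, not how video codecs actually work: `zlib` round-trips losslessly, while quantizing a list of numbers stands in for a lossy scheme:

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 100

# Lossless: decompressing returns exactly the original bytes.
packed = zlib.compress(data)
assert zlib.decompress(packed) == data
print(f"{len(data)} bytes -> {len(packed)} bytes, recovered exactly")

# Lossy (toy example): throw away low-order precision deemed "unimportant".
signal = [0.12, 0.47, 0.51, 0.98]
quantized = [round(x, 1) for x in signal]
# We get back a reasonable facsimile, not the original:
print(quantized)  # [0.1, 0.5, 0.5, 1.0]
```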

LLMs store a huge number of concepts in an extremely small space, in part by throwing away a huge amount of “superfluous” data and trying to reconstruct what is missing when it needs it.

A top-shelf model's size, as commonly used, is on the order of 800GB. That's roughly ten 4K movies, or all the text on Wikipedia in all languages about three times over.

Since such models are trained on data on the order of 1,000,000GB (1PB), that suggests a compression ratio of 1250:1 (1,000,000 ÷ 800), using a simple but misleading calculation.

Neural AI stores much more about the relationships and patterns in the data, or metadata, than about the actual data itself.

As such, the real compression ratio for factual data, and by extension how much detail is thrown away, is much higher: perhaps 10,000:1 or more. A simpler model, like the ones many people use for free, has a ratio roughly 100 times higher.
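The back-of-the-envelope numbers above, in code. The 8GB figure is a hypothetical stand-in for a small free-tier model, about 100x smaller than the top-shelf one:

```python
training_gb = 1_000_000   # ~1 PB of training data (order of magnitude)
big_model_gb = 800        # a top-shelf model
small_model_gb = 8        # hypothetical free-tier model, ~100x smaller

# The naive ratio for the big model; a 100x smaller model trained on
# similar data implies a naive ratio 100x higher.
print(f"{training_gb / big_model_gb:.0f}:1")     # 1250:1
print(f"{training_gb / small_model_gb:.0f}:1")   # 125000:1
```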

Humans do the same thing; we might store enough in our minds to recognize the Coca-Cola logo, but not enough to reproduce it from memory. We might remember the significance of a spreadsheet without memorizing all the numbers in it.

If we try to remember the specific numbers from the spreadsheet, we may be able to reconstruct the ones we can’t remember. But if we get it wrong, we’ll say something incorrect. LLMs will make the same kind of mistake, and this is one of the sources of hallucinations.

Read more

AI Part 3: Human intelligence, the model for AI

Cognitive psychologists recognize different modes of processing in human thinking: automatic, intuitive/heuristic and deliberate/analytical.

The automatic represents the functions that happen below the awareness of the person in question, such as most of the vision pipeline (we experience the end of that process, but not everything that goes into it), reflexes, pupil dilation, etc.

Note that heartbeat and many body-related functions are handled locally; your heart has its own simple neural network that causes it to beat; otherwise the hearts of quadriplegics would stop.

The automatic mode is sometimes called the "hind-brain" in colloquial usage, which maps onto the anatomical hind-brain, although imperfectly.

Intuitive/heuristic, our instinctive mode, is where most humans spend the overwhelming bulk of their lives, and represents the limit of brain function for the vast majority of other creatures.

It is a fast, reactive system that quickly searches our personally accumulated knowledge to match inputs, balancing the various results to arrive at the next action. It is extremely efficient, aggressively filtering out unimportant stimuli and responding only to changes in the remainder. It represents the part of our brain that governs instinct and intuition.

Read more

AI Part 2: Terminology

Here is a brief list of common terms used in AI with simple but accurate definitions. Be aware that while they have precise meanings, many of these terms are used interchangeably by authors on the internet. The list itself will be presented in two versions: topical and alphabetic. To aid…

Read more

AI Part 1: A Brief History of Artificial Intelligence

A very old mainframe from the 1950s

We are currently in the middle of the third AI boom; not the first, and likely not the last. It’s worth looking back at the road that brought us to this point because a) history has a well-known tendency to repeat itself and b) it will help place the current moment in perspective.

Early Days

The idea of artificial neurons was first proposed in 1943 by Walter Pitts and Warren McCulloch, partially based on Alan Turing's groundbreaking work in the 1930s.

The first machine to make use of this novel idea was built in 1951. Marvin Minsky's SNARC used 40 vacuum-tube neurons and was capable of solving simple mazes. In that same year, computer programs were also learning to play checkers and chess.

The momentum accelerated in 1956 with the Dartmouth workshop, a meeting of like-minded academics interested in "thinking machines" hosted by John McCarthy (who coined the term "Artificial Intelligence"), which established AI as a formal discipline.

1956 also saw the development of the first AI program, "Logic Theorist". It independently solved 38 mathematical theorems from Principia Mathematica, an important math reference of the day, in one case doing so even more elegantly than the book's authors. One of those authors, Bertrand Russell, was reportedly delighted to hear he'd been upstaged by the first AI.

Amusingly, The Journal of Symbolic Logic rejected the creators' paper as lacking notability, somehow missing the fact that one of the proof's authors was not human.

Later that year, at another early AI event, George Miller said "I left the symposium with a conviction, more intuitive than rational, that experimental psychology, theoretical linguistics, and the computer simulation of cognitive processes were all pieces from a larger whole."

Those instincts were well placed. The events of 1956 also laid the groundwork for the emergence of cognitive science in the 1970s: the intersection of anthropology, linguistics, neuroscience, psychology, philosophy, and artificial intelligence.

Read more