
Understanding How AI LLMs Actually Work
It's important to understand the how and why when it comes to technology in order to make sound judgements, remain in control, and use our tools to their fullest potential.
I recently came across this video, 99% of Developers Don't Get LLMs, that, despite its click-bait title, makes a good point. It's a point I have been passionate about my whole life: we should not simply use tools, but also understand them. To be clear, I am not advocating for expert-level understanding of the inner workings of every device or piece of software we use in our personal and professional lives, but for a foundational knowledge. Below are a few unrelated examples from my own life showing how my innate curiosity has benefited me over the years.
When I was in 5th or 6th grade in the late '90s, I began upgrading and replacing parts in a Gateway desktop PC that was given to me, which quickly led to my first full custom build in the early '00s. As I researched parts before making purchase decisions, I gravitated to review sites such as Tom's Hardware, AnandTech, TweakTown and others that went into architectural deep dives on each component and mapped its performance back to architectural decisions. This gave me a solid understanding of not just the parts and their relative performance characteristics, but also why their performance sat where it did, the design trade-offs made along the way, and computer architecture more broadly. That knowledge let me make much more informed decisions for my use cases (as a budding 3D artist in high school, I had some odd needs for your typical teenager).
As another (perhaps more relatable) example, when I started driving, I read a few books on the mechanical workings of cars and always kept the repair manual for the cars I owned on hand. I typically changed my own oil and performed my own brake pad swaps, spark plug replacements, and brake disc replacements. These are all relatively straightforward jobs, and they saved me a large amount of money over time. They also gave me a lot of insight when talking with mechanics about the larger items I wasn't equipped to fix myself, and the ability to understand the complexity, labor, and other factors behind their estimates.
The two examples above illustrate some of the advantages I've had in life from carrying a more than surface-deep understanding of the tools I use day-to-day. My curiosity about the inner workings of things extends to almost every corner of my life, and it has nearly always paid for itself in the long run.
When it comes to my use of AI as a developer and professional, this is no different. When we begin to understand what LLMs and the transformer models they are built on actually do under the hood, a lot becomes demystified. When we internalize the idea of token prediction and realize that AI isn't really answering our questions, but instead completing our thoughts, filling in the blanks, and predicting what the next words should probably look like after the ones we prompted, we can begin to understand the why behind prompt engineering. We can begin to make educated adjustments to our use of the tool. We can (and very much should) realize that AI in its current form is not actually intelligent; it is recognizing the patterns in our prompts, drawing on its training data to see what similar structures look like, and using probability and learned weights to fill in the blank we left it.
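To make that token-prediction idea concrete, here is a minimal sketch using the Hugging Face transformers library and the small GPT-2 model (my choice of prompt and model here is purely illustrative). It doesn't "answer" anything; it simply scores which token is most likely to come next after the text we gave it.

```python
# Minimal sketch: an LLM as a next-token predictor (assumes torch and
# transformers are installed; GPT-2 is used only because it is small).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(**inputs).logits

# Turn the scores at the last position into a probability distribution
# over the *next* token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Print the five most probable continuations and their probabilities.
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

Everything a chat model does, from answering questions to writing code, is built by repeating this step: pick a likely next token, append it, and predict again.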
I strongly encourage personal learning and growth in everyone I speak with on the subject, especially given how quickly our world is changing and evolving. Learning and deeper understanding MUST be an integral part of all of our lives if we want to keep up, stay relevant, and, not least of all, grow as individuals. Knowledge truly is power, and AI isn't going to change that principle - maybe just the substance.