
AIFP Part I.2: A framework to (start) understanding any AI system.

  • Writer: Jinghong Chen
  • Feb 2
  • 10 min read

In the previous article, we established the following understanding about AI:


An AI system is a system that learns from data.

This understanding is foundational: it is the root of our AI knowledge tree. Now we can comfortably move on to the next question: given a specific AI system, how do I understand it systematically? This question is incredibly important for change-makers, because the answer would enable you to start doing your own research on any AI system. That is exactly the aim of this article. Let's get started!


"The Trinity": Three Things to Understand AI


Many complex disciplines revolve around three fundamental subjects. For example, Physics is concerned with Space, Time, and Matter. Geometric constructs consist of Point, Line, and Plane. The modes of persuasion in Aristotle's terms are Logos (Logic), Ethos (Speaker's Character), and Pathos (Emotions). This structure of "Trinity" seems to be universal in understanding complex subjects.


Luckily, this "Rule of Three" also applies to AI. If you cut away all the clutter and strip the subject of AI to its bare bones, you get three things: Model, Learning, and Data. Let's take a closer look at each of them.


The "Trinity of AI", inspired by the Holy Trinity in Christianity: Model is not AI; Learning is not AI; Data is not AI; But AI exists in Model; AI exists in Learning; AI exists in Data.


"Model": an Architecture of Parameters


Writing this in 2025, I would imagine you have heard a lot about "models". You would have heard that some company has come up with "stronger" AI models that can do many new things, more "efficient" AI models that are cheaper to run, and more "specialized" AI models that are built for specific tasks... Models seem to be the protagonist in the AI business. How would our AI knowledge tree, which is rooted in learning, accommodate this fact?


If you think about the process of learning, it is clear that there must be some form of "memory" to store what has been learned. Learning is impossible if there's no place to store away what's learned. Another prerequisite is that you must be able to update and access that memory, because learning is pointless if you don't make use of what's learned. For human beings, the device that can update and access memory is called a Brain. For artificial intelligence, the equivalent is called a Model.


Why the term "Model"? I believe this reflects a fundamental assumption about learning: that the world can never be fully understood. The best we can do is to come up with simplified models that approximate the real world. Another benefit of the term "Model" is that it brings up an intuitive mental image: think about how you would build a sand model of the Eiffel Tower on a beach. You would take out your phone, find an image of the Eiffel Tower, and try to shape the sand so that it maximally resembles the real thing. That's exactly what AI models do in learning: representing the real world with the best precision using what's available. Although you know at the outset that you will never get the true Eiffel Tower, the sand model can nevertheless be a spectacular view. The same is true for AI.


What does a model contain? For practical reasons, we will focus on digital computer-based models (i.e., we are concerned with "Machine Intelligence"). If you are not familiar with computers, know that they can only store two things: instructions (i.e., how to operate on numbers) and data (i.e., numbers). It follows that models living on computers can only have two fundamental components, and they are called architecture (i.e., how to operate on numbers) and parameters (i.e., numbers).


To give you a concrete example, consider a model of only two parameters, a=0.9 and b=5. The model takes one input number x=90 (say, your mark for the previous maths exam) and gives one output number y (say, a prediction for your next maths exam). This information alone doesn't fully specify the model: you also have to define how to compute with the parameters a, b and the input x. For example, y = ax + b = 86. This equation is the architecture of the model.
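Written as a few lines of code, this whole toy model looks like the following (the numbers are the ones from the example above):

```python
# Parameters: the numbers the model stores.
a = 0.9
b = 5

# Architecture: the equation that says how to combine
# the parameters with the input.
def model(x):
    return a * x + b

# Input: your mark on the previous maths exam.
print(model(90))  # -> 86.0
```

Everything the model "knows" lives in those two numbers; the function merely says how to use them.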


We use the term "architecture" because modern models often consist of a set of basic, repetitive equations, known as "layers". When you run the model, you pass the input through each layer (i.e., set of equations), get the output, and pass that output on to the next layer. Your final model will be the composite of a bunch of layers, much like a building built floor by floor.
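To see why "floor by floor" is a fitting image, here is a toy sketch. The layer equation below is a made-up placeholder, not a real neural-network layer; the point is only the composition:

```python
# One "layer": a small, fixed equation. Real layers are far more
# elaborate, but the stacking principle is the same.
def layer(x):
    return 2 * x + 1

# The model: the same layer applied repeatedly, so the output of
# one floor becomes the input to the next.
def run_model(x, n_layers=3):
    for _ in range(n_layers):
        x = layer(x)
    return x

print(run_model(1))  # 1 -> 3 -> 7 -> 15
```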


The architecture of BERT that rocked the NLP world. The colored blocks, layers, are essentially sets of pre-defined equations. The "Nx" to the left and the right of the image indicates that these blocks are stacked together N times to form the final model. You don't need to know all about the layers to appreciate why "architecture" is a good name.

So now we have met the protagonist in AI: the model. It is simply a set of numbers (parameters) arranged in a set of equations (architecture). Put poetically, a model is an architecture of parameters. You may have heard about models with more than 600 billion parameters. That simply means there are 600B numbers in place to run the equations of the model. I admit 600B is an impressive number. If visualized, it could be as spectacular as the greatest architecture mankind has ever built.


"Data": Show, don't Tell.


Before we get to the core of AI, "Learn", the verb, we need to understand the subject ("Model") and the object ("Data"). So let's look at data next.


For adults, the most widely used data from which we learn are books. If you want to become an expert, you would spend years essentially reading about the subject. This is incredibly efficient. Books condense knowledge hard-earned through decades of effort into texts that can be understood in a matter of weeks. It is one of the most efficient ways to pass on knowledge. Is this way of learning applicable to machines? In other words, if we have a specific problem, can we simply find all the related books, feed them to an AI, and hope that it will learn to solve the problem?


In most cases, no, because reading has two major limitations: (1) Reading often provides indirect information about how to perform a task. For example, many books on architecture design can be purely philosophical about the design process and contain little information about what great architects were given as inputs (budgets, plans, practical constraints) and what they delivered as results (the actual buildings they designed). You can easily imagine someone who is well-read on design philosophies but is still a bad designer. (2) You can only learn what's been written before. But as we explained in Part I.1, even the very first AI was designed to learn something yet unknown to human experts. Put another way, you can never fundamentally surpass your teachers if you learn only from them. What's worse, what if the problem is new and nobody has ever written on it before? And so we need to look beyond books as our data.


The principle of choosing data for AI learning is "Show, don't Tell". Instead of telling AI what to do step by step, the better way is to let it explore and learn from first-hand inputs and outputs. We will see in Part I.5 that this philosophy of maximal learning freedom has been the driving force of AI's success: the less we constrain AI with pre-existing knowledge, and the more AI learns from first-hand data, the better it becomes.


Here's some historical evidence: in developing AIs to generate natural languages (e.g., English), people started with rule-based systems that broke down a sentence into parts of speech (nouns, verbs, adjectives, etc.) and tried to teach machines to rearrange them into grammatical sentences. That did not work well. Then statistical methods, which discarded expert-crafted rules and treated language as nothing more than strings with patterns, made machine translation possible for the first time. Now, as I write in the era of deep learning, assumptions about how strings should be modeled are further simplified. We now take almost the same general approach to model languages as we do to model numbers, speech, and images. That's the power of assuming minimally in the context of AI technology.


So we know that data should be as "raw" as possible. In principle, we should provide no more than the inputs and the outputs for the task of interest, and assume minimally about what should be learned.


"Learning": turn the knobs and see what you get


So far, we know that Model is an architecture of parameters (i.e., numbers arranged by equations) and Data is a set of inputs and corresponding desirable outputs. Now we are in a position to discuss Learning. Again, let's try to understand it via an intuitive example.


Imagine you, a junior pastry chef, have come to patisserie school in Paris to learn baking. Your first lesson is on using the oven to bake common pastries, specifically, setting the right temperature and duration. As course material, your instructor has provided various types of doughs ready for the oven (e.g., buttered croissant doughs) and the reference final pastry for each dough (i.e., a nicely baked croissant). If you use the oven correctly, you should get something similar to the reference pastry. Your instructor believes in learning by doing and gives you plenty of doughs to experiment with, but he leaves you with no further instructions. The exam is in three days, when you will be given a few doughs and expected to bake them by setting the correct oven temperature and duration accordingly. Now, what would you do to pass the exam?


Here's what I would do: I would take a dough, turn the knobs of the oven to set temperature and time, wait, take out the baked pastry, and compare it with the reference pastry (perhaps by smelling and tasting, if it's edible at all). If my product is close to the reference, I learn that the temperature and time settings are good, and I put them down in my little notebook so I can potentially re-use them. If my product is far off, I learn that the settings are not good, but I also take notes so I know what not to do. Either way, I will learn something from the process. I can repeat this many times for each type of pastry until I'm satisfied with my baking skills.


I imagine your solution isn't too far from this. The baking example is remarkably similar to our AI learning scenario: the oven (architecture) and the knobs (parameters) form our model; the doughs (inputs) and the reference pastries (outputs) are our data; and our learning algorithm, at its core, contains only four basic steps: bake, taste, note, repeat. This corresponds to the four fundamental steps in AI algorithms: forward, evaluate, update, and repeat. Let me explain the jargon:

  • Forward (bake): means passing the input through the model to obtain the output under the current parameters. This is like running the oven to bake a pastry.

  • Evaluate (taste): after obtaining the model's output, we evaluate it against the reference using some measurable metric (e.g., correctness of the output). We conventionally call this quantity the "loss". In our example, it's like tasting our pastry. The further off the taste is from the reference, the higher this "loss" value should be. The goal of learning is to minimize this loss.

  • Update (note): at this point, we have some idea of how good our current model is, because we have baked and tasted. It's time to revise our model accordingly. In the baking example, we revise by adding items to our notebook, with the idea that we would consult these notes when we next bake. In AI, you would want to update the parameters of your model in some way so that the change is reflected in the next Forward step.

  • Repeat: it is self-evident that it takes a fair amount of trial and error before we can learn something useful. Additionally, we would want to look at different kinds of pastry, because you are not going to bake croissants all the time, but also pains au chocolat and palmiers. For AI, this means that the system should perform the Forward-Evaluate-Update cycle on a range of data so that it can cope with most situations of interest.
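The four steps can be put together in a short sketch. To keep it minimal, I use the two-knob model from earlier, three made-up exam-score pairs, and the simplest conceivable update rule: nudge a random knob and keep the change only if the loss drops. Real systems use far more systematic updates, but the cycle is the same:

```python
import random

random.seed(0)  # make the run reproducible

# Data: input-output pairs (previous mark -> next mark), made up
# for illustration.
data = [(90, 86), (70, 68), (50, 50)]

# Model: two parameters (the knobs) and a linear architecture.
params = {"a": 0.0, "b": 0.0}

def forward(x, p):
    """Forward: run the model on an input."""
    return p["a"] * x + p["b"]

def evaluate(p):
    """Evaluate: average squared error over the data (the 'loss')."""
    return sum((forward(x, p) - y) ** 2 for x, y in data) / len(data)

def update(p):
    """Update: nudge one random knob; keep the nudge only if it helps."""
    q = dict(p)
    knob = random.choice(["a", "b"])
    q[knob] += random.uniform(-0.1, 0.1)
    return q if evaluate(q) < evaluate(p) else p

# Repeat: many rounds of bake-taste-note.
for step in range(5000):
    params = update(params)

print(evaluate(params))  # a small number: the knobs now fit the data well
```

Notice that the loop never needs to be told *why* a setting works; it only needs data and a way to score itself, which is exactly "Show, don't Tell".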


Realistically, the model would contain millions of parameters (it's a huge oven with countless knobs), and so you would ask a computer program to turn the knobs for you in some systematic manner. I will cover these details when we get to Part II, Technology. But you already have the core idea: AI learning is like learning to bake with an eccentric teacher who leaves no instructions but loads of raw materials for you to try out yourself.


The learning process is also referred to as "training", which is the more frequently used term as of 2025. The difference is purely one of perspective: when we say "learning", the acting subject is the AI system, as in "AI learns from data"; when we say "training", the acting subject is us humans, as in "We train AI on this data". I would argue that "training" has a misleading implication that we knew what we wanted in the first place and simply trained AI to do it. Well, we don't. Just think about how much AlphaGo and ChatGPT surprised us in playing Go and in writing. But you can see why humans prefer this terminology: most people want to have a sense of control.


Apply the "Trinity" to the Real World


We've covered a lot of ground today. Let's consolidate before we move on. The "Trinity" of an AI system is:

  • Model: consists of an architecture (equations) and parameters (numbers).

  • Data: consists of input-output pairs for the particular task.

  • Learning: consists of four basic steps: "Forward", "Evaluate", "Update", and "Repeat".


This understanding should allow you to start researching real-world AI systems. For example, you can start to understand "GPT-3 has 175 billion parameters, each with 16-bit precision, requiring 350GB of storage since each parameter occupies 2 bytes." It is talking about the Model: how many knobs this giant "oven" has and how much computer storage is needed. Although we haven't talked about language model pretraining, you could guess that the statement "Sixty percent of the weighted pre-training dataset for GPT-3 comes from a filtered version of Common Crawl" is talking about the Data, and you could follow the Common Crawl link to find out what exactly the input-output pairs look like. The point is, anything you hear about AI ultimately relates to one of the three elements in the AI Trinity, and by now you should know which.
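As a quick sanity check, the storage arithmetic in that GPT-3 statement works out in one line:

```python
# Each 16-bit parameter occupies 2 bytes.
n_params = 175_000_000_000   # 175 billion parameters
bytes_per_param = 2          # 16 bits = 2 bytes
total_bytes = n_params * bytes_per_param
print(total_bytes / 1e9)     # -> 350.0 (i.e., 350 GB)
```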


What's left to be learned? Well, plenty. Now you have a broad idea of what role each part plays, but what are the specific technologies and challenges in the AI Trinity? For example, what are the considerations in building the Model and in curating the Data? How exactly are we going to "update" the model during training? To fully address these questions, you would need more than an engineering degree. And that's not our plan. Instead, I will show you a minimal yet useful example that puts the Trinity together in action so you can appreciate for yourself why building AI systems can be hard. That's the topic of the next article. See you there!























 
 
 
