
C Chose the Wrong Origin: Why 1-Based Indexing Is Better Language Design


The first element is the first element, not the zeroth element.

For decades, we have been told by systems programmers that counting from zero is "closer to the metal," "mathematically purer," and the definitive mark of a "real programmer." Languages that use 1-based indexing—like Julia, R, MATLAB, and Fortran—are often patronized as quirky tools for mathematicians who just "don't understand computer science."

I call bullshit.

0-based indexing is not a fundamental law of computing. It is an implementation leak. The first item in a sequence becomes the "zeroth" element only when you choose to describe it by its displacement from a memory address. While that perspective is perfectly valid for pointer arithmetic, buffers, and low-level boundaries, it is fundamentally the wrong way to name human-facing objects.

What started as a bare-metal address-offset convention in C was wrongfully elevated into a universal philosophy of language design. By forcing humans to adopt this machine-centric view for high-level mathematical modeling, the software industry has condemned generations of developers to a permanent, exhausting cognitive friction.

It is time to tear down this illusion. Let's start from the very beginning: how human beings actually perceive the world.

The Elementary School Reality Check

Before we even touch a keyboard, every single human being learns to count objects starting from 1. This isn't just an arbitrary childhood habit; it is how counting itself works: to count n objects is to pair them with the numbers 1 through n.

Let's look at a classic elementary school math problem—the "Tree-Planting Problem." Imagine a row of trees:

O O O O O O O O
1 2 3 4 5 6 7 8

Question: How many trees are there from the 3rd tree to the 7th tree, inclusive?

Any second-grader will tell you the answer: 7 - 3 + 1 = 5.

The formula for the number of elements in a closed interval [i, j] is j - i + 1. This formula is hardwired into human intuition because it accurately describes the physical reality of discrete objects.
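The two interval conventions can be put side by side. This is a minimal sketch in Python; the function names `closed_count` and `half_open_count` are illustrative and do not come from the original text.

```python
def closed_count(i, j):
    """Number of integers in the closed interval [i, j]."""
    return j - i + 1


def half_open_count(i, j):
    """Number of integers in the half-open interval [i, j)."""
    return j - i


# Trees 3 through 7, inclusive, described both ways.
trees_closed = closed_count(3, 7)        # [3, 7]
trees_half_open = half_open_count(3, 8)  # [3, 8) names the same trees
print(trees_closed, trees_half_open)     # prints: 5 5
```

Both calls describe the same five trees. What changes is where the adjustment lives: in the formula for the closed form, in the endpoint for the half-open form.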

Yet, the 0-based zealots will look at this and complain: "Ugh, that +1 is so ugly and error-prone! We should use half-open intervals [i, j) and 0-based indexing so the length is just j - i!"

This is where the delusion begins. To avoid typing a simple +1—a mathematical truth of closed intervals—0-based languages force you to shift your entire cognitive model of the universe. They demand that the first tree be called the "zeroth tree." They sacrifice the natural naming of physical entities just to make the arithmetic of interval boundaries look marginally cleaner.

Trading a massive, global cognitive distortion for a microscopic local algebraic convenience is not good engineering. It is anti-human.

Objects are Not Offsets

The central distinction that the 0-based camp refuses to acknowledge is this:

1-based indexing names objects. 0-based indexing names displacements.

When humans refer to physical or mathematical entities, we use ordinal positions: the first child, the first employee, the first row of a matrix, the first equation in a dynamic system. Nobody naturally says "the zeroth apple" or "the zeroth physical unit."

Of course, a programming language can redefine reality. A language can dictate anything. But forcing a definition that violently conflicts with human cognition is the hallmark of poor language design.

Consider a simple sequence:

Elements: A B C
Object Index: 1 2 3
Offset: 0 1 2

Both integer rows describe something real, but they answer entirely different questions. The object index answers: "Which object is this?" The offset answers: "How far is this object from the chosen origin?"

A good language design should not pretend they are the same thing. The first element has a displacement of 0 from the beginning. That is a physical truth of memory layout. But it is still the first element. Confusing these two facts is the original sin of 0-based indexing.
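As a small illustration, the table above can be generated so that both integers stay visible as separate values. This is only a sketch in Python, and the variable names are assumptions made for the example.

```python
elements = ["A", "B", "C"]

# Pair every element with both integers from the table above.
rows = [
    (element, offset + 1, offset)  # (element, object index, offset)
    for offset, element in enumerate(elements)
]

for element, object_index, offset in rows:
    print(f"{element}: object {object_index}, offset {offset}")
```

The two columns differ by exactly one everywhere, which is why it is tempting to treat them as the same number.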

The first object is not an offset. It is an object.

The Semantic Theft: Offset is NOT Identity

The root of this madness lies in the C programming language and its pointer arithmetic:

a[i] == *(a + i)

In C, i is not an "index" in the mathematical sense. It is an offset (displacement). It answers the question: "How many memory strides away are we from the base address?" The first element is 0 strides away, hence a[0].
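The stride arithmetic behind that identity can be modeled without writing any C. The sketch below uses a Python byte string as a toy memory; `ELEMENT_SIZE`, `load`, and `c_style_index` are hypothetical names, and the 4-byte element size is an assumption of the example.

```python
import struct

ELEMENT_SIZE = 4  # bytes per 32-bit integer (assumed for this sketch)

# Toy memory: 8 unrelated bytes, then three little-endian int32 values.
memory = bytes(8) + struct.pack("<3i", 10, 20, 30)
base_address = 8  # byte address of the first array element


def load(address):
    """Read one int32 from the toy memory, like dereferencing a pointer."""
    return struct.unpack_from("<i", memory, address)[0]


def c_style_index(base, i):
    """Model of a[i] == *(a + i): i counts strides from the base address."""
    return load(base + i * ELEMENT_SIZE)


print(c_style_index(base_address, 0))  # first element, zero strides away: 10
```

Here `i` never names an element directly. It is multiplied by the stride and added to an address, which is exactly the displacement reading described above.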

As a low-level machine implementation, this is completely logical. But here is the critical distinction:

Offset is an implementation perspective. Ordinal (1st, 2nd, 3rd) is a user-facing identity.

When you write a mathematical formula like $\sum_{i=1}^{N} x_i$, $i$ represents the identity of the variable. It answers the question: "Which specific object is this?"

0-based languages commit the ultimate semantic theft: they take a machine-level offset coordinate and violently force it onto the user as an object identity. They force you to run a permanent, non-stop "translation daemon" in the back of your mind. Every time you want the $k$-th element, you must subconsciously calculate k - 1.

This is not a "frictionless" programming experience. This is cognitive labor. You are manually doing the translation that the compiler was invented to do for you.

Machines are built to serve humans, not the other way around. If my mathematical model starts at 1, the language should start at 1.

The Dijkstra Fallacy: Counting Gaps vs. Counting Objects

So, how did this semantic theft become a religious dogma? Whenever you attack 0-based indexing, someone will inevitably quote Edsger W. Dijkstra's famous manuscript, "Why numbering should start at zero" (EWD831).

Dijkstra argued that using half-open intervals 0 <= i < N is the cleanest way to represent sequences, as the length is simply N - 0 = N, and adjacent sequences share a clean boundary.

Let's dismantle this sacred cow. Dijkstra was an absolute genius, but in this specific argument, he committed a massive semantic theft: he conflated objects with boundaries.

Look at a sequence of elements:

Elements: A B C D E

If you are slicing an array, inserting an element, or calculating a prefix sum, you are dealing with boundaries, or gaps:

Boundaries: 0 1 2 3 4 5
| A | B | C | D | E |

Yes, boundaries start at 0. The gap before the first element is gap 0. Prefix lengths start at 0. Displacements start at 0. Dijkstra perfectly optimized the algebra of boundaries.
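The relationship between the two numberings can be made concrete. In this Python sketch, `between` and `boundaries_of` are illustrative helper names, not anything from Dijkstra's note.

```python
elements = ["A", "B", "C", "D", "E"]

# Six boundaries (0 through 5) fence five elements.
boundaries = list(range(len(elements) + 1))


def between(lo, hi):
    """Elements lying between boundary lo and boundary hi."""
    return elements[lo:hi]


def boundaries_of(k):
    """The k-th element (counting from 1) sits between boundaries k - 1 and k."""
    return (k - 1, k)


print(between(1, 4))     # ['B', 'C', 'D']
print(boundaries_of(1))  # (0, 1)
```

Python's slice endpoints are boundary numbers, which is why `between` needs no adjustment. The ordinal `k` only meets the boundary numbering inside `boundaries_of`.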

But a programming array is primarily a collection of elements, not just boundaries. Element A is the 1st object. Element B is the 2nd object.

The fact that an algorithm has a zero state does not mean the data objects inside that algorithm should be zero-indexed.

Dijkstra's fatal flaw was forcing the numbering system of boundaries onto the objects themselves. Just because the boundary before A is 0, it does not mean the object A itself should be named 0. You cannot rename a discrete physical object just because it makes your slicing syntax slightly prettier.

The (0, 0) Grid: Machine Thinking in Disguise

The desire to unify everything under 0-based coordinates leads to the modern nightmare of LeetCode grid problems: forcing developers to map a 2D physical grid starting from (0,0). A matrix cell is an object, not a displacement vector. Calling the upper-left cell (0,0) is not mathematics; it is storage layout leaking into the problem statement, violently bullying human intuition.

The False Unity of 0-Based Coordinates

Another defense of 0-based indexing is that it creates one unified coordinate system:

element index
boundary index
prefix length
storage offset
slice endpoint

can all be represented with the same integers if indexing starts from 0. This supposedly reduces conversions and therefore reduces mistakes.

This sounds plausible only if one has already accepted the offset worldview. From the perspective of object-centered programming, it is not simplification. It is semantic compression with hidden costs.

What looks like unity is not true semantic unity. It is flattening different meanings into one number system.

The price of this flattening is that object identity gets distorted. The first element becomes 0 so that boundaries, offsets, and prefix lengths can look cleaner.

Even in low-level buffer manipulation, this should be understood as an implementation convention, not as a superior human-facing indexing model. It is certainly not a universal language design principle.

0-based indexing does not eliminate off-by-one reasoning. It relocates it. It reduces some boundary errors by creating object-identity errors.

In a 1-based model, we may still write transformations such as:

physical_offset = object_index - 1

That is fine. The point is not to make -1 disappear from the universe. The point is to put the cognitive burden in the right place.

If we are dealing with implementation offsets, let offset conversion happen there. If we are dealing with human-facing object indices, the first object should be 1.
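One way to picture "putting the conversion in the right place" is a thin wrapper that performs the subtraction at a single site. The class `OneBased` below is a hypothetical illustration in Python, not an existing library type, and it sketches only read access.

```python
class OneBased:
    """Hypothetical read-only wrapper exposing 1-based object indices."""

    def __init__(self, items):
        self._items = list(items)  # stored with ordinary 0-based offsets

    def __len__(self):
        return len(self._items)

    def __getitem__(self, object_index):
        if not 1 <= object_index <= len(self._items):
            raise IndexError(f"object index {object_index} out of range")
        physical_offset = object_index - 1  # the only conversion site
        return self._items[physical_offset]


x = OneBased([2.5, 4.0, 1.5])

# Direct transcription of the sum of x_i for i = 1..N, with N = len(x).
total = sum(x[i] for i in range(1, len(x) + 1))
print(x[1], total)  # prints: 2.5 8.0
```

Calling code never writes `- 1`; the subtraction exists once, inside `__getitem__`. A production version would also need slicing, assignment, and iteration, which this sketch leaves out.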

A notation that is locally elegant for slicing but globally hostile to object naming is not good language design.

The Myth of the "Extra Subtraction"

When defenders of 0-based indexing are backed into a corner, they almost always resort to the classic performance argument: "1-based indexing is inefficient. If you start at 1, the compiler has to waste CPU cycles doing *(base + i - 1) for every array access!"

This argument is incredibly persistent, but it is fundamentally flawed. It exposes a profound lack of imagination regarding how programming language compilers and object models actually work.

The critique *(base + i - 1) assumes that the logical origin of an array must be the physical address of its very first element. Yes, if you blindly force a 1-based index into a 0-based origin model, you get a -1 penalty.

But why should we accept the 0-based origin model as absolute truth?

Enter the Virtual Origin Shift.

A modern, intelligently designed 1-based language does not patch a 0-based system; it redefines the origin. Instead of anchoring the array to the first physical element, the compiler can simply set the logical origin (the base pointer) to the position of a one-before-begin sentinel.

Physical memory:   [ element_1 ] [ element_2 ] [ element_3 ]
                   ^
                   physical_base

Logical origin:    virtual_base = physical_base - 1    (one element earlier, computed once per array)

Memory access:     A[i] = *(virtual_base + i)    for i = 1, 2, ...    (pointer arithmetic in element units)

Look at that. There is no i - 1. There is no performance penalty. There is no extra subtraction at runtime. The memory offset calculation for 1-based indexing (origin + i) is exactly as elegant and direct as it is for 0-based indexing.
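The address arithmetic can be checked with a toy memory model. This Python sketch assumes 8-byte elements and uses hypothetical names (`load`, `one_based`, `zero_based`). It models what generated machine code might do and is not a description of any particular compiler.

```python
import struct

ELEMENT_SIZE = 8  # bytes per float64 (assumed for this sketch)

# Toy memory: 16 bytes of padding, then three float64 elements.
memory = bytes(16) + struct.pack("<3d", 1.5, 2.5, 3.5)
physical_base = 16                           # address of element_1
virtual_base = physical_base - ELEMENT_SIZE  # shifted once, not per access


def load(address):
    """Read one float64 from the toy memory."""
    return struct.unpack_from("<d", memory, address)[0]


def one_based(i):
    """A[i] for i = 1, 2, ...: origin plus scaled index, no i - 1."""
    return load(virtual_base + i * ELEMENT_SIZE)


def zero_based(i):
    """a[i] for i = 0, 1, ...: the conventional form, for comparison."""
    return load(physical_base + i * ELEMENT_SIZE)


print(one_based(1), zero_based(0))  # prints: 1.5 1.5
```

Both accessors do one multiply and one add per access. The only difference is which constant serves as the origin.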

C standardizes the "one-past-end" pointer to make 0-based half-open ranges neat. A 1-based language can lean on a "one-before-begin" virtual origin in the code its compiler generates. (C source code itself may not form a pointer before the start of an array, but that is a rule of the C language, not of the hardware.) Both models are perfectly coherent at the hardware level. The difference is that the 1-based model hides the offset translation inside the compiler, leaving the human user with a clean, mathematically accurate interface.

0-based indexing is not the "natural consequence of memory." It is the natural consequence of choosing the wrong origin and forcing the human to do the compiler's job.

The Illusion of "Off-by-One" Prevention

Another desperate defense is that 0-based indexing prevents "off-by-one" errors (OBOEs). The claim is that half-open intervals [0, N) magically solve boundary mistakes.

Let's be brutally honest: 0-based indexing does not eliminate off-by-one errors. It merely relocates them.

It reduces some errors when slicing buffers, but it creates a massive, permanent source of errors in object identity and algorithm design. When you are writing a complex Dynamic Programming (DP) algorithm, or implementing a mathematical formula from a research paper, you are inherently dealing with $i$-th states and $j$-th characters.

In a 0-based language, if state i represents the first i characters, but the $i$-th character itself is at string[i-1], you have engineered a cognitive nightmare. The code becomes littered with s[i-1] patches. This isn't just "writing a few more characters"; this is severe semantic misalignment.
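The pattern is easy to reproduce. The sketch below uses longest common subsequence as the example; the choice of algorithm and both function names are assumptions made for illustration. The first version shows the `s[i - 1]` patches. The second pads each string so the $i$-th character sits at position `i`, a workaround sometimes used in 0-based languages.

```python
def lcs_length(s, t):
    """LCS length; dp[i][j] covers the first i and first j characters."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # State i means "first i characters", yet the i-th character
            # lives at offset i - 1.
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcs_length_padded(s, t):
    """Same recurrence, padded so the i-th character is at s1[i]."""
    s1, t1 = " " + s, " " + t  # position 0 is unused filler
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i] == t1[j]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


print(lcs_length("ABCBDAB", "BDCABA"))         # prints: 4
print(lcs_length_padded("ABCBDAB", "BDCABA"))  # prints: 4
```

In the padded version the character lookups use the same `i` and `j` as the DP state, at the cost of one wasted slot per string.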

When a language design forces you to constantly doubt whether i means "the $i$-th object" or "an offset of $i$", it breeds paranoia. In 1-based languages, if the math says $A_{i,j}$, the code says A[i, j]. The boundary of the physical entity perfectly aligns with the logical boundary.

Conclusion: Reclaiming Human Intuition

The debate between 0-based and 1-based indexing is not just about syntax. It is about who yields to whom in the software engineering hierarchy.

The 0-based paradigm demands that the human mind bends to accommodate the machine's memory layout. It wraps this compromise in the guise of "elegance" and "professionalism," punishing those who rely on natural human counting.

1-based indexing, when implemented with modern compiler architecture, proves that we do not have to compromise. We can have bare-metal, C-level performance without sacrificing the purity of mathematical modeling.

It is time to stop pretending that 0 is the start of a sequence of objects. Zero is a count. Zero is an offset. Zero is empty.

But the first element? The first element is exactly that: 1.

If we truly want to reduce cognitive load and rediscover the pure joy of programming, it begins with choosing the right origin. It begins by counting from 1.

Epilogue: The Psychology of the 0-Based Programmers

If the mathematical and architectural arguments for 1-based indexing are so robust, why does the mainstream programming community defend 0-based indexing with such religious fervor?

The answer does not lie in computer science. It lies in social psychology. The relentless defense of 0-based indexing is a textbook manifestation of two well-documented psychological phenomena: Cognitive Dissonance and the Sweet Lemon Effect.

1. Cognitive Dissonance: The Sunk Cost of Brain Rewiring

Every Computer Science student and IT professional has gone through the painful, counter-intuitive process of rewiring their natural human cognition to adapt to the machine's 0-based memory layout. It took years of debugging Index Out of Bounds errors and mentally calculating i - 1 to finally make this unnatural habit feel "automatic."

When you present these developers with the fact that 1-based indexing is a superior, zero-friction design for object modeling, it triggers massive Cognitive Dissonance.

To accept that 1-based indexing is better is to accept a devastating truth: The immense mental effort they spent over the last decade adapting to the machine was largely unnecessary. It means admitting that their "hard-earned programming intuition" is not a mark of high intelligence, but merely a scar left by a 1970s hardware compromise.

Human ego inherently rejects this. To protect their self-worth and justify their sunk costs, they instinctively lash out. They must fiercely defend 0-based indexing, not because it is mathematically correct, but to convince themselves that their suffering had meaning.

2. The Sweet Lemon Effect: Rationalizing the Matrix

The second psychological barrier is the brutal reality of the job market.

Deep down, many smart engineers know that 1-based languages (like Julia) offer a far more elegant and mathematically pure experience. But they also know they are trapped. The corporate world, the tech giants, and the global job market are completely dominated by C, Java, Python, and JavaScript. They are chained to the 0-based monolith to pay their bills and advance their careers.

When you cannot have what is good, the mind protects itself by deciding that what you do have is actually exactly what you wanted all along. This is the Sweet Lemon Effect (the inverse of sour grapes).

Instead of facing the depressing reality that they are just digital laborers forced to use suboptimal, legacy tools dictated by corporate momentum, they deceive themselves. They hallucinate non-existent mathematical elegance in 0-based slicing. They convince themselves the "lemon" is sweet. They chant "0-based is closer to the metal" to mask the bitter taste of having no choice in the matter.

The defense of 0-based indexing is rarely about logic anymore. It is an industry-wide coping mechanism.

Whether you are building a high-level mathematical model, a web backend, or a low-level memory buffer, demanding 1-based indexing is not just a syntactic preference. You are rejecting the cognitive labor imposed by a 1970s hardware constraint. You are demanding that the language design enforces a fundamental separation of concerns: the compiler handles the offsets, and the human handles the objects.

And objects start at 1.