# Membership proofs and De Bruijn indices

by

```{-# OPTIONS --cubical #-}
module 2020-1-19-DeBruijnIndex where

open import Cubical.Data.Bool
open import Cubical.Data.List
open import Cubical.Data.Maybe
open import Cubical.Data.Prod
open import Cubical.Data.Sigma hiding (_≡_; _×_)
open import Cubical.Foundations.Function
open import Cubical.Relation.Nullary

open import Agda.Builtin.Equality
open import Agda.Builtin.Nat
open import Agda.Builtin.String

variable A : Set
```

This post requires basic knowledge on dependent type programming, in the sense that Idris people would prefer. Though I imported cubical libraries, I didn’t mean to use the cubical features they provide. I just need a working list definition and basic operations on an equality type.

# The “contains” relation

In a dependent type setting, we can define propositions as types. Like, the “contains” relation is a type:

```infix 4 _∈_
data _∈_ {A : Set} (a : A) : List A -> Set where
here  : (as : List A) -> a ∈ a ∷ as
there : ∀ {as c} -> (a ∈ as) -> a ∈ c ∷ as
```

We can observe some trivial “contains” facts:

```_ : 1 ∈ 1 ∷ 3 ∷ []
_ = here (3 ∷ [])

_ : 1 ∈ 2 ∷ 1 ∷ []
_ = there (here [])

_ : 1 ∈ 2 ∷ 1 ∷ 3 ∷ []
_ = there (here (3 ∷ []))
```

We can see that `here` is the proof that a list contains its head, and `there` is the proof that if a list contains something, appending anything to the list head will still make the appended list contain the thing. We can generalize `there` to list concatenation:

```theres : ∀ {as bs : List A} {a} -> a ∈ bs -> a ∈ (as ++ bs)
theres {as = []} p = p
theres {as = _ ∷ _} p = there (theres p)

theres′ : ∀ {as bs : List A} {a} -> a ∈ as -> a ∈ (as ++ bs)
theres′ (here as) = here (as ++ _)
theres′ (there p) = there (theres′ p)
```

# Syntax with binding

One straightforward application of this “contains” property is “syntax with binding”, say, a syntax tree definition with binding (lambda abstraction and application) (we’re talking about simple syntax with binding here – no HOAS/PHOAS). We use names (say, strings) to represent references. First, let’s assume that `String` has decidable equality:

```decEqStr : (a b : String) -> Dec (a ≡ b)
decEqStr a b with primStringEquality a b
... | true = yes p where postulate p : _
... | false = no p where postulate p : _
```

The language will be typed, and will have a primitive type `nat` and the function type:

```infixr 7 _=>_
data Ty : Set where
nat  : Ty
_=>_ : Ty -> Ty -> Ty
variable a b c d : Ty
```

And equality on `Type` is decidable:

```argEq : a => b ≡ c => d -> (a ≡ c) × (b ≡ d)
argEq refl = refl , refl

decEqTy : (a b : Ty) -> Dec (a ≡ b)
decEqTy nat nat = yes refl
decEqTy nat (_ => _) = no (λ ())
decEqTy (a => b) nat = no (λ ())
decEqTy (a => b) (c => d) with decEqTy a c
... | no nrefl = no (nrefl ∘ proj₁ ∘ argEq)
... | yes refl with decEqTy b d
... | yes refl = yes refl
... | no nrefl = no (nrefl ∘ proj₂ ∘ argEq)
```

Let’s define `Name` as `String`, and `Ctx` (short for “Context”) as a list of typed bindings (so it’s simply-typed, not dependently-typed):

```Name = String
Ctx = List (Name × Ty)
```

And here’s the syntax tree definition. Terms are required to be well-typed and well-scoped, so it should contain the context the term is typed:

```data Term (Γ : Ctx) : Ty -> Set where
```

`lit` creates `nat` literals and `suc` finds the successor of `nat`s, both should be well-typed under any context:

```  lit : (n : Nat) -> Term Γ nat
suc : Term Γ (nat => nat)
```

Application only happens when the function’s type is respected:

```  app : Term Γ (a => b) -> Term Γ a -> Term Γ b
```

Then let’s look into bindings. A variable reference should contain the name and a proof that the name is in-scope:

```  var : (x : Name) (i : (x , a) ∈ Γ) -> Term Γ a
```

An abstraction extends the context of the body term:

```  lam : (x : Name) (a : Ty) -> Term ((x , a) ∷ Γ) b
-> Term Γ (a => b)
```

We don’t even need to implement context lookup – the information of the binding is already stored in the proof term of “contains”, say, in the arguments of the constructors of `∈`.

# Fact: `∈` proofs are de Bruijn indices

But what actually is a proof of `∈`? The proof of `∈` is structurally isomorphic to a natural number (`here` corresponds to `zero`, `there` corresponds to `suc`), if we omit the arguments of its constructors. The natural number isomorphic to the proof of `∈` itself is essentially the index of the element (in our case it’s the binding in the context) in the list. So what we stored in the variable references are their bindings’ corresponding indices in the context!

In fact, the `lookupVar` function that tries to find a name in the context is essentially finding the index of the name:

```lookupVar : (Γ : Ctx) (x : Name) -> Dec (Σ Ty λ a → (x , a) ∈ Γ)
lookupVar [] x = no λ ()
lookupVar ((y , t) ∷ Γ) x with decEqStr x y
... | yes refl = yes (t , here Γ) -- return `zero`
... | no nrefl with lookupVar Γ x
... | no nex = no λ
{ (s , here .Γ) -> nrefl refl
; (s , there p) -> nex (s , p)
}
... | yes ex = yes (ex .fst , there (ex .snd))
-- ^ return `suc <induction result>`
```

An interesting fact is that it is enough to store only the index of the binding in variable reference terms. The indices we’re using here is also known as De Bruijn Indices.

Note: we’ll need to really lookup the context to obtain the type of a binding if the constructor arguments of `∈`’s proof are omitted, though.

# Remark: terms from different contexts

In our setting, terms are bounded with the context they are formed under. This means that when we want to, say, do some term substitution, (so two terms go together into one) we need to ensure the terms are from the same context.

Example: apply a term `e` onto a lambda term `f` requires `e` and `f` to be formed under the same context (this fact can be told from the parameters of `Term.app`). Imagine `f` is of form `λ x. a` where `a` is a term, we can tell from the parameters of `Term.lam` that `a` is formed under a context extended from the context where `f` and `e` are formed. Therefore we cannot simply substitute `x := e` into `a`, because `a` and `e` are from different contexts. This implies that we need to append `x` into the context of `e` before substitution, say, plus one to all the indices inside `e` (we refer to this operation as “shifting” or “weakening”).

In fact, we can extend the context with any arbitrary contexts:

```extendCtx : ∀ {Γ Θ a} -> Term Γ a -> Term (Γ ++ Θ) a
extendCtx (lit n) = lit n
extendCtx suc = suc
extendCtx (app f e) = app (extendCtx f) (extendCtx e)
extendCtx (var x i) = var x (theres′ i)
extendCtx (lam x a p) = lam x a (extendCtx p)
```

Note that this `extendCtx` doesn’t touch the name of the variables. The variable references still refer to correct binding even there’s name shadowing.

There’s a common mistake for those who try to implement syntax with binding via de Bruijn indices, say, forgetting to shift the indices of the substituted term. Now with dependent types, we gain a thorough understanding of why should we shift the indices before substitution. The same thing also happens for “taking a binding from the context” when dealing with let bindings or typed bindings under dependent types – the bindings themselves are from a different (though weaker, because they’re from a outer scope with fewer variables) context, so we need to shift them before using them under the inner contexts.

I have jumped into the hole of “forgetting to shift de Bruijn indices” when developing Voile and Narc. That’s also the motivation for writing this post. I hope this will help dependent type implementors.

Tweet this Top 