Combinatorial number system

Introduction

Combinatorial number system represents a non-negative natural numbers as sum of binomial coefficients, which is a correspondence between natural numbers (taken to include 0) N and k-combinations.

By restricting the number of binomial coefficients, say to r, any non-negative integer m can be represented uniquely as sum of r binomial coefficients. A different value of r would result in a different combinatorial number system.

Definition(Combinadics). $\forall m \geq 0, m \in \mathbb{N}_{\geq 0}$, there exist unique $r, C_r, \cdots, C_i, \cdots, C_1$ such that $C_i \geq 0$, and $C_j>C_i$ for $j>i$ and

Lemma. Using Pascal’s triangle, we can get

Proof. We prove it using the basic property of the Pascal’s triangle. That is every entry is the sum of two entries in the preceding row, one on the top and the other on the top-left of the current entry.

as $\binom{n-r}{0}=1$, we have

Then we can prove the bijection between the natural numbers and the k-combinations.
Exsitence. We prove it using mathematical induction.

  • Base case: $m=0$, we have
  • Inductive hypothesis: Assume that the statement is true for $m$, we have
  • Inductive step: Add 1 to both sides of above equation, we haveAssume the first $j C_i$ ‘s that are consective. That means $C_{l+1}=C_l+1$ for $1 \leq l<j$, we have Then we havewhere $\alpha \ge 1$ means that it’s not consective with the first $j C_i$ ‘s.
    So we have proved the existence of the bijection between the natural numbers and the k-combinations.

Uniqueness. We prove it by contradiction. Assume that there are two different representations of $m$ as sum of binomial coefficients, say

Consider sets $A^{\prime}=A-B$ and $B^{\prime}=B-A$, carrying elements of $A$ and $B$ that are not present in the other set.
Since $A$ and $B$ have equal sums:

As $A$ is not equal to $B$, we have $A^{\prime} \neq \emptyset$ and $B^{\prime} \neq \emptyset$.
Suppose $\binom{C_A}{r}$ and $\binom{C_B}{r}$ be the largest coefficients in $A^{\prime}$ and $B^{\prime}$ respectively.
Since $A^{\prime}$ and $B^{\prime}$ have no common elements, we have $C_A \neq C_B$.
Without loss of generality, assume $C_A > C_B$, then we have $C_B + 1 \leq C_A$

So we have

As $\sum A^\prime = \sum B^\prime$, we have a contradiction. So we have proved the uniqueness of the bijection between the natural numbers and the k-combinations.

Conclusion. We have proved the bijection between the natural numbers and the k-combinations.

Applications

In the CFS signature scheme[3], the combinatorial number system is used to compress the signature. The original signature is a word of length $n=2^{16}$ bits with weight $w=9$.
Since its low hamming weight, we obviouly have an intuition that we can compress it.
We totally have $\binom{2^{16}}{9} \approx 2^{125.5}$ kinds of words, so we can use 126 bits to index the word $z$ instead of storing the whole $z$.
As we proved above, we have $\binom{2^{16}}{9} = \sum_{i=0}^9\binom{2^{16}-1-i}{9-i}$, so we can use the position of the 1’s in $z$ to index $z$.
Let $i_1<\ldots<i_9$ denote the positions of the non-zero bits of $z$. We define the index $I_z$ of $z$ by:

The full CFS signature scheme is as follows:

Signature algorithm

  • hash the document $D$ into $s=h(D)$
  • compute $s_i=h([\cdots s \cdots \mid \cdot i \cdot])$ for $i=0,1,2 \ldots$
  • find $i_0$ the smallest value of $i$ such that $s_i$ is decodable
  • use our trapdoor function to compute $z$ such that $H z^T=s_{i_0}$
  • compute the index $I_z$ of $z$ in the space of words of weight 9
  • use $\left[\cdots I_z \cdots \mid \cdot i_0 \cdot\right]$ as a signature for D

Verification algorithm

  • recover $z$ from its index $I_z$
  • compute $s_1=H z^T$ with the public key $H$
  • compute $s_2=h\left(\left[\cdots h(D) \cdots \mid \cdot i_0 \cdot\right]\right)$ with the public hash function
  • compare $s_1$ and $s_2$ : if they are equal the signature is valid

Discussion

As we can see, there is a bijection between the index $I_z$ and the word $z$, so intuitively it’s a theoretical upper bound.
The easiest way to compress the signature is to use the 9 positions of the 1’s in $z$ to index $z$. Every position is $\log{2^{16}}=16$ bits, so we need $9 \times 16=144$ bits to index $z$.
We can think about why there are redundant $144-126=18$ bits here, and where does the redundancy come from?
I think the redundancy here comes from the possibility that the position of 1 in the back is smaller than the position of 1 in the front, but it does not exist in reality.
This is one of the redundancy situations, maybe there are other situations that we won’t discuss here.

References

[1] http://math0.wvstateu.edu/~baker/cs405/code/Combinadics.html
[2] Siddique A B, Farid S, Tahir M. Proof of bijection for combinatorial number system[J]. arXiv preprint arXiv:1601.05794, 2016.
[3] Courtois N T, Finiasz M, Sendrier N. How to achieve a McEliece-based digital signature scheme[C]//Advances in Cryptology—ASIACRYPT 2001: 7th International Conference on the Theory and Application of Cryptology and Information Security Gold Coast, Australia, December 9–13, 2001 Proceedings 7. Springer Berlin Heidelberg, 2001: 157-174.
[4] Bernstein D J, Buchmann J, Dahmen E. Post-Quantum Cryptography. Mathematics and Statistics Springer-11649[R]. ZDB-2-SMA. Springer, Heidelberg, 2009.