2024-03-01
1.58 bit Algebra

This week I’ve looked at the LLM optimization of using 1.58 bit algebra to speed up the training and transformation.

The idea is to operate with ternaries: ${-1,0,1}$. To store such a number in a naive way, we will require two bits, however, by using modulos, more data can be stored within a block of bits.

Since $n$ bits can store $2^n$ states, storing $m$ ternaries will require $3^m$ states. If $2^n=3^m$ then $n=m \frac{ln(3)}{ln(2)}$. Also, $\frac{ln(3)}{ln(2)} \approx 1.58496250072116 \approx 1.58$ so for each ternary we will need 1.58 bits.

Here is a table to determine how many tri-state values can be packed in differnet bit blocks:

Bits 3-state size bits per data
2 1 2.00
3 1 3.00
4 2 2.00
5 3 1.67
6 3 2.00
7 4 1.75
8 5 1.60
16 10 1.60
32 20 1.60
64 40 1.60
128 80 1.60
256 161 1.59
512 323 1.59
1024 646 1.59
2048 1292 1.59
4096 2584 1.59

Note that all of 8-, 16- or 32-bit approaches of storage will deliver a $1.60$ bits per ternary, which is very close to the ideal ratio of $1.58$. A very small improvement will start happening with 512 bits, however doing multicplication, division and modulos on 512-bit values to pack and unpack the values is not fun and definitively not computationally optimal.

The next issue that we need to pay attention is the numerical stability of the system.

In the ternary space we define the two operators, addition and multiplication, as following:

a b a+b a*b
-1 -1 -1 1
-1 0 -1 0
-1 1 0 -1
0 -1 -1 0
0 0 0 0
0 1 1 0
1 -1 0 -1
1 0 1 0
1 1 1 1

Given this operations, we can evaluate:

\[1+ 1 - 1 - 1 = ((1+1) - 1 ) - 1 = (1 -1) - 1 = -1\]

or

\[1+ 1 - 1 - 1 = 1 + (1 + (- 1 - 1)) = 1 + (1 - 1) = 1\]

This is caused by the numerical instability introduced by $1+1=1$ and $-1 - 1 = -1$

The more meaningful way of implementing order of operation would be similar to voting: determine how many $-1$ and how many $1$ are and establish the result based on who has majority.

$1+ 1 - 1 - 1 = 0$ Because there are two $1$ s and two $-1$

The lack of associativity of the unary addition operator requires a special treatment of addition. In a neural network,sum of products is a common operation:

\[R = b + \sum_{1}^{n} A_i * w_i\]

Here $R$ is the result, $b$ is the bias, $n$ is the size of the previous layer, $A$ is the output of the previous layer and $w$ are the weights of the current neuron.

$n$ is generally a large number, much larger than 40, 20 or 5, which are the number of ternaries that can be stored in 64, 32 or 8. For this reason, the implementation of dot product ($\sum_{1}^{n} A_i * w_i $) for a fixed size must return the number winning ternaries in the sum. $1+1+1$ will return $3$ while $-1-1$ will return $-2$. Once the whole sum has been evaluated, the value will be transformed back in a ternary.

Read More
 2018-08-04
Vaadin 10 minimal project

Since a few days, I’ve started exploring Vaadin 10, the new and shiny version of Vaadin.

Vaadin 10 comes with a new library of controls, called “flow”. The old code remains functional, as long as the package of the control classes are renamed.

Surprisingly, the old approach of starting a new project with an Apache Maven archetype is not available anymore. Probably because of marketing reasons (that’s my personal speculation), getting a new project in Vaadin now requires to sign up on their web site and use the on-line wizard. The startup project is a fully fledged application, composed of two tabs. The new application has CSS, Polymer templates, HTML. The good news is that it compiles and runs from the first attempt, which is something to expect, since it has no database or other external dependencies. The bad news is the increased complexity of the “bare bones” application. How complex would it be when such a project will reach production capabilities?

Read More
 2015-11-26
Set Match

Tennis racket Most of the record linkage techniques operate on two tables and define set of association rules. Is this the limit of what can be done? Well, my software goes beyond independent tables, exploring the relational dimension of data. This post is about matching sets of records associated to one data record.

Read More
 2015-11-04
Evaluate record linkages

Today’s topic is how to evaluate the correctness of a linkage. Even with the few options described in the previous posts (such as field equality and Jaro-Winkler dissimilarity), a number of models were created, each with its own specific accuracy. Without a specialized interface, the choice of the parameters alpha, beta, the true/false threshold and the decision between boolean and confidence based adjustment of the test are tough calls.

Read More
 2015-10-31
Using Jaro-Winkler to enhance record linkage

The previous post has exclusively relied on the exact match of name. Out of the ten names, only two of them enjoyed the perfect equality. Due to a coincidence, one of the two names was associated with a wrong record. It seems that perfect equality does not yield the necessary accuracy; on the other hand, one can speculate that measuring the “degree of similarity” between the entries will yield some better results. One such measure is the “Jaro-Winkler” distance from [1]. This measure returns a value between zero and one, with one for the equal strings and zero for the totally different strings. The measure factors in the similarity of the letters of a string, their sequentiality as well as the position of similar characters inside the string.

Read More
 2015-10-25
Introduction in Record Linkage

Welcome to my record linkage blog!

Record Linkage is a sum of techniques to associate records of two or multiple databases without sharing common keys.

ACME Corporation stores customer data provided by the sales and support departments. Since there is/was no global concept of “Customer ID”, both departments use their own internal IDs, while collecting data like Customer Name, Customer Address, Customer Company and the list of registered products. Since the data of sales is disconnected of support data, the decision support system cannot identify the classes of customers that require the less support, even if these costumers are preferred to high maintenance customers.

Read More
 2015-11-26
Set Match

Tennis racket Most of the record linkage techniques operate on two tables and define set of association rules. Is this the limit of what can be done? Well, my software goes beyond independent tables, exploring the relational dimension of data. This post is about matching sets of records associated to one data record.

Read More
 2015-11-04
Evaluate record linkages

Today’s topic is how to evaluate the correctness of a linkage. Even with the few options described in the previous posts (such as field equality and Jaro-Winkler dissimilarity), a number of models were created, each with its own specific accuracy. Without a specialized interface, the choice of the parameters alpha, beta, the true/false threshold and the decision between boolean and confidence based adjustment of the test are tough calls.

Read More
 2015-10-31
Using Jaro-Winkler to enhance record linkage

The previous post has exclusively relied on the exact match of name. Out of the ten names, only two of them enjoyed the perfect equality. Due to a coincidence, one of the two names was associated with a wrong record. It seems that perfect equality does not yield the necessary accuracy; on the other hand, one can speculate that measuring the “degree of similarity” between the entries will yield some better results. One such measure is the “Jaro-Winkler” distance from [1]. This measure returns a value between zero and one, with one for the equal strings and zero for the totally different strings. The measure factors in the similarity of the letters of a string, their sequentiality as well as the position of similar characters inside the string.

Read More
 2015-10-25
Introduction in Record Linkage

Welcome to my record linkage blog!

Record Linkage is a sum of techniques to associate records of two or multiple databases without sharing common keys.

ACME Corporation stores customer data provided by the sales and support departments. Since there is/was no global concept of “Customer ID”, both departments use their own internal IDs, while collecting data like Customer Name, Customer Address, Customer Company and the list of registered products. Since the data of sales is disconnected of support data, the decision support system cannot identify the classes of customers that require the less support, even if these costumers are preferred to high maintenance customers.

Read More
 2018-08-04
Vaadin 10 minimal project

Since a few days, I’ve started exploring Vaadin 10, the new and shiny version of Vaadin.

Vaadin 10 comes with a new library of controls, called “flow”. The old code remains functional, as long as the package of the control classes are renamed.

Surprisingly, the old approach of starting a new project with an Apache Maven archetype is not available anymore. Probably because of marketing reasons (that’s my personal speculation), getting a new project in Vaadin now requires to sign up on their web site and use the on-line wizard. The startup project is a fully fledged application, composed of two tabs. The new application has CSS, Polymer templates, HTML. The good news is that it compiles and runs from the first attempt, which is something to expect, since it has no database or other external dependencies. The bad news is the increased complexity of the “bare bones” application. How complex would it be when such a project will reach production capabilities?

Read More
 2024-03-01
1.58 bit Algebra

This week I’ve looked at the LLM optimization of using 1.58 bit algebra to speed up the training and transformation.

The idea is to operate with ternaries: ${-1,0,1}$. To store such a number in a naive way, we will require two bits, however, by using modulos, more data can be stored within a block of bits.

Since $n$ bits can store $2^n$ states, storing $m$ ternaries will require $3^m$ states. If $2^n=3^m$ then $n=m \frac{ln(3)}{ln(2)}$. Also, $\frac{ln(3)}{ln(2)} \approx 1.58496250072116 \approx 1.58$ so for each ternary we will need 1.58 bits.

Here is a table to determine how many tri-state values can be packed in differnet bit blocks:

Bits 3-state size bits per data
2 1 2.00
3 1 3.00
4 2 2.00
5 3 1.67
6 3 2.00
7 4 1.75
8 5 1.60
16 10 1.60
32 20 1.60
64 40 1.60
128 80 1.60
256 161 1.59
512 323 1.59
1024 646 1.59
2048 1292 1.59
4096 2584 1.59

Note that all of 8-, 16- or 32-bit approaches of storage will deliver a $1.60$ bits per ternary, which is very close to the ideal ratio of $1.58$. A very small improvement will start happening with 512 bits, however doing multicplication, division and modulos on 512-bit values to pack and unpack the values is not fun and definitively not computationally optimal.

The next issue that we need to pay attention is the numerical stability of the system.

In the ternary space we define the two operators, addition and multiplication, as following:

a b a+b a*b
-1 -1 -1 1
-1 0 -1 0
-1 1 0 -1
0 -1 -1 0
0 0 0 0
0 1 1 0
1 -1 0 -1
1 0 1 0
1 1 1 1

Given this operations, we can evaluate:

\[1+ 1 - 1 - 1 = ((1+1) - 1 ) - 1 = (1 -1) - 1 = -1\]

or

\[1+ 1 - 1 - 1 = 1 + (1 + (- 1 - 1)) = 1 + (1 - 1) = 1\]

This is caused by the numerical instability introduced by $1+1=1$ and $-1 - 1 = -1$

The more meaningful way of implementing order of operation would be similar to voting: determine how many $-1$ and how many $1$ are and establish the result based on who has majority.

$1+ 1 - 1 - 1 = 0$ Because there are two $1$ s and two $-1$

The lack of associativity of the unary addition operator requires a special treatment of addition. In a neural network,sum of products is a common operation:

\[R = b + \sum_{1}^{n} A_i * w_i\]

Here $R$ is the result, $b$ is the bias, $n$ is the size of the previous layer, $A$ is the output of the previous layer and $w$ are the weights of the current neuron.

$n$ is generally a large number, much larger than 40, 20 or 5, which are the number of ternaries that can be stored in 64, 32 or 8. For this reason, the implementation of dot product ($\sum_{1}^{n} A_i * w_i $) for a fixed size must return the number winning ternaries in the sum. $1+1+1$ will return $3$ while $-1-1$ will return $-2$. Once the whole sum has been evaluated, the value will be transformed back in a ternary.

Read More