
# Definition of Base64

Base64 is an encoding scheme that represents arbitrary binary data as US-ASCII-compatible strings. The alphabet of an encoded string consists of 2^6 + 1 = 65 characters: the first 64 represent the actual values, and the last one (=) is used for padding when needed. Each of the 2^6 = 64 value characters represents 6 bits of the data.

See below for a listing of the full alphabet:

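| Character | Value (bin) | Value (hex) | Value (dec) |
|-----------|-------------|-------------|-------------|
| A | 000000 | 00 | 0 |
| B | 000001 | 01 | 1 |
| C | 000010 | 02 | 2 |
| D | 000011 | 03 | 3 |
| E | 000100 | 04 | 4 |
| F | 000101 | 05 | 5 |
| G | 000110 | 06 | 6 |
| H | 000111 | 07 | 7 |
| I | 001000 | 08 | 8 |
| J | 001001 | 09 | 9 |
| K | 001010 | 0A | 10 |
| L | 001011 | 0B | 11 |
| M | 001100 | 0C | 12 |
| N | 001101 | 0D | 13 |
| O | 001110 | 0E | 14 |
| P | 001111 | 0F | 15 |
| Q | 010000 | 10 | 16 |
| R | 010001 | 11 | 17 |
| S | 010010 | 12 | 18 |
| T | 010011 | 13 | 19 |
| U | 010100 | 14 | 20 |
| V | 010101 | 15 | 21 |
| W | 010110 | 16 | 22 |
| X | 010111 | 17 | 23 |
| Y | 011000 | 18 | 24 |
| Z | 011001 | 19 | 25 |
| a | 011010 | 1A | 26 |
| b | 011011 | 1B | 27 |
| c | 011100 | 1C | 28 |
| d | 011101 | 1D | 29 |
| e | 011110 | 1E | 30 |
| f | 011111 | 1F | 31 |
| g | 100000 | 20 | 32 |
| h | 100001 | 21 | 33 |
| i | 100010 | 22 | 34 |
| j | 100011 | 23 | 35 |
| k | 100100 | 24 | 36 |
| l | 100101 | 25 | 37 |
| m | 100110 | 26 | 38 |
| n | 100111 | 27 | 39 |
| o | 101000 | 28 | 40 |
| p | 101001 | 29 | 41 |
| q | 101010 | 2A | 42 |
| r | 101011 | 2B | 43 |
| s | 101100 | 2C | 44 |
| t | 101101 | 2D | 45 |
| u | 101110 | 2E | 46 |
| v | 101111 | 2F | 47 |
| w | 110000 | 30 | 48 |
| x | 110001 | 31 | 49 |
| y | 110010 | 32 | 50 |
| z | 110011 | 33 | 51 |
| 0 | 110100 | 34 | 52 |
| 1 | 110101 | 35 | 53 |
| 2 | 110110 | 36 | 54 |
| 3 | 110111 | 37 | 55 |
| 4 | 111000 | 38 | 56 |
| 5 | 111001 | 39 | 57 |
| 6 | 111010 | 3A | 58 |
| 7 | 111011 | 3B | 59 |
| 8 | 111100 | 3C | 60 |
| 9 | 111101 | 3D | 61 |
| + | 111110 | 3E | 62 |
| / | 111111 | 3F | 63 |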
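
The alphabet listing above follows a regular pattern (A-Z, a-z, 0-9, then + and /), so it can be regenerated in a few lines. The following is a minimal Python sketch; the names `ALPHABET`, `VALUE_OF`, and `table_row` are illustrative and not part of any library.

```python
import string

# The 64 value characters in order: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="  # the 65th character, used only for padding

# Map each value character to its 6-bit value (its index in the alphabet).
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}


def table_row(ch: str) -> str:
    """Format one row as: character, 6-bit binary, 2-digit hex, decimal."""
    v = VALUE_OF[ch]
    return f"{ch} {v:06b} {v:02X} {v}"


for ch in ALPHABET:
    print(table_row(ch))
```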

Encoding 24 bits (3 bytes) of data takes 4 characters in Base64 (4 * 6 bits = 24 bits). If the length of the data is not a multiple of 3 bytes, we append zero-bytes (i.e. 00000000) until it is; at most 2 such bytes are needed. Afterwards, we split the data into 3-byte chunks, the so-called quanta, and encode each quantum as 4 characters.

Encoding some data with up to 3 bytes looks like this:

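| Data (# bytes) | + zero-bytes | Base64 | Base64 with padding |
|----------------|--------------|--------|---------------------|
| (0) | | | |
| 00000100 (1) | 0000000000000000 | BAAA | BA== |
| 0000010000010000 (2) | 00000000 | BBAA | BBA= |
| 000001000001000001000001 (3) | | BBBB | BBBB |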
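
The procedure described above (append zero-bytes, split into 3-byte quanta, emit four 6-bit values per quantum, then replace the surplus characters with =) can be sketched in Python as follows. This is an illustrative implementation, not how any particular library does it; the function name `encode64` is borrowed from the notation used later in this article, and the loop at the end cross-checks the result against Python's standard `base64` module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode64(data: bytes) -> str:
    # Append zero-bytes until the length is a multiple of 3 (at most 2).
    missing = (3 - len(data) % 3) % 3
    padded = data + b"\x00" * missing

    chars = []
    for i in range(0, len(padded), 3):
        # Interpret the 3-byte quantum as one 24-bit integer.
        quantum = int.from_bytes(padded[i:i + 3], "big")
        # Split it into four 6-bit values, most significant first.
        for shift in (18, 12, 6, 0):
            chars.append(ALPHABET[(quantum >> shift) & 0x3F])

    # Characters that stem only from appended zero-bytes become '='.
    if missing:
        chars[-missing:] = "=" * missing
    return "".join(chars)


for sample in (b"", b"\x04", b"\x04\x10", b"\x04\x10\x41"):
    # Cross-check against the standard library.
    assert encode64(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", repr(encode64(sample)))
```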

The number of characters c at the end of the Base64 string that are replaced with the padding character (=) can be calculated as follows for data of n bytes:

$c = (3 - (n \bmod 3)) \bmod 3 \quad \text{where} \quad n \in \mathbb{N}$

For example, n = 1 gives c = 2 and n = 2 gives c = 1, while any multiple of 3 gives c = 0.
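
As a sanity check, the padding count for n bytes of input can be compared against what Python's standard `base64` module actually produces. The helper name `padding_chars` is made up for this sketch.

```python
import base64


def padding_chars(n: int) -> int:
    """Number of '=' characters in the encoding of n bytes of data."""
    return (3 - n % 3) % 3


for n in range(7):
    # bytes(n) is a run of n zero-bytes; the content does not matter here.
    encoded = base64.b64encode(bytes(n)).decode("ascii")
    assert encoded.count("=") == padding_chars(n)
    print(n, padding_chars(n), repr(encoded))
```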

The number of required value bits v and padding bits p is easy to calculate for a given number of data bits n (note that n counts bits here, not bytes):

$v = \lceil{n/6} \rceil \cdot 6 \quad \text{where} \quad n \in \mathbb{N}$

$p = \lceil{v/24} \rceil \cdot 24 - v$

Note that 24 is the least common multiple of 6 and 8.
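
The two bit counts can be evaluated for the small examples from earlier (8, 16, and 24 data bits). The sketch below uses integer ceiling division; the function names `value_bits` and `padding_bits` are illustrative only, and the argument is the number of data bits, not bytes.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b using integer arithmetic only."""
    return -(-a // b)


def value_bits(n_bits: int) -> int:
    # v: data bits rounded up to a whole number of 6-bit characters.
    return ceil_div(n_bits, 6) * 6


def padding_bits(n_bits: int) -> int:
    # p: bits needed to fill v up to a whole number of 24-bit quanta.
    v = value_bits(n_bits)
    return ceil_div(v, 24) * 24 - v


for n_bits in (0, 8, 16, 24, 32):
    v, p = value_bits(n_bits), padding_bits(n_bits)
    # Each padding character accounts for 6 padding bits.
    print(f"n={n_bits:2d}  v={v:2d}  p={p:2d}  pad chars={p // 6}")
```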

## Observations

• Base64 strings are invalid (i.e., cannot be decoded) if they contain any characters outside the alphabet given above.
• Due to the padding, the length (number of characters) of a valid Base64 string is always divisible by 4. A string whose length is not divisible by 4 is therefore invalid as well.
• Base64 is fully reversible: decode64(encode64(d)) = d for any data d with n >= 0 bytes. Libraries therefore usually provide both an encode and a decode function.
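
These observations can be tried out with Python's standard `base64` module, whose `b64encode` and `b64decode` play the roles of encode64 and decode64. The wrapper `is_valid_base64` below is a hypothetical helper written for this sketch; it relies on `b64decode(..., validate=True)` rejecting characters outside the alphabet instead of silently discarding them.

```python
import base64
import os


def is_valid_base64(s: str) -> bool:
    """Return True if s decodes as strict, padded Base64."""
    try:
        base64.b64decode(s, validate=True)
        return True
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False


print(is_valid_base64("BBA="))  # padded string over the alphabet
print(is_valid_base64("BB*="))  # '*' is outside the alphabet
print(is_valid_base64("BBA"))   # length not divisible by 4

# Round trip: decoding an encoded value yields the original data.
for n in range(10):
    d = os.urandom(n)
    assert base64.b64decode(base64.b64encode(d)) == d
```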