Have you ever wondered how your Bitcoin wallet seed words (mnemonic words) guard access to your wallet funds and what makes such a setup secure? In this article we’ll dive into the basics of BIP-39, which describes what seed words are and how we can use them to back up our wallet keys in a recoverable way. Let’s go!
In the early days of Bitcoin there were no mnemonic words or easy ways to back up your wallet keys. The default wallet implementation would randomly create private keys which were stored inside a wallet file, and you were responsible for backing up that file frequently (by default every 100 transactions). It wasn’t ideal from a user experience point of view, to say the least. People would often forget to back up their wallet file, or they did back it up but the backup got lost because it was stored on electronic devices which would fail sooner or later. This resulted in a lot of bitcoins being lost forever.
With time Bitcoin developers came up with better ways to back up private keys. One idea, proposed by Pieter Wuille in 2012 in BIP-32 and later widely adopted, was called “Hierarchical Deterministic Wallets”. It described a way to generate a practically unlimited number of private and public keys deterministically, so that the same seed (a random sequence of bytes of a certain length) always produces the same list of keys. This removed the need to back up a list of private keys every 100 transactions: a backup of a single seed covers all the private/public keys one would ever need.
Another idea that further simplified the backup of the wallet seed was proposed in BIP-39 by Marek Palatinus, Pavol Rusnak, Aaron Voisine and Sean Bowe. BIP-39 described a method to encode a random list of bytes (a seed) as an easy to remember/write down list of words. Compared to raw binary or hexadecimal representations of the seed (which still required electronic devices to store it) having a human-readable representation enabled much better handling of the seed by humans. From this point forward the seed could be written on paper or spoken over telephone and this opened up new, physical ways of backing up the seed (multiple paper copies in different locations, durable copies on metal plates with extra protection from fire/flood etc.).
In this article we’ll dive into the step-by-step process of transforming a random list of bytes (entropy) into a mnemonic sequence of words according to BIP-39 specification.
Step 1 – Entropy
First we need a good source of randomness. We can flip a coin or roll a die. If we use a computer (or a hardware wallet) it has a built-in random number generator which can act as a source of randomness. To keep things simple we’re going to flip a coin. BIP-39 specifies the entropy length to be between 128 and 256 bits and a multiple of 32 bits. Each coin flip is 1 bit of entropy. We want to have a 24-word seed so let’s toss the coin 256 times and write heads as “0” and tails as “1”.
00110010100001010111110100001011111111111010000010010000010010101101
00010101111001001011000100111100011110001001111011110111011010010100
11001100111011100110001011101101001010110101001111010010011010111111
0001100101011001000110100010000110110001100101110001
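If you would rather let a computer flip the coins, the same kind of bit string can be sketched in Python. This is only an illustration: the function name generate_entropy_bits is made up for this example, and it assumes the operating system's random number generator (exposed through the standard secrets module) is trustworthy on your machine.

```python
import secrets

# Entropy lengths allowed by BIP-39: 128 to 256 bits, in multiples of 32.
ALLOWED_LENGTHS = (128, 160, 192, 224, 256)


def generate_entropy_bits(n_bits: int = 256) -> str:
    """Return n_bits of OS-provided randomness as a string of '0' and '1'."""
    if n_bits not in ALLOWED_LENGTHS:
        raise ValueError("entropy must be 128-256 bits and a multiple of 32")
    value = secrets.randbits(n_bits)
    # Left-pad with zeros so leading 0-bits are not lost.
    return format(value, f"0{n_bits}b")


bits = generate_entropy_bits(256)
print(len(bits))  # 256
```

Every run produces a different string, so this cannot reproduce the coin flips used in the rest of the article.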
The following table describes the relation between the initial entropy length (ENT), the checksum length (CS) and the length of the generated mnemonic sentence (MS) in words. All lengths except MS are in bits. So if we wanted a 12-word seed we’d generate 128 bits of entropy.
CS = ENT / 32
MS = (ENT + CS) / 11

|  ENT  | CS | ENT+CS |  MS  |
+-------+----+--------+------+
|  128  |  4 |   132  |  12  |
|  160  |  5 |   165  |  15  |
|  192  |  6 |   198  |  18  |
|  224  |  7 |   231  |  21  |
|  256  |  8 |   264  |  24  |
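The two formulas are easy to check with a few lines of Python. The function names below are made up for this sketch; the arithmetic is just the CS and MS formulas from the table.

```python
def checksum_bits(ent: int) -> int:
    """CS = ENT / 32"""
    return ent // 32


def mnemonic_words(ent: int) -> int:
    """MS = (ENT + CS) / 11"""
    return (ent + checksum_bits(ent)) // 11


# Print one row per allowed entropy length: ENT, CS, ENT+CS, MS.
for ent in (128, 160, 192, 224, 256):
    cs = checksum_bits(ent)
    print(ent, cs, ent + cs, mnemonic_words(ent))
```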
Step 2 – Split entropy into groups
Next we split the entropy bits into groups of 11. We end up with 23 groups of 11 bits each and a 24th group holding just the 3 leftover bits (256 = 23 × 11 + 3):
00110010100 00101011111 01000010111 11111111010 00001001000
00100101011 01000101011 11001001011 00010011110 00111100010
01111011110 11101101001 01001100110 01110111001 10001011101
10100101011 01010011110 10010011010 11111100011 00101011001
00011010001 00001101100 01100101110 001
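The split can also be done in Python. This is a minimal sketch: the helper name split_groups is made up here, and ENTROPY is the coin-flip string from Step 1 typed in by hand.

```python
# The 256 coin flips from Step 1, one source line per string literal.
ENTROPY = (
    "00110010100001010111110100001011111111111010000010010000010010101101"
    "00010101111001001011000100111100011110001001111011110111011010010100"
    "11001100111011100110001011101101001010110101001111010010011010111111"
    "0001100101011001000110100010000110110001100101110001"
)


def split_groups(bits: str, size: int = 11) -> list[str]:
    """Cut a bit string into consecutive chunks of `size` bits.

    The final chunk is shorter when len(bits) is not a multiple of size.
    """
    return [bits[i:i + size] for i in range(0, len(bits), size)]


groups = split_groups(ENTROPY)
print(len(groups))   # 24
print(groups[0])     # 00110010100
print(groups[-1])    # 001
```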
Step 3 – Encode
Each group (except for the 24th, which only has 3 bits) contains an 11-bit number (0-2047 in decimal), and this number is an index into the BIP-39 wordlist.
The first binary number is 00110010100. This binary number converted to decimal is 404. We can convert the binary sequence above into a decimal sequence (you can use a calculator, a web tool or do it by hand on paper if you have time).
404 351 535 2042 72
299 555 1611 158 482
990 1897 614 953 1117
1323 670 1178 2019 345
209 108 814
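If you don't feel like doing the conversion by hand, Python's built-in int() accepts a base argument. The sketch below converts only the first row of groups from Step 2.

```python
# First row of 11-bit groups from Step 2.
groups = ["00110010100", "00101011111", "01000010111", "11111111010", "00001001000"]

# int(text, 2) parses the text as a base-2 number.
indices = [int(group, 2) for group in groups]
print(indices)  # [404, 351, 535, 2042, 72]
```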
The words in the wordlist are 0-indexed meaning you start counting from 0. The number 404 corresponds to the word “crater”. Converting the full list of decimal numbers to words gives us:
crater cloud drill young animal
century earth siren because detail
knock unfold error jaguar merry
pistol fatigue nation wise clinic
boss assault grape
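In code the lookup is a plain list index. The sketch below assumes you have saved the English BIP-39 wordlist locally, one word per line; the file name english.txt and both function names are assumptions made for this example.

```python
from pathlib import Path


def load_wordlist(path: str) -> list[str]:
    """Read a wordlist file with one word per line (2048 words expected)."""
    words = Path(path).read_text(encoding="utf-8").split()
    if len(words) != 2048:
        raise ValueError(f"expected 2048 words, got {len(words)}")
    return words


def indices_to_words(indices: list[int], wordlist: list[str]) -> list[str]:
    """Map 0-based indices to words; Python lists are 0-indexed too."""
    return [wordlist[i] for i in indices]


# Usage, assuming english.txt sits next to the script:
#   wordlist = load_wordlist("english.txt")
#   print(indices_to_words([404, 351, 535, 2042, 72], wordlist))
```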
Step 4 – Checksum (24th word)
The next step is to calculate a checksum. The purpose of a checksum is to quickly verify whether the list of words is valid. It catches most errors such as a wrong word, a missing word or words in the wrong order, although with only 8 checksum bits roughly 1 in 256 random mistakes will still pass.
To calculate the checksum we take all the 256 entropy bits we started with in Step 1 and calculate a SHA256 digest from them.
$ echo 0011001010000101011111010000101111111111101000001001000001001010110100010101111001001011000100111100011110001001111011110111011010010100110011001110111001100010111011010010101101010011110100100110101111110001100101011001000110100010000110110001100101110001 | shasum -a 256 -0
f3f06d74b794b20645460aa0b17d4e7a77eaaea283ee55344adbfcece4a63432
NB: shasum calculates the SHA digest of its input. Option -a 256 selects the SHA-256 algorithm; -0 means reading the input in BITS mode, where each ASCII '0' is interpreted as a 0-bit and each ASCII '1' as a 1-bit.
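The difference between hashing the bits and hashing the text is easy to get wrong, so here is a rough Python equivalent of both. It assumes that packing the bits into 32 bytes, most significant bit first, matches what BITS mode does; the function names are made up for this sketch. If that assumption holds, the first digest should match the shasum output above, while the second is what you get from tools that hash the 256 characters '0' and '1' as text.

```python
import hashlib

# The 256 coin flips from Step 1.
ENTROPY = (
    "00110010100001010111110100001011111111111010000010010000010010101101"
    "00010101111001001011000100111100011110001001111011110111011010010100"
    "11001100111011100110001011101101001010110101001111010010011010111111"
    "0001100101011001000110100010000110110001100101110001"
)


def sha256_of_bits(bits: str) -> str:
    """Pack '0'/'1' characters into bytes, then hash those bytes."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    raw = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return hashlib.sha256(raw).hexdigest()


def sha256_of_text(bits: str) -> str:
    """Hash the ASCII characters themselves (not what BIP-39 wants)."""
    return hashlib.sha256(bits.encode("ascii")).hexdigest()


print(sha256_of_bits(ENTROPY))
print(sha256_of_text(ENTROPY))
```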
The digest is a number in hexadecimal format. We need its 8 leftmost bits (the first byte, i.e. the first two hex digits). We can use this online hex to binary converter:
f3 (hex) -> 1111 0011 (binary)
Next we append these 8 bits to the 3 leftover bits of the 24th group from Step 2 and end up with:
00111110011
This is the last word (499 – dinosaur)
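The same bit-gluing can be written as a short Python function. The name last_word_index and its parameters are made up for this sketch; the digest is the one printed by shasum above.

```python
def last_word_index(leftover_bits: str, digest_hex: str, cs_bits: int = 8) -> int:
    """Append the first cs_bits of the digest to the leftover entropy bits."""
    # Expand the hex digest to binary, keeping leading zeros (4 bits per hex digit).
    digest_bits = format(int(digest_hex, 16), f"0{len(digest_hex) * 4}b")
    return int(leftover_bits + digest_bits[:cs_bits], 2)


digest = "f3f06d74b794b20645460aa0b17d4e7a77eaaea283ee55344adbfcece4a63432"
print(last_word_index("001", digest))  # 499
```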
The full 24-word list is now:
crater cloud drill young animal
century earth siren because detail
knock unfold error jaguar merry
pistol fatigue nation wise clinic
boss assault grape dinosaur
We can verify that this is indeed a correct BIP-39 seed using this excellent BIP-39 tool created by Ian Coleman.
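If you prefer to check the checksum offline, the whole verification fits in a few lines of Python. This is a sketch that works on word indices rather than words, so it needs no wordlist; the function name is made up, and ARTICLE_INDICES is simply the list of numbers from Step 3 plus 499 for the last word.

```python
import hashlib


def is_valid_mnemonic_indices(indices: list[int]) -> bool:
    """Check the BIP-39 checksum of a mnemonic given as 0-based word indices."""
    if len(indices) not in (12, 15, 18, 21, 24):
        return False
    if any(not 0 <= i < 2048 for i in indices):
        return False
    # Each word carries 11 bits; the last ENT/32 bits of the total are the checksum.
    bits = "".join(format(i, "011b") for i in indices)
    cs_len = len(bits) // 33
    ent_bits, cs_bits = bits[:-cs_len], bits[-cs_len:]
    raw = int(ent_bits, 2).to_bytes(len(ent_bits) // 8, "big")
    digest = hashlib.sha256(raw).digest()
    digest_bits = format(int.from_bytes(digest, "big"), "0256b")
    return digest_bits[:cs_len] == cs_bits


ARTICLE_INDICES = [
    404, 351, 535, 2042, 72,
    299, 555, 1611, 158, 482,
    990, 1897, 614, 953, 1117,
    1323, 670, 1178, 2019, 345,
    209, 108, 814, 499,
]
print(is_valid_mnemonic_indices(ARTICLE_INDICES))
```

Changing any single index in the list should almost always flip the result to False.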
Step 5 – The seed
The last step of BIP-39 is creating the actual binary seed, which a BIP-32 deterministic wallet (or another scheme) then uses to derive its master key. We’re not going to dive into the details of what this step involves but only quote from the BIP-39 spec:
To create a binary seed from the mnemonic, we use the PBKDF2 function with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string “mnemonic” + passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512 bits (= 64 bytes).
You can read more about the PBKDF2 function in the context of cracking the passphrase here.
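For the curious, the quoted paragraph maps almost word for word onto Python's standard library. This is a sketch rather than a wallet-grade implementation, and the function name mnemonic_to_seed is made up here; hashlib.pbkdf2_hmac and unicodedata.normalize are standard library calls.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte seed from a mnemonic sentence, as quoted from BIP-39."""
    # Password: the mnemonic sentence in UTF-8 NFKD.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    # Salt: the string "mnemonic" + passphrase, again in UTF-8 NFKD.
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # 2048 iterations of PBKDF2 with HMAC-SHA512, 512-bit (64-byte) output.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


mnemonic = (
    "crater cloud drill young animal century earth siren because detail "
    "knock unfold error jaguar merry pistol fatigue nation wise clinic "
    "boss assault grape dinosaur"
)
seed = mnemonic_to_seed(mnemonic)
print(len(seed))  # 64
```

Note that this step does not check the checksum: any string is accepted as the password, so validation has to happen before it.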
BONUS:
If you followed the steps above now you should be able to create and verify the correctness of BIP-39 seeds yourself (with minimal assistance from tools like binary to hex to decimal converters).
In the video below you can watch the seed stamping process where I punch all the 24 words onto a 4mm thick stainless steel Coldbit Steel plate using a 1.5kg hammer and an A-Z letter stamping set:
You should be able to check if the last word of the seed in the video above is correct or not. You have to assume the first 256 bits of entropy are correct and calculate the missing 8 bits. The first person who comments on this post with a correct answer (the last word) and a little bit of description on how they did it can receive 1x Coldbit Steel + 1x Coldbit Passphrase + a Stamping Set + 1.5kg (3 lbs) hammer for free and create a long lasting, corrosion and fire resistant backup of their BIP-39 seed.
Is the last word ‘envelope’? The last word has 3 bits of entropy, 8 bits of checksum. Assuming 256 bits entropy are correct, just strip off 8 bits of checksum and recompute it.
To do this I used the bitwasp/bitcoin php library (disclosure: I wrote it) and called Bip39::mnemonicToEntropy to validate. When the mnemonic was rejected, I printed out the bitstring, converted it to a byte string, and called Bip39Mnemonic::entropyToMnemonic.
Is the word “act”?
I took the index of the words, converted the number to binary, added three zeros, calculated the sha256 digest, took the first part of the hexadecimal to convert it to binary, added three zeros to the 8 digits of the binary and got 00000010011 which translates to 19 in decimal. The word in index 19 of bip39 wordlist is act. I added act to the first 23 words of the mnemonic and checked the Ian Coleman site which said it was a valid mnemonic.
Close but no cigar. Please note that I mentioned: “You have to assume the first 256 bits of entropy are correct and calculate the missing 8 bits.” You assumed only 253 bits of entropy were correct and zeroed the remaining 3 bits. You ended up with a valid mnemonic but not the one I had in mind.
In your example the SHA digest of your string is:
f3f06d74b794b20645460aa0b17d4e7a77eaaea283ee55344adbfcece4a63432
But when I input the same string in any online SHA256 converter I always obtain:
44F6DAFA3D7A1720B5EBBF2ADC1663DF4DAB03776EED48D2CDA775237A547E59
Could you explain why the different result?
Online SHA256 converters assume the input is a string but in this case it’s a binary (0 and 1) number.
That’s what the shasum -a 256 -0 command line does. The -0 option means “treat the input as a binary number”. HTH
I tried it with Perl and the result is still the same:
shasum -b -a 256 C:\Users\User\Pictures\testseed.txt
\44f6dafa3d7a1720b5ebbf2adc1663df4dab03776eed48d2cda775237a547e59 *C:\\Users\\User\\Pictures\\testseed.txt
Can you give any detail on how to get the f3xxxx result like yours?
Thomas – “envelope” is the right word. Congratulations! Please contact me.
I have code that does every single step of entropy to pk and addy, forwards and backwards
if ever Thomas doesn’t contact you, please contact me with the word solution ‘envelope’ and i’d be more than willing to share all the details required to win the prize.
make more puzzles too !
James – The idea is to treat the string “00110010…” not as a string but as a binary number. That’s what the “shasum -0” option does. It reads in BITS mode where ASCII ‘0’ is interpreted as 0-bit and ASCII ‘1’ is interpreted as 1-bit.
Where can I do this?
Thanks for writing this! Really interesting.
I use another approach to generate the entropy; Picking the words from paper raffle tickets. This makes the randomness a bit more transparent for non-programmers. My open source project seedpicker.net has all the details, in particular the Last Word Calculator (http://seedpicker.net/calculator/last-word.html)
Security tip:
Don’t stamp the words on the plate, stamp the word numbers instead.
1. It is easier to stamp.
2. You can, for example, add 10 to every word number: for word no. 404 you stamp 414.
If somebody finds the plate, they cannot recover the seed.
This is security by obscurity. It’s not good if used alone.
Look for Doubleslow keystretcher on Github to see a better solution.
Example: generate 12-24 BIP39 mnemonic seed, write it down on your paper or steel, then use that as a “salt” for the Doubleslow keystretcher. Remember the passphrase (and the key stretching settings, or write them down somewhere). Use the output from Doubleslow keystretcher as your BIP39 seed for your wallet.
You can add another iteration of Doubleslow with a different passphrase for more obscurity.
“hansi” has left a very dangerous tip, that falls back to relying on human memory. DO NOT trust that you will be able to remember that you’ve manually added +10 to everything.
Great article. In all other articles about this subject (24th or 12th word), people are just pumping their own scripts to find the last word. Sometimes, you need to understand why/how it is done and this article does it very well.