The Python Quants

Bitcoin and Blockchain — Opening the Blackbox with Python

Quant Insights Conference London 2016

Dr. Yves J. Hilpisch

team@tpq.io | http://tpq.io

The Python Quants GmbH

About Me

For the curious:

Agenda

  • Some Bitcoin Market Data
  • Hashing Functions
  • Password Cracking
  • Crypto Signing of Messages
  • Idea behind Block Chain
  • Idea behind Bitcoin Mining
  • Bitcoin Mining with Python

Some Bitcoin Data

Total Number of Bitcoins

We work with data from http://quandl.com. First, the total number of bitcoins mined.

In [2]:
import quandl as q
bn = q.get('BCHAIN/TOTBC') / 1e6  # in millions
In [3]:
bn.plot(figsize=(10, 6));

Total Number of Transactions

The total number of transactions.

In [4]:
bt = q.get('BCHAIN/NTRAT') / 1e6  # in millions
In [5]:
bt.plot(figsize=(10, 6));

Bitcoin Value in USD (Exchange Rate)

The USD/Bitcoin exchange rate.

In [6]:
be = q.get('BCHAIN/MKPRU')
In [7]:
be.plot(figsize=(10, 6));

Bitcoin Value in USD (Market Capitalization)

The market capitalization in USD.

In [8]:
bm = q.get('BCHAIN/MKTCP') / 1e9  # in billions
In [9]:
bm.plot(figsize=(10, 6));

Bitcoin Network Hashrate

The hashrate of the Bitcoin mining network. How many giga hashes per second (GH/s) does the bitcoin mining network calculate per second?

In [10]:
bh = q.get('BCHAIN/HRATE')
In [11]:
bh.plot(figsize=(10, 6));

Bitcoin Mining Difficulty

The bitcoin mining difficulty. How hard is it to mine a new bitcoin block?

In [12]:
bd = q.get('BCHAIN/DIFF') / 1e9  # in billions
In [13]:
bd.plot(figsize=(10, 6));

Hashing

From https://en.wikipedia.org/wiki/Hash_function:

A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

A Simple Hash Function

The first simplistic hash function that we consider maps any string to a three digit integer. It uses ordinal numbers of one-character string objects.

In [14]:
ord('a')
Out[14]:
97
In [15]:
chr(97)
Out[15]:
'a'

The implementation of the simplistic hash function ("average integer ordinal number").

In [16]:
def hash_function(text):
    value = sum([ord(l) for l in text]) / len(text)
    return '%03d' % value

Some examples.

In [17]:
hash_function('!')
Out[17]:
'033'
In [18]:
hash_function('yves')
Out[18]:
'113'
In [19]:
hash_function('yves2')
Out[19]:
'101'
In [20]:
hash_function('Quant4711')
Out[20]:
'080'

Collisions are easily found.

In [21]:
hash_function('yves')
Out[21]:
'113'
In [22]:
hash_function('sevy')
Out[22]:
'113'

Our function has a target space of 10 bits (only).

In [23]:
bin(999)
Out[23]:
'0b1111100111'
In [24]:
2 ** 10
Out[24]:
1024

Modern hash functions have a target space of 128 bits, i.e. $2^{128} - 1$ possible values.

In [25]:
2 ** 128
Out[25]:
340282366920938463463374607431768211456
In [26]:
hex(2 ** 128)
Out[26]:
'0x100000000000000000000000000000000'
In [27]:
len(hex(2 ** 128)) - 3
Out[27]:
32

Or 256 bits.

In [28]:
2 ** 256
Out[28]:
115792089237316195423570985008687907853269984665640564039457584007913129639936
In [29]:
hex(2 ** 256)
Out[29]:
'0x10000000000000000000000000000000000000000000000000000000000000000'
In [30]:
len(hex(2 ** 256)) - 3
Out[30]:
64

Or even 384 bits.

In [31]:
2 ** 384
Out[31]:
39402006196394479212279040100143613805079739270465446667948293404245721771497210611414266254884915640806627990306816
In [32]:
hex(2 ** 384)
Out[32]:
'0x1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
In [33]:
len(hex(2 ** 384)) - 3
Out[33]:
96

The universe is assumed to consist of $10^{80}$ atoms.

In [34]:
10 ** 80
Out[34]:
100000000000000000000000000000000000000000000000000000000000000000000000000000000
In [35]:
hex(10 ** 80)
Out[35]:
'0x35f9dea3e1f6bdfef70cdd17b25efa418ca63a22764cec100000000000000000000'
In [36]:
len(hex(10 ** 80)) - 3  # close to 2 ** 256
Out[36]:
66

We require the following properties from a hash function:

  • collision resistance: it is highly unlikely to find to inputs that yield the same output
  • hiding: it is really difficult to find the exact input if you know an output
  • puzzle friendliness (for mining purposes): if someone targets a certain output value and if parts of the input are randomly chosen then it is difficult to find another input value that hits exactly that target

MD5 Hash Codes

First, importing Python's hashing function library.

In [37]:
import hashlib

The first MD5 hash codes (cf. https://en.wikipedia.org/wiki/MD5).

In [38]:
md5_1 = hashlib.md5(b'yves').hexdigest()
md5_1
Out[38]:
'afe3bd960b4c46a68580c4e564cca24e'
In [39]:
md5_2 = hashlib.md5(b'yves2').hexdigest()
md5_2
Out[39]:
'664d839e06bada38ce04f7208896efdf'
In [40]:
hashlib.md5(b'Dr. Yves Johannes Hilpisch').hexdigest()
Out[40]:
'642394f6d25fd9fe4b88e359c8aaf051'

Brute Force Hash (Password) Cracking

We define first a character set of lower case letters only.

In [41]:
import string
charset = string.ascii_lowercase
charset
Out[41]:
'abcdefghijklmnopqrstuvwxyz'

Hash codes for all single characters in charset.

In [42]:
for c in charset:
    md5 = hashlib.md5(c.encode('ascii'))
    print(c, md5.hexdigest())
a 0cc175b9c0f1b6a831c399e269772661
b 92eb5ffee6ae2fec3ad71c777531578f
c 4a8a08f09d37b73795649038408b5f33
d 8277e0910d750195b448797616e091ad
e e1671797c52e15f763380b45e841ec32
f 8fa14cdd754f91cc6554c9e71929cce7
g b2f5ff47436671b6e533d8dc3614845d
h 2510c39011c5be704182423e3a695e91
i 865c0c0b4ab0e063e5caa3387c1a8741
j 363b122c528f54df4a0446b6bab05515
k 8ce4b16b22b58894aa86c421e8759df3
l 2db95e8e1a9267b7a1188556b2013b33
m 6f8f57715090da2632453988d9a1501b
n 7b8b965ad4bca0e41ab51de7b31363a1
o d95679752134a2d9eb61dbd7b91c4bcc
p 83878c91171338902e0fe0fb97a8c47a
q 7694f4a66316e53c8cdd9d9954bd611d
r 4b43b0aee35624cd95b910189b3dc231
s 03c7c0ace395d80182db07ae2c30f034
t e358efa489f58062f10dd7316b65649e
u 7b774effe4a349c6dd82ad4f4f21d34c
v 9e3669d19b675bd57058fd4664205d2a
w f1290186a5d0b1ceab27f4e77c0c5d68
x 9dd4e461268c8034f5c8564e155c67a6
y 415290769594460e2e485922904f345d
z fbade9e36a3f36d3d676c1b808451dd7

Now doing brute force hash code cracking — 'knowing' that the relevant word has a maximum of 4 characters.

In [43]:
import itertools as it
In [44]:
%%time
for i in range(1, 5):
    print('%d CHARACTERS USED NOW' % i)
    pm = it.product(charset, repeat=i)
    for comb in pm:
        comb = ''.join(comb)
        md5 = hashlib.md5(comb.encode('ascii')).hexdigest()
        if md5 == md5_1:
            print('SUCCESS')
            print(comb, md5)
            break
1 CHARACTERS USED NOW
2 CHARACTERS USED NOW
3 CHARACTERS USED NOW
4 CHARACTERS USED NOW
SUCCESS
yves afe3bd960b4c46a68580c4e564cca24e
CPU times: user 923 ms, sys: 10.9 ms, total: 934 ms
Wall time: 1 s

Let us enlarge the character set to include digits as well.

In [45]:
charset2 = string.ascii_lowercase + string.digits
charset2
Out[45]:
'abcdefghijklmnopqrstuvwxyz0123456789'

Time to crack the second hash code increases due to greater passworword length and larger character set.

In [46]:
from itertools import product
In [47]:
import time
t0 = time.time()
z = 0
for i in range(1, 6):
    print('%d CHARACTERS USED NOW' % i)
    pm = it.product(charset2, repeat=i)
    for comb in pm:
        comb = ''.join(comb)
        md5 = hashlib.md5(comb.encode('ascii')).hexdigest()
        z += 1
        if md5 == md5_2:
            print('SUCCESS')
            print(comb, md5)
            break
sec = time.time() - t0
print('time in sec: %.1f' % sec)
1 CHARACTERS USED NOW
2 CHARACTERS USED NOW
3 CHARACTERS USED NOW
4 CHARACTERS USED NOW
5 CHARACTERS USED NOW
SUCCESS
yves2 664d839e06bada38ce04f7208896efdf
time in sec: 94.0

The algorithm has checked about 43 mn hashes before being successful. This represents a speed of about 450,000 hashes per second.

In [48]:
z
Out[48]:
43024025
In [49]:
z / sec
Out[49]:
457617.32578829414

Using Dedicated Password Cracking Tools ("Better Software")

Dedicated password cracking tools like Hashcat (cf. http://hashcat.net) allow for a much faster and more intelligent/targeted approach.

Let us check how long Hashcat needs to find the password yves2 stored as an MD5 hash. We assume:

  • password length is 5 letters
  • only lower case letters or digits
In [50]:
r1 = '''
664d839e06bada38ce04f7208896efdf:yves2

Session.Name...: hashcat
Status.........: Cracked
Input.Mode.....: Mask (?1?1?1?1?1) [5]
Custom.Chars...: -1 ?l?d, -2 Undefined, -3 Undefined, -4 Undefined
Hash.Target....: 664d839e06bada38ce04f7208896efdf
Hash.Type......: MD5
Time.Started...: Mon Aug 22 19:34:16 2016 (2 secs)
Speed.Dev.#1...: 13710.9 kH/s (0.94ms)
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 28803600/60466176 (47.64%)
Rejected.......: 0/28803600 (0.00%)
Restore.Point..: 799470/1679616 (47.60%)

Started: Mon Aug 22 19:34:16 2016
Stopped: Mon Aug 22 19:34:22 2016

real    0m6.055s
user    0m1.750s
sys     0m1.490s
'''

Make Use of Human Traits ("Social Cracking")

Mask attacks are some of the most powerful tools (strategies) in password cracking. It relies on the fact that human beings like to use certain (easy to remember) structures for their passwords. An interesting analysis is found here: https://www.praetorian.com/blog/statistics-will-crack-your-password-mask-structure. The major finding is:

  • 50% of all passwords analyzed rely on
  • 13 password masks only out of
  • 260,000+ masks analyzed in total

An example: lisa2008 = name of daughter born in 2008

Let us consider the following case. A password is assumed to consist of upper case letters, lower case letters and digits. In this case, we make use of insights about "humanly generated passwords". I.e. we do a so-called mask attack where we implement the following rules:

  • password length is 9 letters
  • first letter is upper case (like A)
  • next four letters are lower case (like bbbb)
  • last four letters are digits (like 1992)

We assume a structure like Abbbb1992.

In [51]:
r2 = '''
8769d3723ec8f853d91f28208e97acce:Quant4711

Session.Name...: hashcat
Status.........: Cracked
Input.Mode.....: Mask (?u?l?l?l?l?d?d?d?d) [9]
Hash.Target....: 8769d3723ec8f853d91f28208e97acce
Hash.Type......: MD5
Time.Started...: Mon Aug 22 17:39:32 2016 (10 mins, 23 secs)
Speed.Dev.#1...: 42944.7 kH/s (13.43ms)
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 27138885360/118813760000 (22.84%)
Rejected.......: 0/27138885360 (0.00%)
Restore.Point..: 1543910/6760000 (22.84%)

Started: Mon Aug 22 17:39:32 2016
Stopped: Mon Aug 22 17:50:03 2016

real    10m30.744s
user    12m24.880s
sys     0m35.770s
'''

Using GPUs and ASICs ("Better Hardware")

Cheapest Nvidia GPU (about 30 EUR net of VAT) reaches a MD5 hashing speed of about 400MH/s.

High-end Nvidia GPU cluster Brutalis reaches a MD5 hashing speed of about 200GH/s (cf. https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a270c40).

Dedicated Application Specific Integrated Circuit (ASIC) chips reach even higher speed at much lower cost. AntMiner S5 achieves a speed of 1,155 GH/s for Bitcoin mining (SHA256). Used at Amazon for about 200 EUR.

Signing Messages with Hashes and RSA Keys

Let us electronically sign a message The Python Quants send to someone else. To this end, we combine hashing with RSA encryption.

In [52]:
m = b'Hello FROM The Python Quants.' * 5

We generate a hash code for the message.

In [53]:
from Cryptodome.Hash import SHA256
In [54]:
h = SHA256.new(m)
In [55]:
hd = h.hexdigest()
hd
Out[55]:
'6d7e450cf23b6f867b03ac30e613c618fb5d91cdd56cdd9f79859dc12e6a4083'

We next sign the message, i.e. we encrypt the hash code with the private key from before.

In [56]:
from Cryptodome.Signature import PKCS1_v1_5
from Cryptodome.PublicKey import RSA
In [57]:
key = RSA.generate(2048)
signer = PKCS1_v1_5.new(key)
In [58]:
signature = signer.sign(h)  # private key
signature
Out[58]:
b'|\x1c\x90\\O?ci\xc9\x9f^DCx\rY\x89\x03f6G\x81.O0\xb3,-\xfb\x86\xa5IB\x04\xf8\xa8\xe7u\xf8mV|pF\x07\xad\x8f\xcee\xd8\xa4X\xc2\xe2\xa4\x8c\x05\xc5\x91x\xba\x15%\xc63\x1b\xa48\xec\x17\xfa9\x05\xcd\x7f\\D\xae2\x8fQ\xcc\xedL\xa1\xe1\xba\xfb\x99j\x84\xf8C|\xa9\xff\x89\xe6[9\xa9q*\x05\x0f\xb7\x92^\x9a\x12\xf3\xc5?8.\xa208\xb3\x8d\xf0Z\xd49\x80\xe5\x08\xf9\xb4\xf3\x7f^\xa1.\\R[\xf33\xa9\x1agU:\xb7hv\xdap\x12m\x86z\xce\xcf\xa5\xaa2m,\xef3`\xd2\xe5\x16\xfb(\x9b2\x87\xd6\x02\xc1\xc50\x9dW\x90"\xf7\x07\x9cEP\xef\xa5\xd9k\x97\xe5\xc6\xb6Q\xb8\x93\'\x17\xb3\xad\xae\xb5\x92G$\x815\xea\xb6\xa4\xcd;\xadT\xbf\xe4\xc7\xa9=4\xd6\x91I\x9a$\x95l\xdc\x08\x93\x01\xf9\x96\x12t\xd6\x8d}\xfb\xc1\xe5\xb1"s~s\x97\xd3\x9f\xbb@(\xcc\xf4\xd8\xc7'

Someone else — i.e. the receiver of our message — who knows our public key can now verify that we have signed the message as follows.

In [59]:
hashcode = SHA256.new(m)
hashcode.hexdigest()
Out[59]:
'6d7e450cf23b6f867b03ac30e613c618fb5d91cdd56cdd9f79859dc12e6a4083'
In [60]:
PKCS1_v1_5.new(key.publickey()).verify(hashcode, signature)  # public key
Out[60]:
True

Basic Idea Behind Block Chains

Let us now illustrate the basic idea behind a block chain based on a very simple example first. Recall the properties of hash functions that we require (collision resistence, hiding, puzzle friendliness). Let us focus on collision resistence and hiding for the moment.

As a starter, it is obviously easy to calculate the hash value of a string ("first block"), for instance, as follows:

In [61]:
import hashlib
In [62]:
b1 = 'Jil, 2004'  # our first dog
h1 = hashlib.md5(b1.encode('ascii')).hexdigest()  # hash for first block
h1
Out[62]:
'db29bc3f87a84f227d3b4bc7b19a3c6a'

It is highly unlikely that another input yields the same output. It is also really difficult ("almost impossible") to find the input given a certain ouput.

A block chain can be used to document events over time (e.g. transactions, new dogs). To this end, we take the hash code from the first block, add the second block information and calculate a new hash value:

In [63]:
b2 = h1 + ', Liz, 2009'
h2 = hashlib.md5(b2.encode('ascii')).hexdigest()
h2
Out[63]:
'db71b4766173bd93b965af8888262b51'

A third block is as easily added:

In [64]:
b3 = h2 + ', Phineas, 2011'
h3 = hashlib.md5(b3.encode('ascii')).hexdigest()
h3
Out[64]:
'7d00c38e077282a822ee91c69fccd547'

Our block chain now is:

In [65]:
print(b1)
Jil, 2004
In [66]:
print(b2)
db29bc3f87a84f227d3b4bc7b19a3c6a, Liz, 2009
In [67]:
print(b3)
db71b4766173bd93b965af8888262b51, Phineas, 2011

There is one major problem with this approach: the block chain is really easy to manipulate since you only need to re-calculate the whole chain. You have all the information needed.

We need add therefore one security measure to avoid manipulation (in theory): signing of the last hash value.

In [68]:
h3
Out[68]:
'7d00c38e077282a822ee91c69fccd547'
In [69]:
# signing
from Cryptodome.Hash import MD5
md5 = MD5.new(b3.encode('ascii'))
signature = PKCS1_v1_5.new(key).sign(md5)  # private key
signature
Out[69]:
b'\x05!\x8c\x12vG\xdf7\x88]0|\xfd\r\x93\xa5\xecD\xaaf\xf4\xbc\xa8*\x95\xbb@\x90\x15%=\xce\x88\xdb R(\x03Ek\xcda\xfb\x06L\x9f\x8a\\\xdft]\x8d{\xe6\xabYv\xbc1-X\x0f\x13\xf2\x19T\xb3\xe2\xdf\x86;\x16P\x0c\xef\x81\x88\x9b\x83Z\xef`r\x9d\xdf\xb5\x84\x8ai\xbe\xb2\x95t\x99\xe4g\xdb\xdb\xac\xd2g\xbd0\xcb\xf8\xdf\x94\x0e\xf9w\\\x9c\xbe\xc8\x007\x90\x94r\x05\x91:bre\x84\xdfL\x96\x99\x84y\x9a&w^p\xc5\xb9!\x8dff\x13c\x010x}b\xb3\x19\x1fQ\x96\xe3\x7f6!\x14:\xa1!\x82J\xeb\xe0\xda8kAr.\xe41\xbauNf9\xf9\xd5\xcf\xba\x16M|\xf7\xee\xdc\xd5\x11\x97\xf7Uh\xe6\xd4$XF\x86\x1f\xad2YQ:Q\x91\xd5\xd4\xc1$\x1a\xe6\xbf\xf8\xb2\x85\xfe\xc7\xdbD\x08}\xd1\xe2\xd2\xe5\x121\xdb\xa2\xf3\xac\x04|a\xa4_\xeeJ\x9c\xc7\x86\x9fb\x1a/ZqKI\xb9\xf2'

If we can make sure that the private key is safe, then the block chain plus the signature for the final hash value are "almost impossible" to manipulate — although all the information is publicly available.

In [70]:
b1, b2, b3, h3, signature
Out[70]:
('Jil, 2004',
 'db29bc3f87a84f227d3b4bc7b19a3c6a, Liz, 2009',
 'db71b4766173bd93b965af8888262b51, Phineas, 2011',
 '7d00c38e077282a822ee91c69fccd547',
 b'\x05!\x8c\x12vG\xdf7\x88]0|\xfd\r\x93\xa5\xecD\xaaf\xf4\xbc\xa8*\x95\xbb@\x90\x15%=\xce\x88\xdb R(\x03Ek\xcda\xfb\x06L\x9f\x8a\\\xdft]\x8d{\xe6\xabYv\xbc1-X\x0f\x13\xf2\x19T\xb3\xe2\xdf\x86;\x16P\x0c\xef\x81\x88\x9b\x83Z\xef`r\x9d\xdf\xb5\x84\x8ai\xbe\xb2\x95t\x99\xe4g\xdb\xdb\xac\xd2g\xbd0\xcb\xf8\xdf\x94\x0e\xf9w\\\x9c\xbe\xc8\x007\x90\x94r\x05\x91:bre\x84\xdfL\x96\x99\x84y\x9a&w^p\xc5\xb9!\x8dff\x13c\x010x}b\xb3\x19\x1fQ\x96\xe3\x7f6!\x14:\xa1!\x82J\xeb\xe0\xda8kAr.\xe41\xbauNf9\xf9\xd5\xcf\xba\x16M|\xf7\xee\xdc\xd5\x11\x97\xf7Uh\xe6\xd4$XF\x86\x1f\xad2YQ:Q\x91\xd5\xd4\xc1$\x1a\xe6\xbf\xf8\xb2\x85\xfe\xc7\xdbD\x08}\xd1\xe2\xd2\xe5\x121\xdb\xa2\xf3\xac\x04|a\xa4_\xeeJ\x9c\xc7\x86\x9fb\x1a/ZqKI\xb9\xf2')

Another idea is to make it hard to construct another block chain with the same inputs (but in different sequence) or other inputs that satisfies a certain property. Let us define that only hash values with five zeros at the end are allowed. To this end, we must allow for an additional input parameter.

In [71]:
%%time
n = 0
while True:
    b = str(n) + ', ' + b1
    md5 = hashlib.md5(b.encode('ascii')).hexdigest()
    if md5[-5:] == '00000':
        print(b + ' --> ' + md5)
        break
    n += 1
1300915, Jil, 2004 --> 34b87ffb902576b7f2fdeea557500000
CPU times: user 3.24 s, sys: 24.1 ms, total: 3.27 s
Wall time: 3.33 s

Someone wanting to manipulate the block chain must put in much more effort in this case than without such a requirement. The difficulty can easily be increased by requiring eg more trailing zeros.

In [72]:
%%time
n = 0
while True:
    b = str(n) + ', ' + b1
    md5 = hashlib.md5(b.encode('ascii')).hexdigest()
    if md5[-6:] == '000000':
        print(b + ' --> ' + md5)
        break
    n += 1
15086119, Jil, 2004 --> ec1ddd68e721e015e44b58adf7000000
CPU times: user 36.4 s, sys: 210 ms, total: 36.6 s
Wall time: 37.1 s

The first security measure "signing" is vulnerable to stealing the private key (especially when multiple versions exist, e.g. due to backups). The second one "targeting" to sheer brute force. A combination of both, of course, works as well — and is probably more secure.

Another approach that adds some security is to use a random, publicly known, fixed initial hash.

In [73]:
import os
h0 = os.urandom(16).hex()
h0
Out[73]:
'd4683662482f68d7351e28a21c19a0fe'
In [74]:
b1 = h0 + ', Jil, 2004'  # our first dog
h1 = hashlib.md5(b1.encode('ascii')).hexdigest()  # hash for first block
h1
Out[74]:
'2654e14c495d442458ca9a7ed04fb0e9'

Basic Idea Behind Mining

Bitcoin mining is based on SHA256 hash codes (cf. https://en.wikipedia.org/wiki/SHA-2)

In [75]:
sha256 = hashlib.sha256('yves'.encode('ascii'))
sha256.hexdigest()
Out[75]:
'9195c7b9bb56d375948ba058cddea1982b49864e12c700de58d69c5c72dbd075'

The idea behind mining is to find a hash code that is 'small enough', i.e. lies below a certain target level (mainly defined by 'leading zeros' in the target hex value).

In [76]:
target = '%064x' % (1000000000 << 200)
target
Out[76]:
'0000003b9aca0000000000000000000000000000000000000000000000000000'

The original hash code for python is not small enough.

In [77]:
target
Out[77]:
'0000003b9aca0000000000000000000000000000000000000000000000000000'
In [78]:
sha256.hexdigest() < target
Out[78]:
False

However, adding (a) certain number(s) to the string, yields a hash code small enough.

In [79]:
sh = hashlib.sha256(b'%dyves' % 23240167).hexdigest()
sh
Out[79]:
'00000003b04fad4b30a527760fea6ee5beec8035ef636316c2bf2577b2789611'
In [80]:
sh < target
Out[80]:
True

The following code simulates a mining procedure.

In [81]:
%%time
i = 0
while True:
    sha256 = hashlib.sha256(b'%d' % i + b'yves')
    if sha256.hexdigest() < target:
        print('SUCCESS')
        print(i, sha256.hexdigest())
        # break
    if i % 2500000 == 0:
        print(i)
    i += 1
    if i > 55000000:
        break
0
2500000
5000000
7500000
10000000
12500000
15000000
17500000
20000000
22500000
SUCCESS
23240167 00000003b04fad4b30a527760fea6ee5beec8035ef636316c2bf2577b2789611
25000000
SUCCESS
27090678 00000007a53f0b5163e1cb7ade64881e4eb3e06f9c102ad72a19c942223ba82b
27500000
30000000
32500000
35000000
37500000
40000000
42500000
45000000
47500000
50000000
SUCCESS
50427211 000000099f6760417f3b2161a9ba9e989c62da1745d949267d5b648edaa21496
52500000
55000000
CPU times: user 2min 8s, sys: 874 ms, total: 2min 9s
Wall time: 2min 13s

The time to find a suitable hash code depends on the input string.

In [82]:
%%time
i = 0
while True:
    sha256 = hashlib.sha256(b'%d' % i + b'yveshilpisch')
    if sha256.hexdigest() < target:
        print('SUCCESS')
        print(i, sha256.hexdigest())
        # break
    if i % 2500000 == 0:
        print(i)
    i += 1
    if i > 55000000:
        break
0
2500000
5000000
7500000
10000000
12500000
15000000
17500000
20000000
22500000
25000000
27500000
30000000
32500000
35000000
37500000
40000000
42500000
45000000
47500000
50000000
52500000
55000000
CPU times: user 2min 9s, sys: 885 ms, total: 2min 10s
Wall time: 2min 13s

Mining a Bitcoin Block

The Mining Procedure

The following example is from http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html and is about a 'real' bitcoin block and how to mine it wih Python.

In what follows we need the struct module (cf. https://docs.python.org/3/library/struct.html).

In [83]:
import struct
import binascii
import hashlib

The basic elements of a bitcoin block.

The elements translated into Python code. Cf. also https://en.bitcoin.it/wiki/Block_hashing_algorithm.

In [84]:
ver = 2
prev_block = b'000000000000000117c80378b8da0e33559b5997f2ad55e2f7d18ec1975b9717'
mrkl_root = b'871714dcbae6c8193a2bb9b2a69fe1c0440399f38d94b3a0f1b447275a29978a'
time_ = 0x53058b35  # 2014-02-20 04:57:25
bits = 0x19015f53  # difficulty
bits
Out[84]:
419520339

Cf. https://bitcoinwisdom.com/bitcoin/difficulty for data about mining difficulty and hash rates. See also https://en.bitcoin.it/wiki/Difficulty.

The following code snippets illustrate the derivation of the target value which is the upper limit for a successful hash code. Cf. https://www.codecademy.com/courses/python-intermediate-en-KE1UJ/0/1 for bitwise operations in Python.

In [85]:
ex = bits >> 24
ex
Out[85]:
25
In [86]:
mant = bits & 0xffffff
mant
Out[86]:
89939
In [87]:
hex(mant)
Out[87]:
'0x15f53'
In [88]:
8 * (ex - 3)
Out[88]:
176

The concrete values for the difficulty/target hash.

In [89]:
target_hexstr = '%064x' % (mant * (1 << (8 * (ex - 3))))
target_hexstr
Out[89]:
'00000000000000015f5300000000000000000000000000000000000000000000'
In [90]:
target_str = binascii.unhexlify(target_hexstr)
target_str  # C struct
Out[90]:
b'\x00\x00\x00\x00\x00\x00\x00\x01_S\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

The nonce is the value which is to be added to the other block elements during the hash code generation. One looks for the nonce that gives a hash code smaller than the target level.

In [91]:
nonce = 850000000
In [92]:
# nonce = 856192328
# Block 286819
# check under https://blockexplorer.com

Finally, the Python code to do the mining activitiy. Basically, the nonce values gets increased by 1 during the look, a new hash code is generated and compared to the target level.

In [93]:
%%time
while nonce < 0x100000000:
    header = (struct.pack("<L", ver)
            + binascii.unhexlify(prev_block)[::-1]
            + binascii.unhexlify(mrkl_root)[::-1]
            + struct.pack("<LLL", time_, bits, nonce))
    hs = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    if nonce % 200000 == 0:
        print(nonce, binascii.hexlify(hs[::-1]))
    if binascii.hexlify(hs[::-1]) < binascii.hexlify(target_str):
        print(nonce, binascii.hexlify(hs[::-1]))
        print('success')
        break
    nonce += 1
850000000 b'2d06d5717ef51ce987ec0f0e4823620f8d9d2d6556174103297a6099900a04c0'
850200000 b'879860b11769268f1e5ca7df9a763a7daf63911d2e25eb599db5bd295eba15ee'
850400000 b'b00da7ec454b36fd7f7f96fd973361e56c8f19d252b95b4ada21d49dc6751001'
850600000 b'ed546943a811deed9a2e37a74fda8a66b60d93f80db8f98d02fac8209c116862'
850800000 b'275944cb52b35cc3993bd286e3861afbc246c7de7ed274da917cc22d05cd2bf5'
851000000 b'2caff003e29dfc2ec7130a20d84118580a48d81627e0a766a6450a14404aa030'
851200000 b'5328e4e15f9143668ee761fcd6703b4bbb491962d18d0198acb036eaddec27d4'
851400000 b'961beb37d43cb3e9ec00b12ad6999e7599f2671dc9bd0529cbb4825e045ef6ad'
851600000 b'58c3abc58a3fcccda18f205cc124bcfce61d244b46bb70161da040e346429b61'
851800000 b'4e0bc5299ef60ee2e542b3db8c17a1fc37b423d7d04a90b8afdca7d637a58879'
852000000 b'47c01a53e024c253601f54332790483091888c463d8b8647c8d6f7e078eafb9b'
852200000 b'53c762141e8c9d615dc50b9f3dbc7923d770e8909b1e9a2d3a07e4bb5bc4cbe1'
852400000 b'1bd5babc0d87efd848b56e09cf7cc46c95923864cc2924aa2a1c5fc71835ec21'
852600000 b'1906382ef5f88e718a0a3f511300278828b9e066f0f17538bb0a004789ce03f1'
852800000 b'b9cd34c3e42796b3acfdbade72f80f58d6d25210dd69bd7e2e07a1b1f11dcb44'
853000000 b'4691fb160b2d339e2a54e95f81b891d46acd7f68754b0bd1f4b6942ca0c804ca'
853200000 b'61988103dd17e23ded0344340f3b0f264cb8d96132730cf48ebc34b06377a75b'
853400000 b'ee63a5132d8126c255f73d48e02eec5c7b3d3fb91c447a74aa13b9c982dbbbf6'
853600000 b'6cac0791a530a9e3b8b53d72cbaa2fb46c14c5e65d843e08ab656444bea9a9ea'
853800000 b'75ebeafea5b28c3121000dcf48e023aa829a2a2644afbe530d2b631ed8906bdf'
854000000 b'b261dbf2a4d3178730c06c7d37b3d3f104b9b0c8266cc0bf8ddcffb88ee3e620'
854200000 b'4e7174dc8891c4cf1e6e97dc62166961a289f88206bdd383ba0c31a027610e4f'
854400000 b'0da8de618a00dd0aa71af0ff3350cd78435f10b160a06dee0578caf5bcab38b9'
854600000 b'20e6560f3b64596d9632005d9464d83a3125eb2faef9e34349fd7409869fef51'
854800000 b'e1f78c86fb5afb69ade79f1ba6a0b2f059fd04c1826b74dc7dd99f1ce1148c39'
855000000 b'c56c40696263177bce80ee786c31f5d488d5bab570efb4abee25acb004088645'
855200000 b'96519548a733cc53746e0b07b62f7c0ebb1389a6c4cb8d30309b1f24d78eff26'
855400000 b'18379a22c13f193c9cdb0d5e6eccc2745bc0c95402162ca9721c0dc286c8d61d'
855600000 b'0233a14315cf2b61fce61b534d530c6087e805731bf52fd8f273a24c0806024c'
855800000 b'b0603f071da6f70c98dd66d59be05bc9da7e5a144a7cd4088eac3fccf85f2579'
856000000 b'634192b589ea881333b5e566873ecbaaf8addc5d7ef4240f8e8c1325c29c4c7a'
856192328 b'0000000000000000e067a478024addfecdc93628978aa52d91fabd4292982a50'
success
CPU times: user 46.9 s, sys: 446 ms, total: 47.4 s
Wall time: 49.7 s

Merkle Hash

The following are the hard-coded transaction hashes for the Bitcoin block under consideration (http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html).

In [94]:
# https://blockexplorer.com/rawblock/0000000000000000e067a478024addfecdc93628978aa52d91fabd4292982a50
txHashes = [
  "00baf6626abc2df808da36a518c69f09b0d2ed0a79421ccfde4f559d2e42128b",
  "91c5e9f288437262f218c60f986e8bc10fb35ab3b9f6de477ff0eb554da89dea",
  "46685c94b82b84fa05b6a0f36de6ff46475520113d5cb8c6fb060e043a0dbc5c",
  "ba7ed2544c78ad793ef5bb0ebe0b1c62e8eb9404691165ffcb08662d1733d7a8",
  "b8dc1b7b7ed847c3595e7b02dbd7372aa221756b718c5f2943c75654faf48589",
  "25074ef168a061fcc8663b4554a31b617683abc33b72d2e2834f9329c93f8214",
  "0fb8e311bffffadc6dc4928d7da9e142951d3ba726c8bde2cf1489b62fb9ebc5",
  "c67c79204e681c8bb453195db8ca7d61d4692f0098514ca198ccfd1b59dbcee3",
  "bd27570a6cbd8ad026bfdb8909fdae9321788f0643dea195f39cd84a60a1901b",
  "41a06e53ffc5108358ddcec05b029763d714ae9f33c5403735e8dee78027fe74",
  "cc2696b44cb07612c316f24c07092956f7d8b6e0d48f758572e0d611d1da6fb9",
  "8fc508772c60ace7bfeb3f5f3a507659285ea6f351ac0474a0a9710c7673d4fd",
  "62fed508c095446d971580099f976428fc069f32e966a40a991953b798b28684",
  "928eadbc39196b95147416eedf6f635dcff818916da65419904df8fde977d5db",
  "b137e685df7c1dffe031fb966a0923bb5d0e56f381e730bc01c6d5244cfe47c1",
  "b92207cee1f9e0bfbd797b05a738fab9de9c799b74f54f6b922f20bd5ec23dd6",
  "29d6f37ada0481375b6903c6480a81f8deaf2dcdba03411ed9e8d3e5684d02dd",
  "48158deb116e4fd0429fbbbae61e8e68cb6d0e0c4465ff9a6a990037f88c489c",
  "be64ea86960864cc0a0236bbb11f232faf5b19ae6e2c85518628f5fae37ec1ca",
  "081363552e9fff7461f1fc6663e1abd0fb2dd1c54931e177479a18c4c26260e8",
  "eb87c25dd2b2537b1ff3dbabc420e422e2a801f1bededa6fa49ef7980feaef70",
  "339e16fcc11deb61ccb548239270af43f5ad34c321416bada4b8d66467b1c697",
  "4ad6417a3a04179482ed2e4b7251c396e38841c6fba8d2ce9543337ab7c93c02",
  "c28a45cded020bf424b400ffc9cb6f2f85601934f18c34a4f78283247192056a",
  "882037cc9e3ee6ddc2d3eba86b7ca163533b5d3cbb16eaa38696bb0a2ea1137e",
  "179bb936305b46bb0a9df330f8701984c725a60e063ad5892fa97461570b5c04",
  "9517c585d1578cb327b7988f38e1a15c663955ea288a2292b40d27f232fbb980",
  "2c7e07d0cf42e5520bcbfe2f5ef63761a9ab9d7ccb00ea346195eae030f3b86f",
  "534f631fc42ae2d309670e01c7a0890e4bfb65bae798522ca14df09c81b09734",
  "104643385619adb848593eb668a8066d1f32650edf35e74b0fc3306cb6719448",
  "87ac990808239c768182a752f4f71cd98558397072883c7e137efb49d22b9231",
  "9b3e2f1c47d59a444e9b6dc725f0ac6baf160d22f3a9d399434e5e65b14eccb0",
  "fbe123066ae5add633a542f151663db4eb5a7053e388faadb40240671ae1b09b",
  "1dd07e92e20b3cb9208af040031f7cfc4efd46cc31ec27be20a1047965a42849",
  "2709bb9ed27353c1fd76b9240cab7576a44de68945e256ad44b2cb8d849a8060",
  "d0174db2c712573432a7869c1508f371f3a1058aeedddc1b53a7e04d7c56c725",
  "b4a16f724cddb8f77ddf3d2146a12c4be13d503885eaba3518a03da005009f62",
  "2aa706d75decbe57745e01d46f9f5d30a08dedaf3288cee14cc4948e3684e1d4",
  "ee49c5f6a5129ccaf2abebbc1d6d07a402a600af6221476b89aafaa683ca95b7",
  "bea1011c77874845e9b4c876ed2ceebd530d428dd4a564ad003d9211d40bb091",
  "f1e88ffc2b1de2aa4827002f06943ce5468735f7433f960bf01e75885b9f832b",
  "19247d017e002fb9143d1a89eb921222a94f8a3d0faaf2e05b0f594989edc4c4",
  "13f714ff62ee7d26b6d69ca980c141ebc54e9f71d2697083fe6c5efc1b02bd0f",
  "0c78cbb8246572f015fbdc53dc9798fa54d1119ec77c1f07ac310bcbcc40dbf8",
  "4bcde0ef92a6d24a2be7be50ac5e5299d776df2e6229ba5d475c2491da94f255",
  "0cfd7d1058502730cf0b2ffa880c78ef534651e06832b5d87c0d7eb84eac5b0c",
  "3a168f794d6e0c614429ad874317cc4cd67a8177214880ff6ea1704d29228c2f",
  "f9a555d817334397b402518d6fd959dc73d981ee7f5fe67969b63974ebbef127",
  "24b52691f66eaed4ce391a473902e309018257c98b9f02aaa33b399c9e6f3168",
  "a37b5e623dc26a180d9e2c9510d06885b014e86e533adb63ec40511e10b55046",
  "9dbaeb485e51d9e25a5621dc46e0bc0aaf51fb26be5acc4e370b96f62c469b80",
  "a6431d3d39f6c38c5df48405090752cab03bfdf5c77cf881b18a946807fba74a",
  "faa77e309f125373acf19855dd496fffe2f74962e545420844557a3adc7ebc11",
  "3523f52543ecfea2f78486dc91550fad0e6467d46d9d9c82ca63b2e0230bfa71",
  "a0583e358e42d77d18d1fd0533ff0a65615fc3b3112061ef92f168a00bf640c1",
  "42ae900888d5e5dde59c8e3d06e13db9e84ef05d27726d4b67fd00c50cd9406a",
  "154940777d3ff78f592ef02790131a59263c36b4958bbc836f9a767ea1a9f178",
  "6a0337de6ac75eecf748306e8ebc5bfe5c811a1481ae50f6956a9e7f26a679f5",
  "c99530c2148e09688d0b88795625943371183bf1f5d56c7446c6ed51ea133589",
  "626421dbe8ad6a0fd0d622d5dd3308a1cdc00b98575a41a91fe01a439e6f40bd",
  "b2f3a559f605a158cc395126c3cf394a7e92a53b7514c75157e1dc43a6c7f93e",
  "dffe06d1bea81f2a01c76786404bb867258f9e68013bf25454097ce935090738",
  "0860159ec7a2a51ce107c182a988c40b4bc2057a734354a1219b6c65e72640ed",
  "a405ff1bb51846b1867acc0b0da17f6f9616e592a0a7ff5ef3297c1ecfd60911",
  "a7d451924263284765f6343bca8a21b79b89ebfe611c7355dd88e0ec1c29e232",
  "41c758d08a4d3fe4d90645711589b832a2cd54dd25bd5b66e463e5d389a53aff",
  "a05c1a93a521fa5dbc1790cfbb808893453a428a65f2c6b2d51249fbb12db309",
  "90997920aa9786e10f513cfdd14e294feee6739cee1ab61b3fb1e3f42e7a915d",
  "99fcb9cb62c20a3135484a70bd3f73983f8f3b7b26266dad34f3993958a7642c",
  "e05f9a668b37e5f78bd3b9d047f29f92b33a87f11dd48390410006f858188b7b",
  "56dbc65895f7992da4a6985e7edba4d1c00879f1b28442c644c8a07658ceab27",
  "5e9004fe262b829563d0804656ba68b1de1690401f08a1915273230d8c902fc0",
  "1ea9ed3717523c5e304b7a7ac8058a87fb4f3fed8c6004769f226c9bb67e79c5",
  "f0f1a4c009b3f1b2729e89898e2f5c0fcdc312edea5df884a9c897cb90e4c566",
  "b5bb4ddf04863e6a60f33cb96c20dac8175d3bae55f335781503143c97a50e43",
  "f14cc97a20c6f627b4b78301352ae35463bc359362589cd178a06c0fa90850b7",
  "628801c8f614015c0fa0ccb2768cccc3e7b9d41ceed06071ce2534d31f7236d6",
  "3be1013c8f8da150e2195408093153b55b08b037fd92db8bb5e803f4c2538aae",
  "c9e1f8777685f54ba65c4e02915fd649ee1edcbf9c77ddf584b943d27efb86c3",
  "4274e92ed3bd02eb101baa5fb8ff7b96236830762d08273749fbb5166db8ab0b",
  "aa84c955bea04c7cee8f5bbbec97d25930fcaca363eed1b8cad37b931556d3e3",
  "d6a29c948677fb1f71aaf16debc3d071a4dd349458eb9e056dce3a000ff853da",
  "ba84bdb3d78367ca365016ac4bff9269576eb010f874c2967af73e0de5638de0",
  "1546c79951e3b541bc64d1957b565b7a2850fc87192c7b374aee6cfc69b9805e",
  "f119227d492ebe27fe9aae321980802454dfa64b2691efbe796c5075d5b07f62",
  "b8cf13d64818b32f96bbb585998b1bc9505f6a94055488e5a71fee9479c6f2a9",
  "1aaf459705b6afef2d7b83e3f181f1af55be0813daf55edce104cc59abc28ed7",
  "61ac185c8f520b5e3134953dc52ff292a40e1e96b088dab259558a9d240ec02f",
  "2da96e3154d7ec2329f787b73cb8a436b92d64cf3cc28e920d073279ea73b5f8",
  "1c4d72ce733b971b9ec4e24f37d733355f6f2ea635cc67ffb3e22748484df446",
  "2a6f89769f3272ac8c7a36a42a57627eca6b260ab2c76d8046a27d44d4034893",
  "f8d11df51a2cc113698ebf39a958fe81179d7d973d2044322771c0fe63f4d7c9",
  "f2287f17a4fa232dca5715c24a92f7112402a8101b9a7b276fb8c8f617376b90",
  "bb5ee510a4fda29cae30c97e7eee80569d3ec3598465f2d7e0674c395e0256e9",
  "647ab8c84365620d60f2523505d14bd230b5e650c96dee48be47770063ee7461",
  "34b06018fcc33ba6ebb01198d785b0629fbdc5d1948f688059158f053093f08b",
  "ff58b258dab0d7f36a2908e6c75229ce308d34806289c912a1a5f39a5aa71f9f",
  "232fc124803668a9f23b1c3bcb1134274303f5c0e1b0e27c9b6c7db59f0e2a4d",
  "27a0797cc5b042ba4c11e72a9555d13a67f00161550b32ede0511718b22dbc2c",
]

Function to generate the (pair-wise) Merkle hash.

In [95]:
# hash pairs of items recursively until a single value is obtained
def merkle(hashList):
    if len(hashList) == 1:
        return hashList[0]
    newHashList = []
    # process pairs; for odd length, the last is skipped
    for i in range(0, len(hashList)-1, 2):
        newHashList.append(hash2(hashList[i], hashList[i+1]))
    if len(hashList) % 2 == 1: # odd, hash last item twice
        newHashList.append(hash2(hashList[-1], hashList[-1]))
    return merkle(newHashList)

def hash2(a, b):
    # reverse inputs before and after hashing
    # due to big-endian / little-endian nonsense
    a1 = binascii.unhexlify(a)[::-1]
    b1 = binascii.unhexlify(b)[::-1]
    h = hashlib.sha256(hashlib.sha256(a1 + b1).digest()).digest()
    return binascii.hexlify(h[::-1])

Finally, the Merkle hash code for the above transaction hashes as found in the block header.

In [96]:
print(merkle(txHashes))
b'871714dcbae6c8193a2bb9b2a69fe1c0440399f38d94b3a0f1b447275a29978a'

The Python Quants

http://tpq.io | @dyjh | team@tpq.io

Python Quant Platform | http://quant-platform.com

Python for Finance | Python for Finance @ O'Reilly

Derivatives Analytics with Python | Derivatives Analytics @ Wiley Finance

Listed Volatility and Variance Derivatives | Listed VV Derivatives @ Wiley Finance