Algorithms: When is Random Really Random?

Tuesday, February 21, 2012

Alan Woodward


A surprising amount of what we do in computer security relies upon the use of random numbers and yet not many of us actually take the time to think about how these numbers are being generated. 

We blithely assume that our computer, when required, can generate a truly random number. 

But wait! How can a machine that works in a deterministic manner generate something that is truly random? The simple answer is that it can't. The best we can do is to generate a number that is pseudorandom.

The fact that we rely upon pseudorandom numbers is a potential problem for IT security. After all, if a machine is using a known algorithm to generate a number that your system then treats as random, what is to stop an attacker from calculating that same number if he knows your algorithm?
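To see why, consider a deliberately naive illustration (a sketch in Python using the standard library's Mersenne Twister, which is not a cryptographic generator; the seed value here is made up): anyone who knows both the algorithm and the seed can replay the entire "random" sequence.

    import random

    # Two independent parties who happen to use the same algorithm and the
    # same seed...
    victim = random.Random(20120221)
    attacker = random.Random(20120221)

    # ...produce exactly the same "random" sequence.
    print([victim.randint(0, 9) for _ in range(5)])
    print([attacker.randint(0, 9) for _ in range(5)])   # identical list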

It is a fundamental principle of strong computer security that you must assume an attacker knows the algorithms you have used. So perhaps we should spend a little more time understanding how pseudorandom numbers are generated, so that we do not rely upon them inappropriately.

The measure of randomness in a number is known as entropy: not entropy as physicists use the term, but entropy as cryptographers use it. Entropy is, in essence, how uncertain you are about a number. This means that the entropy is not necessarily related to the number of bits in a number but rather to the number of possible values that number could have taken.

Imagine being able to discard the bits in a number about which you are certain; the bits that remain are the bits of entropy. Suppose, for example, that a 100-bit number could take on only two possible values: then you need only one bit to differentiate those two values.

Hence, such a number would be described as having 1 bit of entropy. Obviously this is an extreme case, and I have not complicated the issue by factoring in different types of probability distribution.
Most mathematicians tend to define the entropy of a number X as:

  H(X) ≔ − ∑ₓ P(X=x) log₂ P(X=x)

where the sum runs over all possible values x that X can take, and P(X=x) is the probability of X taking the value x.
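As a quick illustration of the definition (a short Python sketch; the distributions are invented for the example), the 100-bit number above that can only take two equally likely values has exactly one bit of entropy, while a uniformly random byte has eight:

    import math

    def entropy_bits(probabilities):
        # H(X) = -sum over x of P(X=x) * log2 P(X=x)
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    # A 100-bit number that can only ever take two equally likely values: 1 bit.
    print(entropy_bits([0.5, 0.5]))          # 1.0

    # A uniformly random byte (256 equally likely values): 8 bits.
    print(entropy_bits([1 / 256] * 256))     # 8.0

    # A biased source: less than 1 bit, because one outcome is more predictable.
    print(entropy_bits([0.9, 0.1]))          # about 0.47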

So, how do you generate a number that has the greatest possible entropy? There has been much research on using activities such as mouse movements or keystrokes to generate a number that can be considered random.

But so far it appears that the activities picked to generate the numbers have proved not quite as random as first thought: some typists are remarkably accurate and consistent!  Other work has focused on incorporating into a computer some form of physical device that can generate truly random numbers based upon provably chaotic behaviour.

This has two drawbacks: generating large quantities of random numbers can take an unacceptably long time, and the device might fail, rendering the numbers predictable without your knowing.
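Setting hardware aside, the basic idea of harvesting entropy from unpredictable events such as keystrokes or timer jitter can be sketched roughly as follows (a toy Python illustration, not production code; real systems feed such measurements into the operating system's entropy pool and combine many sources):

    import hashlib
    import time

    def harvest_timing_entropy(samples=64):
        # Mix the low-order bits of high-resolution timer readings into a pool.
        # The jitter between readings is hard to predict exactly, but it holds
        # far fewer bits of entropy than the digest length suggests, which is
        # why it must be hashed and combined with other sources in practice.
        pool = hashlib.sha256()
        for _ in range(samples):
            pool.update(time.perf_counter_ns().to_bytes(8, "big"))
        return pool.digest()   # 32 bytes of seed material, NOT 256 bits of entropy

    seed_material = harvest_timing_entropy()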

Which brings us back to the generation of pseudorandom numbers.  Most accept that this is the best method of generating numbers with sufficient entropy, whilst making the mechanism efficient and accessible enough for widespread use in computing.

There are many algorithms in the literature, but I would recommend an approach such as Fortuna, which was developed by Ferguson, Schneier and Kohno as an improvement on a previous design called Yarrow.  Fortuna, like most approaches, relies upon a seed, which is a relatively small but truly random number.

But it allows for the use of modern encryption techniques such as AES, which means that although the algorithm is known, it is highly unlikely that the resulting pseudorandom number sequence can be reproduced without knowledge of the seed.

What Fortuna shows is that rather than relying upon standard libraries on their own, you can construct a routine that, yes, uses standard libraries, but recognises the potential attacks against them and seeks to minimise the threat by incorporating elements such as AES and SHA. In this way some secret information is reserved to the local machine, decreasing the chances that a third party could reproduce your numbers.
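A heavily simplified sketch of that idea, in the spirit of Fortuna's generator but leaving out its pooled reseeding machinery (this assumes the third-party Python "cryptography" package for AES; the class and variable names are mine, not taken from the original design):

    import hashlib

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    class TinyGenerator:
        # Toy Fortuna-style generator: AES in counter mode, keyed from a seed.

        def __init__(self, seed: bytes):
            # The seed, and hence the key, is the secret kept on the local machine.
            self.key = hashlib.sha256(seed).digest()
            self.request_counter = 0

        def random_bytes(self, n: int) -> bytes:
            # Encrypt zero bytes under AES-CTR to produce a keystream; the last
            # 32 bytes are held back as the next key (rekeying after every
            # request), so compromising a later state does not expose old output.
            nonce = self.request_counter.to_bytes(16, "big")
            encryptor = Cipher(algorithms.AES(self.key), modes.CTR(nonce)).encryptor()
            stream = encryptor.update(b"\x00" * (n + 32)) + encryptor.finalize()
            self.request_counter += 1
            self.key = stream[-32:]
            return stream[:n]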


So, in short, the best way to use random numbers is to generate pseudorandom numbers, but in such a way that "unreliable" sources of randomness are replaced by a store of entropy of your own upon which you can draw, with the means by which that store is generated kept secret by virtue of a secret key.
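Putting the two sketches above together gives a flavour of that recipe (again purely illustrative; os.urandom draws on the operating system's own entropy pool and should be the first port of call for seed material in practice):

    import os

    # Seed the toy generator from the OS entropy pool plus the harvested timings;
    # the seed never leaves this machine, so the output cannot be reproduced
    # elsewhere even though the algorithm itself is public.
    generator = TinyGenerator(os.urandom(32) + seed_material)
    session_key = generator.random_bytes(32)
    print(session_key.hex())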

For a superb, free book that goes into great depth about random number generation, see Luc Devroye's Non-Uniform Random Variate Generation, available free online (originally published by Springer-Verlag, New York, 1986).

Follow me on Twitter!

Cross-posted from Professor Alan Woodward
