# Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd Edition)

All Stack Overflow 15
This Month Stack Overflow 4

This is where you really need to listen to project managers and MBAs. What you're suggesting is re-implementing parts of the STL and or standard C library. There is an associated cost in terms of time to implement and maintenance burden of doing so, so you shouldn't do it unless you really, genuinely need to, as John points out. The rule is simple: is this calculation you're doing slowing you down (a.k.a. you are bound by the CPU)? If not, don't create your own implementation just for the sake of it.

Now, if you're really interested in fast maths, there are a few places you can start. The gnu multi-precision library implements many algorithms from modern computer arithmetic and semi numerical algorithms that are all about doing maths on arbitrary precision integers and floats insanely fast. The guys who write it optimise in assembly per build platform - it is about as fast as you can get in single core mode. This is the most general case I can think of for optimised maths i.e. that isn't specific to a certain domain.

Bringing my first paragraph and second in with what thkala has said, consider that GMP/MPIR have optimised assembly versions per cpu architecture and OS they support. Really. It's a big job, but it is what makes those libraries so fast on a specific small subset of problems that are programming.

Sometimes domain specific enhancements can be made. This is about understanding the problem in question. For example, when doing finite field arithmetic under rijndael's finite field you can, based on the knowledge that the characteristic polynomial is 2 with 8 terms, assume that your integers are of size `uint8_t` and that addition/subtraction are equivalent to `xor` operations. How does this work? Well basically if you add or subtract two elements of the polynomial, they contain either zero or one. If they're both zero or both one, the result is always zero. If they are different, the result is one. Term by term, that is equivalent to xor across a 8-bit binary string, where each bit represents a term in the polynomial. Multiplication is also relatively efficient. You can bet that rijndael was designed to take advantage of this kind of result.

That's a very specific result. It depends entirely on what you're doing to make things efficient. I can't imagine many STL functions are purely optimised for cpu speed, because amongst other things STL provides: collections via templates, which are about memory, file access which is about storage, exception handling etc. In short, being really fast is a narrow subset of what STL does and what it aims to achieve. Also, you should note that optimisation has different views. For example, if your app is heavy on IO, you are IO bound. Having a massively efficient square root calculation isn't really helpful since "slowness" really means waiting on the disk/OS/your file parsing routine.

In short, you as a developer of an STL library are trying to build an "all round" library for many different use cases.

But, since these things are always interesting, you might well be interested in bit twiddling hacks. I can't remember where I saw that, but I've definitely stolen that link from somebody else on here.

The entire first chapter of Donald Knuth's seminal work Seminumerical Algorithms is taken up with the subject of random number generation. I really don't think an SO answer is going to come close to describing the issues involved. Read the book.

You're best off looking into the volume 2 of the Knuth's series.

For a shorter read, look up the corresponding chapter of the Numerical Recipes.

If you're only interested in a sort of a baseline for a MC simulation--- linear congruential generators are best avoided, Mersenne Twister is good enough in the vast majority of cases.

Don't shuffle, it's unnecessarily expensive.

There's a simple linear algorithm discussed in Jon Bentley's "Programming Pearls" (which Bentley says he learnt from Knuth's "Seminumerical Algorithms"). Use this method instead.

There are some Perl implementations about:

These two snippets implement Algortihm S(3.4.2) and Algortihm R(3.4.2) from Knuth's Art Of programming. The first randomly selects N items from an array of elements, and returns a reference to an array containing the elements. Note that it will not necessarily consider all of the elements in the list.

The second randomly selects N items from a file of indeterminate size and returns an array containing the selected elements. Records in the file are assumed to be per line, and the lines are chomped while reading. This requires only 1 pass through the list. A slight modification can be made to use the snippet in situations where N records would exceed memory limitations, however this requires slightly more than 1 pass (/msg if you need this explained)

A sparse matrix of dimension (N rows)x(M columns) has at most NxM components that can be indexed using the K=[0,N*M) integer set. For any k in K you can retrieve element indices (i,j) thanks to a Euclidean division k = i + j*N (here column major layout).

To randomly sample n elements of K (without repetition), you can use Knuth algorithm "Algorithm S (Selection sampling technique)" 3.4.2, in its book Vol2., seminumerical-Algorithms

In Julia:

``````function random_select(n::Int64,K::Int64)
@assert 0<=n<=K

sample=Vector{Int64}(n)
t=Int64(0)
m=Int64(0)

while m<n
if (K-t)*rand()>=n-m
t+=1
else
m+=1
sample[m]=t
t+=1
end
end
sample
end
``````

The next part simply retrieves the I,J indices to create the sparse matrix from its coordinate form:

``````function create_sparseMatrix(n::Int64,N::Int64,M::Int64)
@assert (0<=N)&&(0<=M)
@assert 0<=n<=N*M

nonZero = random_select(n,N*M)

# column major: k=i+j*N
I = map(k->mod(k,N),nonZero)
J = map(k->div(k,N),nonZero)

sparse(I+1,J+1,ones(n),N,M)
end
``````

Usage example: a 4x5 sparse matrix with 3 nonzero (=1.0) at random positions:

``````julia> create_sparseMatrix(3,4,5)
4×5 SparseMatrixCSC{Float64,Int64} with 3 stored entries:
[4, 1]  =  1.0
[3, 2]  =  1.0
[3, 3]  =  1.0
``````

Border case tests:

``````julia> create_sparseMatrix(0,4,5)
4×5 SparseMatrixCSC{Float64,Int64} with 0 stored entries

julia> create_sparseMatrix(4*5,4,5)
4×5 SparseMatrixCSC{Float64,Int64} with 20 stored entries:
[1, 1]  =  1.0
[2, 1]  =  1.0
[3, 1]  =  1.0
[4, 1]  =  1.0
⋮
[4, 4]  =  1.0
[1, 5]  =  1.0
[2, 5]  =  1.0
[3, 5]  =  1.0
[4, 5]  =  1.0
``````

In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at

The Art of Computer Programming, Volume 2: Seminumerical Algorithms

by Donald E. Knuth. Chapter 3 is all about random numbers.

In theory, a perefectly-random implementation of something like the Fisher-Yates algorithm would yield a completely random shuffle. In practice, howerver, Fisher-Yates is susceptible to things like modulo bias. See some of the pitfalls in relevant section in the Wikipedia entry and How Not To Shuffle The Knuth-Fisher-Yates Algorithm.

Get BigRational from Codeplex. Its part of Microsoft's Base Class Library, so it's a work-in-progress for .Net. Once you have that, then do something like:

``````System.Numerics.BigInteger x = GetDividend() ;
System.Numerics.BigInteger y = GetDivisor() ;

BigRational r     = new BigRational( x , y ) ;
double      value = (double) r ;
``````

Dealing with the inevitable overflow/underflow/loss of precision is, of course, another problem.

Since you can't drop the BigRational library into your code, evidently, the other approach would be to get out the right algorithms book and roll your own...

The easy way, of course, of "rolling one's own" here, since a rational number is represented as the ratio (division) of two integers, is to grab the explicit conversion to double operator from the BigRational class and tweak it to suit. It took me about 15 minutes.

About the only significant modification I made is in how the sign of the result is set when the result is positive or negative zero/infinity. While I was at it, I converted it to a `BigInteger` extension method for you:

``````public static class BigIntExtensions
{

public static double DivideAndReturnDouble( this BigInteger x , BigInteger y )
{
// The Double value type represents a double-precision 64-bit number with
// values ranging from -1.79769313486232e308 to +1.79769313486232e308
// values that do not fit into this range are returned as +/-Infinity
if (SafeCastToDouble(x) && SafeCastToDouble(y))
{
return (Double) x / (Double)  y;
}

// kick it old-school and figure out the sign of the result
bool isNegativeResult = ( ( x.Sign < 0 && y.Sign > 0 ) || ( x.Sign > 0 && y.Sign < 0 ) ) ;

// scale the numerator to preseve the fraction part through the integer division
BigInteger denormalized = (x * s_bnDoublePrecision) / y ;
if ( denormalized.IsZero )
{
return isNegativeResult ? BitConverter.Int64BitsToDouble(unchecked((long)0x8000000000000000)) : 0d; // underflow to -+0
}

Double result   = 0              ;
bool   isDouble = false          ;
int    scale    = DoubleMaxScale ;

while ( scale > 0 )
{
if (!isDouble)
{
if ( SafeCastToDouble(denormalized) )
{
result = (Double) denormalized;
isDouble = true;
}
else
{
denormalized = denormalized / 10 ;
}
}
result = result / 10 ;
scale-- ;
}

if (!isDouble)
{
return isNegativeResult ? Double.NegativeInfinity : Double.PositiveInfinity;
}
else
{
return result;
}

}

private const           int        DoubleMaxScale      = 308 ;
private static readonly BigInteger s_bnDoublePrecision = BigInteger.Pow( 10 , DoubleMaxScale ) ;
private static readonly BigInteger s_bnDoubleMaxValue  = (BigInteger) Double.MaxValue;
private static readonly BigInteger s_bnDoubleMinValue  = (BigInteger) Double.MinValue;

private static bool SafeCastToDouble(BigInteger value)
{
return s_bnDoubleMinValue <= value && value <= s_bnDoubleMaxValue;
}

}
``````