It's generally known information that methods such as the `array_rand()`

method in PHP are not considered cryptographically secure. I'm trying to understand under what situations generated result may be predictable.

If I know the seed, the values in the array, a single generated value and how many times the method has been called before that value was generated, I know I can easily calculate all subsequently returned values. I can do that by writing my own script that uses the same seed and generates the same number of results, thus putting it in the same state.

Assuming I don't know the seed *or* the number of previously generated values, what are my chances/how low can I get the probability of predicting future values.

To define a more concrete example and make the question less theoretical, lets assume we're using `array_rand`

to generate case-insensitive alphanumeric tokens with a length of 12. This is done by grabbing 12 values from a character array by calling array_rand 12 times thus making the token `[A-Z\d]{12}`

. I know I have one of a thousand consecutively generated tokens, but not which position it was generated in.

Can I predict the next token (assume I've not got the last token generated)? I'm assuming this can't be predicted with 100% accuracy, but what are the chances of brute forcing all possibilities for the next token, and how many would there be?

Assuming I can validate if a token is valid, how much does knowing 2 consecutive tokens (24 values) narrow my chances of predicting the 3rd, etc.

I've seen some research on cracking the state of `rand`

but the articles generally don't deal with constrained/truncated ranges.

P.S. I'm trying to understand the proof/math behind why it's insecure, not looking for suggestions of more secure approaches.

**Replay**

You could build your data based on the more secure `openssl_random_pseudo_bytes()`

from the OpenSSL library instead. That obviously implies a base conversion in order to get the required range but it doesn't rely on a seeded random table as other functions (such as `rand()`

).

Note that PHP's built-in `mt_rand()`

is actually much better than the classic `rand()`

in terms of efficiency and randomness (more normal distribution), but is still seeded.

Both methods, when properly implemented, will require the same effort to brute-force. The main difference is that once the brute force is successful, it may be possible for an attacker to either find more details on your implementation patterns or find the seed which therefore make it easier for the next attack.

In practice, cases where it will actually make a real difference are extremely rare, but there have been cases where it has been abused. The best (and only) example I know of is at a casino slot machine in Montreal where this guy spent weeks trying to find patterns and eventually did because of the seed.

Well if you want to know how to predict a PRNG, google it. Find out which PRNG is used for `array_rand`

and google it, for example "predict mersenne twister" (without quotes) gives me two github links (in the top 3 results) to people who've managed to write a program to predict next outputs based on previous ones.

You ask specifically after predicting the PRNG when the raw output is not given, e.g. when it is used to generate random characters (which have only a limited range, e.g. 0-26 for lowercase letters). This makes it a lot harder but I could imagine it still being done with some guesswork if you don't have the source code (black box test).

In a white box test, where the source code is known, it should be fairly trivial. One might need more outputs to recover the state (if part of the PRNG's output is thrown away), but it should be a very similar process.

I'm not sure there is a mathematical proof, like you ask, that proves all non-CSPRNGs must be predictable. I think they are all predictable except the ones that are made (and perhaps proven) to have certain properties.