FW: bad rand generator: distribution/prevalences/periodicity

muntoo · Post by **muntoo** » 01 Jul 2011, 04:24

afanofosc wrote:What sort of real-world LEGO NXT device are you going to run with 1 million iterations of random numbers? A rover? A humanoid? A robotic arm? A maze solver? A balancing bot?

Where's your inner geek? Your sense of nerdy pride? The love of an afternoon (or three) spent "wasting" time?

nxtreme · Post by **nxtreme** » 01 Jul 2011, 05:14

muntoo wrote:The love of an afternoon (or three) spent "wasting" time?

Mr. Hansen doesn't have an afternoon or three to waste time

. He spends his free time programming 'till the wee hours of the morning, trying to fix real bugs. He works for no pay, knowing that although he could give up anytime he wanted to, when he's sick and tired of programming so bad that he wants to become a hermit, people like you and me (yes, finally giving NXC a test drive) appreciate all the finger numbing, head scratching, eyeball burning early mornings and late nights that have gone into the most popular NXT programming language there is. You could at least thank him.

Ungrateful **..... *Vzzzt* *Click* Angry client detected, host closing connection.

muntoo · Post by **muntoo** » 01 Jul 2011, 05:29

nxtreme wrote:Mr. Hansen doesn't have an afternoon or three to waste time .

But doc* does.

nxtreme wrote:You could at least thank him.

Already did... Unless you guys want me to write another one?

HaWe · Post by **HaWe** » 01 Jul 2011, 06:25

afanofosc wrote:What sort of real-world LEGO NXT device are you going to run with 1 million iterations of random numbers? A rover? A humanoid? A robotic arm? A maze solver? A balancing bot?
And why do you persist in using a widely known to be bad algorithm for generating random numbers within a small numeric range?
John Hansen

well, can you tell me any application for which you will need a widly known bad random number generator?
Not only rand() % n is too short-period, that's also an issue for Random(n)!.
The point is not that rand/Random sequences are repetitve after 50,000 - a few million or what. The point is: it generates faulty un-random series right from the start, so that it is repetitive too soon later on!

Wikipedia Quotes wrote: One of us recalls producing a â€œrandomâ€ plot with only 11 planes, and being told by his computer centerâ€™s programming consultant that he had misused the random number generator: â€œWe guarantee that each number is random individually, but we donâ€™t guarantee that more than one of them is random.â€ Figure that out.
â€”W. H. Press et al, [3]

In this context I may quote your own post:

afanofosc wrote:In my experience, if people generally recommend not using a particular algorithm because it is a poor algorithm I stop using it. I don't care whether I understand why it is bad or whether it is only bad in certain implementations of C libraries or hardware platforms.

I showed a simple alternative RNG to the fw Lego rand RNG which is rather fast, rather simple, "rather random", and "rather long-period" : the K&R algorithm. It's easy to use by a simple header file, you may have srand seeds, and it even could be easily implemented to the fw.

But there are applications for which also this RNG probably will still be "not sufficiently random enough": e.g., a particle filter, I already mentioned this.
(Yes, John, you're right: If once established, this of course will be part of navigation modules for maze solvers)

BTW: I was not "wasting any time". I just started the RNG and let it run in any a corner of a room of mine, and let it run, and let it run,.... and it's still running.....
And I only wanted to demonstrate statistical data, everyone of course may choose his personal RNG (even if it should be the worst RNG ever, e.g. such as the well-known RANDU *), gathering suspicious results *), everybody just the way he wants.

Press, William H., et al. (1992). Numerical Recipes in Fortran 77: The Art of Scientific Computing (2nd ed.). ISBN 0-521-43064-X. wrote: *) As a result of the wide use of RANDU in the early '70s, many results from that time are seen as suspicious

Donald Knuth wrote: â€¦its very name RANDU is enough to bring dismay into the eyes and stomachs of many computer scientists!(S: see Wikipedia "RANDU", - Quotes)

To my personal opinion and according to the statistical results, Lego rand/Random even beat RANDU by miles...

ps:
Reporting that the randomization results both of Lego rand() and of the NXC function Random (based upon it) are obviously non-random, is meant to be a bug report just as if I had reported "NXT calculates 1+1=3 if the recent number of addition calculations is > 1,000,000". Having non-random random numbers is an issue quite like that.
And no one is supposed to ask as a reply "What sort of real-world LEGO NXT device are you going to run with >1 million addition calculations?"...^^

spillerrec · Post by **spillerrec** » 01 Jul 2011, 16:58

doc-helmut wrote:Reporting that the randomization results both of Lego rand() and of the NXC function Random (based upon it) are obviously non-random, is meant to be a bug report just as if I had reported "NXT calculates 1+1=3 if the recent number of addition calculations is > 1,000,000". Having non-random random numbers is an issue quite like that.

It is impossible to get random numbers without a hardware implementation. A real random function would, when given the same seed, provide two different sequences of numbers. If it doesn't it can't be truly random. Since you say this is possible with your K&R RNG, it isn't random and not a solution. It might be more random than rand(), but it isn't random.

Anyway, instead of doing all these calculations again and again for each new test I would recommend you to save the values from the RNGs on your computer. (Unmodified: no modulus, no casting, just raw values.) The transfer might slow down the process a bit but afterwards you can do the testing on the computer which will be much much faster than anything you can do in NXC... How many runs and how long they should be is something that is entirely up to you. Reusing the same data shouldn't be a mayor issue as long as your samples are large enough, which they should be anyway.

HaWe · Post by **HaWe** » 01 Jul 2011, 18:46

spillerrec,
before making the statitics we didn't have any data about different RNGs at all, especially the Lego rand RNG.
From superficial observations I had the suspect there might be issues, but I couldn't prove it, so I made some tests. Now the data that we have are meaningful enough:
There is no reason to take any numbers based on Lego rand (if you want more than "just any a number") because it's like playing with loaded dice or cubes.

Speed is no issue (at least not for me). So what else do you want? Do you wnt to pull my tests in doubt? Do you want to pull the seeding mechanisms in doubt - or didn't you already understand the details of seeding in K&R and MT RNG or stdlib C rand (and the difference to Lego RNG ) ? Do you want more tests (which?)?

I actually don't understand what your intention is. Is the Lego RNG good enough for you or not? If yes: simply take it.

If not, I presented 2 alternatives:
- if you accept very minor errors: take the K&R LCG implementation, you may seed it by constant or even variable hardware based random seeds (!)
- if you want a high level RNG: take the Mersenne Twister implementation and it will be good enough even for stochastical filters. You also may seed it by constant or even variable hardware based random seeds.

If you should find out that also the Mersenne Twister shouldn't meet your demands, tell us, I'm curious, but at the moment I have no clue for that.
If you already have the ultimate RNG answer to the question about randomize, seeds, the NXT, and the Universe: show us!

spillerrec · Post by **spillerrec** » 01 Jul 2011, 20:34

My point was, you can't call it a bug. You can only call it a bug if rand() must be theoretically completely random and in that case you will need special hardware. So it might not be sufficient for your needs, but you can't require it to be fixed. While you might argue it is not statistically random you cannot expect it to be fixed before you give a real world example (and not specific test programs) which have relevance for several NXC users.

I do not question your tests. I just say that your testing would be must quicker if you processed your data on the computer. A sample on 1,000,000 results will process in max a few seconds on a normal computer while it took 15 minutes on the NXT. (Most likely because of screen redrawing and such...) You could quickly try the difference between "rand() % 16" and "rand() * 16 / RAND_MAX" and "rand() % 128".
I do question your first test where you only looked at a range of 16 while the range of the rand function is many times larger than that. I would expect the range should be at least 1024, if not the full range.

HaWe · Post by **HaWe** » 01 Jul 2011, 21:05

I'm talking about prevalences, distributions, subsequences, and repetitive sequence lengths. Clusters are also an issue (worth while to be tested in future).
I published test results, anyone can draw conclusions from it himself.

But to be honest, I dont want to argue about any opinions of facts or speculations about facts, I want to see facts. And the facts are about all 3 RNGs (Lego, K&R, and MT).
So do the tests you wish and compare K&R and MT with the Lego rand (this thread is about the Lego fw rand RNG, I'm curious how you will do that on a PC) and show what different results of all 3 RNGs you get. You may choose any range you like, as I chose ranges I liked (some people may need randoms of 2, of 3, of 6, of 16, of 49, of 100, or of 1024). Surprise me with what RNG would be the best and which the worst with your test.

edit:
Maybe I didn't explain quite clear enough the conclusions.
Since long, "the difference between "rand() % 16" and "rand() * 16 / RAND_MAX" and "rand() % 128" is no issue any more.
Notice the knock-out tests:
rand() % anything is already knocked out by the distibution test (see standard deviation), actually no more tests about that would have been needed, nevertheless I tested further on to have a comparison.
NXC Random(anything) is finally knocked out by the periodicity test. It proved that although Random(anything) is well distributed, it has the same unaccebtable short periodicity just like the underlying fw rand and so it's also knocked out and already out of limits. As an exaggerated example, look at this series
1 3 5 7 9 8 6 4 2 0 1 3 5 7 9 8 6 4 2 0 1 3 5 7 9 8 6 4 2 0 ...
although all single numbers are perfectly equally distributed, they have NOTHING to do with random numbers. That's exaggeratedly, oversubtle the issue with rand() / Random() .
You may do tests how many and of what kind ever you want, but if they don't falsify the periodicity length test, rand() and Random() will stay knocked out anyway.

But test all 3 (!) RNGs by your own, just as you wish. Until then, good luck. And good night.

spillerrec · Post by **spillerrec** » 01 Jul 2011, 23:31

Try this: Do the test again (just for Random()), but store 20 values. (Don't drop any numbers!) Stop the test like before, that is after the 9th repetitive number, but continue to check how long it actually repeats. Something tells me it will repeat all 20 numbers (but no more than that).

But why is this important? You are not going to try to run a nuclear power plant with your NXT, are you?

Since long, "the difference between "rand() % 16" and "rand() * 16 / RAND_MAX" and "rand() % 128" is no issue any more.

It was just an example...

(this thread is about the Lego fw rand RNG, I'm curious how you will do that on a PC)

I clearly know that... -__-
What I'm meaning is:
Run rand(), Krand() and Mrand() 65535*65535 times on your NXT. (Perhaps you want to split them into three parts because of seeding.) But instead of analyzing the results on your NXT, transfer them to your computer (by BT for example) and store them in a file.
Now, make a fake rand() function on your computer which instead of calculating a random value reads the next value in the file you stored from the NXT. You now run all your tests on the same data.

It does reduce the quality of the tests a little, however if you raise the sample amount it shouldn't be a major issue.
This is just to speed up things and secondly you might be able to use the diehard tests directly instead of trying to recreate something in NXC.

m-goldberg · Post by **m-goldberg** » 02 Jul 2011, 02:42

A quote from John von Neumann (1951) -- Anyone who considers arithmetical methods for producing random digits is, of course, in a state of sin.

Mindboards

FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Re: FW: bad rand generator: distribution/prevalences/periodicity

Who is online