Weird behaviour when trying to estimate an unknown variable

I'm trying to estimate an unknown variable (p) with very high precision. What I have is a large number of ordered values (I call them t-values). Each value has a sequence number (n). Each t-value is basically the result of multiplying n by p and then adding a random offset ("noise"). My idea is to simply order the t-values by sequence number and then take the mean of the differences between consecutive t-values. It works very well. Here are 10 examples of estimates (true p is 1.0 and the number of t-values is 100000):

1.0000737485173519
0.9999987583319258
1.0000688058361697
1.0002021529901506
0.9999391175701831
1.000012370796987
0.9999891218161053
1.0001566049086157
0.9999818309412788
0.9999594118399372

Close enough for what I want.
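
For reference, the estimator described above can be sketched in isolation. This is a minimal sketch, not the simulation itself: the function names `make_t_values` and `estimate_period` are made up for illustration, and the noise model (a constant offset plus an exponentially distributed delay) is an assumption borrowed from the simulation further down.

```python
import random


def make_t_values(p, count, seed=0, static_offset=0.5, lambd=0.2):
    """Return [(n, t)] with t = n * p + noise (hypothetical helper).

    Assumed noise model: static_offset + exponential delay with rate lambd.
    """
    rng = random.Random(seed)
    return [(n, n * p + static_offset + rng.expovariate(lambd))
            for n in range(1, count + 1)]


def estimate_period(values):
    """Mean difference between t-values with consecutive sequence numbers."""
    ordered = sorted(values)  # tuples sort by sequence number n first
    diffs = [t1 - t0 for (_, t0), (_, t1) in zip(ordered, ordered[1:])]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    values = make_t_values(p=1.0, count=100000, seed=42)
    print(estimate_period(values))
```

Because the differences telescope, this mean is algebraically equal to (last t-value - first t-value) / (count - 1), so only the noise on the two end values enters the estimate.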

But in practice, a certain number of t-values will also be lost. If I introduce a random loss of t-values, the precision goes down dramatically, even if the fraction of lost t-values is as low as 0.001% to 0.01% and (this is the weird part) even if I compensate by generating more t-values so that the number of t-values used in calculating the mean is the same!
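
The loss scenario described above can be sketched as follows. This is a minimal sketch under stated assumptions: `simulate_with_loss` is a hypothetical helper, not the simulation posted below, and it assumes a difference is only usable when both t[n-1] and t[n] survived. It keeps generating values until the requested number of usable differences has been collected, which is the compensation mentioned above.

```python
import random


def simulate_with_loss(p, pairs_wanted, drop_probability, seed=0,
                       static_offset=0.5, lambd=0.2):
    """Estimate p from consecutive differences while values are randomly lost.

    Hypothetical helper. Returns (estimate, usable pairs, dropped values).
    Assumed noise model: static_offset + exponential delay with rate lambd.
    """
    rng = random.Random(seed)
    diff_sum = 0.0
    pairs = 0
    dropped = 0
    prev_t = None  # previous t-value, or None if it was lost
    n = 1
    while pairs < pairs_wanted:
        t = n * p + static_offset + rng.expovariate(lambd)
        if rng.random() < drop_probability:
            # This t-value is lost; the next one has no usable predecessor.
            dropped += 1
            prev_t = None
        else:
            if prev_t is not None:
                diff_sum += t - prev_t
                pairs += 1
            prev_t = t
        n += 1
    return diff_sum / pairs, pairs, dropped


if __name__ == "__main__":
    for prob in (0.0, 0.01):
        estimate, pairs, dropped = simulate_with_loss(1.0, 100000, prob, seed=42)
        print(prob, estimate, pairs, dropped)
```

In both cases the mean is taken over the same number of differences; only the number of lost values in between changes.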

Here are 10 examples when about 1% of the values were dropped:

1.0024257205135292
1.0019969333070318
1.0019520792036436
1.001061555944925
0.997728342781954
1.000205614588305
0.9964173869854615
1.0028314864552466
1.0014389330965119
0.9954499027939065

Why is this?

I have made a simulation in Python to demonstrate. To see the difference, first run it as is. Then change drop_probability to 0.01 and run again.

Python:

#!/usr/bin/python3
import random

random.seed(42)

runs = 10
effective_number_of_values = 100000

real_period=1
static_offset=0.5
lambd=0.2

drop_probability=0.00000001
#drop_probability=0.0001
#drop_probability=0.001
#drop_probability=0.01
#drop_probability=0.1
#drop_probability=0.5


for run in range(0, runs):
    values = []
    dropped_ts = 0

    last_was_dropped = False
    num_values = 0
    n = 1
    t = 0
    while num_values < effective_number_of_values + 1:

        actual_t = t
        noise = static_offset + random.expovariate(lambd)
        effective_t = actual_t + noise

        if drop_probability is not None and \
            random.random() <= drop_probability:

            values.append((n, effective_t, True))
            dropped_ts += 1
            last_was_dropped = True
        else:
            values.append((n, effective_t, False))
            if not last_was_dropped:
                num_values += 1
            last_was_dropped = False

        t += real_period
        n += 1

    values.sort()

    last_n = 0
    last_t = 0
    last_was_dropped = False
    avg_sum = 0
    avg_n = 0
    for v in values:
        n, t, dropped = v

        if n > 1:
            if not dropped and not last_was_dropped:
                avg_sum += t - last_t
                avg_n += 1

        last_t = t
        last_n = n
        last_was_dropped = dropped

    print(avg_sum / avg_n, "(values used: %d, dropped along the way: %.2f%% (%d))" % (avg_n, (dropped_ts/len(values))*100, dropped_ts))


Category: python Time: 2016-07-31 Views: 3