I'm trying to estimate an unknown variable (p) with very high precision. What I have is a large number of ordered values (I call them t-values). Each value has a sequence number (n). Each t-value is basically the result of multiplying n by p and then adding a random offset ("noise"). My idea is to simply order the t-values by their sequence number and then take the mean of the differences between consecutive t-values. It works very well. Here are 10 example estimates (true p is 1.0 and the number of t-values is 100000):
1.0000737485173519 0.9999987583319258 1.0000688058361697 1.0002021529901506 0.9999391175701831 1.000012370796987 0.9999891218161053 1.0001566049086157 0.9999818309412788 0.9999594118399372
Close enough for what I want.
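To show just the loss-free case, here is a stripped-down sketch of the idea (the variable names are only for illustration; my full simulation is further down):

```python
import random

random.seed(0)
p = 1.0            # the true value we want to estimate
n_values = 100000

# t-values: t_n = n * p + exponential noise
ts = [n * p + random.expovariate(0.2) for n in range(n_values)]

# estimate p as the mean of the differences between consecutive t-values
diffs = [b - a for a, b in zip(ts, ts[1:])]
estimate = sum(diffs) / len(diffs)
print(estimate)  # close to 1.0
```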
But in practice, a certain number of t-values will also be lost. If I introduce random loss of t-values, the precision goes down dramatically, even when the fraction of lost t-values is as low as 0.001%-0.01%, and, this is the weird part, even if I compensate by generating more t-values so that the number of t-values used in calculating the mean stays the same!
Here are 10 examples when about 1% of the values were dropped:
1.0024257205135292 1.0019969333070318 1.0019520792036436 1.001061555944925 0.997728342781954 1.000205614588305 0.9964173869854615 1.0028314864552466 1.0014389330965119 0.9954499027939065
Why is this?
I have made a simulation in Python to demonstrate. To see the difference, first run it as is. Then change drop_probability to 0.01 and run again.
#!/usr/bin/python3
import random

random.seed(42)

runs = 10
effective_number_of_values = 100000
real_period = 1
static_offset = 0.5
lambd = 0.2  # rate of the exponential noise
drop_probability = 0.00000001
#drop_probability = 0.0001
#drop_probability = 0.001
#drop_probability = 0.01
#drop_probability = 0.1
#drop_probability = 0.5

for run in range(runs):
    # Generate t-values, marking each one as dropped or kept.
    values = []
    dropped_ts = 0
    last_was_dropped = False
    num_values = 0
    n = 1
    t = 0
    while num_values < effective_number_of_values + 1:
        actual_t = t
        noise = static_offset + random.expovariate(lambd)
        effective_t = actual_t + noise
        if drop_probability is not None and random.random() <= drop_probability:
            values.append((n, effective_t, True))
            dropped_ts += 1
            last_was_dropped = True
        else:
            values.append((n, effective_t, False))
            # Only count values whose predecessor was kept, so the number
            # of usable differences stays the same regardless of drops.
            if not last_was_dropped:
                num_values += 1
            last_was_dropped = False
        t += real_period
        n += 1

    # Estimate p as the mean of the differences between consecutive
    # t-values, skipping any difference that involves a dropped value.
    values.sort()
    last_n = 0
    last_t = 0
    last_was_dropped = False
    avg_sum = 0
    avg_n = 0
    for v in values:
        n, t, dropped = v
        if n > 1:
            if not dropped and not last_was_dropped:
                avg_sum += t - last_t
                avg_n += 1
        last_t = t
        last_n = n
        last_was_dropped = dropped
    print(avg_sum / avg_n,
          "(values used: %d, dropped along the way: %.2f%% (%d))"
          % (avg_n, (dropped_ts / len(values)) * 100, dropped_ts))