Want to do dynamics? This is how many trajectories you will need.
Two of the most common questions I hear from beginners in the dynamics field is can I do dynamics for such system and how many trajectories do I need. I addressed the first question in this post some time ago. Now, I want to quickly discuss the other.
I won’t tell you anything that you didn’t learn in your basic statistics course. However, it might be useful to look at that theory through specific examples related to dynamics.
Just to recall, the central concept we need is that of a “margin of error.”
The margin of error for a sample proportion (εp) is given as
where p is the sample proportion for a sample size N. Z — given in the Table below — tells the confidence interval.
|Percentage confidence (%)||Z value|
The margin of error for a sample average (εa) is
where s is the sample standard deviation.
Now, suppose you simulated dynamics for your molecule starting in the excited state. You ran 100 trajectories and saw that:
- in 5 trajectories, the molecule dissociated in the excited state;
- in 52 trajectories, it remained bound in the excited state;
- in the other 43 trajectories, the molecule converted to the ground state.
Moreover, you analyzed the hopping time of the 43 trajectories that converted to the ground state. You got that the average hopping time was 150 fs and the standard deviation from the average was 25 fs.
Using Eq. (1), you can compute the margin of error for each sample proportion. For instance, for a 95% confidence interval, the margin of error for trajectories decaying to the ground state is
Thus, you can report that the probability of converting to the ground state is 43%, plus or minus a margin of error of 10%.
You may interpret that as if you repeated the simulations 100 times only changing the initial conditions sampling, in 95 of them, you would obtain results between 33 and 53%.
For the other channels, the values are given in the Table below.
|Channel||Proportion of trajectories (%)||Margin of error (%)|
|Bound excited state||52||10|
You can also get the margin of error for the average time of decay to the ground state. Still for 95% confidence interval, it is calculated with Eq. (2)
Then, you report that the deactivation to the ground state happed at 150 ± 5 fs. And, again, if you repeat the simulation 100 times, the average hopping times should lie between 145 and 155 fs in 95 of them.
Now that we reviewed how the margin of errors are computed, we can finally come to the question of how many trajectories we need to calculate.
The rules-of-thumb for Eqs. (1) and (2) to work are:
- For sample proportions, N×p should be larger than 5.
- For sample averages, N should be larger than 30.
They set the lowest number of trajectories you may run: the second rule tells that this number is 30, and the first rule tells that you will only be able to reasonably analyze sample fractions larger than 17%.
This may be satisfactory if you only want to determine general trends of the major processes. For instance, you run 30 trajectories and get that 15 (50%) return to the ground state. Using Eq. (1), you compute the margin of error and report that the probability of decaying to the ground state is 50 ± 18 %. This is fine. But if you also see that 3 trajectories dissociate in the excited state, you won’t be able to make any reasonable statement about this fraction, because 3 trajectories correspond to 10% and it is below the 17% limit defined by rule 1.
At the end, how many trajectories you need depends on which type of process you’re interested in. If you’re looking at tunneling, which may occur in only 1 out 1000 trajectories (0.1%), you will need more than 5000 trajectories to satisfy rule 1.
Luckily, such rare events are not what we generally worry about in ultrafast photochemistry. There, we usually look at processes occurring in at least 5% of the trajectories. Therefore, 100 trajectories may suffice. At least, as long as you are happy at reporting figures like 5 ± 4 %.
Just to make clear, all these margins of error discussed here are only about sampling. They only tell how reproducible your numbers are if you repeat the same simulation with new initial conditions. They don’t say anything about other sources of errors, like those due to the hopping algorithm or the underlying electronic structure used to compute energies, gradients, and couplings.