Changes since previous version
Overview
This version has undergone fundamental changes that were required to overcome the limitations on the duration of simulated results. The central problem with the previous version was that all results were fully kept in memory as numpy arrays. To solve this, the architecture has been changed to chunked processing, were only a stretch of time is simulated and then stored immediately to file before processing the next chunk. This required also a major refactoring, as the previous code structure was very monolithic and centered around direct manipulation of arrays, with little abstraction. In the new version, there are several subpackages responsible for reading of input files (orbits, glitches, GW response, frequency plan), signal processing (filtering, interpolation), and noise generation. Those are low-level tool applied to single chunks, and do not incorporate any chunked processing. Another newly developed subpackage named “streams” is responsible for the chunked processing. This package is not specific to the simulator and might also be useful for other data processing tasks involving long time series.
Multithreading
The new simulator can optionally use multithreading to speed up computations. The potential for speedup by using more cores is limited, however, by the granularity and interdependence of the work units making up the simulation. Your mileage may vary. See Basic usage for details.
This option uses the dask library, which is a new dependency. Dask is not used at all for the chunked processing without multithreading and might become an optional dependency in the future.
Whether to use multithreading is not specified when creating an Instrument instance anymore (the concurrent parameter for the old multithreading approach is gone) but when exporting to file or memory.
Note
Before the streams package, the first prototypes for chunked processing relied entirely on dask arrays. It turned out that the heuristic dask task scheduler was unsuitable for our use case and still lead to memory exhaustion for long simulations. Now, dask is only used on a lower level for multithreading.
Interface changes
The user interface was modified as little as possible, but some changes were unavoidable, as listed below. Also, this version only models LISA, the Hexagon simulator is not part of it any more.
Workflow
The simulate method is not required anymore. It is still there to ease transition, but does not do anything. There is a method export_hdf5 to compute results and store them directly to file. The old write method interface is deprecated and just calls the new interface. For working with simulations that do fit in memory, there is now a new step involved. The method export_numpy creates an object holding all results as attributes, in a similar fashion as the Instrument class itself did in previous versions. The exported results also have well defined metadata, while the previous versions made no clear distinction between user-facing metadata attributes and internally used attributes. See Basic usage for details on the new interface.
Margins and validity
Previous versions produced data that was invalid close to the start time, due to the use of boundary conditions. This required users to manually cut off those margins in postprocessing. This is no longer needed. The data produced by the simulator is valid on the entire duration.
The downside is that all input files now need to cover a slightly larger time interval than the simulation. Setting the simulation start time to the first time sample covered in the orbit files will no longer work. When using t0=”orbits”, the required margin is now added automatically to the simulation start time, see also New parameters
New parameters
The instrument class has aquired even more paramters
delay_isc_min (float): minimum interspacecraft delay [s] that can occur in the simulation
delay_isc_max (float): maximum interspacecraft delay [s] that can occur in the simulation
delay_clock_max (float): maximum absolute clock delays [s] that can occur in the simulation
chunk_gw_files (bool): whether to interpolate gw files in chunked fashion or globally
noises_f_min_hz (float|None): cutoff frequency used for noises or None for inverse simulation length
clock_f_min_hz (float): cutoff frequency used for clock noise
orbit_t0_margin (float): Margin to be added to simulation start time t0 for case t0=”orbits”
The parameters delay_isc_min/max are needed because delay operators in chunked processing need to set up internal margins between chunks, and need to know the required size in advance. For realistic scenarios, the default values are fine. In case the orbital separation is significantly increased, they should be adadpted proportionally. Similarly, the delay_clock_max provides a limit for the shift between the different time frames the code needs to interpolate to. There should be no need to change the default value for normal scenarios.
The chunk_gw_files parameter only exists for testing. Setting it to False will set up a global interpolation spline for the full GW data as was done in older versions. This might incur a high memory footprint for long GW data.
The noises_f_min_hz and clock_f_min_hz parameters affect the low-frequency behavior of noise models, see section Noise models below.
The orbit_t0_margin is only used in conjunction with t0=”orbits” and denotes the start time of the simulation relative to the earliest time covered in the orbit file. The default should only need to be changed when increasing the orbital separation or the sample period dt.
Parameter restrictions
The parameter validity checks have generally been improved, such that many trivial mistakes are now caught, such as specifying the number of samples as float value instead integer. There are also more significant changes to some parameters. Any parameter that allowed passing a time series now doesn’t. The reason is that time series cannot be passed as arrays anymore in the chunked processing context. I detail, the following new restrictions apply
gws does not allow passing GW response samples as arrays
fplan does not allow passing the frequency plan as arrays anymore
ranging_bias does not allow passing the bias as arrays anymore
aafilter no longer allows to specify custom filter functions. For Kaiser windows, zero attenuation is forbidden.
interpolation no longer allows to specify custom interpolation functions
The concurrent parameter is gone, see Multithreading.
Warning
When passing a dictionary with function as tdir_modulations parameter to the Instrument class and export to file, be aware that the stored metadata is not sufficient to fully reproduce the simulation. This was also the case in old versions but is now considered a design flaw. The handling of this parameter might change in the future.
Metadata changes
In the old version, there was no clear distinction to internally used Instrument class attributes, and the metadata included in HDF5 files was converted to data types valid as HDF5 attributes, e.g. converting dictionaries to string. Now, there is a list of official metadata items and a class SimMetaData collecting them all. When exporting to HDF5, this metadata is stored as JSON string and can be fully recovered. Some of the items that used to be stored as attributes were problematic because they could contain time series or functions. Such data types are now forbidden in metadata and replaced by placeholders such as None. This affects the following items:
fplan Now only contains value for constant freq plans, else None
tdir_modulations This is now always a pretty useless string, never a function
ranging_biases Now always dictionary of floats, as specifying custom time series is no longer allowed.
The frequency plan time series are now available as ordinary datasets in the set of extended debug quantities.
There is a new metadata item aafilter_group_delay which provides the group delay introduced by the antialiasing filter used internally in the simulator. The delay is currently given by
where \(N\) is the number of FIR filter coefficients and \(\Delta t_\mathrm{phys}\) the sample period before the downsampling, given by the physics_dt metadata item. This reflects the current choice for the definition of filter alignment/time stamping. In the future, the simulator might for example switch to centered FIR filters with zero group delay. When correcting for group delay, this is the value to use.
Notation and naming scheme
To comply with current conventions, a couple of acronyms has been replaced in internal variable names, user-facing method names, attribute names, and most importantly the dataset names in the result files.
isi is now sci
rfi is now ref
Time standard
The time definition in the simulator is just an implicit definition, the actual code is agnostic to the time origin. This implicit definition has changed, as was also the case for the LISAOrbits and GWResponse packages. All time values are now defined with respect to the LISA_EPOCH_TCB defined in the Lisaconstants package.
Input file formats
During the refactoring of the various file readers, a few old file formats versions have not been re-implemented:
Glitch files: versions older than V1.4 cannot be read anymore
GW files: versions older than V2.0 cannot be read anymore
GW and orbit file formats have recently been updated from version 2 to 3. The orbit file format 3.0 does not differ from the previous one except that times are defined to be with respect to lisa epoch. GW format 3.0 supposedly has only minor changes not relevant to the simulator, except that the time definition is also changed as for orbits.
The simulator accepts format version 3 and uses the times in the files without change, as the time definition is the same as the one for the simulator.
GW and orbit formats v2.* are also still accepted, and treated as version 3.
Warning
When reading orbit or GW files in format version 2, the times are not converted to the new definition based on LISA epoch. Since the reference time is just an implicit convention, using the simulator on old files works fine but one has to be aware that the result is then with respect to the same time definition. This might lead to confusion. Using GW and orbit files with old and new version is not meaningful even though the code will not prevent it.
Noise models
Many of the noise models have a low-frequency “cutoff” parameter that was hardwired to the inverse simulation duration in old versions. This might be problematic for very long simulations in several regards. First, several noise model PSDs have low frequency divergences that have no physical motivation and might lead to unrealistic results if the cutoff is set very low (it should be noted, however, that the cutoff for most models does not make the PSD flat but only reduces the order of the divergences). Second, one should expect that two simulation with different duration but otherwise identical parameters yield identical results in the overlapping time period. Tying a physical noise model parameter to the duration violates that reasonable expectation.
The new noises_f_min_hz parameter takes the place of the previously hardwired setting. Note that as a side effect, very short simulations may now take longer to complete. The reason is that most noise types are generated by filtering white noise, using a burn-in period with length that depends on the cutoff. Thus the burn-in size used to be very short for short simulations, but now it is fixed. To restore the old behavior, set noises_f_min_hz to the inverse duration.
The new clock_f_min_hz parameter is a similar parameter governing the low-frequency behavior of clock noise. It already existed in the noise model but was set to a hardcoded (fixed) value by the simulator before, which is now the default value for the new parameter.
Warning
The testmass noise with “lowfreq-relax” shape should not be used in the long simulations that are now possible. This is not caused by changes to the simulator but due to a low-frequency divergence that was always present in the noise model.