====== TransientX - Lessons learned ======

[[https://github.com/ypmen/TransientX | TransientX github]]

====== The general concept of TransientX ======

TransientX is a powerful tool to search for single pulses. The search proceeds in several steps, which include RFI mitigation and clustering of candidates. The following reflects my knowledge as of 2024-07-01.

The steps of ''transientx_fil'' are as follows:

  - Skewness/kurtosis filtering
  - Downsampling
  - Normalization
  - Baseline removal
  - RFI mitigation (''-z'' options)
  - De-dispersion
  - Matched filtering (search for single pulse candidates)
  - Clustering of candidates
  - Plotting

===== General parameters =====

  * Block size: This should be chosen such that one block corresponds to of order 10,000 time samples. This can, in principle, be optimized for the properties of the CPU being used.
  * Overlap: RFI mitigation and DM-time plot generation are applied only to the data within a single block. Only during de-dispersion is the block padded by the duration of the DM sweep. If a burst lands on or near a block boundary, it could be missed. Therefore, a small amount of overlap is warranted.

===== Skewness/kurtosis filtering =====

This filter calculates the skewness and kurtosis statistics for each frequency channel within a time block. It is applied to the data at full time/frequency resolution (i.e. before any downsampling in time or frequency). The IQR algorithm is used to avoid the statistics being biased by strong outliers. This filter is always on and is controlled by the ''--zapthre'' option. There is no flag to turn it off directly; however, a high threshold effectively disables it.

===== Downsampling =====

Next, the filterbank is downsampled in time and frequency according to the values given by ''--td'' and ''--fd''. Be aware that this downsampling is applied on top of any downsampling factors given in the ''ddplan''. Hence, the file might be downsampled multiple times!
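
As a rough illustration of how downsampling factors compound, here is a minimal Python sketch (not part of TransientX). The function names and the example numbers (64 µs sampling, 4096 channels, the factor pairs) are hypothetical; the sketch only assumes that successive downsampling factors multiply. It also shows how long a block of about 10,000 samples lasts at the resulting time resolution.

```python
def effective_resolution(tsamp, nchans, factors):
    """Return (sampling time in s, number of channels) after applying
    successive (time_factor, freq_factor) downsampling stages.

    Assumption: each stage simply multiplies the previous factors.
    """
    td_total, fd_total = 1, 1
    for td, fd in factors:
        td_total *= td
        fd_total *= fd
    return tsamp * td_total, nchans // fd_total


def block_duration(n_samples, tsamp):
    """Duration in seconds of a block holding n_samples time samples."""
    return n_samples * tsamp


if __name__ == "__main__":
    # Hypothetical example: a global stage (2, 1) followed by a
    # ddplan-like stage (4, 2).
    tsamp_eff, nchans_eff = effective_resolution(64e-6, 4096, [(2, 1), (4, 2)])
    print("effective sampling time (s):", tsamp_eff)
    print("effective number of channels:", nchans_eff)
    print("duration of 10,000 samples (s):", block_duration(10_000, tsamp_eff))
```

Running a quick check like this before a search helps confirm that the narrowest pulse width being searched is still resolved after all downsampling stages.
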
Also, RFI algorithms that take time and frequency downsampling factors apply them in addition to the global time and frequency downsampling.

===== Normalization =====

===== Baseline removal =====

The baseline option removes the baseline on a channel-by-channel basis using a running median filter. This is controlled by the ''--baseline'' flag. It takes two values: the first should be left at 0, and the second gives the width of the median filter (in s). Note that the filter width should be **wider** than the characteristic duration of the astrophysical signals being searched; otherwise, it may subtract out the signal of interest.

This step is similar to the ''zdot'' option in that it removes the zero-DM RFI from each frequency channel, with the contribution weighted individually for each channel. However, the curve that is subtracted from each frequency channel is first smoothed. The smoothing scale (in s) is the second value given to ''--baseline''. It should be used when the astrophysical signal is expected to have a significant dispersion delay. Otherwise, it might remove real signals as well.

===== RFI mitigation =====

TransientX has several options to mitigate RFI:

  * ''zdot'' This is an advanced zero-DM filter that removes a weighted zero-DM component from each frequency channel. It should be used when the astrophysical signal is expected to have a significant dispersion delay. Otherwise, it might remove real signals as well. **This effectively removes wideband RFI.**
  * ''KadaneF tdRFI fdRFI'' Filters chunks of the data in frequency (downsampled by ''fdRFI'' in frequency and ''tdRFI'' in time). Chunks that exceed the threshold (given by ''--threKadaneF'') are removed. **This is useful against "wide" signals (more than a few time bins) that cover only a small number of frequency channels.**
  * ''mask'' Can filter out **strong and short outliers, i.e.
a few time bins/frequency channels.** The threshold is given by ''--threMask''.
  * ''KadaneT tdRFI fdRFI'' As ''KadaneF'', but in time. **Probably not so useful given ''zdot''.**
  * ''zero'' Classical zero-DM filtering.
  * ''zap fl fh'' Removes the frequency channels in the range from ''fl'' to ''fh''. Both must be specified in MHz.

===== De-dispersion =====

The data are de-dispersed using subband de-dispersion. The size of the subbands is set by the tolerated loss of S/N, given by ''--snrloss''. The DM trials are controlled by ''dms'' (start DM), ''ndm'' (number of DMs) and ''ddm'' (DM step size). The buffer size is the user-specified block size plus the dispersion sweep of the largest trial DM.

===== Matched filtering =====

The de-dispersed time series is searched for single pulse candidates using matched filtering. The searched widths (''--minw'', ''--maxw'') and the S/N threshold (''--thre'') can be specified.

===== Clustering of candidates =====

To avoid seeing the same candidate at several DMs and adjacent time bins, TransientX clusters candidates using the DBSCAN algorithm. This algorithm searches for other candidates within a radius around a candidate in DM (expressed as the difference in dispersive delay) and time. If the specified number of candidates is found, they are collected as a core point, i.e. summarized into the one with the highest S/N. The radius is controlled with ''-r'' and should be large enough that the delay corresponding to one DM step fits comfortably within it.

If the code is running sluggishly, it is most likely that the clustering step has a large number of candidates to group together. Check the parameters you are using in the search. In particular, avoid searching with widths that correspond to a large number of time samples; it is better to downsample in time in these cases.

===== Plotting =====

Finally, the plots of the candidates are created.
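
The buffer padding in the de-dispersion step and the choice of clustering radius both depend on the dispersive delay across the band. The following minimal Python sketch (not TransientX code) evaluates the standard cold-plasma dispersion delay to estimate both. The function names, the example band (1200 to 1600 MHz), the sampling time and the DM values are hypothetical, and the units in which ''-r'' is expressed should be checked against the program's help output before comparing numbers.

```python
import math

# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard cold-plasma value).
K_DM = 4.148808e3


def dispersion_delay(dm, f_lo_mhz, f_hi_mhz):
    """Delay in seconds of f_lo relative to f_hi for a DM in pc cm^-3."""
    return K_DM * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)


def padded_block_samples(block_samples, tsamp, dm_max, f_lo_mhz, f_hi_mhz):
    """Block size in samples plus the sweep of the largest trial DM."""
    sweep = dispersion_delay(dm_max, f_lo_mhz, f_hi_mhz)
    return block_samples + math.ceil(sweep / tsamp)


def dm_step_delay_ms(ddm, f_lo_mhz, f_hi_mhz):
    """Delay across the band, in ms, for a change of one DM step."""
    return 1e3 * dispersion_delay(ddm, f_lo_mhz, f_hi_mhz)


if __name__ == "__main__":
    # Hypothetical setup: 1200-1600 MHz, 64 us sampling, DM up to 1000.
    print("sweep at max DM (s):", dispersion_delay(1000.0, 1200.0, 1600.0))
    print("padded block (samples):",
          padded_block_samples(10_000, 64e-6, 1000.0, 1200.0, 1600.0))
    print("delay per DM step (ms):", dm_step_delay_ms(1.0, 1200.0, 1600.0))
```

Under these assumptions, a clustering radius several times larger than the delay per DM step leaves room for neighbouring DM trials of the same burst to fall within the radius.
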
====== replot_fil ======

Next to ''transientx_fil'', ''replot_fil'' is the second important tool when searching for single pulses. The purpose of ''replot_fil'' is to do a finer search for the TOA, DM, and width of the pulse candidates. If a candidate is RFI, these values change by a large amount in the finer search, and the candidate is then dropped.

===== Debugging =====

If you want to know what a candidate that is filtered out by ''replot_fil'' looks like, you can run the software with the "no clean" option, and it will return all of the ''TransientX'' candidates as they are reprocessed by ''replot_fil''.