Object Detection with Swin Vision Transformers from Raw ADC Radar Signals
| dc.contributor.author | Giroux, James | |
| dc.contributor.supervisor | Bouchard, Martin | |
| dc.contributor.supervisor | Laganière, Robert | |
| dc.date.accessioned | 2023-08-15T12:31:53Z | |
| dc.date.available | 2023-08-15T12:31:53Z | |
| dc.date.issued | 2023-08-15 | en_US |
| dc.description.abstract | Object detection using frequency-modulated continuous-wave (FMCW) radar is becoming increasingly popular in the field of autonomous vehicles. Radar does not share the drawbacks of other emission-based sensors such as LiDAR, primarily the degradation or loss of return signals in weather conditions such as rain or snow. Fully autonomous systems therefore need to use radar sensing in downstream decision-making tasks, which are generally handled by deep learning algorithms. Commonly, three transformations are used to form range-azimuth-Doppler cubes on which deep learning algorithms perform object detection. This method has drawbacks, specifically the pre-processing costs associated with performing multiple Fourier transforms and normalization. We develop a network that operates on raw radar analog-to-digital converter (ADC) output and is capable of running in near real-time because all pre-processing is removed. We obtain inference time estimates roughly one-fifth of those of the traditional range-Doppler pipeline, decreasing from 156 ms to 30 ms, and similar decreases in comparison to the full range-azimuth-Doppler cube. Moreover, we introduce hierarchical Swin Vision Transformers to the field of radar object detection and show their capability to operate on inputs with varying degrees of pre-processing and on different radar configurations, i.e., relatively low and high numbers of transmitters and receivers. Our network increases both average recall and mean intersection over union by approximately 6-7%, obtaining state-of-the-art F1 scores on high-definition radar as a result. On low-definition radar, we note an increase in mean average precision of approximately 2.5% over state-of-the-art range-Doppler networks when raw analog-to-digital converter data is used, and an increase of approximately 5% over networks using the full range-azimuth-Doppler cube. | en_US |
| dc.identifier.uri | http://hdl.handle.net/10393/45288 | |
| dc.identifier.uri | http://dx.doi.org/10.20381/ruor-29494 | |
| dc.language.iso | en | en_US |
| dc.publisher | Université d'Ottawa / University of Ottawa | en_US |
| dc.rights | Attribution 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | * |
| dc.subject | vision transformer | en_US |
| dc.subject | radar | en_US |
| dc.subject | object detection | en_US |
| dc.title | Object Detection with Swin Vision Transformers from Raw ADC Radar Signals | en_US |
| dc.type | Thesis | en_US |
| thesis.degree.discipline | Génie / Engineering | en_US |
| thesis.degree.level | Masters | en_US |
| thesis.degree.name | MASc | en_US |
| uottawa.department | Science informatique et génie électrique / Electrical Engineering and Computer Science | en_US |
