Event-based object tracking

Translation and scale invariance in real-time with receptive fields

Jens Pedersen & Raghav Singhal & Jörg Conradt

jeped@kth.se jegp@mastodon.social jepedersen.dk

Thank you - Juan P. Romero B., Emil Jansson, Harini Sudha

Scale-space theory

Lindeberg, Journal of Mathematical Imaging and Vision (2022)

A model $g$ is invariant to a transformation $f$ if applying $f$ to the input leaves the output unchanged: $$ g(f(x)) = g(x) $$
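As a concrete illustration (my own example, not from the slides): if $f$ translates the signal, $f(x)(u) = x(u - \Delta)$, then invariance means the model's response does not depend on the shift,

$$ g(f(x)) = g(x) \quad \text{for all } \Delta. $$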

Invariance properties of convolutions

Scale invariance with receptive fields

Capturing structure:
How does this work in 2 dimensions?

Lindeberg, Heliyon 7 (2021)

Lindeberg, Journal of Mathematical Imaging and Vision (2022)

Gaussian receptive fields provide

  1. Linearity between $n$-th order Gaussian derivatives
  2. Translation invariance
  3. Scale invariance

$\implies$ Captures spatial features (sketch below)
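A minimal sketch of how such a Gaussian derivative filter bank could be built (my own illustration using SciPy, not the authors' SNN implementation; the four scales echo the "4 scale spaces" mentioned later, but the exact sigma values are assumptions):

```python
# Sketch: a bank of spatial receptive fields as scale-normalised Gaussian
# derivatives. Uses scipy.ndimage.gaussian_filter; illustration only,
# not the authors' implementation, and the sigma values are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def receptive_field_responses(image, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Return 0th/1st/2nd-order Gaussian derivative responses per scale."""
    responses = []
    for sigma in sigmas:
        # order=(dy, dx): 0 = smoothing, 1/2 = derivative order along that axis
        smooth = gaussian_filter(image, sigma, order=(0, 0))
        dx = gaussian_filter(image, sigma, order=(0, 1))
        dy = gaussian_filter(image, sigma, order=(1, 0))
        dxx = gaussian_filter(image, sigma, order=(0, 2))
        dyy = gaussian_filter(image, sigma, order=(2, 0))
        # Scale normalisation (sigma^n for an n-th derivative) keeps responses
        # comparable across scales, following Lindeberg's scale-space theory.
        responses.append(np.stack([smooth,
                                   sigma * dx, sigma * dy,
                                   sigma ** 2 * dxx, sigma ** 2 * dyy]))
    return np.stack(responses)  # (n_scales, n_filters, H, W)

# Usage: a small bright blob; translating it shifts the responses by the same
# amount, and rescaling it moves the peak response to a different sigma.
image = np.zeros((64, 64))
image[30:34, 30:34] = 1.0
print(receptive_field_responses(image).shape)  # (4, 5, 64, 64)
```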

But what about time?

  1. Spatial and temporal invariances in sparse signals
  2. Stepwise real-time predictions

Signal processing with convolutions
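A hedged sketch of what stepwise, real-time temporal filtering of a sparse event stream could look like (my own illustration; an exponential leaky-integrator kernel is assumed here, which may differ from the temporal receptive fields used in the talk):

```python
# Sketch: causal temporal convolution of a sparse event stream with an
# exponential kernel, computed recursively so a prediction can be read out
# after every 1 ms frame. Kernel choice and tau are assumptions.
import numpy as np

def stepwise_filter(event_frames, tau=10.0):
    """event_frames: iterable of (H, W) event-count arrays (one per frame).
    Yields the filtered state after each frame (real-time readout)."""
    decay = np.exp(-1.0 / tau)  # per-frame decay of the exponential kernel
    state = None
    for frame in event_frames:
        if state is None:
            state = np.zeros_like(frame, dtype=float)
        # Recursive form of convolution with exp(-t / tau): only the previous
        # state and the current sparse frame are needed at each step.
        state = decay * state + frame
        yield state

# Usage: five random sparse frames; a state (and hence a prediction) is
# available after every frame rather than only after the whole sequence.
stream = [(np.random.rand(64, 64) < 0.01).astype(float) for _ in range(5)]
for t, state in enumerate(stepwise_filter(stream)):
    print(t, float(state.sum()))
```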

Temporal heatmaps

  1. Read out coordinates at every frame
  2. Differentiable (sketch below)
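One standard way to obtain a differentiable coordinate readout from a heatmap is a soft-argmax (spatial softmax expectation); the sketch below is my own illustration and not necessarily the exact readout used in the paper:

```python
# Sketch: differentiable coordinate readout from a heatmap via soft-argmax.
# Illustration only; the paper's exact readout may differ.
import torch

def soft_argmax(heatmap, temperature=1.0):
    """heatmap: (B, H, W) -> (B, 2) expected (y, x) coordinates.
    Softmax-weighted averaging keeps the readout differentiable, so a
    coordinate loss can be backpropagated at every frame."""
    b, h, w = heatmap.shape
    probs = torch.softmax(heatmap.reshape(b, -1) / temperature, dim=-1)
    probs = probs.reshape(b, h, w)
    ys = torch.arange(h, dtype=heatmap.dtype, device=heatmap.device)
    xs = torch.arange(w, dtype=heatmap.dtype, device=heatmap.device)
    y = (probs.sum(dim=2) * ys).sum(dim=1)  # expectation over rows
    x = (probs.sum(dim=1) * xs).sum(dim=1)  # expectation over columns
    return torch.stack([y, x], dim=-1)

# Usage: a heatmap peaked at (20, 40) yields coordinates close to (20, 40).
hm = torch.zeros(1, 64, 64)
hm[0, 20, 40] = 10.0
print(soft_argmax(hm))
```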

Experimental setup & results

1ms frames with coordinate labels

240'000 datapoints - Bernoulli $p=0.8$
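A hypothetical reading of the Bernoulli parameter (my own guess at the data generation, purely illustrative): each pixel of the rendered object emits an event with probability $p=0.8$ per 1 ms frame, producing sparse, noisy input.

```python
# Hypothetical sketch of the simulated event data: each active pixel fires
# with Bernoulli probability p = 0.8 per frame. The actual dataset
# generation in the paper may differ.
import numpy as np

def sample_event_frame(object_mask, p=0.8, rng=np.random.default_rng(0)):
    """object_mask: (H, W) boolean mask of the rendered object.
    Returns a sparse binary event frame."""
    return (object_mask & (rng.random(object_mask.shape) < p)).astype(np.uint8)

# Usage: an 8x8 square; roughly 80% of its 64 pixels produce an event.
mask = np.zeros((64, 64), dtype=bool)
mask[28:36, 28:36] = True
frame = sample_event_frame(mask)
print(frame.sum(), "events from", mask.sum(), "active pixels")
```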

Model with 4 scale spaces

Runs at 1000 Hz on GPUs

Event-based object tracking

Limitations

  • Only simulated data
  • Only on GPUs
  • Only for translation and scale

Event-based object tracking

Summary

  • SNN rivals ANN despite high density
  • Differentiable coordinate transformation
  • Real-time vision processing with events

Event-based object tracking

Translation and scale invariance in real-time with receptive fields

Jens Pedersen & Raghav Singhal & Jörg Conradt

jeped@kth.se jegp@mastodon.social jepedersen.dk

Thank you - Juan P. Romero B., Emil Jansson, Harini Sudha