Accelerating event-based processing with coroutines on CPUs and GPUs

Jens Pedersen & Jörg Conradt

Threads and buffers

Lock-free cooperation

  1. Almost no synchronization overhead
  2. No complex memory abstractions
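
The two points above can be illustrated with a minimal sketch in Python, using generator-based coroutines. This is not AEStream's implementation; the `Event` layout and the names `consumer` and `produce` are assumptions made for illustration. The producer hands each event straight to the consumer, so no lock or shared queue is involved.

```python
from typing import Generator, Iterable, List, Tuple

Event = Tuple[int, int, int]  # hypothetical (timestamp, x, y) event layout


def consumer(sink: List[Event]) -> Generator[None, Event, None]:
    """Coroutine that receives one event per resume and stores it."""
    while True:
        event = yield          # suspend until the producer sends an event
        sink.append(event)


def produce(events: Iterable[Event], target: Generator[None, Event, None]) -> None:
    """Hand each event directly to the consumer: no lock, no shared queue."""
    next(target)               # prime the coroutine: run it to its first yield
    for event in events:
        target.send(event)     # control transfers to the consumer and back
    target.close()             # stop the consumer once the stream ends


received: List[Event] = []
produce(((t, t % 640, t % 480) for t in range(5)), consumer(received))
print(received)
```

Because only one side runs at a time and control is passed explicitly, the event never sits in a buffer that both sides must guard.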

CPU benchmarks

Do coroutines improve throughput?

  • Generated $10^6$ to $10^9$ events
  • Timed storing and loading them in shared memory
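
A rough sketch of how such a comparison could be set up, assuming a mutex-guarded buffer shared by two threads on one side and a coroutine handoff on the other. The function names and the event count `N` are hypothetical, and Python's interpreter lock means the timings only show the shape of the measurement, not the speedups reported in these slides.

```python
import threading
import time
from collections import deque


def run_threaded(n: int) -> int:
    """Producer and consumer threads share a deque guarded by a mutex."""
    buffer, lock, done = deque(), threading.Lock(), threading.Event()
    count = 0

    def producer() -> None:
        for i in range(n):
            with lock:             # every store takes the lock
                buffer.append(i)
        done.set()

    def consumer() -> None:
        nonlocal count
        while True:
            with lock:             # every load takes the lock as well
                if buffer:
                    buffer.popleft()
                    count += 1
                elif done.is_set():
                    return         # buffer drained and producer finished

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count


def run_coroutine(n: int) -> int:
    """Producer sends each event directly to a generator coroutine."""
    count = 0

    def consumer():
        nonlocal count
        while True:
            _ = yield              # suspend until the next event arrives
            count += 1

    sink = consumer()
    next(sink)                     # prime the coroutine
    for i in range(n):
        sink.send(i)
    sink.close()
    return count


def timed(fn, n: int):
    """Return (events handled, elapsed seconds) for one run of fn."""
    start = time.perf_counter()
    handled = fn(n)
    return handled, time.perf_counter() - start


N = 50_000  # hypothetical; far smaller than the 10^6 to 10^9 events on the slide
for fn in (run_threaded, run_coroutine):
    handled, seconds = timed(fn, N)
    print(f"{fn.__name__}: {handled} events in {seconds:.4f} s")
```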

CPU benchmarks: mutex barrier

CPU benchmarks: relative speed

GPU benchmarks

Do coroutines improve throughput on GPUs?

SpiNNaker benchmark

  • Increases throughput from 200 kev/s to 10 Mev/s
  • Reduces streaming latency by 30%

Edge detection with SNN

from aestream import USBInput  # Import AEStream
net = ...                      # Create SNN network

with USBInput((640, 480), device="gpu") as camera:
    while True:                 # Loop forever
        tensor = camera.read()  # Read a tensor "frame"
        out = net(tensor)       # Apply the network to the frame


  • Easy to use and open source
  • Supports a large number of input/output pairs
  • 3x throughput for events on CPUs
  • 5x throughput for events on GPUs

AEStream - Accelerating event-based processing with coroutines

Thank you - Juan P. Romero B., Emil Jansson, Anders Søborg, Alexander Hadjivanov, Cameron Baker, Steven Abreu, Harini Sudha, Christian Pehle, Gregor Lenz

Join us