GESTURES, VOICE, DISPLAY: COMPACT HMI WITH EDGE AI – Multimodal interface based on PSOC Edge

12/17/2025

Touch, voice, gestures: A demonstrator shows how modern human-machine interaction can be implemented in confined spaces. Embedded intelligence combined with flexible interfaces enables responsive and intuitive operating concepts.

The way humans interact with machines is evolving at a rapid pace. While touch displays are now the norm, contactless operating concepts are becoming increasingly vital, especially for applications where hygiene, environmental influences or physical limitations matter. 

 

The demonstrator presented here (Figure 1) shows how different sensor principles – radar, speech and touch – can be combined in a single embedded system to form a robust human-machine interface (HMI). The goal was to showcase the robustness of the recognition system against external influences such as bright sunlight, rain, ambient noise, or when hands are dirty or gloves are worn. Developed as part of customer and internal development projects, the demonstrator serves as a practical basis for providing expertise in hardware, software and algorithms for radar gesture recognition and voice control.

 

Overview of the system architecture

The demonstrator combines all the key elements of a modern, multimodal human-machine interface in a compact design. The integration of gesture and voice control, motor control and graphical display requires a carefully coordinated architecture that accommodates diverse sensor and actuator interfaces while supporting real-time parallel processing.

A key challenge was integrating heterogeneous components with varying electrical interfaces – ranging from high-speed display connections to latency-critical sensor inputs – onto a single microcontroller platform. The Infineon PSOC Edge supplies the processing power and peripherals for signal processing and control, while the FreeRTOS real-time operating system orchestrates individual tasks and manages data streams and control commands via the internal AHB interconnect bus. 
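As an illustration of this task split, the following sketch shows how the subsystems could be organized as FreeRTOS tasks that exchange recognized commands through a queue. The task names, stack sizes, priorities and the queue-based hand-over are illustrative assumptions, not the demonstrator's actual configuration.

```c
/* Illustrative FreeRTOS task layout (assumed names, stack sizes and priorities).
 * Sensor tasks push recognized commands into a queue; the motor task consumes them. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

typedef enum { CMD_FASTER, CMD_SLOWER, CMD_STOP } hmi_cmd_t;

static QueueHandle_t cmd_queue;   /* gesture/keyword results -> motor control */

static void radar_task(void *arg)
{
    (void)arg;
    for (;;) {
        /* Run the gesture pipeline here; on a hit, enqueue the resulting command. */
        hmi_cmd_t cmd = CMD_FASTER;               /* placeholder result */
        xQueueSend(cmd_queue, &cmd, 0);
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

static void motor_task(void *arg)
{
    (void)arg;
    hmi_cmd_t cmd;
    for (;;) {
        if (xQueueReceive(cmd_queue, &cmd, pdMS_TO_TICKS(1)) == pdTRUE) {
            /* Adjust the speed setpoint according to the received command. */
        }
        /* The 1 kHz speed regulation step would run here. */
    }
}

int main(void)
{
    /* Board and peripheral initialization omitted. */
    cmd_queue = xQueueCreate(8, sizeof(hmi_cmd_t));

    /* Higher number = higher priority; sensor paths run above the UI. */
    xTaskCreate(radar_task, "radar", 2048, NULL, 4, NULL);
    xTaskCreate(motor_task, "motor", 1024, NULL, 3, NULL);
    /* Audio and display tasks are created analogously. */

    vTaskStartScheduler();    /* does not return once the scheduler runs */
    for (;;) { }
}
```

Keeping the sensor paths at a higher priority than the user interface helps bound gesture and voice latency, while the display can continue to refresh at its lower rate.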

Table 1 below shows the key hardware components of the demonstrator together with their most important properties and functions:

Table 1: Key hardware components of the demonstrator with their properties and functions

Component | Property | Function
BLDC motor with Hall sensors | Control via PWM and GPIOs (Hall sensors) | Rotational speed and direction are adjustable; feedback of the rotor position
IM69D130 XENSIV MEMS digital microphone from Infineon | Connection via digital PDM interface | Capturing audio signals for keyword spotting
1024 × 600 IPS TFT LCD touch display from Raystar | Connection via MIPI DSI, capacitive touch technology | Visualization of system states, touch operation
60 GHz radar IC BGT60TR13C from Infineon | Connection via SPI and GPIOs | Gesture recognition through analysis of magnitude, range and azimuth
IFX007T motor control board from Infineon | Triple half-bridge module | Power control of the BLDC motor
Evaluation board with PSOC Edge from Infineon | Multi-core microcontroller with NPU (neural processing unit) | Central processing of radar, audio, display and motor control signals

 

Gesture recognition with 60 GHz radar

Gesture recognition is the key control element of the demonstrator. Left and right swipe motions are detected and make the motor turn faster or slower, while a “push” motion stops it. At the core lies a 60 GHz FMCW radar sensor for motion detection. Signal processing is performed entirely on the Cortex-M55 core of the PSOC Edge. Since machine learning is not used here, development time is reduced and the need for training is eliminated.

 

Gesture recognition process in the demonstrator (Figure 2):

  • Input data: Magnitude and AoA (angle of arrival; azimuth only, as currently only the directions “left” and “right” are distinguished).
  • Capture: A 60 GHz FMCW radar sensor, featuring one transmitting and three receiving antennas, provides a separate signal per antenna.
  • Motion detection: Doppler FFT per antenna signal for identifying moving targets and suppressing static objects.
  • Direction determination: Calculation of the azimuth angle from the phase differences between the receiving antennas.
  • Gesture classification: Evaluation of the temporal progression of the azimuth angle to detect motion types such as “left swipe”, “right swipe” or “push” (see the sketch after this list).
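
The direction-determination and classification steps can be sketched roughly as follows. The antenna spacing, the thresholds and the use of range for push detection are illustrative assumptions rather than the demonstrator's actual parameters.

```c
/* Illustrative sketch: azimuth from the RX phase difference and a simple
 * swipe/push classifier over its temporal progression. Spacing, thresholds
 * and the sign convention are assumptions, not the demonstrator's values. */
#include <math.h>
#include <complex.h>

#define PI_F                3.14159265f
#define ANTENNA_SPACING_WL  0.5f   /* RX spacing in wavelengths (lambda/2 assumed) */

/* Azimuth angle (rad) from the phase difference of two RX channels.
 * rx1/rx2 are the complex target bins after the Doppler FFT. */
static float azimuth_from_phase(float complex rx1, float complex rx2)
{
    float dphi = cargf(rx2 * conjf(rx1));                 /* phase difference */
    float s = dphi / (2.0f * PI_F * ANTENNA_SPACING_WL);
    if (s > 1.0f)  s = 1.0f;                              /* clamp before asin */
    if (s < -1.0f) s = -1.0f;
    return asinf(s);
}

typedef enum { GESTURE_NONE, GESTURE_LEFT, GESTURE_RIGHT, GESTURE_PUSH } gesture_t;

/* Classify one completed motion: a clear angular drift is a swipe (the
 * left/right sign convention is arbitrary here); a strong radial approach
 * with little angular change is treated as a push. */
static gesture_t classify(const float *azimuth, const float *range, int n)
{
    const float swipe_thresh = 0.35f;   /* rad, assumed */
    const float push_thresh  = 0.05f;   /* m of radial approach, assumed */

    float d_az = azimuth[n - 1] - azimuth[0];
    float d_r  = range[0] - range[n - 1];      /* positive = hand moved closer */

    if (d_az >  swipe_thresh) return GESTURE_RIGHT;
    if (d_az < -swipe_thresh) return GESTURE_LEFT;
    if (d_r  >  push_thresh)  return GESTURE_PUSH;
    return GESTURE_NONE;
}
```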

 

The latency is approximately 10 milliseconds after the motion ends. Gestures can be recognized at distances of around 5 to 30 centimeters, or at greater distances if configured accordingly, even under complex reflection conditions and typical environmental factors such as sunlight. Incorporating elevation (the vertical angle) would enable recognition of additional gestures such as “up/down”.

 

Voice control with keyword spotting

Gesture recognition is supplemented by voice control using keyword spotting. Voice signals are captured by a MEMS microphone, preprocessed on the Cortex-M55 of the PSOC Edge and evaluated by a trained neural network. The network, with multiple convolution layers, is specifically optimized to recognize a limited set of clearly defined keywords, such as “start” or “stop”.

The model was developed in Python using the Keras and TensorFlow libraries and subsequently ported to the PSOC Edge using Infineon’s ML Configurator. Inference runs on the Cortex-M55 with an optimized TensorFlow Lite Micro runtime.

 

Keyword spotting process (Figure 3):

  • Audio recording: The digital MEMS microphone (16 kHz sampling rate) delivers PDM data.
  • Pre-processing: Conversion to MEL spectra via a MEL filter bank and window slicing (duration: ~530 µs).
  • Inference: Evaluation of the MEL spectra by the CNN (multiple convolution layers).
  • Result: The recognized keyword is passed as a control command to the motor control or other system functions (a minimal post-processing sketch follows below).
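
The final step, turning the CNN output into a command, can be sketched as follows. The kws_run_inference() wrapper, the class list and the confidence threshold are hypothetical placeholders for the actual TensorFlow Lite Micro invocation and the model used on the PSOC Edge.

```c
/* Illustrative post-processing of the keyword-spotting output: pick the class
 * with the highest score and forward it only if the detection is confident.
 * kws_run_inference() and the class ordering are hypothetical placeholders. */
#include <stddef.h>

#define KWS_NUM_CLASSES 4

typedef enum { KW_SILENCE, KW_UNKNOWN, KW_START, KW_STOP } keyword_t;

/* Hypothetical wrapper around the TFLite Micro interpreter: fills `scores`
 * with one softmax value per class for the latest MEL-spectrum window. */
extern void kws_run_inference(float scores[KWS_NUM_CLASSES]);

static keyword_t kws_postprocess(void)
{
    const float confidence_threshold = 0.8f;   /* assumed value */
    float scores[KWS_NUM_CLASSES];
    kws_run_inference(scores);

    size_t best = 0;                           /* argmax over the class scores */
    for (size_t i = 1; i < KWS_NUM_CLASSES; i++) {
        if (scores[i] > scores[best]) best = i;
    }

    if (scores[best] < confidence_threshold)   /* reject weak detections */
        return KW_UNKNOWN;
    return (keyword_t)best;
}
```

A result such as KW_START or KW_STOP would then be forwarded to the motor control, for example over the same command path used for the gesture results.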

 

Motor control for BLDC with Hall sensors

A brushless DC motor (24 V, max. 4,800 RPM), controlled directly by the microcontroller, provides direct feedback on the gesture and voice commands. Using integrated Hall sensors, the system measures the current speed and adjusts it based on those commands.

Control is handled by Infineon’s IFX007T triple half-bridge module, which is driven via PWM signals and digital control lines from the microcontroller. The motor speed is regulated at a sampling rate of 1 kHz, ensuring that speed changes are implemented quickly and precisely. The motor primarily serves as a demonstration object but can easily be replaced by other actuators or display systems.
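
The 1 kHz regulation step could look roughly like the following proportional controller; hall_get_speed_rpm(), pwm_set_duty() and the gain are hypothetical placeholders, not the demonstrator's actual driver code.

```c
/* Illustrative 1 kHz speed regulation step: the PWM duty cycle is nudged
 * towards the current speed setpoint. HAL functions and gain are assumptions. */
#include <stdint.h>

extern volatile uint32_t speed_setpoint_rpm;   /* updated by gesture/voice commands   */
extern uint32_t hall_get_speed_rpm(void);      /* hypothetical: speed from Hall edges */
extern void     pwm_set_duty(float duty);      /* hypothetical: 0.0 ... 1.0           */

/* Called every 1 ms, e.g. from a timer interrupt or a 1 kHz FreeRTOS task. */
void motor_control_step(void)
{
    static float duty = 0.0f;
    const float kp = 0.00005f;                 /* assumed proportional gain */

    int32_t error = (int32_t)speed_setpoint_rpm - (int32_t)hall_get_speed_rpm();
    duty += kp * (float)error;                 /* integrate the error towards setpoint */

    if (duty > 1.0f) duty = 1.0f;              /* clamp to the valid duty range */
    if (duty < 0.0f) duty = 0.0f;
    pwm_set_duty(duty);
}
```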

 

User interface with touch display

A capacitive 7” touch display (with a resolution of 1,024 × 600 pixels) is used to visualize the system states. It displays, among other data, speed values, recognized gestures and the voice control status.

The graphical user interface is generated directly on the microcontroller using the open source LVGL library. Efficient utilization of memory and processing resources enables real-time concurrent execution of the user interface, gesture recognition and speech processing. The refresh rate of approximately 10 FPS is sufficient for status displays and operational feedback.
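
A minimal LVGL sketch of such a status readout is shown below, assuming an LVGL v8-style API and display/touch drivers already registered by the board support package; the widget layout and update rate are illustrative.

```c
/* Illustrative LVGL status readout: a label showing the current speed and the
 * last recognized command. Assumes display and touch drivers are registered. */
#include <stdint.h>
#include "lvgl.h"

static lv_obj_t *status_label;

void ui_init(void)
{
    status_label = lv_label_create(lv_scr_act());     /* label on the active screen */
    lv_obj_align(status_label, LV_ALIGN_CENTER, 0, 0);
    lv_label_set_text(status_label, "Ready");
}

/* Called from the display task whenever the state changes. */
void ui_update(uint32_t speed_rpm, const char *last_command)
{
    lv_label_set_text_fmt(status_label, "Speed: %u rpm\nLast command: %s",
                          (unsigned)speed_rpm, last_command);
}

/* Display task body: LVGL's handler is serviced roughly every 100 ms (~10 FPS). */
void display_task_loop(void)
{
    for (;;) {
        lv_timer_handler();
        /* vTaskDelay(pdMS_TO_TICKS(100)); when running under FreeRTOS */
    }
}
```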

 

Special challenges and lessons learned

A key feature of the demonstrator is the direct comparison of two approaches: conventional signal processing and machine learning (ML). ML was intentionally omitted for 60 GHz radar gesture recognition, as it was not functionally necessary. This allows for more robust, low-latency recognition without the need for training. The method is insensitive to extraneous light, noise interference and variations in hand position. 

In contrast, voice control via keyword spotting uses a pre-trained neural network specifically optimized for a limited set of clearly defined keywords such as “start” and “stop”. This is where machine learning excels, enabling specific responses to recurring, precisely specified events. The network was trained on publicly available datasets, which were used to improve recognition stability.

This hybrid approach – applying conventional signal processing where it offers speed and robustness, and machine learning where it enhances recognition – shows how different methods can be optimally combined to create a versatile and practical HMI solution.

 

Another goal was to demonstrate that all functions, such as gesture recognition, voice control, motor control and graphical display, could be implemented entirely on a single microcontroller. Achieving this required seamless integration of hardware and software components as well as real-time processing of multiple sensor data streams within the limited resources of an embedded platform. This involved coordinating different interfaces, minimizing latency and assigning priorities in a sensible manner. The modular architecture and strict separation of functional units allow flexible adaptation to diverse application scenarios and offer developers a ready-to-use foundation for custom projects.

 

Outlook and transferability to real-world applications

The combination of radar, audio and motor control in a single system serves not only as a technical feasibility study but also establishes a practical platform for knowledge transfer. Customers benefit from ready-to-use software examples that they can use for their own tests or as a basis for quickly building their own environments. This saves a considerable amount of time during project implementation.

 

The demonstrator is suitable as a reference platform and can be adapted to customer projects as needed. It can also be used in special laboratory environments like clean rooms or glove boxes. The hardware and software base, together with supplementary resources, including sample code, circuit diagrams, application notes and instructions, can be provided upon request. Adjustments to individual requirements can be implemented, for instance, by integrating additional functions, making changes to speech recognition or adding enhancements for gesture recognition.

 

An example of transferability is deploying a neural network on an RDK2 platform in conjunction with an RAB3 radar. Once the principles of data collection, training and neural network deployment are understood, these methods can be transferred to other platforms. Infineon’s toolset supports this process and simplifies porting.

 

Moreover, further developments in radar technology are planned to implement additional application examples and expand the range of functions. The demonstrator is therefore not only a tangible technology showcase but also an open platform for developing smart, sensor-based HMI solutions in embedded environments.
 

 


For more information and a direct ordering option, please visit our e-commerce platform at www.rutronik24.com.


Figure 1: Overall view of the demonstrator with display, radar module, microcontroller board and motor (source: Rutronik System Solutions)

Figure 2: Gesture recognition with the BGT60TR13C radar from Infineon (source: Rutronik System Solutions)

Figure 3: Keyword spotting process (source: Rutronik System Solutions)