I built this for my CS bachelor's thesis. The goal was to develop a system that detects violent actions and one specific weapon type (knives) in video streams in real time.
The system combines object detection, pose estimation, and temporal analysis to recognize violent actions in video. It uses YOLO11n-detect to identify knives and YOLO11n-pose to extract 17 skeletal keypoints for each person in the scene. Individuals are tracked across frames with BoT-SORT so that each person's movement can be analyzed over time. The resulting per-person sequences of normalized keypoints are then fed to a bidirectional LSTM, which classifies the action as violent or non-violent.
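To make the step between tracking and the LSTM concrete, here is a minimal sketch of how per-person keypoint sequences could be assembled. It is not taken from the repo: the normalization scheme (centering on the hip midpoint and scaling by torso length), the 30-frame window, and the names `normalize_keypoints` and `TrackBuffer` are all assumptions for illustration. Only the 17-keypoint COCO layout comes from the pose model itself.

```python
from collections import defaultdict, deque
from math import hypot

NUM_KEYPOINTS = 17  # COCO layout used by YOLO pose models
WINDOW = 30         # assumed sequence length in frames (hypothetical value)

# COCO keypoint indices used for normalization
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12


def normalize_keypoints(keypoints):
    """Make a pose invariant to position and scale (assumed scheme).

    keypoints: list of 17 (x, y) pixel coordinates in COCO order.
    Returns 17 (x, y) pairs centered on the hip midpoint and divided
    by the shoulder-midpoint-to-hip-midpoint distance.
    """
    hip_x = (keypoints[L_HIP][0] + keypoints[R_HIP][0]) / 2
    hip_y = (keypoints[L_HIP][1] + keypoints[R_HIP][1]) / 2
    sh_x = (keypoints[L_SHOULDER][0] + keypoints[R_SHOULDER][0]) / 2
    sh_y = (keypoints[L_SHOULDER][1] + keypoints[R_SHOULDER][1]) / 2
    # Fall back to 1.0 so a degenerate pose does not divide by zero.
    torso = hypot(sh_x - hip_x, sh_y - hip_y) or 1.0
    return [((x - hip_x) / torso, (y - hip_y) / torso) for x, y in keypoints]


class TrackBuffer:
    """Keeps a sliding window of normalized poses for each tracker ID."""

    def __init__(self, window=WINDOW):
        self.window = window
        self.buffers = defaultdict(lambda: deque(maxlen=window))

    def push(self, track_id, keypoints):
        """Add one frame for one person.

        Returns a list of `window` feature vectors (34 floats each) once
        enough frames have accumulated for this ID, otherwise None.
        """
        flat = [c for point in normalize_keypoints(keypoints) for c in point]
        buf = self.buffers[track_id]
        buf.append(flat)
        return list(buf) if len(buf) == self.window else None
```

In a loop over frames, each (track ID, keypoints) pair from the tracker would go through `push`, and any non-`None` result would be a `WINDOW x 34` sequence ready for the classifier. Keeping one buffer per ID is what lets several people in the same frame be classified independently.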
I've also included a YouTube playlist showing the system in action on various real-world and simulated scenarios.
I’m looking for feedback on the architecture, particularly on how to better handle complex group interactions or reduce ambiguity in high-speed non-violent movements.
iraton•2h ago
Repo: https://github.com/lraton/real-time-violent-action-detection