So much text and not a single example, diagram, or demo.
dinobones•1h ago
I'm honestly skeptical this will work at all. The FOV of most webcams is so narrow that it can barely capture the shoulder of someone sitting beside me, let alone their eyes.
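To make the FOV concern concrete, here's a rough sketch of the check. The 70 degree horizontal FOV and the seating offsets are assumed numbers for illustration, not measurements of any particular webcam, and `in_horizontal_fov` is a made-up helper name.

```python
import math

def in_horizontal_fov(lateral_m: float, depth_m: float, hfov_deg: float) -> bool:
    """True if a point lateral_m to the side and depth_m in front of the
    camera falls inside the camera's horizontal field of view."""
    if depth_m <= 0:
        return False  # behind or level with the lens: never visible
    angle_deg = math.degrees(math.atan2(abs(lateral_m), depth_m))
    return angle_deg <= hfov_deg / 2

# Assumed numbers: 70 degree HFOV, neighbor 0.6 m to the side.
print(in_horizontal_fov(0.6, 0.5, 70.0))  # neighbor roughly level with the user
print(in_horizontal_fov(0.6, 1.5, 70.0))  # someone standing further back
```

Under these assumptions the person right beside you sits about 50 degrees off-axis and falls outside the frame, while someone further behind you is visible.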
Then what you're basically looking for is a calibration from the eye position / angle to the screen rectangle. You want to shoot a ray from each eye and see if it intersects the laptop's screen.
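The ray test itself is the easy part. A minimal sketch, assuming you somehow already have the eye position and gaze direction in a frame where the screen lies in the z = 0 plane (the frame, the units, and the name `gaze_hit_on_screen` are all my assumptions):

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def gaze_hit_on_screen(eye: Vec3, gaze: Vec3,
                       width_m: float, height_m: float
                       ) -> Optional[Tuple[float, float]]:
    """Return the (x, y) point where the gaze ray hits the screen, or None.

    Assumed frame: the screen spans 0 <= x <= width_m, 0 <= y <= height_m
    in the z = 0 plane, and viewers sit at z > 0.
    """
    ex, ey, ez = eye
    dx, dy, dz = gaze
    if ez <= 0 or dz >= 0:
        return None  # eye behind the screen, or looking away from it
    t = -ez / dz     # ray parameter at which z reaches 0
    x, y = ex + t * dx, ey + t * dy
    if 0.0 <= x <= width_m and 0.0 <= y <= height_m:
        return (x, y)
    return None

# Assumed 0.34 m x 0.19 m screen; viewer 0.5 m away, looking straight ahead.
print(gaze_hit_on_screen((0.15, 0.10, 0.5), (0.0, 0.0, -1.0), 0.34, 0.19))
```

The hard part is getting `eye` and `gaze` accurately enough from the image in the first place.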
This is challenging because most webcams are pretty low resolution, so each eyeball will probably span only ~20px. From these 20px, you need to estimate the eyeball->screen ray. And of course the tolerance varies with the screen size.
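Here's the napkin math behind the ~20px figure, using a simple pinhole camera model. The 1280px frame width, 70 degree horizontal FOV, and 24 mm eyeball diameter are assumed round numbers, and `pixels_across` is a made-up helper.

```python
import math

def pixels_across(size_m: float, distance_m: float,
                  image_width_px: int, hfov_deg: float) -> float:
    """Approximate width in pixels of an object of size_m at distance_m,
    for a pinhole camera with the given frame width and horizontal FOV."""
    # Focal length in pixel units, derived from the horizontal FOV.
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return focal_px * size_m / distance_m

# Assumed: 24 mm eyeball, 1280 px wide frame, 70 degree HFOV.
for distance_m in (0.6, 1.0, 1.5):
    px = pixels_across(0.024, distance_m, 1280, 70.0)
    print(f"{distance_m} m -> {px:.1f} px")
```

With these numbers an eye about a meter from the camera comes out in the low twenties of pixels, and it shrinks linearly as the person moves further away.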
TL;DR: Decent idea, but you should've done some napkin math and/or quick bounds checking first. Maybe a $5 privacy protector is better.
Here's an idea:
Maybe start by seeing if you can train a gaze tracker for the primary user first, and how accurate you can get it with modeling and then calibration. Once you've solved that problem, you can use its accuracy as your upper bound on expected performance, then transform the problem into detecting the gaze of people nearby instead of the primary user.