frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Ask HN: How to automate aesthetic photo cropping? (CV/AI)

1•icons•3w ago
Hi everyone,

I am a backend developer currently engineering an in-house automation tool for a K-pop merchandise production company (photocards, postcards, etc.).

I have built an MVP using Python (FastAPI) + Libvips + InsightFace to automate the process where designers previously had to manually crop thousands of high-resolution photos using Illustrator.

While basic face detection and image quality preservation (CMYK conversion, etc.) are successful, I am hitting a bottleneck in automating the "Designer's Sense (Vibe/Aesthetics)."

[Current Stack & Workflow]

Tech Stack: Python 3.11, FastAPI, Libvips (Processing), InsightFace (Landmark Detection).

Workflow: Bulk Upload $\rightarrow$ Landmark Extraction (InsightFace) $\rightarrow$ Auto-crop based on pre-defined ratios $\rightarrow$ Human-in-the-loop fine-tuning via Web UI.

[The Challenges]

Mechanical Logic vs. Aesthetic Crop

Simple centering logic fails to capture the "perfect shot" for K-pop idols who often have dynamic poses or varying camera angles.

Issue: Even if the landmarks are mathematically centered, the resulting headroom is often inconsistent, or the chin is awkwardly cut off. The output lacks visual stability compared to a human designer's work.

Need for Reference-Based One-Shot Style Transfer

Clients often provide a single "Guide Image" and ask, "Crop the rest of the 5,000 photos with this specific feel." (e.g., a tight face-filling close-up vs. a spacious upper-body shot).

Goal: Instead of designers manually guessing the ratio, I want the AI to reverse-engineer the composition (face-to-canvas ratio, relative position) from that one sample image and apply it dynamically to the rest of the batch.

[Questions]

Q1. Direction for Improving Aesthetic Composition

Is it more practical to refine Rule-based Heuristics (e.g., fixing eye position to the top 30% with complex conditionals), or should I look into "Aesthetic Quality Assessment (AQA)" or "Saliency Detection" models to score and select the best crop?

As of 2026, what is the most efficient, production-ready approach for this?

Q2. One-Shot Composition Transfer

Are there any known algorithms or libraries that can extract the "compositional style" (relative position of eyes/nose/mouth regarding the canvas frame) from a single reference image and apply it to target images?

I am looking for keywords or papers related to "One-shot learning for layout/composition" or "Content-aware cropping based on reference."

Any keywords, papers, or architectural advice from those who have tackled similar problems in production would be greatly appreciated.

Thanks in advance.

Comments

nialse•3w ago
Use commercial tools and services for automating cropping. Google is your friend.

Trump Vodka Becomes Available for Pre-Orders

https://www.forbes.com/sites/kirkogunrinde/2025/12/01/trump-vodka-becomes-available-for-pre-order...
1•stopbulying•52s ago•0 comments

Velocity of Money

https://en.wikipedia.org/wiki/Velocity_of_money
1•gurjeet•3m ago•0 comments

Stop building automations. Start running your business

https://www.fluxtopus.com/automate-your-business
1•valboa•7m ago•1 comments

You can't QA your way to the frontier

https://www.scorecard.io/blog/you-cant-qa-your-way-to-the-frontier
1•gk1•8m ago•0 comments

Show HN: PalettePoint – AI color palette generator from text or images

https://palettepoint.com
1•latentio•9m ago•0 comments

Robust and Interactable World Models in Computer Vision [video]

https://www.youtube.com/watch?v=9B4kkaGOozA
1•Anon84•13m ago•0 comments

Nestlé couldn't crack Japan's coffee market.Then they hired a child psychologist

https://twitter.com/BigBrainMkting/status/2019792335509541220
1•rmason•14m ago•0 comments

Notes for February 2-7

https://taoofmac.com/space/notes/2026/02/07/2000
2•rcarmo•16m ago•0 comments

Study confirms experience beats youthful enthusiasm

https://www.theregister.com/2026/02/07/boomers_vs_zoomers_workplace/
2•Willingham•23m ago•0 comments

The Big Hunger by Walter J Miller, Jr. (1952)

https://lauriepenny.substack.com/p/the-big-hunger
2•shervinafshar•24m ago•0 comments

The Genus Amanita

https://www.mushroomexpert.com/amanita.html
1•rolph•29m ago•0 comments

We have broken SHA-1 in practice

https://shattered.io/
9•mooreds•29m ago•2 comments

Ask HN: Was my first management job bad, or is this what management is like?

1•Buttons840•30m ago•0 comments

Ask HN: How to Reduce Time Spent Crimping?

2•pinkmuffinere•32m ago•0 comments

KV Cache Transform Coding for Compact Storage in LLM Inference

https://arxiv.org/abs/2511.01815
1•walterbell•36m ago•0 comments

A quantitative, multimodal wearable bioelectronic device for stress assessment

https://www.nature.com/articles/s41467-025-67747-9
1•PaulHoule•38m ago•0 comments

Why Big Tech Is Throwing Cash into India in Quest for AI Supremacy

https://www.wsj.com/world/india/why-big-tech-is-throwing-cash-into-india-in-quest-for-ai-supremac...
1•saikatsg•38m ago•0 comments

How to shoot yourself in the foot – 2026 edition

https://github.com/aweussom/HowToShootYourselfInTheFoot
1•aweussom•39m ago•0 comments

Eight More Months of Agents

https://crawshaw.io/blog/eight-more-months-of-agents
4•archb•41m ago•0 comments

From Human Thought to Machine Coordination

https://www.psychologytoday.com/us/blog/the-digital-self/202602/from-human-thought-to-machine-coo...
1•walterbell•41m ago•0 comments

The new X API pricing must be a joke

https://developer.x.com/
1•danver0•42m ago•0 comments

Show HN: RMA Dashboard fast SAST results for monorepos (SARIF and triage)

https://rma-dashboard.bukhari-kibuka7.workers.dev/
1•bumahkib7•42m ago•0 comments

Show HN: Source code graphRAG for Java/Kotlin development based on jQAssistant

https://github.com/2015xli/jqassistant-graph-rag
1•artigent•47m ago•0 comments

Python Only Has One Real Competitor

https://mccue.dev/pages/2-6-26-python-competitor
4•dragandj•49m ago•0 comments

Tmux to Zellij (and Back)

https://www.mauriciopoppe.com/notes/tmux-to-zellij/
1•maurizzzio•49m ago•1 comments

Ask HN: How are you using specialized agents to accelerate your work?

1•otterley•51m ago•0 comments

Passing user_id through 6 services? OTel Baggage fixes this

https://signoz.io/blog/otel-baggage/
1•pranay01•52m ago•0 comments

DavMail Pop/IMAP/SMTP/Caldav/Carddav/LDAP Exchange Gateway

https://davmail.sourceforge.net/
1•todsacerdoti•52m ago•0 comments

Visual data modelling in the browser (open source)

https://github.com/sqlmodel/sqlmodel
1•Sean766•54m ago•0 comments

Show HN: Tharos – CLI to find and autofix security bugs using local LLMs

https://github.com/chinonsochikelue/tharos
1•fluantix•55m ago•0 comments