Build visual AI workflows from a prompt – OCR, detection, editing and more

https://colab.research.google.com/github/vlm-run/vlmrun-cookbook/blob/main/notebooks/10_mcp_showcase.ipynb
5•fzysingularity•7h ago

Comments

fzysingularity•7h ago
We built a tool that lets you augment LLM agents with visual capabilities — like OCR, object detection, and video editing — using just plain English. No need to write computer vision code.

Examples:

> “Blur all faces in this image and preview it.”

> “Extract the invoice ID, email, and totals from this invoice and overlay their locations.”

> “Redact all the sensitive data in this image, and preview the result.”

> “Trim this video from 0:30 to 1:10 and add captions.”

It works with any MCP-compatible agent (Claude, OpenAI, Cursor, etc.), and turns natural language into visual AI workflows. No Python. No brittle CV pipelines. Just describe what you want, and your agent handles the rest.
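MCP runs on JSON-RPC 2.0. Once an agent has picked a tool for a prompt, it invokes that tool with a `tools/call` request carrying the tool name and its arguments. The sketch below shows the request an agent might emit for the first example prompt above. It assumes a hypothetical tool named `blur_faces` with hypothetical `image_url` and `preview` arguments; these are illustrative and not the actual vlm.run tool names.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_ids = count(1)


def make_tool_call(name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Prompt: "Blur all faces in this image and preview it."
# `blur_faces`, `image_url` and `preview` are hypothetical names for illustration only.
msg = make_tool_call(
    "blur_faces",
    {"image_url": "https://example.com/photo.jpg", "preview": True},
)
print(msg)
```

The agent's job is to turn the plain-English prompt into the tool name and arguments. The server behind the tool does the actual vision work and returns the result to the agent.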

Here's the full showcase / our docs:

[1] Colab showcase: https://colab.research.google.com/github/vlm-run/vlmrun-cook...

[2] MCP Intro / Docs: https://docs.vlm.run/mcp/introduction

We’d love feedback — especially from devs building LLM tools, agentic frameworks, or anything that needs visual understanding.

MirajulMohin•6h ago
Tried it out. Cool!

kernel33•6h ago
Are you running everything through a single end-to-end vision model, or do you dynamically dispatch to specialized OCR, detection, and segmentation backends?

fzysingularity•4h ago
This demo showcases the latter approach, using tool-calling to essentially fill in the gaps of current VLMs. That said, we're of course interested in folding all these capabilities into a single model, but that's going to take a bit more work.

What makes this approach interesting is that our VLMs need to be able to understand intermediate results (sometimes in the form of images themselves), and then delegate to other specialized tools whenever they can't perform a specific action.
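The dispatch pattern described here can be sketched as a simple loop. At each turn the model either answers directly or delegates to a specialized tool, and each tool's output is appended to the context the model sees on its next turn. This is a minimal sketch and not VLM Run's implementation. `plan_next_step` is a hypothetical stand-in for the VLM's decision. The tools are stubs that return placeholder strings instead of running real face detection or blurring.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """A model decision: call `tool` with `args`, or finish with `answer`."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


# Hypothetical specialized backends; real ones would run detection, editing, etc.
TOOLS: dict[str, Callable[..., str]] = {
    "detect_faces": lambda image: f"boxes({image})",
    "blur_regions": lambda image, boxes: f"blurred({image}, {boxes})",
}


def plan_next_step(task: str, context: list[str]) -> Step:
    """Hypothetical stand-in for the VLM: chooses the next action from results seen so far."""
    if not context:
        return Step(tool="detect_faces", args={"image": "img.jpg"})
    if len(context) == 1:
        # The intermediate result (face boxes) becomes the input to the next tool.
        return Step(tool="blur_regions", args={"image": "img.jpg", "boxes": context[-1]})
    return Step(answer=context[-1])


def run_agent(task: str, max_steps: int = 8) -> str:
    context: list[str] = []  # intermediate results the model can inspect
    for _ in range(max_steps):
        step = plan_next_step(task, context)
        if step.answer is not None:
            return step.answer
        if step.tool not in TOOLS:
            raise ValueError(f"unknown tool: {step.tool}")
        context.append(TOOLS[step.tool](**step.args))
    raise RuntimeError("step budget exhausted")


print(run_agent("Blur all faces in this image and preview it."))
```

The `max_steps` budget guards against a model that keeps delegating without converging. In a real system the context would hold images and structured outputs rather than strings.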
