Launch HN: Mosaic (YC W25) – Agentic Video Editing

148•adishj•2mo ago

Hey HN! We’re Adish & Kyle from Mosaic (https://edit.mosaic.so, https://docs.mosaic.so/, https://mosaic.so). Mosaic lets you create and run your own multimodal video editing agents in a node-based canvas. It’s different from traditional video editing tools in two ways: (1) the user interface and (2) the visual intelligence built into our agent.

We were engineers at Tesla and one day had a fun idea to make a YouTube video of Cybertrucks in Palo Alto. We recorded hours of cars driving by, but got stuck on how to scrub through all this raw footage to edit it down to just the Cybertrucks.

We got frustrated trying to accomplish simple tasks in video editors like DaVinci Resolve and Adobe Premiere Pro. Features are hidden behind menus, buttons, and icons, and we often found ourselves Googling or asking ChatGPT how to do certain edits.

We thought that surely now, with multimodal AI, we could accelerate this process. Better yet, an AI video editor could automatically apply edits based off what it sees and hears in your video. The idea quickly snowballed and we began our side quest to build “Cursor for Video Editing”.

We put together a prototype and to our amazement, it was able to analyze and add text overlays based on what it saw or heard in the video. We could now automate our Cybertruck counting with a single chat prompt. That prototype is shown here: https://www.youtube.com/watch?v=GXr7q7Dl9X0.

After that, we spent a chunk of time building our own timeline-based video editor and making our multimodal copilot powerful and stateful. In natural language, we could now ask chat to help with AI asset generation, enhancements, searching through assets, and automatically applying edits like dynamic text overlays. That version is shown here: https://youtu.be/X4ki-QEwN40.

After talking to users though, we realized that the chat UX has limitations for video: (1) the longer the video, the more time it takes to process. Users have to wait too long between chat responses. (2) Users have set workflows that they use across video projects. Especially for people who have to produce a lot of content, the chat interface is a bottleneck rather than an accelerant.

That took us back to first principles to rethink what a “non-linear editor” really means. The result: a node-based canvas which enables you to create and run your own multimodal video editing agents. https://screen.studio/share/SP7DItVD.

Each tile in the canvas represents a video editing operation and is configurable, so you still have creative control. You can also branch and run edits in parallel, creating multiple variants from the same raw footage to A/B test different prompts, models, and workflows. In the canvas, you can see inline how your content evolves as the agent goes through each step.

The idea is that canvas will run your video editing on autopilot, and get you 80-90% of the way there. Then you can adjust and modify it in an inline timeline editor. We support exporting your timeline state out to traditional editing tools like DaVinci Resolve, Adobe Premiere Pro, and Final Cut Pro.

We’ve also used multimodal AI to build in visual understanding and intelligence. This gives our system a deep understanding of video concepts, emotions, actions, spoken word, light levels, shot types.

We’re doing a ton of additional processing in our pipeline, such as saliency analysis, audio analysis, and determining objects of significance—all to help guide the best edit. These are things that we as human editors internalize so deeply we may not think twice about it, but reverse-engineering the process to build it into the AI agent has been an interesting challenge.

Some of our analysis findings: Optimal Safe Rectangles: https://assets.frameapp.ai/mosaicresearchimage1.png Video Analysis: https://assets.frameapp.ai/mosaicresearchimage2.png Saliency Analysis: https://assets.frameapp.ai/mosaicresearchimage3.png Mean Movement Analysis: https://assets.frameapp.ai/mosaicresearchimage4.png

Use cases for editing include: - Removing bad takes or creating script-based cuts from videos / talking-heads - Repurposing longer-form videos into clips, shorts, and reels (e.g. podcasts, webinars, interviews) - Creating sizzle reels or montages from one or many input videos - Creating assembly edits and rough cuts from one or many input videos - Optimizing content for various social media platforms (reframing, captions, etc.) - Dubbing content with voice cloning and lip syncing.

We also support use cases for generating content such as motion graphic animations, cinematic captions, AI UGC content, adding contextual AI-generated B-Rolls to existing content, or modifying existing video footage (changing lighting, applying VFX).

Currently, our canvas can be used to build repeatable agentic workflows, but we’re working on a fully autonomous agent which will be able to do things like: style transfer using existing video content, define its own editing sequence / workflow without needing a canvas, do research and pull assets from web references, and so on.

You can try it today at https://edit.mosaic.so. You can sign up for free and get started playing with the interface by uploading videos, making workflows on the canvas, and editing them in the timeline editor. We do paywall node runs to help cover model costs. Our API docs are at https://docs.mosaic.so. We’d love to hear your feedback!

Comments

tonyoconnell•2mo ago

This is so cool. Good luck with your venture.

adishj•2mo ago

Thank you :)

callamdelaney•2mo ago

Hey, good luck with Mosaic.

Some feedback initially on the landing page, looks great but I thought that there is, for me, too much motion going on on the homepage and the use cases page. May be an unpopular opinion!

cjbarber•2mo ago

Agreed, homepage was confusing for me also. I tried to scroll around and see a demo. For a product like this that is so visual, I expected to be able to find a 30s demo clip somewhere but couldn't see one on the homepage or product page (and the scrolling on the product page was annoying for me).

adishj•2mo ago

the sad part is spent so long on the product page scrolling animation haha

very valid point though — I think a demo clip of a BEFORE vs AFTER immediately somewhere in the hero even or right below it would be helpful

thanks for the feedback

adishj•2mo ago

valid points, thanks for the feedback. i had gone for a certain aesthetic but you're right in that it may be a bit too overwhelming.

cjbarber•2mo ago

I think this is a great endeavor. I was thinking about a channel that I like watching on YouTube. They travel to exotic places by boat and film themselves, nature documentary style. To make good videos requires going to these places, a ton of filming, AND a ton of editing. They put out a video every 2 weeks or so on their trips. I imagine the editing is the hard part.

This is a long winded way of saying that I think creators need what you're making! People who have hours of awesome footage but have to spend dozens of hours cutting it down need this. Then also people who have awesome footage but aren't good at editing or hiring an editor, same thing. I'd love to see someone solve this so that 90th percentile editing is available to all, and then it can be more about who has the interesting content, rather than who has the interesting content and editing skills.

adishj•2mo ago

thanks! Mosaic can already do the rough cuts for you — so you can upload all your footage from your travel, and prompt it to "make a 2 minute highlight reel of your trip to Japan", for instance.

soon, we also plan to incorporate style transfer, so you could even give it a video from the channel you enjoy watching + your raw footage, and have the agent edit your footage in the same style of the reference video.

mrbluecoat•2mo ago

> you can upload all your footage from your travel, and prompt it to "make a 2 minute highlight reel of your trip to Japan"

In relation to the demo requests below, I think this would be a good example of how an average person might use your platform.

adishj•2mo ago

for a demo, check out this one that I put together using 81 clips from a skydiving trip we took in Monterey, CA:

https://edit.mosaic.so/links/c51c0555-3114-45f4-ab8f-c25f172...

Bishonen88•2mo ago

this seems rather basic. From watching a bit, it just seems to cut/combine the videos? No transitions, no bg music which would fit nicely with the cut timing etc.

I am waiting for a tool that does stuff along those lines for a long time. Apps like dji kinda do it but they have generic music and the cuts do not fit the tune at all, and are rather random. Doing it myself with little effort using davinci or premiere takes ~30 minutes but the results are 5 times better.

Was hoping that this app would do it for me. And even if it would, if it would cost >X$ to create a video like that, then probably I'd still do it myself.

adishj•2mo ago

Hey there, this is just the output from the Rough Cut tile. Right now each tile in Mosaic represents a modular and functional edit to the video. So after you've got your Rough Cut, given all your footage, there are other tiles which you can then use to add transitions, motion graphics, background music, etc.

That being said, the coolest part with this Rough Cut was that it decided to stand in the footage of the kite to creatively imply the freefall portion of the skydive (because we aren't able to record while we're jumping out of a plane). This ability for it to creatively decided to imply the jump because of the limited footage is what I wanted to get across here.

jaccola•2mo ago

Very cool. It definitely feels to me that the power of pro tools should be available to more people with AI.

Would have been nice if there was a killer demo on your landing page of a video made with Mosaic.

adishj•2mo ago

that's our perspective as well.

a lot of tooling is being built around generative AI in particular, but there's still a big gap for people that want to share their own stories / experiences / footage but aren't well-versed with pro tools.

valid feedback on the landing page — something we'll add in.

bluelightning2k•2mo ago

The problem is, any video demo of a tool like this is just an entirely unrelated video.

adishj•2mo ago

can you clarify what you mean here? check out this demo video: https://screen.studio/share/SP7DItVD

Forgeties79•2mo ago

> We got frustrated trying to accomplish simple tasks in video editors like DaVinci Resolve and Adobe Premiere Pro. Features are hidden behind menus, buttons, and icons, and we often found ourselves Googling or asking ChatGPT how to do certain edits.

Hidden behind a UI? Most of the major tools like blade, trim, etc. are right there on the toolbars.

> We recorded hours of cars driving by, but got stuck on how to scrub through all this raw footage to edit it down to just the Cybertrucks.

Scrubbing is the easiest part. Mouse over the clip, it starts scrubbing!

I’m being a bit tongue in cheek and I totally agree there is a learning curve to NLE’s but those complaints were also a bit striking to me.

adishj•2mo ago

hey! You're right that most of the basic tools like splitting / trimming are available right in the timeline. but things like adding a keyframe to animate a counter, for instance, I had no idea where to go or how to start.

Scrubbing is easy enough when you have short footage, but imagine scrubbing through the footage we had of 5 hours of cars driving by, or maybe a bunch of assets. This quickly becomes very tedious.

BolexNOLA•2mo ago

I don’t need to imagine, I do it haha but again I was being tongue in cheek. I personally would love an effective tool that can mark and favorite clips for me based on written prompts. Would save me an awful amount of time!

adishj•2mo ago

curious — what kind of content do you edit?

BolexNOLA•2mo ago

Now? Mostly long form educational content. But historically? Everything more or less! Freelancer for about 15 years until my current in-house producer role.

Forgeties79•2mo ago

Hey I just wanted to come back and be clear that yeah I was being tongue in cheek, but looking back at it comes off as a little snarky/“this isn’t even a thing!” and I’m sorry for that - what you built is really cool and I’m excited to try it out.

Good luck out there!

adishj•2mo ago

no worries at all. compared to some other comments in this thread, I didn't find your tone snarky at all. I appreciate your engaging with the conversation and the thread :)

andrewmlevy•2mo ago

obligatory https://news.ycombinator.com/item?id=9224

BolexNOLA•2mo ago

Like I said, the description of some of the issues was just kind of funny to me - I think this could be a potentially very useful tool.

Do you think this is the next Dropbox?

adishj•2mo ago

next dropbox? lets go!

BolexNOLA•2mo ago

That question definitely sounded way more skeptical than I intended! Man I just can’t get my tone right today

teddyh•2mo ago

Not related to NCSA Mosaic (RIP).

adishj•2mo ago

if you take a snippet of Ben Horowitz's interview out of context, he has a lot of good things to say about our product :)

shivvtrivedi•2mo ago

Mosaic team dev here Hanging in the comments all day and pushing updates as fast as we can -really appreciate the feedback!

mberlove•2mo ago

Is there a way to keep up to date on updates and new announcements? TIA.

adishj•2mo ago

yes! please join our discord https://discord.gg/26SAZzBTaP or follow us on X https://x.com/mosaic_so to keep up to date on updates

bluelightning2k•2mo ago

Good luck. I've dabbled with this myself and ultimately decided that DaVinci Resolve would end up doing this natively. But then again they haven't yet so who knows!

Good luck with it, sincerely.

adishj•2mo ago

thanks! curious what you started dabbling with and if you have any thoughts to share :)

zkmon•2mo ago

I just clicked the link and encountered a non-scrollable, dark, fixed content pane with loads of flickering images and scrolling text with random font sizes without much meaning. I felt imprisoned, subjected to unexpected suffering, can't scroll away, got scared and raced for the window close button, and then breathed easy.

pelagicAustral•2mo ago

They really managed to handcraft a unique user experience, that's for sure.

adishj•2mo ago

we did but the landing page seems to be detracting from it — head directly to https://edit.mosaic.so to try the actual canvas interface

adishj•2mo ago

seems like the landing page is detracting from the main product, this is good feedback so thanks! For now, avoid the scaries and head directly to https://edit.mosaic.so to try the actual canvas interface

conductr•2mo ago

Since video is your thing, I feel like you need to just make a very edited demo reel and put all your energy into trying to get people to watch that video. Meaning, remove almost all text and bloat from the site and just show us all the cool stuff the product does for/to video editing. Distill it to 60-120 seconds and put that on your landing, hell put it on auto play if you want to, so long as it's clear that is the one thing I'm supposed to be paying attention to

adishj•2mo ago

yeah I think a demo reel of a BEFORE vs AFTER immediately somewhere in the hero even or right below it would be helpful

dang•2mo ago

I've put the /edit and /docs links in the first sentence above to soften the blow as well :)

deepspace•2mo ago

I had the same reaction. About what you would expect from a team steeped in the Tesla mindset.

adishj•2mo ago

thanks for the feedback — you can head directly to https://edit.mosaic.so to try the actual canvas interface

dang•2mo ago

Please don't cross into personal attack. We're trying for the opposite on this site.

https://news.ycombinator.com/newsguidelines.html

ack210•2mo ago

I just signed up for a Creator plan, but it looks like the automated "Thank you for being a Mosaic Creator" email going out is not configured correctly. Instead of having my company name, it referenced a different business name and description (that seems to exist/be accurate, so not a placeholder).

adishj•2mo ago

Hey! Thanks for calling this out — looking into what happened here & fixing right now.

adishj•2mo ago

This has been fixed now.

cube00•2mo ago

Could you please expand on how this happened and what else was at risk?

ansc•2mo ago

Woah, yikes.

djeastm•2mo ago

Move fast, break things... like privacy regulations.

echelon•2mo ago

Can you make this a desktop app?

I'm really tired of editing videos in the cloud. I'm also also tired of all these AI image and video tools that make you work over a browser. Your workflow seems so second class buried amongst all the other browser tabs.

I understand that this is how to deploy quickly to customers, but it feels so gross working on "heavy" media in a browser.

supportengineer•2mo ago

There's plenty of great native desktop apps for video editing. And there have been for almost 30 years. I also don't understand why anyone would want to use a browser for this.

adishj•2mo ago

there is some friction even in downloading a new app

if our goal is to bring more people into the fold, minimizing the steps for them to start editing is something we want to optimize for

that being said, being on the browser presents its own set of challenges, many of which are rightfully mentioned in this thread

kleiba•2mo ago

Sorry, not buying the argument. I think it's more like: that's the current zeitgeist.

adishj•2mo ago

we've done a ton of work to optimize the uploads / downloads / transcoding of videos to handle beefy files using proxies, and also allow you to XML export back to traditional editing tools that can link back to your "heavy" media, but I hear you and I think anything running locally on device is just going to feel faster

it does present its own set of challenges, but something we've thought about

sails•2mo ago

I’ve had a lot of fun with Remotion and Claude Code for CLI video editing. I’ve been impressed with how much traditional video editing I can manage.

I will be checking this out!

adishj•2mo ago

that's super interesting — what kind of things have you done with remotion and Claude Code?

they're very powerful, when you put them together, it almost feels like Cursor for Video Editing

sails•2mo ago

Mostly using it for technical marketing/explainer videos eg https://x.com/mattarderne/status/1987441582413345016

hypnagogicjerk•2mo ago

Interested in your workflow @sails

sails•2mo ago

Posted a video in the thread, it’s pretty rudimentary (Claude code does everything) at the moment but I think this has a lot of possibilities.

danishSuri1994•2mo ago

Really interesting direction. The node-based canvas feels like a more scalable abstraction for video automation than the usual chat-only interface. I’m curious how you’re handling long-form content where temporal context matters (e.g., emotional shifts, pacing, narrative cues).

Multimodal models are good at frame-level recognition, but editing requires understanding relationships between scenes, have you found any methods that work reliably there?

adishj•2mo ago

hey, thanks for the comment!

we've actually found that multimodal models are surprisingly good at maintaining temporal context as well

that being said, there's also a bunch of additional processing using more traditional CV / audio analysis we do to extract this information out as well (both frame-level and temporal) in your video understanding

for example, with the mean-motion analysis — you can see how subjects move over a period of time, which can help determine where important things are happening in the video, which ultimately can lead to better placements of edits.

hypnagogicjerk•2mo ago

Side note, just for context, since there seem to be primarily video hobbyists responding to the OP:

Node based workflows are typical in NLE software. See Fusion & Color panels in Davinci Resole, Fusion (color grading), etc. Industry folks will take to this node based canvas with ease.

Great question @danishSuri1994

moinism•2mo ago

Hey, this is super cool. congrats on the product and the launch!

I'm building something exactly similar and couldn't believe my eyes when I saw the HN post. What i'm building (chatoctopus.com) is more like a chat-first agent for video editing, only at a prototype stage. But what you guys have achieved is insane. Wishing you lots of success.

to healthy competition!

adishj•2mo ago

thank you! chatoctopus looks pretty cool, I'm trying it out right now!

how did you find the chat-first interface to work out for video? what we found is that the response times can be so long that the chat UX breaks down a bit. how are you thinking about this?

adishj•2mo ago

looks like I got a network error

Tetraslam•2mo ago

this is going to save me so much time, hell yeah guys!

adishj•2mo ago

thank you! let us know if you have any feedback!

HanClinto•2mo ago

I absolutely love your approach of "expert tools". If I understand your approach, you aren't just feeding a video into a multimodal LLM and asking it "what is the bounding box of the optimal caption region?" -- you have built tools with discrete algorithms (using traditional CV techniques) that use things like object detection boxes + traditional motion analysis techniques to give "expert opinions" to the LLM in the form of tool calls -- such as finding the regions of minimal saliency + minimal movement to be the best places for caption placement.

If the LLM needs to place captions, it calls one of these expert discrete-algorithm tools to determine the best place to put the captions -- you aren't just asking the LLM to do it on its own.

If I'm correct about that, then I absolutely applaud you -- it feels like THIS is a fantastic model for how agentic tools should be built, and this is absolutely the opposite of AI slop.

Kudos!

adishj•2mo ago

thanks for the comment, thats exactly right

we're using a mix of out-of-the-box multimodal AI capability + traditional audio / video analysis techniques as part of our video understanding pipeline, all of which become context for the agent to use during its editing process

supportengineer•2mo ago

Can we stop with the overloaded names? "Mosaic" is a well-known web browser.

adishj•2mo ago

naming is hard

our original name was Frame, only to realize that frame.io existed already.

we brainstormed names for a while and had several notes full of possible names

mosaic is one which stood out to us because it not only represents artwork, but also the tiles (nodes) in the canvas come together to form your mosaic — we thought that was a fitting name

dang•2mo ago

"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

https://news.ycombinator.com/newsguidelines.html

skeeter2020•2mo ago

>> "Mosaic" is a well-known web browser.

Not really relevant anymore, though? As long is it's not called "Project: Prometheus" I think we count it as a win.

Kiro•2mo ago

> "Mosaic" is a well-known web browser

A browser that was discontinued 30 years ago.

homeonthemtn•2mo ago

These comments real sus.

adishj•2mo ago

i agree, things are a bit too kind. give me some more feedback.

dang•2mo ago

Yes, a bunch of positive comments from accounts without much posting history appeared in this thread. I assume that the OP's friends got wind of their launch.

We tell founders to avoid that (scroll down to the bold part of https://news.ycombinator.com/yli.html for how we try to scare YC founders into not doing it!) - but to be fair, (1) this is not always easy to control, and (2) people posting such comments think they're helping and don't have enough experience of HN to realize that it has a counter-effect.

I'm going to move the overly sus ones to a collapsed stub now. (https://news.ycombinator.com/item?id=45988584)

adishj•2mo ago

thanks dan

primitivesuave•2mo ago

Last year, I made a YouTube documentary series showcasing the prolific corruption in a small city government. I downloaded all the city government meetings, used Whisper to transcribe them, and then set up a basic RAG so I could query across a decade of committee meetings (around 1 TB of video). Once I got the timestamps that I'm interested in, I then have to embark on a tedious manual process of locating the file, cutting out a few seconds/minutes from a multi-hour video, and then order all the clips into a cohesive narrative.

These seem like problems that LLMs are especially well-suited for. I might have spent a fraction of the time if there was some system that could "index" my content library, and intelligently pull relevant clips into a cohesive storyline.

I also spent an ungodly amount of time on animations - it felt like "1 hour of work for 1 minute of animation". I would gladly pay for a tool which reduces the time investment required to be a citizen documentarian.

adishj•2mo ago

hey, thanks for sharing about your documentary series. would love to check it out if you don't mind linking it!

we don't yet support that volume of footage (1TB), however if you'd like to try this at a smaller scale, you can already do this today with the Rough Cut tile — simply prompt it for the moments that you're interested in (it can take visual cues, auditory cues, timestamp cues, script cues) and it will create an initial rough cut or assembly edit for you.

I'd also recommend checking out the new Motion Graphics tile we added for animations. You can also single-point generate motion graphics using the utility on the bottom right of the timeline. Let me know if you have any questions on that.

primitivesuave•2mo ago

Absolutely - the channel is called "Dolton Documentaries" on YouTube. I'll definitely check out the features you mentioned, and am super excited to see where this goes!

hypnagogicjerk•2mo ago

An additional suggestion for OP, working with large video archives:

- Batch transcribe your videos to smaller proxy files preserving the same file names (to allow easy re-linking to full quality media later) - Upload proxys to Mosaic - Do your Agentic rough-cut with Mosaic - Export EDL or NLE project file - In NLE, Re-link proxy media to full-quality video & render locally.

To Mosaic:

I need to look deeper at your project, but support for EDL export (Avid, Premiere, Final Cut compatible, as well as commercial grading and conform software workflows) and upload/management of proxy media could be helpful additional features.

adishj•2mo ago

Hey there! We already support XML exports to DaVinci Resolve, Final Cut Pro, and Premiere Pro!

We also do transcoding of all uploaded files to lower-res proxies, which can be re-linked when brought back into a more traditional NLE.

breadislove•2mo ago

you should check mixedbread out. we support indexing multimodal data and making data ready for ai. we are adding video and audio support by the end of the year. might be interesting for the OP as well.

we have couple investigative journalists and lawyers using us for a similar usecase.

adishj•2mo ago

curious how does this compare to something like Memories.ai when it comes to video in particular?

robotswantdata•2mo ago

Gemini 3 would rip through that problem, but equally you could slice the video with existing open source tooling such as FFMPEG then combine with blender for the video curation. Gemini 3 could probably write you the workflow as well.

mauflows•2mo ago

What part would Gemini do well at? What would you feed it?

kul•2mo ago

Can it work for this use-case? I have lots of 15 seconds to 1 min duration videos) of my kids and want to upload them all (let's say 10 videos) and have the agent make a single video with all the best bits of them?

adishj•2mo ago

yes! you can upload as many videos as you want (file limits currently are at 20GB and 90 minutes, per file). then I'd recommend using either the Rough Cut tile or the Montage tile to stitch them all together. In those tiles, you can prompt particular visual cues in terms of how you want the videos to be combined. Let me know if any questions.

nrhrjrjrjtntbt•2mo ago

Loom for Loom?

adishj•2mo ago

loom is focused on screen recordings / demos

nrhrjrjrjtntbt•2mo ago

It can record video and had AI editing.

dwrodri•2mo ago

Have y'all talked with Max and the Ozone team? Suppose you would have lots to learn from them as you take on this space. Best of luck, video is hard!

adishj•2mo ago

Haven't chatted with them but their platform looks interesting!

Video is hard, but it's also a fun modality which presents some interesting challenges. And is where content is converging towards.

dang•2mo ago

[under-the-rug stub]

[see https://news.ycombinator.com/item?id=45988611 for explanation]

penne_pastaa•2mo ago

this is so cool, can we see some demos of edits you'd make with it?

adishj•2mo ago

thanks! check out the demo video here of the latest version of the interface: https://screen.studio/share/SP7DItVD

i playback parts of the cinematic edit I made to the conversation between Dwarkesh Patel and Satya Nadella (e.g. added cinematic captions, motion graphics)

i can post the full edit as well if you're interested

lava123•2mo ago

YOOOO, this is super awesome. Love this for you all. Lets make life easier for more creators.

adishj•2mo ago

thanks!

shambu2k•2mo ago

Damn, you beat me to it. I was building something similar but got too caught up optimizing the context extraction. I actually ended up building a full spec for it—basically a PoC of "grep for videos."

My end goal was to let an agent make semantic changes (e.g., "remove the parts where the guy in the blue dress is seen") by simply grepping the context spec for the relevant timestamps and using ffmpeg to cut them out.

How are you extracting context from videos?

adishj•2mo ago

how would this be different from vector embeddings / semantic search?

shambu2k•2mo ago

Vector embeddings are fuzzy on finding boundaries. With my spec approach, my goal is to get precise start/end times for ffmpeg to do edits. The downside is, that there is a lot of pre-processing of raw footage in my approach. Vectors win on zero-shot flexibility here.

adishj•2mo ago

if you have an example you could share i'd be very curious on what you mean.

anthonySs•2mo ago

As a creator who films long form content, editing (specifically clipping for short form) is such a nightmare - this solves such a huge problem and the ui is insanely clean.

Will be using this a ton in the future

adishj•2mo ago

great to hear — I'd recommend using the clips tile to create clips, but you can also use the rough cut tile to help edit down the raw footage for the long-form

heyyfurqan•2mo ago

Damn this is good.

adishj•2mo ago

Thank you! :)

sashagoncharov•2mo ago

best of luck guys!!

adishj•2mo ago

thank you! let us know if you have any feedback!

rishabhaiover•2mo ago

When I see a hn post with no critical comments I assume all comments are either seeded or biased (commenting on my own bias)

adishj•2mo ago

scroll down and you'll see all the critical comments about the landing page lol

supportengineer•2mo ago

Submarining - well-known issue on HN.

soperj•2mo ago

it's hilarious how many have less than 5 karma.

skeeter2020•2mo ago

Well, half the comments are a variation of "this is so cool... I'm building something similar" so you'd expect them to be incredibly supportive, and with the churn in the AI field a 6 month old account with 100+ karma is relatively ancient!

filkny•2mo ago

This is one of those ideas that seems obvious after you hear about it, yet somehow didn't exist yet. So many potential applications. Met the founder back in SF and he's one of the coolest, down to earth dudes there is. Best of luck to the team!

adishj•2mo ago

thank you so much for the kind word!

camcaine•2mo ago

Agree this looks very promising.

adishj•2mo ago

thank you! if you get a chance to try it, let me know if you have any feedback

dakshbhatia•2mo ago

You can see the care in every little decision, workflow, and feature — I’ve never had this much fun editing videos.

I didn’t expect great video editing to become democratized so quickly. Kudos to the team!!

- a happy customer

adishj•2mo ago

thanks for the kind word and for being an early supporter!

news4abhi•2mo ago

Been following this team from the early days. Amazing founder story, even better product. Just what people need today

adishj•2mo ago

thanks for the kind word and for being an early supporter!

vivzkestrel•2mo ago

did not get this part "After talking to users though, we realized that the chat UX has limitations for video: (1) the longer the video, the more time it takes to process. Users have to wait too long between chat responses. (2) Users have set workflows that they use across video projects. Especially for people who have to produce a lot of content, the chat interface is a bottleneck rather than an accelerant." what are you processing? frame by frame images?

adishj•2mo ago

what I mean here is that processing / analyzing a beefy file format like video will take longer than processing text input

same with returning that back to the user as manipulated output (text / code generation is much more rapid than rendering a video)

rsancheti•2mo ago

And what’s the plan for determinism? For repeat workflows it’s important that the same pipeline produces the same cut each time. Are node outputs consistent or does the model vary run to run?

adishj•2mo ago

since we're building on top of LLMs which are by nature probabilistic, you won't produce the exact same frame-level cut each time, but of course there is still determinism in the expected outputs

for example, if you have a workflow setup to create 5 clips from a podcast and add b-rolls and captions and reframe to a few different aspect ratios, any time you invoke this workflow (regardless of which podcast episode you're providing as input), you'll get 5 clips back that have b-rolls, captions, and are reframed to a few different aspect ratios

however, which clips are selected, what b-rolls are generated, where they're placed — this is all non-deterministic

you can guide the agent via prompting the tiles individually, but that's still just an input into a non-deterministic machine

satvikpendem•2mo ago

Or just let the user adjust the seed and temperature themselves, or hide it under a checkbox that says deterministic with your chosen seed and temperature.

adishj•2mo ago

good point — we could enable these more granular-level knobs for users if it seems to be something people want

shraey_92•2mo ago

I like the tile-based workflow approach. I’m curious, is integration with tools like 11labs/cartesia or HeyGen on the cards? It would make it much easier to produce influencer-style POV/first-person content using digital avatars and cloned voice-overs.

Also, do you have an API available to trigger workflows programmatically?

sbfeibish•2mo ago

The home page is a big turn off. Change it.

PeakRhinoceros•2mo ago

I found it difficult to understand what Mosaic is based off the home page. It would be helpful to synthesize the overview from the docs so visitors don’t have to navigate around to find out what the tool is. Providing some screenshots would be good too, ideally with a video showcasing how the tool works.

djeastm•2mo ago

That's not very constructive criticism. Change it in what way(s)?

ttoinou•2mo ago

This website seems to be coded with AI. The sign up only works in Chrome not Chromium, and then I'm stuck forever at the step 3 of signing up

South Korean crypto firm accidentally sends $44B in bitcoins to users

Apache Poison Fountain

Web.whatsapp.com appears to be having issues syncing and sending messages

Google in Your Terminal

Shannon: Claude Code for Pen Testing

Anthropic: Latest Claude model finds more than 500 vulnerabilities

Brooklyn cemetery plans human composting option, stirring interest and debate

Why the 'Strivers' Are Right

Brain Dumps as a Literary Form

Agentic Coding and the Problem of Oracles

Malicious packages for dYdX cryptocurrency exchange empties user wallets

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

Penisgate erupts at Olympics; scandal exposes risks of bulking your bulge

Arcan Explained: A browser for different webs

What did we learn from the AI Village in 2025?

An open replacement for the IBM 3174 Establishment Controller

The P in PGP isn't for pain: encrypting emails in the browser

Show HN: Mirror Parliament where users vote on top of politicians and draft laws

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

We Mourn Our Craft

Jim Fan calls pixels the ultimate motor controller

Exploring a Modern SMTPE 2110 Broadcast Truck with My Dad

AI UX Playground: Real-world examples of AI interaction design

The Field Guide to Design Futures

The Other Leverage in Software and AI

AUR malware scanner written in Rust

Free FFmpeg API [video]

Are AI agents ready for the workplace? A new benchmark raises doubts

Show HN: AI Watermark and Stego Scanner

Clarity vs. complexity: the invisible work of subtraction

South Korean crypto firm accidentally sends $44B in bitcoins to users

Apache Poison Fountain

Web.whatsapp.com appears to be having issues syncing and sending messages

Google in Your Terminal

Shannon: Claude Code for Pen Testing

Anthropic: Latest Claude model finds more than 500 vulnerabilities

Brooklyn cemetery plans human composting option, stirring interest and debate

Why the 'Strivers' Are Right

Brain Dumps as a Literary Form

Agentic Coding and the Problem of Oracles

Malicious packages for dYdX cryptocurrency exchange empties user wallets

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

Penisgate erupts at Olympics; scandal exposes risks of bulking your bulge

Arcan Explained: A browser for different webs

What did we learn from the AI Village in 2025?

An open replacement for the IBM 3174 Establishment Controller

The P in PGP isn't for pain: encrypting emails in the browser

Show HN: Mirror Parliament where users vote on top of politicians and draft laws

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

We Mourn Our Craft

Jim Fan calls pixels the ultimate motor controller

Exploring a Modern SMTPE 2110 Broadcast Truck with My Dad

AI UX Playground: Real-world examples of AI interaction design

The Field Guide to Design Futures

The Other Leverage in Software and AI

AUR malware scanner written in Rust

Free FFmpeg API [video]

Are AI agents ready for the workplace? A new benchmark raises doubts

Show HN: AI Watermark and Stego Scanner

Clarity vs. complexity: the invisible work of subtraction

Launch HN: Mosaic (YC W25) – Agentic Video Editing

Comments