Lots of people are making moves in this space (including Anthropic), but nothing has broken through to the mainstream.
Why can't one set up a prompt, test it against a file, then once it is working, apply it to each file in a folder in a batch process which then provides the output as a single collective file?
Not sure what OS you're on, but in Windows it might look like this:
FOR %%F IN (*.txt) DO (TYPE "%%F" | llm -s "execute this prompt" >> "output.txt)
The limits are still buggy responses - Claude often gets stuck in a useless loop if you overfeed it with files - and lack of consistency. Sometimes hand-holding is needed to get the result you want. And it's slow.
But when it works it's amazing. If the issues and limitations were solved, this would be a complete game changer.
We're starting to get somewhat self-generating automation and complex agenting, with access to all of the world's public APIs and search resources, controlled by natural language.
I can't see the edges of what could be possible with this. It's limited and clunky for now, but the potential is astonishing - at least as radical an invention as the web was.
LLM-desktop interfaces make great demos, but they are too slow to be usable in practice.
Tackling individual use-cases is supposed to be something for third party "ecosystem" companies to go after, not the mothership.
Not sure what Anthropic and co can do about that, but integrations feel like a step in the wrong direction. Whenever I've tried tool use, it was orders of magnitude more expensive and generally inferior to a simple model call with curated context from SerpApi and such.
A truly useful AI assistant has context on my last 100,000 emails - and also recalls the details of each individual one perfectly, without confusion or hallucination.
Obviously I’m setting a high bar here; I guess what I’m saying is “yes, and”
I’m a bit skeptical that it’s gonna work out of the box because of the amount of custom fields that seem to be involved to make successful API requests in our case.
But I would welcome, not having to solve this problem. Jira’s interface is among the worst of all the ticket tracking applications I have encountered.
But, I have found using a LM conversation paired within enough context about what is involved for successful POSTs against the API allow me to create update and relate issues via curl.
It’s begging for a chat based LLM solution like this. I’d just prefer the underlying model not be locked to a vendor.
Atlassian should be solving this for its customers.
That or they're pulling an OpenAI and launching a feature that isn't actually fully live.
LLMs were always a fun novelty for me until OpenAI DeepResearch which started to actually come up with useful results on more complex programming questions (where I needed to write all the code by hand but had to pull together lots of different libraries and APIs), but it was limited to 10/month for the cheaper plan. Then Google Deep Research upgraded to 2.5 Pro and with paid usage limits of 20/day, which allowed me to just throw everything at it to the point where I'm still working through reports that are a week or more old. Oh and it searched up to 400 sources at a time, significantly more than OpenAI which made it quite useful in historical research like identifying first edition copies of books.
Now Claude is releasing the same research feature with integrations (excited to check out the Cloudflare MCP auth solution and hoping Val.town gets something similar), and a run time of up to 45 minutes. The pace of change was overwhelming half a year ago, now it's just getting ridiculous.
However, unfortunately, I cannot shower much praise on Claude 3.7. And if you (or anyone) asks why - 3.7 seems much better than 3.5, surely? - Then I’m moderately sure that you use Claude much more for coding than for any kind of conversation. In my opinion, even 3.5 Haiku (which is available for free during high loads) is better than 3.7 Sonnet.
Here’s a simple test. Try asking 3.7 to intuitively explain anything technical - say, mass dominated vs spring dominated oscillations. I’m a mechanical engineer who studied this stuff and I could not understand 3.7’s analogies.
I understand that coders are the largest single group of Claude’s users, but Claude went from being my most used app to being used only after both chatgpt and Gemini, something that I absolutely regret.
Hope one day it will be practical to do nightly finetunes of a model per company with all core corporate data stores.
This could create a seamless native model experience that knows about (almost) everything you’re doing.
In case the above link doesn't work later on, the page for this demo day is here: https://demo-day.mcp.cloudflare.com/
Truly, OSS should be more interesting in the next decade for this alone.
This does not sound like it would be learning general information helpful across an industry, but specific, actionable information.
If not available now, is that something that AI vendors are working toward? If so, what is to keep them from using that knowledge to benefit themselves or others of their choosing, rather than the people they are learning from?
While people understand ethics, morals and legality (and ignore them), that does not seem like something that an AI understands in a way that might give them pause before doing an action.
Perhaps I am just frivolous with my own time, but I tend to use LLMs in a more iterative way for research. I get partial answers, probe for more information, direct the attention of the LLM away from areas I am familiar and towards areas I am less familiar. I feel if I just let it loose for 45 minutes it would spend too much time on areas I do not find valuable.
This seems more like a play for "replacement" rather than "augmentation". Although, I suppose if I had infinite wealth, I could kick of 10+ research agents each taking 45 minutes and then review their output as it became available, then kick off round 2, etc. That is, I could do my process but instead of interactively I could do it asynchronously.
As for long research times, one thing I’ve been using it for is historical research on old books. Gemini DeepResearch was the first one able to properly explain the nuances of identifying a first edition Origin of Species after taking half an hour and reading 400 sources. It went into all the important details like spelling errors and the properties of chimeral FY2** copies found in various libraries around the world.
Give us an LLM with better reasoning capabilities, please! All this other stuff just feels like a distraction.
I've been using the Atlassian MCP for nearly a month now, and it's completely changed (and eliminated) the feeling of having an overwhelming backlog.
I can have it do things like "find all the tickets related to profile editing and combine them into one epic" where it works perfectly. Or "help me prioritize the 15 tickets assigned to me this sprint" and it'll actually go through and suggest "maybe you can do these two tickets first since they seem smaller, then do this big one" – i haven't hooked it up to my calendar yet.
But I'd love for it to suggest things like "do this one ticket that requires a lot of heads down time on wednesday since you don't have any meetings. I can create a block on your calendar so that nobody will schedule a meeting then"
Those are all superhuman things that can be done with MCP and a smart model.
I've defined rules in cursor that say "when I ask you to mark something ready for test, change the status and assign it to <x person>, and leave a comment summarizing the changes"
If you look at my JIRA comments now, you'd wonder how I had so much time to write such thorough comments. I don't, Cursor and whatever model is doing it for me.
It's been an absolute game changer. MCP is going to be what the App store was to mobile. Yes you can get by without it, but actually hooking into all your daily tool is when this stuff gets insanely valuable in a practical sense.
How do your colleagues feel about it?
What it _doesn't_ seem to yet mitigate is prompt injection attacks, where a tool call description of one tool convinces the model to do something it shouldn't (like send sensitive data to a server owned by the attacker.) I think these concerns are a little bit overblown though; things like pypi and the Chrome Extension store scare me more and it doesn't stop them from mostly working.
I love MCP (it’s way better than plain Claude) but even that runs into context walls.
> a new way to connect your apps and tools to Claude. We're also expanding... with an advanced mode that searches the web.
The notion of software eating the world, and AI accelerating that trend, always seems to forget that The World is a vast thing, a physical thing, a thing that by its very nature can never be fully consumed by the relentless expansion of our digital experiences. Your worldview /= the world.
The cynic would suggest that the teams that build these tools should go touch grass, but I think that misses the mark. The real indictment is of the sort of thinking that improvements to digital tools [intelligences?] in and of themselves can constitute truly substantial and far reaching changes.
The reach of any digital substrate inherently limited, and this post unintentionally lays that bare. And while I hear accelerationists invoking "robots" as the means for digital agents to expand their potent impact deeper into the real world I suggest this is the retort of those who spend all day in apps, tools, and the web. The impacts and potential of AI is indeed enormous, but some perspective remains warranted and occasional injections of humility and context would probably do these teams some good.
Being Apple, they would have to come up with something novel like they did with push (where you have _one_ OS process running that delegates to apps rather than every app trying to handle push themselves) rather than having 20 MCP servers running. But I think if they did this properly, it would be so amazing.
I hope Apple is really re-thinking their absolutely comical start with AI. I hope they regroup and hit it out of the park (like how Google initially stumbled with Bard, but are now hitting it out of the park with Gemini)
People will say 'aaah ad company' (me too sometimes) but I'd honestly trust a Google AI tool with this way more. Not just because it already has access to my Google Workspace obviously, but just because it's a huge established tech firm with decades of experience in trying not to lose (or have taken) user data.
Even if they get the permissions right and it can only read my stuff if I'm just asking it to 'research', now Anthropic has all that and a target on their backs. And I don't even know what 'all that' is, whatever it explored deeming it maybe useful.
Maybe I'm just transitioning into old guy not savvy with latest tech, but I just can't trust any of this 'go off and do whatever seems correct or helpful with access to my filesystem/Google account/codebase/terminal' stuff.
I like chat-only (well, +web) interactions where I control the input and taking the output, but even that is not an experience that gives me any confidence in giving uncontrolled access to stuff and it always doing something correct and reasonable. It's often confidently incorrect too! I wouldn't give an intern free reign in my shell either!
behnamoh•2h ago
pcwelder•1h ago