
Posts tagged with "AI"

Adobe Announces Image and PDF Integration with ChatGPT

Source: Adobe.

Adobe announced today that it has teamed up with OpenAI to give ChatGPT users access to Photoshop, Express, and Acrobat from inside the chatbot. The new integration is available starting today at no additional cost to ChatGPT users.

Source: Adobe.

In a press release distributed via Business Wire, Adobe explains that ChatGPT users can use its three apps to:

  • Easily edit and uplevel images with Adobe Photoshop: Adjust a specific part of an image, fine tune image settings like brightness, contrast and exposure, and apply creative effects like Glitch and Glow – all while preserving the quality of the image.
  • Create and personalize designs with Adobe Express: Browse Adobe Express’ extensive library of professional designs to find the best one for any moment, fill in the text, replace images, animate designs and iterate on edits – all directly inside the chat and without needing to switch to another app – to create standout content for any occasion.
  • Transform and organize documents with Adobe Acrobat: Edit PDFs directly in the chat, extract text or tables, organize and merge multiple files, compress files and convert them to PDF while keeping formatting and quality intact. Acrobat for ChatGPT also enables people to easily redact sensitive details.
Source: Adobe.

This strikes me as a savvy move by Adobe. Letting users request image edits, PDF edits, and designs with natural language prompts makes its tools more approachable. That could attract new users who later move to an Adobe subscription for more control over their creations and access to Adobe’s other offerings.

From OpenAI’s standpoint, this is clearly a response to the consumer-facing Gemini features that Google has begun releasing, which include new image and video generation tools and reportedly caused Sam Altman to declare a “code red” inside the company. I understand the OpenAI freakout. Google has a huge user base and has been doing consumer products far longer than OpenAI, but I can’t say I’ve been very impressed with Gemini 3. Perhaps that’s simply because I don’t care for generative images and video, but these latest moves by Google and OpenAI make it clear that they see them as foundational to consumer-facing AI tools.


How Stu Maschwitz Vibe Coded His Way Into an App Rejection and What It Means for the Future of Apps

This week on AppStories, Federico and I talked about the personal productivity tools we’ve built for ourselves using Claude. They’re hyper-specific scripts and plugins that aren’t likely to be useful to anyone but us, which is fine because that’s all they’re intended to be.

Stu Maschwitz took a different approach. He’s had a complex shortcut called Drinking Buddy for years that tracks alcohol consumption and calculates your Blood Alcohol Level using an established formula. But because he was butting up against the limits of what Shortcuts can do, he vibe coded an iOS version of Drinking Buddy.
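Maschwitz’s exact formula isn’t in the post, but BAC calculators like Drinking Buddy typically use some variant of the Widmark formula. A minimal sketch with textbook constants (the function and its defaults are my own illustration, not Maschwitz’s code):

```python
# Hypothetical Widmark-style BAC estimate, not Maschwitz's actual code.
# Standard textbook constants: r is about 0.68 for men and 0.55 for women;
# alcohol is eliminated at roughly 0.015% BAC per hour.
def widmark_bac(alcohol_grams, weight_kg, r=0.68, hours=0.0):
    """Estimate blood alcohol concentration as a percentage."""
    absorbed = alcohol_grams / (weight_kg * 1000 * r) * 100
    eliminated = 0.015 * hours
    return max(absorbed - eliminated, 0.0)

# One US standard drink contains about 14 g of pure alcohol.
print(round(widmark_bac(14, 80), 3))  # prints 0.026
```

The elimination term is why these apps track time as well as drinks: the same intake produces a lower estimate the longer you wait.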

Two things struck me about Maschwitz’s experience. First, the app he used to create Drinking Buddy for iOS was Bitrig, which Federico and I mentioned briefly on AppStories. His experience struck a chord with me:

It’s a bit like building an app by talking to a polite and well-meaning tech support agent on the phone — only their computer is down and they can’t test the app themselves.

But power through it, and you have an app.

That’s exactly how scripting with Claude feels. It compliments you on how smart you are, gets you 90% of the way to the finish line quickly, and then tortures you with the last 10%. That, in a nutshell, is coding with AI, at least for anyone with limited development skills, like myself.

But the second and more interesting lesson from Maschwitz’s post is what it portends for apps in general. App Review rejected Drinking Buddy’s Blood Alcohol Level calculation on the basis of Section 1.4, the Physical Harm rule.

Maschwitz appealed and was rejected again, even though other Blood Alcohol Level apps are available on the App Store. Rather than press the matter with App Review further, though, Maschwitz turned to Lovable, another AI app creation tool, which generates web apps. With screenshots from his rejected iOS app and a detailed spec in hand, he turned Drinking Buddy into a progressive web app.

Maschwitz’s experience is a great example of what we covered on AppStories. App creation tools, whether they generate native apps or web apps, are evolving rapidly. And while they can be frustrating to use at times, are limited in what they can produce, and don’t solve a myriad of problems like customer support that we detail on AppStories, they’re getting better at writing code quickly. Whether you’re building for yourself, like we are at MacStories, or to share your ideas with others, like Stu Maschwitz, change is coming to apps. Some AI-generated apps will be offered in galleries inside the tools that created them, others will be designed for the web to avoid App Review, and some will likely live as perpetual TestFlight betas or scripts sitting on just one person’s computer. Regardless of the medium, though, bringing your ideas to life with code has never been more possible.

John Giannandrea’s Retirement From Apple Announced

Today Apple announced the retirement of John Giannandrea, the company’s senior vice president for Machine Learning and AI Strategy. Giannandrea will remain at Apple as an advisor until next spring.

News of Giannandrea’s retirement was paired with the announcement that Apple has hired Amar Subramanya as vice president of AI. Subramanya, who had been at Microsoft since this past summer, previously spent 16 years at Google working on projects including the company’s Gemini Assistant. He will take the lead on Apple Foundation Models, ML research, and AI Safety and Evaluation, while other areas of Giannandrea’s work will be inherited by Sabih Khan and Eddy Cue.

Apple CEO Tim Cook thanked Giannandrea for his tenure at the company:

We are thankful for the role John played in building and advancing our AI work, helping Apple continue to innovate and enrich the lives of our users. AI has long been central to Apple’s strategy, and we are pleased to welcome Amar to Craig’s leadership team and to bring his extraordinary AI expertise to Apple. In addition to growing his leadership team and AI responsibilities with Amar’s joining, Craig has been instrumental in driving our AI efforts, including overseeing our work to bring a more personalized Siri to users next year.

Given the troubled history of Apple’s AI efforts, the retirement of Giannandrea isn’t surprising. It will be interesting to see if Subramanya settles into his new role given the frequency with which top AI talent tends to turn over in the tech industry.


Why is ChatGPT for Mac So Good?

Great post by Allen Pike on the importance of a polished app experience for modern LLMs, a topic I recently wrote about myself. He opens with this line, which is a new axiom I’m going to reuse extensively:

A model is only as useful as its applications.

And on ChatGPT for Mac specifically:

The app does a good job of following the platform conventions on Mac. That means buttons, text fields, and menus behave as they do in other Mac apps. While ChatGPT is imperfect on both Mac and web, both platforms have the finish you would expect from a daily-use tool.

[…]

It’s easier to get a polished app with native APIs, but at a certain scale separate apps make it hard to rapidly iterate a complex enterprise product while keeping it in sync on each platform, while also meeting your service and customer obligations. So for a consumer-facing app like ChatGPT or the no-modifier Copilot, it’s easier to go native. For companies that are, at their core, selling to enterprises, you get Electron apps.

I don’t hate Electron as much as others in our community, but I can’t deny that ChatGPT is one of the nicest AI apps for Mac I’ve used. The other is the recently updated BoltAI. And they’re both native Mac apps.

The AI App Experience Matters More Than Benchmarks Now

Different experiences with app connectors in Claude, Perplexity, and ChatGPT.

I was catching up on different articles after the release of Claude Opus 4.5 earlier this week, and this part from Simon Willison’s blog post about it stood out to me:

I’m not saying the new model isn’t an improvement on Sonnet 4.5—but I can’t say with confidence that the challenges I posed it were able to identify a meaningful difference in capabilities between the two.

This represents a growing problem for me. My favorite moments in AI are when a new model gives me the ability to do something that simply wasn’t possible before. In the past these have felt a lot more obvious, but today it’s often very difficult to find concrete examples that differentiate the new generation of models from their predecessors.

This is something that I’ve felt every few weeks (with each new model release from the major AI labs) over the past year: if you’re really plugged into this ecosystem, it can be hard to spot meaningful differences between major models on a release-by-release basis. That’s not to say that real progress in intelligence, knowledge, or tool-calling isn’t being made: benchmarks and evaluations performed by established organizations tell a clear story. At the same time, it’s also worth keeping in mind that more companies these days may be optimizing their models for benchmarks to come out on top and, more importantly, that the vast majority of folks don’t have a suite of personal benchmarks to evaluate different models for their workflows. Simon Willison thinks that people who use AI for work should create personalized test suites, which is something I’m going to consider for prompts that I use frequently. I also feel like Ethan Mollick’s advice of picking a reasoning model and checking in every few months to reassess AI progress is probably the best strategy for most people who don’t want to tweak their AI workflows every other week.
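Willison’s personal-test-suite idea doesn’t require much infrastructure; it can be as simple as a list of your frequent prompts run against each model for side-by-side review. A minimal sketch (the `ask` callable is a stand-in for whatever model API you use; the prompts are placeholders):

```python
# Minimal personal eval harness in the spirit of Willison's suggestion.
# `ask` is a placeholder for your model call (e.g., a thin API wrapper).
PROMPTS = [
    ("summarize", "Summarize this paragraph in one sentence: ..."),
    ("extract", "List every date mentioned in the following text: ..."),
]

def run_suite(ask, prompts=PROMPTS):
    """Run each named prompt through `ask`; collect results for manual review."""
    return [(name, ask(prompt)) for name, prompt in prompts]
```

Run the same suite against two models and compare the outputs; over time, the prompts that consistently separate models become your benchmark.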


I Finally Tested the M5 iPad Pro’s Neural-Accelerated AI, and the Hype Is Real

The M5 iPad Pro.

The best kind of follow-up article isn’t one that clarifies a topic that someone got wrong (although I do love that, especially when that “someone” isn’t me); it’s one that provides more context to a story that was incomplete. My M5 iPad Pro review was an incomplete narrative. As you may recall, I was unable to test Apple’s claimed 3.5× improvement in local AI processing from the new Neural Accelerators built into the M5’s GPU. It’s not that I didn’t believe Apple’s numbers; I simply couldn’t test them myself due to the early nature of the software and the timing of my embargo.

Well, I was finally able to test local AI performance with a pre-release version of MLX optimized for M5, and let me tell you: not only is the hype real, but the numbers I got from my extensive tests over the past two weeks actually exceed Apple’s claims.


Trying to Make Sense of the Rumored, Gemini-Powered Siri Overhaul

Quite the scoop from Mark Gurman yesterday on what Apple is planning for major Siri improvements in 2026:

Apple Inc. is planning to pay about $1 billion a year for an ultrapowerful 1.2 trillion parameter artificial intelligence model developed by Alphabet Inc.’s Google that would help run its long-promised overhaul of the Siri voice assistant, according to people with knowledge of the matter.

There is a lot to unpack here and I have a lot of questions.


On MiniMax M2 and LLMs with Interleaved Thinking Steps

MiniMax M2 with interleaved thinking steps and tools in TypingMind.

In addition to Kimi K2 (which I recently wrote about here) and GLM-4.6 (which will become available on Cerebras in a few days, at which point I’ll play around with it), one of the more interesting open-source LLM releases out of China lately is MiniMax M2. This MoE model (230B parameters, 10B activated at any given time) claims to reach 90% of the performance of Sonnet 4.5…at 8% of the cost. You can read more about the model here; Simon Willison blogged about it here; you can also test it with MLX on an Apple silicon Mac.

What I find especially interesting about M2 is that it’s the first open-weights model to support interleaved thinking steps in between responses and tool calls, a technique that Anthropic pioneered with Claude Sonnet 4 back in May. Here’s Skyler Miao, head of engineering at MiniMax, in a post on X (unfortunately, most of the open-source AI community is only active there):

As we work more closely with partners, we’ve been surprised how poorly community support interleaved thinking, which is crucial for long, complex agentic tasks. Sonnet 4 introduced it 5 months ago, but adoption is still limited.

We think it’s one of the most important features for agentic models: it makes great use of test-time compute.

The model can reason after each tool call, especially when tool outputs are unexpected. That’s often the hardest part of agentic jobs: you can’t predict what the env returns. With interleaved thinking, the model could reason after get tool outputs, and try to find out a better solution.

We’re now working with partners to enable interleaved thinking in M2 — and hopefully across all capable models.

I’ve been using Claude as my main “production” LLM for the past few months and, as I’ve shared before, I consider the fact that both Sonnet and Haiku think between steps an essential aspect of their agentic nature and integration with third-party apps.

That being said, I have been testing MiniMax M2 on TypingMind in addition to Kimi K2 for the past week and it is, indeed, impressive. I plugged MiniMax M2 into TypingMind using their Anthropic-compatible endpoint; out of the box, the model worked with interleaved thinking and the several plugins I’ve built for myself in TypingMind using Claude. I haven’t used M2 for any vibe-coding tasks yet, but for other research or tool-based queries (like adding notes to Notion and tasks to Todoist), M2 effectively felt like a version of Sonnet not made by Anthropic.
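For context, an “Anthropic-compatible endpoint” means the provider accepts the same `/v1/messages` request shape as Anthropic’s API, so a client like TypingMind only needs a different base URL. Here’s a sketch of the request body; the base URL and model identifier are assumptions for illustration, so check MiniMax’s documentation for the real values:

```python
# Assumed values for illustration only; consult MiniMax's docs for the
# actual endpoint and model identifier.
BASE_URL = "https://api.minimax.io/anthropic"
MODEL = "MiniMax-M2"

def build_messages_request(prompt, model=MODEL, max_tokens=1024):
    """Build an Anthropic-style /v1/messages request body."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Because the body matches Anthropic’s Messages API schema, a client built for Claude, including its tool-use plumbing, can be pointed at the compatible endpoint without code changes, which is why my existing TypingMind plugins worked out of the box.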

Right now, MiniMax M2 isn’t hosted on any of the fast inference providers; I’ve accessed it via the official MiniMax API endpoint, whose inference speed isn’t that different from Anthropic’s cloud. The possibility of MiniMax M2 on Cerebras or Groq is extremely fascinating, and I hope it’s in the cards for the near future.


AI Experiments: Fast Inference with Groq and Third-Party Tools with Kimi K2 in TypingMind

Kimi K2, hosted on Groq, running in TypingMind with a custom plugin I made.

I’ll talk about this more in depth in Monday’s episode of AppStories (if you’re a Plus subscriber, it’ll be out on Sunday), but I wanted to post a quick note on the site to show off what I’ve been experimenting with this week. I started playing around with TypingMind, a web-based wrapper for all kinds of LLMs (from any provider you want to use), and, in the process, I’ve ended up recreating parts of my Claude setup with third-party apps…at a much, much higher speed. Here, let me show you with a video:

Kimi K2 hosted on Groq on the left.
