"Argue with idiots, and you become an idiot."
Date: {{current_date_full_with_day}}
Hey {{first_name | AI enthusiast}},
This week’s edition connects a few threads that matter more than they first appear.
One piece shows how a simple prompt tweak improves model accuracy without bigger models or more compute. Another highlights how AI subscriptions are evolving into infrastructure bundles.
And a deeper analysis asks how OpenAI can actually build a durable advantage (or whether it can at all). Nisha Pillai rounds things out with a dive into Claude Cowork use cases.
Individually, these feel tactical. Together, they signal something larger. AI is becoming operating infrastructure.
I hope you enjoy reading this edition!
PS: If you want to unleash the power of AI agents to grow your business, set up a time to speak to me here»
Google’s AI Pro and Ultra Plans Raise the Stakes for Developers
Google just reshaped its AI subscription lineup. The company introduced new Premium tiers aimed at developers and power users who want more access, more speed, and more scale.
Here’s what changed and why it matters.
• Google launched upgraded AI plans called Pro and Ultra, designed for developers building serious workloads rather than casual experiments.
• 🚀 The Pro plan focuses on higher usage limits and stronger performance, giving builders more room to test, iterate, and deploy without hitting ceilings too quickly.
• The Ultra tier pushes this further. It targets advanced users who need maximum throughput, priority access, and deeper integration across tools.
• 🧠 These plans build on Google’s broader AI ecosystem, connecting model access with developer tooling in a more structured way.
• The move signals a shift from general-purpose AI access to tiered, workload-based offerings. Instead of one-size-fits-all, Google is segmenting by usage intensity.
• ⚙️ Developers who previously worked around rate limits or staggered tasks can now plan with more predictable capacity. That changes how teams budget compute and schedule releases (a simple retry sketch follows this list).
• The structure also reflects growing demand from startups and enterprises. As more teams embed AI into products, reliability and scale become core requirements, not optional upgrades.
• 📈 The announcement shows how AI platforms are evolving into infrastructure businesses. Pricing, access tiers, and performance guarantees are now part of the strategic conversation.
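For teams that lived with the old workaround, here is a minimal sketch of the retry-with-backoff wrapper many of them put around model calls when rate limits bite. Both call_model and RateLimitError are placeholder names, not part of any specific Google SDK.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for whatever rate-limit error your SDK raises."""


def call_with_backoff(call_model, prompt, max_retries=5):
    """Retry a model call with exponential backoff plus jitter.

    call_model is assumed to be a function that takes a prompt string and
    returns the model's response, raising RateLimitError when the provider
    rejects the request for exceeding its quota.
    """
    for attempt in range(max_retries):
        try:
            return call_model(prompt)
        except RateLimitError:
            # Wait 1s, 2s, 4s, ... plus jitter so parallel workers
            # don't all retry at the same moment.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```

Higher, more predictable quotas don't make this pattern obsolete, but they demote it from core plumbing to safety net.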
So what?
This is not just a pricing update. It marks a maturing phase in the AI market. As usage deepens, providers like Google are packaging AI less like a feature and more like a core utility. AI access is no longer just about intelligence. It is about scale, speed, and structure. The developers who understand that shift will build with fewer limits and clearer economics.
AI & Error: The Employee You Don't Have to Watch
By Guest Writer: Nisha Pillai
Most AI tools assume you're sitting next to them. Cowork assumes you're not.
I've spent the last few months building AI systems that need me in the room, and writing about them. Leo needs me to check in. The quiz generator needed me to iterate on difficulty calibration. The workflows I've built AI assistants for still need me to run them, review, edit, and iterate as needed. Everything follows the same pattern: me and the AI, working together, in real time. This is the copilot pattern that most existing AI tools optimize for.
Recently, we've been hearing a lot about agents that work independently. Agents optimize for the delegation pattern: describe the output you need, hand it off, go do something else, come back to a finished work product. Anthropic called this the "AI employee" pattern when it released Claude Cowork. I wanted to know if it actually holds up.
Cowork is Anthropic's new desktop agent, currently in research preview, available on all paid plans. It runs in a local VM and works with permissions you explicitly provide. It doesn't access your filesystem by default. I chose this platform (instead of, say, OpenClaw) because of that baseline security guarantee, and because I've gravitated to Claude Code as my default for AI-assisted tasks over the past few months.
Since I'm authenticating into the same subscription account across all surfaces (mobile app, terminal, browser, and now the Cowork desktop client), I can maintain workflow continuity without switching context across devices.
Testing the "delegate to an employee" model
I gave Cowork a messy folder full of documents of different types and asked it to organize them logically, rename anything that didn't have a descriptive name, clearly identify duplicates, and create a summary file describing the contents of the folder. It did. Flawlessly. It helps that Cowork can look inside documents to see what they are. So it can, for example, review "receipt.jpg" and rename it to "2026_02_12_blue_bottle_coffee_receipt.jpg".
I then pointed it at a folder full of receipts and told it to organize them, create an expense tracking spreadsheet, add totals, and include a cross-reference so I could find the source files if I needed to check the originals. Then I went on to do other things. When I came back to check, there was a finished and formatted Excel file with working formulas and an audit trail linking each number back to its source. It just did it.
PowerPoint generation worked similarly — correct formatting, proper structure. More interesting: Cowork can store brand guidelines and templates, so the output matches your organization's look instead of defaulting to whatever the AI decides looks nice. I haven't tested this with my actual templates yet, but that's the obvious next step.
The delegation has worked really well for anything I could fully describe upfront — formatting, data extraction, file generation. If the output spec was clear, the handoff was clean and the result was exactly what I wanted. The tool also allows you to save workflow elements as skills so they're easy to repeat. I'll be using the expense tracking skill again, and refining it as we go.
Where it fell apart
Browser automation is the feature that sounds best on paper. Cowork can navigate your authenticated browser sessions — dashboards, CRMs, internal wikis — and pull data. In corporate environments, this breaks almost immediately. VPN requirements, multi-factor auth prompts, session timeouts, portals that redirect through three different login screens. Cowork itself told me browser automation is "hit or miss" and recommended I handle the authentication while it handles the extraction. Honest, but a long way from the pitch.
And it crashes. VM restarts, error messages that don't mean much, tasks that stall mid-execution. This is expected for research preview software and it should stabilize quickly. Right now, plan extra time if you need to finish on a deadline.
It also burns through your plan allocation faster than regular chat. Complex multi-step tasks are compute-intensive. Batching related work in a single session helps. Quick questions that don't need file access are still better served by regular Claude chat.
When to use what
I now use three Claude surfaces regularly, and they serve different purposes. Regular chat is for thinking out loud — asking questions, evolving my reasoning, drafting things where the direction shifts as I go. Claude Code runs independently and can go for hours as an agent, but it lives in the terminal and works with code and files in that environment.
As quickly as Claude Code caught on with technically inclined users, a terminal-based tool won't scale to every computer user. Cowork fills that gap. It also works independently, but the difference is what it produces: Excel files with working formulas, PowerPoint decks in the correct format, and polished documents, the kind of Office output that most knowledge workers actually hand to someone else.
Cowork is betting that a meaningful chunk of knowledge work is describable, delegatable, and not worth your attention while it's happening. After a week of testing, I think that bet is more right than wrong. The delegation pattern has implications beyond personal productivity.
If an AI can produce correctly formatted Excel files, PowerPoint decks, and organized documents from a description — and do it reliably — that's a chunk of work that currently sits with junior staff, contractors, or SaaS tools charging per-seat fees. The infrastructure isn't reliable enough today, but the direction does matter.
Nisha Pillai transforms complexity into clarity for organizations from Silicon Valley startups to Fortune 10 enterprises. A patent-holding engineer turned MBA strategist, she bridges technical innovation with business execution—driving transformations that deliver measurable impact at scale. Known for her analytical rigor and grounded approach to emerging technologies, Nisha leads with curiosity, discipline, and a bias for results. Here, she is testing AI with healthy skepticism and real constraints—including limited time, privacy concerns, and an allergy to hype. Some experiments work. Most don't. All get documented here.
OpenAI's Value Trap: Why a Lead in AI Isn't a Lead in Business
A recent essay by Benedict Evans lays out a clear, no-fluff argument for founders and tech leaders: OpenAI's dominance in generative AI looks big on paper, but beneath the surface there are cracks in how the business captures real value and fights off competition.
Key insights from the article
🔍 OpenAI today doesn’t have tech that no one else can match; rival companies are producing models with similar core capabilities, and breakthroughs jump back and forth between them rather than leaving competitors behind.
📉 The company’s huge user count doesn’t translate into deep engagement; most people use the product a few times a week or less, and only a small percentage pay for access, meaning the connection between users and revenue is weak.
💡 Unlike classic tech winners like operating systems or mobile platforms, OpenAI hasn’t yet created a self-reinforcing network effect where more users draw more developers and more usage in a loop that’s hard for others to disrupt.
🌐 Many competitors are building around models with their own distribution and workflow advantages; the incumbents aren't just matching the technology, they're embedding it deeper into everyday tools where people already work.
🔄 Internally, the company has tried many strategic plays at once, from apps to infrastructure pushes, in what sometimes looks like a “flood the zone” approach rather than a single cohesive product strategy.
⚙️ There’s a vision laid out to build a full tech stack with chips, infrastructure, cloud and developer tools, aiming for a platform that others build on; this mirrors classic platform playbooks but the conditions that made past winners like Windows or iOS successful aren’t clearly present yet.
🧠 At its core the piece argues that OpenAI’s situation today is far more about finding product-market fit and durable value capture than just having cutting-edge models or a big user count.
So what?
This isn’t a story of a rival catching up; it’s a reminder that real advantage in tech comes not just from breakthrough models but from deep engagement, real products people rely on daily, and business mechanics that turn users into lasting value.
🗂️ Learn AI: How to Create 10 Thumbnail Concepts Without a Designer

I found this tutorial in Alvaro Cintas's excellent newsletter. It won't make you better than MrBeast at making thumbnails, but it will get you close enough!
This tutorial walks through a simple workflow to generate professional YouTube thumbnails using Nano Banana Pro. No Photoshop. No external designer. Just structured prompting.
🧰 Who This Is For
Content creators who want higher click-through rates.
Solopreneurs building their own brand.
Marketers running fast A/B tests.
Beginners who find design tools overwhelming.
Step 1: Open Nano Banana Pro
Go to your Gemini workspace.
Navigate to Tools.
Select “Create images.”
Switch the model to Nano Banana Pro; it may appear as the Thinking model.
You’ll now see a prompt box optimized for structured visual reasoning. This model handles layout logic and composition better than basic image generators.
Step 2: Lock Your Identity
Consistency matters. Viewers recognize faces instantly.
Upload a high-quality headshot.
Use a structured prompt such as:
“Use the person from this image. Keep facial features identical. Change expression to shocked. Place subject on the left third of the frame pointing right.”
This keeps your face consistent across every variation while allowing emotional contrast: shocked, excited, confused, or serious.
Step 3: Generate 10 Concepts at Once
Instead of creating one thumbnail, generate multiple structured variations. This speeds up testing and improves selection quality.
Use a prompt like:
“Generate 10 unique thumbnail variations for a video about ‘[Your Topic]’. Include one Before vs After split, one glowing mystery box concept, and one bold 3D neon text layout reading ‘[Your Title]’. Use high contrast and 4K resolution.”
The model reasons through layout, spacing, and typography placement. You receive different compositions ready for review.
Step 4: Refine Using Natural Language
Do not restart from scratch if one concept is close but imperfect.
Use follow-up instructions such as:
“I like concept 3. Change the background to a dark tech studio with purple neon lighting. Make the arrow larger and brighter.”
The model edits in context. This preserves composition while improving specific elements.
Step 5: Export and Test
Once satisfied, download the thumbnail in 2K or 4K resolution.
Upload to YouTube.
Run A/B testing if available.
Track CTR and retention impact; a quick CTR comparison sketch follows these steps.
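If you track the numbers yourself rather than relying on YouTube's built-in experiments, the comparison is simple arithmetic. Here is a minimal Python sketch; the impression and click counts are made up for illustration.

```python
# Compare click-through rates for two thumbnail variants.
# The impression and click counts below are illustrative, not real data.
variants = {
    "concept_3_neon": {"impressions": 4200, "clicks": 231},
    "before_after_split": {"impressions": 4150, "clicks": 187},
}

for name, stats in variants.items():
    ctr = stats["clicks"] / stats["impressions"]
    print(f"{name}: CTR = {ctr:.2%}")

best = max(variants, key=lambda v: variants[v]["clicks"] / variants[v]["impressions"])
print(f"Leading variant so far: {best}")
```

With a few thousand impressions per variant, a gap like the one above is worth acting on; with a few hundred, it is probably noise.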
Final Takeaway
Thumbnail performance often drives more growth than the video itself. Structured prompting and batch variation let you test creative angles quickly. Instead of waiting on a designer, you control the iteration cycle directly.
AI Agents can now do things for you in the real world?
A startup called DoAnythingApp has built an AI agent that can take actions in the physical world, not just on a screen.
Here is the post by the founder (@thegarrettscott on X):
“I 100% believe Gemini Pro 3.1 can one-shot a coffee shop into existence.
Yesterday I ran our "Open and run a coffee shop in SF" benchmark with Gemini Pro 3.1 on
@doanythingapp.
This morning it reached out to me with a status update that included:
- a location ready that it already discussed with a broker
- a brand/site
- a weeks worth of Instagram posts ready
- actively talking with a bank about an SBA loan terms
- LLC ready to file
- An (sic) full plan to get open with full financials
- Found and reached out to investors
- Emailed the city for permit guidance
- Came up with a ton of creative ideas that make the coffeeshop one I'd actually want to go to
- Plan to survey the neighborhood for feedback
It's the first model that I'm confidant will achieve the benchmark.
Starting a few more agents with the same task in different cities, and will post an update on their performance as they continue to work.”
Read the full thread here»
Will it work? It's worth watching, and I have to admit the founder's running updates make for entertaining reading. The implications are as exciting as they are sobering for humans!
Learn AI: Repeat Your Prompts / Repeat Your Prompts
A machine learning paper recently showed that something as simple as repeating the input text can make many popular AI models perform better on certain tasks. It’s not about bigger models or more compute; it’s about how the prompt is structured.
Key insights from the paper
🧠 The core idea tested is called prompt repetition. Instead of sending a single prompt to a model, the researchers doubled it up so the exact same text appeared twice in a row (a minimal code sketch follows this list).
💡 This change targeted non-reasoning tasks. That means questions or classification tests where the model isn’t asked to do complex step-by-step reasoning.
📊 When the researchers ran benchmarks on several leading language models, including Gemini, GPT, Claude, and DeepSeek, they found the repeated-prompt method won more cases than it lost. Across many tasks it improved accuracy without hurting generation speed or making the outputs longer.
🔄 One practical twist is how these models read text. They process tokens left to right, so the early parts of a prompt are encoded before the model has seen what comes later. Repeating the prompt gives every part of the instruction a second pass with the full text already in context. That helped accuracy without adding latency.
⚙️ The paper also tested variations like repeating the prompt more than twice, and compared the technique to padding the input with extra junk. The gains were real for repetition and not just because the input was longer.
📈 Across tests, the technique brought wins without losses in the non-reasoning setups. When reasoning prompts were enabled, repetition didn’t hurt but only gave slight gains or neutral results.
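The mechanics are almost embarrassingly simple. Here is a minimal sketch of what "send the prompt twice" looks like in code; call_model is a placeholder for whichever provider API you use, since the paper reports gains across several of them.

```python
def repeat_prompt(prompt, times=2, separator="\n\n"):
    """Build the repeated prompt: the same text, verbatim, `times` times in a row."""
    return separator.join([prompt] * times)


def ask_with_repetition(call_model, prompt):
    """call_model stands in for whichever chat/completions call you use.

    For non-reasoning tasks (classification, short-answer QA), the doubled
    prompt beat the single prompt more often than not in the paper's
    benchmarks, without longer outputs or added latency.
    """
    return call_model(repeat_prompt(prompt, times=2))


# Hypothetical usage:
# answer = ask_with_repetition(
#     call_model,
#     "Classify the sentiment of this review as positive or negative: ...",
# )
```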
So what?
For teams integrating LLMs into products, this work shows you don’t always need bigger models or fancy training tricks. Sometimes you just need to rethink how you ask the question.
References
Google AI Pro & Ultra Now Include Developer Program Premium Benefits - https://blog.google/innovation-and-ai/technology/developers-tools/gdp-premium-ai-pro-ultra/?utm_source=www.onemorethinginai.com&utm_medium=newsletter&utm_campaign=newsletter95
Prompt Repetition Improves Non-Reasoning LLM Performance - https://arxiv.org/pdf/2512.14982?utm_source=www.onemorethinginai.com&utm_medium=newsletter&utm_campaign=newsletter95
How Will OpenAI Compete? (Benedict Evans) - https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x?utm_source=www.onemorethinginai.com&utm_medium=newsletter&utm_campaign=newsletter95
Garrett Scott on AI Agents Benchmark (X Post) - https://x.com/thegarrettscott/status/2025337142134616385?s=20&utm_source=www.onemorethinginai.com&utm_medium=newsletter&utm_campaign=newsletter95


