What This AI Meetup Needs is a Webpage

by Larry Chiang on June 19, 2025

Vibe coding w/ Bitcoin Maxis via Socratic #3 topics

Agents

OpenAI Codex: https://openai.com/index/introducing-codex/

Google Coding Agent: https://jules.google
GitHub Copilot: https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
Claude Code SDK: https://docs.anthropic.com/en/docs/claude-code/sdk

https://github.com/OpenSecretCloud/Maple/issues/120

Amp https://ampcode.com/how-i-use-amp
Don’t build multi agents: https://cognition.ai/blog/dont-build-multi-agents
Anthropic agent research system: https://www.anthropic.com/engineering/built-multi-agent-research-system

LLM memory: https://grantslatton.com/llm-memory
Circuit tracing tools for LLMs https://www.anthropic.com/research/open-source-circuit-tracing
Jack vibing https://github.com/block/goose/issues?q=is%3Apr+is%3Aopen+author%3Ajackjackbits
AI Skeptic: https://fly.io/blog/youre-all-nuts/
Models

CPU music model https://x.com/StabilityAI/status/1922675163411497094
Devstral dev model: https://mistral.ai/news/devstral
New model for windsurf: https://windsurf.com/blog/windsurf-wave-9-swe-1
Apple https://machinelearning.apple.com/research/apple-foundation-models-2025-updates
Claude 4: https://www.anthropic.com/news/claude-4

Gemini 2.5 Family: https://blog.google/products/gemini/gemini-2-5-model-family-expands/
Creative Commons model: https://huggingface.co/blog/stellaathena/common-pile
Magistral (Mistral Thinking Model): https://mistral.ai/news/magistral
Psyche: https://x.com/NousResearch/status/1922744483571171605

Evals

Eval startups fail: https://thomasliao.com/eval-startups

Privacy

Signal prez https://x.com/mer__edith/status/1925236198672314615
Secure minions https://ollama.com/blog/secureminions

MCP

GitHub MCP vulnerability: https://invariantlabs.ai/blog/mcp-github-vulnerability
Block's Playbook for MCP: https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers
ChatGPT adds MCP: https://platform.openai.com/docs/mcp

Research

RLVR on Qwen2.5-Math models with completely random or incorrect rewards: https://rethink-rlvr.notion.site/Spurious-Rewards-Rethinking-Training-Signals-in-RLVR-1f4df34dac1880948858f95aeb88872f
Apple paper about reasoning LLMs https://x.com/RubenHssd/status/1931389580105925115

Claude responds to Apple’s Paper: https://www.alphaxiv.org/abs/2506.09250

ChatGPT is making us stupid: https://www.brainonllm.com/

Socratic #3 Ideas

https://x.com/xai/status/1923183620606619649
Cool podcast about BAML, a typesafe DSL for language model interactions. The guest uses small language models instead of LLMs and makes them expert at narrow tasks. Need for better “ontologies” for multi-modal graph RAG, e.g. Lettria. Big fan of Kuzu.

https://open.spotify.com/episode/3agvlypsGrXZa0Hkw4m1wc

New model for windsurf: https://windsurf.com/blog/windsurf-wave-9-swe-1
Psyche https://x.com/NousResearch/status/1922744483571171605
CPU music gen model https://x.com/StabilityAI/status/1922675163411497094
OpenAI Codex: https://openai.com/index/introducing-codex/
LLM memory: https://grantslatton.com/llm-memory
Google Coding Agent: https://jules.google
GitHub Coding Agent: https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
Mobile-first Gemma model https://developers.googleblog.com/en/introducing-gemma-3n/
Devstral dev model: https://mistral.ai/news/devstral
https://x.com/mer__edith/status/1925236198672314615
Eval startups fail: https://thomasliao.com/eval-startups
OpenAI adds MCP: https://openai.com/index/new-tools-and-features-in-the-responses-api/
Jony Ive startup https://www.bloomberg.com/news/articles/2025-05-21/openai-to-buy-apple-veteran-jony-ive-s-ai-device-startup-in-6-5-billion-deal

https://x.com/bengeskin/status/1925552927885640124

Reason-ModernColBERT reasoning-intensive retrieval model https://x.com/antoine_chaffin/status/1925555110521798925
Claude 4 https://www.anthropic.com/news/claude-4

Gemini diffusion https://simonwillison.net/2025/May/21/gemini-diffusion/
The way of code: https://www.thewayofcode.com
Finding 0days in the Linux kernel: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/
No RAG for coding agents: https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for
GitHub MCP vulnerability: https://invariantlabs.ai/blog/mcp-github-vulnerability
Mistral Agents API https://x.com/MistralAI/status/1927364741162307702
Relace coding models https://news.ycombinator.com/item?id=44108206
https://x.com/ctnzr/status/1927391895879074047
RLVR on Qwen2.5-Math models with completely random or incorrect rewards: https://rethink-rlvr.notion.site/Spurious-Rewards-Rethinking-Training-Signals-in-RLVR-1f4df34dac1880948858f95aeb88872f
Circuit tracing tools for LLMs https://www.anthropic.com/research/open-source-circuit-tracing
Human in the loop for MCP: https://modelcontextprotocol.io/specification/draft/client/elicitation
AI diplomacy https://x.com/alxai_/status/1928504013030347180
Claude Code analysis: https://southbridge-research.notion.site/claude-code-an-agentic-cleanroom-analysis
https://fly.io/blog/youre-all-nuts/
Guide to meta prompting: https://www.prompthub.us/blog/a-complete-guide-to-meta-prompting

Secure minions https://ollama.com/blog/secureminions
New Gemini 2.5 Pro https://blog.google/products/gemini/gemini-2-5-pro-latest-preview/
Creative Commons model: https://huggingface.co/blog/stellaathena/common-pile
Apple paper about LLMs https://x.com/RubenHssd/status/1931389580105925115
https://x.com/jsngr/status/1932200732994019641
Magistral (Mistral Thinking Model): https://mistral.ai/news/magistral
Don’t build multi agents: https://cognition.ai/blog/dont-build-multi-agents
Claude responds to Apple’s Paper: https://www.alphaxiv.org/abs/2506.09250
Anthropic agent research system: https://www.anthropic.com/engineering/built-multi-agent-research-system
ALE Bench: https://sakana.ai/ale-bench/
Block's Playbook for MCP: https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers
Monitoring sabotage: https://www.anthropic.com/research/shade-arena-sabotage-monitoring
OpenAI preventing misalignment: https://openai.com/index/emergent-misalignment/

Eval Workshop

Why evals

xAI “unauthorized system prompt changes”

Experience with OpenAI Evals

Promptfoo

Tweet generation
Movies: external data in csv, compare models, js evaluation
Jailbreak
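
For anyone who wants the shape of the "Movies" exercise before the workshop, here is a rough sketch of the idea in plain Python: test cases come from CSV, each one runs through several models, and a small check function scores the output. This is not Promptfoo's API or config format. The CSV columns, the stub "models", and names like `load_cases`, `make_stub`, and `run_eval` are all made up for illustration, and a real run would call an actual LLM where the stubs are.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical test cases. In the workshop these would live in an external CSV file.
CSV_DATA = """movie,expected_year
The Matrix,1999
Inception,2010
"""


def load_cases(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per test case, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def make_stub(answers: Dict[str, str]) -> Callable[[str], str]:
    """Build a stand-in 'model' that returns a canned answer per movie title."""
    def model(prompt: str) -> str:
        for title, answer in answers.items():
            if title in prompt:
                return answer
        return "I don't know."
    return model


# Two stand-in models to compare. Replace these with real LLM calls.
MODELS: Dict[str, Callable[[str], str]] = {
    "stub-a": make_stub({"The Matrix": "It came out in 1999.",
                         "Inception": "Released in 2010."}),
    "stub-b": make_stub({"The Matrix": "1999",
                         "Inception": "2012, I think."}),
}


def contains_expected(output: str, case: Dict[str, str]) -> bool:
    """Custom check: pass if the expected year appears anywhere in the output."""
    return case["expected_year"] in output


def run_eval(cases, models, check) -> Dict[str, float]:
    """Return the fraction of cases each model passes under the given check."""
    scores = {}
    for name, model in models.items():
        passed = sum(
            check(model(f"What year was {case['movie']} released?"), case)
            for case in cases
        )
        scores[name] = passed / len(cases)
    return scores


if __name__ == "__main__":
    print(run_eval(load_cases(CSV_DATA), MODELS, contains_expected))
```

Keeping the check as its own function is the point: swap `contains_expected` for a stricter or fuzzier scorer without touching the data or the models.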

Hugging Face RAG Evaluation

https://huggingface.co/learn/cookbook/en/rag_evaluation

Ragas

WordPress’d from my personal iPhone, 650-283-8008, number that Steve Jobs texted me on

https://www.YouTube.com/watch?v=ejeIz4EhoJ0
