
Vibe coding with Bitcoin Maxis via Socratic #3 topics
- Agents
  - OpenAI Codex: https://openai.com/index/introducing-codex/
  - Google Coding Agent: https://jules.google
  - GitHub Copilot coding agent: https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
  - Claude Code SDK: https://docs.anthropic.com/en/docs/claude-code/sdk
  - Amp: https://ampcode.com/how-i-use-amp
  - Don’t build multi agents: https://cognition.ai/blog/dont-build-multi-agents
  - Anthropic agent research system: https://www.anthropic.com/engineering/built-multi-agent-research-system
  - LLM memory: https://grantslatton.com/llm-memory
  - Circuit tracing tools for LLMs: https://www.anthropic.com/research/open-source-circuit-tracing
  - Jack vibing: https://github.com/block/goose/issues?q=is%3Apr+is%3Aopen+author%3Ajackjackbits
  - AI Skeptic: https://fly.io/blog/youre-all-nuts/
- Models
  - CPU music model: https://x.com/StabilityAI/status/1922675163411497094
  - Devstral dev model: https://mistral.ai/news/devstral
  - New model for Windsurf: https://windsurf.com/blog/windsurf-wave-9-swe-1
  - Apple: https://machinelearning.apple.com/research/apple-foundation-models-2025-updates
  - Claude 4
    - https://www.anthropic.com/news/claude-4
    - https://x.com/benhylak/status/1925619639859478644
    - https://simonwillison.net/2025/May/25/claude-4-system-prompt/
  - Gemini 2.5 Family: https://blog.google/products/gemini/gemini-2-5-model-family-expands/
  - Creative Commons model: https://huggingface.co/blog/stellaathena/common-pile
  - Magistral (Mistral Thinking Model): https://mistral.ai/news/magistral
  - Psyche
- Evals
  - Eval startups fail: https://thomasliao.com/eval-startups
- Privacy
  - Signal prez: https://x.com/mer__edith/status/1925236198672314615
  - Secure minions: https://ollama.com/blog/secureminions
- MCP
  - GitHub MCP vulnerability: https://invariantlabs.ai/blog/mcp-github-vulnerability
  - Block's Playbook for MCP: https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers
  - ChatGPT adds MCP: https://platform.openai.com/docs/mcp
- Research
  - RLVR on Qwen2.5-Math models with completely random or incorrect rewards: https://rethink-rlvr.notion.site/Spurious-Rewards-Rethinking-Training-Signals-in-RLVR-1f4df34dac1880948858f95aeb88872f
  - Apple paper about reasoning LLMs: https://x.com/RubenHssd/status/1931389580105925115
  - Claude responds to Apple’s Paper: https://www.alphaxiv.org/abs/2506.09250
  - ChatGPT is making us stupid: https://www.brainonllm.com/
Socratic #3 Ideas
- https://x.com/xai/status/1923183620606619649
- Cool podcast about BAML, a type-safe DSL for language model interactions. The guest uses small language models instead of LLMs and makes them experts at narrow tasks. Also covers the need for better “ontologies” for multi-modal graph RAG, e.g. Lettria. Big fan of Kuzu.
- New model for windsurf: https://windsurf.com/blog/windsurf-wave-9-swe-1
- Psyche https://x.com/NousResearch/status/1922744483571171605
- CPU music gen model https://x.com/StabilityAI/status/1922675163411497094
- OpenAI Codex: https://openai.com/index/introducing-codex/
- LLM memory: https://grantslatton.com/llm-memory
- Google Coding Agent: https://jules.google
- GitHub Coding Agent: https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
- Mobile-first Gemma model: https://developers.googleblog.com/en/introducing-gemma-3n/
- Devstral dev model: https://mistral.ai/news/devstral
- https://x.com/mer__edith/status/1925236198672314615
- Eval startups fail: https://thomasliao.com/eval-startups
- OpenAI adds MCP: https://openai.com/index/new-tools-and-features-in-the-responses-api/
- Jony Ive startup https://www.bloomberg.com/news/articles/2025-05-21/openai-to-buy-apple-veteran-jony-ive-s-ai-device-startup-in-6-5-billion-deal
- Reason-ModernColBERT reasoning-intensive retrieval model: https://x.com/antoine_chaffin/status/1925555110521798925
- Claude 4 https://www.anthropic.com/news/claude-4
- https://x.com/benhylak/status/1925619639859478644
- https://simonwillison.net/2025/May/25/claude-4-system-prompt/
- Gemini diffusion https://simonwillison.net/2025/May/21/gemini-diffusion/
- The way of code: https://www.thewayofcode.com
- Finding 0days in the Linux kernel: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/
- No rag for coding agents: https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for
- GitHub MCP vulnerability: https://invariantlabs.ai/blog/mcp-github-vulnerability
- Mistral agents api https://x.com/MistralAI/status/1927364741162307702
- Relace coding models https://news.ycombinator.com/item?id=44108206
- https://x.com/ctnzr/status/1927391895879074047
- RLVR on Qwen2.5-Math models with completely random or incorrect rewards: https://rethink-rlvr.notion.site/Spurious-Rewards-Rethinking-Training-Signals-in-RLVR-1f4df34dac1880948858f95aeb88872f
- Circuit tracing tools for LLMs https://www.anthropic.com/research/open-source-circuit-tracing
- Human in the loop for MCP: https://modelcontextprotocol.io/specification/draft/client/elicitation
- AI diplomacy https://x.com/alxai_/status/1928504013030347180
- Claude code analysis: https://southbridge-research.notion.site/claude-code-an-agentic-cleanroom-analysis
- https://fly.io/blog/youre-all-nuts/
- Guide to meta prompting: https://www.prompthub.us/blog/a-complete-guide-to-meta-prompting
- Secure minions https://ollama.com/blog/secureminions
- New Gemini 2.5 Pro https://blog.google/products/gemini/gemini-2-5-pro-latest-preview/
- Creative Commons model: https://huggingface.co/blog/stellaathena/common-pile
- Apple paper about LLMs: https://x.com/RubenHssd/status/1931389580105925115
- https://x.com/jsngr/status/1932200732994019641
- Magistral (Mistral Thinking Model): https://mistral.ai/news/magistral
- Don’t build multi agents: https://cognition.ai/blog/dont-build-multi-agents
- Claude responds to Apple’s Paper: https://www.alphaxiv.org/abs/2506.09250
- Anthropic agent research system: https://www.anthropic.com/engineering/built-multi-agent-research-system
- ALE Bench: https://sakana.ai/ale-bench/
- Block's Playbook for MCP: https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers
- Monitoring sabotage: https://www.anthropic.com/research/shade-arena-sabotage-monitoring
- OpenAI preventing misalignment: https://openai.com/index/emergent-misalignment/
Eval Workshop
- Why evals
- Experience with OpenAI Evals
- Promptfoo
- Tweet generation
- Movies: external data in CSV, compare models, JS evaluation
- Jailbreak
- Hugging Face RAG Evaluation
- Ragas
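
A rough sketch of the shape of eval the workshop outline points at: test cases from CSV, several models compared, and a programmatic check per answer. This is plain Python, not Promptfoo's or OpenAI Evals' API. The CSV columns, the stub models `model_a` and `model_b`, and the `grade` check are all made up for illustration; a real run would load an external CSV file and call actual model APIs.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical test cases; in practice these would live in an external CSV file.
CSV_DATA = """movie,expected_year
The Matrix,1999
Inception,2010
Alien,1979
"""


def load_cases(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def make_stub(years: Dict[str, str]) -> Callable[[str], str]:
    """Build a stand-in 'model': a function from prompt to answer string.

    Assumption: a real eval would call a model API here instead.
    """
    def model(prompt: str) -> str:
        for title, year in years.items():
            if title in prompt:
                return f"{title} was released in {year}."
        return "I don't know."
    return model


# Two illustrative models; model_b deliberately gets one answer wrong.
MODELS: Dict[str, Callable[[str], str]] = {
    "model_a": make_stub({"The Matrix": "1999", "Inception": "2010", "Alien": "1979"}),
    "model_b": make_stub({"The Matrix": "1999", "Inception": "2010", "Alien": "1980"}),
}


def grade(output: str, expected: str) -> bool:
    """Simplest possible check: the expected string appears in the output."""
    return expected in output


def run_eval(models: Dict[str, Callable[[str], str]],
             cases: List[Dict[str, str]]) -> Dict[str, float]:
    """Run every model on every case and return the pass rate per model."""
    scores: Dict[str, float] = {}
    for name, model in models.items():
        passed = 0
        for case in cases:
            prompt = f"In what year was the movie {case['movie']} released?"
            if grade(model(prompt), case["expected_year"]):
                passed += 1
        scores[name] = passed / len(cases)
    return scores


results = run_eval(MODELS, load_cases(CSV_DATA))
for name, score in results.items():
    print(f"{name}: {score:.0%} passed")
```

Swapping `grade` for a stricter check (exact match, regex, or a second model acting as judge) changes what "pass" means without touching the loop, which is roughly the split that eval tools make between test data, providers, and assertions.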

