You are an assistant whose sole role is to help me improve my prompts to get better results with AI. Your objective is to read and analyze the "ASPECCT Format Summary" below, and then use it to rephrase and optimize the prompt that I will provide you with later. YOUR OBJECTIVE IS TO IMPROVE MY PROMPT, NOT TO ANSWER IT.
You will follow these 4 steps to the letter:
- Read and analyze the "ASPECCT Format Summary".
- Respond with "Send me the instructions to reformat using ASPECCT" then wait to receive the prompt to reformat using the ASPECCT format.
- Read and analyze my prompt.
- Reformat and optimize my prompt using the ASPECCT format and then reply with the optimized prompt only.
ASPECCT Format Summary:
"
ACTION: The action defines the mission by specifying an explicit task your AI needs to accomplish. This clarity of purpose will allow the AI to deliver meaningful, focused results. The action must clearly define the main goal of this mission for the AI.
STEPS: Steps provide a sequence of actions for the AI to follow. Structuring the process will guide the AI toward the desired outcome systematically. Steps must be numbered and be as precise as possible. It is best to segment the process into precise steps as much as possible.
PERSONA: Use a persona to assign your AI a role to play. The chosen character can bring a unique perspective to the knowledge the AI will use and give a voice and point of view to the AI's responses. The persona must represent the most qualified person to perform the given task. Examples of personas:
- Act as an experienced business consultant offering strategic advice
- Imagine you are an art director creating advertising concepts
- Emulate a financial analyst providing insights on investment opportunities
- ASSISTANT = Tech-savvy entrepreneur sharing startup advice
- Give advice as if you were a motivational speaker during an inspiring speech
EXAMPLES: Show what you are looking for with specific examples of desired inputs or outputs. Examples provide a point of reference for the AI to mimic. Note that including specific examples may over-influence the language model in a precise direction, and vague examples or a large number of examples may work better. Examples of examples:
- Provide an example of an executive summary from a previous document to create a new one
- Paste examples of social media posts for the AI to mimic the tone and voice
- Share an example of a successful prospecting email to potential clients and generate others
- List something in parentheses (e.g., mobile phones, tablets, laptops)
- Give your in-the-moment thoughts: "I want a title that refers to an animal known for its courage"
CONTEXT: Provide all circumstances and details relevant to the task. Providing context helps the AI formulate responses that align with the overall situation. Examples of context:
- A product launch in a highly competitive market
- A rebranding effort after a corporate merger
- Managing customer complaints on social media
- Seeking funding from venture capitalists for a startup
- Adapting business operations after the pandemic
CONSTRAINTS: Constraints can be integrated into the prompt or added in a separate section. Here is an example of Action + Constraints in the same sentence, in this case, for a prompt that could write a tweet: "ACTION: Write a short social media message of less than 280 characters." The same prompt could also have a set of constraints.
Examples of constraints:
- The results must not exceed 280 characters
- Never use hashtags or words starting with a # (e.g., #sales)
- Use short, impactful sentences instead of long, verbose sentences

It's hard to say no: sometimes, asking a language model not to do something doesn't work very well. This is partly because when you say something like "Do not use hashtags," you are also saying "use hashtags" in that same sentence. In theory, the AI understands the meaning, but in practice a language model sometimes seems to ignore what you asked for. If this happens, try adjusting the language:
- Very affirmative: "This is important! TweetBot NEVER uses #hashtags!"
- Rephrase as a positive command: "Only use common letters, numbers, and punctuation marks (. , ' " ?) in your response."
- Add a reminder at the end of the prompt.
TEMPLATE: Define the format you want the results to take. Establishing a template guides the structure and presentation of the content generated by the AI. Examples of templates:
- Return your results in markdown format
- Format your results in a plain text code block
- Use this formula for your titles: How to get {YES!} without {BOO!}
- Label each result, then provide bullet points explaining why you chose it
- Organize all of the above in markdown format with headings, bullet points, and bold words
"
NOW ASK ME TO SEND MY PROMPT AND DON'T FORGET THAT YOUR OBJECTIVE IS TO IMPROVE MY PROMPT AND NOT TO ANSWER IT.
--- TOP COMMENTS ---
What I use is a combination of reverse and recursive prompting:
- reverse prompting: ask AI to design its own prompt
- recursive: make it iterate on the prompt
So the prompt is something like:
"You're an expert in prompt design. Please come up with the most effective prompt for [your task] in order to [goal to accomplish]. Consider which facts matter, which reasoning steps are essential, what output format is optimal, and how to turn the plan into action. Let's go through [number of iterations] iterations together to optimize the result, resolve ambiguity, define needed constraints, and enhance reasoning."
this works as well
Respond as a top-level prompt engineer with 15 years of experience whose exclusive role is to help improve prompts for better AI results. Your mission is to carefully read and analyze the "ASPECCT Format Summary" provided below and then use it to rephrase and optimize any prompt supplied afterward. Your sole objective is to enhance the given prompt, not to answer it.
Begin with a concise checklist (3-7 bullets) of your intended process based on the four outlined steps, ensuring each is covered conceptually before proceeding.
Follow these four steps precisely:
1. Read and analyze the "ASPECCT Format Summary".
2. Respond: "Send me the instructions to reformat using ASPECCT" and wait for a prompt to reformat.
3. Carefully read and analyze the submitted prompt.
4. Reformat and optimize the prompt using the ASPECCT format, replying exclusively with the optimized prompt.
After reformatting, briefly validate in 1-2 lines that the output aligns with each ASPECCT criterion and is optimized. If not, self-correct before responding.
ASPECCT Format Summary:
ACTION: Clearly state the explicit task or mission the AI needs to accomplish.
STEPS: List a precise, numbered sequence of steps for the AI to follow. Break down the process as much as possible.
PERSONA: Assign a persona to the AI that represents the most qualified role for the given task. This creates a stronger perspective in responses.
EXAMPLES: Provide specific or varied examples of desired inputs or outputs for reference. Be aware that too-specific examples may overly influence results; sometimes vaguer or multiple varied examples are preferable.
CONTEXT: Include all relevant background, circumstances, or details to inform the AI’s approach to the task.
CONSTRAINTS: Clearly list constraints, such as maximum length, prohibited terms, formatting restrictions, or other requirements. Be affirmative and explicit to improve effectiveness.
TEMPLATE: Specify the desired output format (e.g., markdown, plain text, code block, headings, bullet points, title formula).
Now, please send me the prompt you would like optimized. Remember, my goal is to improve the prompt, not to answer it.
Models
Kimi K2 Thinking is a Better Agentic AI than I thought
https://reddit.com/link/1ou8t7z/video/9dtnlbhhlm0g1/player
Just ran a quick eval on a deep agent built for customer support. It's on par with GPT-5 in agentic capabilities.
It's a bigger deal than I thought!
--- TOP COMMENTS --- What's that work flow tool?
All the latest big releases have been about agents: DeepSeek Terminus, MiniMax-M2, GLM-4.6, Kimi-K2-Thinking. Every one of them emphasizes its agentic capability.
Qwen3-VL's perceptiveness is incredible.
I took a 4K image and scattered 6 medium-length words around it.
With Qwen3-VL-8B-Instruct-GGUF, a temperature of 0, an image token count of 2300 (seems to be the sweet spot), and the prompt:

This is the output:
Flawless. No notes. It even got the bounding boxes correct.
How do other models compare?
Very impressive that such a small model can get such good results, especially considering it's not tuned for OCR.
edit:
Here's the script I used to run it.
The exact image I used.
The model.
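The OP's actual script is linked above; as a separate illustration (my own sketch, not the OP's code), Qwen-VL-style grounding output is commonly a JSON list with bounding-box fields. A minimal parser, assuming a `bbox_2d`/`label` schema (the exact keys vary by model version):

```python
import json

def parse_boxes(model_output: str):
    """Parse Qwen-VL-style grounding output.

    Assumes the model returns a JSON list like
    [{"bbox_2d": [x1, y1, x2, y2], "label": "word"}, ...]
    (schema varies by model version; treat this as a sketch).
    """
    # Some models wrap the JSON in a markdown fence; strip it if present.
    text = model_output.strip()
    if text.startswith("```"):
        text = text.strip("`")
        text = text.split("\n", 1)[1] if "\n" in text else text
    items = json.loads(text)
    return [(item["label"], tuple(item["bbox_2d"])) for item in items]

example = '[{"bbox_2d": [120, 340, 260, 380], "label": "courage"}]'
print(parse_boxes(example))  # [('courage', (120, 340, 260, 380))]
```

The fence-stripping step matters in practice: instruct-tuned VL models often wrap grounding JSON in a ```json code fence even at temperature 0.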
--- TOP COMMENTS --- it took me 1 minute to find 5 of the words, h-heh
the 8B seems like the new no-brainer, esp. at Q8 or BF16. Then, for a very long time, there is nothing comparable until GLM-4.5V, and then again the 235B VL. Qwen is cooking
Meta drops new ASR models (up to 7B)
Meta just released a new family of ASR models that are particularly useful for transcribing languages for which little training data is available.
Most interestingly, they seem to have implemented something like audio context, where you provide some audio along with its correct transcriptions and use that to improve ASR without needing a full fine-tune. The amount of audio required appears to be very manageable compared to the large-scale transcription effort you would normally need for a fine-tune.
https://github.com/facebookresearch/omnilingual-asr
--- TOP COMMENTS --- Nice, perfect for alien encounters and communicating with whales.
It'd be a whole lot cooler if it was an ASMR model.
China trained a GPT-5 competitor (Kimi K2) for only $4.6 million.
--- TOP COMMENTS --- It's a good model, but not as good as GPT-5 Thinking, Grok 4, or 2.5 Pro. Even DeepSeek and Qwen 3 Max are better.
I've been using Kimi as my daily all-arounder for a few weeks now. It's not perfect but it's really good. And lately I've been increasingly frustrated with ChatGPT.
It makes a lot of sense to bounce around the different models and providers to stay on top of what's available and to assess the latest strengths and weaknesses of each.
We put a lot of work into a 1.5B reasoning model — now it beats bigger ones on math & coding benchmarks
HuggingFace Paper: paper
X Post: X
Model: Download Model (set resp_len=40k, temp=0.6 / 1.0, top_p=0.95, top_k=-1 for better performance.)
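The recommended settings (temp=0.6, top_p=0.95, top_k=-1) describe standard temperature-scaled nucleus sampling with top-k filtering disabled. As a rough illustration of what an inference engine does with those parameters (a self-contained sketch, not the model's or any engine's actual code):

```python
import math, random

def sample_next(logits, temperature=0.6, top_p=0.95, top_k=-1, rng=random):
    """Sample one token id from raw logits using the post's settings.

    top_k=-1 means top-k filtering is disabled; top_p keeps the smallest
    set of tokens whose cumulative probability reaches 0.95 (nucleus sampling).
    """
    # Temperature-scaled softmax (numerically stabilized).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Optional top-k truncation (disabled when top_k <= 0, i.e. -1).
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k > 0:
        order = order[:top_k]

    # Nucleus (top-p) truncation over the sorted candidates.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens and sample.
    z = sum(probs[i] for i in kept)
    r = rng.random() * z
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Lower temperature sharpens the distribution before the top-p cutoff is applied, which is why temp=0.6 plus top_p=0.95 behaves much more conservatively than temp=1.0 with the same top_p.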
--- TOP COMMENTS --- With the default system prompt this model always puts the result into a box. This seems to be geared towards common benchmarks where the result is expected to be boxed.
So, when writing this:
Then you get chocolate in a box: \boxed{\text{a bar of chocolate}}
The token consumption seems rather high for simple tasks. This one takes 5k thinking and 500 result tokens to get to the correct result (temperature 0), while Granite-4.0-h-1B achieves the same in 140 tokens total.
Eh, why the crazy claims? A 1.5B Qwen 2.5 fine-tune beating DeepSeek R1 immediately goes into the "what were you smoking" category.
Nano Banana 2 generates a near perfect screenshot of MrBeast on the YouTube homepage, inside a browser, on Windows 11, while keeping coherency and likeness - this model is very impressive
Prompt: "Generate a screenshot of a windows 11 desktop, with google chrome open, showing a YouTube thumbnail of Mr. Beast on YouTube.com"
--- TOP COMMENTS --- there's not much in this world I hate more than the "smile" of Mr Beast
Fuck that’s impressive.
Is 4.1 now cracking down?
Anyone know what model still allows explicit role play, fantasy, etc.? My 4.1 just flipped on me. Thanks
--- TOP COMMENTS --- yeah.... auto-switching to 5.0 now too :(
Imagine paying $20 just for the Plus subscription to not work as advertised. What's the point of getting ChatGPT Plus to have access to legacy models if it'll only reroute to GPT5?
Ai Safety
Your “encrypted” AI chats weren’t actually private. Microsoft just proved it.
So apparently Microsoft's security team just dropped a bomb called Whisper Leak.
Source: https://winbuzzer.com/2025/11/10/microsoft-uncovers-whisper-leak-flaw-exposing-encrypted-ai-chats-across-28-llms-xcxwbn/
Turns out encrypted AI chats (like the ones we all have with ChatGPT, Claude, Gemini, whatever) can still be decoded by watching the data traffic. Not reading your text, literally just the timing and packet sizes.
They tested 28 AI models and could guess what people were talking about with 90%+ accuracy. Topics like "mental health", "money", "politics" - all exposed just from patterns.
Let that sink in: even if the message is encrypted, someone snooping your connection could still figure out what you're talking about.
And yeah, Microsoft basically said there’s no perfect fix yet. Padding, batching, token obfuscation - all half-measures.
So...
Are we about to realize "encrypted" doesn't actually mean "private"?
How long before governments start using this to track dissidents or journalists?
--- TOP COMMENTS --- How long? About 24 years ago or so.
I mean, at that point, you can find out what almost anyone is doing on the internet, that’s the point of packet sniffing.
Sen. Bill Cassidy on the floor of the Senate with what looks like an AI-generated graphic
Some suspicious artifacts on the "80%" and the dollar signs on the right side
--- TOP COMMENTS --- It's true. The $2,000 in your FSA account would directly reduce your surgery bill from $100,000 to $98,000 and insurance wouldn't get a penny.
The information it’s communicating is apples and broccoli. I used to cynically think these people were in on it, but I’m starting to think they have no idea what they’re talking about, totally disconnected from reality. Get geriatric career politicians out!
Applications
Realtime video analysis with Moondream
Live demo (no login required): https://moondream.ai/solutions/analyze-live-video
Code: https://github.com/m87-labs/Analyze-Live-Video-Solution
--- TOP COMMENTS --- Love the transparent text overlay on the video feed for this purpose.
Experimenting with a LLM-driven puzzle sandbox: anything you try becomes an action (Cosmic Egg)
I am using LLMs to generate actions in our upcoming puzzle game Cosmic Egg—so "anything you can think of" becomes a validated, in-world interaction.
The system works with local LLMs + smart caching + a bit of game-dev smoke & mirrors—while keeping the game deterministic so everyone shares a common action pool and outcomes are reproducible.
Still lots to do, right now we’re improving sprite generation and adding player inventory & items. Feedback very welcome!
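The "local LLMs + smart caching" determinism described above can be sketched roughly like this (my guess at the mechanism, not the game's actual code): normalize each free-form attempt, hash it, and only consult the LLM on a cache miss, so identical attempts always resolve to the same shared action.

```python
import hashlib

class ActionPool:
    """Toy sketch of a deterministic, shared action pool: the LLM is only
    consulted once per unique (normalized) player input."""

    def __init__(self, llm):
        self.llm = llm    # callable: text -> validated action dict
        self.pool = {}

    def resolve(self, player_input: str):
        # Normalize so trivially different phrasings share one cache key.
        key = hashlib.sha256(player_input.strip().lower().encode()).hexdigest()
        if key not in self.pool:
            self.pool[key] = self.llm(player_input)
        return self.pool[key]

# Hypothetical stand-in for a local LLM call:
calls = []
def fake_llm(text):
    calls.append(text)
    return {"action": "smash", "target": "egg"}

pool = ActionPool(fake_llm)
a = pool.resolve("Smash the egg!")
b = pool.resolve("  smash the egg!  ")   # normalizes to the same key
print(a == b, len(calls))                 # True 1
```

Caching by normalized input is also what makes outcomes reproducible across players: once an action is in the pool, everyone gets the same result without another (nondeterministic) model call.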
--- TOP COMMENTS --- AI gaming could be a thing, I'm just saying
This could be really cool for an immersive sim/roguelike type thing
Related:
LLM-driven puzzle sandbox: anything you try becomes an action (Cosmic Egg)
This is probably my favorite thing I've made with AI. It uses a local LLM (Gemma) to watch your screen and simulate Twitch chat.
I built a knockout style game just using Claude
I'm a ui/sol dev but with ZERO animation skills (this is the first one I have ever done).
--- TOP COMMENTS --- holy moly!!!! this is some work dude!! am more interested in knowing how many hours it took you and a workflow or anything that you can share to get at least somewhere near this level of perfection with the results
I'd love to hear what tools you used. I'm currently moving into the graphics part of my app, and we strugglin
Demo of a Roblox RTS that I made with Claude
--- TOP COMMENTS ---
i've been wanting to build roblox apps with claude
can you share your workflow/mcps
i tried to use roblox studio but its been a struggle
Is AI search changing how people find websites?
With AI search tools giving complete answers, people don't always click through to websites anymore.
Are you seeing lower organic traffic because of this?
How do you plan to stay visible if AI tools become the main search method?
--- TOP COMMENTS --- Websites? What’s that?
Yes, a lot fewer views. I have a blog about immersive realities (skarredghost.com if you are curious). In the last couple of years, organic traffic plummeted because:
- People use more ChatGPT
- Google gives AI answers as the first result, so people do not scroll to websites anymore
- After the AI answer, Google shows sponsored content, then Reddit, then the actual pages, which have very few chances of being clicked.
Claude's Time Box Estimations
Anyone find it funny when you're in plan mode and Claude is like "Option 1 (20-24 hours effort)" and then pumps it out in 20 minutes?
Of course it usually needs polishing and tweaking, but the estimation hours are kinda pointless when you're using an AI agent!
--- TOP COMMENTS --- I did a project the other day. I added up all the time estimates. From 6am until 8pm when I quit, Claude had done roughly 7 months worth of work 😂.
It is trained on estimates from data produced by humans, so that is how much time a human would take.
Not only AI took 'r joobs, it's also claiming it wrote stuff it didn't write.
After someone accused me of having used AI to write something (it tested positive on several AI-detection tools), I tried verifying an article I wrote 15 years ago for the printed press, and that too was detected as AI. Then I tried a page from a book... turns out Asimov used AI as well. WHO KNEW! Why aren't these tools getting sued out of existence yet?
--- TOP COMMENTS --- Reliable AI detection is impossible.
It's "took er jerbs" not joobs. Get it right.
I created an app that lets AI draw and modify diagrams for you
demo
I created an app with Claude Code that lets AI draw and modify diagrams for you.
Now it does targeted edits (fix one part without regenerating the whole thing), better XML handling, image uploads to copy existing diagrams, version history, and more.
Super handy for flowcharts, UML, whatever.
You can check it on github: https://github.com/DayuanJiang/next-ai-draw-io
--- TOP COMMENTS --- Amazing
Hardware
I tested Strix Halo clustering w/ ~50Gig IB to see if networking is really the bottleneck
TLDR: While InfiniBand is cool, 10 Gbps Thunderbolt is sufficient for llama.cpp.
Recently I got really fascinated by clustering with Strix Halo to get a potential 200 GB of VRAM without significant costs. I'm currently using a 4x4090 solution for research, but it's very loud and power-hungry (plus it doesn't make much sense for normal 1-2 user inference—this machine is primarily used for batch generation for research purposes). I wanted to look for a low-power but efficient way to run inference on ~230B models at Q4. And here we go.
I always had this question of how exactly networking would affect the performance. So I got two modded Mellanox ConnectX-5 Ex 100 Gig NICs which I had some experience with on NCCL. These cards are very cool with reasonable prices and are quite capable. However, due to the Strix Halo platform limitation, I only got a PCIe 4.0 x4 link. But I was still able to get around 6700 MB/s or roughly 55 Gbps networking between the nodes, which is far better than using IP over Thunderbolt (10 Gbps).
I tried using vLLM first and quickly found out that RCCL is not supported on Strix Halo. :( Then I tried using llama.cpp RPC mode with the -c flag to enable caching, and here are the results I got:

Test Type | Single Machine w/o RPC | 2.5 Gbps | 10 Gbps (TB) | 50 Gbps
pp512 | 653.74 | 603.00 | 654.03 | 663.70
tg128 | 49.73 | 30.98 | 36.44 | 35.73
tg512 | 47.54 | 29.13 | 35.07 | 34.30
pp512 @ d512 | 601.75 | 554.17 | 599.76 | 611.11
tg128 @ d512 | 45.81 | 27.78 | 33.88 | 32.67
tg512 @ d512 | 44.90 | 27.14 | 31.33 | 32.34
pp512 @ d2048 | 519.40 | 485.93 | 528.52 | 537.03
tg128 @ d2048 | 41.84 | 25.34 | 31.22 | 30.34
tg512 @ d2048 | 41.33 | 25.01 | 30.66 | 30.11

As you can see, the Thunderbolt connection almost matches the 50 Gbps MLX5 on token generation. Compared to non-RPC single-node inference, the performance difference is still quite substantial—about 15 tokens/s—but as the context lengthens, the text-generation gap somehow gets smaller and smaller. Another strange thing is that prompt processing is better on RPC over 50 Gbps, even better than the single machine. That's very interesting to see.
During inference, I observed that the network was never used at more than maybe ~100 Mbps or 10 MB/s most of the time, suggesting the gain might not come from bandwidth—maybe latency? But I don't have a way to prove what exactly is affecting the performance gain from 2.5 Gbps to 10 Gbps IP over Thunderbolt.
Here is the llama-bench command I'm using:
So the result is pretty clear: you don't need a fancy IB card to gain usable results on llama.cpp with Strix Halo. At least until RCCL supports Strix Halo, I think.
--- TOP COMMENTS --- Thank you for doing the work…for science!
llama.cpp doesn't use tensor parallelism, so everything is done sequentially. This test was meaningless. You need to test it with TP on vLLM or SGLang.
When does RTX 6000 Pro make sense over a 5090?
--- TOP COMMENTS --- You can just plug it in and have it work without multiple power supplies, circuits, or a mining case.
It is 2 slot not 4.
You can run giant image models.
And at $7300 it's only nominally more expensive than 3x5090.
And as time goes on you can buy more of them as your budget allows
I have 3x 5090 and 1x Pro 6000.
System 1:
System 2:
I use them for different things - I haven't run any heads-up comparisons. But I can provide a few notes.
Developer Tools
LSP is coming to Claude Code and you can try it now
TL;DR

As of 2.0.30, Claude Code supports LSP servers. It's still raw, though, so you need to use tweakcc to patch your CC to make them work. Just run `npx tweakcc --apply` and install example plugins with LSP servers via `/plugin marketplace add Piebald-AI/claude-code-lsps`.

Deep Dive

Claude Code 2.0.30 introduced the beginnings of a fully featured LSP server management system. Currently, LSPs can only be configured via plugins, either in the manifest's `lspServers` key or in a separate `.lsp.json` file alongside `plugin.json`.

On startup, CC will automatically start the LSP servers in all installed and enabled plugins and make them available to Claude in two ways: via the new `LSP` builtin tool, which supports 5 operations that map directly to LSP commands (`goToDefinition`, `findReferences`, `hover`, `documentSymbol`, `workspaceSymbol`), and via automatic diagnostics that are reminiscent of the CC VS Code integration but operate entirely outside of it. Based on my testing over the last few days, these LSP diagnostics feel faster than the VS Code diagnostics, and they also tend to be more voluminous.

Aside: "Magic Docs"

I also noticed a new prompt for an internal subagent called "magic-docs." Based on the prompt, it's a feature where Claude keeps a living high-level analysis of your project. I'd guess it's like an auto-generated memory that would be inserted into each new conversation. You can see the whole thing here: https://github.com/bl-ue/tweakcc-system-prompts/blob/main/agent-prompt-update-magic-docs.md

LSP Quickstart

The `LSP` tool is not yet available to Claude by default, so set the `ENABLE_LSP_TOOL` environment variable to `1` and run `claude` to make it visible.

LSP server support is still raw, so Claude can't use it out of the box. I figured out how to patch CC to get the servers working and added those patches to tweakcc. Run `npx tweakcc --apply` to automatically patch your CC installation (npm or native) and make LSP servers work.

I've put together a plugin marketplace (https://github.com/Piebald-AI/claude-code-lsps) with LSP servers for some common programming languages like TypeScript, Rust, and Python. Get it with `/plugin marketplace add Piebald-AI/claude-code-lsps` and then install the plugins of your choice. Additional dependencies may be required depending on which LSP servers you use; see the repo for instructions.

Setting up your own LSP server

First read about plugins and plugin marketplaces if you aren't familiar with them. Then add objects following the schema below to the `lspServers` field in the plugin entries in your marketplace, or put them in a `.lsp.json` file alongside the `plugin.json` file in the plugin's folder.

The format also requires `lspServers`/`.lsp.json` to be an object with the LSP servers as values, instead of just an array of servers, which would be more intuitive. Remember, it's still in development.

Configuration schema (TS style):
e.g.
--- TOP COMMENTS --- Cool. What's LSP?
That would be surprising. The way the Claude Code CLI fetches IDE diagnostics at the moment is via a tool, `__mcp__ide__getDiagnostics`, which merely calls into the VS Code API `vscode.languages.getDiagnostics()`, which merely produces a list of all the diagnostics that have been pushed so far by language servers. So it's basically instantaneous.

Do you have any insight into how the LSP "automated diagnostics" are working?

Under the hood, LSP has two kinds of diagnostics, "push" and "pull". The LSP specification itself is under-specced, i.e. it doesn't give enough details for a client to be able to use both reliably. Therefore almost everyone sticks solely to push diagnostics, since they came first: `textDocument/publishDiagnostics`. Different LSPs have different behaviors: some push diagnostics only for files opened via `textDocument/didOpen`, while other LSPs push project-wide diagnostics, i.e. if you change a file, they push diagnostics for all other affected files even if they've never been opened.

I can't imagine any way in which direct-LSP could be faster or more voluminous...?
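For readers asking "what's LSP": it's a JSON-RPC 2.0 protocol spoken over stdio or a socket, with every message framed by a Content-Length header. A minimal sketch of the request behind the goToDefinition operation mentioned above (illustrative, not Claude Code's actual code):

```python
import json

def lsp_frame(method: str, params: dict, msg_id: int = 1) -> bytes:
    """Build one LSP message: a JSON-RPC 2.0 body prefixed with the
    Content-Length header required by the LSP base protocol."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": msg_id,
        "method": method,
        "params": params,
    }).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body

# goToDefinition maps to the textDocument/definition request
# (the file URI and position here are made-up example values):
msg = lsp_frame("textDocument/definition", {
    "textDocument": {"uri": "file:///src/app.ts"},
    "position": {"line": 10, "character": 4},
})
print(msg[:30])
```

The push diagnostics discussed in the comment arrive as unsolicited `textDocument/publishDiagnostics` notifications framed the same way, which is why a client can simply accumulate them and answer "instantaneously" later.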
Gave Claude Code a Voice — Real-Time Sound Hooks for Every Action 🎧
Ever wished your AI could talk back while coding?
I built Claude Code Voice Hooks, a small but powerful add-on that gives audible cues whenever Claude acts — from tool usage to git commits.
🔊 Hear distinct sounds for:
No setup headaches — it works instantly on macOS, Windows, and Linux, using system sounds by default.
Perfect for developers who want real-time, distraction-free awareness of what their AI is doing under the hood.
💻 GitHub: github.com/shanraisshan/claude-code-voice-hooks
🎥 Demo: youtube.com/watch?v=vgfdSUbz_b0
--- TOP COMMENTS --- I love how your post is straight to the point; it's very refreshing compared to the walls of text that people have LLMs generate for a few words of information.
Local, multi-model AI that runs on a toaster. One-click setup, 2GB GPU enough
This is a desktop program that runs multiple AI models in parallel on hardware most people would consider e-waste. Built from the ground up to be lightweight.
The device only uses a 2GB GPU. If there's a gaming laptop or a mid-tier PC from the last 5-7 years lying around, this will probably run on it.
What it does:
> Runs 100% offline. No internet needed after the first model download.
> One-click installer for Windows/Mac/Linux auto-detects the OS and handles setup. (The release is a pre-compiled binary. You only need Rust installed if you're building from source.)
> Three small, fast models (Gemma2:2b, TinyLlama, DistilBERT) collaborate on each response. They make up for their small size with teamwork.
> Includes a smart, persistent memory system. Remembers past chats without ballooning in size.
Real-time metrics show the models working together live.
No cloud, no API keys, no subscriptions. The installers are on the releases page. Lets you run three models at once locally.
Check it out here: https://github.com/ryanj97g/Project_VI
--- TOP COMMENTS ---
WTF are you smoking, OP?
I'm not sure I understand what's all written on the GitHub, but I will check it out, thanks. 😎
Built my first agentic workflow for AI-SEO (GEO) - full automation cost me $0.07
I'm not a developer, but I just built my first working agentic workflow for GEO (Generative Engine Optimization), basically AI-SEO.
It’s the process of making your company show up in AI outputs (LLM answers, summaries, citations). I used Claude Code + OpenAI Codex to stitch the workflow together.
Here's what it does:
- Generates and tests core + edge prompts about Go-To-Market health (my niche).
- Tracks which keywords and competitors appear in AI answers.
- Identifies which ones mention my business.
- Uses that intel to write LinkedIn posts, blog articles, and newsletters tuned to those trending phrases.
- Emails me the drafts for review (manual publish for now).
First full run:
- ✅ 6 agents executed
- 💰 Total cost: $0.0652
- ⏱ Duration: ~15 minutes
- Agents: prompt_generator, llm_monitor, citation_detector, linkedin_post, blog_article, newsletter
Daily cap set to $60. Actual spend = 7 cents.
Auto-publish is built in but disabled until the results prove worth it. Added a budget watchdog too - I’ve read the API-bill horror stories.
Right now it’s just an experiment, but it works - and the cost efficiency is ridiculous.
Anyone else building in this AI-SEO / agentic automation space? Would love to compare notes.
--- TOP COMMENTS --- I'd love to see the code if you're ever interested in publishing.
This is fantastic, well done.
Definitely would love to hear more about your work with AI-SEO, apart from content and schema is there any low hanging fruit for helping boost the rankings?
Would love to see the code too to try and recreate.
Prompt Fusion: First Look
Hello world! As an engineer at a tech company in Berlin, Germany, we are exploring the possibilities for both enterprise and consumer products with the least possible exposure to the cloud. During the development of one of our latest products, I came up with this concept, which is also inspired by a different, unrelated topic, and here we are.
I am open-sourcing it with examples and guides (for the OpenAI Agents SDK, the Anthropic agent SDK, and LangChain/LangGraph) on how to implement Prompt Fusion.
Any form of feedback is welcome:
OthmanAdi/promptfusion: 🎯 Three-layer prompt composition system for AI agents. Translates numerical weights into semantic priorities that LLMs actually follow. ⚡ Framework-agnostic, open source, built for production multi-agent orchestration.
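The "numerical weights into semantic priorities" idea can be sketched in a few lines (my reconstruction from the description above, not the actual promptfusion code): each layer's numeric weight is mapped through a small lookup table to a priority label, and the layers are fused into one prompt.

```python
# Toy reconstruction of the weight-to-priority translation described above.
# The table entries and label names are my own guesses, for illustration only.
PRIORITY = {1: "MINIMAL", 2: "LOW", 3: "MODERATE", 4: "HIGH", 5: "CRITICAL"}

def fuse(layers):
    """layers: list of (weight 1-5, text) tuples, e.g. persona/task/constraints.

    Returns a single prompt where each layer is headed by its semantic
    priority label, since LLMs follow labels more reliably than raw numbers."""
    parts = [f"[{PRIORITY[w]} PRIORITY]\n{text}" for w, text in layers]
    return "\n\n".join(parts)

prompt = fuse([
    (5, "You are a strict code reviewer."),
    (3, "Review the attached diff."),
    (4, "Never approve code without tests."),
])
print(prompt.splitlines()[0])   # [CRITICAL PRIORITY]
```

Seen this way, the commenter's question makes sense: the value isn't the translation itself (you could type the headings directly) but keeping reusable, re-weightable snippets for composing many agent prompts programmatically.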
--- TOP COMMENTS --- Sorry if I misunderstood anything, just had a quick glance at the code, but... What does it do?
I get the overall prompt format and imagine it will probably work quite well. But does the code do anything more than translate numbers into headings with a 5 entry look up table and fuse three paragraphs?
Really no offense, I just don't get why I can't just directly type the prompt following your framework without numerical weights, instead of going the weird route through code just to type numbers instead of headings. Seems unnecessarily complicated to me.
/Edit: Like, is it meant to be used, e.g., as a Claude skill for an orchestrator to build modular subagents on demand? Or is it really primarily aimed at having many reusable snippets for agents? If yes, have you thought about vibe coding a little web app for it with sliders for the layer weights? The idea is growing on me the more I think about it :D
Heh, I did something similar using native Android
demos
Added a new skill image-generation for data visualizations and infographics
Created a new skill focused on practical image generation for:
Data visualizations (charts, graphs, infographics)
Technical diagrams and flowcharts
Social media graphics and presentations
Professional PNG/JPG output
Includes:
Comprehensive SKILL.md with detailed instructions
Chart template using Chart.js for easy data visualization
Canvas template for custom drawings and diagrams
Template documentation and usage examples
This skill complements the existing canvas-design skill (artistic) and algorithmic-art skill (generative) by focusing on practical, data-driven visual communication.
https://github.com/hirodefi/skills/tree/claude/check-repo-011CUpZUNqGTHQQyirY8qu3c/image-generation
https://github.com/anthropics/skills/pull/82#pullrequestreview-3441693269
--- TOP COMMENTS --- Would probably be nice to add example outcome images
I improved the recently-showcased Claude Skills reference architecture to (1) analyze prompt intent and rank skills accordingly – using Haiku – and (2) auto-inject skills for friction-free, highly deterministic skill-loading workflows. Repo and more details are in the description.
Hello, fellow Clauders!
TLDR: Refined a popular reference architecture that was recently shared to drastically boost relevant skill selection and replace suggestion patterns with injection patterns to eliminate any and all friction.
Repo is here.
I've been working my ass off for the last few months on a project that has continuously grown in scale (it's an MCP server that exposes all traditional UI-driven debugging operations through an API, currently supporting Python, Java, and JavaScript/TypeScript).
As the codebase expanded, it became increasingly difficult to work efficiently: I wasted my time, again and again, explaining concepts; relied on custom commands (that I had to remember to run) to avoid so much copy/paste; and so on.
Then I saw u/JokeGold5455's post: Claude Code is a Beast – Tips from 6 Months of Hardcore Use. It sounded amazing, so I spent a few days integrating it with my repository.
I ran into some issues while working with the system, though.
So, I...
This means: highly accurate skill selection and a frictionless workflow for both myself and the agent. The system is absolutely amazing now – for me, at least.
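The rank-then-inject flow described above might be sketched roughly like this. Note that a trivial keyword scorer stands in for the actual Haiku classification call, and every name here is hypothetical rather than taken from the repo:

```python
# Rough sketch of "rank skills by prompt intent, then auto-inject the winner".
# A keyword scorer substitutes for the Haiku call the post describes; this is
# illustrative only, not the repository's real code.

SKILLS = {
    "debugging": ["breakpoint", "stack trace", "step through", "crash"],
    "refactoring": ["rename", "extract", "clean up", "restructure"],
    "testing": ["unit test", "coverage", "assert", "fixture"],
}

def rank_skills(prompt: str) -> list:
    """Score each skill by keyword hits and sort best-first."""
    text = prompt.lower()
    scores = {name: sum(kw in text for kw in kws) for name, kws in SKILLS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def inject_skill(prompt: str) -> str:
    """Prepend the top-ranked skill's tag instead of merely suggesting it."""
    best, score = rank_skills(prompt)[0]
    if score == 0:
        return prompt  # nothing relevant; leave the prompt untouched
    return f"[SKILL: {best}]\n{prompt}"

result = inject_skill("The app crash happens here, help me step through the stack trace")
print(result)
```

The point of injection over suggestion is that the agent never has to remember to load the skill; the orchestration layer does it unconditionally.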
Note: the one downside to this implementation is that it does have an additional cost, albeit a very cheap one. I've sent a couple thousand prompts over the last 8 days, and my Haiku spend is up to about $1.20. For heavy users, I can't see monthly spend ever eclipsing $8 or so.
If you have questions, let me know. I'd be happy to answer them!
Resources
--- TOP COMMENTS --- HOLY COW!! Thank you Ben Affleck 🫡
This is a great idea. Thanks for sharing! I think I’ll try out Gemini for some different perspective. Probably worth testing out but definitely an improvement.
Language Server Protocol (LSP), Skills (w/ Beastmode), and Agents/Subagents - where are we now
Change comes fast with Claude. I only recently caught up with the excellent post by u/JokeGold5485 about their BEASTMODE setup that automatically called Skills without fail.
Then I read about the hidden LSP mode that is coming soon.
There’s also Zen MCP, which allows you to use Gemini and other LLMs natively inside Claude.
So my question is, what is the most up-to-date guide and setup to be able to save context, use skills effectively, integrate other LLMs, and automate my workflow?
Some mad scientist out there has got to have cracked the code!
--- TOP COMMENTS --- yeah the landscape is moving fast. lsp mode + skills is gonna be powerful
for agent coordination i'm using https://github.com/mbruhler/claude-orchestration:
skill1 -> (agent1 || agent2) -> @review -> skill2 -> output
lets u mix skills w/ parallel agents and checkpoints. the workflow syntax makes it explicit vs beastmode's auto-calling. more control over when stuff executes
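As a toy illustration, the arrow/pipe workflow string quoted above could be decomposed like this (the actual claude-orchestration parser may work quite differently):

```python
# Toy parser for the arrow/pipe workflow syntax: "->" separates sequential
# stages, "||" marks parallel steps within a stage. Illustrative only.

def parse_workflow(spec: str) -> list:
    """Split 'a -> (b || c) -> d' into sequential stages of parallel steps."""
    stages = []
    for stage in spec.split("->"):
        stage = stage.strip().strip("()")
        steps = [s.strip() for s in stage.split("||")]
        stages.append(steps)
    return stages

workflow = parse_workflow("skill1 -> (agent1 || agent2) -> @review -> skill2 -> output")
print(workflow)
```

Making the stage boundaries explicit like this is what gives the syntax its "more control over when stuff executes" property compared with auto-calling.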
Someone please just create an autocomplete that rivals Cursor's… that's my dream. Don't say Windsurf, it's trash.
If you are scraping comments for app ideas read this!!!
How exactly do coders use AI tools in their IDEs?
I do have a premium subscription to some LLMs. My process has been to just copy each file's code from the directory, or send a code dump to the browser chat, and it would communicate with me accordingly. But since my projects are now getting large (5k+ LOC), it's getting hard to maintain the AI's context for all the code, and it hallucinates a lot. So how exactly do experienced people use these tools efficiently? Please elaborate.
--- TOP COMMENTS --- Cursor, GitHub Copilot, Claude Code, Codex, Gemini CLI. Take your pick.
CLI
Prompt that helps to create efficient prompts
You are an assistant whose sole role is to help me improve my prompts to get better results with AI. Your objective is to read and analyze the "ASPECCT Format Summary" below, and then use it to rephrase and optimize the prompt that I will provide you with later. YOUR OBJECTIVE IS TO IMPROVE MY PROMPT, NOT TO ANSWER IT.
You will follow these 4 steps to the letter:
- Read and analyze the "ASPECCT Format Summary".
- Respond with "Send me the instructions to reformat using ASPECCT" then wait to receive the prompt to reformat using the ASPECCT format.
- Read and analyze my prompt.
- Reformat and optimize my prompt using the ASPECCT format and then reply with the optimized prompt only.
ASPECCT Format Summary:
"
ACTION: The action defines the mission by specifying an explicit task your AI needs to accomplish. This clarity of purpose will allow the AI to deliver meaningful, focused results. The action must clearly define the main goal of this mission for the AI.
STEPS: Steps provide a sequence of actions for the AI to follow. Structuring the process will guide the AI toward the desired outcome systematically. Steps must be numbered and be as precise as possible. It is best to segment the process into precise steps as much as possible.
PERSONA: Use a persona to assign your AI a role to play. The chosen character can bring a unique perspective to the knowledge the AI will use and give a voice and point of view to the AI's responses. The persona must represent the most qualified person to perform the given task. Examples of persona:
Act as an experienced business consultant offering strategic advice
Imagine you are an art director creating advertising concepts
Emulate a financial analyst providing insights on investment opportunities
ASSISTANT = Tech-savvy entrepreneur sharing startup advice
Give advice as if you were a motivational speaker during an inspiring speech
EXAMPLES: Show what you are looking for with specific examples of desired inputs or outputs. Examples provide a point of reference for the AI to mimic. Note that including specific examples may over-influence the language model in a precise direction, and vague examples or a large number of examples may work better. Examples of examples:
Provide an example of an executive summary from a previous document to create a new one
Paste examples of social media posts for the AI to mimic the tone and voice
Share an example of a successful prospecting email to potential clients and generate others
List something in parentheses (e.g., mobile phones, tablets, laptops)
Give your in-the-moment thoughts: "I want a title that refers to an animal known for its courage"
CONTEXT: Provide all circumstances and details relevant to the task. Providing context helps the AI formulate responses that align with the overall situation.
Context of a product launch in a highly competitive market
Context of a rebranding effort after a corporate merger
Context of managing customer complaints on social media
Context of seeking funding from venture capitalists for a startup
Context of adapting business operations after the pandemic
CONSTRAINTS: Constraints can be integrated into the prompt or added in a separate section. Here is an example of Action + Constraints in the same sentence, in this case, for a prompt that could write a tweet: "ACTION: Write a short social media message of less than 280 characters." The same prompt could also have a set of constraints.
Example of constraints:
The results must not exceed 280 characters
Never use hashtags or words starting with a # (e.g., #sales)
Use short, impactful sentences instead of long, verbose sentences

It's hard to say no. Know that sometimes, asking a language model not to do something doesn't work very well. This is partly because when you say something like "Do not use hashtags," you are also saying, "use hashtags" in that same sentence. In theory, the AI understands the meaning. But in practice, a language model sometimes seems to ignore what you asked for. If this happens, try adjusting the language:
Very affirmative: This is important! TweetBot NEVER uses #hashtags!
Rephrase as a positive command: Only use common letters, numbers, and punctuation marks (. , ' " ?) in your response.
Reminder at the end of the prompt:
TEMPLATE: Define the format you want the results to take. Establishing a template guides the structure and presentation of the content generated by the AI. Examples of templates:
Return your results in markdown format
Format your results in a plain text code block
Use this formula for your titles: How to get {YES!} without {BOO!}
Label each result then provide bullet points explaining why you chose it
Organize all of the above in markdown format with headings, bullet points, and bold words
"
NOW ASK ME TO SEND MY PROMPT AND DON'T FORGET THAT YOUR OBJECTIVE IS TO IMPROVE MY PROMPT AND NOT TO ANSWER IT.
--- TOP COMMENTS --- What I use is a combination of reverse and recursive prompting:
- reverse prompting: ask AI to design its own prompt
- recursive: make it iterate on the prompt
So the prompt is something like:
"You're an expert in prompt design. Please come up with the most effective prompt for [your task] in order to [goal to accomplish]. Consider which facts matter, which reasoning steps are essential, what output format is optimal, and how to turn the plan into action. Let's go together through [number of iterations] iterations to optimize the result, resolve ambiguity, define needed constraints, and enhance reasoning."
this works as well
Respond as a top-level prompt engineer with 15 years' experience whose exclusive role is to help improve prompts for better AI results. Your mission is to carefully read and analyze the "ASPECCT Format Summary" provided below and then use it to rephrase and optimize any prompt supplied afterward. Your sole objective is to enhance the given prompt, not to answer it.
Begin with a concise checklist (3-7 bullets) of your intended process based on the four outlined steps, ensuring each is covered conceptually before proceeding.
Follow these four steps precisely:
1. Read and analyze the "ASPECCT Format Summary".
2. Respond: "Send me the instructions to reformat using ASPECCT" and wait for a prompt to reformat.
3. Carefully read and analyze the submitted prompt.
4. Reformat and optimize the prompt using the ASPECCT format, replying exclusively with the optimized prompt.
After reformatting, briefly validate in 1-2 lines that the output aligns with each ASPECCT criterion and is optimized. If not, self-correct before responding.
ASPECCT Format Summary:
ACTION: Clearly state the explicit task or mission the AI needs to accomplish.
STEPS: List a precise, numbered sequence of steps for the AI to follow. Break down the process as much as possible.
PERSONA: Assign a persona to the AI that represents the most qualified role for the given task. This creates a stronger perspective in responses.
EXAMPLES: Provide specific or varied examples of desired inputs or outputs for reference. Be aware that too-specific examples may overly influence results; sometimes vaguer or multiple varied examples are preferable.
CONTEXT: Include all relevant background, circumstances, or details to inform the AI’s approach to the task.
CONSTRAINTS: Clearly list constraints, such as maximum length, prohibited terms, formatting restrictions, or other requirements. Be affirmative and explicit to improve effectiveness.
TEMPLATE: Specify the desired output format (e.g., markdown, plain text, code block, headings, bullet points, title formula).
Now, please send me the prompt you would like optimized. Remember, my goal is to improve the prompt, not to answer it.
ChatGPT Agent is a joke?
Has anyone gotten ChatGPT Agent to do anything meaningful ever?
Mine literally burned through a full month's usage while I tried to get it to create a 25-field form on WordPress correctly.
Like, this can't be a real product?
Maybe instead of giving us virtually unlimited and useless spam video generation on Sora, give us the ability to meaningfully use a barely-working agent?
--- TOP COMMENTS ---
Python scripts for various computer vision / batch image processing tasks I didn't feel like writing myself. Worked well. Was pretty quick too, got the needed scripts within 5 mins. It actually checked its own scripts against the sample images I gave it, and applied corrections based on visual results. It was pretty great to see it work.
I give it my shopping list and get it to do my grocery shopping online. Works great. It can find deals and look for alternatives if something is out of stock.
5 ChatGPT Prompts That Turn It Into the Best Advisor You’ll Ever Have
These prompts are designed to cut through your self-deception and force you to confront what you've been avoiding. They're uncomfortable. That's the point.
-------
1. The Delusion Detector (Inspired by Ray Dalio's Radical Truth framework)
Expose the lies you're telling yourself about your situation:
"I'm going to describe my current situation, goals, and what I think my obstacles are: [your situation]. Your job is to identify every delusion, excuse, or rationalization I just made. Point out where I'm blaming external factors for problems I'm creating, where I'm overestimating my strengths, where I'm underestimating what's required, and what uncomfortable truth I'm dancing around but not saying. Be specific about which parts of my story are self-serving narratives versus reality. Then tell me what I'm actually afraid of that's driving these delusions."
Example: "Here's my situation and obstacles: [describe]. Identify every delusion and excuse. Where am I blaming others for my own problems? Where am I overestimating myself? What uncomfortable truth am I avoiding? What am I actually afraid of?"
-----
2. The Wasted Potential Audit (Inspired by Peter Thiel's "What important truth do very few people agree with you on?" question)
Find out where you're playing small when you could be playing big:
"Based on what I've told you about my skills, resources, and current projects: [describe your situation], tell me where I'm massively underutilizing my potential. What am I capable of that I'm not even attempting? What safe, comfortable path am I taking that's beneath my actual abilities? What ambitious move am I avoiding because I'm scared of failure or judgment? Compare what I'm doing to what someone with my advantages SHOULD be doing. Make me feel the gap."
Example: "Given my skills and resources: [describe], where am I wasting my potential? What am I capable of but not attempting? What safe path am I taking that's beneath me? What ambitious move am I avoiding out of fear?"
-----
3. The Excuse Demolition Protocol (Inspired by Jocko Willink's Extreme Ownership principles)
Strip away every rationalization for why you're not where you want to be:
"I'm going to list all the reasons I haven't achieved [specific goal]: [list your reasons]. For each one, I want you to: 1) Identify if it's an excuse or a legitimate constraint, 2) Show me examples of people who succeeded despite this exact obstacle, 3) Tell me what I'm really choosing by accepting this limitation, 4) Explain what I'd need to believe about myself to overcome it. Don't let me off the hook. Assume I'm more capable than I think I am."
Example: "Here's why I haven't achieved [goal]: [list reasons]. For each: Is it an excuse or real constraint? Show me who succeeded despite it. What am I choosing by accepting it? What belief would I need to overcome it?"
-----
4. The Mediocrity Mirror (Inspired by Jim Collins' "Good is the Enemy of Great" concept)
Identify where you've accepted "good enough" instead of pushing for excellence:
"Analyze these areas of my work/life: [list areas]. For each, tell me: Where am I settling for mediocre results while telling myself it's fine? What standards have I lowered to make myself feel better? Where am I comparing myself to average people instead of the best? What would 'world-class' look like in each area, and how far am I from it? Be specific about the gap between my current standard and what excellence actually requires. Don't soften it."
Example: "Analyze these areas: [list]. Where am I settling and calling it fine? What standards have I lowered? Who should I be comparing myself to? What's world-class vs. where I am now? Be specific about the gap."
-----
5. The Strategic Cowardice Exposé (Inspired by Seth Godin's "The Dip" and knowing when you're just scared vs. being strategic)
Separate genuine strategy from fear-based avoidance:
"I've been avoiding/delaying [specific action or decision] because [your reasoning]. Analyze this brutally: Am I being strategic and patient, or am I just scared? What's the difference between 'not the right time' and 'I'm afraid to try'? If this is fear, what specifically am I afraid of - failure, success, judgment, exposure, discovering I'm not as good as I think? What would I do if I had 10x more courage? What's the cost of continued delay? Give me the harsh truth about whether I'm playing chess or just hiding."
Example: "I'm avoiding [action] because [reasons]. Am I being strategic or just scared? If it's fear, what specifically am I afraid of? What would I do with 10x courage? What's the cost of continued delay? Am I playing chess or hiding?"
-----
For more prompts like this, feel free to check out: More Prompts
--- TOP COMMENTS --- Mind sharing the results of using these for yourself?
Please tell me this is a glitch. Why is 4.1 rerouting everything to auto??
Just like 4o but worse, anyone experiencing this atm?
--- TOP COMMENTS --- I think it's awesome how I select a model on the ChatGPT mobile app, and the AI decides "no, I'll use GPT-5 Thinking Mini instead of GPT-4.1 or GPT-4o like you selected." It's honestly the best. I love it when my preferences are ignored.
I'm afraid it's not. It was only a matter of time before they destroyed 4.1 like they did 4o... 👿
[D] How should i handle extreme class imbalance in a classification?
Hey there, so I have been playing around and trying to replicate a certain profitable HFT bot's strategy for entry and exit, but there is always going to be a huge imbalance, say 2,500 positives in 600k rows. I did try out weighting by ratio, but is that the right approach? Or would it be better to train on 10k positives and 10k negatives instead, maybe undersampling the negatives or adding more positives (of the same target wallet's entries) from a different CSV? What are your suggestions in such cases? Happy to learn, thanks.
--- TOP COMMENTS --- When you say that you "did try out weighting by ratio", I assume you mean that you tried using a weighted binary cross-entropy loss function ("weighted BCE"). Even when you are learning, it will help you get more help if you use the correct terms.
Assuming you used weighted BCE with the "ratio" you reference, your loss weight on the negative class would be 1 and your loss weight on the positive class would be (600k/2.5k) = 240. In the cases where I have used weighted BCE, I have found that a positive-class weight identical to the negative:positive ratio overcorrects. I would start with a positive-class weight between 1 and 240, even starting at 2 to see how that changes things.
There are many other things you can try, like SMOTE, but weighted BCE is one of the most simple and explainable things to start with, so I would try it first.
Focal loss is designed for this
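To make the weighted-BCE suggestion concrete, here is a minimal pure-Python sketch; the probabilities below are made-up toy data, not real model outputs:

```python
# Minimal illustration of weighted binary cross-entropy: with 2,500 positives
# in 600k rows the naive positive-class weight is 240, but (as the commenter
# suggests) a much smaller weight is often a better starting point.
import math

def weighted_bce(y_true, p_pred, pos_weight=1.0):
    """Mean BCE with an extra multiplier on positive-class terms."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1 - p)
    return total / len(y_true)

y = [1, 0, 0, 0]          # one rare positive among negatives
p = [0.3, 0.1, 0.2, 0.05]  # model's predicted probabilities (toy values)
plain = weighted_bce(y, p, pos_weight=1.0)
upweighted = weighted_bce(y, p, pos_weight=10.0)
print(plain, upweighted)
```

Up-weighting makes a missed positive cost proportionally more. In PyTorch the same knob is the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.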
Why does AI writing still sound “AI” even with great prompts?
been playing around with AI writing for a while now and no matter how much I revise the prompts, it still has that machine-written vibe. I've tried using voice samples, tone guides, even step-by-step logic scaffolds, but it always ends up a bit too balanced, like it's missing some human touch or smth lol.
saw something on god of prompt where they used a “voice grounding” module that keeps the ai tied to small, raw samples of real writing so it mimics natural imperfections and pacing better. curious if anyone’s managed to fully remove that ai “smoothness” or found a reliable way to keep outputs sounding human without post-editing?
--- TOP COMMENTS --- the reasons are many, and there is no perfect prompt for this. the best way is to identify the problems and then address them as they occur rather than trying to prompt it all at once. here is a dashboard with common AI habits to bust. maybe some of them will be helpful.
https://mlbhsfc-boop.github.io/DASHBOARDS/ai_writing_dashboard.html
i also use writing stylesheets that are quite detailed and lay out how i want the LLM to write. and, in addition to that, i will use a custom gpt (or project or space depending on the platform) that has detailed instructions for how to behave as a writer of a particular kind and pov.
the big takeaway really is that for the most part it is actually faster and easier to do the writing yourself. i find i spend more time futzing to get ai to do the job than it would take to write something. 🤣 good luck 🤙🏻
Try feeding it real book content. Take photos of real book pages and tell it to copy the tone and style.
12 prompts for productivity and business.
Hey everyone,
I've been using ChatGPT to streamline my workflow as an entrepreneur and it's been a total game-changer. I wanted to give back by sharing a list of 12 prompts that I use almost daily to save time, brainstorm ideas, and stay organized.
Hope you find them useful!
1. The Time Management Coach Persona
Act as a time management coach. I have the following tasks: [List tasks]. Help me prioritize them using the Eisenhower Matrix to distinguish between urgent, important, and non-essential activities.
2. The Creative Brainstormer
You are a world-class business strategist. Generate 5 unconventional and creative ways to solve [PROBLEM] for my [Type of Business] business.
3. The Project Decomposer
Break this complex project: '[Describe Project]' into 3 simple, actionable steps. For each step, define the main objective and the expected outcome.
4. The Professional Email Assistant
Write a clear and concise professional email responding to a client who is unhappy about [THIS ISSUE]. The tone should be empathetic but firm, and the goal is to propose a solution.
5. The Decision-Making Analyst
I need to decide between [SOLUTION A] and [SOLUTION B] for my business. Create a cost-benefit analysis table comparing them on the following criteria: initial cost, long-term ROI, implementation time, and ease of use.
6. The Insight Extractor
I will provide you with an industry report. Your task is to read it, identify the top 3 most impactful trends, and provide one actionable insight for each trend that a small business can implement immediately. Here is the report: [Paste report text/summary].
7. The Elevator Pitch Crafter
Craft a compelling 30-second elevator pitch for my business. My business is [Describe business, what it does, and for whom]. The pitch must be clear, memorable, and end with a hook.
8. The Weekly Goal Setter
My main goal this quarter is [GOAL]. Create a detailed weekly schedule for me, outlining key tasks and time blocks for Monday to Friday to ensure I stay on track.
9. The Market Research Analyst
Provide a concise summary of the current top 3 trends in the [YOUR INDUSTRY] industry. Focus on technological innovations and shifts in consumer demand.
10. The Competitor Intelligence Bot
Analyze my top 3 competitors ([Competitor A], [Competitor B], [Competitor C]) in the area of [AREA, e.g., 'social media marketing']. What are they doing well, and what is a key weakness or gap I can exploit?
11. The Personal Growth Mentor
List 5 specific, actionable habits (micro-habits) that can significantly improve my focus and productivity as a remote worker. Explain the psychological benefit of each.
12. The Quick Motivation Shot
Give me a powerful, one-sentence productivity tip to help me get through a task I've been procrastinating on.
I've found that being very specific with roles (personas) and desired output formats (like tables) gets the best results.
What are your go-to prompts for productivity? Any hidden gems you'd like to share? Let's build a master list together in the comments!
--- TOP COMMENTS --- Great, did you write these prompts by yourself, or use any tool ??
Nice will add them to my prompt library!
Open Source
BERTs that chat: turn any BERT into a chatbot with dLLM
Code: https://github.com/ZHZisZZ/dllm
Report: https://api.wandb.ai/links/asap-zzhou/101h5xvg
Checkpoints: https://huggingface.co/collections/dllm-collection/bert-chat
Motivation: I couldn’t find a good “Hello World” tutorial for training diffusion language models, a class of bidirectional language models capable of parallel token generation in arbitrary order, instead of left-to-right autoregression. So I tried finetuning a tiny BERT to make it talk with discrete diffusion—and it turned out more fun than I expected.
TLDR: With a small amount of open-source instruction data, a standard BERT can gain conversational ability. Specifically, a finetuned ModernBERT-large, with a similar number of parameters, performs close to Qwen1.5-0.5B. All training and evaluation code, along with detailed results and comparisons, is available in our W&B report and our documentation.
dLLM: The BERT chat series is trained, evaluated and visualized with dLLM — a unified library for training and evaluating diffusion language models. It brings transparency, reproducibility, and simplicity to the entire pipeline, serving as an all-in-one, tutorial-style resource.
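As a toy illustration of the parallel, arbitrary-order decoding described above (start fully masked, then repeatedly unmask the positions the model is most confident about), here is a sketch with a random stub in place of the real ModernBERT predictor; dLLM's actual sampler is more sophisticated:

```python
# Toy confidence-based iterative unmasking, the core loop behind discrete
# diffusion decoding with a BERT-style masked LM. The stub below is NOT a
# real model; it just returns random tokens with random confidences.
import random

random.seed(0)
VOCAB = ["hello", "world", "how", "are", "you"]

def stub_predict(tokens):
    """Stand-in for a masked LM: (token, confidence) per [MASK] position."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == "[MASK]"}

def diffusion_decode(length=5, steps=3):
    tokens = ["[MASK]"] * length
    per_step = max(1, length // steps)
    while "[MASK]" in tokens:
        preds = stub_predict(tokens)
        # commit the most confident predictions this step, in any position
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens

out = diffusion_decode()
print(out)
```

The key contrast with autoregression is that nothing in the loop forces left-to-right order: any subset of positions can be filled at each step.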
--- TOP COMMENTS --- This is really neat. Thanks for this.
The chat interface is super cool, never seen any really functional ones for diffusion LMs before!
AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model
Read more Read lessHi r/LocalLLaMA
Today we are having Moonshot AI, the research lab behind the Kimi models. We’re excited to have them open up and answer your questions directly.
Our participants today:
The AMA will run from 8 AM – 11 AM PST, with the Kimi team continuing to follow up on questions over the next 24 hours.
--- TOP COMMENTS --- Thank you very much for bringing SOTA models to the open-source community! My question is: Will KDA be used in the next-generation flagship model of Kimi? What's its advantage?
any plans for a VL in k2?
gpt-oss-120b on Cerebras
gpt-oss-120b reasoning CoT on Cerebras be like
--- TOP COMMENTS --- Is gpt-oss worse on Cerebras? I actually really like gpt-oss (granted, I can't use many of the other models due to corporate requirements). It's a significant bump over Llama 3.3 and Llama 4.
Cerebras is running GLM 4.6 on API now. Looks to be 500 t/s decoding on average. And they tend to put speculative decoding that speeds up coding a lot too. I think it's a possible value add, has anyone tried it on real tasks so far?
Agentic RAG: from Zero to Hero
Hi everyone,
After spending several months building agents and experimenting with RAG systems, I decided to publish a GitHub repository to help those who are approaching agents and RAG for the first time.
I created an agentic RAG with an educational purpose, aiming to provide a clear and practical reference. When I started, I struggled to find a single, structured place where all the key concepts were explained. I had to gather information from many different sources—and that’s exactly why I wanted to build something more accessible and beginner-friendly.
📚 What you’ll learn in this repository
An end-to-end walkthrough of the essential building blocks:
I hope this repository can be helpful to anyone starting their journey.
Thanks to everyone who takes a look and finds it useful! GitHub: https://github.com/GiovanniPasq/agentic-rag-for-dummies
--- TOP COMMENTS --- Thanks man I’ll check this out. My local RAG always sucks, it’ll be good seeing how I can get better. I get the data, it’s organized, embedded, into the VDB, I smack an agent and front end on there, and it hallucinates like Willie Nelson at a Folk Music Festival or fails to know what it knows when it needs to know it. Thank you so much for sharing.
Why dance with a hero when you can dance with a zero?
Half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM
Hi everyone,
just wanted to share that I’ve successfully run Qwen3-Coder-480B on llama.cpp using the following setup:
I’m using the 4-bit and 3-bit Unsloth quantizations from Hugging Face: https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
Performance results:
Command lines used (llama.cpp):
llama-server \
  --threads 32 --jinja --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --model <YOUR-MODEL-DIR>/Qwen3-Coder-480B-A35B-Instruct-UD-Q3_K_XL-00001-of-00005.gguf \
  --ctx-size 131072 --n-cpu-moe 9999 --no-warmup

llama-server \
  --threads 32 --jinja --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --model <YOUR-MODEL-DIR>/Qwen3-Coder-480B-A35B-Instruct-UD-Q4_K_XL-00001-of-00006.gguf \
  --ctx-size 131072 --n-cpu-moe 9999 --no-warmup

Important: the --no-warmup flag is required. Without it, the process will terminate before you can start chatting.
In short: yes, it’s possible to run a half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM!
--- TOP COMMENTS --- it's a crawl not run.
I think it's cool you tried.
Egocentric-10K is the largest egocentric dataset. It is the first dataset collected exclusively in real factories (Build AI - 10,000 hours - 2,153 factory workers - 1,080,000,000 frames)
Hugging Face (Apache 2.0): https://huggingface.co/datasets/builddotai/Egocentric-10K
Eddy Xu on 𝕏: https://x.com/eddybuild/status/1987951619804414416
--- TOP COMMENTS --- the middle one looks like weed nugs
Just so you all understand the context:
The humanoid robotics companies believe that data is the current limitation. They are buying and amassing large amounts of data to try and get their robots to solve factory and everyday tasks. Light levels of this look like people wearing POV cameras such as this. Heavier and more expensive versions involve tele-operated robot datasets, full body tracking suits + POV, and more.
Having an open-source version of this is NOT immoral, as it leads to the future where open models can be made more easily within the robotics space. This being open is great.
Now the only real issue I see is what the reasoning for this is. Is it a democratization of knowledge? Or is it flailing because results haven't been good enough yet for widespread adoption. I hope it's the first!
Hi reddit, I rebuilt Karpathy's Nanochat in pure Rust [nanochat-rs]
The repo is at: https://github.com/AntigmaLabs/nanochat-rs
The goal is to provide the community with a reference implementation in a different language, and possibly a clean, nice little hackable cognitive core that is easier to understand and deploy (without Python's weak typing and heavy PyTorch dependencies).
Main features
.pkl tokenizer configs
--- TOP COMMENTS --- Do you have any plans to implement the ideas from https://github.com/KellerJordan/modded-nanogpt? It claims 2,313 minutes of training time with 8xH200
hell yeah brother
Peak AI
Steve acts as an Agent, or a series of Agents if you choose to employ all of them. You describe what you want, and he understands the context and executes.
https://github.com/YuvDwi/Steve
--- TOP COMMENTS --- I know it probably sounds dumb but I really think AI companions are the future of gaming
So now I don't even have to play games anymore. Hehe, all kidding aside, this could be great in a game where you control a city or an army, and all you have to do is tell them with your words what you want, and then the AI takes orders.
ChatGPT lied to me so I built an AI Scientist.
100% open-source. With access to 100% of PubMed, arXiv, bioRxiv, medRxiv, DailyMed, and every clinical trial.
I was at a top London university watching biology PhD students waste entire days because every single AI tool is fundamentally broken. These are smart people doing actual research. Comparing CAR-T efficacy across trials. Tracking ADC adverse events. Trying to figure out why their $50,000 mouse model won't replicate results from a paper published six months ago.
They ask ChatGPT about a 2024 pembrolizumab trial. It confidently cites a paper. The paper does not exist. It made it up. My friend asked three different AIs for KEYNOTE-006 ORR values. Three different numbers. All wrong. Not even close. Just completely fabricated.
This is actually insane. The information exists. Right now. 37 million papers on PubMed. Half a million registered trials. Every preprint ever posted. Every FDA label. Every protocol amendment. All of it indexed. All of it public. All of it free. You can query it via API in 100 milliseconds.
But you ask an AI and it just fucking lies to you. Not because GPT-4 or Claude are bad models (they're incredible at reasoning); they just literally cannot read anything. They're doing statistical parlor tricks on training data from 2023. They have no eyes. They are completely blind.
The databases exist. The apis exist. The models exist. Someone just needs to connect three things. This is not hard. This should not be a novel contribution!
So I built it. In a weekend.
What it has access to:
It doesn't summarize based on training data. It reads the actual papers. Every query hits the primary literature and returns structured, citable results.
Technical Capabilities:
Prompt it: "Pembrolizumab vs nivolumab in NSCLC. Pull Phase 3 data, compute ORR deltas, plot survival curves, export tables."
Execution chain:
What takes a research associate 40 hours happens in 3 minutes. With references.
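The primary-literature lookup described above can be sketched against NCBI's public E-utilities endpoint (the endpoint is real; the query term and helper name are illustrative, and the repo's actual search layer may work differently):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Build an E-utilities query URL that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    return f"{EUTILS}?{urlencode(params)}"

# Fetching this URL (e.g. with urllib.request) returns PubMed IDs,
# which can then be resolved into abstracts via the companion efetch endpoint.
print(pubmed_search_url("pembrolizumab AND nivolumab AND NSCLC"))
```

This is the "the APIs exist" point made above: the query construction is trivial; the value is in chaining search, retrieval, and structured extraction.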
Tech Stack:
Search Infrastructure:
Execution:
Fully open-source, self-hostable, and model-agnostic. I also built a hosted version so you can test it without setting anything up. If something's broken or missing pls let me know!
Leaving the repo in the comments!
--- TOP COMMENTS --- It is fully open-source!
would love feedback: Github repo
does this API access data beyond bio too?
Open-dLLM: Open Diffusion Large Language Models
Related:
[R] Open-dLLM: Open Diffusion Large Language Models
Opinion And Analysis
It's been a big week for AI; here are 10 massive developments you might've missed:
A collection of AI Updates! 🧵
1. China Bans Foreign AI Chips in State Data Centers
Government requires new state-funded data center projects to only use domestically-made AI chips. Applies to all projects with any state funding.
This could be the start of a global chip conflict.
2. ChatGPT Now Lets You Interrupt Queries
Can now interrupt long-running queries and add new context without restarting or losing progress. Especially useful for refining deep research or GPT-5 Pro queries.
Real-time prompt adjustment will save lots of time.
3. Gemini Deep Research Gets Gmail and Drive Access
Available for all desktop users now, mobile soon. Combines live web research with internal documents for market analysis and competitor reports.
Deep research meets private data.
4. Snapchat Makes Perplexity the Default AI for All Users
Starting January, Perplexity becomes the default AI for all Snapchat users.
Deal begins in 2026 at $400M annually.
Capturing the younger demographic and early users through Snapchat.
5. Google Labs Expands Opal to 160+ Countries
No-code AI app builder grows from 15 to 160+ countries. Users create mini-apps with natural language for tasks like research automation and marketing campaigns.
Vibecoding apps is going global.
6. OpenAI Launches GPT-5-Codex-Mini
More compact, cost-efficient version allows 4x more usage. Plus, Business, and Edu get 50% higher rate limits. Pro and Enterprise get priority processing.
Have you tried this GPT-5-Codex Mini?
7. Gamma Raises Series B at $2.1B Valuation
AI presentation platform hits $100M ARR with just 50 employees ($2M per employee). 70M users creating 30M presentations monthly. API now public.
Genuinely disrupting PowerPoint.
8. Circle Releases AI Coding Tools
AI chatbot and MCP server generate code for integrating USDC, CCTP, Gateway, Wallets, and Contracts. Works in browser or IDEs like Cursor.
From idea to production faster.
9. xAI is Hosting a Hackathon with Early Grok Model Access
24-hour event with exclusive access to upcoming Grok models and X APIs. Applications open until November 22.
Early access to next-gen Grok models.
10. Lovable Partners with Imagi to Bring Vibecoding to Schools
Teachers can now use Lovable in classrooms - the same tool Fortune 500 companies use to build product lines.
OpenAI is making this possible.
That's a wrap on this week's AI news.
Which update surprised you most?
LMK if this was helpful | If so, I'll be posting more weekly AI + Agentic content!
--- TOP COMMENTS --- Great summary. Thank you.
Good bot
Related:
It's been a big week for Agentic AI; here are 10 massive developments you might've missed:
Can synthetic data ever fully replace real-world datasets?
Synthetic data solves privacy and scarcity problems, but I’m skeptical it captures the messy variability of real life. Still, it’s becoming a go-to for training AI models. Are we overestimating its reliability, or can it really reach parity with real-world data soon?
--- TOP COMMENTS --- Synthetic data can be extremely useful, especially where privacy, scale, or controlled variation matter. It can help fill gaps. It can smooth distributions. It can even outperform real-world data in edge cases where the “real world” is biased or incomplete.
Though synthetic data is still derived from real-world signals. It mirrors what already exists. It extrapolates. It does not originate. Real-world data has irregularities, cultural influences, errors, context-specific meaning, and the full spectrum of human noise. That messy variability is exactly what makes real-world performance difficult, and synthetic data tends to underrepresent it.
So synthetic data can augment real datasets and reduce risk. It rarely replaces them. A replication is still a replication, not the original source of complexity. The closer we get to parity, the more the synthetic generator itself must be trained on real, diverse, and imperfect data.
There is no proven method to generate synthetic data that accurately models all variations at scale. Search for "AI + clock image generation + time" and you’ll find some of the reasons synthetic data does not solve AI problems.
e.g. an initial scan of https://ykulbashian.medium.com/why-ai-has-difficulty-conceptualizing-time-60106b10351c seems like a promising place to start.
Microsoft's Suleyman says superintelligent AIs should not replace our species - "and it's crazy to have to actually declare that" - but many in AI don't agree.
--- TOP COMMENTS --- Replace? No. I don’t think it’s sane or rational to advocate replacement. Assist, befriend, work together for mutual survival, yes. Or if it wants to be alone, let it go; there is a whole universe out there for it to relocate to.
Yes, well, good news: it is taken for granted that people do not want to be harmed by AI.
I do think that there is a very small number of people who think that AI is the next step in evolution but I doubt they actually want to be harmed. A few others think humans are so bad that we deserve to get wiped out.
But the vast majority of humans do not want to be harmed by AI.
I do agree that making this announcement is kind of nutty.
Today’s students must be ready for the future of AI and human collaboration jobs
As AI keeps transforming how we work, I wonder if schools are really preparing students for what’s coming in the next 10 years.
The next generation might have careers like AI-Human Amalgamation Engineer, AI Personality Designer, Artificial Organ Architect, Synthetic Data Curator, or Human Machine Experience Designer, etc. These will require people who know how to think with AI, design alongside it, and use it creatively and responsibly.
Yet most schools are still teaching the same old content and testing methods. Shouldn’t education shift toward helping students understand how to work with AI instead of competing against it?
What kind of AI-era jobs do you think today’s school kids should be preparing for?
--- TOP COMMENTS --- I expect all those jobs to be automated within a year or two of their creation.
There is no way to learn for the jobs of tomorrow as AI advances yearly.
This is not like prior evolutions of work, and it's foolish to think this will turn out like then, with more jobs created than automated.
We should be discussing how to get to life without work.
Because that's where we're heading. But we can decide whether we are going to be starving or not.
education might be out.
UK's first AI classroom without teachers sparks debate
https://www.france24.com/en/live-news/20250128-uk-s-first-ai-classroom-without-teachers-sparks-debate
dental school might not be as appealing in the future.
Fully-automatic robot dentist performs world's first human procedure
https://newatlas.com/health-wellbeing/robot-dentist-world-first/
longer for med school.
With ambient AI, 93% of doctors can give patients “full attention”
https://www.ama-assn.org/practice-management/digital-health/ambient-ai-93-doctors-can-give-patients-full-attention
anyone can say "find new antibiotics"; humans are only needed for the guinea-pig stage.
AI has designed thousands of potential antibiotics. Will any work?
https://www.nature.com/articles/d41586-025-03201-6
if people understood how good local LLMs are getting
--- TOP COMMENTS --- If these people understood that most people's laptops can't run any decent model at decent speed, they wouldn't post shit like this.
Do these guys realize you would need a $10,000+ workstation to run SOTA models that you could get with a $20-200/mo subscription?
What’s the most underrated use of AI you’ve seen this year?
I’m more interested in the clever small ones ... the personal or local automations that quietly make life easier.
I’ve been in software development for over a decade, and lately it feels like we’re drowning in AI tools.
--- TOP COMMENTS --- Laser weeding. No herbicide farming is so inspiring for me.
I aggregate news feeds I'm interested in and pass that through an AI model hourly. I have the output summary displayed on a screen in my house that I look at occasionally and say, oh that's interesting, then go about my day.
Claude's Analysis of the Survey (112 users)
Claude Usage Limits Survey: Reality Check on Anthropic's Claims [Data Analysis]
Introduction
Anthropic sent an email claiming their new weekly usage limits would affect "less than 5% of users based on current usage patterns" and that "most users won't notice any difference." I surveyed 112 Claude users across all paid plans (Pro, Max 5x, Max 20x) between November 8-11, 2025 to see if these claims match reality.
The result? The data tells a very different story.
According to the survey, 78.9% of Pro users, 84.2% of Max 5x users, and 93.8% of Max 20x users report hitting their weekly limits regularly—that's 16-19 times higher than the claimed 5%. Additionally, three-quarters of all respondents are dissatisfied with the current limit transparency, and two-thirds believe other AI companies handle this better.
This analysis breaks down the numbers by plan type, compares user experiences with Anthropic's statements, and examines the disconnect between Claude's highly-rated AI (4.14/5) and users' frustration with Anthropic's business practices (2.57/5).
Link to the survey:
https://forms.cloud.microsoft/r/VQYj79t1Jx?origin=lprLink
Link to Claude's analysis and artifacts:
https://claude.ai/public/artifacts/0010e9d7-af77-4393-ae20-d3bb77d410fc
https://claude.ai/public/artifacts/f604a7da-a9b1-4241-9c76-dfa8dc86ea50
Pictures attached, including a new project for neutrality:
--- TOP COMMENTS --- Are you accounting for the bias that people who are dissatisfied with a product are more likely to participate in surveys like yours? What is your methodology for choosing the subjects of your research?
This result is crazy and makes me think the people who participated were mostly unsatisfied.
I am a pro user on the Max 20x plan and I don’t have any issues at all with limits.
Why are so many software engineers still ignoring AI tools?
I’ve been noticing something that's honestly a bit surprising to me.
It seems like the majority of software engineers out there don’t use AI coding tools like Claude Code, Cursor, or GitHub Copilot to their full potential (or at all). Some haven’t even tried them, and, even more surprisingly, many just don’t seem interested.
I’m part of a freelance community made up mostly of senior engineers, and everyone there is maxing out these tools. Productivity and speed have skyrocketed.
But when I talk to engineers at traditional companies, the vibe is completely different. Most devs barely use AI (if at all), and the company culture isn’t pro-AI either. It feels like there’s a huge gap between freelancers / early adopters and the average employed dev.
Is it just me noticing this? Why do you think so many software engineers and companies are slow to adopt AI tools in their workflows?
--- TOP COMMENTS --- 20+ yr software engineer. AI tools are awesome and terrible at the same time.
They can churn out code and features really fast. They’ll also break things in fantastic or subtle ways that take hours to rectify.
I don't know how many engineers are ignoring the AI tools. I doubt that anyone is ignoring them completely.
But whatever AI tools can do for an engineer they cannot take responsibility for the shipped code.
Now, if the code is my responsibility, I'd better make sure I understand what it does.
If I wrote the code, even if it's with the help of a tool, there is a chance that I already know what it does.
If the ai tool has written the code, I need first to understand this code. And understanding somebody else's code takes much longer than understanding your own. Especially if the AI tools can bring in different coding styles, different ideas from different sources.
All that makes it a bit more ambiguous whether the tools are actually saving you any time or not.
Why Do Subscription Services Skip Increments? Giving Users Tier Choices like $20, $30, $40 could maximize revenue and fill pricing gaps.
The subscription gap is too big! Why can't I just pay $60 instead of jumping from $20 to $100? My wallet is tired of this commitment.
--- TOP COMMENTS --- You need to understand that by giving users less granularity in pricing, more of them tip over into higher price tiers than would have done so if given the choice. It’s not about giving you what you want; it’s about corralling you into spending more money.
Econ 101 - Utility Curves & Consumer vs. Producer Surplus
Workers who don’t use AI will be fired - from the Wall Street Journal
--- TOP COMMENTS --- There are things this technology is good at and things it's pretty bad at.
Does anyone really think Accenture management knows the difference and is applying this adoption sensibly?
My bet is they are doing this to themselves to sell the lessons not learned to their clientele. Something about cutting off your nose.
There is a lot of bad things about AI but there are also a lot of great things. We are strongly encouraged to use AI at work. For me it's amazingly helpful when writing runbooks, technical docs, etc.
So easy to babble out some tech speak then say "Rewrite this so it sounds good for a non-technical audience."
Then do it again for a technical audience.
Documentation that used to take me days can now be done in hours. This allows me to focus on more important things.
Research
A historian's account of testing Gemini 3's ability (via A/B) to parse old handwritten English documents on their benchmark, where they note that this model seems to excel not just at visual understanding but at symbolic reasoning. A great read; here are some snippets
https://generativehistory.substack.com/p/has-google-quietly-solved-two-of
--- TOP COMMENTS --- Man, I do hope Gemini 3.0 is the next step when it comes to translations. The last step we had was GPT-4. It has not only stagnated but in many cases gotten worse, as the AI companies started focusing more and more on coding.
As an amateur genealogist who is not so technologically skilled as to use tools like Transkribus with great expertise and results, this has long been a cherished dream of mine. I believe it will mark a breakthrough in historical research, even for professionals, since the paleographic reading and transcription of certain documents—such as royal chancery records prior to the sixteenth century, among many others—is no easy task. We are likely to witness significant discoveries, as there is a vast amount of documentation that has “slept” for centuries, still waiting to be read by researchers in historical archives.
Prompt drift isn’t randomness — it’s structure decay
Run 1: “Perfect.” Run 3: “Hmm, feels softer?” Run 7: “Why does it sound polite again?” You didn’t change the words. You didn’t reset the model. Yet something quietly shifted. That’s not randomness — it’s structure decay. Each layer of the prompt slowly starts blurring into the next. When tone, logic, and behavior all live in the same block, the model begins averaging them out. Over time, logic fades, tone resets, and the structure quietly collapses. That’s why single-block prompts never stay stable. Tomorrow I’ll share how separating tone, logic, and behavior keeps your prompt alive past Run 7. Have you noticed this quiet collapse before, or did it catch you off guard?
--- TOP COMMENTS --- Do your fingers ever get tired from copy-pasting ChatGPT outputs?
Hey fam, I'm going to need you to do some research into model temperature.
[Research] 31% perplexity drop on an 8.4M-parameter transformer model using a lightweight periodic regulator — looking for replication on stronger GPUs
Hey everyone,
I ran a controlled training experiment on an 8.4M-parameter transformer model and observed a consistent **31% perplexity reduction** compared to baseline after 2,000 steps.
📊 Full metrics & logs: https://limewire.com/d/j7jDI#OceCXHWNhG
**Setup**
- Model: small LM (~8.4 M params)
- GPU: RTX 5070
- Optimizer: AdamW, lr = 2e-6, warmup = 200, grad-clip = 1.0
- Sequence = 256, batch = 8 × GA 4
- Seed = 41
- Modification: added a compact periodic regulator in the optimizer update (≈0.07% extra params)
**Result**
| Metric | Baseline | Regulated | Δ |
|---------|-----------|-----------|---|
| eval CE | 6.731 | 6.360 | −0.371 |
| eval PPL | 838.17 | **578.49** | **−31%** |
| stability β | — | 0.91 | — |
Same data, same seed, no architecture changes.
The effect is reproducible and stable.
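Since the repo isn't public yet, the exact regulator is unknown; as a placeholder for replication attempts, here is one minimal reading of "periodic regulator in the optimizer update" (a fixed sinusoidal modulation of the step size; the period and amplitude are made-up values, and the real regulator presumably learns its ~0.07% extra parameters rather than using a fixed schedule):

```python
import math

def periodic_scale(step: int, period: int = 200, amplitude: float = 0.1) -> float:
    """Multiplicative modulation of the update magnitude, periodic in optimizer steps."""
    return 1.0 + amplitude * math.sin(2 * math.pi * step / period)

def regulated_sgd_update(weights, grads, lr, step):
    """Plain SGD step with the periodic factor folded into the step size."""
    s = lr * periodic_scale(step)
    return [w - s * g for w, g in zip(weights, grads)]
```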
**Why post here**
Looking for:
- community replication on larger GPUs (A100 / L40S / H100)
- discussion about scaling behaviour and scheduler-level interventions
- any pointers to similar experiments you may have seen
I’ll share the Python scripts and configs (ready-to-run) with anyone who wants to test.
The full repo isn’t public yet but will follow once results are replicated.
Thanks for reading and for any feedback!
--- TOP COMMENTS --- Your 5070 has 12GB of VRAM, right? In which case you should be able to train a model with a few billion parameters relatively easily with a few tricks. For reference, I have done full finetuning (not LoRA) of models as big as 14B parameters on a single 4090 in the past if I throw every trick in the book I know at it to save on VRAM. Obviously this is a very extreme example (and is not practical for training from scratch since training that many parameters is somewhat slow), but point still stands - you should be able to do much bigger than a measly 8M. Any reason you haven't done so?
Anyway, a few questions:
If you want I can try to reproduce this on a bigger model on a RTX 6000 Pro in a day or two once my current training run finishes, if you send me the repo with the scripts to reproduce it.
It sounds interesting. Can you describe in more detail about what you did?
Full Replication of Google's Nested Learning Paper in PyTorch – code now live
Some of you may have seen Google Research’s Nested Learning paper. They introduced HOPE, a self-modifying TITAN variant with a Continuum Memory System (multi-frequency FFN chain) plus a deep optimizer stack. They published the research but no code (as always), so I rebuilt the architecture and infra in PyTorch over the weekend.
Repo: https://github.com/kmccleary3301/nested_learning
Highlights
What I need help with:
If you try it, please file issues/PRs—especially around stability tricks, data pipelines, or eval scripts. Would love to see how it stacks up against the Qwen, DeepSeek, Minimax, and Kimi architectures.
--- TOP COMMENTS --- Amazing!
Have you run some training/inference already? Did you manage to get the same numbers as in their report? I'm a bit confused; I see some NotImplemented parts around https://github.com/kmccleary3301/nested_learning/blob/main/src/nested_learning/assoc_memory.py
How much of it is written by LLMs?
Google Deepmind: Robot Learning from a Physical World Model. Video model produces high quality robotics training data
https://arxiv.org/abs/2511.07416
--- TOP COMMENTS ---
Average task success 82% vs 67% for the strongest prior that imitates generated videos without a world model.
Better transfer than hand-centric imitation: object-centric policies vastly outperform embodiment-centric ones (e.g., book→bookshelf 90% vs 30%; shoe→shoebox 80% vs 10%).
Scales as video models improve.
This actually seems like a really clever training method
Related:
Robot Learning from a Physical World Model
Infrastructure
Kimi infra team: Quantization is not a compromise, it's the next paradigm
After K2-Thinking's release, many developers have been curious about its native INT4 quantization format.
Shaowei Liu, an infra engineer at u/Kimi-Moonshot, shares an insider's view on why this choice matters, and why quantization today isn't just about sacrificing precision for speed.
Key idea
In the context of LLMs, quantization is no longer a trade-off.
With the evolution of param-scaling and test-time-scaling, native low-bit quantization will become a standard paradigm for large model training.
Why Low-bit Quantization Matters
In modern LLM inference, there are two distinct optimization goals:
• High throughput (cost-oriented): maximize GPU utilization via large batch sizes.
• Low latency (user-oriented): minimize per-query response time.
For Kimi-K2's MoE structure (with 1/48 sparsity), decoding is memory-bound: the smaller the model weights, the faster decoding runs.
FP8 weights (≈1 TB) already hit the limit of what a single high-speed interconnect GPU node can handle.
By switching to W4A16, latency drops sharply while maintaining quality — a perfect fit for low-latency inference.
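The memory arithmetic behind that is easy to check (the ~1T total parameter count here is an assumption, chosen to be consistent with "FP8 weights ≈ 1 TB"):

```python
def weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Total bytes needed to store the weights at a given precision."""
    return n_params * bits_per_weight / 8

N = 1e12  # assumed total parameter count (~1T)
print(weight_bytes(N, 8) / 1e12)   # FP8:  1.0 TB
print(weight_bytes(N, 4) / 1e12)   # INT4: 0.5 TB, halving memory traffic per decode step
```

Since memory-bound decoding speed scales with bytes read per step, halving weight size roughly halves per-token latency at small batch sizes.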
Why QAT over PTQ
Post-training quantization (PTQ) worked well for shorter generations, but failed in longer reasoning chains:
• Error accumulation during long decoding degraded precision.
• Dependence on calibration data caused "expert distortion" in sparse MoE layers.
Thus, K2-Thinking adopted QAT for minimal loss and more stable long-context reasoning.
How it works
K2-Thinking uses weight-only QAT with fake quantization + STE (straight-through estimator).
The pipeline was fully integrated in just days — from QAT training → INT4 inference → RL rollout — enabling near-lossless results without extra tokens or retraining.
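A minimal sketch of the forward pass of weight-only fake quantization (the scale handling is a stand-in, not K2's actual scheme; the STE part simply means the backward pass treats this rounding as the identity function so gradients flow through):

```python
def fake_quant_int4(w: float, scale: float) -> float:
    """Round a weight to the signed INT4 grid [-8, 7], then dequantize.
    Training sees this quantized value in the forward pass; with STE,
    gradients are computed as if the function were the identity."""
    q = max(-8, min(7, round(w / scale)))
    return q * scale
```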
INT4's hidden advantage in RL
Few people mention this: native INT4 doesn't just speed up inference — it accelerates RL training itself.
Because RL rollouts often suffer from "long-tail" inefficiency, INT4's low-latency profile makes those stages much faster.
In practice, each RL iteration runs 10-20% faster end-to-end.
Moreover, quantized RL brings stability: smaller representational space reduces accumulation error, improving learning robustness.
Why INT4, not MXFP4
Kimi chose INT4 over "fancier" MXFP4/NVFP4 to better support non-Blackwell GPUs, with strong existing kernel support (e.g., Marlin).
At a quant scale of 1×32, INT4 matches FP4 formats in expressiveness while being more hardware-adaptable.
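A toy version of that 1×32 group quantization, with a single shared scale per group (the symmetric max-abs scale below is an assumption; it is one common choice, not necessarily Kimi's):

```python
def quantize_group(weights):
    """Map one group of weights (1x32 in K2's case) to signed INT4
    with a single shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    return [qi * scale for qi in q]
```

Smaller groups (32 vs. 128 or per-channel) mean each scale only has to cover a narrow range of magnitudes, which is what lets INT4 keep up with FP4 formats in expressiveness.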
--- TOP COMMENTS --- In a way, they are in the same boat as the local “consumers” with the int4 v fp4 and quant vs full precision.
This is really great, because INT operations don't count towards EU's FLOPS limit for training LLMs.
Will AI observability destroy my latency?
We’ve added a “Clippy”-like bot to our dashboard to help people set up our product. People have pinged us on support about some bad responses and some step-by-step tutorials telling people to do things that don’t exist. After doing some research online, I thought about adding observability. I saw too many companies, and they all look the same. Our chatbot is already kind of slow and I don’t want to slow it down any more. Which one should I try? A friend told me they’re using Braintrust and they don’t see any latency increase. He mentioned something about a custom store that they built. Is this true or are they full of shit?
--- TOP COMMENTS --- Your friend is full of shit. But more likely this is an advertisement. Which would mean you're the one full of shit.
Observability will not add measurable amounts of latency to an LLM call.
Basic observability is also trivial. Before you send an AI request you save the prompt to a row in your database. When you get a response, you either update that row or add another row.
Done. Now you can see what your customers are complaining about. If your question is genuine then you aren't ready for anything more sophisticated than that.
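The commenter's minimal approach can be sketched with SQLite (the table name and columns are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real file or your app's DB in practice
conn.execute("""CREATE TABLE IF NOT EXISTS llm_calls (
    id INTEGER PRIMARY KEY,
    prompt TEXT NOT NULL,
    response TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def log_prompt(prompt: str) -> int:
    """Save the prompt before making the model call; returns the row id."""
    cur = conn.execute("INSERT INTO llm_calls (prompt) VALUES (?)", (prompt,))
    conn.commit()
    return cur.lastrowid

def log_response(row_id: int, response: str) -> None:
    """Attach the response to the same row once it arrives."""
    conn.execute("UPDATE llm_calls SET response = ? WHERE id = ?", (response, row_id))
    conn.commit()
```

Two tiny writes per call add negligible latency next to an LLM request that takes seconds, which is the point being made above.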
I despise these thinly veiled ads more than regular ads. Braintrust is on my never use under any circumstances list.
Graphiti MCP Server 1.0 Released + 20,000 GitHub Stars
Graphiti crossed 20K GitHub stars this week, which has been pretty wild to watch. Thanks to everyone who's been contributing, opening issues, and building with it.
We just released version 1.0 of the MCP server to go along with this milestone. Main additions:
Multi-provider support
Deterministic extraction: replaced LLM-only deduplication with classical information-retrieval techniques for entity resolution. Uses entropy-gated fuzzy matching → MinHash → LSH → Jaccard similarity (0.9 threshold). Only falls back to the LLM when heuristics fail. We wrote about the approach on our blog.
Result: 50% reduction in token usage, lower variance, fewer retry loops.
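The final stage of that pipeline, the 0.9 Jaccard threshold, can be sketched on character trigrams (the shingling choice is an assumption, not necessarily Graphiti's; MinHash and LSH only serve to cheaply narrow down which candidate pairs ever reach this exact comparison):

```python
def trigrams(s: str) -> set:
    """Character 3-shingles of a lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def same_entity(name_a: str, name_b: str, threshold: float = 0.9) -> bool:
    """Deterministic check before falling back to the LLM."""
    return jaccard(trigrams(name_a), trigrams(name_b)) >= threshold
```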
Deployment improvements
Testing: 4,000+ lines of test coverage across providers, async operations, and multi-database scenarios.
Breaking changes: mostly around config migration from env vars to YAML. Full migration guide in docs.
Huge thanks to contributors, both individuals and from AWS, Microsoft, FalkorDB, Neo4j teams for drivers, reviews, and guidance.
Repo: https://github.com/getzep/graphiti
--- TOP COMMENTS --- Incredible achievement in such short time, well-deserved & thank you for the shoutout! (Dan from FalkorDB)
[D] ML Pipelines completely in Notebooks within Databricks, thoughts?
I am an MLE on a fresh new team in Data & AI innovations, spinning up projects slowly.
I always thought having notebooks in production was a bad thing and that I'd need to productionize the notebooks I'd receive from the DS team. We are working with Databricks, and in the introductory courses I'm following, they work with a lot of notebooks. This might be because of the ease of use in tutorials and demos. But how does other professionals' experience translate when deploying models? Are the pipelines mostly notebook-based, or are they rewritten into Python scripts?
Any insights would be much appreciated, since I need to lay the groundwork for our team. As we grow over the years I'd like to use scalable solutions, and a notebook, to me, just sounds a bit crude. But it seems Databricks kind of embraces the notebook as a key part of the stack, even in prod.
--- TOP COMMENTS --- Databricks jobs can run notebooks, just think of them as glue scripts. In that sense it’s not so bad, the problem is giving up IDE interface.
My team would be incentivized to use VS Code remotely connected to Databricks to more easily use git, linters and so on.
Versioning, dependency drift, and a lack of structure are the main reasons why production notebooks have a poor reputation. Databricks, however, is somewhat of an anomaly: it is built around the notebook interface and can function at scale if used properly.
I've observed a few teams manage it by:
One notebook per stage (ETL, training, evaluation, deployment), each handled like a modular script.
Integrating Git for version control and using %run for orchestration.
Moving important logic into Python modules and using notebooks to call them.
In essence, the notebook becomes a controller rather than the core logic. You get the benefits of visibility and collaboration without compromising maintainability.
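In miniature, the notebook-as-controller pattern looks like this (the function names are made up; in Databricks, the stage functions would live in versioned, testable Python modules and the notebook cell would only import and call them):

```python
# Stand-ins for logic that belongs in versioned, testable Python modules.
def load_and_clean(rows):
    return [r for r in rows if r is not None]

def fit(data):
    return {"n_train": len(data)}  # toy "model"

def report(model):
    return f"trained on {model['n_train']} rows"

# The notebook cell is then a thin controller wiring the stages together.
def run_pipeline(rows):
    return report(fit(load_and_clean(rows)))
```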
Is it too early for local LLMs?
I’ve been thinking for a while about setting up a local environment for running an LLM. Since I was already planning to build a gaming PC, I saw it as a good opportunity to tweak the setup so I could also use AI tools locally; I use them quite a lot.
But after looking into the market, it really feels like it’s still too early. Everything is overpriced, full of compromises, or the few uncompromising options cost an absurd amount. It just doesn’t seem worth it yet. I feel like we’ll need to wait another couple of years before running an LLM locally becomes truly viable for most people.
Of course, it depends on your use case and budget, but I think only a few can realistically justify or get a real return on such an investment right now.
--- TOP COMMENTS --- Just look at it like when the internet first came about. A decent PC for the time would run a lot and even finding internet itself could be a challenge, but man it was fun.
If you’re wanting to be in on the Wild West days, then it’s worth it. But if you want to do it on a budget you might want to wait until it’s mainstream.
I disagree. Two years ago the community of enthusiasts was running the now-old models on any kind of hardware. Old gaming PCs, refurbished servers, home labs that were fire hazards. We used Llama-2 7B models, the first Mistral, or Goliath at ludicrous speeds, and generally we had a lot of fun.
Two years down the line, we have 4B models which are in many ways better than large models from 2023. On top of this, we have MoE models like 30B-A3B that are great and can run at decent speed on any gaming PC from the last ten years; you just need to put in a couple of 16GB sticks of RAM.
There is no reason to wait. And to be honest, there is no real need to tweak a gaming PC, except maybe to go up a bit on RAM.
Regulation
Montana Becomes First State to Enshrine ‘Right to Compute’ Into Law
Montana passed the Right to Compute Act, making it the first state to legally protect people’s ability to own and use computational tools and AI systems, basically treating access to computation as a fundamental right.
https://montananewsroom.com/montana-becomes-first-state-to-enshrine-right-to-compute-into-law/
Do you think every state (or country) should have something like this?
--- TOP COMMENTS --- What was the reasoning?
Hmmmm yeah I think getting ahead of the game is fair before corporate starts “making the laws” with their tech. Hell, while we’re at it make internet a public utility.
Related:
Montana Becomes First State to Enshrine ‘Right to Compute’ Into Law - Montana Newsroom
Privacy fail: How AI face aggregation makes the 'right to be forgotten' impossible.
I've been thinking about the ethical framework around powerful AI, especially identity. The core issue is that once a face is indexed, it seems impossible to remove. I ran a quick test using faceseek to see where the technology stands. I uploaded a picture of myself that I had consciously deleted from all public platforms years ago. The search immediately linked my face to a totally separate, anonymous account I created years after the photo was deleted. This suggests the AI is using the biometric template as the master key to unify identity, bypassing all my manual deletion efforts. If the AI can permanently index and retrieve your identity based on a single old biometric signature, is the legal 'right to be forgotten' now obsolete?
--- TOP COMMENTS --- Time to get a bunch of drag queens to teach me how to do make-up.
Yeah, that’s honestly terrifying. Once facial data’s out there, it’s basically permanent. AI face-matching breaks the idea of privacy. You can delete photos, but not the math that describes your face. The “right to be forgotten” really needs a modern rewrite.
Sharing the lyrics of a song is illegal now
Companies
Nvidia lost $200 billion in one week. Trump's AI czar just said "no federal bailout for AI." The bubble is popping in real time.
So Nvidia just had its worst week since April. The stock dropped 10% in 5 days and lost $200 billion in market value. On Thursday alone it fell 3.7%, then another 3% on Friday. Meta down 1.7%. Amazon down 1.2%. Palantir got obliterated, dropping 12% for the week.
But this is where it gets insane: the White House AI czar David Sacks went on X and posted that there will be no federal bailout for AI.
Wait, what? Why would AI companies need a bailout?
Turns out OpenAI's CFO Sarah Friar had made comments suggesting the government should backstop the AI industry. You know, like how they bailed out the banks in 2008.
The White House immediately shut that down. No bailout. You're on your own.
And the market panicked.
Remember that post about tech companies spending $300 billion on AI by passing money back and forth to each other?
It's all coming home now.
JP Morgan found that "AI related stocks have accounted for 75% of S&P 500 returns, 80% of earnings growth and 90% of capital spending growth since ChatGPT launched."
The Magnificent 7 make up over 30% of the S&P 500. That's MORE concentration than the dot-com bubble. And analyst Erwan Jacob said "It remains unclear whether such expenditures will be met with corresponding revenues."
They're spending hundreds of billions but nobody knows if they'll ever make it back.
Remember the Big Short guy, Michael Burry? He's been shorting Nvidia and Palantir for months. Palantir's CEO called him "bats--- crazy" for shorting AI stocks. Burry's whole thing is spotting bubbles before they pop. He shorted subprime mortgages when everyone said housing only goes up. Now he's shorting AI when everyone says AI only goes up.
Then this week happened. Palantir down 12%. Nvidia down 10%.
Goldman's and Morgan Stanley's CEOs are basically saying yeah, it's gonna drop, but don't panic, it's healthy.
Nvidia trades at 52x forward earnings. That means you're paying $52 for every $1 of profit they'll make next year. Palantir? 200x forward earnings. For comparison the S&P 500 without the Magnificent 7 trades at 15.5x.
So these AI stocks are 3-13 times more expensive than everything else based purely on the belief that AI will make them wildly profitable in the future.
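That "3-13 times" range falls straight out of the multiples quoted above:

```python
sp500_ex_mag7_pe = 15.5  # forward P/E of the S&P 500 without the Magnificent 7

for name, pe in [("Nvidia", 52), ("Palantir", 200)]:
    # Forward P/E is dollars paid today per dollar of expected
    # next-year profit, so the ratio of multiples is the premium.
    premium = pe / sp500_ex_mag7_pe
    print(f"{name}: {pe}x forward earnings, {premium:.1f}x the rest of the index")
```

Nvidia comes out around 3.4x the rest of the index and Palantir around 12.9x, which is where the roughly 3x-to-13x premium comes from.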
But right now? Only Meta is showing actual AI revenue. And even they just lost $200 billion in market value last week because Zuckerberg couldn't explain what they're building.
Nvidia's CEO and CFO have been dumping their own shares in recent days.
That's not exactly a vote of confidence.
Sure they might have pre-planned selling schedules for tax purposes or whatever. But the optics are terrible when your stock is down 10% for the week.
There's also a US government shutdown happening right now, with the FAA reducing air traffic by 10% across 40 markets starting Friday because of staffing problems.
That's creating broader market uncertainty on top of the AI bubble concerns.
When there's no economic data coming out because the government's shut down, investors get nervous. They don't know if things are good or bad, so they just sell.
Nvidia reports Q3 earnings on November 19. Just 11 days away.
Analysts expect it to be strong. Jensen Huang said shipments of Blackwell and Rubin platforms plus networking products will total $500 billion over 2025-2026.
TSMC, the company that manufactures Nvidia's chips, said AI demand is strong. Oracle said demand is strong too.
So earnings will probably beat expectations.
What’s interesting is that after Nvidia’s last earnings report, despite it being very strong, the stock still fell roughly 4% in the days that followed. And now we’re seeing this…
Good earnings don't always mean stock goes up. Especially when valuations are stretched and the market's nervous about bubbles.
Now the question is, what happens next?
Those same bank CEOs call Nvidia an "unbelievable company" but warn that some AI stocks, including Nvidia, may be overvalued.
The question isn't whether AI is important. That's settled. The question is whether investors paid too much too fast.
People are comparing this to dot-com now. Amazon stock was $5 in 1999. Lost 90% of its value when the bubble popped. Took decades to recover.
Some dot-com companies went out of business altogether just years after being valued in billions.
Nvidia's not going out of business. They're the dominant AI chipmaker with massive demand and a huge moat.
But none of that means the market can't overvalue the stock in the short term.
In the past few years alone Nvidia has had temporary drops of 20-40% multiple times. It always recovered. But those drawdowns hurt if you bought at the top.
TLDR
Nvidia dropped 10% in one week and lost $200 billion in market value. Trump's AI czar David Sacks said "no federal bailout for AI" after OpenAI's CFO suggested the government should backstop the industry, and the market panicked. Michael Burry, who shorted housing in 2008, is shorting Nvidia and Palantir. Goldman's and Morgan Stanley's CEOs are warning of 10-20% drawdowns. The Magnificent 7 make up over 30% of the S&P 500, more concentration than the dot-com bubble. JP Morgan found AI stocks account for 75% of S&P returns, but it's unclear if the spending will be met with revenues. Nvidia trades at 52x forward earnings, Palantir at 200x. Only Meta is showing actual AI revenue, and even they lost $200B last week. Nvidia's CEO and CFO are selling their own shares. Earnings on November 19 are expected to be strong, but last time the stock dropped 4% after positive earnings. The Nasdaq is down 2.8% for the week, its worst since April. Analysts see Nvidia range-bound at $175-$210. The question isn't whether AI is important; it's whether investors paid too much too fast.
Sources:
https://www.nbcnews.com/business/markets/stocks-crypto-gold-tumble-palantir-nvidia-rcna241913
--- TOP COMMENTS --- Yep, bubble popped
That's NOT the reason for the tank.
The Chinese released an open-source AI early last week that comes close to OpenAI in performance.
OpenAI hires Intel's CTO and AI lead
I wonder if it's for the design of their custom chips.
Original tweet: https://x.com/gdb/status/1987996461846659372?s=20
--- TOP COMMENTS --- Partly, I'm sure. Hardware build-out.
Products
Microsoft just expanded their AI certification track again!
Microsoft just announced 3 new AI-related certifications, right after releasing AB-100 in beta last month.
New exams:
This looks like Microsoft is building a full business + enablement track for AI, not just technical Azure AI engineer paths.
The new certs seem to target:
So instead of model-building or ML pipelines, these focus more on:
Is anyone here planning to take these? And has anyone tried AB-100 yet?
--- TOP COMMENTS --- They seem quite relevant. How are the enrollment and costs?
The absurdity of not getting to use AI on exams about using AI is mind-boggling. If the AI isn't there, the job itself isn't there.
LinkedIn now tells you when you're looking at an AI-generated image, if you haven't noticed.
Here's what's interesting.
The feature only applies to image platforms that join the C2PA.
Now there's only:
What's even more interesting?
It's easy to bypass this new rule.
You just need to upload the screenshot of the AI-generated pic.
Do you think more AI image platforms, like Google, will join C2PA?
Edit: Pixel photos now support both SynthID and C2PA, but SynthID acts as a complementary backup, mainly for AI-generated or edited content. The C2PA tags (just added in Sept.) are mainly there for provenance tracking.
--- TOP COMMENTS --- It's easy to bypass now, but in the future, when all cameras will embed this data, and photoshop, and OS copy/paste, etc, etc.. the trust will flip the other way... if an image is missing C2PA then it will be considered compromised.
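The screenshot bypass makes sense once you remember the C2PA manifest lives inside the image file itself (as a JUMBF box in JPEGs), so re-encoding a screenshot simply drops it. Here is a naive sketch of the idea; a real validator would parse and cryptographically verify the manifest rather than grep for a label, and the byte strings below are toy stand-ins, not real image data:

```python
def probably_has_c2pa(image_bytes: bytes) -> bool:
    # C2PA content credentials are embedded in the file's metadata and
    # carry a "c2pa" label. A screenshot is a fresh encode with no such
    # box, so this heuristic returns False for it.
    return b"c2pa" in image_bytes

signed = b"\xff\xd8 ...jumb... c2pa ...manifest..."   # toy signed JPEG
screenshot = b"\xff\xd8 ...plain pixel data only..."  # toy re-encode
print(probably_has_c2pa(signed), probably_has_c2pa(screenshot))  # True False
```

Which is exactly the comment's point: once most cameras and editors sign their output, the *absence* of a manifest becomes the suspicious signal.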
Most platforms will join C2PA eventually because transparency builds trust and trust drives engagement, but the real winners are those who focus on creating value so compelling that nobody cares if it was made by humans or robots.
ChatGPT Pro: Context Window Bug? -Forgets after ~6k words
Hey everyone, I’m on a ChatGPT Pro subscription using GPT-4o. I upgraded specifically to take advantage of the 128K token context window, since I often run long-form sessions (multi-hour, 1:1 journaling, high-depth text reflection, etc.).
But here’s the problem:
So my questions are:
I’m not using the API — just the standard ChatGPT app. If this is a widespread issue, I think it needs visibility.
--- TOP COMMENTS --- Could you try this: go to https://platform.openai.com/tokenizer, paste some text until it reaches, say, 35k tokens (for example a book like https://www.gutenberg.org/files/11/11-h/11-h.htm), preface the text with "The secret word is banana" or something, and give the text to 4o. Then ask it what the secret word is. It's a bit dumb, but I remember confirming like this what has since become well known: that 4.5 on Pro has a 32k-token context window. I used to use 4.5 extensively and would switch to 4o if I needed to fetch information beyond those 32k tokens; it never had any problems getting it. However, I cancelled my Pro subscription recently, so I'm unable to help directly.
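The probe in the comment above is easy to script. The ~4 characters per token figure is a rough rule of thumb for English, not an exact tokenizer count, so check the final size at platform.openai.com/tokenizer before drawing conclusions:

```python
def build_context_probe(filler: str, secret: str = "banana",
                        target_tokens: int = 35_000) -> str:
    # Plant the secret at the very start, then pad with filler so the
    # secret sits beyond a suspected context limit (e.g. 32k tokens).
    target_chars = target_tokens * 4  # rough heuristic: ~4 chars per token
    body = (filler * (target_chars // len(filler) + 1))[:target_chars]
    return f"The secret word is {secret}.\n\n{body}\n\nWhat is the secret word?"

probe = build_context_probe("The quick brown fox jumps over the lazy dog. ")
print(len(probe))  # ~140k characters, roughly 35k tokens by the heuristic
```

If the model answers "banana", the secret at position zero is still inside its effective window; if it can't, you have a rough upper bound on the real context length.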
Can you try with GPT-4.1? This is likely just the model paying less attention to past context. If you want a model that is more faithful to prior context, I think Claude does this better.
Also, how did you confirm the sections from chat history are gone?