One idea I had for an LLM benchmark: rather than JavaScript web dev, I’d want something that’s a bit more multifunctional. I’d use analyst-level analysis all the time; something like: transcribe Caesars Entertainment, Boyd Gaming, and MGM’s most recent earnings calls, pull out any takeaways regarding marketing efficiency and spend, and put them together in a nice document.
That would be great! I’m sure startups do that, but I wonder how well Deep Research would handle it.