TL;DR
AI systems have reached near-saturation on core engineering benchmarks, automating much of AI engineering work. Research remains less automated; the evidence there is mixed, but it may not stay a permanent moat.
Recent empirical evidence shows that AI systems are now capable of automating the majority of AI engineering tasks, with benchmarks indicating near-complete automation of core engineering skills. This development shifts the landscape of AI research and development, raising questions about the future of human involvement in AI innovation.
Multiple benchmarks measuring AI capabilities in core research and engineering tasks have shown rapid progress. For instance, CORE-Bench, which tests research reproduction, reached a 95.5% success rate by December 2025, with some authors declaring it ‘solved.’ Similarly, MLE-Bench, which assesses AI performance in Kaggle competitions, reached 64.4% in February 2026, approaching mid-tier human performance. These trajectories suggest that AI can now reliably reproduce research results and is increasingly competitive in complex ML challenges, tasks that previously required human expertise.
Clark’s analysis highlights that these benchmarks are hitting saturation points, indicating that further measurable progress may be limited by the benchmarks themselves rather than the underlying capabilities. The implication is that engineering tasks—traditionally considered the domain of human expertise—are increasingly being handled by AI systems, leaving research as the remaining frontier. Clark notes that the structural question remains whether research itself is fundamentally an engineering problem at scale, which could mean that the residual challenge of AI research automation might close faster than anticipated.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
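The "more shots on goal" reading can be made concrete with a minimal sketch. It assumes each exploration attempt succeeds independently with the same small probability, which is a simplification and not something Clark claims; the per-attempt rate `P_SUCCESS` below is hypothetical and chosen only for illustration.

```python
def p_at_least_one(p_success: float, attempts: int) -> float:
    """Chance of one or more successes in `attempts` independent tries."""
    return 1.0 - (1.0 - p_success) ** attempts


# Hypothetical per-attempt success rate; not a figure from Clark or any benchmark.
P_SUCCESS = 0.001

for attempts in (10, 100, 1_000, 10_000):
    chance = p_at_least_one(P_SUCCESS, attempts)
    print(f"{attempts:>6} attempts -> {chance:.3f}")
```

Under the optimistic reading, a rare event becomes likely once volume is high enough. Under the pessimistic reading the per-attempt rate for genuine insight is effectively zero, and in this model no number of attempts changes the outcome: `p_at_least_one(0.0, 10_000)` stays at zero.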
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
Productivity multiplier years
Recursive loop operational

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
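The asymmetric-cost argument above can be sketched as a small decision table. Every number here is a hypothetical relative cost in arbitrary units, invented for illustration and not taken from Clark or from this article; the state and action names are likewise made up. The sketch compares worst-case and expected cost for building capacity now versus waiting.

```python
# Hypothetical relative costs (arbitrary units); none of these come from the source.
COST = {
    ("build", "moat_holds"): 1,    # capacity built early, still useful
    ("build", "moat_closes"): 1,   # capacity built and needed
    ("wait", "moat_holds"): 0,     # nothing spent, nothing lost
    ("wait", "moat_closes"): 10,   # caught without necessary capacity
}
ACTIONS = ("build", "wait")
STATES = ("moat_holds", "moat_closes")


def worst_case(action: str) -> int:
    """Largest cost the action can incur across both states."""
    return max(COST[(action, state)] for state in STATES)


def expected_cost(action: str, p_closes: float) -> float:
    """Probability-weighted cost, given the chance that the moat closes."""
    return ((1.0 - p_closes) * COST[(action, "moat_holds")]
            + p_closes * COST[(action, "moat_closes")])


# Minimax: pick the action whose worst case is least bad.
minimax_choice = min(ACTIONS, key=worst_case)
print("minimax choice:", minimax_choice)

for p_closes in (0.05, 0.25, 0.5):
    cheaper = min(ACTIONS, key=lambda a: expected_cost(a, p_closes))
    print(f"p(moat closes)={p_closes:.2f} -> lower expected cost: {cheaper}")
```

With these illustrative numbers, waiting only wins when the chance of the moat closing is below one in ten. The conclusion depends entirely on the assumed cost ratio, so the table is a way to state the argument's assumptions, not evidence for it.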
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI Development and Industry
This shift signifies that the bottleneck in AI development may soon move from engineering to research, fundamentally altering how AI systems are built and improved. As AI automates core engineering tasks, organizations could see faster development cycles, lower costs, and a potential reduction in human labor for routine research activities. However, it also raises strategic questions about the future role of human researchers and the nature of innovation in AI.
Rapid Progress in Core AI Engineering Benchmarks
Over roughly the past year and a half, multiple independent benchmarks have shown rapid progress in AI capabilities related to core research and engineering skills. CORE-Bench, which measures research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025. MLE-Bench, which assesses Kaggle competition performance, advanced from 16.9% to 64.4% by February 2026. Additionally, advances in kernel design, such as automated GPU kernel generation, are moving from research papers into production-grade tools. These developments suggest a structural shift in which AI systems are increasingly capable of performing tasks once thought to require human expertise.
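The arithmetic behind the saturation claim can be checked with a short sketch using only the scores quoted in this article. The day of the month is assumed to be the 1st, since the text gives only months, and a monthly rate is computed for CORE-Bench alone because those are the only start and end dates stated together.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Scores quoted in the article (percent).
core_start, core_end = 21.5, 95.5
mle_start, mle_end = 16.9, 64.4

# CORE-Bench: September 2024 to December 2025 (day of month assumed).
core_months = months_between(date(2024, 9, 1), date(2025, 12, 1))

core_gain = core_end - core_start      # percentage points gained
mle_gain = mle_end - mle_start
core_rate = core_gain / core_months    # average points per month
core_headroom = 100.0 - core_end       # points left below the ceiling

print(f"CORE-Bench: +{core_gain:.1f} points over {core_months} months "
      f"({core_rate:.2f} points/month), {core_headroom:.1f} points of headroom")
print(f"MLE-Bench: +{mle_gain:.1f} points")
```

CORE-Bench gained 74 points in 15 months, about 4.9 points per month, and has 4.5 points left before its ceiling. At that average rate the benchmark has less than one month of measurable progress remaining, which is what saturation means in practice: the instrument runs out before the capability does.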
“The pattern across these benchmarks indicates that AI is approaching saturation in core engineering tasks, which could mean the residual challenge is now primarily research.”
— Thorsten Meyer
Unresolved Questions on AI Research Automation
While engineering automation appears to be reaching saturation, it remains unclear how much of AI research itself can be automated. The structural question—whether research is simply large-scale engineering—remains open, and the pace at which research automation might occur is uncertain. Additionally, there is ongoing debate about whether inspiration and creative insight can be fully replicated by AI systems.
Next Milestones in AI Capability Development
In the coming months, researchers and industry players will monitor whether AI systems can automate higher-level research tasks, including hypothesis generation, experimental design, and theory development. The development of new benchmarks and evaluation methods will be critical to measure progress. Additionally, organizations may begin to shift their strategies toward integrating AI-driven research tools, potentially accelerating innovation cycles.
Key Questions
What specific engineering tasks has AI automated?
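AI has demonstrated near-complete automation in research reproduction (95.5% on CORE-Bench) and strong, though not yet saturated, performance in complex ML competitions (64.4% on MLE-Bench), along with progress in code and kernel optimization. In practice this covers handling dependencies, executing code, and tuning performance at levels approaching those of human practitioners.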
Does this mean human researchers are no longer needed?
While AI automates many engineering tasks, research activities involving hypothesis formulation, creative insights, and strategic decision-making may still require human input. The extent of remaining human involvement is still uncertain.
How soon might AI automate most of the research process?
Based on current trajectories, some experts suggest significant automation could occur within the next 2-3 years, but the timeline depends on breakthroughs in automating creative and theoretical aspects of research.
What are the risks of relying on AI for research and engineering?
Potential risks include over-reliance on automated systems, reduced human oversight, and the possibility of missing nuanced insights that require human judgment. Ensuring robustness and transparency will be critical.
Source: ThorstenMeyerAI.com