Darwin-skill

Optimizes AI Agent Skills using an autonomous, iterative loop, evaluating structural quality and real-world performance with a 'ratchet' mechanism to ensure continuous improvement.


Darwin.skill is an autonomous optimization tool for AI Agent Skills, inspired by Andrej Karpathy's autoresearch. It improves the quality and effectiveness of agent skills by iteratively generating, testing, and refining them. As a collection of skills grows beyond what manual review can cover, the tool keeps each skill performant and well structured.

The core mechanism is a two-part evaluation: static analysis yields a structural quality score (up to 60 points), and runtime performance testing measures real-world effect (up to 40 points), for a maximum of 100. Darwin.skill identifies the lowest-scoring dimensions, generates specific improvement plans, applies the proposed changes to the SKILL.md file, and then independently re-scores the updated skill. A 'ratchet mechanism' retains a change only if it raises the score and automatically reverts any regression, so a skill's score never decreases from one round to the next.
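The score-and-ratchet loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Darwin.skill's actual code: `Score`, `ratchet`, and the `propose` and `evaluate` callbacks are hypothetical names, and only the 60/40 point caps come from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

STRUCTURE_MAX = 60  # cap for the static structural score (from the description)
EFFECT_MAX = 40     # cap for the runtime effect score (from the description)


@dataclass(frozen=True)
class Score:
    """Hypothetical container for the two evaluation results."""
    structure: float
    effect: float

    @property
    def total(self) -> float:
        # Clamp each part to its cap so the total never exceeds 100.
        return min(self.structure, STRUCTURE_MAX) + min(self.effect, EFFECT_MAX)


def ratchet(
    skill_text: str,
    propose: Callable[[str, Score], str],   # assumed: returns a revised SKILL.md text
    evaluate: Callable[[str], Score],       # assumed: re-scores a text independently
    rounds: int = 3,
) -> Tuple[str, Score, List[float]]:
    """Keep a proposed change only when it strictly raises the total score."""
    best_text = skill_text
    best_score = evaluate(best_text)
    history = [best_score.total]
    for _ in range(rounds):
        candidate = propose(best_text, best_score)
        candidate_score = evaluate(candidate)
        if candidate_score.total > best_score.total:
            # Improvement: the ratchet advances.
            best_text, best_score = candidate, candidate_score
        # Otherwise the candidate is discarded, which is the "revert".
        history.append(best_score.total)
    return best_text, best_score, history
```

Because a rejected candidate never replaces `best_text`, the recorded score history is non-decreasing by construction. Passing the current `Score` to `propose` is one way to let the proposal step target the weakest part of the evaluation.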

This tool is designed for developers and users of AI agents, particularly those working in skill-based agent ecosystems that use formats like SKILL.md. It helps them keep an agent's capabilities robust and effective as the skills evolve. The system takes a human-in-the-loop approach: after optimizing each skill, it pauses to display the changes and the score evolution, then waits for user confirmation before proceeding to the next optimization cycle.
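The confirmation pause might look like the sketch below. It assumes a hypothetical `optimize` callback that returns the revised text plus its score history. `render_review`, `optimize_with_review`, and the prompt wording are illustrative names, not part of Darwin.skill.

```python
import difflib
from typing import Callable, Dict, List, Mapping, Tuple


def render_review(name: str, before: str, after: str, history: List[float]) -> str:
    """Build the text shown to the user: a unified diff plus the score trail."""
    diff = "\n".join(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{name} (before)",
            tofile=f"{name} (after)",
            lineterm="",
        )
    )
    trail = " -> ".join(f"{s:g}" for s in history)
    return f"{diff}\nScore: {trail}"


def optimize_with_review(
    skills: Mapping[str, str],
    optimize: Callable[[str], Tuple[str, List[float]]],  # assumed optimizer callback
    confirm: Callable[[str], str] = input,               # swap out for non-interactive use
) -> Dict[str, str]:
    """Optimize skills one at a time, pausing for confirmation after each."""
    results: Dict[str, str] = {}
    for name, text in skills.items():
        new_text, history = optimize(text)
        print(render_review(name, text, new_text, history))
        results[name] = new_text
        # Stop unless the user explicitly approves moving on.
        if confirm("Continue to the next skill? [y/N] ").strip().lower() != "y":
            break
    return results
```

Injecting `confirm` as a parameter keeps the pause testable: a script can pass a function that returns canned answers instead of reading from the terminal.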