<br>I ran a quick experiment investigating how DeepSeek-R1 performs on agentic tasks, despite not supporting tool use natively, and I was quite impressed by the preliminary results. This experiment runs DeepSeek-R1 in a single-agent setup, where the model not only plans the actions but also formulates them as executable Python code. On a subset<sup>1</sup> of the GAIA validation split, DeepSeek-R1 outperforms Claude 3.5 Sonnet by 12.5 percentage points, from 53.1% to 65.6% correct, and other models by an even larger margin:<br>