That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a remarkable flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are interesting beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself.

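A command along these lines does it; Ollama can pull a GGUF straight from a Hugging Face repo. The Q8_0 tag here is an assumption on the quantization, so substitute whichever quant from the repo you actually want:

```sh
# Pull and run the Unsloth GGUF build directly from Hugging Face
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
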
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran a prompt through it.

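A sketch of those two steps, assuming the model ID llm-ollama exposes matches the name Ollama registered for the pull above:

```sh
# Install the plugin, then prompt the Ollama-hosted model via LLM
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
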
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and replies, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working, a 34GB download.

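Presumably the same Hugging Face pull pattern works, assuming Unsloth's 70B GGUF repo follows the same naming; the Q3_K_M tag is an assumption based on the 34GB size, since a 70B model at that quantization comes out around there:

```sh
# Pull and run the 70B distill; pick the quant that fits your RAM
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
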
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well.

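For reference, a sketch of how that prompt would run through LLM, assuming the 70B distill pulled above (with the quant tag guessed earlier) is the model being exercised here:

```sh
# Run the SVG prompt against the locally pulled 70B distill
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
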
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me a new version.

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

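A sketch of that entry; the api_base is DeepSeek's documented endpoint, while model_id and api_key_name are local labels, chosen here to match the llm commands in the next paragraph:

```yaml
# extra-openai-models.yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
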
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

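Spelled out (the example prompt reuses the pelican-and-walrus one from earlier):

```sh
# Store the API key, then prompt the remote model
llm keys set deepseek
llm -m deepseek-reasoner \
  'a joke about a pelican and a walrus who run a tea room together'
```
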
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.