That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
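Ollama can pull GGUF builds directly from Hugging Face. Assuming Unsloth's Q8_0 quantization (my guess at the exact tag, based on the roughly 8.5GB size mentioned below), the command would look like this:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```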
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
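Something along these lines, reusing the same model reference from the Ollama step (the prompt is the one quoted in the next paragraph):

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```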
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
>
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
>
> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
> The Joke:
>
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
>
> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
> Explanation:
>
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is terrible. But the process of getting there was such a fascinating insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
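Presumably a lower-precision quantization to fit in 34GB; Unsloth's Q3_K_M build is about that size, so the command would look something like:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```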
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It didn't do very well.
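Run through LLM, that prompt looks something like this (assuming the same 8B model reference as before):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```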
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
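A configuration along these lines should work; the api_base reflects DeepSeek's documented OpenAI-compatible endpoint:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```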
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
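For example:

```bash
llm keys set deepseek
# paste in the API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```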
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
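If you call the API directly you can see them. A sketch with curl, assuming DeepSeek's OpenAI-compatible chat completions endpoint and that the thinking tokens come back in a separate reasoning_content field:

```bash
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
  }'
# choices[0].message should include both "reasoning_content" (the chain
# of thought) and "content" (the final answer)
```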