That model was trained in part using their unreleased R1 "reasoner" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is an interesting flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to appear. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
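The fetch step can be sketched like this (a sketch assuming Ollama is installed; the repository path is Unsloth's GGUF build on Hugging Face, and which quantization Ollama picks by default is an assumption):

```shell
# Pull and run the 8B distilled model straight from Hugging Face via Ollama.
# This downloads the GGUF weights and drops you into an interactive chat prompt.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
```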
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
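As a sketch of that logging workflow (assuming the llm CLI is installed; `llm logs path` and `llm logs -n` are its standard log-inspection subcommands):

```shell
# Show where llm keeps its SQLite log database
llm logs path
# Review the three most recent logged prompt/response pairs
llm logs -n 3
```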
I installed the plugin and ran it like this:
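Something like the following (a sketch assuming the llm-ollama plugin exposes the model under the same name Ollama registered for it):

```shell
# Install the plugin that teaches llm to talk to Ollama-hosted models
llm install llm-ollama
# Run a prompt against the distilled model through llm
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF' \
  'a joke about a pelican and a walrus who run a tea room together'
```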
Each response starts with a `<think>`...`</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
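For reference, a sketch of the sort of command that pulls a roughly 34GB quantization of the 70B distill (the :Q3_K_M tag is an assumption; pick whichever quant fits your hardware):

```shell
# A 3-bit quantization of the 70B distill comes in at roughly this size
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```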
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com (you'll need to create an account: sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
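The entry looks something like this (a sketch; the model_id and api_key_name values are assumptions you can rename, but api_base needs to point at DeepSeek's endpoint):

```yaml
# Registers DeepSeek's OpenAI-compatible endpoint with the llm CLI
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```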
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
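Put together, the two steps look like this (assuming the model was registered in extra-openai-models.yaml under the name deepseek-reasoner; the prompt is just an illustration):

```shell
# Store the API key once; llm prompts you to paste it
llm keys set deepseek
# Then any prompt can be routed to the hosted R1 model
llm -m deepseek-reasoner 'Explain the Monty Hall problem in two sentences'
```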
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.