That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to appear. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
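Ollama can pull GGUF builds directly from Hugging Face, so the command looks something like this (the Q8_0 quantization tag is my assumption; substitute whichever quant you want):

```bash
# Fetch the GGUF build from Hugging Face and start an interactive chat.
# Q8_0 is assumed here; other quantization tags will also work.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```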
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
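Getting at those logs later is straightforward; for example (flag spellings from memory, check `llm logs --help`):

```bash
# Print the path to the SQLite database where LLM records every prompt/response
llm logs path

# Show the three most recent logged responses (the count is arbitrary)
llm logs -n 3
```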
I installed the plugin and ran it like this:
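Something along these lines; the model identifier matches the Ollama one above, and the Q8_0 tag is again an assumption:

```bash
# Install the plugin that lets LLM talk to models served by Ollama
llm install llm-ollama

# Run a prompt through the distilled R1 model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```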
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of humorous incidents, [...]