We've launched cached models for Runpod Serverless, an easier way to deploy Hugging Face models without baking weights into your container image. When you specify a Hugging Face model during endpoint setup, Runpod will:

- Prefer hosts that already have the model cached locally, so workers can start quickly when the model is already present.
- If the model isn't cached yet on the target host, download it before starting the worker; you aren't billed during the download phase.
Why use cached models

- Workers start serving faster when model weights are already on the machine
- More predictable deployments (no "start worker, then wait on downloads")
- Smaller images (your container can focus on app + runtime, not weights)
- Less redundant storage on a host (multiple workers can reuse the same cached model)
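
If you'd rather script endpoint creation than click through the console, the sketch below shows the general shape of doing this against Runpod's API from Python. Treat it as a hedged illustration: the REST route and the payload fields (especially the model field) are assumptions made for this sketch, not a documented schema, so check the API reference before relying on it. The console flow under Getting started below is the supported path.

    # Hypothetical sketch: create a Serverless endpoint with a Hugging Face
    # model attached, letting Runpod prefer hosts that already cache it.
    # The route and all field names are assumptions, not confirmed API schema.
    import os
    import requests

    API_URL = "https://rest.runpod.io/v1/endpoints"  # assumed route

    payload = {
        "name": "llama-8b-demo",                    # hypothetical endpoint name
        "templateId": "your-template-id",           # your existing worker template
        "gpuTypeIds": ["NVIDIA GeForce RTX 4090"],  # example GPU selection
        # Assumed field mirroring the "Model" box in the endpoint setup UI:
        "model": "meta-llama/Llama-3.1-8B-Instruct",
    }

    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())
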
How this fits with existing workflows

Cached models are a great default when your model lives on Hugging Face (public, or gated/private with an HF token). If your model is not on Hugging Face, or you need a fully self-contained artifact, baked images and network volumes remain supported and can still be the right choice.

Getting started

New endpoints: Serverless → New Endpoint → add your model under Model. Existing endpoints: Manage → Edit Endpoint → Model.

Coming soon: We're building a better way to manage your private and fine-tuned models inside Runpod. Import, version, and deploy without leaving the platform.

Pod migration

When a Pod is stopped, the underlying GPU becomes available to other customers. If that GPU gets claimed before you restart, you're left with the "0 GPUs available" message and have to manually transfer data to a new machine using tools like runpodctl or rsync. That process can take hours.

Pod migration changes that. You can now move your data to a GPU-ready machine in minutes with a single click. When you try to restart a Pod and the original GPU isn't available, just select "Automatically migrate your Pod data" and Runpod handles the rest: it provisions a new Pod with the same config and transfers your Pod volume automatically. No more babysitting file transfers, no more lost momentum.

Faster image ingestion

We've made major improvements to how images are ingested into our registry. After identifying a bottleneck that caused stalls during large layer uploads, we reworked the ingestion pipeline. Images now upload up to 5x faster, with support for layers larger than 25 GB without timing out.

Resemble AI text-to-speech on Public Endpoints

Resemble AI's new text-to-speech model is now available on Runpod Public Endpoints, free to use all week. Clone your voice in 5 seconds, generate speech faster than real time with near-zero latency, and add natural modifiers like sighs, laughs, and coughs.

Monthly recap

This month we added six new public endpoints, including Sora, Granite 4.0, and Cogito 671B. We also launched load balancing for Serverless, giving you granular control over request routing, and made billing improvements to better protect your data during payment interruptions.

Open source roundup

Open source had a big first half of December. Mistral 3 and Nvidia's Nemotron 3 both dropped with full open weights, and Nvidia acquired SchedMD to deepen its investment in Slurm. For teams running training and inference on their own infrastructure, these releases expand what's possible without API constraints.

If you have questions about Runpod or want to reach our sales team, book a call with us.