
evoilutioncast
06.08.2025 09:00

evoilutioncast 45: How to build a solid foundation for AI on VCF?

In this episode of the IT podcast - evoilutioncast, Maciej Lelusz speaks with Frank Denneman - one of the key AI people at VMware by Broadcom. Frank plays a key role in the VCF division, where he shapes the roadmap for VMware Private AI Foundation with NVIDIA and heavily influences the division’s overall AI strategy.

✔️ Want to know whether VCF is an infrastructure for AI?

✔️ Does that put an AI construct in the data center?

✔️The truth is that with AI, nothing is easy, but you can make it easier.

✔️ There are things that should be approved by humans and things that can be done by AI.

✔️ Listen to the conversation to learn about new trends in AI infrastructure, such as RAG and #vector databases, and much more.

🤝 Episode's Partner: VMware by Broadcom

List of contents:

AI on the VCF (VMware Cloud Foundation) platform: what's it all about?

VCF 9 as a local hypervisor?

A SaaS solution as a starter on the Cloud Foundation platform

A platform for both engineers and developers - what is the VCF platform?

Tracking cost and spending on a private cloud platform

Retrieval Augmented Generation (RAG) is the most common use case

The most trending solutions on the market: summarization based on augmented AI in healthcare

The data ingestion pipeline: building a vector database

Similarity search and embedding models: how do they work? (see the sketch after this list)

What does the VCF platform give the organization as an open infrastructure model?

Next steps for VCF? Wider integration and an easier way of consuming the platform - giving the best way to consume the resources that you have
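
The RAG, ingestion-pipeline and similarity-search topics above fit together in one short flow. Below is a minimal, self-contained Python sketch of that flow; the bag-of-words embedding is a toy stand-in for a real embedding model, the in-memory index stands in for a real vector database, and the final LLM call is left out.

```python
# Sketch of the RAG flow: ingest documents into an in-memory "vector
# database", embed a query, run a similarity search, and build an
# augmented prompt for an LLM.
import numpy as np

DOCS = [
    "VCF pools GPUs so multiple teams can share accelerators.",
    "A vector database stores document embeddings for similarity search.",
    "RAG retrieves relevant context before the model generates an answer.",
]

# Build a shared vocabulary, then embed each text as normalized word counts.
vocab = sorted({w for d in DOCS for w in d.lower().split()})

def embed(text: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab.index(word)] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Ingestion pipeline": embed every document up front.
index = np.stack([embed(d) for d in DOCS])

# "Similarity search": cosine similarity, since the vectors are normalized.
query = "how does similarity search over embeddings work?"
scores = index @ embed(query)
best = DOCS[int(np.argmax(scores))]

# "Augmentation": prepend the retrieved context to the user question.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # this prompt would then be sent to the LLM
```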

🔔 Subscribe: https://bit.ly/sub_evoilutioncast


Found 11 results for "GPU"

You have to be aware of the driver, the GPU driver, because there are many libraries that rely on a specific version of that driver.

We mentioned the DLVM, where we detect the GPU driver version in the hypervisor, match it with the GPU in the VM, and then deliver that GPU driver version inside the virtual machine.
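
A minimal sketch of that driver awareness in practice, assuming the nvidia-ml-py (pynvml) bindings are available inside the VM; the minimum version is a made-up example, not an actual product requirement:

```python
# Check the NVIDIA driver version the VM actually sees before loading
# libraries that depend on it.
import pynvml

MIN_DRIVER = (535, 0)  # hypothetical minimum version a library might need

pynvml.nvmlInit()
try:
    raw = pynvml.nvmlSystemGetDriverVersion()
    version = raw.decode() if isinstance(raw, bytes) else raw  # e.g. "550.54.14"
    major, minor = (int(x) for x in version.split(".")[:2])
    if (major, minor) < MIN_DRIVER:
        raise RuntimeError(f"GPU driver {version} is older than required {MIN_DRIVER}")
    print(f"GPU driver {version} satisfies the requirement")
finally:
    pynvml.nvmlShutdown()
```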

The spin-up time for a model, getting it from storage into GPU memory, is much shorter.

Basically: where is it in the model gallery, what is my infrastructure - so essentially we get a VM class: this is the GPU, this is the number of replicas, this is the CPU, and this is the memory.
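
For illustration, such a VM class boils down to a small resource profile. A sketch of what it could look like; the class and field names are assumptions, not the actual VCF schema:

```python
# A named resource profile bundling GPU, replica count, CPU and memory.
from dataclasses import dataclass

@dataclass
class VMClass:
    name: str
    gpu: str        # e.g. "nvidia-a100-40gb"
    replicas: int   # number of model replicas to run
    cpus: int       # vCPUs per replica
    memory_gb: int  # RAM per replica

llm_class = VMClass(name="llm-serving-large", gpu="nvidia-a100-40gb",
                    replicas=2, cpus=16, memory_gb=128)
print(llm_class)
```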

Yes, on that cluster, on that GPU, on that host in the most efficient way.

A common solution, since there are too few GPUs, mainly in enterprises, is to say: I spin up a DLVM with a Jupyter Notebook and, instead of loading the model onto that DLVM, I use a model runtime endpoint.

You spin it up in the model runtime, and every Jupyter notebook environment just runs on a CPU but hits that model endpoint, which consumes a GPU.
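
A minimal sketch of that pattern: a CPU-only notebook calling a shared, GPU-backed model runtime over an OpenAI-compatible API. The endpoint URL and model name are assumptions:

```python
# CPU-only client code: the GPU lives behind the shared runtime endpoint,
# so the notebook never loads the model locally.
import requests

ENDPOINT = "http://model-runtime.internal:8000/v1/chat/completions"  # assumed

payload = {
    "model": "llama-3-8b-instruct",  # whatever the runtime actually serves
    "messages": [{"role": "user", "content": "Summarize our GPU strategy."}],
}
resp = requests.post(ENDPOINT, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```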

But, you know, those infrastructures are usually extremely expensive because of the equipment, the GPUs; generally speaking, it's not the cheapest hobby, let's say.

So you have to figure out a different way of exposing the GPU or the model, the consuming factor, and then see what other services you can allow to consume it.

That means when you call the application, you start hooking it into the model API endpoint, into a model running on one or many GPUs.

Like, oh, you have a 16,000 GPU cluster or a 24,000 GPU cluster to build large models.