How to Deploy Open-source LLMs on Azure and GCP?
Azure
To use open-source models like Llama or Mistral with allms, you first have to deploy them yourself on Azure as an ML Online Endpoint. Here's how to do it:
- Go to ml.azure.com and use a subscription with a workspace that has access to the Model catalog.
- On the left, click Model catalog, then under Introducing Llama 2 click View models.
- Click the model you want to deploy.
- Click Deploy -> Real-time endpoint.
- Select Skip Azure AI Content Safety and click Proceed.
- Select a virtual machine and click Deploy. You must have sufficient quota to deploy the models.
- In the menu on the left, click Endpoints and select the endpoint you've just created.
- After the deployment is complete, you'll see the Consume tab, where the endpoint URL and authentication key are provided.
- Now you can start using the model by configuring it as in the example below:
```python
from allms.models import AzureLlama2Model
from allms.domain.configuration import AzureSelfDeployedConfiguration

configuration = AzureSelfDeployedConfiguration(
    api_key="<AZURE_API_KEY>",
    endpoint_url="<AZURE_ENDPOINT_URL>",
    deployment="<AZURE_DEPLOYMENT_NAME>"
)

llama_model = AzureLlama2Model(config=configuration)
llama_response = llama_model.generate("2+2 is?")
```
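Before wiring the endpoint into allms, it can help to sanity-check it directly over HTTP. The sketch below assumes a standard Azure ML managed online endpoint, which authenticates with the key from the Consume tab sent as a bearer token; the payload shape (`input_data`/`input_string`) is the one used by Llama 2 text-generation deployments, but verify it against the sample request shown in your endpoint's Consume tab. The URL, key, and generation parameters here are hypothetical placeholders.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    # Payload shape used by Llama 2 text-generation deployments;
    # check your endpoint's Consume tab for the exact schema.
    body = {
        "input_data": {
            "input_string": [prompt],
            "parameters": {"max_new_tokens": 64, "temperature": 0.7},
        }
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Managed online endpoints expect the key as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Substitute the real URL and key from the Consume tab, then send the
# request with urllib.request.urlopen(request).
request = build_scoring_request(
    "https://my-endpoint.westeurope.inference.ml.azure.com/score",  # hypothetical
    "my-azure-api-key",  # hypothetical
    "2+2 is?",
)
```

If the raw request succeeds, the same URL and key can be plugged into `AzureSelfDeployedConfiguration` above.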
In case of any problems with deployment, you can review this guide on the Azure blog: Introducing Llama 2 on Azure
GCP
Follow this guide to deploy a model on the GCP Vertex AI Model Garden.
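Once the model is deployed in the Model Garden, configuring it in allms follows the same pattern as the Azure example above. The snippet below is a sketch only: the class names (`VertexAIModelGardenConfiguration`, `VertexAIGemmaModel`) and parameter names are assumptions based on allms's self-deployed Vertex AI support, so confirm them against the library's documentation before use.

```python
from allms.models import VertexAIGemmaModel
from allms.domain.configuration import VertexAIModelGardenConfiguration

# Placeholders: fill in your GCP project, the region the endpoint
# was deployed to, and the endpoint ID from the Vertex AI console.
configuration = VertexAIModelGardenConfiguration(
    cloud_project="<GCP_PROJECT_ID>",
    cloud_location="<MODEL_REGION>",
    endpoint_id="<ENDPOINT_ID>"
)

model = VertexAIGemmaModel(config=configuration)
response = model.generate("2+2 is?")
```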