How to Deploy Open-source LLMs on Azure and GCP?
Azure
To use open-source models like Llama 2 or Mistral with allms, you first have to deploy them yourself on Azure as an ML Online Endpoint. Here's how to do it:
- Go to ml.azure.com and use a subscription with a workspace that has access to the *Model catalog*.
- On the left, click *Model catalog*, then under *Introducing Llama 2* click *View models*.
- Click the model you want to deploy.
- Click *Deploy -> Real-time endpoint*.
- Select *Skip Azure AI Content Safety* and click *Proceed*.
- Select a virtual machine and click *Deploy*. You must have sufficient quota to deploy the models.
- In the menu on the left, click *Endpoints* and select the endpoint you've just created.
- After the deployment is complete, the *Consume* tab will show the endpoint URL and the authentication key.
- Now you can start using the model by configuring it as in the example below:
```python
from allms.models import AzureLlama2Model
from allms.domain.configuration import AzureSelfDeployedConfiguration

configuration = AzureSelfDeployedConfiguration(
    api_key="<AZURE_API_KEY>",
    endpoint_url="<AZURE_ENDPOINT_URL>",
    deployment="<AZURE_DEPLOYMENT_NAME>"
)

llama_model = AzureLlama2Model(config=configuration)
llama_response = llama_model.generate("2+2 is?")
```
In case of any problems with deployment, you can review this guide on the Azure blog: Introducing Llama 2 on Azure
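Before wiring the endpoint into allms, it can be useful to check that it responds at all. Below is a minimal standard-library sketch, assuming the usual Azure ML online-endpoint contract (Bearer-token authentication, JSON body posted to the scoring URL); the exact payload schema depends on the model you deployed, so the `input_data` shape shown here is an illustrative assumption.

```python
import json
import urllib.request


def build_invoke_request(endpoint_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request for an Azure ML online endpoint.

    Azure ML online endpoints expect a Bearer token in the Authorization
    header; the JSON payload schema depends on the deployed model.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values -- substitute your real endpoint URL and key,
# then send the request with urllib.request.urlopen(request).
request = build_invoke_request(
    "https://<AZURE_ENDPOINT_URL>",
    "<AZURE_API_KEY>",
    {"input_data": {"input_string": ["2+2 is?"]}},
)
```

Building the `Request` object performs no network I/O, so you can inspect the headers and body before actually calling the endpoint.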
GCP
Follow this guide to deploy a model on the GCP Vertex AI Model Garden.
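Once the model is deployed in the Model Garden, it can be plugged into allms much like the Azure example above. The sketch below is a hedged illustration: the class and parameter names (`VertexAIGemmaModel`, `VertexAIModelGardenConfiguration`, `endpoint_id`) are assumptions modelled on allms' Azure configuration style, so check them against the allms API reference before use.

```python
from allms.models import VertexAIGemmaModel
from allms.domain.configuration import VertexAIModelGardenConfiguration

# Placeholder values: fill in your project, region, and the endpoint ID
# shown in the Vertex AI console after the Model Garden deployment.
configuration = VertexAIModelGardenConfiguration(
    cloud_project="<GCP_PROJECT_ID>",
    cloud_location="<MODEL_REGION>",
    endpoint_id="<ENDPOINT_ID>"
)

gemma_model = VertexAIGemmaModel(config=configuration)
gemma_response = gemma_model.generate("2+2 is?")
```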