DISCLAIMER: Image is generated using ChatGPT.
1. Introduction
2. What’s Ollama?
3. Install Ollama
4. Run Ollama
5. FastAPI
6. Mock Bedrock - Python
7. Mock Bedrock - Perl
8. Uvicorn
9. Mock Test
Introduction
Recently I was introduced to AWS Bedrock at work. Ever since, I’ve had many questions around it. In this post, I am sharing my first encounter.

AWS Bedrock is a fully managed service offered by Amazon Web Services. It provides access to high-performing Foundation Models from leading AI companies through a single API.

Using AWS Bedrock, we can build and scale generative AI applications without managing infrastructure. It provides access to a variety of Foundation Models via API, for example:
Anthropic (Claude)
AI21 Labs (Jurassic)
Meta (Llama 2 & Llama 3)
Mistral
Cohere
Stability AI
Amazon Titan
The Bedrock endpoints are regional, e.g. https://bedrock.{region}.amazonaws.com for the control plane and https://bedrock-runtime.{region}.amazonaws.com for model invocation.
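As a side note, if you just want to see which Foundation Models are available in a region, the control-plane client can list them. A minimal sketch, assuming your AWS credentials already have Bedrock access:

import boto3

# Control-plane client: used for listing and managing models, not for invocation
bedrock = boto3.client('bedrock', region_name='us-west-2')

# Print the model identifiers available in this region
for model in bedrock.list_foundation_models()['modelSummaries']:
    print(model['modelId'])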
Below is a simple example invoking the Mistral 7B model:
import boto3

# Model invocation goes through the bedrock-runtime client
bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')

response = bedrock.invoke_model(
    modelId='mistral.mistral-7b-instruct-v0:2',
    contentType='application/json',
    accept='application/json',
    body=b'''
    {
        "prompt": "<s>[INST] What is the capital of India? [/INST]",
        "max_tokens": 100,
        "temperature": 0.7
    }
    '''
)

print(response['body'].read().decode('utf-8'))
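Rather than printing the raw body, you can decode it and pull out just the generated text. The exact shape of the Mistral payload (an outputs list with a text field) is my assumption here, so treat this as a sketch:

import json

# The streaming body can only be read once, so capture it first
raw = response['body'].read()
payload = json.loads(raw)

# Assumption: Mistral models on Bedrock respond with {"outputs": [{"text": "...", ...}]}
print(payload['outputs'][0]['text'])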
AWS Bedrock is not included in the general Free Tier.

I have used LocalStack for AWS services, e.g. S3, Lambda, DynamoDB, but unfortunately it doesn’t support AWS Bedrock yet.

Having said that, we can build a mock Bedrock stack locally using Ollama.
What’s Ollama?
Ollama is an open-source tool designed to run Large Language Models (LLMs) locally on your machine.

It simplifies the process of downloading, managing, and interacting with models like Llama 3, Mistral and Gemma without requiring cloud services.
Install Ollama
$ curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
$ ollama --version
ollama version is 0.9.0
Run Ollama
This starts the background server, if not already running, and opens an interactive chat session.

$ ollama run mistral

To exit, press Ctrl + d.

The model API can be reached at http://localhost:11434.
To list the installed models, try this:
$ ollama list
NAME              ID              SIZE      MODIFIED
mistral:latest    f974a74358d6    4.1 GB    5 minutes ago
Test if the server is running:
$ curl http://localhost:11434
Ollama is running
On system reboot, you need to start the server in the background again.
$ ollama serve &
FastAPI
FastAPI is a web framework for building APIs in Python.

You need to activate a virtual environment before installing fastapi.

(myenv) $ pip install fastapi

We need to install the requests package too.

(myenv) $ pip install requests
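Before putting FastAPI in front of it, it is worth confirming that the Ollama API itself responds. A minimal sketch using requests against the /api/generate endpoint (the same endpoint the mock below talks to):

import requests

# Ask the local Ollama server for a non-streaming completion from the mistral model
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "mistral",
    "prompt": "What is the capital of India?",
    "stream": False
})

print(response.json()["response"])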
Mock Bedrock - Python
File: mock_bedrock.py
#!/usr/bin/env python3

import requests
from fastapi import FastAPI, Request

app = FastAPI()

OLLAMA_URL = "http://localhost:11434/api/generate"

# Mimic the Bedrock InvokeModel route: POST /model/{modelId}/invoke
@app.post("/model/ollama.mistral/invoke")
async def mock_bedrock(request: Request):
    body = await request.json()
    prompt = body.get("input", "")

    # Forward the prompt to the local Ollama server
    response = requests.post(OLLAMA_URL, json={
        "model": "mistral",
        "prompt": prompt,
        "stream": False
    })

    result = response.json()
    return {
        "model": "ollama.mistral",
        "response": result.get("response", "")
    }
Mock Bedrock - Perl
A simple PSGI (Perl Server Gateway Interface) application using Dancer2.
File: mock_bedrock.pl
#!/usr/bin/env perl

use v5.38;
use Dancer2;
use boolean ();
use HTTP::Tiny;
use JSON::MaybeXS ();

my $json_encoder = JSON::MaybeXS->new(allow_blessed => 1, convert_blessed => 1);
my $ollama_url   = 'http://localhost:11434/api/generate';

# Mimic the Bedrock InvokeModel route: POST /model/{modelId}/invoke
post '/model/ollama.mistral/invoke' => sub {
    my $data   = request->body;
    my $json   = JSON::MaybeXS::decode_json($data);
    my $prompt = $json->{input} // '';

    # Forward the prompt to the local Ollama server
    my $http     = HTTP::Tiny->new;
    my $response = $http->post($ollama_url, {
        headers => { 'Content-Type' => 'application/json' },
        content => $json_encoder->encode({
            model  => 'mistral',
            prompt => $prompt,
            stream => boolean::false,
        }),
    });

    if ($response->{success}) {
        my $res_json = JSON::MaybeXS::decode_json($response->{content});
        content_type 'application/json';
        return $json_encoder->encode({
            model    => 'ollama.mistral',
            response => $res_json->{response} // ''
        });
    }
    else {
        status '500';
        content_type 'application/json';
        return $json_encoder->encode({ error => "Failed to contact Ollama API" });
    }
};

return Dancer2->psgi_app;
Uvicorn
Uvicorn is a lightweight ASGI (Asynchronous Server Gateway Interface) server for Python.

Activate the virtual environment before installing uvicorn.
(myenv) $ pip install uvicorn
Check the version:
$ uvicorn --version
Running uvicorn 0.27.1 with CPython 3.12.3 on Linux
Mock Test
Let’s start the ASGI server.
$ python3 -B -m uvicorn mock_bedrock:app --reload
INFO: Will watch for changes in these directories: ['/home/manwar/playground/bedrock']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [57429] using StatReload
INFO: Started server process [57431]
INFO: Waiting for application startup.
INFO: Application startup complete.
Now in another terminal, make a call to the server.
$ curl -X POST http://localhost:8000/model/ollama.mistral/invoke \
-H "Content-Type: application/json" \
-d '{"input": "What is the capital of India?"}'
{"model":"ollama.mistral","response":" The capital of India is New Delhi. Despite popular belief, it's important to note that Mumbai (formerly known as Bombay) is not the capital city, but rather its financial and entertainment hub. This common misconception might stem from the fact that prior to 1947, Bombay was the capital of British India, while New Delhi was just a small town outside of Delhi."}
Stop the ASGI server and start the PSGI server with Plack (a PSGI toolkit). Plack is the most widely used PSGI implementation.
$ plackup -p 8000 mock_bedrock.pl
HTTP::Server::PSGI: Accepting connections at http://0:8000/
Let’s make the same call again.
$ curl -X POST http://localhost:8000/model/ollama.mistral/invoke \
-H "Content-Type: application/json" \
-d '{"input": "What is the capital of India?"}'
{"model":"ollama.mistral","response":" The capital of India is New Delhi. However, it's important to note that New Delhi is not a state or union territory but serves as the administrative capital for the country. India has 28 states and 9 union territories."}
Happy Hacking !!!