DISCLAIMER: Image is generated using ChatGPT.
1. Introduction
2. What’s Ollama?
3. Install Ollama
4. Run Ollama
5. FastAPI
6. Mock Bedrock - Python
7. Mock Bedrock - Perl
8. Uvicorn
9. Mock Test
Introduction
Recently I was introduced to AWS Bedrock at work. Ever since, I’ve had many questions around it. In this post, I am sharing my first encounter.

AWS Bedrock is a fully managed service offered by Amazon Web Services. It provides access to high-performing Foundation Models from leading AI companies through a single API.

Using AWS Bedrock, we can build and scale generative AI applications without managing infrastructure. It provides access to a variety of Foundation Models via API, for example:
Anthropic (Claude)
AI21 Labs (Jurassic)
Meta (Llama 2 & Llama 3)
Mistral
Cohere
Stability AI
Amazon Titan
The Bedrock endpoints are regional, e.g. https://bedrock.{region}.amazonaws.com for the control plane and https://bedrock-runtime.{region}.amazonaws.com for model invocation.
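As a side note, if you just want to see which Foundation Models are available in a region, the control-plane client can list them. A minimal sketch, assuming your AWS credentials already have Bedrock access:

import boto3

# Control-plane client: used for listing and managing models, not for invocation
bedrock = boto3.client('bedrock', region_name='us-west-2')

# Print the model identifiers available in this region
for model in bedrock.list_foundation_models()['modelSummaries']:
    print(model['modelId'])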
Below is a simple example invoking the Mistral 7B model:
import boto3

# Model invocation goes through the bedrock-runtime client
bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')

response = bedrock.invoke_model(
    modelId='mistral.mistral-7b-instruct-v0:2',
    contentType='application/json',
    accept='application/json',
    body=b'''
    {
        "prompt": "<s>[INST] What is the capital of India? [/INST]",
        "max_tokens": 100,
        "temperature": 0.7
    }
    '''
)

print(response['body'].read().decode('utf-8'))
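Rather than printing the raw body, you can decode it and pull out just the generated text. The exact shape of the Mistral payload (an outputs list with a text field) is my assumption here, so treat this as a sketch:

import json

# The streaming body can only be read once, so capture it first
raw = response['body'].read()
payload = json.loads(raw)

# Assumption: Mistral models on Bedrock respond with {"outputs": [{"text": "...", ...}]}
print(payload['outputs'][0]['text'])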
AWS Bedrock is not included in the general Free Tier.

I have used LocalStack for AWS services, e.g. S3, Lambda, DynamoDB, but unfortunately it doesn’t support AWS Bedrock yet.

Having said that, we can build a mock Bedrock stack locally using Ollama.
What’s Ollama?
Ollama is an open-source tool designed to run Large Language Models (LLMs) locally on your machine.

It simplifies the process of downloading, managing, and interacting with models like Llama 3, Mistral and Gemma without requiring cloud services.
Install Ollama
$ curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
$ ollama --version
ollama version is 0.9.0
Run Ollama
This starts the background server, if not already running, and opens an interactive chat session.

$ ollama run mistral

To exit, press Ctrl + d.

The model API can be reached at http://localhost:11434.
To list the installed models, try this:
$ ollama list
NAME              ID              SIZE      MODIFIED
mistral:latest    f974a74358d6    4.1 GB    5 minutes ago
Test if the server is running:
$ curl http://localhost:11434
Ollama is running
On system reboot, you need to start the server in the background again.
$ ollama serve &
FastAPI
FastAPI is a web framework for building APIs in Python.

You need to activate a virtual environment before installing fastapi.

(myenv) $ pip install fastapi

We need to install the requests package too.

(myenv) $ pip install requests
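Before putting FastAPI in front of it, it is worth confirming that the Ollama API itself responds. A minimal sketch using requests against the /api/generate endpoint (the same endpoint the mock below talks to):

import requests

# Ask the local Ollama server for a non-streaming completion from the mistral model
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "mistral",
    "prompt": "What is the capital of India?",
    "stream": False
})

print(response.json()["response"])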
Mock Bedrock - Python
File: mock_bedrock.py
#!/usr/bin/env python3

import requests
from fastapi import FastAPI, Request

app = FastAPI()

OLLAMA_URL = "http://localhost:11434/api/generate"

# Mimic the Bedrock InvokeModel route: POST /model/{modelId}/invoke
@app.post("/model/ollama.mistral/invoke")
async def mock_bedrock(request: Request):
    body = await request.json()
    prompt = body.get("input", "")

    # Forward the prompt to the local Ollama server
    response = requests.post(OLLAMA_URL, json={
        "model": "mistral",
        "prompt": prompt,
        "stream": False
    })

    result = response.json()
    return {
        "model": "ollama.mistral",
        "response": result.get("response", "")
    }
Mock Bedrock - Perl
A simple PSGI (Perl Server Gateway Interface) application using Dancer2.
File: mock_bedrock.pl
#!/usr/bin/env perl

use v5.38;
use Dancer2;
use boolean ();
use HTTP::Tiny;
use JSON::MaybeXS ();

my $json_encoder = JSON::MaybeXS->new(allow_blessed => 1, convert_blessed => 1);
my $ollama_url   = 'http://localhost:11434/api/generate';

# Mimic the Bedrock InvokeModel route: POST /model/{modelId}/invoke
post '/model/ollama.mistral/invoke' => sub {
    my $data   = request->body;
    my $json   = JSON::MaybeXS::decode_json($data);
    my $prompt = $json->{input} // '';

    # Forward the prompt to the local Ollama server
    my $http     = HTTP::Tiny->new;
    my $response = $http->post($ollama_url, {
        headers => { 'Content-Type' => 'application/json' },
        content => $json_encoder->encode({
            model  => 'mistral',
            prompt => $prompt,
            stream => boolean::false,
        }),
    });

    if ($response->{success}) {
        my $res_json = JSON::MaybeXS::decode_json($response->{content});
        content_type 'application/json';
        return $json_encoder->encode({
            model    => 'ollama.mistral',
            response => $res_json->{response} // ''
        });
    }
    else {
        status '500';
        content_type 'application/json';
        return $json_encoder->encode({ error => "Failed to contact Ollama API" });
    }
};

return Dancer2->psgi_app;
Uvicorn
Uvicorn is a lightweight ASGI (Asynchronous Server Gateway Interface) server for Python.

Activate the virtual environment before installing uvicorn.
(myenv) $ pip install uvicorn
Check the version:
$ uvicorn --version
Running uvicorn 0.27.1 with CPython 3.12.3 on Linux
Mock Test
Let’s start the ASGI server.
$ python3 -B -m uvicorn mock_bedrock:app --reload
INFO: Will watch for changes in these directories: ['/home/manwar/playground/bedrock']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [57429] using StatReload
INFO: Started server process [57431]
INFO: Waiting for application startup.
INFO: Application startup complete.
Now in another terminal, make a call to the server.
$ curl -X POST http://localhost:8000/model/ollama.mistral/invoke \
-H "Content-Type: application/json" \
-d '{"input": "What is the capital of India?"}'
{"model":"ollama.mistral","response":" The capital of India is New Delhi. Despite popular belief, it's important to note that Mumbai (formerly known as Bombay) is not the capital city, but rather its financial and entertainment hub. This common misconception might stem from the fact that prior to 1947, Bombay was the capital of British India, while New Delhi was just a small town outside of Delhi."}
Stop the ASGI server and start the PSGI server with Plack (a PSGI toolkit). Plack is the most widely used PSGI implementation.
$ plackup -p 8000 mock_bedrock.pl
HTTP::Server::PSGI: Accepting connections at http://0:8000/
Let’s make the same call again.
$ curl -X POST http://localhost:8000/model/ollama.mistral/invoke \
-H "Content-Type: application/json" \
-d '{"input": "What is the capital of India?"}'
{"model":"ollama.mistral","response":" The capital of India is New Delhi. However, it's important to note that New Delhi is not a state or union territory but serves as the administrative capital for the country. India has 28 states and 9 union territories."}
Happy Hacking !!!