<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="https://ivanpua.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ivanpua.com/" rel="alternate" type="text/html" /><updated>2025-01-26T21:33:14+11:00</updated><id>https://ivanpua.com/feed.xml</id><title type="html">Ivan Pua</title><subtitle>A blog about startups and AI</subtitle><author><name>Ivan</name></author><entry><title type="html">How to Fix OOM Errors in Spark</title><link href="https://ivanpua.com/data-engineering/fix-oom/" rel="alternate" type="text/html" title="How to Fix OOM Errors in Spark" /><published>2025-01-26T15:00:00+11:00</published><updated>2025-01-26T15:00:00+11:00</updated><id>https://ivanpua.com/data-engineering/fix-oom</id><content type="html" xml:base="https://ivanpua.com/data-engineering/fix-oom/"><![CDATA[<h2 id="what-is-an-oom-error">What is an OOM Error?</h2>

<p>An Out of Memory (OOM) error in Apache Spark occurs when either the driver or executors exceed the memory allocated to them. This typically happens when the memory requirements of your Spark job surpass the configured limits.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/oom/error.png" alt="" />
  <figcaption>Example of an Out of Memory error message.</figcaption>
</figure>

<h2 id="how-to-confirm-its-an-oom-error">How to Confirm It’s an OOM Error?</h2>

<p>Sometimes, the cause of failure is not explicitly labeled as an OOM error. For instance, with Spark 2.x on AWS Glue, you might encounter a subtle error message like this:</p>

<p><code class="language-plaintext highlighter-rouge">An error occurred while calling o71.sql. error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@74230e8e : No space left on device</code>.</p>

<p>On AWS Glue, you can use metrics such as the <strong>Memory Profile Graph</strong> to monitor memory usage for both the driver and executors. If some executor memory graphs end prematurely compared to others, it’s a strong indicator of an OOM error. Refer to the <a href="https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html">AWS Glue documentation</a> for more details.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/oom/mem-profile.png" alt="" />
  <figcaption>Memory profile graph in AWS Glue console showing executors ending prematurely.</figcaption>
</figure>

<h2 id="causes-and-solutions-for-oom-errors">Causes and Solutions for OOM Errors</h2>

<h3 id="quick-fixes">Quick Fixes</h3>
<ol>
  <li>Upgrade the Cluster: Start by selecting a cluster with larger memory.</li>
  <li>Adjust Memory Settings: Configure memory settings for both the driver and executors, as detailed in my previous <a href="/data-engineering/optimising-spark/">post</a>.</li>
  <li>Leverage Adaptive Query Execution (AQE): For Spark 3.0+, enable AQE to dynamically optimize query execution, as shown in the snippet below.</li>
</ol>
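
<p>A minimal example of enabling AQE at runtime (the exact sub-features available depend on your Spark 3.x version):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Enable Adaptive Query Execution (Spark 3.0+)
spark.conf.set("spark.sql.adaptive.enabled", "true")

# AQE can then coalesce small shuffle partitions and rebalance skewed ones
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
</code></pre></div></div>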

<h3 id="oom-in-executors">OOM in Executors</h3>

<h4 id="1-data-skew">1. <strong>Data Skew</strong></h4>
<p>Certain partitions might be disproportionately large compared to others, causing the associated executors to run out of memory.<br />
<strong>Solution</strong>: Refer to the “Handling Skew in Spark” section in the <a href="/data-engineering/optimising-spark/">previous blog post</a>.</p>

<h4 id="2-too-few-partitions">2. <strong>Too Few Partitions</strong></h4>
<p>As the data volume increases, keeping the number of shuffle partitions constant can result in larger partition sizes. This can exhaust the executor’s memory and even the local disk during intermediate stages.<br />
<strong>Solution</strong>: Increase the number of shuffle partitions by setting:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">spark</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="s">"spark.sql.shuffle.partitions"</span><span class="p">,</span> <span class="o">&lt;</span><span class="n">new_number</span><span class="o">&gt;</span><span class="p">)</span>
</code></pre></div></div>
<h4 id="3-too-many-cores-per-executor">3. Too Many Cores per Executor</h4>
<p>The number of cores determines how many tasks an executor can run in parallel. While more cores can speed up execution, they also reduce the memory available to each task.<br />
<strong>Solution</strong>: Reduce <code class="language-plaintext highlighter-rouge">spark.executor.cores</code>. The optimal range is typically 4–6 cores per executor.</p>

<h3 id="oom-in-driver">OOM in Driver</h3>
<h4 id="1-dfcollect">1. <code class="language-plaintext highlighter-rouge">df.collect()</code></h4>

<p>When using <code class="language-plaintext highlighter-rouge">collect()</code>, data from all executors is sent to the driver, potentially overwhelming its memory.<br />
<strong>Solutions</strong>:</p>
<ul>
  <li>Pull data one partition at a time with <code class="language-plaintext highlighter-rouge">toLocalIterator()</code> (optionally after <code class="language-plaintext highlighter-rouge">repartition()</code>), rather than collecting everything at once.</li>
  <li>Raise <code class="language-plaintext highlighter-rouge">spark.driver.maxResultSize</code>, the cap on the total serialized result size the driver will accept:
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">spark</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="s">"spark.driver.maxResultSize"</span><span class="p">,</span> <span class="s">"2g"</span><span class="p">)</span>  <span class="c1"># Example
</span></code></pre></div>    </div>
  </li>
  <li>Avoid using <code class="language-plaintext highlighter-rouge">collect()</code> whenever possible, and instead write the data to external storage, as sketched below.</li>
</ul>
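
<p>As a sketch of the last two points, assuming a DataFrame <code class="language-plaintext highlighter-rouge">df</code> and an illustrative output path of your own:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Instead of rows = df.collect(), which pulls everything onto the driver:

# Stream one partition at a time when you truly need rows on the driver
for row in df.toLocalIterator():
    handle(row)  # handle() is a hypothetical per-row function

# Better still, keep the data distributed and write it out
df.write.mode("overwrite").parquet("s3://your-bucket/output/")  # path is illustrative
</code></pre></div></div>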

<h4 id="2-broadcast-joins">2. Broadcast Joins</h4>
<p>Broadcasting a table requires the driver to materialize the table in memory before sending it to the executors. If the table is too large, or if multiple tables are broadcast simultaneously, OOM errors can occur.<br />
<strong>Solutions</strong>:</p>
<ul>
  <li>Increase driver memory with <code class="language-plaintext highlighter-rouge">spark.driver.memory</code>.</li>
  <li>Lower <code class="language-plaintext highlighter-rouge">spark.sql.autoBroadcastJoinThreshold</code> to avoid broadcasting excessively large tables (see below for disabling it entirely):
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">spark</span><span class="p">.</span><span class="n">conf</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="s">"spark.sql.autoBroadcastJoinThreshold"</span><span class="p">,</span> <span class="s">"10MB"</span><span class="p">)</span>
</code></pre></div>    </div>
  </li>
</ul>
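
<p>If a plan keeps broadcasting a table that is too large, you can also switch automatic broadcasting off entirely and let Spark fall back to a shuffle join; setting the threshold to <code class="language-plaintext highlighter-rouge">-1</code> is the documented way to disable it:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Disable automatic broadcast joins; Spark falls back to sort-merge joins
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
</code></pre></div></div>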

<h2 id="pro-tips-to-avoid-oom">Pro Tips to Avoid OOM</h2>
<ol>
  <li>Monitor Memory Usage: <a href="https://stackoverflow.com/questions/40022599/spark-how-to-monitor-the-memory-consumption-on-spark-cluster">Use the Spark UI to track memory consumption.</a></li>
  <li>Avoid Excessive Shuffles: Keep shuffle operations minimal, as they are memory-intensive.</li>
</ol>]]></content><author><name>Ivan</name></author><category term="data-engineering" /><category term="data-engineering" /><summary type="html"><![CDATA[A detailed guide on understanding and resolving Out of Memory (OOM) errors in Apache Spark.]]></summary></entry><entry><title type="html">Optimising Spark - Joins, Shuffle, and Skew</title><link href="https://ivanpua.com/data-engineering/optimising-spark/" rel="alternate" type="text/html" title="Optimising Spark - Joins, Shuffle, and Skew" /><published>2025-01-26T11:00:00+11:00</published><updated>2025-01-26T11:00:00+11:00</updated><id>https://ivanpua.com/data-engineering/optimising-spark</id><content type="html" xml:base="https://ivanpua.com/data-engineering/optimising-spark/"><![CDATA[<h2 id="what-is-spark">What is Spark?</h2>

<p>Apache Spark is a distributed computing engine designed for processing large datasets efficiently. It provides multiple query engines: the RDD API (the foundational, low-level abstraction), DataFrame and Dataset APIs available in various programming languages, and Spark SQL for working with structured data using SQL syntax. For a detailed explanation of query engines, check out my <a href="/data-engineering/query-engines/">previous post</a>.</p>

<p>Spark is significantly faster than its predecessor, Hive, which relies primarily on disk-based storage. Spark excels by leveraging in-memory processing, reducing reliance on disk I/O; it spills to disk only when data cannot fit in memory. Optimizing memory usage is therefore crucial, as excessive disk use degrades Spark’s performance and makes it behave like Hive.</p>

<h2 id="spark-architecture">Spark Architecture</h2>

<p>Spark operates with three primary components: the driver, the executors, and the cluster manager. Together they execute a lazily evaluated plan:</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/optimising_spark/driver_executor.png" alt="" />
  <figcaption>Relationship between driver and workers (executors) </figcaption>
</figure>

<h3 id="plan">Plan</h3>
<ul>
  <li><strong>Lazily evaluated</strong>: Transformations only build up the plan; execution occurs when an action is triggered, such as <code class="language-plaintext highlighter-rouge">df.collect()</code> or a write.</li>
</ul>

<h3 id="driver">Driver</h3>
<ul>
  <li>Acts as the “Coach” or the “brain” of the application.</li>
  <li>Determines when to stop lazy evaluation, decides how to join datasets, and sets the level of parallelism for each step.</li>
  <li>Key settings:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">spark.driver.memory</code>:
        <ul>
          <li>Allocates memory to the driver process.</li>
          <li>Low values can lead to disk spills or out-of-memory errors.</li>
          <li>Default: 1GB in open-source Spark. Increase for complex queries (up to 16GB, depending on your workload).</li>
        </ul>
      </li>
      <li><code class="language-plaintext highlighter-rouge">spark.driver.memoryOverheadFactor</code>:
        <ul>
          <li>Fraction of driver memory reserved for non-heap usage such as JVM overhead (default 0.10).</li>
          <li>Increase this value for complex plans that require more processing.</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<h3 id="executors">Executors</h3>
<ul>
  <li>Act as the “Players” that execute tasks assigned by the Driver.</li>
  <li>Key settings:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">spark.executor.memory</code>:
        <ul>
          <li>Memory allocated to each executor.</li>
          <li>Low values can cause disk spills or out-of-memory errors.</li>
          <li>Test with different values (e.g., 2GB, 4GB, 8GB) to find the optimum configuration.</li>
        </ul>
      </li>
      <li><code class="language-plaintext highlighter-rouge">spark.executor.cores</code>:
        <ul>
          <li>Determines the number of tasks each executor can run in parallel.</li>
          <li>It is capped by the number of physical cores available on each worker node.</li>
          <li>Optimal range: 4–6 cores per executor. Higher values may lead to out-of-memory errors.</li>
        </ul>
      </li>
      <li><code class="language-plaintext highlighter-rouge">spark.executor.memoryOverheadFactor</code>:
        <ul>
          <li>Fraction of executor memory reserved for non-heap usage, such as Python UDF execution (default 0.10).</li>
          <li>Increase for workloads with many complex UDFs.</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
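
<p>The driver and executor settings above must be fixed before the application starts. Here is a minimal sketch of wiring them up through the session builder (the values are illustrative starting points, not recommendations; in client mode, <code class="language-plaintext highlighter-rouge">spark.driver.memory</code> may need to be passed to <code class="language-plaintext highlighter-rouge">spark-submit</code> instead):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from pyspark.sql import SparkSession

# Illustrative values; tune per workload
spark = (
    SparkSession.builder
    .appName("tuned-job")
    .config("spark.driver.memory", "4g")
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memoryOverheadFactor", "0.2")  # Spark 3.3+
    .getOrCreate()
)
</code></pre></div></div>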

<h3 id="cluster-manager">Cluster Manager</h3>
<ul>
  <li>Acts as the “Manager” of the team.</li>
  <li>Allocates resources to Spark applications and manages executors.</li>
  <li>Examples: Kubernetes, Hadoop YARN.</li>
</ul>

<h2 id="types-of-joins-in-spark">Types of Joins in Spark</h2>

<h3 id="shuffle-sort-merge-join">Shuffle Sort-Merge Join</h3>
<ul>
  <li><strong>Default join strategy</strong> since Spark 2.3.</li>
  <li>Suitable for joining two large datasets.</li>
  <li>Example:
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">result</span> <span class="o">=</span> <span class="n">df1</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">df2</span><span class="p">,</span> <span class="s">"id"</span><span class="p">)</span>
</code></pre></div>    </div>
  </li>
</ul>

<h3 id="2-broadcast-hash-join">2. Broadcast Hash Join</h3>
<ul>
  <li>Faster as it avoids shuffling.</li>
  <li>Best when one side of the join is small enough to fit in memory.</li>
  <li>Controlled by <code class="language-plaintext highlighter-rouge">spark.sql.autoBroadcastJoinThreshold</code> (default: 10MB). The recommended range is 1MB to 1GB.</li>
  <li>Example:
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">pyspark.sql.functions</span> <span class="kn">import</span> <span class="n">broadcast</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">dfLarge</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">broadcast</span><span class="p">(</span><span class="n">dfSmall</span><span class="p">),</span> <span class="s">"id"</span><span class="p">)</span>
</code></pre></div>    </div>
  </li>
</ul>

<h3 id="3-bucket-join">3. Bucket Join</h3>
<ul>
  <li>Faster as it avoids shuffling by pre-bucketing tables.</li>
  <li>Ideal for queries with multiple joins or aggregations.</li>
  <li>Tables are bucketed by a key (e.g. <code class="language-plaintext highlighter-rouge">user_id</code>) and divided into buckets via modulus operation.</li>
  <li>Buckets of one table align with those of another (e.g., <code class="language-plaintext highlighter-rouge">bucket1</code> of table A matches <code class="language-plaintext highlighter-rouge">bucket1</code> of table B).</li>
  <li><strong>Best practice</strong>: Use bucket counts as powers of 2 (e.g. 16).</li>
  <li>Drawback: Initial parallelism is limited by the number of buckets.</li>
  <li>Example:
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Bucket the Users Table
</span><span class="n">users</span><span class="p">.</span><span class="n">write</span> \
    <span class="p">.</span><span class="n">bucketBy</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">"user_id"</span><span class="p">)</span> \
    <span class="p">.</span><span class="n">sortBy</span><span class="p">(</span><span class="s">"user_id"</span><span class="p">)</span> \
    <span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="s">"overwrite"</span><span class="p">)</span> \
    <span class="p">.</span><span class="n">saveAsTable</span><span class="p">(</span><span class="s">"bucketed_users"</span><span class="p">)</span>

<span class="c1"># Bucket the Transactions Table
</span><span class="n">transactions</span><span class="p">.</span><span class="n">write</span> \
    <span class="p">.</span><span class="n">bucketBy</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">"user_id"</span><span class="p">)</span> \
    <span class="p">.</span><span class="n">sortBy</span><span class="p">(</span><span class="s">"user_id"</span><span class="p">)</span> \
    <span class="p">.</span><span class="n">mode</span><span class="p">(</span><span class="s">"overwrite"</span><span class="p">)</span> \
    <span class="p">.</span><span class="n">saveAsTable</span><span class="p">(</span><span class="s">"bucketed_transactions"</span><span class="p">)</span>
  
<span class="c1"># Read the bucketed tables
</span><span class="n">bucketed_users</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">table</span><span class="p">(</span><span class="s">"bucketed_users"</span><span class="p">)</span>
<span class="n">bucketed_transactions</span> <span class="o">=</span> <span class="n">spark</span><span class="p">.</span><span class="n">table</span><span class="p">(</span><span class="s">"bucketed_transactions"</span><span class="p">)</span>

<span class="c1"># Perform the join
</span><span class="n">result</span> <span class="o">=</span> <span class="n">bucketed_users</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">bucketed_transactions</span><span class="p">,</span> <span class="s">"user_id"</span><span class="p">)</span>
</code></pre></div>    </div>
  </li>
</ul>

<h2 id="how-does-shuffling-work">How Does Shuffling Work?</h2>
<p>Shuffling is triggered by wide transformations that aggregate or redistribute data, such as <code class="language-plaintext highlighter-rouge">groupByKey</code>, <code class="language-plaintext highlighter-rouge">reduceByKey</code>, and joins. Narrow transformations like <code class="language-plaintext highlighter-rouge">map</code> and <code class="language-plaintext highlighter-rouge">filter</code> do not trigger a shuffle.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/optimising_spark/shuffle.png" alt="" />
  <figcaption>Phases in a shuffle </figcaption>
</figure>

<h3 id="1-map-phase">1. Map Phase</h3>
<ul>
  <li>Spark processes the data into key-value pairs for grouping, sorting, or other transformations.</li>
  <li>Example: For a <code class="language-plaintext highlighter-rouge">groupByKey</code> operation, Spark maps rows into key-value pairs (e.g., <code class="language-plaintext highlighter-rouge">user_id</code> as the key) if the data is not already keyed.</li>
</ul>

<h3 id="2-shuffle-phase">2. Shuffle Phase</h3>
<ul>
  <li>Redistributes data across executors based on keys, so that rows with the same key land in the same partition and can be processed in parallel.</li>
  <li>Involves network I/O to transfer data between executors (and disk I/O if data exceeds memory).</li>
  <li>Spark determines the target partition by hashing the key (e.g. <code class="language-plaintext highlighter-rouge">user_id</code>) and taking it modulo the number of partitions.</li>
  <li>Default number of partitions: 200 (<code class="language-plaintext highlighter-rouge">spark.sql.shuffle.partitions</code>).</li>
</ul>

<h3 id="3-reduce-phase">3. Reduce Phase</h3>
<ul>
  <li>Aggregates or processes shuffled data within each partition.</li>
  <li>Example: For <code class="language-plaintext highlighter-rouge">groupByKey</code>, Spark groups rows by key (e.g. <code class="language-plaintext highlighter-rouge">user_id</code>) and applies aggregations like <code class="language-plaintext highlighter-rouge">SUM</code>.</li>
</ul>
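
<p>To make the three phases concrete, here is a small aggregation that triggers a full shuffle (the DataFrame and column names are illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from pyspark.sql import functions as F

# Wide transformation: rows sharing a user_id must end up on the same partition
totals = (
    transactions                            # assumed DataFrame with user_id, amount
    .groupBy("user_id")                     # map phase keys rows by user_id
    .agg(F.sum("amount").alias("total"))    # reduce phase runs after the shuffle
)

# Without AQE coalescing, this equals spark.sql.shuffle.partitions (default 200)
print(totals.rdd.getNumPartitions())
</code></pre></div></div>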

<h2 id="handling-skew-in-spark">Handling Skew in Spark</h2>

<p>Data skew occurs when some partitions hold significantly more data than others, leading to performance bottlenecks. Symptoms include long job runtimes, high CPU utilization (e.g. stuck at 99%), or outliers in partition sizes. <a href="https://aws.amazon.com/blogs/big-data/detect-and-handle-data-skew-on-aws-glue/">You can also detect skew by checking the summary metrics and identifying which tasks take the longest in the Spark UI</a>. A more scientific way to detect skew is to use a box and whisker plot to check for outliers. Here are some methods to reduce skew.</p>

<h3 id="for-spark-30">For Spark 3.0+:</h3>
<ul>
  <li>Enable Adaptive Query Execution (AQE) with <code class="language-plaintext highlighter-rouge">spark.sql.adaptive.enabled = true</code>; see the snippet below for its skew-join settings.</li>
</ul>
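
<p>AQE’s skew-join handling splits oversized partitions at runtime. A sketch of the knobs Spark exposes for it (the values shown are the defaults in recent 3.x releases):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# A partition is treated as skewed if it is both skewedPartitionFactor times
# the median partition size and larger than the byte threshold
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
</code></pre></div></div>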

<h3 id="for-spark-30-1">For Spark &lt;3.0:</h3>
<ul>
  <li>Use Salting:
    <ul>
      <li>Add a random “salt” column to the dataset before grouping to distribute data more evenly across partitions.</li>
      <li>Example:
        <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">withColumn</span><span class="p">(</span><span class="s">"salt_random_column"</span><span class="p">,</span> <span class="p">(</span><span class="n">rand</span> <span class="o">*</span> <span class="n">n</span><span class="p">).</span><span class="n">cast</span><span class="p">(</span><span class="n">IntegerType</span><span class="p">))</span>
  <span class="p">.</span><span class="n">groupBy</span><span class="p">(</span><span class="n">groupByFields</span><span class="p">,</span> <span class="s">"salt_random_column"</span><span class="p">)</span>
  <span class="p">.</span><span class="n">agg</span><span class="p">(</span><span class="n">aggFields</span><span class="p">)</span>
  <span class="p">.</span><span class="n">groupBy</span><span class="p">(</span><span class="n">groupByFields</span><span class="p">)</span>
  <span class="p">.</span><span class="n">agg</span><span class="p">(</span><span class="n">aggFields</span><span class="p">)</span>
</code></pre></div>        </div>
      </li>
      <li>Note: For metrics like <code class="language-plaintext highlighter-rouge">AVG</code>, decompose into <code class="language-plaintext highlighter-rouge">SUM</code> and <code class="language-plaintext highlighter-rouge">COUNT</code> before dividing.</li>
    </ul>
  </li>
</ul>

<h3 id="filter-outliers">Filter Outliers</h3>
<ul>
  <li>Identify and process outliers separately to reduce skew, as sketched below.</li>
</ul>
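
<p>A sketch of that idea, assuming a handful of known hot keys found through profiling: aggregate the outliers and the remainder separately, then union the results.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from pyspark.sql import functions as F

hot_keys = ["user_123", "user_456"]  # illustrative outlier keys

hot = df.filter(F.col("user_id").isin(hot_keys))
rest = df.filter(~F.col("user_id").isin(hot_keys))

# Aggregate each slice on its own, then combine the results
result = (
    hot.groupBy("user_id").agg(F.sum("amount").alias("total"))
    .unionByName(rest.groupBy("user_id").agg(F.sum("amount").alias("total")))
)
</code></pre></div></div>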

<h2 id="tips-for-optimizing-shuffling">Tips for Optimizing Shuffling:</h2>
<ol>
  <li>Avoid shuffling large datasets whenever possible; aim for tables &lt;100GB.</li>
  <li>To change the number of partitions, use <code class="language-plaintext highlighter-rouge">spark.sql.shuffle.partitions</code> for the DataFrame/SQL API; the related <code class="language-plaintext highlighter-rouge">spark.default.parallelism</code> applies to the lower-level RDD API.</li>
  <li>Use <code class="language-plaintext highlighter-rouge">explain()</code> to inspect join strategies and execution plans.</li>
</ol>]]></content><author><name>Ivan</name></author><category term="data-engineering" /><category term="data-engineering" /><summary type="html"><![CDATA[Learn the basics of Spark's query execution, including joins, shuffling, and how to handle data skew effectively.]]></summary></entry><entry><title type="html">Query Engines</title><link href="https://ivanpua.com/data-engineering/query-engines/" rel="alternate" type="text/html" title="Query Engines" /><published>2025-01-25T11:00:00+11:00</published><updated>2025-01-25T11:00:00+11:00</updated><id>https://ivanpua.com/data-engineering/query-engines</id><content type="html" xml:base="https://ivanpua.com/data-engineering/query-engines/"><![CDATA[<h2 id="what-are-query-engines">What are Query Engines?</h2>

<p>A query engine is a software system that processes and executes queries, typically written in a query language like SQL, to retrieve, manipulate, or analyze data from databases or other data storage systems. It abstracts the complexity of data retrieval, providing users with a simpler way to access and interact with data.</p>

<p>A query engine performs the following steps in sequence:</p>

<ol>
  <li>Parses the query (e.g. SQL).</li>
  <li>Validates its syntax and raises any errors.</li>
  <li>Optimizes the query for performance.</li>
  <li>Creates an efficient execution plan.</li>
  <li>Translates the plan into physical operations that access and process the data.</li>
  <li>Executes those operations against the underlying data sources.</li>
  <li>Formats the query output and delivers results back to the user.</li>
</ol>

<h2 id="types-of-query-engines">Types of Query Engines</h2>

<h3 id="sql-based-engines">SQL-based Engines</h3>
<ul>
  <li>Designed to process SQL queries (e.g. MySQL, PostgreSQL, Presto, Hive, Spark SQL).</li>
  <li>Commonly used for structured data in relational databases or data warehouses.</li>
  <li>Most traditional relational database management systems (RDBMS), like MySQL, PostgreSQL, and Oracle, have built-in query engines.</li>
</ul>

<h3 id="distributed-query-engines">Distributed Query Engines</h3>
<ul>
  <li>Process queries across multiple nodes or servers to achieve scalability and high performance (e.g., Hive, Presto, Spark SQL).</li>
  <li>Often used for big data processing, such as feature engineering for machine learning models.</li>
</ul>

<h3 id="search-query-engines-or-search-engines">Search Query Engines (or Search Engines)</h3>
<ul>
  <li>Specialized for querying text or unstructured data (e.g., Elasticsearch, Solr).</li>
  <li>Support advanced text-based queries, such as full-text search and ranking.</li>
</ul>

<h2 id="how-is-a-query-engine-different-from-a-database">How is a Query Engine Different from a Database?</h2>

<ul>
  <li>A <strong>database</strong> includes storage, indexing, and transaction management, while the <strong>query engine</strong> focuses on how queries are processed, optimized, and executed.</li>
  <li>A query engine can work independently of a database by querying data directly from files, object stores, or other non-database sources. For example, Presto and Spark SQL can query data directly from data lakes without requiring the data to be loaded into a database.</li>
</ul>

<h2 id="comparison-of-distributed-query-engines">Comparison of Distributed Query Engines</h2>

<h3 id="hive">Hive</h3>
<ul>
  <li>Hive, developed by Facebook in 2008, was one of the first widely adopted distributed SQL query engines.</li>
  <li>Built on top of Hadoop’s MapReduce ecosystem, it provided reliability but suffered from slow performance due to high disk I/O and batch-oriented processing.</li>
</ul>

<h3 id="presto">Presto</h3>
<ul>
  <li>Presto, also created by Facebook in 2013, was designed as a faster alternative to Hive for interactive and ad-hoc SQL queries.</li>
  <li>Unlike Hive, Presto processes queries entirely in memory, offering low-latency execution.</li>
  <li>It functions as a pure query engine, reading data directly from various storage backends like HDFS, S3, and relational databases.</li>
</ul>

<h3 id="apache-spark">Apache Spark</h3>
<ul>
  <li>Apache Spark originated at UC Berkeley’s AMPLab and became a top-level Apache project in 2014, positioned as a general-purpose distributed computing engine.</li>
  <li>Spark SQL combines SQL querying capabilities with Spark’s broader functionality, such as machine learning, real-time streaming, and graph processing.</li>
  <li>It offers a flexible platform for both batch and stream processing, making it a versatile choice for distributed computing.</li>
</ul>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>Hive (built on MapReduce)</th>
      <th>Presto</th>
      <th>Spark</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Processing Type</td>
      <td>Batch</td>
      <td>Interactive SQL</td>
      <td>Batch, Streaming, ML</td>
    </tr>
    <tr>
      <td>Latency</td>
      <td>High</td>
      <td>Low</td>
      <td>Low to Medium</td>
    </tr>
    <tr>
      <td>Data Storage</td>
      <td>Disk</td>
      <td>In-memory</td>
      <td>In-memory and Disk</td>
    </tr>
    <tr>
      <td>Scalability</td>
      <td>High</td>
      <td>Moderate</td>
      <td>High</td>
    </tr>
    <tr>
      <td>Ease of Use</td>
      <td>Low (requires coding)</td>
      <td>High (SQL-based)</td>
      <td>Moderate (rich APIs)</td>
    </tr>
    <tr>
      <td>Use Cases</td>
      <td>ETL, log analysis</td>
      <td>Ad-hoc analytics, dashboards</td>
      <td>ML, streaming, transformations</td>
    </tr>
    <tr>
      <td>Fault Tolerance</td>
      <td>High (via HDFS)</td>
      <td>Medium (memory-dependent)</td>
      <td>High</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Ivan</name></author><category term="data-engineering" /><category term="data-engineering" /><summary type="html"><![CDATA[Discover the evolution of query engines like Hive, Presto, and Spark, and learn how they revolutionize data processing with scalability, speed, and versatility.]]></summary></entry><entry><title type="html">Creating AWS API Gateway Private Endpoints</title><link href="https://ivanpua.com/cloud/private-endpoint/" rel="alternate" type="text/html" title="Creating AWS API Gateway Private Endpoints" /><published>2024-04-28T20:30:00+10:00</published><updated>2024-04-28T20:30:00+10:00</updated><id>https://ivanpua.com/cloud/private-endpoint</id><content type="html" xml:base="https://ivanpua.com/cloud/private-endpoint/"><![CDATA[<h2 id="what-are-aws-api-gateway-private-endpoints">What are AWS API Gateway Private Endpoints?</h2>

<p>AWS API Gateway private endpoints are a feature of Amazon API Gateway that lets you expose your APIs privately within your Amazon Virtual Private Cloud (VPC). This ensures that API traffic is confined within the AWS network, bypassing the public internet entirely. These endpoints are made possible through the integration of API Gateway with <a href="https://docs.aws.amazon.com/whitepapers/latest/aws-vpc-connectivity-options/aws-privatelink.html">AWS PrivateLink</a>, a technology that securely connects services across different AWS accounts and VPCs without requiring public IP addresses or the need to manage firewalls and route tables. With API Gateway private endpoints, you create private APIs that are accessible only from within your VPC, or from VPCs to which you have granted access via VPC peering, AWS Transit Gateway, or Direct Connect. Here’s an image that illustrates this behaviour:</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/private_endpoints/aws-privatelink.png" alt="" />
  <figcaption>By creating an AWS API Gateway Private Endpoint with PrivateLink (left side of diagram), we can allow access to or from another VPC.</figcaption>
</figure>

<p>API Gateway Private Endpoints are important because they ensure that sensitive API traffic is not exposed over the internet. This is crucial for businesses operating under strict regulatory requirements, as it minimizes the risk of data breaches and unauthorized access. Moreover, keeping traffic internal reduces latency and potential exposure points, contributing to both performance and security improvements.</p>

<p>For example, consider a financial services company that operates within a tightly regulated industry. They need to process confidential financial transactions and must ensure that all data handling complies with industry regulations such as PCI-DSS or GDPR. By using API Gateway Private Endpoints, they can route all their API traffic through the private network of their Amazon Virtual Private Cloud (VPC), significantly reducing the risk of data exposure and enabling compliance with these regulatory requirements. This setup not only secures the data but also often improves the response times of the APIs by minimizing the distance data travels.</p>

<p>To learn more about the evolution of private endpoints in AWS, refer to this <a href="https://aws.amazon.com/blogs/compute/introducing-amazon-api-gateway-private-endpoints/">AWS blog</a>.</p>

<h2 id="deploying-with-aws-cdk">Deploying with AWS CDK</h2>

<p>In the previous <a href="/cloud/iac/">post</a>, I explained the benefits of deploying AWS resources programmatically with Infrastructure as Code (IaC), so I prefer deploying the AWS API Gateway Private Endpoint via the AWS CDK. The code below shows how to do it in TypeScript; feel free to modify the properties based on your use case.</p>

<p>I referred to this <a href="https://aws.amazon.com/blogs/compute/introducing-amazon-api-gateway-private-endpoints/">AWS blog</a> and the <a href="https://docs.aws.amazon.com/cdk/api/v2/">AWS CDK documentation</a> for the deployment.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">cdk</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">aws-cdk-lib</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="p">{</span> <span class="nx">Construct</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">constructs</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">lambda</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">aws-cdk-lib/aws-lambda</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">ec2</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">aws-cdk-lib/aws-ec2</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">dotenv</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">dotenv</span><span class="dl">"</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">s3</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">aws-cdk-lib/aws-s3</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">path</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">path</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">iam</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">aws-cdk-lib/aws-iam</span><span class="dl">'</span><span class="p">;</span>
<span class="k">import</span> <span class="o">*</span> <span class="k">as</span> <span class="nx">apiGateway</span> <span class="k">from</span> <span class="dl">'</span><span class="s1">aws-cdk-lib/aws-apigateway</span><span class="dl">'</span><span class="p">;</span>

<span class="c1">// Stack is a logical grouping of AWS resources</span>
<span class="k">export</span> <span class="kd">class</span> <span class="nx">InfraStack</span> <span class="kd">extends</span> <span class="nx">cdk</span><span class="p">.</span><span class="nx">Stack</span> <span class="p">{</span>
  <span class="kd">constructor</span><span class="p">(</span><span class="nx">scope</span><span class="p">:</span> <span class="nx">Construct</span><span class="p">,</span> <span class="nx">id</span><span class="p">:</span> <span class="kr">string</span><span class="p">,</span> <span class="nx">props</span><span class="p">?:</span> <span class="nx">cdk</span><span class="p">.</span><span class="nx">StackProps</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">super</span><span class="p">(</span><span class="nx">scope</span><span class="p">,</span> <span class="nx">id</span><span class="p">,</span> <span class="nx">props</span><span class="p">);</span>

    <span class="c1">// Creating the VPC and subnets</span>
    <span class="kd">const</span> <span class="nx">vpc</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">Vpc</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">"</span><span class="s2">myVPC</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">vpcName</span><span class="p">:</span> <span class="dl">"</span><span class="s2">myVPC</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">ipAddresses</span><span class="p">:</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">IpAddresses</span><span class="p">.</span><span class="nx">cidr</span><span class="p">(</span><span class="dl">'</span><span class="s1">10.0.0.0/16</span><span class="dl">'</span><span class="p">),</span>
      <span class="na">availabilityZones</span><span class="p">:</span> <span class="p">[</span><span class="dl">"</span><span class="s2">ap-southeast-2a</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">ap-southeast-2b</span><span class="dl">"</span><span class="p">],</span> 
      <span class="na">enableDnsHostnames</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
      <span class="na">enableDnsSupport</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
      
      <span class="na">subnetConfiguration</span><span class="p">:</span> <span class="p">[</span>
        <span class="p">{</span>
          <span class="na">name</span><span class="p">:</span> <span class="dl">"</span><span class="s2">private-subnet</span><span class="dl">"</span><span class="p">,</span>
          <span class="na">subnetType</span><span class="p">:</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">SubnetType</span><span class="p">.</span><span class="nx">PRIVATE_ISOLATED</span><span class="p">,</span>
          <span class="na">cidrMask</span><span class="p">:</span> <span class="mi">20</span><span class="p">,</span>
        <span class="p">}</span>
      <span class="p">],</span>

    <span class="p">});</span>
    
    <span class="c1">// Creating the VPC Endpoint to Execute the API</span>
    <span class="kd">const</span> <span class="nx">vpcEndpoint</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">InterfaceVpcEndpoint</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">'</span><span class="s1">VPC Endpoint</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span>
      <span class="nx">vpc</span><span class="p">,</span>
      <span class="na">service</span><span class="p">:</span> <span class="k">new</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">InterfaceVpcEndpointService</span><span class="p">(</span><span class="dl">'</span><span class="s1">com.amazonaws.ap-southeast-2.execute-api</span><span class="dl">'</span><span class="p">),</span>
      <span class="na">privateDnsEnabled</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
      <span class="c1">// Choose which availability zones to place the VPC endpoint in, based on</span>
      <span class="c1">// available AZs</span>
      <span class="na">subnets</span><span class="p">:</span> <span class="p">{</span>
        <span class="na">availabilityZones</span><span class="p">:</span> <span class="p">[</span><span class="dl">'</span><span class="s1">ap-southeast-2a</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">ap-southeast-2b</span><span class="dl">'</span><span class="p">]</span>
      <span class="p">}</span>
    <span class="p">});</span>

    <span class="c1">// Create a S3 bucket for VPC Flow Logs - important for debugging. </span>
    <span class="kd">const</span> <span class="nx">logsBucket</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">s3</span><span class="p">.</span><span class="nx">Bucket</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">"</span><span class="s2">myLogs</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">bucketName</span><span class="p">:</span> <span class="dl">'</span><span class="s1">my-logs</span><span class="dl">'</span><span class="p">,</span>
      <span class="na">blockPublicAccess</span><span class="p">:</span> <span class="nx">s3</span><span class="p">.</span><span class="nx">BlockPublicAccess</span><span class="p">.</span><span class="nx">BLOCK_ALL</span><span class="p">,</span>
      <span class="na">enforceSSL</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
      <span class="na">accessControl</span><span class="p">:</span> <span class="nx">s3</span><span class="p">.</span><span class="nx">BucketAccessControl</span><span class="p">.</span><span class="nx">LOG_DELIVERY_WRITE</span><span class="p">,</span>
      <span class="na">encryption</span><span class="p">:</span> <span class="nx">s3</span><span class="p">.</span><span class="nx">BucketEncryption</span><span class="p">.</span><span class="nx">S3_MANAGED</span><span class="p">,</span>
      <span class="na">intelligentTieringConfigurations</span><span class="p">:</span> <span class="p">[</span>
        <span class="p">{</span>
          <span class="na">name</span><span class="p">:</span> <span class="dl">"</span><span class="s2">archive</span><span class="dl">"</span><span class="p">,</span>
          <span class="na">archiveAccessTierTime</span><span class="p">:</span> <span class="nx">cdk</span><span class="p">.</span><span class="nx">Duration</span><span class="p">.</span><span class="nx">days</span><span class="p">(</span><span class="mi">90</span><span class="p">),</span>
          <span class="na">deepArchiveAccessTierTime</span><span class="p">:</span> <span class="nx">cdk</span><span class="p">.</span><span class="nx">Duration</span><span class="p">.</span><span class="nx">days</span><span class="p">(</span><span class="mi">180</span><span class="p">),</span>
        <span class="p">},</span>
      <span class="p">],</span>
    <span class="p">})</span>

    <span class="kd">const</span> <span class="nx">vpcFlowLogRole</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">iam</span><span class="p">.</span><span class="nx">Role</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">"</span><span class="s2">vpcFlowLogRole</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">assumedBy</span><span class="p">:</span> <span class="k">new</span> <span class="nx">iam</span><span class="p">.</span><span class="nx">ServicePrincipal</span><span class="p">(</span><span class="dl">"</span><span class="s2">vpc-flow-logs.amazonaws.com</span><span class="dl">"</span><span class="p">),</span>
    <span class="p">})</span>

    <span class="nx">logsBucket</span><span class="p">.</span><span class="nx">grantWrite</span><span class="p">(</span><span class="nx">vpcFlowLogRole</span><span class="p">,</span> <span class="dl">"</span><span class="s2">vpcFlowLogs/*</span><span class="dl">"</span><span class="p">)</span>
    
    <span class="c1">// Direct flow logs to S3.</span>
    <span class="kd">const</span> <span class="nx">vpcFlowLogs</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">FlowLog</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">"</span><span class="s2">vpcFlowLogs</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">destination</span><span class="p">:</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">FlowLogDestination</span><span class="p">.</span><span class="nx">toS3</span><span class="p">(</span><span class="nx">logsBucket</span><span class="p">,</span> <span class="dl">"</span><span class="s2">vpcFlowLogs/</span><span class="dl">"</span><span class="p">),</span>
      <span class="na">trafficType</span><span class="p">:</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">FlowLogTrafficType</span><span class="p">.</span><span class="nx">ALL</span><span class="p">,</span>
      <span class="na">flowLogName</span><span class="p">:</span> <span class="dl">"</span><span class="s2">vpcFlowLogs</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">resourceType</span><span class="p">:</span> <span class="nx">ec2</span><span class="p">.</span><span class="nx">FlowLogResourceType</span><span class="p">.</span><span class="nx">fromVpc</span><span class="p">(</span><span class="nx">vpc</span><span class="p">),</span>
    <span class="p">})</span>

    <span class="cm">/* *
     * Lambda Function
     * Feel free to change it as you see fit
     * For example, you might prefer to use EC2 instead of Lambda function.
     * */</span>
    <span class="kd">const</span> <span class="nx">lambda_layer_path</span> <span class="o">=</span> <span class="nx">path</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span> <span class="dl">"</span><span class="s2">PATH_TO_CODE</span><span class="dl">"</span><span class="p">);</span>

    <span class="kd">const</span> <span class="nx">lambda_layer</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">lambda</span><span class="p">.</span><span class="nx">LayerVersion</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">"</span><span class="s2">LambdaBaseLayer</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">code</span><span class="p">:</span> <span class="nx">lambda</span><span class="p">.</span><span class="nx">Code</span><span class="p">.</span><span class="nx">fromAsset</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">lambda_layer_path</span><span class="p">,</span> <span class="dl">"</span><span class="s2">layer.zip</span><span class="dl">"</span><span class="p">)),</span> 
      <span class="na">compatibleRuntimes</span><span class="p">:</span> <span class="p">[</span><span class="nx">lambda</span><span class="p">.</span><span class="nx">Runtime</span><span class="p">.</span><span class="nx">PYTHON_3_10</span><span class="p">],</span>

    <span class="p">});</span>

    <span class="kd">const</span> <span class="nx">lambdaFunction</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">lambda</span><span class="p">.</span><span class="nb">Function</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">"</span><span class="s2">myFunction</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
      
      <span class="na">functionName</span><span class="p">:</span><span class="dl">"</span><span class="s2">myFunction</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">runtime</span><span class="p">:</span> <span class="nx">lambda</span><span class="p">.</span><span class="nx">Runtime</span><span class="p">.</span><span class="nx">PYTHON_3_10</span><span class="p">,</span>
      <span class="na">code</span><span class="p">:</span> <span class="nx">lambda</span><span class="p">.</span><span class="nx">Code</span><span class="p">.</span><span class="nx">fromAsset</span><span class="p">(</span><span class="nx">lambda_layer_path</span><span class="p">),</span>
      <span class="na">memorySize</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span> <span class="c1">// Set memory size to 1024MB</span>
      <span class="na">architecture</span><span class="p">:</span> <span class="nx">lambda</span><span class="p">.</span><span class="nx">Architecture</span><span class="p">.</span><span class="nx">ARM_64</span><span class="p">,</span>
      <span class="na">handler</span><span class="p">:</span> <span class="dl">"</span><span class="s2">main.handler</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">timeout</span><span class="p">:</span> <span class="nx">cdk</span><span class="p">.</span><span class="nx">Duration</span><span class="p">.</span><span class="nx">seconds</span><span class="p">(</span><span class="mi">600</span><span class="p">),</span><span class="c1">// 10 minutes</span>
      <span class="na">layers</span><span class="p">:</span> <span class="p">[</span><span class="nx">lambda_layer</span><span class="p">],</span>
      <span class="na">role</span><span class="p">:</span> <span class="nx">lambdaRole</span><span class="p">,</span>
    <span class="p">});</span>

    <span class="c1">// Create a resource policy for the AWS API Gateway to only </span>
    <span class="c1">// allow the VPC endpoint to execute the API.</span>
    <span class="kd">const</span> <span class="nx">privateAPIPolicy</span> <span class="o">=</span> <span class="p">{</span>
      <span class="dl">"</span><span class="s2">Version</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">2012-10-17</span><span class="dl">"</span><span class="p">,</span>
      <span class="dl">"</span><span class="s2">Statement</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span>
        <span class="p">{</span>
          <span class="dl">"</span><span class="s2">Effect</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Deny</span><span class="dl">"</span><span class="p">,</span>
          <span class="dl">"</span><span class="s2">Principal</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">*</span><span class="dl">"</span><span class="p">,</span>
          <span class="dl">"</span><span class="s2">Action</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">execute-api:Invoke</span><span class="dl">"</span><span class="p">,</span>
          <span class="dl">"</span><span class="s2">Resource</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span>
            <span class="dl">"</span><span class="s2">execute-api:/*</span><span class="dl">"</span>
          <span class="p">],</span>
          <span class="dl">"</span><span class="s2">Condition</span><span class="dl">"</span><span class="p">:</span> <span class="p">{</span>
            <span class="dl">"</span><span class="s2">StringNotEquals</span><span class="dl">"</span><span class="p">:</span> <span class="p">{</span>
              <span class="dl">"</span><span class="s2">aws:sourceVpc</span><span class="dl">"</span><span class="p">:</span> <span class="nx">vpc</span><span class="p">.</span><span class="nx">vpcId</span>
            <span class="p">}</span>
          <span class="p">}</span>
        <span class="p">},</span>
        <span class="p">{</span>
          <span class="dl">"</span><span class="s2">Effect</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Allow</span><span class="dl">"</span><span class="p">,</span>
          <span class="dl">"</span><span class="s2">Principal</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">*</span><span class="dl">"</span><span class="p">,</span>
          <span class="dl">"</span><span class="s2">Action</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">execute-api:Invoke</span><span class="dl">"</span><span class="p">,</span>
          <span class="dl">"</span><span class="s2">Resource</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span>
            <span class="dl">"</span><span class="s2">execute-api:/*</span><span class="dl">"</span>
          <span class="p">],</span>
        <span class="p">}</span>
      <span class="p">]</span>
    <span class="p">}</span>
    
    <span class="kd">const</span> <span class="nx">privateAPIPolicyDocument</span> <span class="o">=</span> <span class="nx">iam</span><span class="p">.</span><span class="nx">PolicyDocument</span><span class="p">.</span><span class="nx">fromJson</span><span class="p">(</span><span class="nx">privateAPIPolicy</span><span class="p">);</span>

    <span class="c1">// Create a AWS API Gateway Private Endpoint</span>
    <span class="kd">const</span> <span class="nx">myApi</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">apiGateway</span><span class="p">.</span><span class="nx">RestApi</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="dl">'</span><span class="s1">ApiGateway</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">restApiName</span><span class="p">:</span> <span class="dl">'</span><span class="s1">My API Gateway</span><span class="dl">'</span><span class="p">,</span>
      <span class="na">endpointConfiguration</span><span class="p">:</span> <span class="p">{</span>
        <span class="na">types</span><span class="p">:</span> <span class="p">[</span><span class="nx">apiGateway</span><span class="p">.</span><span class="nx">EndpointType</span><span class="p">.</span><span class="nx">PRIVATE</span><span class="p">],</span>
        <span class="na">vpcEndpoints</span><span class="p">:</span> <span class="p">[</span><span class="nx">vpcEndpoint</span><span class="p">]</span>
      <span class="p">},</span>
      <span class="na">policy</span><span class="p">:</span> <span class="nx">privateAPIPolicyDocument</span>

    <span class="p">})</span>

    <span class="c1">// Lambda Integration - user requests are passed wholsale from API Gateway to Lambda </span>
    <span class="nx">myApi</span><span class="p">.</span><span class="nx">root</span><span class="p">.</span><span class="nx">addProxy</span><span class="p">({</span>
      <span class="na">defaultIntegration</span><span class="p">:</span> <span class="k">new</span> <span class="nx">apiGateway</span><span class="p">.</span><span class="nx">LambdaIntegration</span><span class="p">(</span><span class="nx">lambdaFunction</span><span class="p">)</span>
    <span class="p">})</span>

  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is what the API looks like after deployment:</p>
<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/private_endpoints/api.png" alt="" />
  <figcaption>AWS API Gateway Private Endpoint, within a VPC </figcaption>
</figure>

<h2 id="testing-the-private-endpoint">Testing the Private Endpoint</h2>

<p>To check if the private endpoint works, try invoking it with a Lambda function.</p>
<ol>
  <li>Create a new Lambda function with the following code</li>
</ol>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">requests</span>

<span class="c1"># Replace these global variables with your account's
</span><span class="n">VPCE_DNS_NAME</span> <span class="o">=</span> <span class="s">"yourVPCEndpoint.execute-api.ap-southeast-2.vpce.amazonaws.com"</span>
<span class="n">API_GW_ENDPOINT</span> <span class="o">=</span> <span class="s">"yourAPI.execute-api.ap-southeast-2.amazonaws.com"</span>

<span class="k">def</span> <span class="nf">lambda_handler</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
    <span class="c1"># Set up the options for the HTTPS request
</span>    <span class="n">url</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"https://</span><span class="si">{</span><span class="n">VPCE_DNS_NAME</span><span class="si">}</span><span class="s">/prod/"</span> <span class="c1"># Enter the path that you want to test
</span>    <span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">'Host'</span><span class="p">:</span> <span class="n">API_GW_ENDPOINT</span>
    <span class="p">}</span>
    
    <span class="c1"># Make the GET request
</span>    <span class="k">try</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
        <span class="c1"># Log status code and headers
</span>        <span class="k">print</span><span class="p">(</span><span class="s">'statusCode:'</span><span class="p">,</span> <span class="n">response</span><span class="p">.</span><span class="n">status_code</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'headers:'</span><span class="p">,</span> <span class="n">response</span><span class="p">.</span><span class="n">headers</span><span class="p">)</span>
        
        <span class="c1"># Return the JSON content if request was successful
</span>        <span class="c1"># print(response.json())
</span>        <span class="k">return</span> <span class="n">response</span><span class="p">.</span><span class="n">json</span><span class="p">()</span>
    
    <span class="c1"># Catch any errors that occur during the request
</span>    <span class="k">except</span> <span class="n">requests</span><span class="p">.</span><span class="n">RequestException</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
        <span class="k">return</span> <span class="p">{</span><span class="s">'error'</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="p">)}</span>
</code></pre></div></div>

<ol start="2">
  <li>Ensure that the Lambda function is in the same VPC as the private endpoint, or at least in a VPC that is allowed under the <code class="language-plaintext highlighter-rouge">privateAPIPolicyDocument</code></li>
  <li>Run a test on Lambda</li>
</ol>

<p>If the connection is successful, you will see a success message along with the JSON payload.</p>
<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/private_endpoints/test_api.png" alt="" />
  <figcaption>Connection to Private Endpoint is successful! </figcaption>
</figure>]]></content><author><name>Ivan</name></author><category term="cloud" /><category term="cloud" /><category term="data-engineering" /><summary type="html"><![CDATA[Learn how AWS API Gateway Private Endpoints use AWS PrivateLink to securely expose APIs within a VPC, ensuring data stays off the public internet.]]></summary></entry><entry><title type="html">Poetry for Dependency Management</title><link href="https://ivanpua.com/data-engineering/poetry/" rel="alternate" type="text/html" title="Poetry for Dependency Management" /><published>2024-04-02T20:50:00+11:00</published><updated>2024-04-02T20:50:00+11:00</updated><id>https://ivanpua.com/data-engineering/poetry</id><content type="html" xml:base="https://ivanpua.com/data-engineering/poetry/"><![CDATA[<p>Ever struggled with Python dependency conflicts? So have I, until I discovered this tool.</p>

<p>Meet Poetry 🌟</p>

<p><a href="https://python-poetry.org/">Poetry</a> is a Python dependency management tool.</p>

<p>As I add more packages to my projects, Poetry deftly resolves any dependency conflicts. Goodbye, dependency hell (yes, TensorFlow and NumPy, I’m looking at you 👀).</p>

<p>And the best part? Poetry creates a lockfile that ensures reproducible environments across different operating systems. For instance, while I work on macOS, I can seamlessly share my project environment with colleagues on Linux.</p>

<p>While I’ve been a fan of conda for its one-stop environment setup, I’m starting to appreciate Poetry’s reproducibility. Now, my approach for new AI projects combines the best of both worlds—conda for the Python version, and Poetry for managing everything else.</p>

<p>Here’s a list of Bash commands I run when setting up a new Python project. Feel free to incorporate them into your Makefile 😁</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>conda create <span class="nt">-n</span> myenv <span class="nv">python</span><span class="o">=</span>3.12  <span class="c"># Create a virtual env with Conda or PyEnv</span>

<span class="nv">$ </span>pip <span class="nb">install </span>poetry <span class="c"># Install Poetry in your virtual env</span>

<span class="nv">$ </span>poetry init <span class="c"># Creates a basic pyproject.toml file in the current directory.</span>

<span class="nv">$ </span>poetry add langchain openai <span class="c"># Adds dependencies to pyproject.toml file.</span>

<span class="nv">$ </span>poetry update <span class="c"># Get and installs latest versions of dependencies, automagically.</span>
</code></pre></div></div>

<p>You could also incorporate these commands into a Makefile to save time. I like Makefiles because they save developers a few seconds on every run, which adds up over the life of a project. The code below shows my Makefile; anyone who wants to work on this project can just run <code class="language-plaintext highlighter-rouge">make setup</code> and <code class="language-plaintext highlighter-rouge">make install</code>.</p>

<div class="language-make highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">setup</span><span class="o">:</span>
	<span class="p">@</span><span class="nb">echo</span> <span class="s2">"Creating a new Python environment called 'myEnv'..."</span>
	<span class="p">@</span>conda create <span class="nt">-n</span> myEnv <span class="nv">python</span><span class="o">=</span>3.12 <span class="nt">-y</span>

<span class="nl">install</span><span class="o">:</span>
	<span class="p">@</span><span class="nb">echo</span> <span class="s2">"Installing Poetry, a Python package manager..."</span>
	<span class="p">@</span>pip <span class="nb">install </span>poetry

	<span class="err">@echo</span> <span class="s2">"Installing packages with poetry..."</span>
	<span class="err">@poetry</span> <span class="err">install</span> <span class="err">--no-root</span>
</code></pre></div></div>

<p>After running <code class="language-plaintext highlighter-rouge">poetry add langchain openai</code>, the <code class="language-plaintext highlighter-rouge">pyproject.toml</code> file looks like this.</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[tool.poetry]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"projectName"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">description</span> <span class="p">=</span> <span class="s">""</span>
<span class="py">authors</span> <span class="p">=</span> <span class="s">""</span>
<span class="py">readme</span> <span class="p">=</span> <span class="s">"README.md"</span>

<span class="nn">[tool.poetry.dependencies]</span>
<span class="py">python</span> <span class="p">=</span> <span class="s">"^3.12"</span>
<span class="py">openai</span> <span class="p">=</span> <span class="s">"^1.14.3"</span>
<span class="py">langchain</span> <span class="p">=</span> <span class="s">"^0.1.14"</span>

<span class="nn">[build-system]</span>
<span class="py">requires</span> <span class="p">=</span> <span class="nn">["poetry-core"]</span>
<span class="py">build-backend</span> <span class="p">=</span> <span class="s">"poetry.core.masonry.api"</span>
</code></pre></div></div>]]></content><author><name>Ivan</name></author><category term="data-engineering" /><category term="data-engineering" /><summary type="html"><![CDATA[Explore how Poetry manages Python dependencies and ensures reproducible setups across systems, enhancing project collaboration]]></summary></entry><entry><title type="html">I Built a SaaS Business for a Year. Here’s What I Learnt</title><link href="https://ivanpua.com/startup/saas-lessons/" rel="alternate" type="text/html" title="I Built a SaaS Business for a Year. Here’s What I Learnt" /><published>2024-01-26T09:25:00+11:00</published><updated>2024-01-26T09:25:00+11:00</updated><id>https://ivanpua.com/startup/saas-lessons</id><content type="html" xml:base="https://ivanpua.com/startup/saas-lessons/"><![CDATA[<p>In December 2022, coinciding with the launch of ChatGPT, I embarked on a mission to develop data and AI products that empower others. Since then, I have brainstormed ideas, <em>tried</em> to establish product-market fit, and constructed the product’s back end. This blog shares the learnings I wish I had known before creating a SaaS product. It aims to help aspiring entrepreneurs avoid the obstacles I encountered and accelerate their journey in launching a SaaS business.</p>

<h2 id="ideation-and-market-validation">Ideation and Market Validation</h2>

<p>If you already have a business idea, feel free to jump ahead to Tip #2. But if you’re figuring out where to start, the following advice is for you.</p>

<h3 id="tip-1-there-is-no-million-dollar-idea">Tip 1: There is No “Million-Dollar” Idea</h3>

<p>Finding the perfect problem to solve right off the bat is rare. Many successful startups pivot from their initial idea to something more viable. Instagram began as a location check-in app, while Slack originated from a gaming project named Glitch. The key is to start somewhere. If you’re struggling to identify a problem, consider these two strategies:</p>

<ul>
  <li>Address a personal pain point, something that you wish were better in your day-to-day life. For instance, I found the process of searching for recipes and buying ingredients time-consuming, but existing meal delivery services like HelloFresh were too costly. This led me to explore alternative solutions.</li>
  <li>Leverage your strengths. My expertise in Data Science and AI directed me towards using these tools to enhance marketing and sales for businesses, which became my focal point. In addition, pay attention to what resonates with your audience, especially if you have ongoing projects or a GitHub repository.</li>
</ul>

<p>I opted for the second approach to guide my direction, aiming to apply data and AI to boost sales for e-commerce founders. However, don’t stress about nailing the “perfect” problem at the start. Remember, the initial idea is merely a starting point; a launchpad. Your journey will likely involve pivots as you refine your concept.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/saas_lessons/batman.jpeg" alt="" />
  <figcaption>Don't overthink it mate</figcaption>
</figure>

<h3 id="tip-2-define-your-initial-customer-persona-icp-first">Tip 2: Define your Initial Customer Persona (ICP) first</h3>

<p>Finding the right customer is as important as solving the right problem. If your target audience isn’t aligned, even the best solutions won’t make an impact. ICP helps you pinpoint the precise niche your SaaS product should initially cater to. Your SaaS product should initially address the pain points of your most dedicated users, or your superfans. Once they are happy, you can expand your features to a wider audience. Startups have the advantage of becoming experts in a specific problem area, thereby outmaneuvering established incumbents.</p>

<p>Having an ICP also helps you narrow down the users you should interview. I made the mistake of not having a very specific ICP at the start; as a result, I interviewed a very diverse array of e-commerce founders, wasting valuable time.</p>

<p>How do you figure out your ICP? Be specific. Instead of a broad “Shopify sellers,” target “Beginner Shopify founders with annual revenues under $10K looking to boost customer acquisition through targeted ads.” Here’s a <a href="https://docs.google.com/spreadsheets/d/1DAajOv4KKm_cVMFgA694sP7mfFvieAPasWZVK_9TOcg/edit#gid=0">template</a> to kickstart your ICP definition – credits to <a href="https://www.lennysnewsletter.com/">Lenny’s Newsletter</a>.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/saas_lessons/drake.jpeg" alt="" />
  <figcaption>One of my biggest mistakes is not having a specific ICP at the start</figcaption>
</figure>

<h3 id="tip-3-talk-to-your-customers">Tip 3: Talk to your customers</h3>

<p>This is the most important tip. Want to know if you’re solving a real problem? Talk to your potential users. Time is precious, so pick your platforms to reach out to your users wisely. LinkedIn and TikTok are great for reaching B2B and B2C audiences, respectively.</p>

<p>My ICP was ‘Beginner Shopify founders with annual revenues under $10K looking to push ads and attract more customers.’ I reached out to my ICP through my network, Instagram DMs, online forum posts, Facebook groups, and pitches at startup events. This approach helped me connect with over 30 potential users, uncovering their main challenges, which included:</p>

<ul>
  <li>Uncertainty about competitors’ marketing performance.</li>
  <li>Lack of knowledge of the most profitable marketing strategies.</li>
  <li>Difficulty identifying the target market or audience for their ads.</li>
</ul>

<p>Don’t be shy to reach out. The worst response you can get is a no.</p>

<h3 id="tip-4-how-to-talk-to-customers-mums-test">Tip 4: How to talk to customers? Mum’s test</h3>

<p>The <a href="https://www.youtube.com/watch?v=Hla1jzhan78">Mum’s Test</a> is a framework for conducting user interviews to get honest feedback. It emphasises the importance of understanding users’ challenges without hinting at solutions and advocates for active listening. Your goal should be finding out their pain points and their current workarounds – basically peeking into their life.</p>

<p>Look for strong emotional signals, such as frustration or eagerness to pay, as these can highlight significant pain points.</p>

<h2 id="building-your-product">Building Your Product</h2>

<h3 id="tip-5-keep-it-simple-stupid-kiss">Tip 5: Keep It Simple, Stupid (KISS)</h3>

<p>So you’ve validated the problem with your ICP, and you are ready to build your SaaS product. Here’s my advice: focus on building a Minimum Viable Product (MVP) that’s as simple as possible. Resist the temptation to integrate complex technologies, like Blockchain or Snowflake, unless they’re central to your offering. Simplicity means quicker development and a clearer focus on solving your users’ most pressing issues. Sometimes, your MVP can be as basic as a Python script!</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/saas_lessons/jealous.jpeg" alt="" />
</figure>

<p>For my project, I initially designed an MVP with several features:</p>

<ul>
  <li>Customer segmentation and analytics</li>
  <li>Platform recommendations for publishing ads</li>
  <li>Ad publishing</li>
  <li>Ad performance tracking</li>
  <li>A feedback loop for better ads</li>
</ul>

<p>In retrospect, this MVP was overly complex. A better approach would have been to focus on one key feature based on user feedback. It took some time to realize this issue, but I eventually decided to develop a customer segmentation and analytics tool – the first feature on my list.</p>

<h3 id="tip-6-talk-to-your-customers-again">Tip 6: Talk to your customers (again)</h3>

<p>After creating your MVP, demonstrate it to your customers and gather their feedback. Even better, record your demo and share it on LinkedIn – that’s a fast way to spread the word.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/saas_lessons/bernie.jpeg" alt="" />
  <figcaption>Thanks, Bernie</figcaption>
</figure>

<p>After demonstrating my MVP to over 30 users, I observed three clear signs that a user is genuinely interested:</p>

<ul>
  <li>They are willing to pay you money</li>
  <li>They are willing to refer you to their networks</li>
  <li>You start receiving cold inbound inquiries</li>
</ul>

<p>Here is the key user feedback on my MVP (the customer segmentation and analytics tool):</p>

<ul>
  <li>A desire for customer analytics to not only display data but also provide marketing recommendations.</li>
  <li>Integration with Shopify and/or Google Analytics for data extraction.</li>
  <li>A focus on content creation over customer analytics.</li>
</ul>

<p>The last piece of feedback was interesting, and I should have paid more attention to it, as I’ll explain in the next tip.</p>

<h3 id="tip-7-embrace-the-pivot">Tip 7: Embrace the Pivot</h3>

<p>As a SaaS founder, being receptive to user feedback and ready to iterate your product is essential. Take my experience, for instance. I was testing my MVP with a beta tester—a house furnishing company with an online presence. I provided them with customer segmentation insights through a simple Google Sheets file, nothing too fancy. The feedback I received was pretty tepid; the client didn’t see the value in paying $20/month for insights they felt they already understood well. This lukewarm response was a wake-up call – it drove me to revisit and refine my product. It’s important to remember that it’s rare to get everything perfect on the first try, and that’s perfectly fine. Each iteration is a step closer to success.</p>

<h3 id="bonus-tip-join-a-community">Bonus Tip: Join a Community</h3>

<p>Building a SaaS product from scratch is tough. There are times when, as a founder, you might feel like giving up and retreating to your comfort zone. However, being part of a community of like-minded individuals can keep you going. Communities offer various benefits. They keep you accountable, provide a platform to exchange ideas and remind you that you’re not alone in this journey.</p>

<p>If you are based in Australia, I highly recommend <a href="https://www.nextchapter.to/">Next Chapter</a> and <a href="https://www.thebuilderclub.org/">The Builders Club</a>.</p>

<h2 id="conclusion">Conclusion</h2>

<p>I’m currently developing a customer segmentation tool for e-commerce entrepreneurs, offering them actionable insights. I plan to launch this tool soon to start building traction. My next blog post will explore the technical architecture, so stay tuned.</p>

<p>Remember, building a startup is a journey, not a destination. Embrace the process and enjoy the adventure 🌱</p>]]></content><author><name>Ivan</name></author><category term="startup" /><category term="startup" /><category term="llm" /><summary type="html"><![CDATA[Explore the journey of building AI-driven SaaS products for e-commerce, offering essential insights and strategies for aspiring tech entrepreneurs in the dynamic digital marketplace]]></summary></entry><entry><title type="html">Auto-GPT is overhyped.</title><link href="https://ivanpua.com/generative-ai/autogpt/" rel="alternate" type="text/html" title="Auto-GPT is overhyped." /><published>2023-04-17T19:12:00+10:00</published><updated>2023-04-17T19:12:00+10:00</updated><id>https://ivanpua.com/generative-ai/autogpt</id><content type="html" xml:base="https://ivanpua.com/generative-ai/autogpt/"><![CDATA[<h2 id="auto-gpt-explained-in-2-seconds">Auto-GPT: Explained in 2 seconds</h2>

<p>Auto-GPT utilizes OpenAI’s API to autonomously perform tasks like writing a blog or creating a website from scratch. The creators of Auto-GPT <a href="https://news.agpt.co/#about">aim</a> to make it the best autonomous AI assistant for every device and person, think J.A.R.V.I.S. from Iron Man.</p>

<h2 id="how-it-works">How it works</h2>

<p>To use Auto-GPT, you simply type what you want it to do in the terminal and it breaks down the task into a to-do list. For example, you could ask it to be “an autonomous agent that leverages data to provide expert marketing recommendations based on customer segments and their attributes.” The subtasks that it generates are visible in the image below.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/autogpt/autogpt-hi.png" alt="" />
  <figcaption>Asking Auto-GPT to become a data-driven marketing expert</figcaption>
</figure>

<p>Compared to ChatGPT, Auto-GPT is more capable because it can access the Google search engine to perform various tasks. It also supports several third-party plugins, although I haven’t used them.</p>
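
<p>Under the hood, the pattern is a simple plan-act-observe loop around the LLM API. The sketch below is my own simplification, not Auto-GPT’s actual code; <code class="language-plaintext highlighter-rouge">llm</code> and <code class="language-plaintext highlighter-rouge">tools</code> stand in for an OpenAI API call and the command executors:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"># A toy plan-act-observe loop, heavily simplified (not Auto-GPT's real code)
def run_agent(goal, llm, tools, max_steps=10):
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        # Ask the model for its next move, given everything that has happened so far
        plan = llm("\n".join(history) + "\nNext action, formatted as tool: argument?")
        tool_name, _, argument = plan.partition(":")
        tool_name, argument = tool_name.strip(), argument.strip()
        if tool_name == "finish":
            return argument
        # Run the chosen tool (e.g. google_search, run_python) and record the outcome
        result = tools[tool_name](argument)
        history.append(f"ACTION: {plan}\nRESULT: {result}")
    return "Gave up after max_steps"  # loops like the pandas one below end up here</code></pre></figure>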

<h2 id="why-auto-gpt-is-overhyped">Why Auto-GPT is overhyped</h2>

<h3 id="1-repetitive">1. Repetitive</h3>

<p>Auto-GPT often becomes repetitive, recommending different solutions to fix the same problem. While this mirrors how humans explore multiple methods, it gets annoying, especially when the problem is simple. For example, one of the subtasks was to execute a Python file called <code class="language-plaintext highlighter-rouge">customer_data_analysis.py</code>, but it kept encountering the same error: <code class="language-plaintext highlighter-rouge">pandas module not found.</code> Any software engineer would tell you to run <code class="language-plaintext highlighter-rouge">pip install pandas</code>, but Auto-GPT instead Googles “how to install pandas module”, compiles those instructions, and still fails to run the command. As a result, the same error reappears.</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/autogpt/autogpt-blur.png" alt="" />
  <figcaption>Auto-GPT going around in circles just to install pandas Python module</figcaption>
</figure>

<h3 id="2-costly">2. Costly</h3>

<p>This brings us to the second point: the back-and-forth over the same error can rack up significant API costs, particularly when using GPT-4.</p>

<p>In my opinion, Auto-GPT represents a promising initial stride towards full autonomy. However, it still has a considerable distance to cover before it can truly be regarded as “intelligent”. Given the fast-paced nature of the AI industry, I would recommend that anyone seeking to avoid falling for overhyped AI trends and tools adopt a critical mindset and concentrate on the underlying technology, rather than being swayed by buzzwords or flashy Twitter videos.</p>]]></content><author><name>Ivan</name></author><category term="generative-ai" /><category term="generative-ai" /><category term="llm" /><summary type="html"><![CDATA[Auto-GPT utilizes OpenAI's API to perform tasks autonomously but can be overhyped due to its repetitive nature and high cost.]]></summary></entry><entry><title type="html">Large Language Models - A Primer</title><link href="https://ivanpua.com/generative-ai/llm-primer/" rel="alternate" type="text/html" title="Large Language Models - A Primer" /><published>2023-04-17T19:12:00+10:00</published><updated>2023-04-17T19:12:00+10:00</updated><id>https://ivanpua.com/generative-ai/llm-primer</id><content type="html" xml:base="https://ivanpua.com/generative-ai/llm-primer/"><![CDATA[<h2 id="two-second-summary">Two-second Summary</h2>
<p>Large Language Models (LLMs) are artificial intelligence systems that can analyze, understand, and generate human language. These models are designed to learn the patterns and structures of natural language by processing vast amounts of text data.</p>

<h2 id="brief-history-of-llm">Brief history of LLM</h2>
<ul>
  <li>In 2013, researchers at Google developed Word2Vec, an influential neural language model that learns word embeddings capturing the semantic relationships between words. This was a major breakthrough in the field, and it paved the way for the development of larger and more complex language models.</li>
  <li>In 2018, Google developed BERT, a large pre-trained language model. BERT has achieved state-of-the-art results on many NLP benchmarks, and it has been used for a variety of NLP tasks, including sentiment analysis, named entity recognition, and question answering. The main challenge with BERT is its size: with hundreds of millions of parameters, training it requires considerable data and computational power, resulting in high costs and time consumption.</li>
  <li>The same year, researchers at OpenAI developed the first GPT (Generative Pre-trained Transformer) model, which was able to generate human-like text and perform a wide range of NLP tasks with high accuracy.</li>
</ul>

<h3 id="gpt-vs-bert">GPT vs BERT</h3>
<p>The primary difference between GPT family models and BERT lies in their architectures, training data, and objectives. BERT is designed to be fine-tuned for specific tasks, such as sentiment analysis, named entity recognition, or question answering, meaning that it can be adapted with a smaller dataset to perform a specific language-based task with high accuracy. On the other hand, GPT is trained on a large corpus of publicly available data, which makes it more suitable for tasks that require generating coherent and meaningful language, such as holding a conversation and content creation.</p>

<h2 id="chatgpt">ChatGPT</h2>
<p>ChatGPT, developed by OpenAI, has gained immense popularity due to its exceptional conversational abilities. It has been trained on a wide range of conversational text, and fine-tuned to excel at tasks such as question answering and dialogue generation. Furthermore, its user-friendly interface makes it highly versatile and adaptable to various use cases, even beyond developers.</p>

<p>One of the most remarkable features of ChatGPT is its ability to generate human-like responses. This is primarily due to its use of reinforcement learning from human feedback (RLHF). ChatGPT employs this technique to rank the responses generated by the initial model and learn from human rankings to select the best human-like response, resulting in more natural and coherent conversations.</p>
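
<p>The ranking step can be made concrete. Given a reward model that scores candidate responses, preference learning pushes the score of the human-preferred response above the rejected one. Below is a minimal sketch of that pairwise loss (my own illustration of the idea, not OpenAI’s implementation):</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">import math

def preference_loss(score_chosen, score_rejected):
    """Pairwise (Bradley-Terry style) loss: small when the chosen response outscores the rejected one."""
    return -math.log(1 / (1 + math.exp(score_rejected - score_chosen)))

# The reward model is trained to minimise this loss over many human-ranked pairs
print(preference_loss(2.0, 0.5))  # ~0.20: the ranking is already respected
print(preference_loss(0.5, 2.0))  # ~1.70: the model disagrees with the human ranking</code></pre></figure>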

<h2 id="use-cases">Use Cases</h2>

<table>
  <thead>
    <tr>
      <th>For Corporations</th>
      <th>For Individuals</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Chatbots that are more personalised</td>
      <td>Text summarisation and generation</td>
    </tr>
    <tr>
      <td>Integration with existing work applications (e.g. Slack, G-Drive)</td>
      <td>Grammar correction</td>
    </tr>
    <tr>
      <td>Accelerate content creation and customer personalisation</td>
      <td>Explain difficult concepts like I’m 5, or like I’m a PhD student</td>
    </tr>
    <tr>
      <td>Email classification, summarisation and automated response</td>
      <td>Translate text to different languages</td>
    </tr>
    <tr>
      <td>Enhance team productivity and creativity, for instance generating meeting agendas</td>
      <td>Write and explain code, and even translate it to another coding language</td>
    </tr>
    <tr>
      <td>Create new text-based products</td>
      <td>Turn a product description into ad copy</td>
    </tr>
    <tr>
      <td> </td>
      <td>Integration with 3rd party apps – the possibilities are endless!</td>
    </tr>
  </tbody>
</table>
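
<p>To ground these use cases, here is what text summarisation looks like with the <code class="language-plaintext highlighter-rouge">openai</code> Python package (the 0.x API current at the time of writing; the model name and prompts are assumptions to adapt):</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">import openai  # pip install openai; reads OPENAI_API_KEY from the environment

# Text summarisation, one of the individual use cases listed above
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Summarise the user's text in two sentences."},
        {"role": "user", "content": "(paste a long article here)"},
    ],
)
print(response["choices"][0]["message"]["content"])</code></pre></figure>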

<h2 id="model-architecture-of-chatgpt">Model Architecture of ChatGPT</h2>

<p>ChatGPT belongs to the GPT family of language models. Let’s zoom in on GPT-3, which comprises an encoder, attention layers, a feedforward network, a decoder, and a softmax layer. To achieve its impressive language generation capabilities, GPT-3 uses causal language modeling: the model predicts the next token in a sequence, with the constraint that it can only attend to tokens on the left. Here are the steps GPT-3 takes to generate text (a toy attention sketch follows the numbered list):</p>

<figure class="align-center">
  <img src="https://ivanpua.com/assets/images/llm_primer/fullarch.png" alt="" />
  <figcaption>GPT's architecture <a href="https://dugas.ch/artificial_curiosity/GPT_architecture.html#paper2">[Reference]</a></figcaption>
</figure>

<ol>
  <li>The input sequence for GPT-3 is fixed at 2048 tokens, but shorter sequences can still be used by filling the extra positions with “empty” values.</li>
  <li>To encode the input sequence, the encoder first converts it into a one-hot vector and then compresses it into a smaller dimensional space called an embedding vector to save space.</li>
  <li>Meanwhile, GPT-3 also encodes the position of each token in the sequence, but does not reduce its size to form an embedding.</li>
  <li>The position encodings and input embeddings are combined into a single matrix, which is then fed into the attention layers.</li>
  <li>In simple terms, the attention layer predicts which input tokens to focus on, and how much, for each output in the sequence. The input matrix is transformed into three separate matrices - queries, keys, and values - which are combined to weight the most relevant tokens.</li>
  <li>GPT-3 computes this attention with 96 parallel “heads” in each layer, which is why it is called multi-head attention.</li>
  <li>The output of the attention layers is then passed into a feed-forward block in a multi-layer perceptron.</li>
  <li>The resulting matrix contains, for each of the 2048 output positions in the sequence, a 12288-vector of information about which word should appear. To generate text, this matrix is decoded using a “decoder”.</li>
  <li>When GPT-3 generates text, it doesn’t just provide a single guess for the next word. Instead, it generates a sequence of guesses - one for each of the 2048 “next” positions in the sequence - with each guess representing the probability of a likely word.</li>
</ol>
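
<p>To make the attention step concrete, here is a toy single-head causal attention in NumPy. This is a didactic sketch only: GPT-3 uses learned query/key/value projections, 96 heads, and 96 stacked layers, none of which appear here.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def causal_self_attention(x):
    """Toy single-head causal attention over a (seq_len, d) matrix of token embeddings."""
    seq_len, d = x.shape
    # Real models apply learned projections here; identity keeps the sketch short
    queries, keys, values = x, x, x
    scores = queries @ keys.T / np.sqrt(d)
    # Causal mask: each position may only attend to itself and tokens on its left
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

tokens = np.random.randn(4, 8)              # 4 token embeddings of dimension 8
print(causal_self_attention(tokens).shape)  # (4, 8)</code></pre></figure>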

<h2 id="limitation-of-chatgpt">Limitation of ChatGPT</h2>
<ul>
  <li><strong>Hallucination</strong> - ChatGPT can generate highly creative but potentially inaccurate information, and therefore should not be used for decision-making without human involvement. Although the AI model is continuously improving, it cannot understand cause and effect, reason like a human, or produce sensible moves in games like chess. It is a useful tool for ideation and creativity, but critical thinking and validation should remain the responsibility of humans. The output of ChatGPT is not a reliable source of factual information and should not be used without human supervision.</li>
  <li><strong>Data security and privacy</strong> - Studies have demonstrated that large models like ChatGPT can be vulnerable to privacy intrusion issues, where personally identifiable information (PII) can be extracted from training data using specific prompts or code. As such, businesses must carefully consider data security and privacy concerns when incorporating this technology into their operations. Protecting sensitive information and customer privacy should be a top priority, and guardrails should be established to reduce potential risks.</li>
  <li><strong>Fairness and Inclusiveness</strong> - Internet-scale systems are prone to bias, which can have unintended negative consequences for minority groups, such as perpetuating bias in algorithms and increasing error rates in facial recognition. Additionally, the digital divide may prevent minority groups from accessing the benefits of technological advancements. As a result, it is important to develop and deploy new technologies responsibly and equitably. While ChatGPT uses a Moderation API to block unsafe content, it may not effectively address the propagation of unfairness and bias within the system.</li>
</ul>

<h2 id="recent-trends-as-of-april-2023">Recent Trends (as of April 2023)</h2>
<ul>
  <li>Microsoft has invested $10 billion in OpenAI and recently released their latest conversational AI solution, the Bing chatbot. Unlike ChatGPT, which can only retrieve information up until late 2021 based on the data it was trained on, “the new Bing” is able to retrieve information about recent news and events.</li>
  <li>In mid-March, OpenAI announced their latest breakthrough - the GPT-4 model. GPT-4 is able to handle more complex conversational tasks compared to ChatGPT. The new model is versatile and can accept images as input as well as text.</li>
  <li>Google has its own conversational AI system called Bard, and they have released the PaLM API.</li>
  <li>Meta released LLaMA, a smaller and more performant model compared to ChatGPT. They intend to grant access to users on a case-by-case basis.</li>
  <li>Amazon has introduced a cloud service called Bedrock that developers can use to enhance their software with artificial intelligence systems that can generate text. Through its Bedrock generative AI service, AWS will offer access to its own first-party language models called Titan, and a model for turning text into images from startup Stability AI.</li>
</ul>

<h3 id="segue---prompt-engineering">Segue - Prompt engineering</h3>
<p>Prompt engineering is the process of designing and refining prompts to guide generative AI systems, particularly in language and image models. It is crucial for achieving high-quality results, but can be challenging and time-consuming. Prompt engineering is becoming more popular due to the increasing demand for generative AI applications, and some creators are already offering their prompts on marketplaces like PromptBase.</p>

<p>However, there are concerns that people may overestimate the technical rigor and reliability of results obtained from a constantly evolving black box. Crafting appropriate prompts requires meticulous exploration of possibilities and figuring out why and when AI produces inaccurate results. The field of prompt engineering is evolving, and new strategies and techniques may become necessary to keep pace with emerging trends and challenges. Despite limitations, the potential benefits of these technologies are vast and far-reaching.</p>

<p>Stay tuned for more content on Large Language Models!</p>]]></content><author><name>Ivan</name></author><category term="generative-ai" /><category term="generative-ai" /><category term="llm" /><summary type="html"><![CDATA[An informative and easy-to-understand summary of large language models, particularly ChatGPT]]></summary></entry><entry><title type="html">Creating an endpoint on AWS Sagemaker with Pulumi</title><link href="https://ivanpua.com/cloud/pulumi-endpoint/" rel="alternate" type="text/html" title="Creating an endpoint on AWS Sagemaker with Pulumi" /><published>2023-04-01T19:07:00+11:00</published><updated>2023-04-01T19:07:00+11:00</updated><id>https://ivanpua.com/cloud/pulumi-endpoint</id><content type="html" xml:base="https://ivanpua.com/cloud/pulumi-endpoint/"><![CDATA[<p>In the previous <a href="/cloud/iac/">post</a>, I mentioned Pulumi - an emerging open-source IaC tool. To further understand Pulumi’s functionalities, I have used Pulumi to create a real-time endpoint to serve a machine learning (ML) model on AWS Sagemaker, and now I would like to walk you through the steps involved.</p>

<p>In this blog post, we will cover everything from setting up the necessary infrastructure to provisioning endpoints with industry best practices. By the end of this post, you will have a good understanding of how to use Pulumi and SageMaker together to manage your machine learning models like a pro. So, let’s dive in!</p>

<p>Prerequisites:</p>
<ul>
  <li>An active AWS account with developer permissions</li>
  <li>A new Pulumi project with your AWS configuration.</li>
  <li>An ML model created on AWS</li>
</ul>

<p>To create an endpoint, three resources are required: an S3 bucket, an endpoint configuration, and the endpoint itself. In addition, a CloudWatch log group is good to have. We explain the details below.</p>

<h2 id="s3-bucket">S3 Bucket</h2>
<p>This bucket stores data related to your endpoint, e.g. input data from users and output predictions captured for monitoring.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pulumi_aws</span> <span class="k">as</span> <span class="n">aws</span>

<span class="n">s3_bucket</span> <span class="o">=</span> <span class="n">aws</span><span class="p">.</span><span class="n">s3</span><span class="p">.</span><span class="n">Bucket</span><span class="p">(</span>
    <span class="n">resource_name</span><span class="o">=</span><span class="s">"endpoint-bucket"</span><span class="p">,</span>
    <span class="n">bucket</span><span class="o">=</span><span class="s">"endpoint-bucket"</span><span class="p">,</span>
    <span class="n">acl</span><span class="o">=</span><span class="s">"private"</span><span class="p">,</span>
<span class="p">)</span></code></pre></figure>

<h2 id="endpoint-configuration">Endpoint Configuration</h2>

<p>It is highly recommended to enable data capture to record information that can be used for training, debugging, and monitoring the model. Amazon SageMaker Model Monitor automatically parses this captured data and compares metrics from it with a baseline that you create for the model, which is useful for detecting model and data drift. For more information, refer to this <a href="https://www.youtube.com/watch?v=J9T0X9Jxl_w&amp;ab_channel=AWSEvents">video</a>.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">s3_uri</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"s3://endpoint-bucket/endpoint-data-capture-logs/"</span> <span class="c1"># from s3 bucket created previously
</span><span class="n">endpoint_configuration</span> <span class="o">=</span> <span class="n">aws</span><span class="p">.</span><span class="n">sagemaker</span><span class="p">.</span><span class="n">EndpointConfiguration</span><span class="p">(</span>
    <span class="n">resource_name</span><span class="o">=</span><span class="n">model_name</span><span class="p">,</span>
    <span class="n">name</span><span class="o">=</span><span class="n">model_name</span><span class="p">,</span>
    <span class="n">data_capture_config</span><span class="o">=</span><span class="n">aws</span><span class="p">.</span><span class="n">sagemaker</span><span class="p">.</span><span class="n">EndpointConfigurationDataCaptureConfigArgs</span><span class="p">(</span>
        <span class="n">destination_s3_uri</span><span class="o">=</span><span class="n">s3_uri</span><span class="p">,</span>
        <span class="n">initial_sampling_percentage</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="c1"># A lower value is recommended for Endpoints with high traffic.
</span>        <span class="n">enable_capture</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">capture_options</span><span class="o">=</span><span class="p">[</span>
            <span class="n">aws</span><span class="p">.</span><span class="n">sagemaker</span><span class="p">.</span><span class="n">EndpointConfigurationDataCaptureConfigCaptureOptionArgs</span><span class="p">(</span><span class="n">capture_mode</span><span class="o">=</span><span class="s">"Output"</span><span class="p">),</span>
            <span class="n">aws</span><span class="p">.</span><span class="n">sagemaker</span><span class="p">.</span><span class="n">EndpointConfigurationDataCaptureConfigCaptureOptionArgs</span><span class="p">(</span><span class="n">capture_mode</span><span class="o">=</span><span class="s">"Input"</span><span class="p">),</span>
        <span class="p">],</span>
        <span class="n">capture_content_type_header</span><span class="o">=</span><span class="n">aws</span><span class="p">.</span><span class="n">sagemaker</span><span class="p">.</span><span class="n">EndpointConfigurationDataCaptureConfigCaptureContentTypeHeaderArgs</span><span class="p">(</span>
            <span class="n">csv_content_types</span><span class="o">=</span><span class="p">[</span><span class="s">"text/csv"</span><span class="p">],</span> <span class="n">json_content_types</span><span class="o">=</span><span class="p">[</span><span class="s">"application/json"</span><span class="p">]</span>
        <span class="p">),</span>
    <span class="p">),</span>
    <span class="n">production_variants</span><span class="o">=</span><span class="p">[</span>
        <span class="n">aws</span><span class="p">.</span><span class="n">sagemaker</span><span class="p">.</span><span class="n">EndpointConfigurationProductionVariantArgs</span><span class="p">(</span>
            <span class="n">variant_name</span><span class="o">=</span><span class="s">'version_1'</span>
            <span class="n">model_name</span><span class="o">=</span><span class="p">[</span><span class="n">model</span> <span class="n">name</span><span class="p">],</span>
            <span class="n">initial_instance_count</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
            <span class="n">instance_type</span><span class="o">=</span><span class="s">"ml.m5.xlarge"</span><span class="p">,</span>
        <span class="p">)</span>
    <span class="p">],</span>
<span class="p">)</span></code></pre></figure>

<h2 id="endpoint">Endpoint</h2>
<p>This resource is created by referring to the endpoint configuration created previously.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">endpoint</span> <span class="o">=</span> <span class="n">aws</span><span class="p">.</span><span class="n">sagemaker</span><span class="p">.</span><span class="n">Endpoint</span><span class="p">(</span>
    <span class="n">resource_name</span><span class="o">=</span><span class="n">model_name</span><span class="p">,</span>
    <span class="n">name</span><span class="o">=</span><span class="n">model_name</span><span class="p">,</span>
    <span class="n">endpoint_config_name</span><span class="o">=</span><span class="n">endpoint_configuration</span><span class="p">.</span><span class="nb">id</span><span class="p">,</span>
<span class="p">)</span></code></pre></figure>

<h2 id="cloudwatch-log-group">Cloudwatch Log Group</h2>
<p>With a log group, warnings and error messages logged to <code class="language-plaintext highlighter-rouge">stdout</code> can be recorded, which is helpful for debugging and is considered industry best practice.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">cloudwatch_logs</span> <span class="o">=</span> <span class="n">aws</span><span class="p">.</span><span class="n">cloudwatch</span><span class="p">.</span><span class="n">LogGroup</span><span class="p">(</span>
    <span class="n">resource_name</span><span class="o">=</span><span class="sa">f</span><span class="s">"/aws/sagemaker/Endpoints/</span><span class="si">{</span><span class="n">model_name</span><span class="si">}</span><span class="s">"</span><span class="p">,</span>
    <span class="n">name</span><span class="o">=</span><span class="sa">f</span><span class="s">"/aws/sagemaker/Endpoints/</span><span class="si">{</span><span class="n">model_name</span><span class="si">}</span><span class="s">"</span><span class="p">,</span>
    <span class="n">retention_in_days</span><span class="o">=</span><span class="mi">30</span><span class="p">,</span>
<span class="p">)</span></code></pre></figure>

<p>After adding these Pulumi resources, the endpoint will be created on AWS by running <code class="language-plaintext highlighter-rouge">pulumi up</code>.</p>]]></content><author><name>Ivan</name></author><category term="cloud" /><category term="cloud" /><category term="data-engineering" /><summary type="html"><![CDATA[Describing how to provision an endpoint on AWS Sagemaker with Pulumi and Python]]></summary></entry><entry><title type="html">Introduction to Infrastructure as Code (IaC)</title><link href="https://ivanpua.com/cloud/iac/" rel="alternate" type="text/html" title="Introduction to Infrastructure as Code (IaC)" /><published>2023-03-28T19:38:00+11:00</published><updated>2023-03-28T19:38:00+11:00</updated><id>https://ivanpua.com/cloud/iac</id><content type="html" xml:base="https://ivanpua.com/cloud/iac/"><![CDATA[<h2 id="what-is-iac">What is IaC?</h2>

<p>Infrastructure as code or IaC enables developers to programmatically create, deploy and manage cloud resources in an automated, consistent and <strong>scalable</strong> manner. Notice the emphasis on scalable – that means the IaC template will spin up the same resources with the same configuration every time unless the cloud provider itself changes its configuration. This reduces the operational overhead of creating cloud resources, enabling developers to focus on delivering high-quality software and services to their customers.</p>

<h2 id="why-do-we-need-iac">Why do we need IaC?</h2>
<p>Before IaC, developers would use a ‘Click-Ops’ method to create resources; essentially clicking on buttons, following the prompts, and referring to documentation if they get stuck. Alternatively, some developers would opt for cloud provider’s own CLI such as AWS CLI or Google Cloud Shell to deploy resources.</p>

<p>Using Click-Ops or the CLI can be a quick and straightforward way to create resources on the cloud, especially for small-scale projects and quick prototyping. It can be useful for small, one-off tasks or for exploring the capabilities of the cloud provider. But what if you were leading a team of 10 data engineers and data scientists and wanted everyone to use the same cloud stack? You could create a guide and tell them to follow the setup themselves; however, this quickly becomes cumbersome and error-prone when managing a large number of resources or complex infrastructure.</p>

<p>To address these issues, IaC tools were developed.</p>

<h2 id="types-of-iac-tools">Types of IaC tools</h2>

<p>There are two types of IaC tools – those built in-house by cloud providers, and open-source ones.</p>

<h3 id="iac-tools-by-cloud-providers">IaC tools by Cloud Providers</h3>
<ul>
  <li>AWS CloudFormation: AWS CloudFormation is a service that allows you to define your infrastructure as code using JSON or YAML. CloudFormation supports a wide range of AWS services and resources and also allows you to create custom resources using AWS Lambda.</li>
  <li>Azure Resource Manager (ARM): Azure Resource Manager is a service that allows you to define your infrastructure as code using JSON or YAML. ARM supports a wide range of Azure services and resources.</li>
  <li>Google Cloud Deployment Manager: Google Cloud Deployment Manager is a service that allows you to define your infrastructure as code using YAML or Jinja2 templates. Deployment Manager supports a wide range of Google Cloud Platform services and resources.</li>
</ul>

<h3 id="open-source">Open Source</h3>
<h4 id="terraform">Terraform</h4>
<p><a href="https://www.terraform.io/">Terraform</a> is an open-source IaC tool that allows you to define your infrastructure as code using a declarative language called HashiCorp Configuration Language (HCL) or JSON. HCL is the recommended language as it’s explicitly designed for Terraform. It currently enjoys a dominant position among open-source IaC platforms.</p>

<p>To deploy an AWS S3 bucket with Terraform, you will need to follow these steps:</p>

<p>a. Define the S3 bucket in your Terraform configuration file:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">provider</span> <span class="s">"aws"</span> <span class="p">{</span>
  <span class="n">region</span> <span class="o">=</span> <span class="s">"us-east-1"</span>
<span class="p">}</span>

<span class="n">resource</span> <span class="s">"aws_s3_bucket"</span> <span class="s">"my_bucket"</span> <span class="p">{</span>
  <span class="n">bucket</span> <span class="o">=</span> <span class="s">"my-bucket-name"</span>
  <span class="n">acl</span> <span class="o">=</span> <span class="s">"private"</span>
  
  <span class="n">versioning</span> <span class="p">{</span>
    <span class="n">enabled</span> <span class="o">=</span> <span class="n">true</span>
  <span class="p">}</span>

  <span class="n">tags</span> <span class="o">=</span> <span class="p">{</span>
    <span class="n">Environment</span> <span class="o">=</span> <span class="s">"dev"</span>
  <span class="p">}</span>
<span class="p">}</span></code></pre></figure>

<p>b. Initialize Terraform in your project directory by running <code class="language-plaintext highlighter-rouge">terraform init</code>.<br />
c. Create a Terraform execution plan by running <code class="language-plaintext highlighter-rouge">terraform plan</code>. This will show you the changes that Terraform will make to your infrastructure.<br />
d. Apply the Terraform execution plan by running <code class="language-plaintext highlighter-rouge">terraform apply</code>. This will create the S3 bucket in your AWS account.</p>

<h4 id="pulumi">Pulumi</h4>
<p>Emerging as a fierce competitor to Terraform, <a href="https://www.pulumi.com/">Pulumi</a> is a universal infrastructure as code platform that allows you to use familiar programming languages and tools to build, deploy, and manage cloud infrastructure. To deploy an AWS S3 bucket with Pulumi, you will need to follow these steps:</p>

<p>a. Use the <code class="language-plaintext highlighter-rouge">pulumi_aws</code> Python library to create a resource.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pulumi</span>
<span class="kn">import</span> <span class="nn">pulumi_aws</span> <span class="k">as</span> <span class="n">aws</span>

<span class="c1"># Create an AWS resource (S3 Bucket)
</span>
<span class="n">my_bucket</span> <span class="o">=</span> <span class="n">aws</span><span class="p">.</span><span class="n">s3</span><span class="p">.</span><span class="n">Bucket</span><span class="p">(</span><span class="s">"my-bucket"</span><span class="p">,</span>
                          <span class="n">bucket</span><span class="o">=</span><span class="s">"my-bucket-name"</span><span class="p">,</span>
                          <span class="n">acl</span><span class="o">=</span><span class="s">"private"</span><span class="p">,</span>
                         <span class="p">)</span>

<span class="c1"># Export the name of the bucket
</span><span class="n">pulumi</span><span class="p">.</span><span class="n">export</span><span class="p">(</span><span class="s">'bucket_name'</span><span class="p">,</span>  <span class="n">bucket</span><span class="p">.</span><span class="nb">id</span><span class="p">)</span></code></pre></figure>

<p>b. Running <code class="language-plaintext highlighter-rouge">pulumi up</code> in the terminal will create the S3 bucket in your AWS account.</p>

<h3 id="terraform-vs-pulumi">Terraform vs Pulumi</h3>
<p>Both Terraform and Pulumi support a wide range of cloud providers, including AWS, Azure, and Google Cloud. The main difference between Pulumi and Terraform is that Pulumi allows you to define your infrastructure using a general-purpose programming language, while Terraform uses its own declarative language (focuses on the what) called HashiCorp Configuration Language (HCL) or JSON.</p>

<p>With Pulumi, you can use popular programming languages such as Python, JavaScript, Go, and TypeScript to define your infrastructure. This allows you to leverage the full power of a programming language to define, configure, and deploy your infrastructure. Pulumi also provides a set of libraries for working with cloud providers, allowing you to easily create and manage resources (see the sketch below).</p>

<p>On the other hand, Terraform is designed specifically for infrastructure as code and provides a domain-specific language (HCL) that is optimized for describing infrastructure resources. Terraform also has a large ecosystem of providers, which allows you to manage a wide range of cloud resources.</p>
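
<p>To illustrate, because Pulumi programs are ordinary Python, a plain loop can stamp out repeated resources, something that needs dedicated <code class="language-plaintext highlighter-rouge">count</code> or <code class="language-plaintext highlighter-rouge">for_each</code> constructs in HCL. A small illustrative sketch (the bucket and export names are made up):</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">import pulumi
import pulumi_aws as aws

# One private bucket per environment, stamped out with a plain Python loop
for env in ["dev", "staging", "prod"]:
    bucket = aws.s3.Bucket(
        f"app-data-{env}",
        acl="private",
        tags={"Environment": env},
    )
    pulumi.export(f"bucket_name_{env}", bucket.id)</code></pre></figure>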

<p>Here are some additional differences between Pulumi and Terraform:</p>
<ul>
  <li>Pulumi has a more procedural approach (how), while Terraform is more declarative (what).</li>
  <li>Pulumi supports all the major cloud providers, including AWS, Azure, Google Cloud Platform, and Kubernetes.</li>
  <li>Pulumi allows for easier refactoring and reuse of infrastructure code, as it uses a programming language that is familiar to developers.</li>
  <li>Terraform has a larger community and ecosystem of providers, making it easier to find resources and examples for managing specific cloud resources.</li>
</ul>

<p>Ultimately, the choice between Pulumi and Terraform depends on your specific needs and preferences. If you prefer a general-purpose programming language and want more flexibility in defining your infrastructure, Pulumi may be a good choice. If you prefer a declarative approach and want to leverage a deeper and more stable knowledge base, Terraform may be a better fit.</p>]]></content><author><name>Ivan</name></author><category term="cloud" /><category term="cloud" /><summary type="html"><![CDATA[Explaning the concept of Infrastructure as Code (IaC) and popular IaC tools]]></summary></entry></feed>