
Serverless Spam Detection API: Deploying a Scikit-Learn Model with AWS Lambda and API Gateway

2026-05-01 22:08:53

Overview

Spam is more than an annoyance; it is a security risk. Training a machine learning model in a notebook is straightforward, but deploying it as a scalable, production-ready service is the harder part. This guide walks you through building an end-to-end serverless spam classifier, using scikit-learn for model development and AWS Lambda, S3, and API Gateway for deployment. The result is a lightweight, cost-efficient API that classifies messages in real time. The system is modular: because the model artifacts live in S3, you can retrain the model without changing the API code.

Source: www.freecodecamp.org

Prerequisites

Skills & Tools

Step-by-Step Instructions

1. Building the Model: The Brain

The classifier uses supervised learning. Instead of hardcoding spam rules, the algorithm learns patterns from labeled data.
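The code in the following sections uses X_train and y_train without showing where they come from. A minimal sketch of that preparation, assuming a tiny inline dataset and a 1 = spam / 0 = ham label encoding (both are illustrative assumptions, not the dataset used for the original model):

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset; a real project would load many labeled messages.
messages = [
    "Congratulations! You won a free iPhone, claim now",
    "Win cash prizes, click this link today",
    "Free entry in a weekly prize draw, text WIN",
    "Urgent: your account won a reward, call now",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached project report",
    "Can you pick up the kids after school?",
    "The team meeting moved to 3 pm on Friday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # assumed encoding: 1 = spam, 0 = ham

# Hold out 25% for evaluation; stratify keeps the spam/ham ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=42, stratify=labels
)
print(len(X_train), "training messages,", len(X_test), "test messages")
```

Keeping a held-out test split makes it possible to check the classifier on messages it never saw during training.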

1.1 Vectorization: Converting Text to Numbers

Models cannot read raw text. We use TF-IDF (Term Frequency–Inverse Document Frequency) to transform email content into numerical vectors.

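from sklearn.feature_extraction.text import TfidfVectorizer

# X_train: the raw message strings from your training split
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
# Learn the vocabulary and IDF weights from the training data only
X_train_features = feature_extraction.fit_transform(X_train)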
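One detail worth making explicit: the vectorizer is fitted on the training messages only, and unseen text goes through transform so that it maps onto the same vocabulary. A small sketch with made-up messages (the sample strings are assumptions for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_messages = [
    "win a free prize now",
    "free cash offer",
    "lunch meeting tomorrow",
]
unseen_messages = ["free lunch offer from a stranger"]

vectorizer = TfidfVectorizer(min_df=1, stop_words="english", lowercase=True)

# fit_transform learns the vocabulary and IDF weights from the training data.
train_matrix = vectorizer.fit_transform(train_messages)

# transform reuses that vocabulary; words never seen in training are ignored.
unseen_matrix = vectorizer.transform(unseen_messages)

print(train_matrix.shape, unseen_matrix.shape)
print("stranger" in vectorizer.vocabulary_)
```

Both matrices have the same number of columns, which is what lets a model trained on one score the other.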

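The textbook TF-IDF formula assigns a weight to each word i in document j:

w(i,j) = tf(i,j) * log(N / df(i))

Here tf(i,j) is the number of times word i appears in document j, N is the total number of documents, and df(i) is the number of documents that contain word i. Words that appear in nearly every message get a weight near zero, while rarer words score higher. scikit-learn's TfidfVectorizer applies a smoothed variant by default, ln((1 + N) / (1 + df(i))) + 1, and then L2-normalizes each document vector, so its values differ slightly from this formula.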
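To make the formula concrete, here is a small hand computation in plain Python, assuming a three-document toy corpus and natural logarithms. The corpus and the helper name tfidf_weight are illustrative; this follows the formula as written, not any library's implementation.

```python
import math

# Toy corpus, already tokenized (an assumption for illustration).
corpus = [
    ["free", "prize", "now"],
    ["meeting", "at", "noon"],
    ["free", "meeting"],
]

def tfidf_weight(term, doc_index, docs):
    """Textbook weight: tf(i,j) * log(N / df(i)), using the natural log."""
    tf = docs[doc_index].count(term)
    df = sum(1 for doc in docs if term in doc)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

w_free = tfidf_weight("free", 0, corpus)
w_prize = tfidf_weight("prize", 0, corpus)
print(f"free:  {w_free:.3f}")   # appears in 2 of 3 documents
print(f"prize: {w_prize:.3f}")  # appears in 1 of 3 documents
```

Because "prize" occurs in fewer documents than "free", it receives the larger weight in the first document.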

1.2 Training and Saving the Model

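We train a Logistic Regression classifier (other scikit-learn classifiers would slot in the same way) and save both the fitted vectorizer and the trained model with joblib. Both artifacts are needed at inference time, because incoming text must be transformed with the same vocabulary the model was trained on.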

from sklearn.linear_model import LogisticRegression
import joblib

model = LogisticRegression()
model.fit(X_train_features, y_train)

# Save artifacts
joblib.dump(model, 'spam_classifier.pkl')
joblib.dump(feature_extraction, 'vectorizer.pkl')
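Before uploading anything, it can be worth confirming that the saved artifacts reload and behave identically. A minimal sketch, assuming a tiny made-up dataset and a temporary directory in place of the real file paths:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data; assumed encoding: 1 = spam, 0 = ham.
texts = [
    "win a free prize now", "claim your free cash reward",
    "free entry, text win to claim", "urgent prize waiting, call now",
    "lunch meeting tomorrow", "please review the report",
    "see you at the office", "notes from the project call",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
model = LogisticRegression()
model.fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "spam_classifier.pkl")
    vec_path = os.path.join(tmp_dir, "vectorizer.pkl")
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vec_path)

    # Reload both artifacts, as the Lambda function will do later.
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vec_path)

sample = ["free prize inside", "project meeting at noon"]
before = model.predict(vectorizer.transform(sample))
after = loaded_model.predict(loaded_vectorizer.transform(sample))
round_trip_ok = bool((before == after).all())
print("round trip ok:", round_trip_ok)
```

scikit-learn does not guarantee that pickled models load correctly across library versions, so the deployment environment should use the same scikit-learn version as training.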

2. Deploying the Model to AWS

2.1 Upload to S3

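Create an S3 bucket and upload both .pkl files. Bucket names are globally unique, so spam-classifier-models is only a placeholder; substitute your own name here and in the Lambda code.

# Create the bucket (skip this if it already exists)
aws s3 mb s3://spam-classifier-models
aws s3 cp spam_classifier.pkl s3://spam-classifier-models/
aws s3 cp vectorizer.pkl s3://spam-classifier-models/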
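If you would rather script the upload from Python than use the CLI, boto3's upload_file does the same job. A sketch, assuming the bucket already exists and credentials are configured; upload_artifacts and RecordingClient are hypothetical helpers, with the client passed in as an argument so the logic can be exercised without touching AWS:

```python
import os

def upload_artifacts(s3_client, bucket, paths):
    """Upload each file to the bucket root, using its file name as the key."""
    uploaded_keys = []
    for path in paths:
        key = os.path.basename(path)
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client call.
        s3_client.upload_file(path, bucket, key)
        uploaded_keys.append(key)
    return uploaded_keys

class RecordingClient:
    """Stand-in for boto3.client('s3') that only records calls (dry run)."""
    def __init__(self):
        self.calls = []

    def upload_file(self, filename, bucket, key):
        self.calls.append((filename, bucket, key))

dry_run = RecordingClient()
keys = upload_artifacts(
    dry_run, "spam-classifier-models", ["spam_classifier.pkl", "vectorizer.pkl"]
)
print(keys)
```

For a real upload, pass boto3.client('s3') in place of the RecordingClient instance.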

2.2 Create the Lambda Function

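Write a Lambda function that loads the model and vectorizer from S3, processes incoming text, and returns a prediction. The artifacts are loaded at module level, outside the handler, so warm invocations reuse them instead of downloading them again. Note that scikit-learn and joblib are not part of the default Lambda Python runtime, so they must be packaged with the function, for example as a Lambda layer or a container image.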

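import json
import os

import boto3
import joblib

s3 = boto3.client('s3')
BUCKET = 'spam-classifier-models'
ARTIFACTS = ('spam_classifier.pkl', 'vectorizer.pkl')

def load_model():
    # /tmp is Lambda's writable directory; download each file only if it is missing
    for name in ARTIFACTS:
        local_path = os.path.join('/tmp', name)
        if not os.path.exists(local_path):
            s3.download_file(BUCKET, name, local_path)
    model = joblib.load('/tmp/spam_classifier.pkl')
    vectorizer = joblib.load('/tmp/vectorizer.pkl')
    return model, vectorizer

# Loaded once per execution environment and reused across warm invocations
model, vectorizer = load_model()

def lambda_handler(event, context):
    # With proxy integration, event['body'] is a JSON string (or None)
    try:
        body = json.loads(event.get('body') or '{}')
        text = body['message']
        if not isinstance(text, str):
            raise TypeError('message must be a string')
    except (json.JSONDecodeError, KeyError, TypeError):
        return {
            'statusCode': 400,
            'body': json.dumps({'error': "Request body must be JSON with a 'message' string"})
        }
    features = vectorizer.transform([text])
    prediction = model.predict(features)[0]
    # Assumes labels were encoded as 1 = spam, 0 = ham during training
    label = 'spam' if prediction == 1 else 'ham'
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': label})
    }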

2.3 Set Up API Gateway

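Create a REST API in API Gateway, add a resource (the client example in the next section uses /classify) with a POST method, and point it at the Lambda function using Lambda proxy integration so that the handler receives the request body in event['body']. Deploy the API to a stage (e.g., prod) and note the endpoint URL.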
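With Lambda proxy integration, the event's body field arrives as a string (or None when the request has no body), and isBase64Encoded signals a base64-encoded payload. A sketch of a defensive parser for that shape; extract_message is a hypothetical helper, and the sample events are trimmed to the fields used here:

```python
import base64
import json

def extract_message(event):
    """Return the 'message' string from a proxy-style event, or None if invalid."""
    body = event.get("body")
    if not body:
        return None
    try:
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Covers malformed base64, invalid UTF-8, and invalid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    message = payload.get("message")
    if isinstance(message, str) and message.strip():
        return message
    return None

plain_event = {
    "body": json.dumps({"message": "Win a free prize"}),
    "isBase64Encoded": False,
}
encoded_event = {
    "body": base64.b64encode(b'{"message": "Lunch at noon?"}').decode("ascii"),
    "isBase64Encoded": True,
}
print(extract_message(plain_event))
print(extract_message(encoded_event))
print(extract_message({"body": None}))
```

Returning None for anything malformed lets the handler answer with a 400 response instead of failing with an unhandled exception.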


3. Running the Project Locally

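Once the API is deployed, you can test it from your machine by sending a POST request to the endpoint. Replace the placeholder URL with the one from API Gateway:

import requests

# Placeholder: substitute your API ID, region, stage, and resource path
url = 'https://your-api-id.execute-api.region.amazonaws.com/prod/classify'
response = requests.post(url, json={'message': 'Congratulations! You won a free iPhone!'}, timeout=10)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())  # e.g. {'prediction': 'spam'}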

You can also run the entire pipeline without AWS by loading the two .pkl files with joblib and calling the vectorizer's transform and the model's predict directly.
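A sketch of such a local run, assuming a toy model trained in-process instead of the real .pkl files (swap in joblib.load for the actual artifacts). make_handler and the sample event are illustrative names, and the 1 = spam encoding is an assumption:

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def make_handler(model, vectorizer):
    """Build a Lambda-style handler around an already-loaded model."""
    def handler(event, context):
        text = json.loads(event["body"])["message"]
        prediction = model.predict(vectorizer.transform([text]))[0]
        label = "spam" if prediction == 1 else "ham"
        return {"statusCode": 200, "body": json.dumps({"prediction": label})}
    return handler

# Hypothetical toy training data; assumed encoding: 1 = spam, 0 = ham.
texts = [
    "win a free prize now", "claim your free cash reward",
    "free entry, text win to claim", "urgent prize waiting, call now",
    "lunch meeting tomorrow", "please review the report",
    "see you at the office", "notes from the project call",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

vectorizer = TfidfVectorizer(stop_words="english")
model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

# Mimic the event shape produced by proxy integration: body is a JSON string.
handler = make_handler(model, vectorizer)
event = {"body": json.dumps({"message": "Congratulations! You won a free iPhone!"})}
response = handler(event, None)
print(response["statusCode"], json.loads(response["body"]))
```

Passing the model into a factory function keeps the handler logic testable without S3 or module-level side effects.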

Common Mistakes

Summary

You now have a serverless spam classifier API that scales with request volume without any servers to manage. The modular design lets you retrain the model offline and replace the files in S3 without touching the API. Because the handler caches the artifacts in /tmp and in memory, execution environments that are already warm keep serving the old model until they are recycled; new ones pick up the updated files. This approach bridges the gap between ML experimentation and production deployment, making it practical to flag phishing attempts or spam messages in real time. For further reading, see the AWS Lambda best-practices documentation and the scikit-learn guide to model persistence.
