
Essential Mathematics for Better Programming

· 7 min read
Dipjyoti Metia
Chapter Lead - Testing

Programming and mathematics are deeply intertwined disciplines. While you can certainly write code without being a math expert, understanding key mathematical concepts can dramatically improve your programming skills and help you solve complex problems more efficiently. Let's explore the fundamental mathematical concepts that every programmer should know.

1. Boolean Algebra and Logic

At its core, programming is about logic, and Boolean algebra forms the foundation of computational thinking. Understanding these concepts helps you write better conditional statements and optimize logical operations.

Boolean algebra operates on binary values (true/false, 1/0) using three basic operations:

  • AND (conjunction)
  • OR (disjunction)
  • NOT (negation)

These operations directly translate to programming constructs like if statements, while loops, and complex conditional logic. Understanding De Morgan's Laws can help you simplify complex logical expressions:

  • NOT (A AND B) = (NOT A) OR (NOT B)
  • NOT (A OR B) = (NOT A) AND (NOT B)
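De Morgan's Laws are easy to verify in code. Here is a minimal Python sketch that checks both laws over every combination of truth values:

```python
from itertools import product

# Verify De Morgan's Laws for all combinations of truth values
for a, b in product([True, False], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
    assert (not (a or b)) == ((not a) and (not b))

# Practical payoff: a guard like `not (is_valid and is_active)`
# can be rewritten as the often-clearer `not is_valid or not is_active`.
print("De Morgan's Laws hold")
```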

2. Number Systems and Binary Mathematics

Computers operate in binary, making it essential to understand different number systems:

Binary (base-2) is the foundation of all computing operations. Each digit represents a power of 2, and understanding binary helps you:

  • Work with bitwise operations
  • Understand memory allocation
  • Debug low-level issues
  • Optimize storage solutions

Hexadecimal (base-16) is commonly used in:

  • Color codes
  • Memory addresses
  • Debugging tools
  • Binary file formats
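Python makes it easy to experiment with these number systems. A short sketch (the color value is just an illustrative example):

```python
# Converting between number systems
n = 255
print(bin(n))         # 0b11111111
print(hex(n))         # 0xff
print(int("ff", 16))  # 255

# Bitwise operations work directly on the binary representation
flags = 0b1010
print(flags & 0b0010)  # test a bit            -> 2
print(flags | 0b0100)  # set a bit             -> 14
print(flags >> 1)      # shift right (halving) -> 5

# Hexadecimal color code -> RGB components
color = 0x1E90FF
r, g, b = (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF
print(r, g, b)  # 30 144 255
```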

3. Basic Algebra and Functions

Algebraic thinking is crucial for programming concepts like:

Variables and Constants

Just as in algebra, variables in programming store values that can change, while constants remain fixed. Understanding algebraic expressions helps you write more maintainable code and understand relationships between variables.

Functions and Mapping

Mathematical functions are similar to programming functions:

  • They take inputs (parameters)
  • Perform operations
  • Return outputs (return values)

Understanding function composition helps you break down complex problems into smaller, manageable pieces.
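Composition can be expressed directly in code. A minimal sketch (the `normalize` and `tokenize` helpers are invented for illustration):

```python
def compose(f, g):
    """Return a new function equivalent to f(g(x))."""
    return lambda x: f(g(x))

def normalize(text):
    """Trim whitespace and lowercase the text."""
    return text.strip().lower()

def tokenize(text):
    """Split text into words."""
    return text.split()

# Build a small text-processing pipeline from two simple pieces
preprocess = compose(tokenize, normalize)
print(preprocess("  Hello World  "))  # ['hello', 'world']
```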

4. Modular Arithmetic

Modular arithmetic is essential for:

  • Array indexing
  • Hash functions
  • Cryptography
  • Scheduling algorithms
  • Resource allocation

The modulo operator (%) is frequently used to:

  • Wrap around arrays
  • Create circular buffers
  • Generate random numbers
  • Implement cyclic behaviors
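Two of these uses can be sketched in a few lines of Python (the `CircularBuffer` class is a simplified illustration):

```python
# Wrapping an index around an array
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
print(days[(5 + 4) % 7])  # 4 days after Saturday -> Wed

# A fixed-size circular buffer built on the modulo operator
class CircularBuffer:
    def __init__(self, size):
        self.data = [None] * size
        self.index = 0

    def add(self, item):
        self.data[self.index] = item
        self.index = (self.index + 1) % len(self.data)  # wrap around

buf = CircularBuffer(3)
for value in [1, 2, 3, 4]:
    buf.add(value)
print(buf.data)  # [4, 2, 3] -- the oldest entry was overwritten
```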

5. Basic Statistics and Probability

Statistical concepts are crucial for:

Data Analysis

  • Mean, median, and mode calculations
  • Standard deviation for measuring variation
  • Data distribution patterns
  • Outlier detection
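As a concrete example, outliers can be flagged with a simple z-score test (the 2-standard-deviation threshold used here is a common rule of thumb, not a fixed standard):

```python
import numpy as np

def find_outliers(data, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    data = np.asarray(data, dtype=float)
    z_scores = (data - data.mean()) / data.std()
    return data[np.abs(z_scores) > threshold]

samples = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(samples))  # [95.]
```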

Performance Optimization

  • Understanding algorithmic complexity
  • Analyzing performance metrics
  • Optimizing resource usage
  • Predicting system behavior

6. Linear Algebra Basics

Linear algebra concepts are fundamental for:

Graphics Programming

  • Vector operations
  • Matrix transformations
  • 3D rotations and translations
  • Computer vision algorithms

Machine Learning

  • Data representation
  • Feature extraction
  • Pattern recognition
  • Neural network operations

7. Set Theory

Set theory concepts directly apply to:

Data Structures

  • Arrays and lists
  • Sets and dictionaries
  • Database operations
  • Collection manipulation

Algorithm Design

  • Sorting and searching
  • Graph algorithms
  • Data filtering
  • Query optimization

Practical Applications

Let's look at how these mathematical concepts translate into real programming scenarios. The following Python examples demonstrate their practical application.

1. Linear Algebra and Vectors

Linear Algebra and Vectors Example
import numpy as np

# Creating vectors and performing operations
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

# Vector addition
sum_vector = vector1 + vector2 # [5, 7, 9]

# Dot product
dot_product = np.dot(vector1, vector2) # 32

# Matrix operations
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Matrix multiplication
matrix_product = np.dot(matrix1, matrix2)

def calculate_vector_magnitude(vector):
    """Calculate the magnitude (length) of a vector"""
    return np.sqrt(np.sum(vector**2))

# Example usage in 3D graphics
def rotate_point_3d(point, angle_degrees, axis='z'):
    """Rotate a point around a specified axis"""
    angle = np.radians(angle_degrees)
    if axis == 'z':
        rotation_matrix = np.array([
            [np.cos(angle), -np.sin(angle), 0],
            [np.sin(angle), np.cos(angle), 0],
            [0, 0, 1]
        ])
        return np.dot(rotation_matrix, point)
    raise ValueError(f"Unsupported axis: {axis!r}")

2. Statistics and Probability

Statistics and Probability Example
import numpy as np
from collections import Counter

def calculate_statistics(data):
    """Calculate basic statistical measures"""
    mean = np.mean(data)
    median = np.median(data)
    std_dev = np.std(data)

    # Calculate mode manually
    counter = Counter(data)
    mode = [k for k, v in counter.items() if v == max(counter.values())]

    return {
        'mean': mean,
        'median': median,
        'mode': mode,
        'std_dev': std_dev,
        'range': max(data) - min(data)
    }

# Probability calculations
def calculate_probability(favorable_outcomes, total_outcomes):
    """Calculate the probability of an event"""
    return favorable_outcomes / total_outcomes

# Monte Carlo simulation example
def estimate_pi(num_points=1000):
    """Estimate π using the Monte Carlo method"""
    points_inside_circle = 0

    for _ in range(num_points):
        x = np.random.uniform(-1, 1)
        y = np.random.uniform(-1, 1)

        if x**2 + y**2 <= 1:
            points_inside_circle += 1

    pi_estimate = 4 * points_inside_circle / num_points
    return pi_estimate

3. Number Theory and Modular Arithmetic

Number Theory and Modular Arithmetic Example
def is_prime(n):
    """Check if a number is prime using trial division up to sqrt(n)"""
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

def generate_fibonacci(n):
    """Generate the first n Fibonacci numbers using dynamic programming"""
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib[:n]

def modular_exponentiation(base, exponent, modulus):
    """Calculate (base^exponent) % modulus efficiently"""
    result = 1
    base = base % modulus
    while exponent > 0:
        if exponent & 1:  # If exponent is odd
            result = (result * base) % modulus
        base = (base * base) % modulus
        exponent >>= 1
    return result

4. Calculus and Optimization

Calculus and Optimization Example
def numerical_derivative(f, x, h=1e-7):
    """Calculate the numerical derivative of function f at point x"""
    return (f(x + h) - f(x)) / h

def gradient_descent(f, initial_x, learning_rate=0.1, iterations=100):
    """Simple gradient descent optimization"""
    x = initial_x
    history = [x]

    for _ in range(iterations):
        grad = numerical_derivative(f, x)
        x = x - learning_rate * grad
        history.append(x)

    return x, history

# Example usage for finding the minimum of f(x) = x^2
def f(x):
    return x**2

minimum, convergence_history = gradient_descent(f, initial_x=2.0)

5. Set Theory and Combinatorics

Set Theory and Combinatorics Example
def calculate_combinations(n, r):
    """Calculate combinations (n choose r)"""
    def factorial(n):
        if n == 0:
            return 1
        return n * factorial(n-1)

    return factorial(n) // (factorial(r) * factorial(n-r))

def set_operations_example():
    """Demonstrate set operations"""
    set_a = {1, 2, 3, 4, 5}
    set_b = {4, 5, 6, 7, 8}

    union = set_a | set_b
    intersection = set_a & set_b
    difference = set_a - set_b
    symmetric_difference = set_a ^ set_b

    return {
        'union': union,
        'intersection': intersection,
        'difference': difference,
        'symmetric_difference': symmetric_difference
    }

# Example of using sets for efficient lookups
def find_duplicates(arr):
    """Find duplicates in an array using sets"""
    seen = set()
    duplicates = set()

    for item in arr:
        if item in seen:
            duplicates.add(item)
        seen.add(item)

    return list(duplicates)

These examples demonstrate how mathematical concepts translate directly into practical programming applications.

Conclusion

Mathematics provides the theoretical foundation for many programming concepts. While you don't need to be a mathematician to be a good programmer, understanding these basic mathematical principles will:

  • Improve your problem-solving abilities
  • Help you write more efficient code
  • Enable you to understand complex algorithms
  • Let you take full advantage of libraries like NumPy that are built on these foundations

The combination of mathematical thinking and programming expertise will make you a more effective problem solver.

VS Code Setup for Golang Programming

· 5 min read
Dipjyoti Metia
Chapter Lead - Testing

As a Go developer, having the right programming environment can significantly boost your productivity. Visual Studio Code has become the go-to editor for many Golang developers, thanks to its excellent extension ecosystem and customizability. In this guide, I'll walk you through creating the perfect VS Code setup for Go development.

Table of Contents

  1. Installing Prerequisites
  2. Essential Extensions
  3. Advanced Configuration
  4. Productivity Tips and Tricks
  5. Debugging Setup
  6. Theme and UI Customization

Installing Prerequisites

Before we dive into VS Code configuration, ensure you have:

  1. The latest version of Go installed (1.22+ recommended)
  2. Git for version control
  3. Visual Studio Code
  4. Make (optional but recommended for build automation)

Essential Extensions

Let's start with the must-have extensions for Go development:

  1. Go (golang.go)

    • Official Go extension (the older ms-vscode.go identifier was migrated to golang.go)
    • Provides language support, debugging, and testing features
    • Install by pressing Cmd+P (Mac) or Ctrl+P (Windows/Linux) and typing: ext install golang.go
  2. Go Test Explorer

    • Visual test runner for Go
    • Makes testing more intuitive and visual
  3. Error Lens

    • Inline error highlighting
    • Better error visibility without hovering
  4. GitLens

    • Enhanced Git integration
    • Great for team collaboration

Advanced Configuration

Here's an optimized settings.json configuration for Go development:

settings.json
{
  "go.toolsManagement.autoUpdate": true,
  "go.useLanguageServer": true,
  "go.addTags": {
    "tags": "json",
    "options": "json=omitempty",
    "promptForTags": false,
    "transform": "snakecase"
  },
  "gopls": {
    "formatting.gofumpt": true,
    "usePlaceholders": true,
    "ui.semanticTokens": true,
    "staticcheck": false // Enable for deeper analysis if you can spare the CPU
  },
  "go.lintTool": "golangci-lint",
  "go.lintFlags": [
    "--fast",
    "--timeout",
    "5m",
    "--fix"
  ],
  // Disable test caching, enable race detection, and show coverage (in sync with the makefile)
  "go.testFlags": [
    "-cover",
    "-race",
    "-count=1",
    "-v",
    "-benchtime=5s",
    "-timeout=5m"
  ],
  "go.enableCodeLens": {
    "runtest": true
  },
  // Go-specific editor settings
  "[go]": {
    "editor.insertSpaces": false,
    "editor.formatOnSave": true,
    "editor.formatOnSaveMode": "file",
    "editor.stickyScroll.enabled": true, // Better navigation for long files
    "editor.codeActionsOnSave": {
      "source.organizeImports": "always",
      "source.fixAll": "always"
    }
  },
  // Enhanced inlay hints
  "go.inlayHints.compositeLiteralFields": true,
  "go.inlayHints.compositeLiteralTypes": true,
  "go.inlayHints.functionTypeParameters": true,
  "go.inlayHints.parameterNames": true,
  "go.inlayHints.rangeVariableTypes": true,
  "go.inlayHints.constantValues": true,
  // Security checks
  "go.diagnostic.vulncheck": "Imports",
  "go.toolsEnvVars": {
    "GOFLAGS": "-buildvcs=false" // Better performance for large repos
  }
}

Productivity Tips and Tricks

1. Keyboard Shortcuts

Essential shortcuts for Go development:

  • F12: Go to definition
  • Alt+F12: Peek definition
  • Shift+Alt+F12: Find all references
  • F2: Rename symbol
  • Ctrl+Shift+P: Command palette (use for Go commands)

2. Snippets

Create custom snippets for common Go patterns. Here's an example snippet for a test function:

Test Function Snippet
{
  "Test function": {
    "prefix": "test",
    "body": [
      "func Test${1:Name}(t *testing.T) {",
      "\ttests := []struct {",
      "\t\tname  string",
      "\t\tinput ${2:type}",
      "\t\twant  ${3:type}",
      "\t}{",
      "\t\t{",
      "\t\t\tname:  \"${4:test case}\",",
      "\t\t\tinput: ${5:value},",
      "\t\t\twant:  ${6:value},",
      "\t\t},",
      "\t}",
      "",
      "\tfor _, tt := range tests {",
      "\t\tt.Run(tt.name, func(t *testing.T) {",
      "\t\t\tgot := ${7:function}(tt.input)",
      "\t\t\tif got != tt.want {",
      "\t\t\t\tt.Errorf(\"got %v, want %v\", got, tt.want)",
      "\t\t\t}",
      "\t\t})",
      "\t}",
      "}"
    ],
    "description": "Create a new test function with table-driven tests"
  }
}

3. Task Automation

Create a tasks.json for common operations:

tasks.json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "go: build",
      "type": "shell",
      "command": "go build -v ./...",
      "group": {
        "kind": "build",
        "isDefault": true
      }
    },
    {
      "label": "go: test",
      "type": "shell",
      "command": "go test -v -cover ./...",
      "group": {
        "kind": "test",
        "isDefault": true
      }
    }
  ]
}

Debugging Setup

Configure launch.json for debugging:

launch.json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Launch Package",
      "type": "go",
      "request": "launch",
      "mode": "auto",
      "program": "${fileDirname}"
    },
    {
      "name": "Attach to Process",
      "type": "go",
      "request": "attach",
      "mode": "local",
      "processId": "${command:pickProcess}"
    }
  ]
}

Theme and UI Customization

For optimal Go development, I recommend:

  1. Theme: One Dark Pro or GitHub Theme
  2. Font: JetBrains Mono or Fira Code with ligatures
  3. Icon Theme: Material Icon Theme
  4. Editor: Line cursor, smooth blinking, and 4-space tabs
  5. Window: Zoom level 0.5 for better readability
  6. Rulers: Set at 88 and 120 columns for code alignment
  7. Minimap: Disable for better performance
  8. Inlay Hints: Font size 14 for better visibility
  9. Tree Indent: Set to 20 for better file navigation
  10. Auto Indent: Full for consistent code formatting

Apply these settings in your configuration:

settings.json
{
  "window.zoomLevel": 0.5,
  "workbench.iconTheme": "material-icon-theme",
  "workbench.tree.indent": 20,
  "editor.cursorStyle": "line",
  "editor.cursorBlinking": "smooth",
  "editor.fontSize": 14,
  "editor.fontVariations": false,
  "editor.inlayHints.fontSize": 14,
  "editor.tabSize": 4,
  "editor.insertSpaces": true,
  "editor.autoIndent": "full",
  "editor.fontFamily": "'Fira Code'",
  "editor.fontLigatures": true,
  "editor.minimap.enabled": false,
  "editor.rulers": [88, 120]
}

Conclusion

This setup provides a powerful, efficient, and visually appealing environment for Go development. Remember to regularly update your VS Code and extensions to get the latest features and improvements.

The beauty of VS Code lies in its customizability – feel free to modify these settings based on your preferences and workflow. The key is finding the right balance between functionality and simplicity that works best for you.

Happy coding! 🚀


Did you find this guide helpful? Follow me for more development tips and tricks!

Python Dependency Management

· 4 min read
Dipjyoti Metia
Chapter Lead - Testing

Python package management can be tricky, especially when working with machine learning and AI projects that often have complex dependencies. In this guide, we'll explore how to use pipx and poetry together to create a robust development environment for your generative AI projects.

What are pipx and poetry?

pipx is a tool that lets you install and run Python applications in isolated environments. Think of it as npm install -g for Python, but with better isolation. Poetry, on the other hand, is a dependency management and packaging tool that makes it easy to manage project dependencies and build packages.

Setting Up Your Environment

1. Installing pipx

First, let's install pipx. It's recommended to install it into your user site-packages with pip:

python -m pip install --user pipx
python -m pipx ensurepath

2. Installing poetry using pipx

Now that we have pipx, we can use it to install poetry in an isolated environment:

pipx install poetry

Creating a New GenAI Project

1. Project Initialization

Let's create a new project:

poetry new genai-project
cd genai-project

This creates a basic project structure:

genai-project/
├── pyproject.toml
├── README.md
├── genai_project/
│   └── __init__.py
└── tests/
    └── __init__.py

2. Configuring poetry

Let's modify the pyproject.toml file for our GenAI project:

pyproject.toml
[tool.poetry]
name = "genai-project"
version = "0.1.0"
description = "A generative AI project using modern Python tools"
authors = ["Your Name <your.email@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
torch = "^2.0.0"
transformers = "^4.30.0"
datasets = "^2.12.0"
accelerate = "^0.20.0"

[tool.poetry.group.dev.dependencies]
pytest = "^7.3.1"
black = "^23.3.0"
isort = "^5.12.0"
flake8 = "^6.0.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

3. Installing Dependencies

Install the project dependencies:

poetry install

Working with Virtual Environments

1. Activating the Environment

Poetry automatically creates and manages virtual environments. To activate it:

poetry shell

2. Running Scripts

You can run Python scripts in your project using:

poetry run python your_script.py

Best Practices for GenAI Projects

1. Managing GPU Dependencies

For GPU support, you might need to install PyTorch with CUDA. Modify your pyproject.toml:

pyproject.toml
[tool.poetry.dependencies]
torch = { version = "^2.0.0", source = "pytorch" }

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu117"
priority = "explicit"

2. Dependency Groups

Organize dependencies into groups for better management:

pyproject.toml
[tool.poetry.group.training]
optional = true

[tool.poetry.group.training.dependencies]
accelerate = "^0.20.0"
wandb = "^0.15.0"

[tool.poetry.group.inference]
optional = true

[tool.poetry.group.inference.dependencies]
onnxruntime-gpu = "^1.15.0"

Install specific groups:

poetry install --with training

3. Version Control

Add these entries to your .gitignore:

.venv/
dist/
__pycache__/
*.pyc
.pytest_cache/

Common Workflows

1. Adding New Dependencies

poetry add transformers datasets

2. Updating Dependencies

poetry update

3. Exporting Requirements

For environments that don't use poetry:

poetry export -f requirements.txt --output requirements.txt

Troubleshooting

1. GPU Dependencies

If you encounter GPU-related issues:

  • Ensure CUDA is properly installed
  • Match PyTorch version with your CUDA version
  • Use nvidia-smi to verify GPU availability

2. Memory Issues

For large models:

  • Use poetry config virtualenvs.in-project true to create the virtual environment in your project directory
  • Consider using poetry run python -m pytest instead of pytest directly

Conclusion

Using pipx and poetry together provides a robust foundation for GenAI projects. The isolation provided by pipx ensures that poetry itself doesn't interfere with other Python tools, while poetry's dependency management makes it easy to handle complex AI library requirements.

Remember to:

  • Always use poetry for dependency management
  • Keep your pyproject.toml updated
  • Commit both pyproject.toml and poetry.lock to version control
  • Use dependency groups to organize optional dependencies

This setup will help you maintain a clean, reproducible environment for your GenAI projects, making it easier to collaborate and deploy your models.

Cloud Functions

· 4 min read
Dipjyoti Metia
Chapter Lead - Testing

What is serverless

Serverless computing is a method of providing backend services on an as-used basis. A serverless provider allows users to write and deploy code without the hassle of worrying about the underlying infrastructure. Code executes in a fully managed environment, with no need to provision any infrastructure.

Introduction to cloud functions

Google Cloud Functions is a serverless execution environment for building and connecting cloud services. With Cloud Functions you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your Cloud Function is triggered when an event being watched is fired. Your code executes in a fully managed environment. There is no need to provision any infrastructure or worry about managing any servers.

Functions Framework

The Functions Framework lets you write lightweight functions that run in many different environments.

package main

import (
	"log"
	"os"

	"github.com/GoogleCloudPlatform/functions-framework-go/funcframework"
	p "github.com/cloudmock"
	"golang.org/x/net/context"
)

func main() {
	ctx := context.Background()
	if err := funcframework.RegisterHTTPFunctionContext(ctx, "/", p.GoMock); err != nil {
		log.Fatalf("funcframework.RegisterHTTPFunctionContext: %v\n", err)
	}
	port := "8080"
	if envPort := os.Getenv("PORT"); envPort != "" {
		port = envPort
	}
	if err := funcframework.Start(port); err != nil {
		log.Fatalf("funcframework.Start: %v\n", err)
	}
}

package db

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/cloudmock/config"
	"github.com/cloudmock/secret"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

const ENV = "ENVIRONMENT"

func NewDatabaseConnection() *mongo.Collection {
	var err error
	log.Print("Connecting to mongodb")
	conf, err := config.LoadConfigPath("config/app")
	if err != nil {
		log.Fatalf("config load failed %v", err)
	}
	env := os.Getenv(ENV)
	var client *mongo.Client

	conn, err := secret.GetSecrets()
	if err != nil {
		log.Fatalf("mongo db secret url failed %v", err)
	}
	if env == "dev" {
		fmt.Println("Connecting to localdb")
		client, err = mongo.NewClient(options.Client().SetAuth(
			options.Credential{
				Username: conf.DBuser,
				Password: conf.DBpassword,
			}).ApplyURI(conf.DBurl))
	} else {
		client, err = mongo.NewClient(options.Client().ApplyURI(conn))
	}

	if err != nil {
		log.Fatalf("mongo db client failed %v", err)
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	err = client.Connect(ctx)
	if err != nil {
		log.Fatalf("mongo db connection failed %s", err) //nolint:gocritic
	}
	return client.Database("function").Collection("payments")
}

package router

import (
	"encoding/json"
	"net/http"

	"github.com/brianvoe/gofakeit/v6"
)

type UserDetails struct {
	Name     string `json:"name"`
	Email    string `json:"email"`
	Phone    string `json:"phone"`
	Address  string `json:"address"`
	Company  string `json:"company"`
	JobTitle string `json:"jobTitle"`
}

func NewUserWrite() *[]UserDetails {
	var usr []UserDetails
	for i := 0; i < gofakeit.RandomInt([]int{5, 10, 12, 4, 11}); i++ {
		usr = append(usr, UserDetails{
			Name:     gofakeit.Name(),
			Email:    gofakeit.Email(),
			Phone:    gofakeit.Phone(),
			Address:  gofakeit.Address().Address,
			Company:  gofakeit.Company(),
			JobTitle: gofakeit.JobTitle(),
		})
	}
	return &usr
}

func User() func(w http.ResponseWriter, r *http.Request) {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		jData, err := json.Marshal(NewUserWrite())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
		w.Write(jData)
	}
}

package p

import (
	"net/http"
	"time"

	"github.com/cloudmock/db"
	"github.com/cloudmock/router"
	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"
	"github.com/go-chi/httprate"
	"github.com/rs/cors"
)

func GoMock(w http.ResponseWriter, r *http.Request) {
	rc := chi.NewRouter()
	conn := db.NewDatabaseConnection()
	_ = conn // handed to the database-backed routes in the full project

	rc.Use(middleware.RealIP)
	rc.Use(middleware.Logger)
	rc.Use(httprate.Limit(
		2,
		1*time.Second,
		httprate.WithLimitHandler(func(w http.ResponseWriter, r *http.Request) {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
		}),
	))

	rc.Route("/api/v1", func(rc chi.Router) {
		rc.Get("/users", router.User())
		rc.Get("/categories", router.Category())
	})

	cors.Default().Handler(rc).ServeHTTP(w, r)
}

Deploy cloud function

name: Build and Deploy to CloudFunction

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    name: deploy
    runs-on: ubuntu-latest
    steps:
      - uses: google-github-actions/setup-gcloud@master
        with:
          project_id: ${{ secrets.GCP_PROJECT_ID }}
          service_account_key: ${{ secrets.gcp_credentials }}
          export_default_credentials: true
      - uses: actions/checkout@v2
      - name: Deploy serverless function
        run: |
          gcloud functions deploy "GoMock" \
            --runtime go113 --trigger-http \
            --allow-unauthenticated \
            --region australia-southeast1 \
            --update-env-vars MONGODB=${{ secrets.mongo_secret }} \
            --max-instances 2 \
            --memory 128mb \
            --service-account=${{ secrets.service_account }} \
            --no-user-output-enabled

Why Mocking using cloud function

Use cases of mocking using cloud function

System Testing

Performance testing

Performance tests check the behavior of the system when it is under significant load. These tests are non-functional and can take various forms to assess the reliability, stability, and availability of the platform. For instance, this can mean observing response times when executing a high number of requests, or seeing how the system behaves with a significant amount of data.


Kafka

· 7 min read
Dipjyoti Metia
Chapter Lead - Testing

What is Apache Kafka?

Apache Kafka is a framework implementation of a software bus using stream-processing. It is an open-source software platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Behind the scenes, Kafka is distributed, scales well, replicates data across brokers (servers), can survive broker downtime, and much more.

Topics, Partitions and Offsets

Topics: A particular stream of data

  • Similar to a table in a database
  • You can have as many topics as you want
  • A topic is identified by its name

Topics are split into partitions

  • Each partition is ordered
  • Each message within a partition gets an incremental ID called an offset
  • Partitions are numbered 0, 1, 2, ...
  • Order is only guaranteed within a partition, not across partitions
  • Data is kept only for a limited time (the retention period)
  • Once data is written to a partition, it cannot be changed

Example scenario: You have multiple cabs, and each cab reports its GPS position to Kafka. You can have a topic cabs_gps that contains the positions of all cabs. Each cab sends a message to Kafka every 20 seconds; each message contains the cabID and the cab's location (lat/long).
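The partition/offset mechanics above can be modeled with a toy in-memory sketch (this is an illustration only, not a Kafka client; the hash-based partitioner mirrors how keyed messages land on a consistent partition):

```python
# A toy model of Kafka topics, partitions, and offsets
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, value, key=None):
        if key is None:
            partition = 0  # a real producer would round-robin here
        else:
            # Same key -> same partition, preserving per-key ordering
            partition = hash(key) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1  # incremental ID within the partition
        return partition, offset

cabs_gps = Topic("cabs_gps", num_partitions=3)
for i in range(3):
    p, o = cabs_gps.send({"lat": -37.81, "long": 144.96}, key="cab-42")
    print(p, o)  # same partition every time; offsets increment 0, 1, 2
```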

Brokers & Topics

  • A Kafka cluster is composed of multiple brokers (servers)
  • Each broker is identified by its ID (an integer)
  • Each broker contains certain topic partitions
  • After connecting to any broker (called a bootstrap broker), you will be connected to the entire cluster
  • A good number to get started with is 3 brokers, but some big clusters have more than 100 brokers

Example: topic A with 3 partitions; topic B with 2 partitions.

Topics replication

  • Topics should have a replication factor >1 (usually between 2 and 3)

  • This way, if one broker is down, another broker can still serve the data. Example: topic A with replication factor 2.

  • At any time, only ONE broker can be the leader for a given partition

  • Only that leader can receive and serve data for the partition.

  • The other brokers will synchronize the data.

  • So each partition has one leader and multiple ISRs (in-sync replicas)

Producer

  • Producers write data to topics (which are made of partitions)
  • Producers automatically know which broker and partition to write to.
  • In case of broker failure, producers will automatically recover
  • Producers can choose to receive acknowledgment of data writes:
    • acks=0 Producer won't wait for acknowledgment (possible data loss)
    • acks=1 Producer will wait for the leader's acknowledgment (limited data loss)
    • acks=all Leader & replica acknowledgment (no data loss)
  • Producers can choose to send a key with the message (string, number, etc.)
  • If key==null, data is sent round-robin (broker 101, then 102, then 103)
  • If a key is sent, all messages for that key will go to the same partition
  • A key is sent when we need message ordering for a specific field, such as cabID.
producer.java
@Slf4j
public class Producer {
    public static void main(String[] args) {
        String topic = "second-topic";
        String value = "hello kafka";
        String bootstrapServer = "127.0.0.1:9092";

        // Create producer properties
        Properties properties = new Properties();
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Create the producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, value);
        log.info("Creating producer");

        // Send data; the callback executes every time a record is acknowledged
        producer.send(record, (metadata, e) -> {
            if (e == null) {
                log.info("Timestamp: {}", metadata.timestamp());
                log.info("Topic: {}", metadata.topic());
                log.info("Has offset: {}", metadata.hasOffset());
                log.info("Has timestamp: {}", metadata.hasTimestamp());
            } else {
                log.error("Error while producing", e);
            }
        });
        producer.flush();
        producer.close();
    }
}

Consumer

  • Consumers read data from a topic (identified by name)
  • Consumers know which broker to read from
  • In case of broker failure, consumers know how to recover
  • Data is read in order within each partition
  • Consumers read data in consumer groups
  • Each consumer within a group reads from exclusive partitions
  • If you have more consumers than partitions, some consumers will be inactive
  • Kafka stores the offset at which a consumer group has been reading
  • The committed offsets live in a Kafka topic named __consumer_offsets
  • When a consumer in a group has processed the data received from Kafka, it should commit the offsets.
  • If a consumer dies, it will be able to read back from where it left off.
consumer.java
public static void main(String[] args) {

    String bootstrapServer = "127.0.0.1:9092";
    String groupId = "my-sixth-application";
    String topic = "second-topic";

    // Create consumer config
    Properties properties = new Properties();
    properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
    properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    // Create consumer
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);

    // Subscribe consumer to our topic
    consumer.subscribe(Arrays.asList(topic));

    // Poll for new data
    while (true) {
        ConsumerRecords<String, String> records =
                consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            log.info("Key: " + record.key() + ", Value: " + record.value());
            log.info("Partition: " + record.partition() + ", Offset: " + record.offset());
        }
    }
}

Zookeeper

  • Zookeeper manages brokers (keeps a list of them)
  • Zookeeper helps in performing leader election for partitions
  • Zookeeper sends notifications to Kafka in case of any changes.

Schema Registry

  • Kafka takes bytes as input and publishes them
  • Kafka itself performs no data verification
  • A schema registry rejects bad data
  • A common data format must be agreed upon
  • Apache Avro as the data format
    • Data is fully typed
    • Data is compressed automatically
    • The schema comes along with the data
    • Documentation is embedded in the schema
    • Data can be read across any language
    • Schemas can be evolved over time in a safe manner

Avro

Apache Avro is a data serialization system.

  • Avro provides:
    • Rich data structures.
    • A compact, fast, binary data format.
    • A container file, to store persistent data.
    • Remote procedure call (RPC).
    • Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.
{
  "namespace": "dip.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
  • Common Fields:
    • Name: Name of the schema
    • Namespace: Equivalent of a package in Java
    • Doc: Documentation to explain your schema
    • Aliases: Optional alternative names for the schema
    • Fields
      • Name: Name of the field
      • Doc: Documentation for that field
      • Type: Data type for that field
      • Default: Default value for that field
    • Complex types:
      • Enums

        {
          "type": "enum",
          "name": "CustomerStatus",
          "symbols": ["BRONZE", "SILVER", "GOLD"]
        }
      • Arrays

        {
          "type": "array",
          "items": "string"
        }
      • Maps

        {
          "type": "map",
          "values": "string"
        }
      • Unions

        {
          "name": "middle_name",
          "type": ["null", "string"],
          "default": null
        }
      • Calling other schema as type
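For instance, a previously defined record can be referenced by its full name as a field type. A minimal sketch, assuming the User record above is registered in the same namespace (the Order record here is hypothetical):

        {
          "type": "record",
          "name": "Order",
          "namespace": "dip.avro",
          "fields": [
            {"name": "id", "type": "long"},
            {"name": "buyer", "type": "dip.avro.User"}
          ]
        }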

Kafka Rest Proxy

  • Kafka is great for Java-based consumers/producers
  • Avro support in some languages isn't great, whereas JSON/HTTP requests are universally supported
  • Reporting data to Kafka from a frontend app built in a language not supported by the official Confluent clients
  • Ingesting messages into a stream processing framework that doesn't yet support Kafka
  • Perform a comprehensive set of administrative operations through REST APIs, including:
    • Describe, list, and configure brokers
    • Create, delete, describe, list, and configure topics
    • Delete, describe, and list consumer groups
    • Create, delete, describe, and list ACLs
    • List partition reassignments
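As an illustrative sketch of producing over HTTP (assuming a REST Proxy running at localhost:8082 and the second-topic topic from earlier; the content type depends on the embedded data format):

curl -X POST http://localhost:8082/topics/second-topic \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records": [{"key": "user1", "value": {"action": "login"}}]}'

The proxy responds with the partition and offset each record was written to.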

Github Eyes

· 3 min read
Dipjyoti Metia
Chapter Lead - Testing

Presenting GitHub Eyes, a Golang implementation built on Google's go-github SDK for interacting with the GitHub REST API. Using the GitHub APIs we can crawl over multiple repositories and automate different tasks: creating repos, creating labels, adding milestones, getting the latest commits, updating workflows, getting the project build status, and more. Below is a basic demonstration of getting the list of issues from multiple repos.


The go-github library does not directly handle authentication. The easiest and recommended way to authenticate is the oauth2 library: if you have an OAuth2 access token (for example, a personal API token), you can use it with oauth2. To get a personal API token, follow the GitHub documentation. Below is the code snippet for authentication using oauth2.

auth.go
package github

import (
	"context"

	"github.com/google/go-github/v33/github"
	"golang.org/x/oauth2"
)

// AuthGithubAPI authenticates to the GitHub API using an OAuth2 access token
func AuthGithubAPI(ctx context.Context) *github.Client {
	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: "XXXXXXXXXXXXXXXXXXXXXXX"},
	)
	tc := oauth2.NewClient(ctx, ts)
	return github.NewClient(tc)
}

Getting the list of issues in a repository: here we create a struct named Issues with the required fields, then a function ListIssues that takes the authenticated GitHub client, where client.Issues.ListByRepo does the job by calling the GitHub Issues API underneath. We can also extend this function by adding filters to get open/closed issues and so on.

issues.go
package github

import (
	"context"
	"log"
	"time"
)

// Issues holds the fields we extract from each GitHub issue
type Issues struct {
	ID        int64
	Title     string
	State     string
	CreatedAt time.Time
	URL       string
}

// ListIssues gets the list of issues for a repository
func ListIssues(repos string) interface{} {
	ctx := context.Background()
	client := AuthGithubAPI(ctx)
	issues, _, err := client.Issues.ListByRepo(ctx, "dipjyotimetia", repos, nil)
	if err != nil {
		log.Println(err)
	}

	var issueList []interface{}
	for _, v := range issues {
		issueList = append(issueList, &Issues{
			ID:        v.GetID(),
			Title:     v.GetTitle(),
			State:     v.GetState(),
			CreatedAt: v.GetCreatedAt(),
			URL:       v.GetHTMLURL(),
		})
	}
	return issueList
}

Main function to drive the show: here we pass the repo names in an array called repoNames, call the ListIssues function derived above in a loop, and then write the result to a JSON file on the local path.

main.go
package main

import (
	"encoding/json"
	"io/ioutil"

	"github.com/goutils/pkg/github"
)

func main() {
	repoNames := []string{"HybridTestFramewrok", "MobileTestFramework"}
	var result []interface{}
	for _, repoName := range repoNames {
		result = append(result, repoName, github.ListIssues(repoName))
	}

	file, _ := json.MarshalIndent(result, "", "")
	_ = ioutil.WriteFile("test.json", file, 0644)
}

An example of the exported JSON data from the ListIssues function for the two repos:

[
"HybridTestFramewrok",
[
{
"ID": 690950907,
"Title": "Add reddis tests support",
"State": "open",
"CreatedAt": "2020-09-02T11:42:07Z",
"URL": "https://github.com/dipjyotimetia/HybridTestFramewrok/issues/65"
},
{
"ID": 690950833,
"Title": "Add ssh login builder",
"State": "open",
"CreatedAt": "2020-09-02T11:42:01Z",
"URL": "https://github.com/dipjyotimetia/HybridTestFramewrok/issues/64"
},
{
"ID": 690950781,
"Title": "Add file reader validations",
"State": "open",
"CreatedAt": "2020-09-02T11:41:55Z",
"URL": "https://github.com/dipjyotimetia/HybridTestFramewrok/issues/63"
},
{
"ID": 690950708,
"Title": "add kafka testing",
"State": "open",
"CreatedAt": "2020-09-02T11:41:48Z",
"URL": "https://github.com/dipjyotimetia/HybridTestFramewrok/issues/62"
},
{
"ID": 690950641,
"Title": "add rabitmq testing support",
"State": "open",
"CreatedAt": "2020-09-02T11:41:43Z",
"URL": "https://github.com/dipjyotimetia/HybridTestFramewrok/issues/61"
}
],
"MobileTestFramework",
[
{
"ID": 793821012,
"Title": "Add AWS Device Farm support",
"State": "open",
"CreatedAt": "2021-01-26T00:19:55Z",
"URL": "https://github.com/dipjyotimetia/MobileTestFramework/issues/88"
}
]
]

Project structure image

Serverless Framework

· One min read
Dipjyoti Metia
Chapter Lead - Testing


Where to start?

npm install -g serverless


  • Create IAM user

  • Setup user access
    serverless config credentials --provider aws --key xxxxxxxxxxxxxx --secret xxxxxxxxxxxxxx

  • Create project
    serverless create --template aws-nodejs --path my-service

  • Serverless yml

  • Serverless Deploy
    serverless deploy -v
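The serverless.yml mentioned above defines the service, provider, and functions. A minimal sketch (the service name, handler, and region here are hypothetical placeholders):

service: my-service

provider:
  name: aws
  runtime: nodejs14.x
  region: ap-southeast-2

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get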


Serverless offline

https://github.com/dherault/serverless-offline

serverless plugin install --name serverless-offline
serverless offline start


Insomnia


Serverless Dashbird

MongoDB


npm init -y
npm i --save-dev serverless-offline
npm i --save mongoose dotenv
sls offline start --skipCacheInvalidation