Batch
Nitric provides functionality that allows you to run large-scale jobs in parallel across multiple virtual machines or compute resources. Unlike Nitric Services, which respond to real-time events (APIs, Schedules, etc.), Batch is intended for tasks that can be processed asynchronously rather than in real time.
Batch services are currently in Preview and are only available in the following languages: JavaScript, Python, Go, and Dart, using the nitric/aws@1.14.0 or nitric/gcp@1.14.0 providers or later.
Overview
Batch is designed for workloads that:
- Require significant computing resources (CPU, memory, GPU)
- Can be processed asynchronously
- Need to run in parallel across multiple machines
- Don't require real-time responses
Common use cases include:
- Machine learning model training
- Image and video processing
- Data analysis and transformation
- Large-scale data processing
- Scientific computing
Enabling Batches
Batches are currently in Preview. To enable this feature in your project, add the following to your `nitric.yaml` file:

```yaml
preview:
  - batch-services
```
Core Concepts
Batch
A Batch is similar to a Nitric Service, but it's intended for work with a definitive start and finish. Where a service is designed to be reactive, a batch is designed to be proactive and run a series of jobs in parallel.
Job Definition
A Job Definition describes a type of work to be done by a Nitric Batch. It includes:
- The handler function to execute
- Resource requirements (CPU, memory, GPU)
- Environment variables and configuration
Job
A Job is an instance of a Job Definition that is running within a Batch. Jobs can be started from other Nitric Services or Batches.
Limitations
Jobs are designed to be long-running HPC workloads and can take some time to spin up. They are not designed with reactivity in mind and are not suitable for responding to events from cloud resources.
Jobs are unable to run the following:
- Topic Subscriptions
- Bucket Notifications
- API & HTTP resources
- Websocket message handlers
Jobs can read from and write to all Nitric resources. For example, they can publish new messages to a Topic, read and write files in a Bucket, or read and write to a Database. They just can't respond to real-time events from these resources.
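As a minimal sketch of this (the `analyze` job, the `updates` topic name, and the message shape are illustrative, and the snippet needs the Nitric runtime to actually execute), a job handler can publish a completion message to a Topic when its work is done:

```javascript
import { job, topic } from '@nitric/sdk'

const analyze = job('analyze')
// Request publish permission on the topic, just like a service would
const updates = topic('updates').allow('publish')

analyze.handler(async (ctx) => {
  // ... perform the batch work here ...

  // Notify subscribed services that this job has finished.
  // Note: the subscriber must be a Service, since Batches can't
  // handle Topic Subscriptions themselves.
  await updates.publish({ jobName: ctx.jobName, status: 'complete' })
})
```

The job acts as a publisher only; any reaction to the message happens in a regular Nitric Service subscribed to the topic.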
Defining Batches
Batches are defined similarly to services in a project's `nitric.yaml` file. For example:

```yaml
batch-services:
  - match: ./batches/*.ts
    start: yarn dev:services $SERVICE_PATH
```
Defining a Job
Within a Batch we create Job Definitions, by creating a new Job with a unique name and defining a handler function that will be executed when the job is submitted.
```javascript
import { job } from '@nitric/sdk'

const analyze = job('analyze')

// Use `handler` to register the callback function that will run when a job is submitted
analyze.handler(
  async (ctx) => {
    // Do some work
    console.log('Processing job:', ctx.jobName)
    console.log('Job payload:', ctx.data)
  },
  { cpus: 1, memory: 1024, gpus: 0 },
)
```
Submitting Jobs for Execution
Jobs may be submitted from Nitric services or other batches using the `submit` method on the job reference. When submitting a job you can provide a payload that will be passed to the job handler function.

```javascript
import * as nitric from '@nitric/sdk'

const api = nitric.api('public')
const analyze = nitric.job('analyze').allow('submit')

api.post('/submit-job', async (ctx) => {
  await analyze.submit({
    data: 'some data to process',
  })
  return ctx
})
```
Best Practices
- Resource Allocation: Carefully consider CPU, memory, and GPU requirements for your jobs.
- Error Handling: Implement robust error handling in job handlers.
- Monitoring: Set up monitoring for job execution and completion.
- Cost Optimization: Use appropriate instance types and job durations.
- Job Dependencies: Consider job dependencies when submitting multiple jobs.
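As one way to approach the error-handling practice above (a sketch only; the helper name, retry count, and delays are illustrative choices, not Nitric features), transient failures inside a job handler can be retried with exponential backoff before letting the job fail:

```javascript
// Retry `work` up to `attempts` times, doubling the delay between tries.
// Intended to wrap flaky steps (network calls, storage reads) inside a
// job handler so a single transient error doesn't fail the whole job.
async function withRetries(work, attempts = 3, baseDelayMs = 100) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await work()
    } catch (err) {
      // Out of retries: surface the error so the job is marked as failed
      if (attempt === attempts - 1) throw err
      // Exponential backoff: 100ms, 200ms, 400ms, ...
      const delay = baseDelayMs * 2 ** attempt
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}
```

Inside a handler this would wrap individual steps, e.g. `await withRetries(() => fetchSourceData())`, so only genuinely persistent errors terminate the job.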
Cloud Service Mapping
Each cloud provider comes with a set of default services used when deploying resources. You can find the default services for each cloud provider below.
- AWS
- Azure - Coming soon
- Google Cloud
If you need support for additional clouds, let us know by opening an issue or joining the conversation on Discord.