
Deploying PyTorch Models to Production with Cortex



install · documentation · examples · we're hiring · chat with us


Model serving at scale

Deploy

  • Deploy TensorFlow, PyTorch, ONNX, scikit-learn, and other models.
  • Define preprocessing and postprocessing steps in Python.
  • Configure APIs as realtime or batch.
  • Deploy multiple models per API.

Manage

  • Monitor API performance and track predictions.
  • Update APIs with no downtime.
  • Stream logs from APIs.
  • Perform A/B tests.

Scale

  • Test locally, scale on your AWS account.
  • Autoscale to handle production traffic.
  • Reduce cost with spot instances.

How it works

Write APIs in Python

Define any real-time or batch inference pipeline as simple Python APIs, regardless of framework.

# predictor.py

from transformers import pipeline

class PythonPredictor:
  def __init__(self, config):
    self.model = pipeline(task="text-generation")

  def predict(self, payload):
    return self.model(payload["text"])[0]
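The example above needs a model download to run. As a minimal, self-contained sketch of the same predictor contract — Cortex constructs the class once per replica, then calls `predict` for every request — here is a hypothetical stand-in (the stub "model" and the `prefix` config key are illustrative, not part of Cortex):

```python
class PythonPredictor:
    def __init__(self, config):
        # Called once per replica at startup; load models and read config here.
        self.prefix = config.get("prefix", "generated: ")

    def predict(self, payload):
        # Called for every request; payload is the parsed JSON request body.
        return self.prefix + payload["text"]

# Exercising the contract directly, the way Cortex would:
predictor = PythonPredictor({"prefix": "echo: "})
print(predictor.predict({"text": "hello"}))  # -> echo: hello
```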

Configure infrastructure in YAML

Configure autoscaling, monitoring, compute resources, update strategies, and more.

# cortex.yaml

- name: text-generator
  predictor:
    path: predictor.py
  networking:
    api_gateway: public
  compute:
    gpu: 1
  autoscaling:
    min_replicas: 3

Scale to handle production traffic

Handle traffic with request-based autoscaling. Minimize spend with spot instances and multi-model APIs.

$ cortex get text-generator

endpoint: https://example.com/text-generator

status   last-update   replicas   requests   latency
live     10h           10         100000     100ms
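The core idea behind request-based autoscaling can be sketched as: size the fleet so each replica handles roughly a target number of concurrent requests, clamped to the configured bounds. The function and parameter names below are illustrative, not Cortex's actual implementation:

```python
import math

def desired_replicas(in_flight_requests, target_replica_concurrency,
                     min_replicas, max_replicas):
    # Enough replicas so each one serves ~target_replica_concurrency
    # concurrent requests, never leaving the [min, max] window.
    raw = math.ceil(in_flight_requests / target_replica_concurrency)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(100, 8, 3, 20))  # -> 13 (ceil(100 / 8) = 13)
print(desired_replicas(0, 8, 3, 20))    # -> 3  (floor at min_replicas)
```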

Integrate with your stack

Integrate Cortex with any data science platform and CI/CD tooling, without changing your workflow.

# predictor.py

import tensorflow
import torch
import transformers
import mlflow

...

Run on your AWS account

Run Cortex on your AWS account (GCP support is coming soon), maintaining control over resource utilization and data access.

# cluster.yaml

region: us-west-2
instance_type: g4dn.xlarge
spot: true
min_instances: 1
max_instances: 5

Focus on machine learning, not DevOps

You don't need to bring your own cluster or containerize your models; Cortex automates your cloud infrastructure.

$ cortex cluster up

configuring networking ...
configuring logging ...
configuring metrics ...
configuring autoscaling ...

cortex is ready!

Get started

bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.20/get-cli.sh)"

See our installation guide, then deploy one of our examples or bring your own models to build realtime APIs and batch APIs.

About

Deploy machine learning in production. Website: cortex.dev. License: Apache-2.0. Latest release: v0.20.0.