Nuclio functions
Nuclio is a high-performance serverless platform that runs over Docker or Kubernetes and automates the development, operation, and scaling of code (written in 8 supported languages). Nuclio is focused on data analytics and ML workloads; it provides extreme performance and parallelism, supports stateful and data-intensive workloads, GPU resource optimization, check-pointing, and 14 native triggers/streaming protocols out of the box, including HTTP, Cron, batch, Kafka, Kinesis, Google Pub/Sub, Azure Event Hubs, and MQTT. Additional triggers can be added dynamically.
Nuclio can run in the cloud as a managed offering, or on any Kubernetes cluster (cloud, on-prem, or edge).
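For a quick local trial over Docker, you can run the dashboard container; this is a sketch based on the Nuclio quick-start, and the image tag may differ for your version:

docker run --rm -d -p 8070:8070 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --name nuclio-dashboard quay.io/nuclio/dashboard:stable-amd64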
Using Nuclio In Data Science Pipelines
Nuclio functions can be used in the following ML pipeline tasks:
- Data collectors, ETL, stream processing
- Data preparation and analysis
- Hyperparameter model training
- Real-time model serving
- Feature vector assembly (real-time data preparation)
Containerized functions (plus dependent files and spec) can be created directly from a Jupyter Notebook using magic commands or SDK API calls (see below), or they can be built/deployed using a KubeFlow Pipeline (see: nuclio pipeline components), e.g. if we want to deploy/update inference functions right after we update an ML model.
Nuclio provides detailed documentation on installation and usage. You can also follow this interactive tutorial.
The simplest way to install Nuclio is using Helm. Assuming Helm is deployed on your cluster, type the following commands:
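A minimal sketch following the Nuclio Helm chart instructions (the namespace and release name are illustrative):

kubectl create namespace nuclio
helm repo add nuclio https://nuclio.github.io/nuclio/charts
helm install nuclio nuclio/nuclio --namespace nuclio
# expose the dashboard locally (one option; an ingress or NodePort also works)
kubectl port-forward -n nuclio deployment/nuclio-dashboard 8070:8070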
Browse to the dashboard URL; there you can create, test, and manage functions using a visual editor.
Writing and Deploying a Simple Function
The full notebook with the example below can be found here.
Before you begin, install the latest nuclio-jupyter package:
pip install --upgrade nuclio-jupyter
We write and test our code inside a notebook like any other data science code. We add some %nuclio magic commands to describe additional configurations such as which packages to install, CPU/Mem/GPU resources, how the code will get triggered (HTTP, Cron, stream), environment variables, additional files we want to bundle (e.g. an ML model or libraries), versioning, etc.
First we need to import the nuclio package (we add an ignore comment so this line won't be compiled later):
# nuclio: ignore
import nuclio
We add the function spec, environment, and configuration details using magic commands:
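For example, a minimal sketch based on the nuclio-jupyter magics (the base image shown is illustrative; TO_LANG matches the environment variable read by the handler below):

%nuclio cmd pip install textblob
%nuclio env TO_LANG=fr
%nuclio config spec.build.baseImage = "python:3.8"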
Then we write our code as usual, just making sure we have a handler function which is invoked to initiate our run. The function accepts a context and an event, e.g.: def handler(context, event)
Function code
The following example shows accepting text and doing NLP processing (correction, translation, sentiment analysis):
from textblob import TextBlob
import os
def handler(context, event):
    context.logger.info('This is an NLP example! ')

    # process and correct the text
    blob = TextBlob(str(event.body.decode('utf-8')))
    corrected = blob.correct()

    # debug print the text before and after correction
    context.logger.info_with("Corrected text",
                             original=str(blob),
                             corrected=str(corrected))

    # calculate sentiments
    context.logger.info_with("Sentiment",
                             polarity=str(corrected.sentiment.polarity),
                             subjectivity=str(corrected.sentiment.subjectivity))

    # read target language from environment and return translated text
    lang = os.getenv('TO_LANG', 'fr')
    return str(corrected.translate(to=lang))

# nuclio: ignore
event = nuclio.Event(body=b'good morninng')
handler(context, event)
Finally, we deploy our function using the magic commands, the SDK, or a KubeFlow Pipeline. We can simply write and run the following command in a cell:
%nuclio deploy -n nlp -p ai -d <nuclio-dashboard-url>
If we want more control we can use the SDK:
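For example, a minimal sketch assuming the notebook above was saved as nlp.ipynb (the file name and dashboard URL are placeholders):

import requests
import nuclio

# deploy the notebook file and get back the function's HTTP endpoint address
addr = nuclio.deploy_file('nlp.ipynb', name='nlp', project='ai',
                          dashboard_url='<nuclio-dashboard-url>')

# invoke the deployed function with a test payload
resp = requests.post('http://' + addr, data=b'good morninng')
print(resp.text)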
We can also deploy our function directly from Git:
addr = nuclio.deploy_file('git://github.com/nuclio/nuclio#master:/hack/examples/python/helloworld',
name='hw', project='myproj', dashboard_url='<dashboard-url>')
resp = requests.get('http://' + addr)
print(resp.text)
We can deploy and test functions as part of a KubeFlow pipeline step. After installing Nuclio in your cluster (see instructions above), you can run the following pipeline:
import kfp
from kfp import dsl

nuclio_deploy = kfp.components.load_component(url='https://raw.githubusercontent.com/kubeflow/pipelines/master/components/nuclio/deploy/component.yaml')
nuclio_invoke = kfp.components.load_component(url='https://raw.githubusercontent.com/kubeflow/pipelines/master/components/nuclio/invoker/component.yaml')

@dsl.pipeline(
    name='Nuclio deploy and invoke demo',
    description='Nuclio demo, build/deploy a function from notebook + test the function rest endpoint'
)
def nuc_pipeline(
    txt='good morningf',
):
    nb_path = 'https://raw.githubusercontent.com/nuclio/nuclio-jupyter/master/docs/nlp-example.ipynb'
    dashboard = 'http://nuclio-dashboard.nuclio.svc:8070'

    # build the function image & CRD from a notebook file (in the above URL)
    build = nuclio_deploy(url=nb_path, name='myfunc', project='myproj', tag='0.11', dashboard=dashboard)

    # test the function with real data (function URL is taken from the build output)
    test = nuclio_invoke(build.output, txt)
The code above assumes Nuclio was deployed into the nuclio namespace on the same cluster; when using a remote cluster or a different namespace, you just need to change the dashboard URL.
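To submit the pipeline from a notebook, you can use the standard KFP client; a sketch, where the client host depends on how KubeFlow is exposed in your cluster:

import kfp

# in-cluster default; pass host='<kfp-endpoint>' when connecting remotely
client = kfp.Client()
client.create_run_from_pipeline_func(nuc_pipeline, arguments={'txt': 'good morninng'})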
See the nuclio pipeline components (allowing you to deploy, delete, or invoke functions).
Note: Nuclio is not limited to Python; see this example showing how we create a simple function from a notebook, e.g. we can create Go functions if we need performance/concurrency for our inference.