Step 4: Run The Workflow#
Training Pipeline#
Before the AI pipeline can run for the first time, an initial training job must be submitted using the following command:
kubectl -n $NAMESPACE create job train --from=cronjob/$APP_NAME-training
You should then see a pod named train-XXXXX running within the namespace; in the output of the command below, it is typically the last pod listed. Wait for this training pod to show a Completed status before proceeding. This should only take a few minutes.
kubectl -n $NAMESPACE get pods
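If you would rather block until the job finishes than poll the pod list, kubectl can also wait on the job's completion condition directly:
kubectl -n $NAMESPACE wait --for=condition=complete job/train --timeout=10m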
After this initial command, training jobs are submitted automatically by a CronJob on a regular schedule, set by default to Sundays at 3 AM. This schedule can be modified in the values.yaml of the workflow's Helm chart.
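You can confirm the active schedule by inspecting the CronJob itself; the SCHEDULE column shows a standard cron expression (Sundays at 3 AM is 0 3 * * 0):
kubectl -n $NAMESPACE get cronjob $APP_NAME-training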
This training session consists of two pipelines that run in parallel. One pipeline trains per-user models, where custom models are trained per detected user ID. The other pipeline trains a generic model for use when a previously unseen user is presented. The pipelines use a variety of feature columns from the authentication log, such as timestamp, IP address, and browser version, to develop a user’s fingerprint model. The trained models and associated information can be found in MLflow; the URL is provided in the next section.
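As a quick sanity check that the trained models were registered, you can also query the tracking server from the command line. This is a minimal sketch, assuming the MLflow 2.x CLI is installed locally and using the MLflow URL given in the next section:
export MLFLOW_TRACKING_URI=https://mlflow-<APP_NAME>-<NAMESPACE>.my-cluster.my-domain.com
mlflow experiments search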
Inference Pipeline#
Before proceeding, ensure that at least one training job has been submitted via the above instructions.
The AI application includes sample dashboards and components set up for this pipeline. They can be accessed at the following URLs, using credentials that can be retrieved with the commands provided:
Grafana: https://dashboards.my-cluster.my-domain.com
User: admin
Password:
kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_PASSWORD}' | base64 -d
Kibana: https://kibana-<APP_NAME>-<NAMESPACE>.my-cluster.my-domain.com
User: elastic
Password:
kubectl -n $NAMESPACE get secret $APP_NAME-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d
MinIO: https://minio-<APP_NAME>-<NAMESPACE>.my-cluster.my-domain.com
User: minioadmin
Password:
kubectl -n $NAMESPACE get secret s3-admin -o jsonpath='{.data.MINIO_ROOT_PASSWORD}' | base64 -d
MLflow: https://mlflow-<APP_NAME>-<NAMESPACE>.my-cluster.my-domain.com
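For convenience, any of these passwords can be captured in a shell variable instead of printed to the terminal, for example:
GRAFANA_PASSWORD=$(kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_PASSWORD}' | base64 -d)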
This information can also be found in the Helm release's NOTES.txt via the following command:
helm status $APP_NAME -n $NAMESPACE
These components support the pipeline's functionality, allowing you to interact with the pipeline and monitor its activity and data.
For example, the Grafana dashboard shows relevant data on the pipeline's performance and functionality, as well as anomalous logs detected by the pipeline. Initially, this dashboard will be blank because the pipeline has not yet received any streaming data.
A mock data producer that simulates Azure Active Directory (Azure AD) logs is included with this solution. To start streaming data into the pipeline, scale this data producer up to at least one instance using the following command:
kubectl -n $NAMESPACE scale deploy/$APP_NAME-mock-data-producer --replicas=1
Each data producer instance sends roughly 10 messages per second. After scaling, you should see a new data-producer pod running in the namespace:
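kubectl -n $NAMESPACE get pods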
Once the data producer is running and streaming data, you should see information populating the Grafana dashboard. To open the dashboard:
Log in using the credentials retrieved above.
Select the Dashboards icon in the left navigation pane.
Select Browse.
Double-click the Azure AD - Suspicious Activity Alerts dashboard to open it.
Kibana can also be used to analyze and investigate the data that has run through the pipeline. Log in to the Kibana dashboard using the credentials retrieved earlier, then follow the steps below:
Open the menu on the top left and select Discover under Analytics.
Click Create Index Pattern.
Enter morpheus-dfp under the Name field, and select createdDateTime under the Timestamp field. Then click Create Index Pattern.
Open the menu on the top left and select Discover under Analytics again.
Change the filter from logs-* to morpheus-dfp.
You should now see all the data after it has been streamed through the pipeline.
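The same data can be checked from the command line by querying Elasticsearch directly. This sketch assumes ECK-style service naming ($APP_NAME-es-http is a guess based on the secret name above) and the morpheus-dfp index used earlier; port-forward the service, then request a document count:
kubectl -n $NAMESPACE port-forward svc/$APP_NAME-es-http 9200 &
ES_PASSWORD=$(kubectl -n $NAMESPACE get secret $APP_NAME-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d)
curl -sk -u "elastic:$ES_PASSWORD" "https://localhost:9200/morpheus-dfp*/_count"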
Overall pipeline functionality works as follows:
Data is streamed into Kafka from the mock data producer (a sketch for inspecting this topic follows the list).
Data is ingested into the pipeline, pre-processed, run through inference, and post-processed.
Results from the pipeline are aggregated into Prometheus and presented via Grafana.
After the data runs through the pipeline, it is saved in Elasticsearch for later viewing and analysis via Kibana.
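To see the raw messages the producer is streaming into Kafka, you can tail the inbox topic with the standard console consumer. A minimal sketch, assuming a Strimzi-style broker pod named $APP_NAME-kafka-0 with the Kafka CLI tools under /opt/kafka/bin (both assumptions; adjust to your deployment):
kubectl -n $NAMESPACE exec -it $APP_NAME-kafka-0 -- \
  /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic $APP_NAME-inbox \
  --max-messages 5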
For larger volumes of data, the data producer pods can be scaled up in parallel using the following command:
kubectl -n $NAMESPACE scale deploy/$APP_NAME-mock-data-producer --replicas=3
Once this is done, the pipeline itself can also be scaled to multiple instances running on multiple GPUs to support the increased throughput, using the following commands:
First, we will reconfigure the Kafka topic to allow more consumers. Edit the Kafka topic by running the following command:
kubectl edit kafkatopic -n $NAMESPACE $APP_NAME-inbox
Then, change the number of partitions from 1 to a higher number, such as 5. The partition count dictates the maximum number of consumers that can read the topic in parallel.
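If you prefer a non-interactive change, the same edit can be applied as a patch. This assumes a Strimzi-style KafkaTopic resource where the partition count lives at spec.partitions; note that Kafka only allows increasing a topic's partition count, never decreasing it:
kubectl -n $NAMESPACE patch kafkatopic $APP_NAME-inbox --type merge -p '{"spec":{"partitions":5}}'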
Now, we’ll scale up the number of pipeline pods using the following command:
kubectl -n $NAMESPACE scale deploy/$APP_NAME-inference-pipeline --replicas=2
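You can confirm that the additional replicas are up before checking the dashboard:
kubectl -n $NAMESPACE get deploy $APP_NAME-inference-pipeline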
You should note an increased throughput in the dashboard as the number of messages per second scales up.
That’s all there is to running the Digital Fingerprinting AI Workflow. Feel free to explore the other components deployed as part of the solution. You can also view the source code included in the Digital Fingerprinting Collection on NGC to see how this workflow is constructed and how to customize it for your specific use case and environment. For example, the mock data producer included with this workflow can be replaced so that real streaming data is connected to the pipeline.