
Diagnostic Data Processing on Cloudera Altus


Feed: Cloud – Cloudera Engineering Blog.
Author: Shelby Khan.

Fig 1 – Cloudera Altus architecture

Introduction

Many of Cloudera’s customers configure Cloudera Manager to collect their clusters’ diagnostic data on a regular schedule and automatically send it to Cloudera. Cloudera analyzes this data, identifies potential problems in the customer’s environment, and alerts the customer. This reduces back-and-forth when customers file a support case and gives Cloudera critical information for improving future versions of its software. If Cloudera discovers a serious issue, it searches this diagnostic data and proactively notifies customers who might encounter problems due to the issue. This blog post explains how Cloudera internally uses the Altus as a Service platform in the cloud to perform these analyses. Offloading processing and ad-hoc visualization to the cloud reduces costs, since compute resources are used only when needed: transient ETL workloads process incoming data, and stateless data warehouse clusters serve ad-hoc visualizations. The clusters share metadata via Altus SDX.

Overview

In Cloudera EDH, diagnostic bundles are ingested from customer environments and normalized using Apache NiFi. The bundles are stored in an AWS S3 bucket with date-based partitions (Step 1 in Fig 1). An AWS Lambda function is scheduled daily (Step 2 in Fig 1) to process the previous day’s data (Step 3 in Fig 1) by spinning up Altus Data Engineering clusters to execute ETL processing and terminating the clusters on job completion. The jobs on these DE clusters produce a fact table and three dimension tables (a star schema). This extracted data is stored in a different S3 bucket (Step 4 in Fig 1), and the metadata produced by these processes, such as schemas and partitions, is stored in Altus SDX (Step 5 in Fig 1). Whenever data needs to be visualized, a stateless Altus Data Warehouse cluster is created, which provides easy access to the data via the built-in SQL Editor or the JDBC/ODBC Impala connector (Steps 6 and 7 in Fig 1).
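The daily trigger in Step 2 has to resolve the previous day’s date-based partition before launching the ETL job. The sketch below shows the idea in Python; the bucket name and partition layout are illustrative assumptions, not Cloudera’s actual configuration:

```python
from datetime import date, timedelta

def previous_day_prefix(bucket: str, today: date) -> str:
    """Build the S3 prefix for yesterday's date-based partition.

    The 'bundles/year=/month=/day=' layout is a hypothetical example
    of Hive-style date partitioning, not the real bucket structure.
    """
    d = today - timedelta(days=1)
    return f"s3://{bucket}/bundles/year={d.year}/month={d.month:02d}/day={d.day:02d}/"

# Example: a run on 2019-03-01 reads the 2019-02-28 partition.
print(previous_day_prefix("diag-bundles", date(2019, 3, 1)))
# → s3://diag-bundles/bundles/year=2019/month=02/day=28/
```

Keeping the partition scheme purely date-based is what makes the daily job idempotent: re-running it for a given date always targets the same prefix.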

Altus SDX

We create configured SDX namespaces to share metadata and fine-grained authorization policies between workloads that run on the clusters we create in Altus Data Engineering and Altus Data Warehouse. Configured SDX namespaces are built on Hive metastore and Sentry databases, which are set up and managed by Altus customers.

First, we create two databases on an external database server: one for the Hive metastore and one for Sentry.

Fig 2 – Hive and Sentry external database creation

We then create a configured namespace using those databases.


Fig 3 – Configured namespace creation

Altus initializes the schemas of the Hive metastore and Sentry databases when a cluster using a configured namespace is started. We chose to grant all privileges to the namespace admin group so that the internal user ‘altus’, which executes the Spark job on Data Engineering clusters, has the necessary DDL permissions.

Altus Data Engineering

We need to process the bundles that were uploaded the previous day. To do this, we need a mechanism to periodically start an Altus Data Engineering cluster that executes the job and terminates once processing is done. We use an AWS Lambda function that is triggered periodically by an AWS CloudWatch rule. This Lambda function uses the Altus Java SDK to kick off an ephemeral Data Engineering cluster with a Spark job.
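The shape of that scheduled trigger can be sketched as a Lambda handler that assembles a cluster-and-job request and hands it to a submission callback. This is a hedged illustration only: the field names (`clusterName`, `automaticTermination`, `jobArgs`) and the `submit` hook are hypothetical, and the real pipeline goes through the Altus Java SDK rather than Python:

```python
from datetime import date, timedelta

def make_job_request(run_date: date, environment: str) -> dict:
    """Assemble a request for a transient Data Engineering cluster
    that processes the previous day's partition and then terminates.
    All field names here are illustrative, not the Altus SDK's API."""
    d = run_date - timedelta(days=1)
    return {
        "environment": environment,
        "clusterName": f"diag-etl-{d.isoformat()}",  # one cluster per daily run
        "serviceType": "SPARK",
        "automaticTermination": True,   # tear the cluster down when the job finishes
        "jobArgs": [f"--date={d.isoformat()}"],
    }

def handler(event, context, submit=print):
    """Entry point invoked daily by the CloudWatch rule (Step 2 in Fig 1)."""
    request = make_job_request(date.today(), "diag-processing-env")
    submit(request)  # in the real pipeline this call is made via the Altus SDK
```

Injecting `submit` keeps the handler testable without a cloud account, which is a common pattern for scheduled Lambda functions.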

The Spark job processes the diagnostic data stored in the S3 bucket, creating a fact table and three dimension tables. The data for those tables is stored in a different S3 bucket, and the metadata is stored in Altus SDX. Because the job runs for only about 30 minutes, the cluster is deleted as soon as the job completes.
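The fact and dimension tables form a star schema: one central fact table keyed by references into the dimension tables. The post doesn’t describe the actual columns, so the schema below is a hypothetical illustration of the pattern, using Python’s built-in SQLite in place of the warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema: one fact table referencing three dimensions.
cur.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_service  (service_id  INTEGER PRIMARY KEY, service TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE fact_diagnostics (
    customer_id  INTEGER REFERENCES dim_customer,
    service_id   INTEGER REFERENCES dim_service,
    date_id      INTEGER REFERENCES dim_date,
    bundle_count INTEGER
);
INSERT INTO dim_customer VALUES (1, 'acme');
INSERT INTO dim_service  VALUES (1, 'HDFS'), (2, 'IMPALA');
INSERT INTO dim_date     VALUES (1, '2019-02-28');
INSERT INTO fact_diagnostics VALUES (1, 1, 1, 3), (1, 2, 1, 5);
""")

# A typical analysis joins the fact table to a dimension,
# e.g. counting diagnostic bundles per service.
rows = cur.execute("""
    SELECT s.service, SUM(f.bundle_count)
    FROM fact_diagnostics f JOIN dim_service s USING (service_id)
    GROUP BY s.service ORDER BY s.service
""").fetchall()
print(rows)  # → [('HDFS', 3), ('IMPALA', 5)]
```

The same join-through-dimensions query shape is what the Data Warehouse cluster later runs over the SDX-registered tables, with Impala SQL in place of SQLite.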

Altus Data Warehouse

Whenever we want to analyze the data, we spin up an Altus Data Warehouse cluster in the same environment, using the same namespace as the Altus Data Engineering cluster. Doing so allows the Altus Data Warehouse to have direct access to the data and metadata created by the Altus Data Engineering clusters.


Fig 4 – Data Warehouse cluster creation

Once the cluster is created, we can use the built-in SQL Editor to quickly analyze the data.


Fig 5 – Query Data Warehouse cluster using SQL Editor

If you need to generate custom dashboards, connect to the Altus Data Warehouse clusters using the Cloudera Impala JDBC/ODBC connector. We recently released a new Impala JDBC driver that can connect to an Altus Data Warehouse cluster by simply specifying the cluster’s name.


Fig 6 – Data Warehouse JDBC connection using cluster name

In the image above, we show how easy it is to connect Tableau to the tables in Altus Data Warehouse.


Fig 7 – Visualizing distribution of services in the Cloudera clusters

In the above visualization, you can see the distribution of services running in Cloudera clusters. Note that this is based on a random dataset.

What’s Next?

To get started with a 30-day free Altus trial, visit us at https://www.cloudera.com/products/altus.html. Send questions or feedback to the Altus community forum.

