Monthly Archives: June 2021

Install and configure Spark History Server (SHS) on Kubernetes (K8s)

We always struggle with how to install and configure SHS on Kubernetes with event logs stored in Google Cloud Storage (GCS). So here is a solution.

Create a shs-gcs.yaml values file, which will be passed to Helm to deploy the SHS service.

pvc:
  enablePVC: false
  existingClaimName: nfs-pvc
  eventsDir: "/"
nfs:
  enableExampleNFS: false
  pvName: nfs-pv
  pvcName: nfs-pvc
gcs:
  enableGCS: true
  secret: history-secrets
  key: tc-sc-bi-bigdata-ifwk-new-dev-48a2f0a984bb.json
  logDirectory: gs://tc-sc-bi-bigdata-ingestion-dev-spark-on-k8s/eventsLogs/
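For the history server to have anything to show, Spark jobs need to write their event logs to the same location as logDirectory. Below is a minimal sketch of the spark-submit flags involved, assuming the GCS connector is already available to the job. The helper name event_log_conf is hypothetical and only prints the flags.

```shell
#!/bin/sh
# Hypothetical helper: print the spark-submit flags that send event logs
# to the directory the history server reads from.
event_log_conf() {
  log_dir="$1"
  printf '%s %s\n' \
    "--conf spark.eventLog.enabled=true" \
    "--conf spark.eventLog.dir=${log_dir}"
}

# Example: append the printed flags to an existing spark-submit command line.
event_log_conf "gs://tc-sc-bi-bigdata-ingestion-dev-spark-on-k8s/eventsLogs/"
```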

******************************** Step 1 ********************************

(base) saurabhkumar@Saurabhs-MacBook-Pro stats % gcloud container clusters get-credentials spark-on-gke
Fetching cluster endpoint and auth data.
kubeconfig entry generated for spark-on-gke.

(base) saurabhkumar@Saurabhs-MacBook-Pro stats % kubectl cluster-info
Kubernetes master is running at https://10.2.4.110
GLBCDefaultBackend is running at https://10.2.4.110/api/v1/namespaces/kube-system/services/default-http-backend:http/proxy
KubeDNS is running at https://10.2.4.110/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://10.2.4.110/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

******************************** Step 2 ********************************

(base) saurabhkumar@Saurabhs-MacBook-Pro stats % kubectl get secrets
NAME                  TYPE                                  DATA   AGE
default-token-2v6p5   kubernetes.io/service-account-token   3      71d
spark-sa              Opaque                                1      70d
(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % kubectl create secret generic history-secrets --from-file=gcp-project-48a2f0a984bb.json
secret/history-secrets created
(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % kubectl get secrets

NAME                                                    TYPE                                  DATA   AGE
default-token-2v6p5                                     kubernetes.io/service-account-token   3      71d
history-secrets                                         Opaque                                1      5s
sh.helm.release.v1.spark-history-server-1624358382.v1   helm.sh/release.v1                    1      11m
spark-history-server-1624358382-token-mlh5j             kubernetes.io/service-account-token   3      11m
spark-sa                                                Opaque                                1      70d

(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % kubectl describe secrets/history-secrets
Name: history-secrets
Namespace: default
Labels: <none>
Annotations: <none>

Type: Opaque

Data
====
gcp-project-48a2f0a984bb.json: 2358 bytes
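The file name listed under Data is the key inside the secret. Assuming the chart resolves gcs.key as a file name inside the mounted secret (an assumption about the chart, not something shown in this post), the key entry in shs-gcs.yaml has to match that name. The following is a small sketch of a local check; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: does the `key:` entry in the values file match the
# file name that was loaded into the secret with --from-file?
values_key() {
  # Print the value of the first `key:` line, with leading spaces removed.
  sed -n 's/^[[:space:]]*key:[[:space:]]*//p' "$1" | head -n 1
}

key_matches_secret_file() {
  values_file="$1"
  secret_file="$2"
  [ "$(values_key "$values_file")" = "$(basename "$secret_file")" ]
}

# Usage (file names as used in this post):
#   key_matches_secret_file shs-gcs.yaml gcp-project-48a2f0a984bb.json \
#     || echo "gcs.key does not match the file name stored in the secret"
```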

******************************** Step 3 ********************************

(base) saurabhkumar@Saurabhs-MacBook-Pro stats % helm repo add stable https://charts.helm.sh/stable
"stable" already exists with the same configuration, skipping

(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % helm list -n ifw-reloaded
NAME                              NAMESPACE      REVISION   UPDATED                                STATUS     CHART                        APP VERSION
spark-history-server-1616415984   ifw-reloaded   1          2021-03-22 17:56:34.463601 +0530 IST   deployed   spark-history-server-1.4.3   2.4.0

(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % helm install stable/spark-history-server --values shs-gcs.yaml --generate-name
WARNING: This chart is deprecated
NAME: spark-history-server-1624360585
LAST DEPLOYED: Tue Jun 22 16:46:32 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Get the application URL by running the following commands. Note that the UI would take a minute or two to show up after the pods and services are ready.
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status by running 'kubectl -n default get svc -w spark-history-server-1624360585'
export SERVICE_IP=$(kubectl get svc --namespace default spark-history-server-1624360585 -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
NOTE: If on OpenShift, run the following command instead:
export SERVICE_IP=$(oc get svc --namespace default spark-history-server-1624360585 -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo http://$SERVICE_IP:map[name:http-historyport number:18080]
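The final echo line in the NOTES above renders the port as a map instead of a number, so it cannot be pasted as is. Here is a small sketch that builds the URL by hand, assuming the service listens on port 18080 as the NOTES indicate. The helper name shs_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the history server URL from an IP and a port.
shs_url() {
  ip="$1"
  port="${2:-18080}"   # 18080 is the port number shown in the chart NOTES
  printf 'http://%s:%s\n' "$ip" "$port"
}

# Usage once the LoadBalancer IP is assigned (release name from this install):
#   SERVICE_IP=$(kubectl get svc --namespace default spark-history-server-1624360585 \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   shs_url "$SERVICE_IP"
```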

******************************** Step 4 ********************************

(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % kubectl get svc
NAME                              TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
kubernetes                        ClusterIP      10.1.0.1      <none>        443/TCP           71d
spark-history-server-1624360585   LoadBalancer   10.1.255.20   <pending>     18080:31739/TCP   17s

(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % kubectl get svc
NAME                              TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
kubernetes                        ClusterIP      10.1.0.1      <none>        443/TCP           71d
spark-history-server-1624360585   LoadBalancer   10.1.255.20   10.1.0.113    18080:31739/TCP   54s

******************************** Step 5 ********************************

To uninstall SHS in one go, run helm uninstall against the release:
(base) saurabhkumar@Saurabhs-MacBook-Pro spark-3.1.1-bin-hadoop2.7 % helm uninstall spark-history-server-1616415984 -n ifw-reloaded
Error: uninstallation completed with 2 error(s): clusterrolebindings.rbac.authorization.k8s.io "spark-history-server-1616415984-crb" is forbidden: User "system:serviceaccount:default:ifw-team" cannot delete resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope; clusterroles.rbac.authorization.k8s.io "spark-history-server-1616415984-cr" is forbidden: User "system:serviceaccount:default:ifw-team" cannot delete resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope

The uninstall finished with two errors here: the ifw-team service account is not allowed to delete cluster-scoped resources, so the ClusterRole and ClusterRoleBinding created by the chart stay behind and have to be deleted by a user with cluster-level RBAC permissions.
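The two objects named in the error follow the pattern <release>-crb and <release>-cr. The sketch below covers the cleanup, assuming it is run by a user who is allowed to delete cluster-scoped RBAC objects. The helper name is hypothetical, and it only prints the commands rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper: print the kubectl commands that remove the
# ClusterRoleBinding and ClusterRole left behind for a given release.
leftover_rbac_cleanup() {
  release="$1"
  echo "kubectl delete clusterrolebinding ${release}-crb"
  echo "kubectl delete clusterrole ${release}-cr"
}

# Prints the two commands for the release uninstalled in this step; review
# them, then run them with an identity that has cluster-level permissions.
leftover_rbac_cleanup spark-history-server-1616415984
```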

Please feel free to give your valuable feedback.