azuredatastudio/extensions/mssql/notebooks/TSG/cluster-status.ipynb

{
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3"
        },
        "language_info": {
            "name": "python",
            "version": "3.6.6",
            "mimetype": "text/x-python",
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "pygments_lexer": "ipython3",
            "nbconvert_exporter": "python",
            "file_extension": ".py"
        }
    },
    "nbformat_minor": 2,
    "nbformat": 4,
    "cells": [
        {
            "cell_type": "markdown",
            "source": "![11811317_10153406249401648_2787740058697948111_n](https://raw.githubusercontent.com/Microsoft/sqlworkshops/master/graphics/solutions-microsoft-logo-small.png)\n\n# View the status of your big data cluster\nThis notebook allows you to see the status of the controller, master instance, and pools in your SQL Server big data cluster.\n\n## <span style=\"color:red\">Important Instructions</span>\n### **Before you begin, you will need:**\n* Big data cluster name\n* Controller username\n* Controller password\n* Controller endpoint \n\nYou can find the controller endpoint from the big data cluster dashboard in the Service Endpoints table. The endpoint is listed as **Cluster Management Service.**\n\nIf you do not know the credentials, ask the admin who deployed your cluster.\n\n### **Instructions**\n* For the best experience, click **Run Cells** on the toolbar above. This will automatically execute all code cells below and show the cluster status in each table.\n* When you click **Run Cells** for this Notebook, you will be prompted at the *Log in to your big data cluster* code cell to provide your login credentials. Follow the prompts and press enter to proceed.\n* **You won't need to modify any of the code cell contents** in this Notebook. If you accidentally made a change, you can reopen this Notebook from the bdc dashboard.\n\n\n",
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": "## **Dependencies**\r\n\r\n> This Notebook will try to install these dependencies for you.\r\n\r\n----------------------------------------------------------------\r\n<table>\r\n<colgroup>\r\n<col style=\"width: 10%\" />\r\n<col style=\"width: 10%\" />\r\n<col style=\"width: 10%\" />\r\n<col style=\"width: 10%\" />\r\n</colgroup>\r\n<thead>\r\n<tr class=\"header\">\r\n<th>Tool</th>\r\n<th>Required</th>\r\n<th>Description</th>\r\n</tr>\r\n</thead>\r\n<tbody>\r\n<tr class=\"odd\">\r\n<td><strong>mssqlctl</strong></td>\r\n<td>Yes</td>\r\n<td>Command-line tool for installing and managing a big data cluster.</td>\r\n</tr>\r\n<tr class=\"even\">\r\n<td><strong>pandas</strong></td>\r\n<td>Yes</td>\r\n<td>Python Library for formatting data (<a href=\"https://www.learnpython.org/en/Pandas_Basics\">More info</a>).</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<p>",
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": "### **Install latest version of mssqlctl**",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "import sys, platform\r\n\r\nif platform.system()==\"Windows\":\r\n    user = ' --user'\r\nelse:\r\n    user = ''\r\n\r\ndef executeCommand(cmd, successMsgs, printMsg):\r\n    print(printMsg)\r\n    cmdOutput = !{cmd}\r\n    cmdOutput = ''.join(cmdOutput)\r\n    if any(msg in cmdOutput for msg in successMsgs):\r\n        print(f\"\\nSuccess >> \" + cmd)\r\n    else:\r\n        raise SystemExit(f'\\nFailed during:\\n\\n\\t{cmd}\\n\\nreturned: \\n' + ''.join(cmdOutput) + '.\\n')\r\n\r\ninstallPath = 'https://private-repo.microsoft.com/python/ctp3.1/mssqlctl/requirements.txt'\r\ncmd = f'{sys.executable} -m pip uninstall --yes mssqlctl-cli-storage'\r\ncmdOutput = !{cmd}\r\n\r\ncmd = f'{sys.executable} -m pip uninstall -r {installPath} --yes'\r\nexecuteCommand(cmd, ['is not installed', 'Successfully uninstalled mssqlctl'], 'Uninstalling mssqlctl:')\r\n\r\ncmd = f'{sys.executable} -m pip install -r {installPath}{user} --trusted-host helsinki'\r\ncmdOutput = !{cmd}\r\nexecuteCommand(cmd, ['Requirement already satisfied', 'Successfully installed mssqlctl'], 'Installing the latest version of mssqlctl:')",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "### **Install latest version of pandas**",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "#install pandas\r\ncmd = f'{sys.executable} -m pip show pandas'\r\ncmdOutput = !{cmd}\r\nif len(cmdOutput) > 0 and '0.24' in cmdOutput[1]:\r\n    print('Pandas required version is already installed!')\r\nelse:\r\n    pandasVersion = 'pandas==0.24.2'\r\n    cmd = f'{sys.executable} -m pip install {pandasVersion}'\r\n    cmdOutput = !{cmd}\r\n    print(f'\\nSuccess: Upgraded pandas.')",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "## **Log in to your big data cluster**\r\nTo view cluster status, you will need to connect to your big data cluster through mssqlctl. \r\n\r\nWhen you run this code cell, you will be prompted for:\r\n- Cluster name\r\n- Controller username\r\n- Controller password\r\n\r\nTo proceed:\r\n- **Click** on the input box\r\n- **Type** the login info\r\n- **Press** enter.\r\n\r\nIf your cluster is missing a configuration file, you will be asked to provide your controller endpoint. (Format: **https://00.00.00.000:00000**)",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "import os, getpass, json\nimport pandas as pd\nimport numpy as np\nfrom IPython.display import *\n\ndef PromptForInfo(promptMsg, isPassword, errorMsg):\n    if isPassword:\n        promptResponse = getpass.getpass(prompt=promptMsg)\n    else:\n        promptResponse = input(promptMsg)\n    if promptResponse == \"\":\n        raise SystemExit(errorMsg + '\\n')\n    return promptResponse\n\n# Prompt user inputs:\ncluster_name = PromptForInfo('Please provide your Cluster Name: ', False, 'Cluster Name is required!')\n\ncontroller_username = PromptForInfo('Please provide your Controller Username for login: ', False, 'Controller Username is required!')\n\ncontroller_password = PromptForInfo('Controller Password: ', True, 'Password is required!')\nprint('***********')\n\n!mssqlctl logout\n# Login in to your big data cluster \ncmd = f'mssqlctl login -n {cluster_name} -u {controller_username} -a yes'\nprint(\"Start \" + cmd)\nos.environ['CONTROLLER_USERNAME'] = controller_username\nos.environ['CONTROLLER_PASSWORD'] = controller_password\nos.environ['ACCEPT_EULA'] = 'yes'\n\nloginResult = !{cmd}\nif 'ERROR: Please check your kube config or specify the correct controller endpoint with: --controller-endpoint https://<ip>:<port>.' in loginResult[0] or 'ERROR' in loginResult[0]:\n    controller_ip = input('Please provide your Controller endpoint: ')\n    if controller_ip == \"\":\n        raise SystemExit(f'Controller IP is required!' + '\\n')\n    else:\n        cmd = f'mssqlctl login -n {cluster_name} -e {controller_ip} -u {controller_username} -a yes'\n        loginResult = !{cmd}\nprint(loginResult)\n",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "## **Status of big data cluster**\r\nAfter you successfully login to your bdc, you can view the overall status of each container before drilling down into each component.",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "# Display status of big data cluster\ndef formatColumnNames(column):\n    return ' '.join(word[0].upper() + word[1:] for word in column.split())\n\npd.set_option('display.max_colwidth', -1)\ndef show_results(input):\n    input = ''.join(input)\n    results = json.loads(input)\n    df = pd.DataFrame(results)\n    df.columns = [formatColumnNames(n) for n in results[0].keys()]\n    mydata = HTML(df.to_html(render_links=True))\n    display(mydata)\n\nresults  = !mssqlctl bdc status show\nstrRes = ''.join(results)\njsonRes = json.loads(strRes)\ndtypes = '{'\nspark = [x for x in jsonRes if x['kind'] == 'Spark']\nif spark:\n    spark_exists = True\nelse:\n    spark_exists = False\nshow_results(results)",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "## **Cluster Status**\r\nFor each cluster component below, running each code cell will generate a table. This table will include:\r\n\r\n----------------------------------------------------------------\r\n<table>\r\n<colgroup>\r\n<col style=\"width: 10%\" />\r\n<col style=\"width: 10%\" />\r\n</colgroup>\r\n<thead>\r\n<tr class=\"header\">\r\n<th>Column Name</th>\r\n<th>Description</th>\r\n</tr>\r\n</thead>\r\n<tbody>\r\n<tr class=\"odd\">\r\n<td><strong>Kind</strong></td>\r\n<td>Identifies if component is a pod or a set.</td>\r\n</tr>\r\n<tr class=\"even\">\r\n<td><strong>LogsURL</strong></td>\r\n<td>Link to <a href=\"https://www.elastic.co/guide/en/kibana/current/introduction.html\">Kibana</a> logs which is used for troubleshooting.</td>\r\n</tr>\r\n<tr class=\"odd\">\r\n<td><strong>Name</strong></td>\r\n<td>Provides the specific name of the pod or set.</td>\r\n</tr>\r\n<tr class=\"even\">\r\n<td><strong>NodeMetricsURL</strong></td>\r\n<td>Link to <a href=\"https://grafana.com/docs/guides/basic_concepts/\">Grafana</a> dashboard to view key metrics of the node.</td>\r\n</tr>\r\n<tr class=\"odd\">\r\n<td><strong>SQLMetricsURL</strong></td>\r\n<td>Link to <a href=\"https://grafana.com/docs/guides/basic_concepts/\">Grafana</a> dashboard to view key metrics of the SQL instance</td>\r\n</tr>\r\n<tr class=\"even\">\r\n<td><strong>State</strong></td>\r\n<td>Indicates state of the pod or set..</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n<p>",
            "metadata": {}
        },
        {
            "cell_type": "markdown",
            "source": "### **Controller status**\nTo learn more about the controller, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-controller?view=sql-server-ver15)",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "# Display status of controller\nresults = !mssqlctl bdc control status show\nshow_results(results)",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "### **Master Instance status**\nTo learn more about the master instance, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-master-instance?view=sqlallproducts-allversions)",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "# Display status of master instance\nresults = !mssqlctl bdc pool status show -k master -n default\nshow_results(results)",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "### **Compute Pool status**\nTo learn more about compute pool, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-compute-pool?view=sqlallproducts-allversions)",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "# Display status of compute pool\nresults = !mssqlctl bdc pool status show -k compute -n default\nshow_results(results)",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "### **Storage Pool status**\nTo learn more about storage pool, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-storage-pool?view=sqlallproducts-allversions)",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "# Display status of storage pools\nresults = !mssqlctl bdc pool status show -k storage -n default\nshow_results(results)",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "### **Data Pool status**\nTo learn more about data pool, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-data-pool?view=sqlallproducts-allversions)",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "# Display status of data pools\nresults = !mssqlctl bdc pool status show -k data -n default\nshow_results(results)",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        },
        {
            "cell_type": "markdown",
            "source": "### **Spark Pool status**\nDisplays status of spark pool if it exists. Otherwise, will show as \"No spark pool.\"",
            "metadata": {}
        },
        {
            "cell_type": "code",
            "source": "# Display status of spark pool\nif spark_exists:\n    results = !mssqlctl bdc pool status show -k spark -n default\n    show_results(results)\nelse:\n    print('No spark pool.')",
            "metadata": {},
            "outputs": [],
            "execution_count": 0
        }
    ]
}