Insights generator project (#1066)

* InsightsGenerator project template files

* Add insights projects to SLN

* Setting up siggen class (#1003)

* Setting up siggen class

* fixed the learn method

* Refactoring code
Fixed compile errors

* renamed results to result

* Basic transformation logic (#1004)

* Fix a couple bugs and add a simple test (#1007)

* Fix a couple bugs and add a simple test

* More tests and bug fix

* Nara/workflow (#1006)

* added a queue processor

* ordered using statements

* Armemon/analytics (#1008)

* Basic transformation logic

* changed some structure of siggen

* added sum and average method, as well as select rows by input name

* add insights to results

* min, max added

Co-authored-by: Karl Burtram <karlb@microsoft.com>
Co-authored-by: Aasim Khan <aasimkhan30@gmail.com>
Co-authored-by: Arslan Memon <armemon@microsoft.com>

* Added rules engine base implementation (#1005)

* Added rules engine base implementation

* update comments

* addressing comments

* adding template text to columnheaders object

* adding template text to columnheaders object

* fixing columnheaders class

* Added test

* Added Template Parser unit test in Test project

* Deleted unnecessary files and reverted the files that were modified by mistake

Co-authored-by: Jinjing Arima <jiarima@microsoft.com>

* Insights generator message handler placeholder (#1013)

* Aasim/insights/insight methods (#1014)

* Basic transformation logic

* changed some structure of siggen

* Added top and bottom insight functions

* Added top, bottom insights
Added tests for top, bottom insights

* Armemon/insights2 (#1011)

* max and min insightsperslice, and tests

* got rid of unnecessary function

* get indexes

Co-authored-by: Arslan Memon <armemon@microsoft.com>

* Armemon/insights2 (#1012)

* max and min insightsperslice, and tests

* got rid of unnecessary function

* get indexes

* learn for stringinputtype

* add learn implementation

Co-authored-by: Arslan Memon <armemon@microsoft.com>

* Added Tests
Removed duplicate methods

Co-authored-by: Karl Burtram <karlb@microsoft.com>
Co-authored-by: arslan9955 <53170027+arslan9955@users.noreply.github.com>
Co-authored-by: Arslan Memon <armemon@microsoft.com>

* Armemon/insights2 (#1016)

* Basic transformation logic

* changed some structure of siggen

* Added top and bottom insight functions

* Added top, bottom insights
Added tests for top, bottom insights

* max and min insightsperslice, and tests

* got rid of unnecessary function

* get indexes

* learn for stringinputtype

* add learn implementation

* add unique inputs

* fix merge error

* add to result

Co-authored-by: Karl Burtram <karlb@microsoft.com>
Co-authored-by: Aasim Khan <aasimkhan30@gmail.com>
Co-authored-by: Arslan Memon <armemon@microsoft.com>

* Added all the templates (#1015)

* Added all the templates
Added a method to find matched template

* Added a function to replace # and ## values in a template

* Added ReplaceHashesInTemplate call

* Added comments

* Updated the template txt

* Updated GetTopHeadersWithHash function to add #toplist

* Updated the logic per offline discussion with Hermineh

* Update request handler contract to take array (#1020)

* added rulesengine findmatchingtemplate (#1019)

* Add support for getting DacFx deploy options from a publish profile (#995)

* add support for getting options from a publish profile

* update comments

* set values for default options if they aren't specified in the publish profile

* addressing comments

* Updating to latest DacFx for a bug fix (#1010)

* added rulesengine findmatchingtemplate

* Update DacFx deploy and generate script with options (#998)

* update deploy and generate script to accept deployment options

* add tests

* add test with option set to true

* merge

* merge

* incorporated FindMatchedTemplate

Co-authored-by: Kim Santiago <31145923+kisantia@users.noreply.github.com>
Co-authored-by: Udeesha Gautam <46980425+udeeshagautam@users.noreply.github.com>

* -Added logic for Insights Generator Service Handler (#1017)

* -Added logic for Insights Generator Service Handler

* Fixed some logic

* Adding workflow test

* Update transform and add tests (#1024)

* Jiarima/fix rules engine logic (#1021)

* Added all the templates
Added a method to find matched template

* Added a function to replace # and ## values in a template

* Added ReplaceHashesInTemplate call

* Added comments

* Updated the template txt

* Updated GetTopHeadersWithHash function to add #toplist

* Updated the logic per offline discussion with Hermineh

* Update with the fixes

* Updated template and foreach conditions

* Added distinct

* Updated tests according to the logic change (#1026)

* Nara/remove queing (#1023)

* loc update (#914)

* loc update

* loc updates

* Add support for getting DacFx deploy options from a publish profile (#995)

* add support for getting options from a publish profile

* update comments

* set values for default options if they aren't specified in the publish profile

* addressing comments

* Updating to latest DacFx for a bug fix (#1010)

* Update DacFx deploy and generate script with options (#998)

* update deploy and generate script to accept deployment options

* add tests

* add test with option set to true

* intermediate check in for merge, transformed not working

* intermediate check in for merge, transformed not working

* added test case

* merged

Co-authored-by: khoiph1 <khoiph@microsoft.com>
Co-authored-by: Kim Santiago <31145923+kisantia@users.noreply.github.com>
Co-authored-by: Udeesha Gautam <46980425+udeeshagautam@users.noreply.github.com>

* Output data types from transform (#1029)

* Fix bug process input_g (#1030)

* Fixed the insight generator service (#1028)

* Jiarima/added more testings (#1031)

* Added another test
Updated ReplaceHashesInTemplate function to return string instead of Template

* Added third test

* Merged

* Reverted the workflow file to match with the one in hack/insights

* Bugs fixes to hook insights up to ADS (#1033)

* Bug fixes for hack insights (#1032)

* Fixed the minColumn index bug in Data Transformation
Fixed the template matching logic.

* Adding changes from PR

* Try to fix Workflow tests

* Readd workflow tests

* Fix template load location

Co-authored-by: Aasim Khan <aasimkhan30@gmail.com>
Co-authored-by: Nara <NaraVen@users.noreply.github.com>
Co-authored-by: arslan9955 <53170027+arslan9955@users.noreply.github.com>
Co-authored-by: Arslan Memon <armemon@microsoft.com>
Co-authored-by: gadudhbh <68879970+gadudhbh@users.noreply.github.com>
Co-authored-by: Jinjing Arima <jiarima@microsoft.com>
Co-authored-by: jiarima <68882862+jiarima@users.noreply.github.com>
Co-authored-by: Kim Santiago <31145923+kisantia@users.noreply.github.com>
Co-authored-by: Udeesha Gautam <46980425+udeeshagautam@users.noreply.github.com>
Co-authored-by: khoiph1 <khoiph@microsoft.com>
This commit is contained in:
Karl Burtram
2020-09-03 14:15:51 -07:00
committed by GitHub
parent 9784c3eaa2
commit 5cf5b59a0d
29 changed files with 2281 additions and 12 deletions

@@ -0,0 +1,462 @@
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Globalization;
namespace Microsoft.InsightsGenerator
{
class SignatureGenerator
{
private DataArray Table;
public SignatureGeneratorResult Result;
public SignatureGenerator(DataArray table)
{
this.Table = table;
Result = new SignatureGeneratorResult();
}
public SignatureGeneratorResult Learn()
{
var stringInputIndexes = new List<int>();
var timeInputIndexes = new List<int>();
var slicerIndexes = new List<int>();
var outputIndexes = new List<int>();
for (var i = 0; i < Table.TransformedColumnNames.Length; i++)
{
if (Table.TransformedColumnNames[i].Contains("input_g"))
{
stringInputIndexes.Add(i);
}
if (Table.TransformedColumnNames[i].Contains("input_t"))
{
timeInputIndexes.Add(i);
}
if (Table.TransformedColumnNames[i].Contains("slicer"))
{
slicerIndexes.Add(i);
}
if (Table.TransformedColumnNames[i].Contains("output"))
{
outputIndexes.Add(i);
}
}
foreach (int stringIndex in stringInputIndexes)
{
foreach (int outputIndex in outputIndexes)
{
ExecuteStringInputInsights(stringIndex, outputIndex);
foreach (int slicerIndex in slicerIndexes)
{
ExecuteStringInputSlicerInsights(stringIndex, outputIndex, slicerIndex);
}
}
}
return Result;
}
public void ExecuteStringInputInsights(int inputCol, int outputCol)
{
var n = Table.Cells.Length;
if (Table.Cells.Length > 8)
{
n = 3;
}
OverallAverageInsights(outputCol);
OverallBottomInsights(n, inputCol, outputCol);
OverallMaxInsights(outputCol);
OverallMinInsights(outputCol);
OverallSumInsights(outputCol);
OverallTopInsights(n, inputCol, outputCol);
UniqueInputsInsight(inputCol);
}
public void ExecuteStringInputSlicerInsights(int inputCol, int outputCol, int slicerCol)
{
var n = Table.Cells.Length;
if (Table.Cells.Length > 8)
{
n = 3;
}
if (Table.Cells.Length > 50)
{
n = 5;
}
SlicedMaxInsights(slicerCol, outputCol);
SlicedAverageInsights(slicerCol, outputCol);
SlicedBottomInsights(n, inputCol, slicerCol, outputCol);
SlicedSumInsights(slicerCol, outputCol);
SlicedPercentageInsights(slicerCol, outputCol);
SlicedMinInsights(slicerCol, outputCol);
SlicedTopInsights(n, inputCol, slicerCol, outputCol);
}
public void UniqueInputsInsight(int inputCol)
{
List<string> insight = new List<string>();
// Adding the insight identifier
insight.Add(SignatureGeneratorResult.uniqueInputsIdentifier);
var uniqueInputs = GetUniqueColumValues(inputCol);
insight.Add(uniqueInputs.Length.ToString());
insight.AddRange(uniqueInputs);
Result.Insights.Add(insight);
}
public void OverallTopInsights(long n, int inputColumn, int outputColumn)
{
List<string> insight = new List<string>();
// Adding the insight identifier
insight.Add(SignatureGeneratorResult.topInsightIdentifier);
insight.AddRange(GenericTop(Table.Cells, n, inputColumn, outputColumn));
Result.Insights.Add(insight);
}
public void SlicedTopInsights(long n, int inputColumn, int sliceColumn, int outputColumn)
{
List<string> insight = new List<string>();
// Adding the insight identifier
insight.Add(SignatureGeneratorResult.topSliceInsightIdentifier);
object[] slices = GetUniqueColumValues(sliceColumn);
insight.Add(slices.Length.ToString());
foreach (var slice in slices)
{
insight.Add(slice.ToString());
var sliceTable = CreateSliceBucket(sliceColumn, slice.ToString());
insight.AddRange(GenericTop(sliceTable, n, inputColumn, outputColumn));
}
Result.Insights.Add(insight);
}
public List<string> GenericTop(Object[][] table, long n, int inputColumn, int outputColumn)
{
List<string> insight = new List<string>();
Object[][] sortedTable = SortCellsByColumn(table, outputColumn);
double outputSum = CalculateColumnSum(sortedTable, outputColumn);
for (int i = sortedTable.Length - 1; i >= 0 && i >= sortedTable.Length - n; i--)
{
double percent = Percentage(Double.Parse(sortedTable[i][outputColumn].ToString()), outputSum);
string temp = String.Format("{0} ({1}) {2}%", sortedTable[i][inputColumn].ToString(), sortedTable[i][outputColumn].ToString(), percent);
insight.Add(temp);
}
// Adding the count of the result
insight.Insert(0, insight.Count.ToString());
return insight;
}
public void OverallBottomInsights(long n, int inputColumn, int outputColumn)
{
List<string> insight = new List<string>();
// Adding the insight identifier
insight.Add(SignatureGeneratorResult.bottomInsightIdentifier);
insight.AddRange(GenericBottom(Table.Cells, n, inputColumn, outputColumn));
Result.Insights.Add(insight);
}
public void SlicedBottomInsights(long n, int inputColumn, int sliceColumn, int outputColumn)
{
List<string> insight = new List<string>();
// Adding the insight identifier
insight.Add(SignatureGeneratorResult.bottomSliceInsightIdentifier);
object[] slices = GetUniqueColumValues(sliceColumn);
insight.Add(slices.Length.ToString());
foreach (var slice in slices)
{
insight.Add(slice.ToString());
var sliceTable = CreateSliceBucket(sliceColumn, slice.ToString());
insight.AddRange(GenericBottom(sliceTable, n, inputColumn, outputColumn));
}
Result.Insights.Add(insight);
}
public List<string> GenericBottom(Object[][] table, long n, int inputColumn, int outputColumn)
{
List<string> insight = new List<string>();
Object[][] sortedTable = SortCellsByColumn(table, outputColumn);
double outputSum = CalculateColumnSum(sortedTable, outputColumn);
for (int i = 0; i < n && i < sortedTable.Length; i++)
{
double percent = Percentage(Double.Parse(sortedTable[i][outputColumn].ToString()), outputSum);
string temp = String.Format("{0} ({1}) {2}%", sortedTable[i][inputColumn].ToString(), sortedTable[i][outputColumn].ToString(), percent);
insight.Add(temp);
}
// Adding the count of the result
insight.Insert(0, insight.Count.ToString());
return insight;
}
public void OverallAverageInsights(int colIndex)
{
var outputList = new List<string>();
outputList.Add(SignatureGeneratorResult.averageInsightIdentifier);
outputList.Add(CalculateColumnAverage(Table.Cells, colIndex).ToString());
Result.Insights.Add(outputList);
}
public void OverallSumInsights(int colIndex)
{
var outputList = new List<string>();
outputList.Add(SignatureGeneratorResult.sumInsightIdentifier);
outputList.Add(CalculateColumnSum(Table.Cells, colIndex).ToString());
Result.Insights.Add(outputList);
}
public void OverallMaxInsights(int colIndex)
{
var outputList = new List<string>();
outputList.Add(SignatureGeneratorResult.maxInsightIdentifier);
outputList.Add(CalculateColumnMax(Table.Cells, colIndex).ToString());
Result.Insights.Add(outputList);
}
public void OverallMinInsights(int colIndex)
{
var outputList = new List<string>();
outputList.Add(SignatureGeneratorResult.minInsightIdentifier);
outputList.Add(CalculateColumnMin(Table.Cells, colIndex).ToString());
Result.Insights.Add(outputList);
}
public void SlicedSumInsights(int sliceIndex, int colIndex)
{
var insight = new List<string>();
insight.Add(SignatureGeneratorResult.sumSliceInsightIdentifier);
object[] slices = GetUniqueColumValues(sliceIndex);
insight.Add(slices.Length.ToString());
foreach (var slice in slices)
{
insight.Add(slice.ToString());
var sliceTable = CreateSliceBucket(sliceIndex, slice.ToString());
insight.Add(CalculateColumnSum(sliceTable, colIndex).ToString());
}
Result.Insights.Add(insight);
}
public void SlicedMaxInsights(int sliceIndex, int colIndex)
{
var insight = new List<string>();
insight.Add(SignatureGeneratorResult.maxSliceInsightIdentifier);
object[] slices = GetUniqueColumValues(sliceIndex);
insight.Add(slices.Length.ToString());
foreach (var slice in slices)
{
insight.Add(slice.ToString());
var sliceTable = CreateSliceBucket(sliceIndex, slice.ToString());
insight.Add(CalculateColumnMax(sliceTable, colIndex).ToString());
}
Result.Insights.Add(insight);
}
public void SlicedMinInsights(int sliceIndex, int colIndex)
{
var insight = new List<string>();
insight.Add(SignatureGeneratorResult.minSliceInsightIdentifier);
object[] slices = GetUniqueColumValues(sliceIndex);
insight.Add(slices.Length.ToString());
foreach (var slice in slices)
{
insight.Add(slice.ToString());
var sliceTable = CreateSliceBucket(sliceIndex, slice.ToString());
insight.Add(CalculateColumnMin(sliceTable, colIndex).ToString());
}
Result.Insights.Add(insight);
}
public void SlicedAverageInsights(int sliceIndex, int colIndex)
{
var insight = new List<string>();
insight.Add(SignatureGeneratorResult.averageSliceInsightIdentifier);
object[] slices = GetUniqueColumValues(sliceIndex);
insight.Add(slices.Length.ToString());
foreach (var slice in slices)
{
insight.Add(slice.ToString());
var sliceTable = CreateSliceBucket(sliceIndex, slice.ToString());
insight.Add(CalculateColumnAverage(sliceTable, colIndex).ToString());
}
Result.Insights.Add(insight);
}
public void SlicedPercentageInsights(int sliceIndex, int colIndex)
{
var insight = new List<string>();
insight.Add(SignatureGeneratorResult.percentageSliceInsightIdentifier);
object[] slices = GetUniqueColumValues(sliceIndex);
insight.Add(slices.Length.ToString());
double totalSum = CalculateColumnSum(Table.Cells, colIndex);
foreach (var slice in slices)
{
insight.Add(slice.ToString());
var sliceTable = CreateSliceBucket(sliceIndex, slice.ToString());
double sliceSum = CalculateColumnSum(sliceTable, colIndex);
var percentagePerSlice = Percentage(sliceSum, totalSum);
insight.Add(percentagePerSlice.ToString());
}
Result.Insights.Add(insight);
}
private double CalculateColumnAverage(object[][] rows, int colIndex)
{
return Math.Round(CalculateColumnSum(rows, colIndex) / rows.Length, 2);
}
private double CalculateColumnSum(object[][] rows, int colIndex)
{
return Math.Round(rows.Sum(row => double.Parse(row[colIndex].ToString())), 2);
}
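// NOTE: despite its name, this returns the unrounded column sum; it is not called anywhere in this file.
private double CalculateColumnPercentage(object[][] rows, int colIndex)
{
return rows.Sum(row => double.Parse(row[colIndex].ToString()));
}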
private double CalculateColumnMin(object[][] rows, int colIndex)
{
return rows.Min(row => double.Parse(row[colIndex].ToString()));
}
private double CalculateColumnMax(object[][] rows, int colIndex)
{
return rows.Max(row => double.Parse(row[colIndex].ToString()));
}
private string[] GetUniqueColumValues(int colIndex)
{
return Table.Cells.Select(row => row[colIndex].ToString()).Distinct().ToArray();
}
public Object[][] CreateSliceBucket(int sliceColIndex, string sliceValue)
{
List<Object[]> slicedTable = new List<object[]>();
foreach (var row in Table.Cells)
{
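// Compare string forms so that non-string cells match the values returned by GetUniqueColumValues
if (row[sliceColIndex].ToString().Equals(sliceValue))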
{
slicedTable.Add(DeepCloneRow(row));
}
}
return slicedTable.ToArray();
}
public object[][] SortCellsByColumn(Object[][] table, int colIndex)
{
var cellCopy = DeepCloneTable(table);
Comparer<Object> comparer = Comparer<Object>.Default;
switch (this.Table.ColumnDataType[colIndex])
{
case DataArray.DataType.Number:
Array.Sort<Object[]>(cellCopy, (x, y) => comparer.Compare(double.Parse(x[colIndex].ToString()), double.Parse(y[colIndex].ToString())));
break;
case DataArray.DataType.String:
Array.Sort<Object[]>(cellCopy, (x, y) => String.Compare(x[colIndex].ToString(), y[colIndex].ToString()));
break;
case DataArray.DataType.DateTime:
Array.Sort<Object[]>(cellCopy, (x, y) => DateTime.Compare(DateTime.Parse(x[colIndex].ToString()), DateTime.Parse(y[colIndex].ToString())));
break;
}
return cellCopy;
}
public Object[][] DeepCloneTable(object[][] table)
{
return table.Select(a => a.ToArray()).ToArray();
}
public Object[] DeepCloneRow(object[] row)
{
return row.Select(a => a).ToArray();
}
public double Percentage(double value, double sum)
{
return Math.Round((double)((value / sum) * 100), 2);
}
}
}
public class SignatureGeneratorResult
{
public SignatureGeneratorResult()
{
Insights = new List<List<string>>();
}
public List<List<string>> Insights { get; set; }
public static string topInsightIdentifier = "top";
public static string bottomInsightIdentifier = "bottom";
public static string topSliceInsightIdentifier = "topPerSlice";
public static string bottomSliceInsightIdentifier = "bottomPerSlice";
public static string averageInsightIdentifier = "average";
public static string sumInsightIdentifier = "sum";
public static string maxInsightIdentifier = "max";
public static string minInsightIdentifier = "min";
public static string averageSliceInsightIdentifier = "averagePerSlice";
public static string sumSliceInsightIdentifier = "sumPerSlice";
public static string percentageSliceInsightIdentifier = "percentagePerSlice";
public static string maxSliceInsightIdentifier = "maxPerSlice";
public static string minSliceInsightIdentifier = "minPerSlice";
public static string uniqueInputsIdentifier = "uniqueInputs";
}
/** General format of the output
* "time"/"string"
* "top", "3", " input (value) %OfValue ", " input (value) %OfValue ", " input (value) %OfValue "
* "top", "1", " input (value) %OfValue "
* "bottom", "3", " input (value) %OfValue ", " input (value) %OfValue ", " input (value) %OfValue "
* "average", "100"
* "mean", "100"
* "median", "100"
* "averagePerSlice", "#slice", "nameofslice", "100", "nameofslice", "100", "nameofslice", "100"
* "topPerSlice", "#slice", "nameofslice", "3", " input (value) %OfValue ", " input (value) %OfValue ", " input (value) %OfValue ",
* "nameofslice", "3", " input (value) %OfValue ", " input (value) %OfValue ", " input (value) %OfValue ",
* "nameofslice", "3", " input (value) %OfValue ", " input (value) %OfValue ", " input (value) %OfValue "
* ....
*
**/
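
The flat list layout described in the comment above can be easier to follow with a small reader. The following is a minimal sketch, not part of this commit: `InsightLayoutExample`, `ReadSlicedValues`, and `ReadOverallItems` are hypothetical names, and the layouts they assume are the ones built by `SlicedSumInsights` (identifier, slice count, then name/value pairs) and `OverallTopInsights` (identifier, item count, then items).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; it is not part of the InsightsGenerator project.
public static class InsightLayoutExample
{
    // Assumed layout: [identifier, sliceCount, sliceName, value, sliceName, value, ...]
    // This matches what SlicedSumInsights, SlicedMinInsights and SlicedMaxInsights append.
    public static Dictionary<string, string> ReadSlicedValues(List<string> insight)
    {
        int sliceCount = int.Parse(insight[1]);
        var values = new Dictionary<string, string>();
        for (int i = 0; i < sliceCount; i++)
        {
            string sliceName = insight[2 + 2 * i];
            values[sliceName] = insight[3 + 2 * i];
        }
        return values;
    }

    // Assumed layout: [identifier, itemCount, item, item, ...]
    // This matches what OverallTopInsights and OverallBottomInsights append.
    public static List<string> ReadOverallItems(List<string> insight)
    {
        int itemCount = int.Parse(insight[1]);
        return insight.GetRange(2, itemCount);
    }
}
```

For example, passing the list "sumPerSlice", "2", "East", "10", "West", "20" to `ReadSlicedValues` yields a dictionary that maps "East" to "10" and "West" to "20". The per-slice top and bottom insights nest a count and item list under each slice name, so they would need a slightly different reader than the one sketched here.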