Move Save As to ResultSet (#181)

This is an overhaul of the Save As mechanism to use the file reader/writer classes, better aligning it with the patterns laid out by the rest of query execution. Why make this change? It makes our code base more uniform and adherent to the patterns/paradigms we've set up, and it improves encapsulation by separating the concerns of each component of the Save As function.

* Replumbing the save as execution to pass the call down the query stack as QueryExecutionService->Query->Batch->ResultSet (see the sketch after this list)
    * Each layer performs its own parameter checking
        * QueryExecutionService checks if the query exists
        * Query checks if the batch exists
        * Batch checks if the result set exists
        * ResultSet checks if the row counts are valid and if the result set has been executed
    * Success/Failure delegates are passed down the chain as well
* Moving the determination of whether a save request is a "selection" into the SaveResultsRequest class, eliminating duplicated code and the need for extra utility classes
* Making the IFileStream* classes more generic
    * Removing the max-characters-to-store requirement from the GetWriter method and moving it into the constructor for the temporary buffer writer; the values now live in the settings and have defaults
    * Removing the individual type writers from IFileStreamWriter
    * Removing the individual type readers from IFileStreamReader
* Adding a new WriteRow overload to IFileStreamWriter that writes out a row, given the row's worth of data and the list of columns
* Creating a new IFileStreamFactory that creates a reader/writer pair for reading from the temporary files and writing to CSV files
* Creating a new IFileStreamFactory that creates a reader/writer pair for reading from the temporary files and writing to JSON files
* Dramatically simplifying the CSV encoding functionality
* Removing the duplicated logic for saving to different output types, condensing it into a single chain that differs only in the type of factory provided
* Removing the logic for managing the list of save as tasks; since the ResultSet now performs the actual saving work, there's no real need to expose its internals
* Adding new strings to the sr.strings file for save as error messages
* Completely rewriting the unit tests for the save as mechanism. The new tests are very fine grained and should cover the majority of cases (aside from race conditions)


* Refactoring maxchars params into settings and out of file stream factory

* Removing write*/read* methods from file stream readers/writers

* Migrating the CSV save as to the resultset

* Tweaks to unit testing to eliminate writing files to disk

* WIP, moving to a base class for save results writers

* Everything is wired up and compiles

* Adding unit tests for CSV encoding

* Adding unit tests for CSV and Json writers

* Adding tests to the result set for saving

* Refactor to throw exceptions on errors instead of calling failure handler (see the sketch after this list)

* Unit tests for batch/query argument in range

* Unit tests

* Adding service integration unit tests

* Final polish, copyright notices, etc

* Adding NULL logic

* Fixing issue of unicode to utf8

* Fixing issues as per @kburtram code review comments

* Adding files that got broken?
Benjamin Russell, 2016-12-21 17:52:34 -08:00 (committed by GitHub)
commit 7ea1b1bb87, parent adc9672fa3
29 changed files with 1880 additions and 918 deletions

SaveAsCsvFileStreamWriter.cs (new file)

@@ -0,0 +1,118 @@
//
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.
//

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using Microsoft.SqlTools.ServiceLayer.QueryExecution.Contracts;

namespace Microsoft.SqlTools.ServiceLayer.QueryExecution.DataStorage
{
    /// <summary>
    /// Writer for writing rows of results to a CSV file
    /// </summary>
    public class SaveAsCsvFileStreamWriter : SaveAsStreamWriter
    {
        #region Member Variables

        private readonly SaveResultsAsCsvRequestParams saveParams;
        private bool headerWritten;

        #endregion

        /// <summary>
        /// Constructor, stores the CSV specific request params locally, chains into the base
        /// constructor
        /// </summary>
        /// <param name="stream">FileStream to access the CSV file output</param>
        /// <param name="requestParams">CSV save as request parameters</param>
        public SaveAsCsvFileStreamWriter(Stream stream, SaveResultsAsCsvRequestParams requestParams)
            : base(stream, requestParams)
        {
            saveParams = requestParams;
        }

        /// <summary>
        /// Writes a row of data as a CSV row. If this is the first row and the user has requested
        /// it, the headers for the columns will be emitted as well.
        /// </summary>
        /// <param name="row">The data of the row to output to the file</param>
        /// <param name="columns">
        /// The entire list of columns for the result set. They will be filtered down as per the
        /// request params.
        /// </param>
        public override void WriteRow(IList<DbCellValue> row, IList<DbColumnWrapper> columns)
        {
            // Write out the header if we haven't already and the user chose to have it
            if (saveParams.IncludeHeaders && !headerWritten)
            {
                // Build the string
                var selectedColumns = columns.Skip(ColumnStartIndex ?? 0).Take(ColumnCount ?? columns.Count)
                    .Select(c => EncodeCsvField(c.ColumnName) ?? string.Empty);
                string headerLine = string.Join(",", selectedColumns);

                // Encode it and write it out
                byte[] headerBytes = Encoding.UTF8.GetBytes(headerLine + Environment.NewLine);
                FileStream.Write(headerBytes, 0, headerBytes.Length);

                headerWritten = true;
            }

            // Build the string for the row
            var selectedCells = row.Skip(ColumnStartIndex ?? 0)
                .Take(ColumnCount ?? columns.Count)
                .Select(c => EncodeCsvField(c.DisplayValue));
            string rowLine = string.Join(",", selectedCells);

            // Encode it and write it out
            byte[] rowBytes = Encoding.UTF8.GetBytes(rowLine + Environment.NewLine);
            FileStream.Write(rowBytes, 0, rowBytes.Length);
        }

        /// <summary>
        /// Encodes a single field for inserting into a CSV record. The following rules are applied:
        /// <list type="bullet">
        /// <item><description>All double quotes (") are replaced with a pair of consecutive double quotes</description></item>
        /// </list>
        /// The entire field is also surrounded by a pair of double quotes if any of the following conditions are met:
        /// <list type="bullet">
        /// <item><description>The field begins or ends with a space</description></item>
        /// <item><description>The field begins or ends with a tab</description></item>
        /// <item><description>The field contains the ',' character</description></item>
        /// <item><description>The field contains the '\n' character</description></item>
        /// <item><description>The field contains the '\r' character</description></item>
        /// <item><description>The field contains the '"' character</description></item>
        /// </list>
        /// </summary>
        /// <param name="field">The field to encode</param>
        /// <returns>The CSV encoded version of the original field</returns>
        internal static string EncodeCsvField(string field)
        {
            // Special case for nulls
            if (field == null)
            {
                return "NULL";
            }

            // Whether this field has special characters which require it to be embedded in quotes
            bool embedInQuotes = field.IndexOfAny(new[] {',', '\r', '\n', '"'}) >= 0    // Contains special characters
                                 || field.StartsWith(" ") || field.EndsWith(" ")        // Starts/Ends with space
                                 || field.StartsWith("\t") || field.EndsWith("\t");     // Starts/Ends with tab

            // Replace all quotes in the original field with double quotes
            string ret = field.Replace("\"", "\"\"");
            if (embedInQuotes)
            {
                ret = $"\"{ret}\"";
            }

            return ret;
        }
    }
}