
February 19, 2014

Power Query: Consuming SSIS Upgrade Wizard Output File


Have you ever needed to create summary and detailed reports to understand the results of an SSIS upgrade exercise, such as packages by type of error and warning, or the number of packages that upgraded successfully versus unsuccessfully?

Upgrading SSIS packages is, in most cases, a straightforward exercise using the SSIS upgrade wizard. However, the output file produced by the wizard with the upgrade results (warnings, errors, upgrade status, etc.) is very verbose and formatted in a way that makes it very hard to consume when upgrading a large number of packages.

I recently experienced this pain while working on a project that required upgrading more than 1,800 SSIS packages, which produced an upgrade output file with more than 35,000 lines worth of messages! See a snippet of the output file below.

[Image: snippet of the SSIS upgrade wizard output file]


Since I typically use the output file to get a better sense of the effort involved to complete the upgrade of all packages, I quickly realized that for this project I would need to come up with a way to consume such a large output file. Fortunately, using Power Query, I was able to create an Excel report like the one below in a very simple way. As you can see, it is very easy to get a sense of how difficult the upgrade is going to be. In the example below, I can quickly see that only 14 packages had errors and 599 had warnings. Additionally, I can easily get a breakdown of the types of errors and get the list of packages by message code/type.


[Image: sample SSIS upgrade results report in Excel]

If you are interested in creating a similar report, you can download a copy of a report template I created: SSISUpgradeResultsReport Template.xlsx


Note: Make sure you have Power Query for Excel installed and that you edit the pqSSISUpgradeReport Power Query query to point to the location where your SSIS upgrade wizard output file is stored (see query below).

let
    // Load the SSIS upgrade wizard output file as one row per line of text
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("D:\SSIS\SSIS Upgrade report all packages 2013 10 09.txt"))}),
    // Remove blank lines and other lines that carry no message
    FilteredRowsBlanks = Table.SelectRows(Source, each ([Column1] <> "" and [Column1] <> "#(tab)" and [Column1] <> "#(tab)Messages" and [Column1] <> "The Upgrade is Complete")),
    // Flag the "- Upgrading package" header lines and fill the package information down to its messages
    InsertedCustom = Table.AddColumn(FilteredRowsBlanks, "Custom", each if Text.StartsWith([Column1], "- Upgrading package") then [Column1] else null),
    FillDown = Table.FillDown(InsertedCustom, {"Custom"}),
    FilteredRows1 = Table.SelectRows(FillDown, each not Text.Contains([Column1], "- Upgrading package")),
    // Separate the upgrade result (the text in parentheses) from the package path
    SplitColumnDelimiter = Table.SplitColumn(FilteredRows1, "Custom", Splitter.SplitTextByEachDelimiter({"("}, null, true), 2),
    ReplacedValue = Table.ReplaceValue(SplitColumnDelimiter, ")", "", Replacer.ReplaceText, {"Custom.2"}),
    ReplacedValue1 = Table.ReplaceValue(ReplacedValue, "- Upgrading package", "", Replacer.ReplaceText, {"Custom.1"}),
    // Split the package path into folder and package name (at the last backslash)
    SplitColumnDelimiter1 = Table.SplitColumn(ReplacedValue1, "Custom.1", Splitter.SplitTextByEachDelimiter({"\"}, null, true), 2),
    RenamedColumns = Table.RenameColumns(SplitColumnDelimiter1, {{"Column1", "Message"}, {"Custom.1.1", "Folder"}, {"Custom.1.2", "PackageName"}, {"Custom.2", "UpgradeResult"}}),
    ReorderedColumns = Table.ReorderColumns(RenamedColumns, {"Folder", "PackageName", "UpgradeResult", "Message"}),
    // Split each message into its code and description (at the first colon)
    SplitColumnDelimiter2 = Table.SplitColumn(ReorderedColumns, "Message", Splitter.SplitTextByEachDelimiter({":"}, null, false), 2),
    RenamedColumns1 = Table.RenameColumns(SplitColumnDelimiter2, {{"Message.1", "Message Code"}, {"Message.2", "Message"}}),
    TrimmedMessageCode = Table.TransformColumns(RenamedColumns1, {{"Message Code", Text.Trim}}),
    // Normalize package names to lowercase so they group cleanly in the report
    Lowercase = Table.TransformColumns(TrimmedMessageCode, {{"PackageName", Text.Lower}})
in
    Lowercase

September 4, 2013

Power Query: Stand Alone Vs. Corporate version

One thing that caught me off guard as I was trying to use Power Query against Power BI for Office 365 was the fact that there are two versions of Power Query. This actually seems to have confused at least one other person already, so here is some information in case you face the same situation:
Microsoft Power Query for Excel - version 1.5.3296.2082
http://www.microsoft.com/en-us/download/details.aspx?id=39379
Microsoft Power Query Preview - version 2.6.3387.121
http://www.microsoft.com/en-us/download/details.aspx?id=39933

How are they different?

Version 1.x is the first release of Power Query, the one that reached general availability on July 8, 2013. This means, among other things, that it is safe to run in production environments. This is also the “standalone” version, meaning it does not have the options to connect and share queries via Power BI for Office 365.

Version 2.x is the so-called “corporate” version, which means it has the options to interact with Power BI for Office 365 sites. It is important to note that, as of this date, this version is in the PREVIEW stage, a sort of beta period where features and functionality are still being finalized, and its use in production environments is not recommended.

So, if you are trying to use Power Query for Excel against Power BI for Office 365, make sure you have the 2.x version!




September 3, 2013

Power BI for Office 365: Installing Power BI App from SharePoint Store

I got my invitation email to the preview of Power BI for Office 365 a few days ago, and while the provisioning process was pretty straightforward, I ran into a minor issue that needed a few extra steps that were not covered in the Power BI Provisioning Video. In my case, I did not have Power BI as one of the available apps of the team site I wanted to use, as shown in the video (~3:45 in the video). Fortunately, there is a provisioning guide document that provides the steps to manually install the Power BI app from the SharePoint Store.
[Image: adding the Power BI app from the SharePoint Store]
Thanks to Gilad Elyashar for pointing me in the right direction to get this issue solved by replying to my post in the Power BI forum. If you run into this or any other issue, I encourage you to make use of the Power BI forums to get answers to your questions.

August 12, 2013

Power Query: Creating a Function to Correct and Verify Addresses via an External API

One of the features I found very helpful in Power Query for Excel (formerly codenamed Data Explorer) is the ability to interact with external APIs, as this opens the door to many interesting use cases. In this post, I want to show how we can use the Melissa Data Address Check - Verify, Correct, Geocode US and Canadian Addresses service available in the Windows Azure Marketplace to create a parameterized Power Query function that can later be referenced and re-used by other queries. This address cleansing service offers up to 1,000 records per month for free, but you would have to pay a monthly fee beyond that point. Although the terms functions and APIs may intimidate some of us, you will find that Power Query makes the process rather simple, at least in this case.

Before you start
In order to complete the steps in this post you will need:
    • Excel 2010 or 2013 with the Power Query for Excel add-in installed
    • A Windows Azure Marketplace account with a subscription to the Melissa Data Address Check service

Getting familiar with the API

We will start by spending some time getting familiar with the service we want to use. In this case, the details page of the Melissa Data service has the basic information we need to get started: the Service root URL and the Input Parameters.
[Image: Melissa Data Address Check service details page]

Using the API within Power Query

Power Query for Excel has many options when it comes to source data (e.g. From Web, From Database, From Other Sources, etc.), so it took a little guessing as to which option to use in order to send and receive data to this API. It turns out that using From OData Feed and entering the Service root URL does the trick.
[Image: From OData Feed dialog with the Service root URL]
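Behind this dialog, Power Query simply generates a query with a single OData.Feed call against the URL you enter. A minimal sketch of that generated query would look like the following; note that the service root URL shown here is only illustrative, so use the one from your service details page:

let
    // Illustrative service root URL - replace it with the Service root URL
    // shown on the Melissa Data service details page
    Source = OData.Feed("https://api.datamarket.azure.com/MelissaData/AddressCheck/v1/")
in
    Source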

Upon clicking OK, we notice that Power Query detects the SuggestedAddresses function exposed by the API and provides us with an Invoke option within the Query editor.
[Image: SuggestedAddresses function with the Invoke option in the Query editor]

Let’s go ahead and use the Invoke button. You will get a new dialog prompting you for the input parameters. As we saw earlier, the service requires an address value, the maximum number of suggested addresses (in case the service has multiple suggestions) and the minimum level of confidence, a value from 0 to 1 representing how confident the service is that the cleansing results are accurate. Let’s go ahead and enter some values to test it.
[Image: Invoke function dialog with input parameter values]

As you can see, the results returned by the API are nicely displayed in a tabular format inside the preview pane of the query editor, and if you click Done, the results are placed in an Excel table, where they can be easily refreshed and further combined with other data sets. Quite easy!
[Image: API results displayed in the query editor preview pane]

But, do you see a caveat here? So far, we were only able to call the service API for a single address value that we have to type in the invoke function dialog, which won’t work if we have to clean hundreds or perhaps thousands of addresses at a time. So, how could we invoke this function for each record in, let’s say, another data set?

Invoking a function for each record in a table

First, let’s revisit the function query we just created to remove the InvokeSuggestedAddresses step and rename the query so we give it a friendlier name. This should leave our function ready to be invoked from other queries.
[Image: fCleanAddress function query after removing the Invoke step]
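For reference, after those edits the function query is essentially just the feed connection plus a navigation step that picks up the SuggestedAddresses function. The sketch below is only an approximation, reusing the illustrative URL from before; the exact navigation step Power Query generates for you may look a bit different:

let
    Source = OData.Feed("https://api.datamarket.azure.com/MelissaData/AddressCheck/v1/"),
    // Navigate to the SuggestedAddresses function exposed by the feed;
    // its parameters are Address, MaximumSuggestions and MinimumConfidence
    fCleanAddress = Source{[Name="SuggestedAddresses"]}[Data]
in
    fCleanAddress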

Now, let’s suppose we have an Excel table with multiple address records that we want to send to the Melissa Data service for cleansing and validation. In this case, we will create a new query using the From Table option as shown below.
[Image: creating a new query with the From Table option]

When the query editor opens, select the column containing the address to be cleaned, in our case the Address column, and right-click it. Then select the Insert Column --> Custom option. Once the Insert Custom Column dialog opens, enter the following expression to invoke the function we just created, and click OK.

fCleanAddress([Address],1,0)

As you can see, all we are doing is invoking the fCleanAddress function and passing the 3 required parameters. The Address parameter is given by the Address column available in the data set – that’s the trick! MaximumSuggestions and MinimumConfidence are hardcoded to 1 and 0 respectively – you can change that if you want, or read those values from your data set.

[Image: Insert Custom Column dialog with the fCleanAddress expression]
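Behind the scenes, the custom column step is just a Table.AddColumn call that invokes the function once per row. A minimal sketch of the resulting query, assuming the Excel table is named Addresses (the table name and column names are just for illustration), would be:

let
    // Read the Excel table that contains the addresses to clean
    Source = Excel.CurrentWorkbook(){[Name="Addresses"]}[Content],
    // Invoke the cleansing function once per row, passing the Address column
    // and hardcoding MaximumSuggestions = 1 and MinimumConfidence = 0
    AddedCustom = Table.AddColumn(Source, "Custom", each fCleanAddress([Address], 1, 0))
in
    AddedCustom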

You should now see a new column called Custom (feel free to rename it) with the word Table as its values. This means the results in the new column need to be further expanded in order to see the detailed results from each call to the Melissa Data API.
[Image: new Custom column containing Table values]

You can expand the Custom column fields by clicking on the double arrow icon next to the column name. At this point you are presented with all available fields and have the ability to select/unselect the desired fields.
[Image: expanding the fields of the Custom column]
Once you are happy with the selection of fields, click OK and then Done to get the query results in an Excel table.
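In the formula language, that expand action becomes a Table.ExpandTableColumn step appended to the sketch above. The field names below are only illustrative, since they depend on what the API actually returns for your selection:

let
    Source = Excel.CurrentWorkbook(){[Name="Addresses"]}[Content],
    AddedCustom = Table.AddColumn(Source, "Custom", each fCleanAddress([Address], 1, 0)),
    // Expand the nested table returned by each API call, keeping only the selected fields
    ExpandedCustom = Table.ExpandTableColumn(AddedCustom, "Custom",
        {"Address", "City", "State", "Zip"},
        {"Clean Address", "Clean City", "Clean State", "Clean Zip"})
in
    ExpandedCustom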

Taking it further

The example above uses a static Excel table as the starting point, but you can use this same approach to invoke the function and get API results for records coming from other types of sources that are more dynamic in nature, such as queries From Web or From Database. By the way, I would not expect this type of solution to have split-second response times, as in this case data goes out to the internet and back and we have no control over the API layer.

Preview version warning – the information in this post is current as of the July 2013 Power Query preview (Version: 1.5.3296.1161) and may be subject to change in the RTM version.

March 1, 2013

Power Query for Excel: The Next Piece Of The Self-Service Puzzle

Microsoft released the preview of Data Explorer for Excel (now known as Power Query), a new add-in for Excel 2010 and 2013 that promises to bring powerful capabilities for searching, transforming, shaping and merging data to Excel users. A clear sign that Microsoft’s strategy to democratize BI is still in full swing, and perhaps aligned with industry demands for self-service Business Intelligence platforms.

From Data Explorer preview download page:
Microsoft “Data Explorer” Preview for Excel is a new add-in that provides a seamless experience for data discovery,
data transformation and enrichment for Information Workers, BI professionals and other users.

Do you mean Self-Service ETL?

Maybe, but in any case Data Explorer sounds like a very exciting proposal, given that data integration is the biggest bottleneck most BI initiatives experience, and an area where, more often than not, users must rely on technologies and skill sets accessible only via IT departments.

Data Explorer First Glance

Once it is installed, Data Explorer manifests as a new tab in the Excel ribbon.
[Image: Data Explorer tab in the Excel ribbon]

From there you can do the following:

Online Search

Search for data sources on the web – at the moment all results seemed constrained to Wikipedia pages.
[Image: Online Search results pane]

 

Get External Data

These options provide the ability to connect and import data from a number of sources. A query is the unit of work when getting external data – a query contains one or more steps.
[Image: Get External Data options]
    • From Web: Once you provide a URL, it allows you to import data from a webpage in tabular format. I was able to easily obtain data from quite a few pages (e.g. Zip Codes with the Highest Median Household Income in the United States), although I imagine there are constraints on how the data is laid out in the page for Data Explorer to work.
    • From File: quite a few options to load data from files in different formats, including XML and Excel.
    • From Database: a predefined list of major relational database types. I found some major players missing from the list such as Sybase and Netezza – I would imagine support for additional providers will be added later on.
    • From Other Sources: several interesting options here such as SharePoint lists, OData and HDFS

[Image: From Other Sources options]
Notice that each time you use one of these options a new query is created, and then you are given the opportunity to transform, manipulate and shape the data. What is even more powerful is that each action is recorded in the form of a ‘step’ that gets replayed each time the query is refreshed. Query steps are expressed via formulas that are based on a new set of functions. See the “Data Explorer” formula reference for more details.
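To give an idea of what these step formulas look like, here is a minimal, made-up query with two steps; the table and its values are invented purely for illustration:

let
    // Step 1: start from a small table defined as literal values
    Source = Table.FromRecords({[Product = "A", Sales = 10], [Product = "B", Sales = 25]}),
    // Step 2: each subsequent step is a formula that operates on the previous step
    FilteredRows = Table.SelectRows(Source, each [Sales] > 15)
in
    FilteredRows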

Combine Data

Once data has been imported you can combine it.
    • Merge: Allows you to bring columns from two different queries together – similar to a JOIN in the SQL relational world.
    • Append: Creates a new query with all records of a first query followed by the rows of a second one – similar to what a UNION ALL does in the SQL relational world (see the sketch below for both operations).
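As a rough illustration of what these two actions translate to in the formula language (the tables and column names here are made up, and the formulas generated by the preview may differ slightly):

let
    Orders = Table.FromRecords({[CustomerID = 1, Amount = 100], [CustomerID = 2, Amount = 50]}),
    Customers = Table.FromRecords({[CustomerID = 1, Name = "Contoso"], [CustomerID = 2, Name = "Fabrikam"]}),
    // Merge: bring the Customers columns into Orders - similar to a JOIN
    Merged = Table.NestedJoin(Orders, {"CustomerID"}, Customers, {"CustomerID"}, "Customer"),
    MergedExpanded = Table.ExpandTableColumn(Merged, "Customer", {"Name"}),
    // Append: stack the rows of two queries with the same shape - similar to a UNION ALL
    // (here the same table is appended to itself, just for illustration)
    Appended = Table.Combine({Orders, Orders})
in
    [MergeResult = MergedExpanded, AppendResult = Appended]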

I created a quick sample workbook that gets data from the Northwind OData feed and another external web page (top Zip Codes with the Highest Median Household Income in the United States). It is not the most meaningful scenario, but it will let you get the general idea of how this stuff works. You can see a copy of this sample here:


What’s next?

You can download Data Explorer and start learning about it:
Microsoft "Data Explorer" Preview for Excel
Data Explorer Help
Microsoft "Data Explorer" Preview for Excel blog

Summary

Data Explorer could play a key role in Microsoft’s self-service BI platform, as it reduces the friction typical BI users experience when searching, collecting and connecting to source data. However, as a component of a larger system, it will be interesting to see what steps Microsoft will take. A few things come to mind:
  • What would the “sharing” story be?
  • What would the progression/graduation from a self-service to an IT-managed solution look like? Perhaps a Data Explorer to SSIS path – I am sure many of us would like that
  • Would we get tighter integration with the other Excel “Powers” (PowerPivot & Power View)?
  • Will it integrate with Data Quality Services and Master Data Services?
  • Will IT be given tools that allow it to have some sort of oversight?