August 12, 2013

Power Query: Creating a Function to Correct and and Verify Addresses via External API

One of the features I found very helpful in Power Query for Excel (former codename Data Explorer)is the ability to interact with external APIs as this opens the door to many interesting use cases. In this post, I want to show how we can use Melissa Data Address Check - Verify, Correct, Geocode US and Canadian Addresses service available in Windows Azure Marketplace to create a Power Query parameterized function that can be later referenced and re-used by other queries. This address cleansing service offers up to 1000 records per month for free, but you would have to pay a monthly fee beyond that point. Although the the terms functions and APIs may intimidate some of us, you will find that the Power Query makes the process rather simple, at least in this case.

Before you start
In order to complete the steps in this post you will need:

Getting familiar with the API

We will start by spending some time on getting familiar with the service we want to use. In this case, the details page of Melissa Data service we want to use has the basic information we need to get started: the Service root URL and Input Parameters

Using the API within Power Query

Power Query for Excel has many options hen it comes to source data (e.g. From Web, From Database, From Other sources, etc.) so it took me a little of guessing as to which option to use in order to get send and receive data to this API. It turns out that using From Odata Feed and entering the Service root URL does the trick.

Upon clicking OK, we notice that Power Query detects the SuggestedAddresses function exposed by the API, and provide us with an Invoke option within the Query editor.

Let’s go ahead and use the Invoke button. You will get a new dialog prompting you for the input parameters. As we saw earlier, the services requires an address value, the maximum numbers of suggested address – this is in case the service has multiple suggested address – and the minimal level of confidence – a value from 0 to 1, representing the level of confidence of the service that the cleansing results are accurate. Let’s go ahead and enter some values to test it.

As you can see, the results returned by the API are nicely displayed in a tabular format inside of the preview pane of the query editor, ad if you click done, the results are place in an Excel table, where they can be easily refreshed and further combined with other data sets. Quite easy!

But, do you see a caveat here? so far, we just were able to call the Service API for a single address value that we have to type in the invoke function dialog, which won't work if we have to clean 100’s or perhaps thousands of addresses at the time. So, how could we invoke this function for each record in let’s say, another data set?

Invoking a function for each record in a table

First, let’s revisit the function query we just created to remove the InvokeSuggestedAddresses step and rename the query so we give it a more friendlier name.This should leave our function ready to be invoked from other queries.

Now, let’s suppose we have an Excel table with multiple address records that we want to send to Melissa Data service for cleansing and validation. In this case, we will create a new Power Query using the From Table option as shown below.

When the query editor opens, select the column containing the address to be cleaned, in our case the address column, and right-click. Then select the Insert Column --> Custom option. Once the Insert Custom Column dialog opens, enter the following expression to invoke the function we just created. Click Ok.


As you can see, all we are doing is invoking fCleanAddress function and passing the 3 required parameters. The address parameter will be given by the Address column available in the data set – That’s the trick! MaximumSuggestions and MinimumConfidence are hardcoded to 1 and 0 respectively - You can change that if you want, or read those values from your datset.


You should see now a new column called custom (feel free to rename it) and the work Table as it values. This means the results in the new column need to be further expanded in order to be able to see the detailed results from each call to Melissa Data API.

You can expand the custom column fields by clicking in the double arrow icon next to the column name. At this point you are presented with all available fields and have the ability to select/unselect the desired fields.
Once you are happy with the Selection of fields, click OK and then Done to get the query results in an excel table

Taking it further

The example above, uses an static Excel table as the starting point, but you can use this same approach to invoke the function and get API results for records coming from other types of sources that are more dynamic in nature such as queries From Web or From Database. by the way, I would not expect this type of solution to have split of a second response times as in this case data goes to the internet and back and we have no control over the API layer.

Preview version warning – the information in this post is current for July 2013 Power Query preview (Version: 1.5.3296.1161)  and it may no be subject to change for RTM version.

No comments:

Post a Comment

I will love to hear your feedback and comments. Thanks.