Microsoft Excel Power Query- A Free and Powerful Process Automation Tool

Depending on one’s occupation, the word automation has different meanings. In Finance and Tax, the word automation is used to describe a wide range of data transformation solutions. According to Wikipedia, “Automation is the technology by which a process or procedure is performed with minimum human assistance”.  The Tax Technology definition of automation can be summarized as a process or procedure that is repeatable without the need for repeated alternation. For example, generating a tax return on a tax application is an automated repeatable process because the same forms can generate a tax return for different legal entities and periods by simply changing a couple of dropdown options or parameters, but the form layout and design remain unchanged with every repeated use. By contrast, a legacy MS Excel design is not considered automation because they frequently require manual modification after every data update.

Let’s imagine a typical Excel workbook that has multiple worksheets and is used to produce some type of report. By design, Microsoft Excel is made up of columns and rows.  On a typical report, most columns and some rows are assigned labels or headings. In the intersection of the columns and rows lie the numbers and calculations that make up the report. Furthermore, each cell of an Excel worksheet is assigned a value or a formula that references a value in a different cell that is located on the same worksheet or on a different tab or even a different Excel workbook.

As we know, MS Excel is a very powerful tool.  Because of its capabilities to pull and lookup data in various ways to construct new calculations, it allows the user to easily transform the data into a highly customized report. Because of this power, users quickly find themselves designing highly complex solutions that pull data across columns, worksheets, and even workbooks.  Creating complex Excel reports takes time. For this reason, users often save them for reuse at a future period.  When needed, the older reports are renamed, and their old content gets replaced with the new period’s data.

Reusing older reports with the new data is a time saver.  But how much time is really saved depends on the complexity of the original design. Furthermore, there are several disadvantages to this common practice. Below are the top 3 potential issues with repurposing old reports:

  1. Time-consuming Manual Clean-up: Because the data row count changes between periods, pasting or manually entering the new data to an older report may cause existing formulas to be overridden or some calculation references may generate errors due to missing expected values. In either case, after incorporating the new data in the older reports, they almost always require some type of manual cleanup. If a workbook has references to other cells or worksheets, the cleanup time can be considerably longer and may even be measured in days or longer.
  2. Broken Links: As the same Excel workbooks continue to be reused period after period, over time, they accumulate broken links that point to some outdated data sources that no longer exist. Broken links are the main reason old MS Excels take a long time to load.
  3. Potential For Errors: When (not if) a mistake is introduced during one of the manual cleanups and it goes unnoticed, the same mistake may continue to result in inaccurate calculations for a long time. Many studies have shown close to 80% of any random Finance Department’s Excel workbooks have minor to significant errors.

So, the question is, is there a better way? The good news is, absolutely! As very briefly described below, we can use Microsoft Excel to create Alteryx-like automated solutions that:

  1. Is 100% free!
  2. Will never need manual cleanup.
  3. Will consistently produce accurate output.
  4. Will regenerate updated reports in seconds by simply clicking the “Refresh” button.

 Anatomy of Automated Reports:

To create an automated report, the report must be broken into three layers:

To illustrate this point, once again think of the tax form that was mentioned earlier. Before generating a tax return, the data is loaded to a database (Data Layer). Under the hood of every tax form are long and complex calculations that are applied to the extracted data from the database (Calculation Logic Layer).  Finally, the output of the calculations is used to populate the tax forms (Presentation Layer).

In this scenario the only layer that changes between periods is the data layer when the new data is loaded to the database. Hence, the immediate availability of the new reports as soon as the database is refreshed with the new period’s data.

Automation Using Microsoft Excel:

We can use that same tax application design concept in MS Excel to generate repeatable automated solutions. The FREE and powerful MS Excel add-in features that make automation possible are called:

  1. Power Query – Power Query, like Alteryx, is a data transformation and data preparation engine that comes with a graphical user interface for getting data from multiple sources to streamline the data transformation of any ETL (Extract, Transform, Load) process.

  2. Power Pivot – Power Pivot is another Excel add-in that can be used to create powerful pivot table analysis and sophisticated data models. With Power Pivot, we can mash up large volumes of data from various data sources to rapidly perform information analysis and insights. For example, imagine a pivot table that displays the adjusted amounts from the 1120 returns with the capability to drill into the GL trial balance book amounts and then drill even further into to the sub-ledger invoice detail all on the same pivot report.

Automation Planning:

Before a process can be automated, we need to understand its input as well as what the output needs to look like. This is an important step because automation has some strict requirements.

Automated processes demand predictable input. To ensure success, the input data must meet the following criteria consistently with every automated data feed:

  1. The automated source data must come in the same file format (e.g. PDF, Excel, Txt) or from the same ODBC-connected database.
  2. The data must be available through the same file system path (folder) or the same database connection.
  3. The input must be structured.
    • The position of each column of the input data must be consistent with every data feed.
    • If column headings are used, they must be present with every feed.
    • Random blank rows and columns cannot be included in the automated data feeds.

Please note, the above is not a complete list of automated data requirements. It should also be noted that some exceptions to the above requirements can be mitigated but they often add more complexity that should be avoided, if possible.

Maintenance Requirements:

One of the most important and often overlooked issues with the tax process automation is its maintenance.  After tens or hundreds of thousands of dollars have been invested to hire outside consultants to implement a new automated process, it is painful to see that all that investment only works properly when the solution first rolls out. After the consulting team leaves and the solution is used for the next cycle, the new automated solution can encounter a catastrophic failure by the slightest glitch in the process. And because no one on the tax team is familiar with the new automated design, the big investment can be a total loss. This unfortunate issue can easily be avoided by simply training one or more tax department staff on the maintenance of the new solutions. As mentioned in one of our online blog posts (Why do Tax Automations Fail?), not having knowledgeable staff to maintain automated solutions is one of the main reasons tax departments do not realize the return on their automation investments.