More iffy situations with Excel

As I’ve written about before, getting disparate sets of data from different sources and different formats to play together nicely in Excel can be a challenge. One of the big parts of my job is regularly entering my boss’s commission statements into spreadsheets and making sure that what I end up with balances with what the insurance company told us.

So we have our spreadsheets in one format, insurer data that sometimes is conveniently available in Excel format, but is more often PDF format, which adds a whole layer of stupid in actually trying to get the data into usable form, and any client information comes from our CRM software, which I swear has a UI that was designed for Windows 3.51 and hasn’t been updated since.

You can see the challenge, I assume.

For my non-nerd readers, here’s a relatable example. Somebody gives you a recipe for an entrée that’s a bit complicated. Say it has about 24 ingredients. Now, one third of those ingredients are written in English and give Imperial measurements in cups and ounces and so on. Another third of those ingredients are written in French, and all of the measurements are metric. Finally, the last third are written in Greek, and every measurement is something obsolete or obscure, like a minim, hogshead, or grain.

You have to turn that all into a single recipe in the same language and measurement system — it doesn’t matter which one, it just has to be one of each. And you have to do it automatically, by creating formulae to do it.

That may seem like a lot of work for a recipe with 24 ingredients, and you’d be right. But a lot of the time, I’m dealing with commission statements with hundreds of entries.

Good times!

A recent situation I ran across was a company that was playing a game of “Oops, missed paying you for a few months,” so that one statement had multiple entries for the same client and policy — anywhere from six to nine.

Meanwhile, the tracking spreadsheet is designed for one entry per person per insurance plan. Complicating it, this is a company that deals with both Medicare Supplements and Prescription Drug Coverage, so a client might have one, or the other, or both.

Finally, at least they do it right by breaking the client name into first and last but they don’t include policy number, so there isn’t a single unique identifier. Not that policy number would help, since they use the same one for both the Supplement and Drug Coverage, so you can see how it gets messy fast.

Now, elsewhere, I’ve discussed the joys of INDEX and MATCH, which allow you to do a more elaborate version of VLOOKUP that lets you search on multiple criteria anywhere in the data. It’s basically creating a matrix that will build a hidden array of zeros with a single one in it. Counting across columns and down rows, the position of that single “one” highlights the only cell with the data you want in it.

When you need to total up amounts based on multiple criteria, the stalwart SUMIF won’t do it anymore. But that’s okay, because there’s a matrix version of this function as well, and it’s called SUMIFS.

The format is simple, really. Start with “SUMIFS(.” Then you enter the target range — that is, the column you want to sum. After that, knock yourself out with as many following arguments you need in the pattern “Criteria Range,” “Criteria”, lather, rinse, repeat.

As always, the Criteria Range is the column you want to look at, and the Criteria is the single cell value you want to look for.

Let’s say that we want to use the client’s last name, first name, and plan type to sum up the commissions that were paid. Well… oh, fooled you, because we first have to create a list that only includes single instances of each of these.

Bonus round!

The Data Filter function in Excel is very powerful, and I recommend using it. In the simple version, you can set you header row to be clickable as a filter, so if you suddenly only need to see January dates or clients named Oswald, you can do it in a couple of clicks.

But there’s a more powerful version and it works like this. Say you want to take a list with multiple identical entries in order to turn it into a key for something nifty cool like SUMIFS. Here’s what you do.

Let’s say that the columns you have are Last Name, First Name, Plan, and Commission.

The one you want to sum is Commission

First highlight the data you want to reduce, including all of the columns, then cut and paste it to a new document. Copy only the header row two times to the right, leaving a blank column between the source data and the copies.

Important: delete the header names for any columns that are not unique — but only delete the cell info. Don’t delete the columns, because it breaks the following operation for some reason.

Now highlight that copied data again, except for the Commission column.

Next, smash that Data button in the ribbon, and select filter, all — or hit alt-D, F, A, if you know the value of using the keyboard over the mouse every time.

A menu window will pop up. The first field will be the data you’ve highlighted. In the second, which is “Criteria,” you want to click and then highlight that copy of headers just to the right of your highlighted data.

Next, click the “Copy to Another Location” button, highlight the second copy of the headers in the “Copy to” field, and then check the box next to “Unique records only.”

Click okay, and you now have a list of, well, just what it said on the tin — unique records only. These will be the source for your SUMIFS. You can delete the blank columns between the two if you’d like now, leaving a blank between them.

Add back the “Commission” header to the right of this. You’re going to put your SUMIFS into this column.

Let’s say that you have 100 rows of unfiltered source data, you’re going to be looking at the Last and First Name Fields and Plan Types (A, B, and C) and you’re going to be summing the Commissions, in column D.

The corresponding columns in your criteria set of unique records are F, G, H, and I — last, first, plan, commission.

Here’s what the formula in the first cell (I2) looks like:

SUMIFS($D$1:$D$100, $A$1:$A$100,F2,$B$1:$B$100,G2,$H1$H100,C2).

In plain English, this says, “Look at the column in the original data with all the money. Then, Look at the column with all the last name and find the last name on this row that matches it. But wait! Now also look at the first name column, and match it to this first name row. Then look at the plan column, and the unique name on this row.”

It’s another matrix function, building that map of ones and zeros, so to close it out, instead of hitting enter, you have to hit Shift+Ctrl+Enter. The last step in the process is to sum the results in column I and then match them to the sum of column D. If they’re the same, you’re good to go.

Copy the data from F1:I100 to wherever you’re going to connect it to the final Excel spreadsheet, but then brace yourself for when those numbers don’t match, because they rarely do on the first shot.

But I’ll save my tricks for dealing with that shit for a later post.
Kirlf, (C BY-SA 4.0), via Wikimedia Commons