More stupid Excel tricks: a secret power of IF

The hardest part about working with data, especially in large sets, is the people who input it in the first place. The reason they make it so difficult is because they’re inconsistent, not only in their day-to-day habits, but between one or more different people all entering info into the same database.

When you’re creating something solely for yourself, then by all means be as inconsistent or idiosyncratic as you want. But if it’s a group project creating information that someone like me is going to have to derive useful information from at some point in the future, inconsistency can make my job infinitely more difficult.

This is the reason why things like style guides were created — and they don’t just exist for the written word. Accounting and data management have their own style guides. So does computer programming, although that field has the advantage, because the program itself won’t let you get it wrong. Excel is the same way, although it won’t always tell you how to make it right.

Little things can cause problems and cost a business money. Sally may prefer to spell out words in addresses, like Avenue or Boulevard, while Steve likes to abbreviate with Ave or Blvd. Sam is also big on abbreviations, but always with periods. Seems innocuous, doesn’t it?

It does until the only way to make sure that a massive mailing doesn’t go to the same household at the same address twice is to compare the addresses to each other. That’s because, to a computer, 1234 Main Street, 1234 Main St, and 1234 Main St. are all completely different addresses. There’s no easy way to fix for that because computers don’t have a “kinda sorta look the same” function.

Garbage in, garbage out

It’s also important that a database be designed properly. For example, names should always be entered as separate units — title/prefix, first name, middle name, last name, suffix. They can be combined later when necessary. A lot of good databases do this, but it’s completely worthless if somebody enters the first and middle names in the first name field or adds the suffix to the last name. You may have heard the expression “garbage in, garbage out,” and this is a prime example of that. All of the right fields were there, but if used improperly, it doesn’t matter.

Of course, the proper fields aren’t always included. One example I’ve had to wrestle with recently is a database showing the various insurance policies people have had with the agency. Now, that is useful and necessary information, as well as something that legally needs to be maintained. And it’s all right that a person gets one row of data for each policy that they’ve had. Some people will have one or two rows, others might have a dozen or more.

So what’s the problem? This: There are no data flags to indicate “this is the policy currently in effect.” This is doubly complicated since it’s Medicare related health insurance, so someone can have up to two active policies at a time, one covering prescription medications and the other a Medicare supplement. Or a policy may have expired after they decide to drop an MAPD and go back to “original” Medicare but the only way to know that is to look for an ending or termination date — if it was ever entered.

The secret power of “IF”

This is where one of my “stupid Excel tricks” came into it. You may or may not be familiar with some of the numeric functions dealing with columns or rows of numbers, but they basically operate on a whole range. They include functions like SUM, MAX, MIN, and AVG. The usual usage is to apply them to a defined range or series of cells and they have no operators, so you get things like:

=SUM([Range])
=MAX([Range])
=MIN([Cell1],[Cell2],[Cell3],...[Cellx])

Here’s the fun trick, though. If you add one or more “IF” statements within any of these functions, you can perform the operation on a sub-range of data defined by certain criteria. In the example I’m giving, it would look at all of the insurance effective dates for one person and determine the most recent one, which is usually a good indicator of which policy is in effect.

Generally, each item you’re evaluating is in the form of [DataRange]=[CellValue], or in actual Excel terminology, it might look like “$A$1:$A$470=A12” for the version entered in row 12. After the criteria ranges, you enter the range that you want to perform the operation on, close out the parenthesis, then enter.

So let’s say that we have last name in column B, first name in column D, and the dates we want to look at to find the latest are in column N. Our formula would look like this, assuming that the first row has the field headers and the data starts in row two:

=MAX(IF($B$1:$B$525=B2,IF($D$1:$D:$D$525=D2,$N$1:$N$525))

If you’ve entered it right, the formula should be displaying the right number. In effect, you’ll have created a column down the far right side in which the value opposite any particular person’s name equals the maximum date value, meaning the latest. Then you can do an advanced filter (oh, google it!) to pull out just the unique name data and date, then use that to do an INDEX and MATCH to create a dataset of just the most recent plan and effective date. (I covered those two functions in a previous post.)

Or… the original database administrator could have just put those current plan flags in the data in first place, set them to automatically update whenever a newer plan of the same type was added, and voilà! Every step since I wrote “This is where one of my “stupid Excel tricks came into it” 396 words ago would have been unnecessary. Time and money saved and problem solved because there was never a problem in the first place.

The art of improv in Excel

On the other hand… solving these ridiculous problems of making large but inconsistent datasets consistent with as little need as possible to look at every individual record just lets me show off my ninja skills with Excel.

It’s really no different than improv. Badly entered data keeps throwing surprises at me, and I have to keep coming up with new and improved ways to ferret out and fix that bad data. In improv, this is a good thing, and one of our mottos is, “Get yourself in trouble,” because that creates comedy gold as things in the scene either get irredeemably worse or are suddenly resolved.

In real life, not so much. It’s a pain in the ass to have to fix the curveballs tossed at us by other people’s laziness and lack of diligence — unless we approach it like a game and an interesting challenge. Then, real life becomes improv again in the best sense.

And I’ll find it forever amusing that the same rules can apply to both a spontaneous, unplanned, free-wheeling art form, and an un-wielding, rigid and unforgiving computer program. They both have their rules. Only the latter won’t allow them to be bent. Okay, some improv games have rules that are not supposed to be bent. But half the fun is in bending those rules, intentionally or not.

With Excel and data-bashing, all of the fun is in following Excel’s rules, but getting them to do things they were never intended to.

Image source: Author, sample output from a completely randomized database generator in Excel used to create completely artificial test data for practicing functions and formulae without compromising anyone’s privacy. Maybe I’ll write about this one some day, if there’s interest.

Stupid Excel tricks #1: INDEX and MATCH

Enter the matrix… math

There is an entire class of functions in Excel that take things to a whole new level, and they are called matrices. Maybe you ran across this in math in school and have forgotten, maybe not, but the idea with a matrix is that it takes one grid of numbers of X x Y dimensions and uses operators to manipulate it using another grid of numbers that may or may not have the same dimensions.

The great part is that to use these functions in Excel, you don’t need to know any of that. Like I’ve mentioned before, it’s exactly like using a cookbook. Plug in the ingredients as specified, voila, the dish pops out the other end.

Maybe you’ve used the functions VLOOKUP and HLOOKUP, or maybe not, but they can be useful if you want to match exactly one criteria in a table and if the data you’re looking up is somewhere to the right of that criteria. So it’s perfect if you have something like a unique account number on the far left and want to use that to look up a name or phone number to the right of it:

=VLOOKUP(M2,$A1:$L556,6,FALSE)

This tells Excel to take the value in cell M2, compare it to all of the values in column A of the named range, then look up the value in the sixth column counting from the column defined in the second variable (in this case, F) where the first column is equal to M2. “FALSE” just means to use an exact match, whereas “TRUE” would mean to use an approximate match.

Again, this is great if you’re searching something with unique values in both places — there is only one account number, and only one data point associated with it.

Now what if you have multiple entries for the same person with different account numbers, or multiple sizes and colors of a product with differing prices, or you need to search on more than one data point in different columns, or your table was set up with the criteria you want to use somewhere to the right of the data points you’re searching?

Welcome, matrix functions! These are two nested commands that work miracles together. The first is INDEX, and what it basically does is point to a column with data that you’re going to pull stuff from, then follow that up with the criteria you’re going to use to do that. You can see the difference from the LOOKUP functions right off the bat, because those start with the single data point you’re going to use to search the data. The INDEX function starts with the place you’re going to get the answer from.

The MATCH function is the matrix math, and it allows you to specify multiple criteria matched to different columns in the source data. The nice part about it is that you can have as many different criteria as you need — first name, last name, account number; size, gender, color, style; title, author, binding, edition; and so on. And each of these can point to any particular bit of data you need — monthly cost, price, location, phone number, address, and so on. Any bit of data in the table can be found this way.

If you want to put a physical analogy on it, it’s this. LOOKUP functions are a librarian with a sliding ladder that moves horizontally or can be climbed vertically. But the way it works is that they first move it or climb it in the direction you specify until it hits the target word. Then, it slides or climbs the other direction however many rows or columns you specified, and has now targeted exactly one cell with the answer. Oh — and it can only move to the right or down from that initial search cell.

On the other hand, think of INDEX and MATCH as a whole bunch of librarians who have set out all over the same bookcases, but are simultaneously searching the rows and columns, and calling back and forth to each other to indicate what bits they’ve found that match.

If you work with any kind of inventory or any data sets where people’s info is broken down (as it should be) into separate first and last names and account identifiers, then you need to know these functions, because they will save you a ton of time. And the basic way they work is like this:

INDEX($E1:$E1405,MATCH(1,(W2=$C$1:$C$1405)*(X2=$D$1:$D$1405)*(AA2=$J1:$J1405),0))

(Note: All column and row designations here are arbitrary and made up, so they don’t matter.)

That might look complicated, but it’s not. Let’s break it down. The first part, referring to the E column is the “Where” of the formula. That is, this is the column you’re pulling your data from. For example, if you want to use size, color, and style to find price, then this would be whatever column has the price data in it.

Next, we nest the MATCH function, and this lets INDEX know that what comes next will be the instructions it needs. The “1,” inside the parenthesis is a flag for MATCH, telling it to return one value. After that, each nested thing — and you can have as many as you need — follows the form “Single cell to look at equals column to search.” So, as seen here, for example, in the search data, column W might be the first name, and cell W2 is the cell corresponding what we’re looking at. Meanwhile, column C in the target data might also include first names, so what we’re saying is “Look for the single value of W2 down the entire column of C1 to C1405. The dollar signs are there to lock it as a fixed range.

All of the other parentheticals here follow the same pattern. Maybe X is the column for last name in the source and D is where the last names are in the target; and AA is account number, as is J.

The two other interesting things to note in building matrix equations: The single cell and the column are joined by an equals sign, not a comma, and this is important because, without it, your formula will break. What this tells Excel is that whatever the matrix pulls out of single cell must equal what’s in the column at that point.

The other thing to notice is that between the searches within parentheses, there aren’t commas, but rather asterisks, *, which indicate multiplication, and this is the heart of Matrix math.

What this tells the formula is to take the results of the first thingie, apply those criteria and pass it along to the second. In other words, if the first evaluation turned up nothing, that is mathematically a zero, and so it would quash anything coming from the second and third functions. On the other hand, if it comes up as a one, then whatever the second formula turns up will stay if there’s a one, dump if not, and then pass on to the third, fourth, etc..

Lather, rinse, repeat, for as many steps down the process you’ve created. A false at any point in the matrix math will kill it and result in nil. Meanwhile, as long as the tests keep turning up positives, what will fall out of the ass end of it is the honest legit “This data is the true data.”

Funny how that works, isn’t it? The only other trick you need to remember is that after you’ve entered this formula, you need to close it out by hitting Ctrl-Shift-Enter to let Excel know it’s a matrix formula. Then, if you want to copy it, you can’t use the usual Ctrl-C, Ctrl-V. Instead, you have to highlight the column with the formula at the top, then hit Ctrl-D. Voila… the whole thing repeats down the column.

And there you have it — a way to search multiple criteria in a row in order to find a specific data point in a table. You’re welcome.

Don’t make it rocket science when it’s not

So many tools

It never ceases to boggle my mind when people don’t jump on the chance to learn and fully take advantage of the amazing modern tools we’ve been handed and which are ubiquitous. If you work in any kind of office environment at all, whether it’s some stodgy traditional business or a bleeding-edge industry like tech or gaming, at the very least you’re dealing with either Microsoft’s Word, Excel, Outlook, etc., or the Apple equivalents.

If you’re using the Open Office or Chrome/Cloud versions, then this piece probably isn’t directed at you because you definitely get it. But, otherwise… really, people? These are literally the things that you use every day, and yet I constantly see very few people ever progressing beyond the merest basic ability to use any of the programs.

That is: Open document, type shit with defaults, save or send as-is.

If I open a spreadsheet you’ve worked on in an older version of Excel and see three tabs at the bottom named Sheets 1, 2, and 3, I will know that you’re an amateur. Likewise if the font is set to that hideous Calibri. Same thing in Word minus the tabs, but same crappy font, ragged aligned left, with auto-spacing before paragraphs or lines.

Word to the wise, people. The first thing you should do in Word is go in and set your default formatting so that the autospace before lines or paragraphs is 0, and line spacing is single.

Why is paper still a thing?

But this is just an intro to some recent heinous, and it’s this. I’ve managed to stumble into a situation where a lot of coworkers prefer to do things on paper, and it makes me nuts. Simple question: Why? Physical files can only be in one place, usually aren’t in the place where they’re supposed to be, and there isn’t a magic search function that can find them other than somebody maybe remembering that they worked on it recently, and where they put it. There’s also no standardization of fonts, so if someone scribbles a note in that file, there’s no guarantee that someone else will be able to read it six months later.

Not to mention that it’s just wasteful. Especially wasteful when there are so many ways to avoid it and so many resources to make that easier.

Case in point: One of the things I do regularly is enter and reconcile commission statements from various vendors, but I’ve had to do it by printing the things, manually entering the data into a spreadsheet, and then doing a careful audit to fix the inevitable errors, since some of these run to hundreds of entries.

But then I figured out how to pull the data directly from the statement, slap it into Excel, format it, and then use a few formulae to pull the new info into the old spreadsheet. The great advantages are that it uses the original data directly, so there are no entry errors to deal with. Also, the second pass just involves pulling out a copy of the original statement data and the target input by formula data, putting them side-by-side, using a few more formulae to spot errors due to differences in how names were spelled, making a few tweaks, and reconciling the thing a lot faster than before.

Pre-paperless innovation, a big statement could take me a few days (interspersed among all the other office duties) to finally balance it to zero. New method? I made it through four statements in one day, each one entered and balanced in two steps instead of about six.

The thing is, this isn’t really all that difficult, and anybody could learn to do it. One of the big helps in this process were the Excel functions INDEX and MATCH (which I’ll explain in a future post), and it took all of a two minute Google search and then reading the first good link to figure out how they worked in order to figure out how to do what I needed to do. What I needed to do: Compare the client’s first and last names and insurance plan type in one table in order to pull out a specific number from another. And this is literally all you need to do to learn how to make your office tools work for you.

Try it. Google “change the default font in Word,” or “turn off auto-correct in Word,” or “alternatives to VLOOKUP in Excel,” or any one of a number of other topics, and you’ll find the answers. It really isn’t any more complicated than reading a cookbook and making food from a recipe. Really, it’s not.

Using computers made easy

There is too much of an aura of mystery put around computers, but trust me, they are more simple than you think — and I’ve been working with them since… well, since most of my life, because I was just born at the right time. All that they ultimately understand are “Off” and “On.” “Zero” and “One.” Those are the only two states a switch can be in, that is what digital computing is, and it only gets two digits.

Maybe someday I’ll write a bit about how the electrons inside do what they do and turn it into intelligible information for humans, but for now suffice it to say that they pretty much only do a few things — input, store, and retrieve data through various devices; allow you to manipulate that data with various software programs; then allow you to re-store and output that data, again through various devices.

The nice thing about graphical user interfaces (GUIs) like Windows, OS, Android, Linux, etc., is that they tend to standardize across programs written in them, so that every program tends to use the same convention for the basics: Open, Close, Save, Save As, Print. Programs of the same type will also follow the same conventions — Format, Spellcheck, View, Layout, etc., for text editors; Image, Layer, Select, Filter, Effect, etc., for graphic design programs; Inset, Formulas, Calculate, Data, Sort, etc., for spreadsheets.

Finally, almost every program will have a Help function, whether it’s invoked via the F1 key, or by some combination of a control/alt/Apple/shift-click plus H move. Help menus, when well-done are great and, guess what? They were basically the hyperlinked documents we’ve all come to know and love via the internet, except that they’ve been around since long before the internet. Most of the time, they’ll answer the question but, if they don’t, you can always google it, as I mentioned above.

How to create job security

You may be wondering, “Okay, if my job is just doing data entry, or writing emails, or accounting, or… etc., why do I need to know so much about the software when no one else does?”

Simple. As the economy moves more and more toward service, knowledge becomes value. If you’re the one in the office who gets a reputation as the computer expert, you will get noticed, and you will save a higher-up’s cookies more than once. You’ll also earn the attention and gratitude of your co-workers if you become the one they come to when “I did something and something happened and I don’t know how to fix it,” and you know immediately upon looking that they accidentally, say, set Word to Web Layout instead of Print Layout. It’s called creating job security by taking that extra simple step that too many people refuse to. Try it!

Image Source: NASA, Apollo 11.