S Anand

Search:

S Anand

Debugging in Excel is another time consuming task. 80% of the trouble is identifying the problem (Error? What error? Where?) as opposed to fixing it (Why's THAT cell showing THAT?). Most of my time is spent chasing three kinds of errors: wrong reference (leading to a #N/A or #REF!), wrong data input, or wrong formula.

Wrong references are easy to spot. You'll see a #N/A or a #REF! sticking out. But on large sheets, even that's tough to spot. I always have a SUM (or some kind of total) function that covers EVERY cell in EVERY table, even if I don't need that information. If ther SUM shows a #N/A or #REF!, I can use Trace Error (Alt-T-U-E) to see where the problem is coming from.

If you know there's a wrong reference in a cell (say A1), and want to ignore it, use a new cell with the formula =IF(ISERROR(A1),0,A1). You can substitute an entire row, column or table this way.

Wrong data input is best avoided upfront. Before I hand my Excel sheets over for large scale data entry, I do three things:

Let them enter data only in input areas. Unprotect the cells the user can enter data in (Ctrl-1, Protection, remove the tick against 'Locked'), and protect the whole sheet (Alt-T-P-P).
Validate the data. Turn on data validation (Alt-D-L) on all editable cells, and specify the validation criteria.
Make it easy to spot errors. If there are percentages that should add to 100%, show the total in a cell that turns green if the total is 100%, and red otherwise. (Use conditional formatting - Alt-O-D). If data about 20 people must be entered, show the number of people about whom data is entered, and mark it red until it's 20. Make sure all criteria are captured. When the spreadsheet is filled, it should be impossible to make errors.

Wrong formulae (like + instead of -) are tough to spot. The best way to check for this is to do the same calculation in a different way, and compare this with the result. Whenever I create complex tables, I always have an error row at the bottom. I compare the totals on the table with the totals calculated in a simpler way and check the difference. This warns me when I miss out some elements, or double count something, in the table.

. A panorama of Paris.

? Hardy tries to get one -- and succeeds. But... is it worth it? Pretty good economic analysis.

: simple instructions on how to get a web site up and running.

Convert . This is the only way I know of converting something from PDF into another format. Unfortunately, it doesn't work for GoogleMail in the UK.

is Sir Tim Berners-Lee's blog. The pay homage to the father of the WWW. via LazyGeek

, a video for IBM zooming 10 times every 10 seconds through the universe.

A corollary of Rule 3: Never type the same formula twice. Design the formula so that if you cut and paste it elsewhere, it works correctly. The $ symbol and the F4 key for cell references help in 90% of the cases. For complex requirements and large data, 5 functions come in handy: INDIRECT, OFFSET, ADDRESS, ROW and COLUMN.

I once did a survey, and had data spread across 300 sheets (same format on all sheets). I needed cell D3 across all sheets in a column, to summarise the results. The image explains what I did. INDIRECT returns the value of a cell. INDIRECT("Sheet2!D3") is the value of cell D3 in Sheet2. And INDIRECT(CONCATENATE(A2,"!D3")) will give you the value of cell D3 in whatever sheet A2 specifies! I created a list of sheet names in column A, and column B had "D3" in each of those sheets. In effect, INDIRECT can transpose sheets into columns. (Getting a list of sheet names on to a column is tough, however.)

If you need multiple cells from a sheet, say D3:Z3, use the ADDRESS, ROW and COLUMN functions. ROW(D3) returns 3, and COLUMN(D3) returns 4 -- the respective row and column. If you copy ROW(D3) to multiple rows, you will see ROW(D3), ROW(D4), ROW(D5), ... which are 3, 4, 5 respectively. Similarly for COLUMN. It's a useful way of linking values to position.

ADDRESS does the opposite. ADDRESS(ROW(D3),COLUMN(D3)) = ADDRESS(3,4) = "$D$3". ADDRESS(3,4,1,1,"Sheet2") returns "Sheet2!$D$3". (See help for the ,1,1 in the middle, and just put it in always.) To cells D3:Z3 from all the sheets, copy the formula INDIRECT(ADDRESS(3,COLUMN(D3),1,1,$A2)) to the entire range. The INDIRECT, ADDRESS, ROW/COLUMN combination can slice contiguous data across sheets in any way you want.

Another useful function is OFFSET. OFFSET(D3,2,1) returns the value in cell E5. It shifts the reference D3 down by 2 rows and right by 1 column. OFFSET can be used instead of the INDIRECT and ADDRESS when multiple sheets are not involved. OFFSET can also return a range. OFFSET(D3,0,0,2,2) returns the range D3:E4, which is the 2x2 range starting from 0,0. So SUM(OFFSET(D3,0,0,2,2)) is the same as SUM(D3:E4). With OFFSET, you can specify a range with variable position and variable size (which you can't with $ references).

Once, we were modelling a leasing company's accounts. (Warning: this is a complex example.) We knew the volume of loans they would disburse over the next 3 years. The monthly interest rate is, say, 1%. What would be their interest income every month? Well, it's not just 10% of what they've lent out. Customers pay back in equal monthly installments (EMIs). The EMI includes the principal and the interest. Initially, the EMI has a large interest component and very little principal, because there's a lot left to repay. Towards the end, the balance dies down and so does the interest; it's mainly the balance principal that's being repaid. The interest income is not the same every month even for a single lease.

The calculation is conceptually simple. The IPMT function tells you the monthly interest each month. Let's say all leases are for 36 months. So, to calculate the March interest income, take the January disbursals and multiply it by the third month interest component: IPMT(1%,3,36,-1). Take the Feb disbursals and multiply it by IPMT(1%,2,36,-1). Take the March disbursals and multiply it by IPMT(1%,1,36,-1). And add them up. For April, you'd add 4 terms. And so on. Mathematicians call this a convolution. It's like a SUMPRODUCT of a series with another series in reverse.

Cell E4 on the image alongside does exactly that for month 3 (March). There are 5 columns:
A: Month
B: Amount disbursed that month
C: Months in reverse
D: Interest component for month in reverse
E: Interest income for month
E4 is the sumproduct of B2 to B4 (the Jan, Feb and March disbursals) with something else -- an OFFSET. The offset says, from D1, move down C4 (34) rows and select A4 (3) cells further down. This has the interest components for the first, second and third months in reverse. So, the disbursal for Jan is multiplied with the 3rd month's interest, Feb with the 2nd month's interest, and Mar with the 1st month's interest. That's exactly what we wanted.

It may look complex. But remember: you have to type this complex formula only once, not 36 times. (And in my case, I had 18 product types.) Also, you're less likely to make typing errors when cutting and pasting. So this saves you debugging time as well.

Hilarious anecdotes from an American .

. This one is very true. And I used to think computers just started working better when I was around...

and the . With the home page API, Google Base, and Blogger, looks like Google has bypassed conventional web hosting and become a Web 2.0 host.

Quote by :

What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.

It sounds quite like the Scientific American article which says that after a point, more choice causes unhappiness. The satisfaction of picking the best choice is less, because the second best is almost as good. And you're more likely to not pick the best -- because there are so many choices -- and will regret it more often.

Research suggests that .

When they showed the chimpanzees how to retrieve the food, the researchers added some unnecessary steps. Those chimps could see that the scientists were wasting their time sliding the bolt and tapping the top. None followed suit. They all went straight for the door. The children could see just as easily as the chimps that it was pointless to slide open the bolt or tap on top of the box. Yet 80 percent did so anyway.

... humans are hard-wired to learn by imitation, even when that is clearly not the best way to learn. As human ancestors began to make complicated tools, figuring out goals might not have been good enough anymore. Hominids needed a way to register automatically what other hominids did, even if they didn't understand the intentions behind them. They needed to imitate. Not long ago, many psychologists thought that imitation was a simple, primitive action compared with figuring out the intentions of others. But that is changing.

An article on NYT about : can this man reprogram Microsoft?

Rule #3: Avoid manual labour (continued)

Reconciling data is where I spend most of my time on Excel. Say you have a list of branches by city from 2 banks. You want to know where both banks have branches. Excel doesn't know that Kolkata is Calcutta. There are 500 cities, and you have 30 minutes.

Use VLOOKUP for a start. If Bank A's cities are in column A (say 2-500) and Bank B's cities in column B (say 2-400), in C2 type VLOOKUP(A2, B$2:B$400, 1, 0) (read Excel help -- all I'll say is, don't miss out the 0 at the end: otherwise you get approximate match, and that's not good). Copy the formula to down to C500. Similarly, in D2 type VLOOKUP(B2, A$2:A$500, 1, 0). Copy the formula down to D400.

You'll see the #N/A where there's no match. #N/A in Column C means Bank A has a branch here, but Bank B does not. Column D has the converse. But we're not done yet. There could be spelling mistakes. Using two VLOOKUPs simplifies that problem considerably. We just need to match the cities having #N/A on both lists to check for alternate spellings of cities -- which is a lot less work! So prepare a separate list: unmatched cities from Bank A, and unmatched cities from Bank B. (See the section on to simplify this.)

Sort both the lists while remembering the original order. You'll want to remember the original order often -- so just add a column, number it sequentially (1,2,3... use Alt-E-I-S), and sort the city names along with the numbers. When you want to get back the original order, just sort by the numbers again. To avoid distraction, you can move or hide these numbers. Now, you have a sorted list of unmatched cities. Notice that you can visually match many of these cities. There's nothing easier to search (visually) than a sorted list.

Finally, when you've mapped all your columns, the ones that are remaining are the ones where there is no overlap.

How I (lazily) listen to music

I have a large MP3 collection (Tamil and Hindi films). I don't like selecting songs to listen to. Too much effort.

I rated all songs I had listened to (650 songs x 5-10 seconds = 1-2 hrs) and created 7 SmartViews. I just go to one of these and play them in order. Here are my views, in descending order of their use.

Most played. Sorted by Play Count. Songs I play the most. Plays stuff I listen to usually.
Not heard recently. Played Last before 3 months ago AND Rating >= 3. Plays good songs I haven't heard recently.
Not played much or recently. Played Last before 1 month ago AND Play Count <= 2 AND Rating >= 3. Plays good songs I haven't heard often enough.
Recent hits. Last updated after 3 months ago AND Play count >= 3. Plays songs recently added and liked.
Recently played. Sorted by Last Updated. Often, I like to listen to songs I listened to yesterday.
Top rated. Sorted by Rating. My best songs. (Suprisingly, I don't use this view much.)
Recently added. Sorted by Played Last. Plays songs I just downloaded.

But WinAmp's not good enough. For example, I can't find out what songs I played at least thrice last month. ? Fortunately, there are a few . I installed , which produces a log file that can be analysed. I did this two weeks ago, and don't have enough data. When I do, I'll modify two views

Not heard much or recently. I'll change this to "Not heard much recently" - Rating >= 3, Play Count > 5, Play Count = 0 last month.
Recent hits. Modify it to show songs played at least thrice last month.

I don't like hunting for new songs either. Too much effort.

External recommendations like help, but not much. I usually like only 1 of the top 10.

I don't really know the recent music directors. But many interesting songs I've heard recently (like Ondra Renda in Kakka Kakka, Vaseegara in Minnale, and Kaadhalikkum in Chellame) are by Harris Jayaraj. So maybe if I can find the music directors I like, other songs by them would be good recommendations.

I have an automated way to find the music director for a movie. First, I spent a few hours renaming my MP3s to a Movie.Song.mp3 filename format (using Excel and Perl liberally). After that, I wrote a Perl program that reads movie names and the movie directors from and matches the Raaga movie names with my movie names. (Raaga has all but 5 movies whose songs I've heard.) Then I rate music directors based on my songs' ratings.

Unfortunately, the matching worked only for 45% of my 273 movies. The rest were spelt differently on my list and Raaga. I checked if there was a way to match Tamil words roughly. The closest was , but Jesse, the author, mailed me saying that was "like slicing your bread with a chainsaw".

Vowel rules
AEdhiri -> Edhiri
kadhal kondEIn -> kadhal kondEn
chellamEY at end-> chellamE
sachIEn -> sachIn
marupadIUm -> marupadIYUm
OI, OY, OVI, OYI are all the same
AAthma -> Athma
azhagiya thEEye -> azhagiya thIye
abOOrva ragam -> abUrva ragam
Ignore H. It is redundant.

Consonant rules
arasakTCHi -> arasakSHi
CHippikkul muthu -> Sippikkul muthu
thenNDRal -> thenNRal
devar maHan -> devar maGan
bagaWathi -> bagaVathi
avvai shanmuGi -> avvai shanmuKi
konJi pesalam -> konCHi pesalam
anDha 7 naatkal -> anTha 7 naatkal
aBoorva sagodharargal -> aPoorva sagodharargal
agni natchaTHIRam -> agni natchaTHRam

So I developed these rules. The -> arrow below is to be read as "is also spelt as". By just applying them sequentially, I matched 33% more movies.

The remaining movies either had spelling mistakes (e.g. Kilipethcu Ketkavaa) or had structural differences (Ilamai Oonjal Aadugiradhu vs Ilamai Oonjal Aadudhu). By permitting approximate matches using , I was able to match 12% more, making my total accuracy ~90%.

Though this is good enough for identifying music directors, I'm working on improving the approximate matching rules. I hope to have 98% accuracy, and then I can match individual songs -- and know who the singers are. Hopefully, this can be extended to other sites like , and who knows -- maybe even .

tries to match Google Maps. But Google Maps is just too fast. As I mentioned earlier (): the reason I like Google is largely speed.

: the story of a server that would not send e-mail beyond 500 miles.

: how to spot fake data.

-- books, movies, etc.

from O'Reilly. Speaking of which, it looks like you can .

Hilarious post on .

Microsoft's IT department has used the .

Will go live today?

shows recent trends in del.icio.us bookmarks.

Firefox 1.5's . A first-player shooter type with textures. Impressive. (You need Firefox 1.5 to view this).

, like a riot, plane crash or car without brakes. Also includes an alien abduction scenario.

. You may prefer it to Virtual Dub.

Home . 2005 . 2004 . 2003 . 2002 . 2001 . 2000 . 1999 . London (2005) . London (2000)