realbasic-nug
[Top] [All Lists]

Re: O(N), impracticality, and obfuscated C

To: REALbasic NUG <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: O(N), impracticality, and obfuscated C
From: William Squires <wsquires at satx dot rr dot com>
Date: Sat, 30 Oct 2004 18:18:35 -0500
Cc:
Delivered-to: realbasic-nug at lists dot realsoftware dot com
References: <20041030170021 dot C764951C667 at lists dot realsoftware dot com> <39D9B205-2A9E-11D9-B034-000A27B1C8AE at elfdata dot com>

On Oct 30, 2004, at 1:05 PM, Theodore H.Smith wrote:

Does anyone know much about the field of string similarity?

I have a distaste for the academia, because:

1) If you look at the pioneers of modern computing, few came from the
academia.

2) Most of what they write is over-complexified, yet impractical.
"O(N)" notations and really bad source code writing skills.

The "O(N)" notation is a simple, expressive way to describe the
asymptotic growth rate of a function.  I'm happy to explain it, if you
like.

No need to explain it, not to me as its simple enough to grasp. To me its like bad C. C is a simple language, correct? But C can be used to write awful things that no human should understand (obfuscated C). To me, O(N) is just like that. Simple, but not nice looking. A step towards obfuscation.

O(N) notation does sense and is useful, its just unnecessarily mathematical... Why not just express it like "linear cost", or "quadratic cost". I suppose when you have functions whose cost really IS an equation, like n*log(n) then it makes sense, but I'd really only use that as a last resort, if there wasn't a better way of expressing the costs, without resorting to obfuscation.

Also, it doesn't mention the length of time that O takes. O(N) for example, may be slower than O(N^2) for low values of N, if O in the first case is high, but O in the second case is low.

  Say what?!?
While it is true that - if O() were a pure mathematical function - O(N) > O(N^2) | 0 < N < 1, but N can only take on integral values; how can you have 1/2 of a data point? There is another factor that many programmers "conveniently" forget; and that's that Big-O notation simplifies certain aspects of the problem reducing the "evaluation" down to the worst-case scenario (usually w/respect to large data sets "N"), so you can have cases where an insertion or bubble-sort does better than a binary tree search, or quicksort, for small values of N. This is why these are just tools, not laws. They can help you evaluate your code (algorithms), but they can't say they're the best for all situations! As an example. Say you have a somewhat small data set (N < 10000). These need to be read into an array for use as a lookup table. For speed, the array needs to be kept in sorted order, but insertions and deletions are comparatively rare events. Some programmers will go get a book on algorithms and see that insertion sorts have a worst case performance of O(N^2), and a binary tree (or quicksort*) has a performance of O(log2(N)) and conclude that you should always use a binary tree or quicksort for such cases. But what they don't read is the fine print; "...the performance of a binary tree degrades to O(N) when the input data is already in sorted order (or is nearly sorted), leading to a highly unbalanced tree structure." An insertion sort, on the other hand, performs much better when the input data is already sorted order (it only has to append the data - during the initial insertion - at the end of the array, without the need to move all the array elements down one slot)! And a simple tweak to the algorithm - using that knowledge - will help you gain O(log2(N)) for searches and O(1) performance for the initial insertions (when you're reading the sorted items from your data file)! You'll still have O(N) performance for "normal" insertions, but - as stated in the problem - insertions and deletions are relatively rare. And why do I mention the bit about the list being in sorted order? Consider the above mentioned binary tree. It works nicely for unsorted data (the 1st time you construct the tree from the unsorted file). But what do you think happens the first time someone goes and inserts (or deletes) an item from the list? That's right; the code marks the data set 'dirty', which causes (or should cause, anyway) the data set to be saved back to the file... in sorted order. Oops!

• IIRC, quicksort has a performance of O(N x log2(N)), a bit worse then O(N), but not as bad as O(N^2)!

Of course when N gets high the speed of the operation becomes irrelevant, but for certain ranges the speed of the operation does make a difference. For example, what if in the first case, O is 1000x slower? And what if the values of N, only can range from 10-30? In this case, the second function is better.

Things like that, make me wonder why use such notations in the first place. The real world can't get summarised into mathematics so neatly, much of the time. I'd only use such notation as a last resort, not in preference to more natural notation.

--
    Theodore H. Smith - Software Developer.
    http://www.elfdata.com

_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://www.realsoftware.com/listarchives/lists.html>


William H Squires Jr
wsquires at satx dot rr dot com dot nospam <- remove the .nospam
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://www.realsoftware.com/listarchives/lists.html>

<Prev in Thread] Current Thread [Next in Thread>