Suppose you have an array of floating point numbers with each index into the array being an id number corresponding to some external data structure. You want to sort this array, but in doing so you would destroy the references to the id numbers, since the indexes of the array would no longer correspond to the correct id numbers in the external data structure. For example, let’s say we are dealing with customers of a store and each value in the array is their current balance.
balances = {25.61, 13.45, 89.75, 21.2, 96.50}
Each index in the balances array corresponds to an index in some other array, say:
names = {"Marjory Stewart-Baxter", "Hubert Cumberdale", "Barbara Logan-Price", "Jeremy Fisher", "Mable"}
If we were to merely sort the balances array in descending order, the result would be that the balance of $96.50 would no longer correspond to Mable but to Marjory Stewart-Baxter. Marjory hates Mable enough already, we wouldn’t want her having to pay her bills, as well. Of course, an easy solution would be to use a Map with the name of the person as the key and the balance as the value. The problem comes in that maps are slower to access and consume more memory than arrays, so could be less desirable in situations where speed and memory are important.
In Python, it is easy to sort two lists of this kind. Make each value in the list a tuple (balance, name) and then use the built in sort function:
balnames = zip(balances, names) balnames.sort()
However, in Python a list does not an array make. Memory consumption can get messy if you are using complicated structures (especially dictionaries). Even if you’re using only tuples of primitives, you consume more memory than a simple array would. I’ve been doing some Java coding over the past few days and ran into this problem. My solution is an implementation of merge sort that sorts both the array of values and keeps track of the indexes in a separate array. Memory consumption is still not optimal, since merge sort is O(n) in memory and keeping track of the indexes doubles that again.
/**
* Modified merge sort algorithm for sorting an array
* in descending order while keeping track of indexes.
*
* @param indexes array of the original indexes for items in the data array
* @param data array of the values that the array will be sorted according to (doubles)
*/
public static void mergeSort(int [] indexes, double [] data) {
if (indexes.length <= 1)
return; // stopping condition
int split = indexes.length / 2;
int mod = indexes.length % 2;
int [] indexes_l = new int[split];
int [] indexes_r = new int[split + mod];
double [] data_l = new double[split];
double [] data_r = new double[split + mod];
// split the arrays into two parts
for (int i = 0; i < indexes.length; i++) {
if (i < split) {
indexes_l[i] = indexes[i];
data_l[i] = data[i];
} else {
indexes_r[i - split] = indexes[i];
data_r[i - split] = data[i];
}
}
// recursive step
mergeSort(indexes_l, data_l);
mergeSort(indexes_r, data_r);
// now handle merge
int idx_l = indexes_l.length - 1;
int idx_r = indexes_r.length - 1;
while (idx_l >= 0 && idx_r >= 0) {
// add the smaller item to the right of the array
// this produces descending order, in case of tie
// right side goes in first
if (data_r[idx_r] > data_l[idx_l]) {
data[idx_r + idx_l + 1] = data_l[idx_l];
indexes[idx_r + idx_l + 1] = indexes_l[idx_l];
idx_l--;
} else {
data[idx_r + idx_l + 1] = data_r[idx_r];
indexes[idx_r + idx_l + 1] = indexes_r[idx_r];
idx_r--;
}
}
while (idx_l >= 0) {
data[idx_l] = data_l[idx_l];
indexes[idx_l] = indexes_l[idx_l];
idx_l--;
}
while(idx_r >= 0) {
data[idx_r] = data_r[idx_r];
indexes[idx_r] = indexes_r[idx_r];
idx_r--;
}
}
Find any bugs or a better way of doing it? Please let me know.



AWESOME! I just now caught the references in your example array…
Maybe your next code sample can have something to do with nettles?
hehe perhaps. :)
In Java, the best solution is to use a class which is a tuple (i.e. it has member fields that store the various values – a float and a String in your example).
You can then create a Comparator that only looks at one of the values (in your example, its implementation of compare() could just delegate to Float.compare() to compare the balances). For speed you could eliminate function calls and implement compare() using ‘if’ expression. Also for speed, access public fields instead of using accessor methods.
Then an array of that class can easily be sorted using java.util.Arrays.sort() – which uses a slightly modified merge-sort.
If you absolutely cannot create a class to store the various values, you could also wrap the pair (or n-tuple) of arrays in an concrete subclass of java.util.AbstractList (which implements set()) and use java.util.Collections.sort().
The major downside to Collections.sort() is that makes a copy of the list and then sorts. It does this to insure constant random access time (otherwise sorting a LinkedList would be really slow). They really need to add a Collections.sortInPlace() that skips this step – then programmers could choose one or the other based on the concrete implementation of List they are using…
Another solution could be to not even sort the data and use a cross-reference array that is sorted. This would definitely speed up the sort operation but incurs slight overhead (memory and time to create the cross-reference array). But it involves accessing less data. It would have a slight hit on sorted iteration because you have double-indirection to access an element:
———————————————————————————————–
import java.util.Arrays;
import java.util.Comparator;
public class ArrayTupleTest {
public static Integer[] getSortedIndices(final double[] data) {
Integer ret[] = new Integer[data.length];
for (int i = 0; i < data.length; i++)
ret[i] = i;
Arrays.sort(ret, new Comparator() {
public int compare(Integer o1, Integer o2) {
return Double.compare(data[o1], data[o2]);
}
});
return ret;
}
public static void main(String args[]) {
String names[] = {“Marjory Stewart-Baxter”, “Hubert Cumberdale”, “Barbara Logan-Price”, “Jeremy Fisher”, “Mable”};
double balances[] = {25.61, 13.45, 89.75, 21.2, 96.50};
Integer order[] = getSortedIndices(balances);
for (int index : order) {
System.out.println(“Balance for “+names[index]+” is “+balances[index]);
}
}
}
I guess I forgot to use HTML tags to preserve indention in my code sample…
BTW, I also just realized the name reference. Demented…
Hey – I also meant to ask: what plugin are you using to do the code formatting stuff?
Another (final) note: merge sort is also used by java collections framework, despite the memory overhead, because it has guaranteed runtime performance. In-place O(n log n) sort algorithms, like quick-sort, can have worst-case run-time of O(n^2).
I saw my face up there four times in a row. And despite suggesting my previous comment would be my last, I couldn’t help myself. Now there are five of me!
:)
Yeah I chose merge sort for the better worst case runtime, even though in most cases quicksort runs faster. Also, I haven’t coded quicksort since my algorithms class three years ago, whereas merge sort is something I’ve had to do a few times for various reasons.
I like your idea to just sort the indexes. The upside is that I would create half the arrays. I think the downside would be the function call overhead that would be incurred at each compare, which would be a fair bit more than just a compare of two primitives (a > b). It’d be worth doing some testing here to see how java handles it.
Oh yeah, I use the built-in plugin that wordpress.com offers (i’m not running the wordpress software myself, using the hosted version). There are some problems with it, like if I want to edit this post, I have to recopy and paste the code with
to hardcode the spaces. Kind of annoying, but definitely better than editing the html myself to make the color codes. Also, some of my symbols are screwed up like < is<and other craziness.http://faq.wordpress.com/2007/09/03/how-do-i-post-source-code/
Also, a minor correction in your code, the line
Arrays.sort(ret, new Comparator() {needs to be
Arrays.sort(ret, new Comparator<Integer>() {at least in java 5 and 6.
I did use java 5 to compile. I just ignored the warning about an unchecked type cast…
BTW, here is another class that uses a different approach. I copied the source from java.util.Arrays.sort() for merge-sort. It differs from your implementation in that it has only a single memory allocation (instead of one in each recursive call). It allocates another array that is the entire length of the first and then uses partitions of that array as the merge buffer.
The class I wrote (hyperlinked above) is similar to zipping arrays a la python.
Another approach I thought of was to use a TreeMap (always sorted because it uses a red-black BST instead of a hash table). You could extend it so that it also implemented List interface and internally used an ArrayList for fast random access. Iterating over the entrySet() would always be in order. The drawbacks to this approach are two-fold: data structure incurs more memory overhead, and it is only sorted by the map’s key. If you wanted more than two pieces of correlated data per entry, you could not sort the set based on any arbitrary piece of data.
Ah, sorry, my compiler didn’t give me an option about the unchecked type cast and wouldn’t compile at all. I’m using 6 though, so maybe that’s the problem. Or else something creeping up from the dark not-quite-platform-independent side of java (ie the different platform-specific implementations of the compiler).
I really like the class you linked to. Very different way of approaching it than I would have thought of.
[...] Posted by Jason Adams in algorithms, code, collections, java, programming, sorting. Leave a Comment I’m always a little annoyed I have to implement sorting Map keys by their values myself in Java. It seems like they should be a part of the standard Collections library or something. Maybe they are and I just haven’t seen it? Here’s my solution, based on feedback from Josh in the comments to a previous post. [...]
Excellent!
I have scoured the net to find a decent way to keep track of the indexes to no avail until now.
However is there a way to do this without recursion and more importantly without a call-by-reference?
I ask this because in Python you cannot control the way arguments are passed and since this function does not return sorted array but rather sort the arrays themselves it can get a bit messy.
Here is the code Jason posted but for Python. I tested and it looks like it works fine…
def mergeSort(indexes,data):
noIndexes=len(indexes)
#A quick check to ensure equal sized data and indexes
if noIndexeslen(data):
print “The length of the indexes and data must be equal”
return None
if noIndexes==1: #End of the recursive run
return None
#Find the integer division and modulo for the data length
split=noIndexes//2
mod=noIndexes%2
#Split both arrays into 2 arrays
indexes_l=indexes[0:split]
data_l=data[0:split]
indexes_r=indexes[split:noIndexes]
data_r=data[split:noIndexes]
#Sort each half of the array-recursion
mergeSort(indexes_l,data_l)
mergeSort(indexes_r,data_r)
#actual merging
idx_l=len(indexes_l)-1
idx_r=len(indexes_r)-1
while (idx_l>=0 and idx_r>=0):
#add the smaller item to the right of the array
#this produces descending order, in case of tie
#right side goes in first
if data_r[idx_r]>data_l[idx_l]:
data[idx_r+idx_l+1]=data_l[idx_l]
indexes[idx_r+idx_l+1]=indexes_l[idx_l]
idx_l-=1
else:
data[idx_r+idx_l+1]=data_r[idx_r]
indexes[idx_r+idx_l+1]=indexes_r[idx_r]
idx_r-=1
while idx_l>=0:
data[idx_l] = data_l[idx_l]
indexes[idx_l] = indexes_l[idx_l]
idx_l-=1
while idx_r>=0:
data[idx_r] = data_r[idx_r]
indexes[idx_r] = indexes_r[idx_r]
idx_r-=1