Imagine working in a galaxy far, far away for Lord Vader. One day he hands you a pile of plain text files and you need to sort their contents quickly, or he will Force-choke you. All you have is a terminal. What would you do?
Luckily, the Force is with you: your terminal almost certainly has a sort command for sorting data. Here is what you will learn:
- Basic sort
- Basic sort rules
- Reverse sort
- Numerical sort
- Month sort
- Random sort
- Unique sort
- Key sort
- Combining sort with other commands
- Conclusion
Basic sort
Suppose you have a file with droid names (droid.txt):
AZI-3
2-1B
Buzz Droid
IG-88
Battle Droid
Droideka
C-3PO
R2-D2
To do a basic sort, run:
sort droid.txt
The terminal prints:
2-1B
AZI-3
Battle Droid
Buzz Droid
C-3PO
Droideka
IG-88
R2-D2
Notice that sort doesn't actually change droid.txt; it only prints the sorted lines. If you want to overwrite the original file, use -o (for output):
sort droid.txt -o droid.txt
Personally, I don't like mutating files. I like to keep the result separate from the original file, so I would save it in a different file:
sort droid.txt -o droid-sorted.txt
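One caveat if you do sort a file in place: shell redirection is not a substitute for -o. With a redirect, the shell empties the file before sort gets to read it. Here is a minimal sketch in a throwaway temp directory (the file names are only for illustration). It puts -o before the file name, which should work across sort implementations:

```shell
# Sketch: -o versus shell redirection when sorting a file in place.
tmpdir=$(mktemp -d)
printf 'IG-88\nC-3PO\nAZI-3\n' > "$tmpdir/droid.txt"
cp "$tmpdir/droid.txt" "$tmpdir/oops.txt"

# Safe: sort reads all of its input before it writes the -o file.
LC_ALL=C sort -o "$tmpdir/droid.txt" "$tmpdir/droid.txt"
inplace=$(cat "$tmpdir/droid.txt")

# Unsafe: the shell truncates oops.txt before sort ever reads it.
sort "$tmpdir/oops.txt" > "$tmpdir/oops.txt"
oops_size=$(wc -c < "$tmpdir/oops.txt" | tr -d ' ')

echo "$inplace"
echo "bytes left after redirect: $oops_size"
rm -rf "$tmpdir"
```

The first file ends up sorted; the second ends up empty.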
Basic sort rules
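Let's discuss how sort orders things. There are more rules than the ones below, and the exact order depends on your locale (the LANG and LC_ALL settings), but these should be enough to get started.
- In many locales (en_US.UTF-8, for example), a lowercase letter sorts before the same letter in uppercase. In the C locale, all uppercase letters sort before all lowercase letters.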
- Letters are sorted in alphabetical order, example: "a" comes before "b", "c" comes before "x".
- Numbers come before letters.
- When two lines start with the same characters, sort compares them at the first position where they differ. Meaning if you have:
racekar
racecar
The first 4 characters are the same: "r-a-c-e". The 5th characters are not: "k" vs "c". Sort uses this difference to decide which line goes first, so racecar comes before racekar.
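Because these rules shift with the locale, it can help to see one fixed reference point. The sketch below forces the C locale, where lines are compared byte by byte: digits first, then uppercase, then lowercase. The second command uses whatever locale your shell has, so its output may differ from machine to machine:

```shell
# Sketch: the same input sorted in the byte-order "C" locale.
c_order=$(printf 'b\nA\n1\na\nB\n' | LC_ALL=C sort)
echo "$c_order"

# Same input in your current locale; the order depends on your settings.
printf 'b\nA\n1\na\nB\n' | sort
```

If a script needs the same order everywhere, prefixing sort with LC_ALL=C is a common way to pin it down.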
Reverse sort
To reverse the order, use the -r (reverse) option:
sort -r droid.txt
You'll get:
R2-D2
IG-88
Droideka
C-3PO
Buzz Droid
Battle Droid
AZI-3
2-1B
Numerical sort
Suppose you have number.txt containing:
234
39
1000
59
7
When you run sort number.txt, you get:
1000
234
39
59
7
This is probably not what you expect. By default, sort compares lines character by character, as text. Looking at the first character on each line, it sees "2,3,1,5,7" and sorts them into "1,2,3,5,7", which is technically the correct order for text. You need to tell sort to compare the lines as numbers with the -n (numeric sort) option:
sort -n number.txt
This is more like it:
7
39
59
234
1000
Alternatively, you can use its long-form option:
sort --sort=numeric number.txt
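-n compares the numeric value at the start of each line, so it also copes with negative numbers, decimals, and leading zeros. A small sketch with made-up values:

```shell
# Sketch: numeric sort with a negative number, a decimal, and a leading zero.
# LC_ALL=C keeps "." as the decimal point regardless of system locale.
nums=$(printf '10\n-3\n2.5\n2\n007\n' | LC_ALL=C sort -n)
echo "$nums"
```

Here 007 lands between 2.5 and 10 because it is compared as the number 7, not as the text "007".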
Month sort
Sort is also capable of sorting months. For example, if you have months.txt:
February
December
June
August
January
A normal sort would just order them alphabetically. You need to tell sort that these lines are month names with -M (month) or --sort=month:
sort -M months.txt
Result:
January
February
June
August
December
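Month sort is not limited to fully spelled-out names. At least with GNU sort, -M looks at the three-letter abbreviation at the start of each line and ignores case, so abbreviated and mixed-case names also work. A sketch, assuming GNU sort and English month names (the C locale):

```shell
# Sketch: month sort with abbreviations and mixed case (GNU sort assumed).
months=$(printf 'march\nDec\nJAN\nfeb\n' | LC_ALL=C sort -M)
echo "$months"
```

Other sort implementations may be stricter about what counts as a month name, so test with your own data.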
Random sort
Sort can also create randomness instead of order. You can use -R or --sort=random. Suppose you have abc.txt containing an already sorted list:
At-At Walker
Boba Fett
C3-P0
Darth Vader
Ewok
If you run:
sort -R abc.txt
You get a randomized result:
Boba Fett
At-At Walker
Ewok
Darth Vader
C3-P0
Running it multiple times will usually give a different order each time.
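One thing to know before relying on -R: it orders lines by a random hash of their content, so identical lines always end up next to each other. That makes it a random sort, not a full shuffle. The sketch below (GNU sort assumed, sample lines made up) shows the three "Ewok" lines staying together:

```shell
# Sketch: sort -R randomizes distinct lines but keeps identical lines adjacent.
input=$(printf 'Ewok\nBoba Fett\nEwok\nDarth Vader\nEwok\n')
shuffled=$(printf '%s\n' "$input" | LC_ALL=C sort -R)
echo "$shuffled"

# Because the duplicates stay together, uniq collapses them into one line,
# leaving exactly three distinct groups no matter how the shuffle went.
groups=$(printf '%s\n' "$shuffled" | uniq | wc -l | tr -d ' ')
echo "distinct groups: $groups"
```

If you need every line shuffled independently, duplicates included, GNU coreutils ships a separate shuf command for that.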
Unique sort
Sometimes you get duplicate items. Sort has an option to remove duplicates: -u (unique).
Suppose droid.txt now contains this list with duplicates:
2-1B
AZI-3
Jar Jar Binks
Battle Droid
Buzz Droid
C-3PO
Jar Jar Binks
Droideka
IG-88
Jar Jar Binks
R2-D2
The list has 3 Jar Jar Binks. Let's remove the duplicates:
sort -u droid.txt
You'll get:
2-1B
AZI-3
Battle Droid
Buzz Droid
C-3PO
Droideka
IG-88
Jar Jar Binks
R2-D2
Much better. There is only 1 Jar Jar Binks now.
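-u throws the duplicates away without telling you how many there were. If you want the counts, a common pattern is to sort first and then pipe into uniq -c. A sketch with a shortened, made-up list:

```shell
# Sketch: count how often each line occurs, most frequent first.
# 1. sort groups identical lines next to each other
# 2. uniq -c prefixes each distinct line with its count
# 3. sort -k1,1nr orders by that count, highest first
# 4. sed strips the left padding that uniq adds
counts=$(printf 'R2-D2\nJar Jar Binks\nC-3PO\nJar Jar Binks\nJar Jar Binks\nC-3PO\n' |
  LC_ALL=C sort |
  uniq -c |
  LC_ALL=C sort -k1,1nr |
  sed 's/^ *//')
echo "$counts"
```

The first sort matters: uniq only merges lines that are adjacent.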
Key sort
One powerful sort feature is that it can sort based on a column, or "key".
Suppose you have a list of legendary basketball players and their jersey numbers inside basketball.txt:
Shaquille O'Neal 34
Kobe Bryant 8
Magic Johnson 32
Kareem Abdul-Jabbar 33
Michael Jordan 23
Stephen Curry 30
Running regular sort:
sort basketball.txt
You get a list sorted by first name:
Kareem Abdul-Jabbar 33
Kobe Bryant 8
Magic Johnson 32
Michael Jordan 23
Shaquille O'Neal 34
Stephen Curry 30
What if you want to sort them by last name? To tell sort to use the second column (or "field") as the sorting basis, use the -k option. Since the last names are in the second column, you can use -k 2.
sort -k 2 basketball.txt
You get:
Kareem Abdul-Jabbar 33
Kobe Bryant 8
Stephen Curry 30
Magic Johnson 32
Michael Jordan 23
Shaquille O'Neal 34
Holy flexibility! That's awesome. These options can be stacked. What if you need to sort them by jersey number? Don't forget, you can't just do sort -k 3 basketball.txt, because sort can't tell that you need to sort them numerically. Instead, you tell it to "sort based on the 3rd column's numeric values":
sort -k 3n basketball.txt
You get:
Kobe Bryant 8
Michael Jordan 23
Stephen Curry 30
Magic Johnson 32
Kareem Abdul-Jabbar 33
Shaquille O'Neal 34
There you go. Much better. Rest in peace, Kobe. You are the legend.
By the way, if you want to sort them based on jersey number, but in reverse, you can do:
sort -k 3nr basketball.txt
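By default, fields are split on blanks. When your data uses another separator, -t tells sort what it is. Here is a sketch with a hypothetical comma-separated version of the list, where each full name stays in one field:

```shell
# Sketch: sort comma-separated lines by the second field, numerically.
# -t ',' sets the field separator; -k 2,2n limits the key to field 2 only.
by_number=$(printf 'Stephen Curry,30\nKobe Bryant,8\nMichael Jordan,23\n' |
  LC_ALL=C sort -t ',' -k 2,2n)
echo "$by_number"
```

With a comma as the separator, a name made of several words no longer shifts the column numbers.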
Extra: more on keys
If you read the man page (man sort), you'll notice that for -k, it says:
-k field1[,field2]
What are field1 and field2 in the manual page?
In -k field1,field2, field1 is the starting position of the sort key and field2 is the ending position. If you leave out field2, the key runs to the end of the line. When I first read it, it didn't ring a bell, until I saw it in action. Let's do some examples. Suppose you have this list, misc.txt:
Hullo a4 almost s1me
Hello c2 almost s3me
Hillo b3 almost s2me
Hallo d1 almost s4me
Let's run a few different sort commands and observe what they return:
sort misc.txt
Returns:
Hallo d1 almost s4me
Hello c2 almost s3me
Hillo b3 almost s2me
Hullo a4 almost s1me
It sees the "a,e,i,u" in "Hallo, Hello, Hillo, Hullo" and sorts them.
sort -k 2 misc.txt
Returns:
Hullo a4 almost s1me
Hillo b3 almost s2me
Hello c2 almost s3me
Hallo d1 almost s4me
It sorts based on second column. "a,b,c,d" in "a4, b3, c2, d1".
sort -k 3 misc.txt
Returns:
Hullo a4 almost s1me
Hillo b3 almost s2me
Hello c2 almost s3me
Hallo d1 almost s4me
With -k 3, the key starts at the 3rd column and runs to the end of the line. Since the 3rd column is the same on every line ("almost"), the comparison moves on to the next column, where "s1me, s3me, s2me, s4me" are not in order. It sorts them based on that.
sort -k 2,3 misc.txt
Returns:
Hullo a4 almost s1me
Hillo b3 almost s2me
Hello c2 almost s3me
Hallo d1 almost s4me
It sorts based on columns 2 and 3 only. The first differences are "a,b,c,d" in column 2, so it sorts based on those.
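Bounding a key with field1,field2 pays off most when you pass -k more than once: each later key only breaks ties left by the earlier ones. A sketch with made-up "team score name" lines, sorted by team name first and then by score, highest first:

```shell
# Sketch: two bounded keys; the second one acts as a tie-breaker.
# -k1,1   compare only field 1 (team) as text
# -k2,2nr then compare only field 2 (score) numerically, in reverse
ranked=$(printf 'red 10 amy\nblue 10 bob\nred 7 cat\nblue 12 dan\n' |
  LC_ALL=C sort -k1,1 -k2,2nr)
echo "$ranked"
```

Had the first key been written as -k1 instead of -k1,1, it would cover the whole line and the second key would rarely get a say.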
Combining sort with other commands
Just like any good Unix tool, sort can be chained with other terminal commands. For example, what if you want to sort basketball.txt by jersey number, but display only the last names and jersey numbers, evenly aligned?
You can use awk to grab the last names and numbers, and column to do the alignment:
sort -k 3n basketball.txt | awk '{print $2 "\t" $3}' | column -t
Returns:
Bryant        8
Jordan        23
Curry         30
Johnson       32
Abdul-Jabbar  33
O'Neal        34
Saves you hours of manual editing. Man, I love terminals.
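Chaining also helps when -k alone can't reach the column you want. If names have a varying number of words, the jersey number is no longer always field 3. One workaround is to copy the last field to the front with awk, sort on it, and then cut it off again. A sketch with made-up lines of different lengths:

```shell
# Sketch: sort by the LAST field when the number of fields varies.
# awk  prepends the last field ($NF) and a tab to each line
# sort orders the lines numerically by that prepended number
# cut  drops the prepended number, keeping everything after the first tab
sorted=$(printf 'Kobe Bryant 8\nEarvin Magic Johnson 32\nShaq 34\nMichael Jordan 23\n' |
  awk '{ print $NF "\t" $0 }' |
  LC_ALL=C sort -n |
  cut -f 2-)
echo "$sorted"
```

The original lines come out untouched, just in jersey-number order.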
Conclusion
Sort is useful for sorting structured data from the terminal. In this article, you learned how to do a basic sort, save your sorted data into a new file, reverse-sort, and sort based on a given column. You also learned how to sort numbers and months. Sort can also be used to create a random order. Finally, these options can be stacked together to build more complex operations.
Thanks for reading.
May the sort be with you.