Unix tools amaze me on a daily basis. Maybe I’m easy to amaze… Today I was doing a pretty good-sized data migration. I had to pull some 30,000 products based on two different criteria: items that sold at least once in the last two years, plus any products added in the last 60 days or so, regardless of sales. Two pretty wicked SQL queries later, I had the data I was looking for dumped out to CSV. One last step was to merge the two CSV files, filtering out the duplicates between the two and sorting.

I could have gone back and done this by creating temp tables, some more joins, and a pinch of black magic, but it turns out it was much faster to pump them through:

cat file1.csv file2.csv | sort | uniq > file3.csv

This little one-liner takes the contents of both file1.csv and file2.csv, sorts them, removes duplicate lines, and pumps the resulting uniques out to file3.csv. On files containing some 30k lines, this finishes in seconds. Seconds.
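
For anyone who wants to try it without touching real data, here is a small sketch of the same merge. The sample rows and the file3_short.csv name are made up for illustration. It also shows a shorter spelling that should give the same result for plain whole-line comparisons like this one: sort can read both files itself, and its -u flag drops duplicate lines, so cat and uniq aren't strictly needed.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny stand-ins for the real exports (made-up rows, one shared line).
printf '1001,Widget\n1002,Gadget\n' > file1.csv
printf '1002,Gadget\n1003,Gizmo\n' > file2.csv

# The original pipeline: concatenate, sort, drop adjacent duplicates.
cat file1.csv file2.csv | sort | uniq > file3.csv

# Shorter form: sort reads both files and -u drops duplicate lines.
sort -u file1.csv file2.csv > file3_short.csv

# cmp is silent and exits 0 when the two outputs are byte-identical.
cmp file3.csv file3_short.csv && echo "identical"
```

One thing to watch for: if each CSV starts with a header row, the header gets sorted in with the data like any other line (and collapsed to a single copy if both files share it), so it may need to be stripped first and put back afterwards.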