uniq.py
The need to find the unique elements within a column of a file is also very common.
For that reason, installing the bio package also installs a script called uniq.py.
It is a tool that prints the unique elements from a column.
Using uniq.py
If the file foo contains:
A
B
C
A
B
then the command:
cat foo | uniq.py
will print:
A
B
C
The -c flag, used as:
cat foo | uniq.py -c
will print:
2           A
2           B
1           C
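The counting behavior shown above can be approximated in a few lines of Python. This is only a minimal sketch, not the actual source of uniq.py: the function name count_unique is hypothetical, and ordering the output by descending count is an assumption based on the example output.

```python
from collections import Counter


def count_unique(lines):
    """Return (value, count) pairs, most frequent first.

    Hypothetical helper for illustration; not part of the bio package.
    """
    # Drop surrounding whitespace and skip empty lines.
    values = [line.strip() for line in lines if line.strip()]
    # most_common() sorts by count, descending; ties keep first-seen order.
    return Counter(values).most_common()


rows = ["A", "B", "C", "A", "B"]
for value, count in count_unique(rows):
    print(f"{count}\t{value}")
```

Because the values are tallied in a dictionary, the input does not need to be sorted first. The UNIX uniq command, by contrast, only collapses adjacent duplicate lines.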
Why does uniq.py exist?
We could use the UNIX construct:
sort | uniq -c | sort -rn
or the Entrez Direct tool called:
sort-uniq-count-rank
I don’t usually advocate rewriting UNIX tools, but in this case writing a better uniq makes a lot of sense.
Usage
uniq.py -h
usage: uniq.py [-h] [-f 1] [-d ''] [-c]
optional arguments:
  -h, --help         show this help message and exit
  -f 1, --field 1    field index (1 by default)
  -d '', --delim ''  delimiter (guess by default)
  -c, --count        produce counts
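To illustrate what the -f and -d options describe, here is a rough Python sketch of selecting one field from a line. The function extract_field is hypothetical and is not taken from uniq.py. In particular, treating a missing delimiter as "split on any run of whitespace" is an assumption standing in for the tool's delimiter guessing, which may work differently.

```python
def extract_field(line, field=1, delim=None):
    """Return the 1-based field of a line, or None if the line has too few fields.

    Hypothetical helper for illustration; not part of the bio package.
    """
    # delim=None makes str.split() break on any run of whitespace.
    # This is an assumed stand-in for the "guess by default" behavior.
    parts = line.rstrip("\n").split(delim)
    # Field indices are 1-based, matching the -f option's default of 1.
    if 0 < field <= len(parts):
        return parts[field - 1]
    return None


print(extract_field("geneA\t12\tchr1", field=3))
print(extract_field("geneA,12,chr1", field=2, delim=","))
```

Feeding the extracted values into a counter would then give the per-column counts that the -c flag reports.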