uniq.py

The need to find unique elements within columns of different files is also very common.

Thus, when you install the bio package, another script called uniq.py is also installed. It prints the unique elements of a column.

Using uniq.py

If the file foo contains:

A
B
C
A
B

then the command:

cat foo | uniq.py

will print:

A
B
C

The flag -c used as:

cat foo | uniq.py -c

will print:

2           A
2           B
1           C
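The behavior shown above can be approximated in a few lines of Python. This is only a minimal sketch, not the actual uniq.py source: the function names unique_values and count_values are hypothetical, and the ordering (first appearance for unique values, most frequent first for counts) is an assumption based on the sample output.

```python
from collections import Counter


def unique_values(lines):
    """Return the distinct values, in order of first appearance (assumed)."""
    # dict.fromkeys keeps insertion order and drops duplicates.
    seen = dict.fromkeys(line.strip() for line in lines if line.strip())
    return list(seen)


def count_values(lines):
    """Return (count, value) pairs, most frequent first (assumed)."""
    counts = Counter(line.strip() for line in lines if line.strip())
    # most_common() sorts by count; ties keep first-seen order.
    return [(n, value) for value, n in counts.most_common()]


lines = ["A", "B", "C", "A", "B"]
print(unique_values(lines))
print(count_values(lines))
```

Unlike the UNIX uniq, a counter of this kind does not need sorted input, because it tracks every value it has seen rather than only adjacent duplicates.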

Why does uniq.py exist?

We could use the UNIX construct

sort | uniq -c | sort -rn

or the Entrez Direct tool called:

sort-uniq-count-rank

I don’t usually advocate rewriting UNIX tools, but in this case writing a better uniq makes a lot of sense.

Usage

uniq.py -h
usage: uniq.py [-h] [-f 1] [-d ''] [-c]

optional arguments:
  -h, --help         show this help message and exit
  -f 1, --field 1    field index (1 by default)
  -d '', --delim ''  delimiter (guess by default)
  -c, --count        produce counts
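To illustrate how the field and delimiter options might interact, here is a rough Python sketch. It is not taken from uniq.py: the names guess_delimiter and count_field are hypothetical, and the guessing rule (tab, then comma, then any whitespace) is purely an assumption, since the help text only says the delimiter is guessed by default.

```python
from collections import Counter


def guess_delimiter(line):
    # Hypothetical rule: prefer tab, then comma, else any whitespace (None).
    if "\t" in line:
        return "\t"
    if "," in line:
        return ","
    return None


def count_field(lines, field=1, delim=""):
    """Count the values found in a 1-based field of each line."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # An empty delimiter means "guess", mirroring the '' default above.
        sep = delim or guess_delimiter(line)
        parts = line.split(sep)
        if field <= len(parts):
            counts[parts[field - 1].strip()] += 1
    return counts.most_common()


rows = ["chr1\tgene\t100", "chr1\texon\t150", "chr2\tgene\t300"]
print(count_field(rows, field=2))
```

With the sample rows, the second field holds the feature type, so the sketch reports gene twice and exon once.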