uniq.py: find unique elements
The need to find unique elements within columns of different files is very common.
Thus, when you install the bio package, another script called uniq.py is also installed.
This software prints the unique elements from a column.
If file_1.txt contains:
A
B
C
A
B
then running uniq.py on that file prints:
A
B
C
The -c flag produces counts. Used as:
uniq.py -c file_1.txt
it prints:
2 A
2 B
1 C
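The counting behavior shown above can be sketched in a few lines of Python. This is a minimal illustration, not the actual source of uniq.py: the function name count_unique is hypothetical, and it assumes that blank lines are skipped and that the output is tab separated with the most frequent elements first.

```python
from collections import Counter


def count_unique(lines):
    """Count how often each non-empty, stripped line occurs."""
    counts = Counter(line.strip() for line in lines if line.strip())
    # most_common() sorts by count, highest first; elements with equal
    # counts keep the order in which they were first seen.
    return counts.most_common()


if __name__ == "__main__":
    # Same content as the example file above, one element per line.
    sample = ["A", "B", "C", "A", "B"]
    for value, count in count_unique(sample):
        # Assumption: count first, then the element, separated by a tab.
        print(f"{count}\t{value}")
```

A Counter keeps the whole tally in memory, so unlike the classic UNIX uniq the input does not need to be sorted first.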
uniq.py can also read from standard input:
cat file_1.txt | uniq.py -c
We could use the UNIX construct:
sort | uniq -c | sort -rn
The problem with the above is that the columns it prints are not tab separated. We may also use a tool from Entrez Direct, but for that entrez-direct must be installed.
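To make the formatting problem with uniq -c concrete: its output typically pads each count with leading spaces and separates it from the value with a single space, so it needs cleaning before it behaves like a two-column table. The sketch below shows one way to do that cleanup. The helper name uniq_c_to_tsv is hypothetical, and the padding in the sample lines is an assumption, since the exact width differs between uniq implementations.

```python
def uniq_c_to_tsv(lines):
    """Turn lines shaped like '      2 A' into tab-separated '2<TAB>A'."""
    rows = []
    for line in lines:
        # Drop the trailing newline and the leading padding.
        text = line.rstrip("\n").lstrip()
        if not text:
            continue
        # The first space separates the count from the value; the value
        # itself may contain further spaces, so split only once.
        count, _, value = text.partition(" ")
        rows.append(f"{count}\t{value}")
    return rows


if __name__ == "__main__":
    # Assumed sample of what `sort | uniq -c | sort -rn` might print.
    sample = ["      2 A\n", "      2 B\n", "      1 C\n"]
    for row in uniq_c_to_tsv(sample):
        print(row)
```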
uniq.py can read any column of a file, and the delimiter may be changed as well. For example, to read the second column of three comma-separated files:
uniq.py -c -d , -f 2 file1 file2 file3
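Selecting a column and a delimiter across several inputs can be sketched as follows. Again this is an illustration under stated assumptions rather than the real implementation: count_field is a hypothetical name, rows that are too short to contain the requested column are skipped, and plain string splitting is used, so quoted fields containing the delimiter are not handled.

```python
from collections import Counter


def count_field(streams, field=1, delim="\t"):
    """Count the values of a 1-based column across several line iterables."""
    counts = Counter()
    for stream in streams:
        for line in stream:
            parts = line.rstrip("\n").split(delim)
            # Skip rows that do not have the requested column.
            if len(parts) >= field:
                counts[parts[field - 1]] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Made-up stand-ins for three comma-separated files; open file
    # objects would work the same way, since they also yield lines.
    file1 = ["x,red\n", "y,blue\n"]
    file2 = ["z,red\n"]
    file3 = ["w,green\n", "v,red\n"]
    for value, count in count_field([file1, file2, file3], field=2, delim=","):
        print(f"{count}\t{value}")
```

Counting into a single Counter across all inputs means the result describes the three files together, not each file separately.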
I don’t usually advocate rewriting UNIX tools, but in this case writing a better uniq makes a lot of sense.
usage: uniq.py [-h] [-f 1] [-d ''] [-c] [fnames [fnames ...]]

positional arguments:
  fnames               file names

optional arguments:
  -h, --help           show this help message and exit
  -f 1, --field 1      field index (1 by default)
  -d '', --delim ''    delimiter (tab by default)
  -c, --count          produce counts
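For readers who want to build a similar tool, an interface like the one in the help text above could be declared with Python's argparse module. This is a guess at a matching parser, not the script's actual code: build_parser is a hypothetical name, and the metavar values are chosen only to mimic the help text.

```python
import argparse


def build_parser():
    """Declare options resembling the help text above (a sketch only)."""
    parser = argparse.ArgumentParser(prog="uniq.py")
    # Zero or more file names; with none given, a tool like this would
    # presumably fall back to standard input.
    parser.add_argument("fnames", nargs="*", help="file names")
    parser.add_argument("-f", "--field", type=int, default=1, metavar="1",
                        help="field index (1 by default)")
    parser.add_argument("-d", "--delim", default="\t", metavar="''",
                        help="delimiter (tab by default)")
    parser.add_argument("-c", "--count", action="store_true",
                        help="produce counts")
    return parser


if __name__ == "__main__":
    # Parse the same arguments as the comma-separated example.
    args = build_parser().parse_args(
        ["-c", "-d", ",", "-f", "2", "file1", "file2", "file3"]
    )
    print(args.count, repr(args.delim), args.field, args.fnames)
```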