SAMEFILE(1)			      JS			   SAMEFILE(1)

NAME
       samefile - find identical files

SYNOPSIS
       samefile [-g size] [-l | -r] [-s sep] [-0aiqVvx]

DESCRIPTION
       samefile reads a list of filenames (one filename per line) from stdin.
       For each pair of files with identical contents, it outputs a line
       consisting of six fields: the size in bytes, the two filenames, the
       character ``='' if the two files are on the same device or ``X''
       otherwise, and the link counts of the two files.  The output is sorted
       by size in descending order as the primary key and by filename as the
       secondary key.
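
       A minimal sketch of reading the six fields with awk(1), assuming the
       default tab separator.  The sample line is copied from the INTERNALS
       section rather than produced by a live run, and the labels are
       illustrative only:

```shell
# One sample output line (tab separated), copied from the INTERNALS section.
sample=$(printf '100\tfile1\tfile4\t=\t3\t2')

# Split on tabs and label each of the six fields.
parsed=$(printf '%s\n' "$sample" | awk -F '\t' '{
    printf "size: %s, files: %s %s, same device: %s, links: %s %s\n",
           $1, $2, $3, $4, $5, $6
}')
printf '%s\n' "$parsed"
```

       Filenames that themselves contain tab characters would break this
       split; the -s option described below exists for that case.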

OPTIONS
       -0     Indicates that the input list of file names is  NUL  terminated,
	      for example as generated by implementations of find(1) that sup‐
	      port the -print0 option.	Without this option,  the  file	 names
	      are assumed to be newline terminated.

       -a     Do not sort files with the same size alphabetically.

       -g size
	      Compare only files with size greater than size bytes. Default is
	      0.

       -i     Allow files with the same device/i-node pair to be added to  the
	      binary  tree.  This  might  be useful if output will be fed into
	      some other program.  If this option is used, the statistics dis‐
	      played  when using -v will not contain the ``You have a total of
	      x bytes in identical files'' line because	 -i  prohibits	proper
	      calculation of this value.

       -l     Do  not  check  if  files with identical contents are hard links
	      created by ln(1).	 By default, samefile  checks  if  files  with
	      identical	 contents  are	hard linked and, if they are, does not
	      write a name pair to stdout. A slight  speedup  is  gained  when
	      using  this  option.   This  option  is incompatible with the -r
	      option.

       -q     Do not issue warning messages when open(2) fails.  When you
              encounter such a warning, open probably failed due to a
              'permission denied' error on files or directories for which you
              have no read permission.  Useful if you are not root and want to
              compare your files against files in a system directory like
              /etc.

       -r     Report whether identical files are hard linked.	The  separator
	      string  followed	by  the	 [bracketed] link count is appended to
	      each name pair if they are hard links  created  with  ln.	  This
	      option  is  incompatible with the -l option. Note that this kind
	      of output has only four fields and will appear  unsorted	before
	      the actual output of samefile.

       -s sep Use string sep as the output field separator; the default is a
              tab character.  Useful if filenames contain tab characters and
              output must be processed by another program, say awk(1).

       -V     Print the version information and exit.

       -v     Verbose mode.  Write some statistical messages about memory
              usage and work reduction, as well as the sum of the sizes of all
              identical files, to stderr.

       -x     Switch off intelligence.  This option prevents samefile from
              being smart.  If files file1, file2 and file3 are identical, it
              will do three comparisons instead of just the two needed and
              write more output.  See the discussion under INTERNALS for why
              this could be useful.  If this option is used, the statistics
              displayed when using -v will not contain the ``You have a total
              of x bytes in identical files'' line because -x prohibits proper
              calculation of this value.

INTERNALS
       samefile uses two stages to give optimum performance.

       In the first stage, all non-plain files are skipped (directories,
       devices, FIFOs, sockets, symbolic links), as well as files for which
       stat(2) fails and files that have a size less than or equal to the size
       given with -g.  Output of the first stage (the filenames) is written
       into a binary tree with one node for every file size.  Checks for hard
       links are also done at this early stage.  If hard links are found and
       -r is requested, the name pairs are output immediately.  The whole list
       of hard linked name pairs will therefore appear before any output of
       the second stage.

       For any i-node only one filename will be added to the binary tree
       (unless -i was requested).

       In the second stage all files having the same size are compared against
       each other.  The rules of mathematical logic are applied to reduce work
       and output noise (unless -x is requested): if files a, b, and c have
       the same size and samefile finds that a = b and a = c, then it will not
       compare b against c and will output lines only for a = b and a = c, not
       for b and c.  Note, however, that because only the first filename per
       i-node gets into the second stage, the output for a group of identical
       files with different i-node numbers is also minimized.  Suppose you
       have six identical files of size 100 in an i-node group consisting of
       the three i-nodes with numbers 10, 20 and 30 (the term 'i-node group'
       has nothing to do with the i-node group notion of some file systems; it
       merely refers to a set of i-nodes addressing files with identical
       contents):

       $ ls -i
	  10 file1     20 file4	    30 file6
	  10 file2     20 file5
	  10 file3
       $ ls | samefile
       100     file1   file4   =       3       2
       100     file1   file6   =       3       1
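
       The sizes in the first column of the two lines above can be totalled
       with awk(1).  This is only a sketch: the sample data is typed in from
       the example rather than captured from a run, and the tab separator is
       the documented default:

```shell
# The two output lines of the example above, tab separated.
output=$(printf '100\tfile1\tfile4\t=\t3\t2\n100\tfile1\tfile6\t=\t3\t1')

# Add up field 1 (size in bytes) over all lines.
total=$(printf '%s\n' "$output" |
    awk -F '\t' '{ sum += $1 } END { print sum + 0 }')
echo "sum of sizes: $total bytes"
```

       On a real tree the same awk program could be placed at the end of a
       pipeline such as ls | samefile.  The total is only meaningful without
       -i and -x, for the reason given under those options.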

       The sum of the sizes in the first column is the amount of disk space
       you could gain by making all six files links to only one file or by
       removing all but one of the files.  To be precise, disk space is
       allocated in blocks, so you will probably gain two blocks here rather
       than 200 bytes.  Note that it is not enough to just remove file4 and
       file6: you would gain only 100 bytes because file5 still exists.  The
       proper way is to use the -i option.  The output will look like:

       100     file1   file2   =       3       3
       100     file1   file3   =       3       3
       100     file1   file4   =       3       2
       100     file1   file5   =       3       2
       100     file1   file6   =       3       1
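
       One way to act on output like the five lines above is to turn it into
       a list of ln(1) commands for review.  The following is a dry-run
       sketch under stated assumptions: the data is typed in from the
       example, the separator is a tab, and the filenames contain no
       whitespace or shell metacharacters.  It prints commands and runs
       none of them:

```shell
# Output of "ls | samefile -i" from the example above, tab separated.
# printf repeats its format for every pair of (name, link count) arguments.
output=$(printf '100\tfile1\t%s\t=\t3\t%s\n' \
    file2 3 file3 3 file4 2 file5 2 file6 1)

# For pairs on the same device (field 4 is "="), print a forced hard link
# that keeps the name in field 2 and relinks the name in field 3.
commands=$(printf '%s\n' "$output" |
    awk -F '\t' '$4 == "=" { printf "ln -f -- %s %s\n", $2, $3 }')
printf '%s\n' "$commands"
```

       Names that already share an i-node with the kept file (file2 and
       file3 here) need no relinking, and some ln implementations may
       complain about them.  Lines marked ``X'' are skipped because hard
       links cannot cross devices.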

       Removing all files listed in the third field will leave only file1.
       Making all files hard links to file1 is easy: if the fourth field is
       ``='', do a forced hard link.  If you need to know about all
       combinations of identical files, use both the -i and -x options.  This
       produces:

       $ ls | samefile -ix
       100     file1   file2   =       3       3
       100     file1   file3   =       3       3
       100     file1   file4   =       3       2
       100     file1   file5   =       3       2
       100     file1   file6   =       3       1
       100     file2   file3   =       3       3
       100     file2   file4   =       3       2
       100     file2   file5   =       3       2
       100     file2   file6   =       3       1
       100     file3   file4   =       3       2
       100     file3   file5   =       3       2
       100     file3   file6   =       3       1
       100     file4   file5   =       2       2
       100     file4   file6   =       2       1
       100     file5   file6   =       2       1
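
       The 15 lines above are every unordered pair of the six names.  As a
       quick arithmetic check (the function name pairs is made up for this
       illustration), the number of pairs among n identical files is
       n * (n - 1) / 2:

```shell
# Number of unordered pairs among n identical files: n * (n - 1) / 2.
pairs() {
    echo $(( $1 * ($1 - 1) / 2 ))
}

pairs 3   # the three-file case from the -x option description: prints 3
pairs 6   # the six-file example above: prints 15
```

       Without -x, the -i example earlier needed only n - 1 = 5 lines for
       the same six names.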

EXAMPLES
       Find all identical files in the current working directory:

       $ ls | samefile

       Find  all  identical  files in my HOME directory and subdirectories and
       also tell me if there are hard links:

       $ find "$HOME" -type f | samefile -r

       Find all identical files in the /usr directory tree that are bigger
       than 10000 bytes and write the result to usr.dups (this one is for the
       sysadmin folks; you may want to put the command in the background with
       the ampersand & because it takes a few minutes):

       $ find /usr -type f | samefile -g 10000 >usr.dups

DIAGNOSTICS
       You will see a short usage message if you use an invalid option.

       malloc - free = xxxx
	      I	 didn't	 free  the  memory I've malloc(3)ed.  You found a bug.
	      Please report it to the author.

       Allocation failed for 'expr' ...
	      Oops! You ran out of virtual memory. You must have  a  real  big
	      filename	list.  Try  to use a smaller one or increase resources
	      available to your processes.  For more information see ulimit(1)
	      or your similar shell builtin.

SEE ALSO
       ln(1), find(1), rm(1), df(1)

BUGS
       There are no known bugs. The source has been lint(1)ed and all possible
       care has been taken while coding. If you find a bug (or miss a feature)
       please contact the author.

HOME
       The official samefile home page, www.schweikhardt.net/samefile/, is
       maintained by the author, Jens Schweikhardt - schweikh at schweikhardt
       dot net.

				 7 AUGUST 2005			   SAMEFILE(1)