FILEPRUNE(1)							  FILEPRUNE(1)



NAME
       fileprune - prune a file set according to a given age distribution

SYNOPSIS
       fileprune [-n|-N|-p] [-c count|-s size[k|m|g|t]|-a age[w|m|y]]
       [-e base|-g standard deviation|-f] [-t a|m|c] [-FK] file ...
       fileprune -d -n|-N [-c count|-a age[w|m|y]]
       [-e base|-g standard deviation|-f] [-FK] date ...

DESCRIPTION
       Fileprune will delete files from the specified set targeting a given
       distribution of the files within time as well as size, number, and  age
       constraints.  Its main purpose is to keep a set of daily-created backup
       files in manageable size, while still providing	reasonable  access  to
       older versions.	Specifying a size, file number, or age constraint will
       simply remove files starting from the oldest, until the	constraint  is
       met.   The  distribution specification (exponential, Gaussian (normal),
       or Fibonacci) provides finer control of the files to  delete,  allowing
       the  retention of recent copies and the increasingly aggressive pruning
       of the older files.  The retention schedule specifies the age intervals
       for which files will be retained.  As an example, an exponential reten-
       tion schedule for 10 files with a base of 2 will be

	      1 2 4 8 16 32 64 128 256 512 1024

       The above schedule specifies that for the interval of 65 to 128 days
       there should be (at least) one retained file (unless constraints and
       options override this setting).  Retention schedules are always
       calculated and evaluated in integer days.  By default fileprune will
       keep the oldest file within each day interval, allowing files to
       migrate from one interval to the next as time goes by.  It may also
       keep additional files, if the complete file set satisfies the
       specified constraint.  The algorithm used for pruning does not assume
       that the files are uniformly distributed; fileprune will successfully
       prune file collections stored at irregular intervals.
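
       The schedule and the "oldest file in each interval" rule described
       above can be sketched in a few lines of Python.  This is only an
       illustration, not fileprune's implementation: the function names are
       hypothetical, the truncation to integer days with int() is an
       assumption, and so is the choice to place files of age 0 in the first
       interval.

```python
def exponential_schedule(base, count):
    """Return the interval end points base**0 .. base**count in integer days."""
    # Truncation with int() is an assumption; the text only says "integer days".
    return [int(base ** n) for n in range(count + 1)]


def oldest_per_interval(ages, schedule):
    """Return the oldest age found inside each schedule interval."""
    kept = []
    lower = -1  # assumed: files of age 0 belong to the first interval
    for upper in schedule:
        in_interval = [a for a in ages if lower < a <= upper]
        if in_interval:
            kept.append(max(in_interval))  # keep the oldest file of the interval
        lower = upper
    return kept


schedule = exponential_schedule(2, 10)
print(schedule)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

ages = list(range(0, 200, 3))  # irregular set: one file every three days
print(oldest_per_interval(ages, schedule))  # [0, 3, 6, 15, 30, 63, 126, 198]
```

       Intervals that contain no file (here 2 days, and everything beyond
       256 days) simply contribute nothing, which is why irregularly spaced
       file sets pose no problem for this approach.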


OPTIONS
       -n     Do not delete files; only print file names that would be
              deleted.

       -N     Do not delete files; only print file names that would be
              retained.

       -p     Do not process files.  Print the specified schedule for count
              elements.

       -c count
              Keep count files.

       -s size
              Keep files totaling size bytes.  The size argument can be
              followed by an uppercase or lowercase k, m, g, or t suffix to
              express quantities from kilobytes to terabytes.
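
              As an illustration of the suffix handling, a size argument
              could be converted to bytes as sketched below.  The function
              name is hypothetical, and the use of binary multiples (1024)
              is an assumption: this page does not say whether a kilobyte
              is taken as 1000 or 1024 bytes.

```python
def parse_size(text):
    """Convert a size argument such as '5G' or '300k' into a byte count."""
    # Assumption: binary multiples. The man page does not specify
    # whether the suffixes mean powers of 1000 or powers of 1024.
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}
    suffix = text[-1].lower()  # suffixes are accepted in either case
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)  # no suffix: plain bytes


print(parse_size("5G"))
print(parse_size("300k"))
```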

       -a age Keep files up to the specified age.  The age argument can be
              followed by a w, m, or y suffix to specify weeks, months, or
              years.
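
              Taken on their own, the count, size, and age constraints
              remove files starting from the oldest until the constraint is
              met.  A minimal sketch of that behaviour follows; the function
              name and the (name, age in days, size in bytes) tuple layout
              are hypothetical, and the sketch ignores any distribution
              option given alongside the constraint.

```python
def prune_oldest(files, max_count=None, max_size=None, max_age=None):
    """files: list of (name, age_days, size_bytes). Returns (kept, deleted)."""
    kept = sorted(files, key=lambda f: f[1])  # youngest first, oldest last
    deleted = []

    def violated():
        if max_count is not None and len(kept) > max_count:
            return True
        if max_size is not None and sum(f[2] for f in kept) > max_size:
            return True
        if max_age is not None and kept and kept[-1][1] > max_age:
            return True
        return False

    while kept and violated():
        deleted.append(kept.pop())  # drop the oldest remaining file
    return kept, deleted


files = [("a", 1, 100), ("b", 5, 100), ("c", 40, 100)]
print(prune_oldest(files, max_count=2))
print(prune_oldest(files, max_size=150))
print(prune_oldest(files, max_age=30))
```

              The synopsis allows only one of -c, -s, and -a at a time; the
              sketch accepts several merely to keep the code short.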

       -e base
              Use an exponential distribution of the specified base b for
              pruning.  Each successive interval n will end at b^n.  As an
              example, a base of 2 will retain 10 files in a period of 1024
              days.  To determine the base for keeping n files in a period
              of d days use the formula base = e^(ln(d)/n).
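
              The figures quoted above (base 2, 10 files, 1024 days) are
              consistent with choosing the base as the n-th root of d,
              since 2^10 = 1024.  The following sketch computes that value;
              the function name is hypothetical.

```python
import math


def base_for(n_files, days):
    """Return b such that b**n_files == days, i.e. e**(ln(days)/n_files)."""
    return math.exp(math.log(days) / n_files)


print(round(base_for(10, 1024), 6))  # 2.0
print(round(base_for(40, 4 * 365), 3))
```

              The second call estimates the -e argument that would spread 40
              files over roughly four years.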

       -g sd  Use a Gaussian (normal) distribution with the given standard
              deviation for the pruning schedule.  The height of the curve
              with a standard deviation of sigma is given by the formula
              f(x) = 1/(sigma*sqrt(2*pi)) * e^(-x^2/(2*sigma^2)).  All
              intervals from a to b are calculated to have the same area,
              the integral of f(x)dx from a to b.  The standard deviation is
              specified in day units; as a rule of thumb the oldest file
              retained will have an age of twice the standard deviation.
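
              The equal-area idea can be illustrated as follows.  This is a
              sketch under stated assumptions, not fileprune's algorithm:
              the function name is hypothetical, the area to the right of
              zero (0.5) is assumed to be split evenly among count
              intervals, end points are rounded up to whole days, and a
              closed-form inverse distribution function is used instead of
              the incremental search mentioned under BUGS.

```python
import math
from statistics import NormalDist


def gaussian_schedule(sd, count):
    """Return end points (integer days) of intervals with equal area."""
    dist = NormalDist(mu=0, sigma=sd)
    ends = []
    for k in range(1, count):
        # Assumed split: each interval covers 0.5 / count of the total area.
        p = 0.5 + 0.5 * k / count
        ends.append(math.ceil(dist.inv_cdf(p)))
    return ends  # the last interval is open-ended and therefore not listed


ends = gaussian_schedule(365, 20)
print(ends)
print(ends[-1] / 365)  # close to 2, in line with the rule of thumb
```

              Under these assumptions the last finite end point for 20
              intervals falls near twice the standard deviation, which
              agrees with the rule of thumb given above.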

       -f     Use a Fibonacci distribution for the pruning schedule.  The
              Fibonacci sequence starts with 1, 1, and each subsequent term
              is the sum of the two previous ones.
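
              For reference, the sequence itself can be generated as shown
              below.  The function name is hypothetical, and how fileprune
              maps the terms to interval end points is not stated on this
              page.

```python
def fibonacci_terms(count):
    """Return the first `count` Fibonacci terms, starting with 1, 1."""
    terms = []
    a, b = 1, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b  # each term is the sum of the two previous ones
    return terms


print(fibonacci_terms(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```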

       -t a|m|c
	      For determining a file's age use its  access,  modification,  or
	      creation time.  By default the modification time is used.

       -F     Force file pruning even if the size or count constraint has not
              been exceeded.

       -K     Keep files scheduled in each pruning interval, even if the size
              or count constraint has been exceeded.

       -d     Use a list of ISO dates rather than files as an argument of the
              pruning schedule.  Each date argument must be of the form
              YYYY-MM-DD [hh[:mm[:ss]]].  This option must be used with one
              of the -N or -n options, and cannot be combined with the -t or
              -s options.
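
              A date in this form can be turned into an age in days as
              sketched below.  The function names are hypothetical, and
              truncating the age to whole days is an assumption based on
              the integer-day schedules described earlier.

```python
from datetime import datetime

# Accepted layouts, from most to least specific: YYYY-MM-DD [hh[:mm[:ss]]]
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d %H", "%Y-%m-%d")


def parse_date(text):
    """Parse a date argument, trying each accepted layout in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"not a YYYY-MM-DD [hh[:mm[:ss]]] date: {text!r}")


def age_in_days(text, now):
    """Whole days elapsed between the given date and `now` (assumed truncation)."""
    return (now - parse_date(text)).days


print(parse_date("2013-01-08 14:30"))
print(age_in_days("2013-01-01", datetime(2013, 1, 8)))  # 7
```

              When a time of day is included, the date and time must reach
              the program as a single argument, so the shell needs them
              quoted together.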


EXAMPLE
       ssh remotehost tar cf - /datafiles >backup/`date +'%Y%m%d'`
       fileprune -e 2 backup/*
       Backup remotehost, storing the result in a file named with today's
       timestamp (e.g. 20021219).  Prune the files in the backup directory so
       that each retained file's age will be double that of its immediately
       younger neighbor.

       fileprune -N -d -e 1.2 -c 40 *
       Keep at most 40 files.  This particular	distribution  will  result  in
       daily  copies  for  the	first  fortnight, at least weekly for the next
       month, and almost monthly for the first year.

       fileprune -g 365 -c 30 *
       Keep at most 30 files with their ages  following	 a  Gaussian  (normal)
       distribution with a standard deviation of one year.

       fileprune -e 2 -s 5G *
       Prune  the specified files following an exponential schedule so that no
       more than 5GB are occupied.  More than one  file	 may  be  left	in  an
       interval,  if  the  size	 constraint  is	 met.  Alternatively, some old
       intervals may be emptied in order to satisfy the size constraint.

       fileprune -F -e 2 -s 5G *
       As above, but leave no more than one file in each scheduled interval.

       fileprune -K -e 2 -s 5G *
       As in the first example of the size-constrained series, but leave
       exactly one file in each interval, even if this will violate the size
       constraint.

       fileprune -a 1m -f *
       Delete all files older than one month; use a Fibonacci distribution
       for pruning the remaining ones.

       SNAPSHOTS=/tmp/snapshots.$$
       ec2-describe-snapshots --filter status=completed |
       awk '$1 == "SNAPSHOT" {print $2, substr($5, 1, 10)}' |
       sort -k2 >$SNAPSHOTS
       fileprune -n -d -e 1.2 -c 40 `awk '{print $2}' $SNAPSHOTS` |
       sort |
       join -1 1 -2 2 -o 2.1 - $SNAPSHOTS |
       xargs -n 1 ec2-delete-snapshot
       rm -f $SNAPSHOTS
       Prune AWS-hosted daily snapshots to leave 40.

SEE ALSO
       newsyslog(8)

AUTHOR
       (C) Copyright 2002-2013 Diomidis Spinellis.

BUGS
       The  Gaussian  (normal) distribution is calculated by trying successive
       increments of the normal function's distribution function.  If the file
       number  or count is large compared to the specified standard deviation,
       the calculation may take an exceedingly long time.  To get results in a
       reasonable  time,  day increments are bounded at 10 times the increment
       of the previous interval and a total age of 100 years.  It is advisable
       to first calculate and print the pruning schedule with a command like
       fileprune -g 100 -p -c 20
       to ensure that the schedule can be calculated.



				8 January 2013			  FILEPRUNE(1)
