2.3.18. vacumm.misc.file – File utilities

Functions:

2.3.18.1. File utilities

This module provides various file-related features:
  • filesystem traversal with depth support
  • file search, wildcard or regex based
  • file rollover (backup)
  • size parsing and formatting
  • directory creation without error on existing directory
efind(*args, **kwargs)[source]

Build a list from the xefind() generator

find(*args, **kwargs)[source]

Build a list from the xfind() generator

mkdirs(d)[source]

Create a directory, including parents.

Params:
  • d: the directory, or list of directories, that may be created
Return:
  • created: For a single directory: d if directory has been created, ‘’ otherwise (already exists). For a list of directories, the list of directories which have been created.
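
The return convention above can be illustrated with a small standard-library sketch. The helper name make_dirs_sketch is hypothetical and this is not vacumm's implementation; it only mirrors the documented behaviour (the path when it was created, an empty string when it already existed, and the list of created paths for a list input).

```python
import os
import tempfile


def make_dirs_sketch(d):
    """Hypothetical stand-in mirroring the documented mkdirs() return values."""
    if isinstance(d, (list, tuple)):
        # For a list, keep only the directories that were actually created.
        return [p for p in d if make_dirs_sketch(p)]
    if os.path.isdir(d):
        return ''  # already exists: nothing was created
    os.makedirs(d)  # also creates any missing parent directories
    return d


base = tempfile.mkdtemp()
target = os.path.join(base, 'a', 'b')
first = make_dirs_sketch(target)    # the directory is created
second = make_dirs_sketch(target)   # the directory already exists
several = make_dirs_sketch([target, os.path.join(base, 'c')])
```

Here first is the created path, second is the empty string, and several contains only the directory that did not exist yet.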
mkfdirs(f)[source]

Create the directory of a file, including parents. This may be used before writing to a file to ensure that its parent directories exist.

Params:
  • f: the file, or list of files, for which the directory may be created
Return:
  • created: For a single file: the directory of f if it has been created, ‘’ otherwise (already exists). For a list of files, the list of directories which have been created.
rollover(filepath, count=1, suffix='.%d', keep=True, verbose=False)[source]

Make a rollover of the specified file. Keep a certain number of backups of a file by renaming them with a suffix number.

Params:
  • filepath: the file to make a backup of
  • count: maximum number of backup files
  • suffix: suffix to use when renaming files, must contain a ‘%d’ marker which will be used to mark backup number
  • keep: whether to keep existing file in addition to the backup one
Return:

True if a backup occurred, False otherwise (count is 0 or filepath does not exist)
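
A numbered-backup rotation of this kind could be sketched as follows. This is a hedged illustration, not vacumm's code: the name rollover_sketch is hypothetical, and it assumes that keep=True copies the file to the newest backup while keep=False moves it, and that backup number 1 is always the most recent.

```python
import os
import shutil
import tempfile


def _move(src, dst):
    # Replace dst if it is already there, then rename src to dst.
    if os.path.exists(dst):
        os.remove(dst)
    os.rename(src, dst)


def rollover_sketch(filepath, count=1, suffix='.%d', keep=True):
    """Hypothetical rotation: file -> file.1 -> file.2 -> ... up to count."""
    if count <= 0 or not os.path.exists(filepath):
        return False
    # Shift the older backups up by one, starting from the oldest one kept.
    for i in range(count - 1, 0, -1):
        src = filepath + suffix % i
        if os.path.exists(src):
            _move(src, filepath + suffix % (i + 1))
    newest = filepath + suffix % 1
    if keep:
        shutil.copy2(filepath, newest)  # assumption: the original stays in place
    else:
        _move(filepath, newest)         # assumption: the original is moved away
    return True


workdir = tempfile.mkdtemp()
log = os.path.join(workdir, 'run.log')
for text in ('v1', 'v2', 'v3'):
    with open(log, 'w') as f:
        f.write(text)
    rollover_sketch(log, count=2)
```

After the three passes with count=2, run.log.1 holds the latest content, run.log.2 the previous one, and no third backup exists.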

strfsize(size, fmt=None, unit=None, si=False, suffix=True)[source]

Format a size in bytes using the appropriate unit multiplier (Ko, Mo, Kio, Mio)

Params:
  • size:
    the size in bytes
  • fmt:
    the format to use; it receives the size and unit arguments. If None, “%(size).3f %(unit)s” or “%(size)d %(unit)s” is selected automatically.
  • unit:
    the unit to use among K, M, G, T, P, E, Z, Y; if None, it is determined automatically
  • si:
    whether to use SI (International System) units (10^3, …) or binary units (2^10, …)
Return:

a string
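
The difference between SI and binary units can be illustrated with a minimal sketch. The function format_size_sketch is hypothetical and covers only automatic unit selection; the rule used here (integer format for plain bytes, three decimals otherwise, "o" and "io" unit endings) is an assumption, not necessarily what strfsize does.

```python
PREFIXES = ['', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']


def format_size_sketch(size, si=False):
    """Hypothetical formatter: pick the largest prefix that keeps value >= 1."""
    base = 1000.0 if si else 1024.0   # SI: 10^3 steps, binary: 2^10 steps
    tail = 'o' if si else 'io'        # assumed unit endings (Ko vs Kio)
    value, index = float(size), 0
    while value >= base and index < len(PREFIXES) - 1:
        value /= base
        index += 1
    if index == 0:
        # No prefix needed: show a plain integer number of bytes.
        return '%(size)d %(unit)s' % {'size': size, 'unit': 'o'}
    unit = PREFIXES[index] + tail
    return '%(size).3f %(unit)s' % {'size': value, 'unit': unit}


binary_text = format_size_sketch(1536)          # 1536 / 1024 = 1.5
si_text = format_size_sketch(1500, si=True)     # 1500 / 1000 = 1.5
small_text = format_size_sketch(512)
```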

strpsize(size, si=False)[source]

Parse a size in Ko, Mo, Kio, Mio, …

Params:
  • size: the size string (e.g. “1Ko”, “1Kio”, “2 Mo”, “ 10 Go”)
  • si: when the unit does not end with ‘io’, force interpretation as
    International System units (10^3, …) instead of binary units (2^10, …)
Return:

the float number of bytes
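
The effect of the si flag on parsing can be sketched as below. The parser parse_size_sketch and its regular expression are hypothetical; they only follow the rule stated above, namely that a unit ending in "io" is always binary and that any other unit is binary unless si is True.

```python
import re

# number, optional prefix letter, optional "io" or "o" ending
_SIZE_RE = re.compile(
    r'^\s*([0-9]*\.?[0-9]+)\s*([KMGTPEZY]?)(io|o)?\s*$', re.IGNORECASE)


def parse_size_sketch(size, si=False):
    """Hypothetical parser returning a float number of bytes."""
    match = _SIZE_RE.match(size)
    if match is None:
        raise ValueError('cannot parse size: %r' % (size,))
    number, prefix, tail = match.groups()
    # "Kio", "Mio", ... are always binary; other units follow the si flag.
    binary = (tail or '').lower() == 'io' or not si
    base = 1024.0 if binary else 1000.0
    exponent = ' KMGTPEZY'.index(prefix.upper()) if prefix else 0
    return float(number) * base ** exponent


default_ko = parse_size_sketch('1Ko')           # binary by default
si_ko = parse_size_sketch('1Ko', si=True)       # forced to 10^3
kio = parse_size_sketch('1Kio', si=True)        # "io" stays binary
```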

tfind(regex, path=None, fmt='%Y-%m-%dT%H:%M:%SZ', min=None, max=None, group=None, getdate=False, getmatch=False, xmin=False, xmax=True, **kwargs)[source]

Find timestamped paths (e.g. files having a date string in their paths)

See find() for the regex, path and kwargs arguments.

The regex regular expression must define at least one group which describes the location of the date string in paths.

Params:
  • fmt: (python) date format
  • min: minimum date filter: a datetime object or a date string in fmt format. None means no min date filtering.
  • max: maximum date filter: a datetime object or a date string in fmt format. None means no max date filtering.
  • group: the regex group number(s) or name(s): a single string or integer, or a list of them. None means all groups.
  • xmin: if True, min is exclusive
  • xmax: if True, max is exclusive

The group(s) can be specified either by their number or name. These groups will be concatenated to form the date that will be parsed.

Examples:

Assuming we are looking for the following files:

  • path/to/data/data_2010-01-01T00H.nc
  • path/to/data/data_2010-01-01T12H.nc
  • path/to/data/data_2010-01-02T00H.nc
  • path/to/data/data_2010-01-02T12H.nc

The commands below will have the same result:

>>> items = tfind(r'data_(.*)\.nc', 'path/to', '%Y-%m-%dT%HH', depth=2)
>>> items = tfind(r'data_(....-..-..T..H)\.nc', 'path/to/data', '%Y-%m-%dT%HH')

Same but more precise / advanced examples:

>>> items = tfind(r'data_([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}H)\.nc', 'path/to/data', '%Y-%m-%dT%HH')
>>> items = tfind(r'data_([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc', 'path/to/data', '%Y%m%d%H')
>>> items = tfind(r'(data)_(?P<y>[0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc', 'path/to/data', '%Y%m%d%H', group=('y',3,4,5))
Return:

Depending on getdate and getmatch, a list in the form:

  • If getdate=False and getmatch=False: [path1, path2, …]
  • If getdate=False and getmatch=True: [(path1, matchobj1), (path2, matchobj2), …]
  • If getdate=True and getmatch=False: [(path1, datetime1), (path2, datetime2), …]
  • If getdate=True and getmatch=True: [(path1, matchobj1, datetime1), (path2, matchobj2, datetime2), …]
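
The group concatenation and the min/max filtering described above can be sketched with the standard library alone. Both helpers below are hypothetical and are not taken from vacumm; in particular, the use of re.search and the parameter names lower and upper are assumptions made for the illustration.

```python
import re
from datetime import datetime


def extract_date_sketch(regex, path, fmt, group=None):
    """Hypothetical helper: join regex groups and parse them as a date."""
    match = re.search(regex, path)
    if match is None:
        return None
    if group is None:
        parts = match.groups()            # None means all groups
    elif isinstance(group, (int, str)):
        parts = [match.group(group)]      # a single group number or name
    else:
        parts = [match.group(g) for g in group]
    return datetime.strptime(''.join(parts), fmt)


def in_range_sketch(date, lower=None, upper=None, xmin=False, xmax=True):
    """Apply the min/max filter; xmin/xmax make the bounds exclusive."""
    if lower is not None and (date <= lower if xmin else date < lower):
        return False
    if upper is not None and (date >= upper if xmax else date > upper):
        return False
    return True


path = 'path/to/data/data_2010-01-02T12H.nc'
by_number = extract_date_sketch(
    r'data_([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc',
    path, '%Y%m%d%H')
by_name = extract_date_sketch(
    r'(data)_(?P<y>[0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc',
    path, '%Y%m%d%H', group=('y', 3, 4, 5))
```

In the second call, the explicit group selection skips the leading (data) group, which would otherwise be concatenated into the date string and make the parsing fail.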
walk(top, topdown=True, onerror=None, followlinks=False, depth=None, onfile=None, ondir=None, _depth=0)[source]

New implementation of os.walk with depth support to avoid unnecessarily large scans. Each iteration yields an additional depth value: (top, dirs, nondirs, depth)

Params:
  • depth: Limit the depth of walk:
    • None: no limit
    • 0: limited to top directory entries
    • 1: limited to first directory under the top directory
    • N: limited to Nth directory under the top directory

Warning

Do not use the _depth argument: it is used internally to track the current depth while yielding.

See os.walk() for more details on the other parameters.
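
The depth semantics listed above can be reproduced on top of the standard os.walk by pruning the directory list in place. The generator walk_depth_sketch is a hypothetical illustration under that assumption; it is not the implementation documented here and ignores the other parameters.

```python
import os
import tempfile


def walk_depth_sketch(top, depth=None):
    """Hypothetical depth-limited walk yielding (root, dirs, files, level)."""
    for root, dirs, files in os.walk(top):
        rel = os.path.relpath(root, top)
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1
        # Yield a copy of dirs so that pruning does not alter what was yielded.
        yield root, list(dirs), files, level
        if depth is not None and level >= depth:
            del dirs[:]  # prune in place: os.walk will not descend further


tree = tempfile.mkdtemp()
os.makedirs(os.path.join(tree, 'a', 'b', 'c'))
top_only = list(walk_depth_sketch(tree, depth=0))
levels_1 = [level for _, _, _, level in walk_depth_sketch(tree, depth=1)]
levels_all = [level for _, _, _, level in walk_depth_sketch(tree)]
```

With depth=0 only the top directory is visited, although its subdirectory names are still reported in the yielded dirs list.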
xefind(regex, path=None, depth=0, files=True, dirs=False, matchall=False, abspath=True, exclude=None, followlinks=False, expandpath=True, onerror=None, onfile=None, ondir=None, onmatch=None, getmatch=False, rexflags=None, xrexflags=None)[source]

Find paths matching the regex regular expression.

Params:
  • regex: the file regular expression
  • path: if not None, entries are searched from this location, otherwise current directory is used
  • depth: the recursion limit (0-based; None means no limit, see the walk function)
  • files: if False, file entries will not be returned
  • dirs: if False, directory entries will not be returned
  • matchall: if False, only file/directory names are evaluated, entire path otherwise
  • abspath: if True, returned paths are absolute
  • exclude: if not None, it designates a regular expression which will be used to exclude files or directories
  • getmatch: if True, return a list of (path, match_object) couples
  • followlinks: if True, symbolic links will be walked (see walk function)
  • rexflags: if not None, it will be used as regex compile flags
  • xrexflags: if not None, it will be used as exclude regex compile flags
  • expandpath: if True, environment variables and special character ~ will be expanded in the passed search path
Example:
>>> efind(r'.*\.nc', '/path/to/data')
['/path/to/data/data_2010-01-01.nc', '/path/to/data/data_2010-01-02.nc', ...]
>>> filelist = efind(r'data_([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})\.nc', 'data', getmatch=True, abspath=False)
>>> for filepath, matchobj in filelist:
...     print filepath, ':', matchobj.groups()
data/data_2010-01-1.nc : ('2010', '01', '1')
data/data_2010-01-10.nc : ('2010', '01', '10')
xfind(pattern, path=None, depth=0, files=True, dirs=False, matchall=False, abspath=True, exclude=None, followlinks=False, expandpath=True, onerror=None, onfile=None, ondir=None, onmatch=None)[source]

Find paths matching the pattern wildcard.

Params:
  • pattern: pattern or list of patterns using special characters *,?,[seq],[!seq] (see standard module fnmatch)
  • path: if not None, entries are searched from this location, otherwise current directory is used
  • depth: the recursion limit (0-based; None means no limit, see the walk function)
  • files: if False, file entries will not be returned
  • dirs: if False, directory entries will not be returned
  • matchall: if False, only file/directory names are evaluated, entire path otherwise
  • abspath: if True, returned paths are absolute
  • exclude: if not None, it designates a pattern or list of patterns which will be used to exclude files or directories
  • followlinks: if True, symbolic links will be walked (see walk function)
  • expandpath: if True, environment variables and special character ~ will be expanded in the passed search path
Example:
>>> find('*.nc', '/path/to/data')
['/path/to/data/data_2010-01-01.nc', '/path/to/data/data_2010-01-02.nc', ...]
>>> find(('*.nc', '*.grb'), '/path/to/data', depth=1, exclude=('*-01.nc', '*02.grb'))
['/path/to/data/data_2010-01-02.nc', '/path/to/data/grib/data_2010-01-01.grb', ...]
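
The interplay of pattern and exclude can be sketched with the standard fnmatch module mentioned above. The helper wildcard_filter_sketch is hypothetical and works on plain names rather than on the filesystem; it uses fnmatch.fnmatchcase so that matching is case-sensitive on every platform, which is an assumption of this sketch.

```python
import fnmatch


def _as_list(value):
    # Accept a single pattern or a list/tuple of patterns.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def wildcard_filter_sketch(names, pattern, exclude=None):
    """Hypothetical filter: keep names matching any pattern and no exclude."""
    patterns, excludes = _as_list(pattern), _as_list(exclude)
    kept = []
    for name in names:
        if not any(fnmatch.fnmatchcase(name, p) for p in patterns):
            continue  # matches none of the wanted patterns
        if any(fnmatch.fnmatchcase(name, x) for x in excludes):
            continue  # explicitly excluded
        kept.append(name)
    return kept


names = ['data_2010-01-01.nc', 'data_2010-01-02.nc',
         'data_2010-01-01.grb', 'data_2010-01-02.grb', 'notes.txt']
selected = wildcard_filter_sketch(
    names, ('*.nc', '*.grb'), exclude=('*-01.nc', '*02.grb'))
```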