2.3.18. `vacumm.misc.file` – File utilities

Functions:

efind()
find()
mkdirs()
mkfdirs()

rollover()
strfsize()
strpsize()
tfind()

walk()
xefind()
xfind()

2.3.18.1. File utilities

This module provides various file-related features:

filesystem traversal with depth support
file search, wildcard or regex based
file rollover (backup)
size parsing and formatting
directory creation without error on existing directory

efind(*args, **kwargs): Build a list from the xefind() generator.

find(*args, **kwargs): Build a list from the xfind() generator.

mkdirs(d)

Create a directory, including its parents.

Params:	d: the directory, or list of directories, to create if missing
Return:	created: For a single directory: d if the directory has been created, ‘’ otherwise (it already exists). For a list of directories: the list of directories which have been created.
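The return convention described above can be illustrated with a minimal standard-library sketch. The name `make_dirs` is hypothetical and this is not the vacumm implementation; it only assumes the documented behaviour (return the path when it was created, an empty string when it already existed).

```python
import os
import tempfile

def make_dirs(d):
    """Create d (and parents); return d if created, '' if it already existed."""
    if isinstance(d, (list, tuple)):
        # For a list, report only the directories that were actually created.
        return [p for p in d if make_dirs(p)]
    if os.path.isdir(d):
        return ''
    os.makedirs(d)
    return d

# Demo in a throwaway temporary directory.
base = tempfile.mkdtemp()
target = os.path.join(base, 'a', 'b')
first = make_dirs(target)    # directory is missing, so it is created
second = make_dirs(target)   # directory now exists, nothing to do
```

Because an empty string is falsy, the result can be used directly in a condition such as `if make_dirs(path): ...`.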

mkfdirs(f)

Create the directory of a file, including its parents. This may be used before writing to a file to ensure that its parent directories exist.

Params:	f: the file, or list of files, for which the directory may be created
Return:	created: For a single file: the directory of f if it has been created, ‘’ otherwise (it already exists). For a list of files: the list of directories which have been created.
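A typical use is to prepare the parent directory just before opening a file for writing. The sketch below shows that pattern with the standard library only; `make_file_dirs` is a hypothetical stand-in for mkfdirs(), written under the assumption that it follows the return convention documented above.

```python
import os
import tempfile

def make_file_dirs(f):
    """Create the parent directory of file path f; return it if created, '' otherwise."""
    parent = os.path.dirname(f)
    if not parent or os.path.isdir(parent):
        return ''
    os.makedirs(parent)
    return parent

# Demo: write a file whose parent directories do not exist yet.
base = tempfile.mkdtemp()
report = os.path.join(base, 'out', '2010', 'report.txt')
created = make_file_dirs(report)
with open(report, 'w') as fh:
    fh.write('done\n')
```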

rollover(filepath, count=1, suffix='.%d', keep=True, verbose=False)

Make a rollover of the specified file. Keep a certain number of backups of a file by renaming them with a numbered suffix.

Params:

filepath: the file to make a backup of
count: maximum number of backup files
suffix: suffix to use when renaming files; it must contain a ‘%d’ marker, which is replaced by the backup number
keep: whether to keep the existing file in addition to the backup one

Return:	True if a backup occurred, False otherwise (count is 0 or filepath does not exist)
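The rotation scheme can be sketched as follows. This is one plausible reading of the description, not the vacumm code: it assumes that backup number 1 is always the most recent, that older backups are shifted up by one, and that the backup exceeding count is dropped. The name `rollover_sketch` is hypothetical.

```python
import os
import shutil
import tempfile

def rollover_sketch(filepath, count=1, suffix='.%d', keep=True):
    """Shift existing backups up by one and store filepath as backup number 1."""
    if count <= 0 or not os.path.exists(filepath):
        return False
    # Walk from the highest number down so no backup is overwritten
    # before it has been moved; the oldest one beyond `count` is removed.
    for i in range(count, 0, -1):
        src = filepath + suffix % i
        if not os.path.exists(src):
            continue
        if i == count:
            os.remove(src)
        else:
            os.replace(src, filepath + suffix % (i + 1))
    newest = filepath + suffix % 1
    if keep:
        shutil.copy2(filepath, newest)   # original file stays in place
    else:
        os.replace(filepath, newest)     # original file is moved away
    return True

# Demo: two successive rollovers of a small log file.
workdir = tempfile.mkdtemp()
log = os.path.join(workdir, 'app.log')
with open(log, 'w') as fh:
    fh.write('v1')
rollover_sketch(log, count=2)
with open(log, 'w') as fh:
    fh.write('v2')
rollover_sketch(log, count=2)
names = sorted(os.listdir(workdir))
with open(log + '.2') as fh:
    oldest = fh.read()
```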

strfsize(size, fmt=None, unit=None, si=False, suffix=True)

Format a size in bytes using the appropriate unit multiplier (Ko, Mo, Kio, Mio).

Params:

size: the size in bytes
fmt: the format to use; it receives the size and unit arguments. If None, “%(size).3f %(unit)s” or “%(size)d %(unit)s” is used automatically.
unit: the unit to use, among K, M, G, T, P, E, Z, Y; if None, the unit is determined automatically
si: whether to use SI (International System) units (10^3, …) or binary units (2^10, …)

Return:	a string
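The automatic unit selection can be sketched as below. This is an illustration under stated assumptions, not the vacumm implementation: the function name `format_size`, the use of a plain “o” for sizes below one kilo-unit, and the exact choice between the two formats are all hypothetical.

```python
def format_size(size, si=False):
    """Format a byte count with an automatically chosen unit (sketch)."""
    base = 1000.0 if si else 1024.0
    infix = '' if si else 'i'   # 'Ko' for SI units, 'Kio' for binary units
    value = float(size)
    prefix = ''
    for p in 'KMGTPEZY':
        if abs(value) < base:
            break
        value /= base
        prefix = p
    if not prefix:
        # Below one kilo-unit: integer format, no multiplier.
        return '%d o' % value
    return '%.3f %s%so' % (value, prefix, infix)

binary_text = format_size(1536)             # 1536 / 1024 = 1.5
si_text = format_size(1500000, si=True)     # 1500000 / 1000**2 = 1.5
small_text = format_size(500)
```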

strpsize(size, si=False)

Parse a size in Ko, Mo, Kio, Mio, …

Params:

size: the size string (e.g. “1Ko”, “1Kio”, “2 Mo”, “ 10 Go”)
si: when the unit does not end with ‘io’, force interpretation as International System units (10^3, …) instead of binary units (2^10, …)

Return:	the number of bytes, as a float
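The role of the si flag is easiest to see on a small parser. The sketch below is hypothetical (`parse_size` is not part of vacumm) and assumes the rule stated above: a unit ending in ‘io’ is always binary, while any other unit is binary by default and decimal only when si is True.

```python
import re

# number, optional prefix letter, optional 'i', optional trailing 'o'
_SIZE_RE = re.compile(r'^\s*([0-9]*\.?[0-9]+)\s*([KMGTPEZY]?)(i?)o?\s*$',
                      re.IGNORECASE)

def parse_size(text, si=False):
    """Parse strings such as '1Ko', '1Kio' or ' 10 Go' into a number of bytes."""
    m = _SIZE_RE.match(text)
    if m is None:
        raise ValueError('cannot parse size: %r' % text)
    number, prefix, binary = m.groups()
    # An explicit 'i' always means binary; otherwise the si flag decides.
    base = 1024.0 if (binary or not si) else 1000.0
    exponent = ' KMGTPEZY'.index(prefix.upper()) if prefix else 0
    return float(number) * base ** exponent

default_ko = parse_size('1Ko')            # binary by default
si_ko = parse_size('1Ko', si=True)        # forced to 10^3
kio = parse_size('1Kio', si=True)         # 'io' stays binary
two_mo = parse_size('2 Mo')
```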

tfind(regex, path=None, fmt='%Y-%m-%dT%H:%M:%SZ', min=None, max=None, group=None, getdate=False, getmatch=False, xmin=False, xmax=True, **kwargs)

Find timestamped paths (e.g. files having a date string in their paths).

See:	find() for the regex, path and kwargs arguments.

The regex regular expression must define at least one group which describes the location of the date string in paths.

Params:

fmt: (python) date format
min: minimum date filter: a datetime object or a date string in fmt format. None means no min date filtering.
max: maximum date filter: a datetime object or a date string in fmt format. None means no max date filtering.
group: the regex group number(s) or name(s): a single string or integer, or a list of them. None means all groups.
xmin: if True, min is exclusive
xmax: if True, max is exclusive

The group(s) can be specified either by number or by name. These groups are concatenated to form the date string that is parsed.

Examples:

Assuming we are looking for the following files:

path/to/data/data_2010-01-01T00H.nc

path/to/data/data_2010-01-01T12H.nc

path/to/data/data_2010-01-02T00H.nc

path/to/data/data_2010-01-02T12H.nc

The commands below will have the same result:

>>> items = tfind('data_(.*)\.nc', 'path/to', '%Y-%m-%dT%HH', depth=2)
>>> items = tfind('data_(....-..-..T..H)\.nc', 'path/to/data', '%Y-%m-%dT%HH')

Same but more precise / advanced examples:

>>> items = tfind('data_([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}H)\.nc', 'path/to/data', '%Y-%m-%dT%HH')
>>> items = tfind('data_([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc', 'path/to/data', '%Y%m%d%H')
>>> items = tfind('(data)_(?P<y>[0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc', 'path/to/data', '%Y%m%d%H', group=('y',3,4,5))

Return:

Depending on getdate and getmatch, a list in the form:

If getdate=False and getmatch=False: [path1, path2, …]
If getdate=False and getmatch=True: [(path1, matchobj1), (path2, matchobj2), …]
If getdate=True and getmatch=False: [(path1, datetime1), (path2, datetime2), …]
If getdate=True and getmatch=True: [(path1, matchobj1, datetime1), (path2, matchobj2, datetime2), …]
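The group concatenation and the min/max filtering described above can be sketched on a plain list of paths, without touching the filesystem. The helper `find_timestamped` and its `dmin`/`dmax` parameter names are hypothetical, and the code is only an illustration of the documented behaviour, not the tfind implementation.

```python
import re
from datetime import datetime

def find_timestamped(paths, regex, fmt, dmin=None, dmax=None,
                     xmin=False, xmax=True):
    """Return (path, datetime) pairs whose parsed date lies between dmin and dmax."""
    pattern = re.compile(regex)
    found = []
    for path in paths:
        m = pattern.search(path)
        if m is None:
            continue
        # All groups are concatenated to form the string that is parsed.
        date = datetime.strptime(''.join(m.groups()), fmt)
        if dmin is not None and (date < dmin or (xmin and date == dmin)):
            continue
        if dmax is not None and (date > dmax or (xmax and date == dmax)):
            continue
        found.append((path, date))
    return found

paths = [
    'path/to/data/data_2010-01-01T00H.nc',
    'path/to/data/data_2010-01-01T12H.nc',
    'path/to/data/data_2010-01-02T00H.nc',
    'path/to/data/data_2010-01-02T12H.nc',
]
regex = r'data_([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc'
hits = find_timestamped(paths, regex, '%Y%m%d%H',
                        dmin=datetime(2010, 1, 1, 12),
                        dmax=datetime(2010, 1, 2, 12))
selected = [path for path, _ in hits]
```

With the defaults mirrored from the signature (inclusive minimum, exclusive maximum), the file dated exactly at `dmin` is kept and the one dated exactly at `dmax` is dropped.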

walk(top, topdown=True, onerror=None, followlinks=False, depth=None, onfile=None, ondir=None, _depth=0)

New implementation of os.walk with depth support, to avoid unnecessarily large scans. It yields an additional depth value at each step: (top, dirs, nondirs, depth).

Params:

depth: limit the depth of the walk:
    None: no limit
    0: limited to top directory entries
    1: limited to first directory under the top directory
    N: limited to Nth directory under the top directory

Warning

Do not use the _depth argument, as it is used internally to track the current depth while yielding.

See:	`os.walk()` for more details on other parameters.
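A depth-limited traversal of this kind can be sketched with a small recursive generator. The function `walk_depth` and its `_level` argument are hypothetical and only mirror the depth semantics listed above; the callbacks and the other os.walk parameters are left out.

```python
import os
import tempfile

def walk_depth(top, depth=None, _level=0):
    """Yield (top, dirs, nondirs, level), descending at most `depth` levels."""
    entries = sorted(os.listdir(top))
    dirs = [e for e in entries if os.path.isdir(os.path.join(top, e))]
    nondirs = [e for e in entries if e not in dirs]
    yield top, dirs, nondirs, _level
    if depth is not None and _level >= depth:
        return  # depth limit reached: subdirectories are listed but not scanned
    for name in dirs:
        sub = os.path.join(top, name)
        if not os.path.islink(sub):  # do not follow symbolic links in this sketch
            yield from walk_depth(sub, depth, _level + 1)

# Demo tree: root/top.txt, root/a/mid.txt, root/a/b/deep.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'a', 'b'))
for parts in (('top.txt',), ('a', 'mid.txt'), ('a', 'b', 'deep.txt')):
    open(os.path.join(root, *parts), 'w').close()

only_top = [(os.path.relpath(p, root), lvl) for p, _, _, lvl in walk_depth(root, depth=0)]
one_level = [(os.path.relpath(p, root), lvl) for p, _, _, lvl in walk_depth(root, depth=1)]
everything = list(walk_depth(root))
```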

xefind(regex, path=None, depth=0, files=True, dirs=False, matchall=False, abspath=True, exclude=None, followlinks=False, expandpath=True, onerror=None, onfile=None, ondir=None, onmatch=None, getmatch=False, rexflags=None, xrexflags=None)

Find paths matching the regex regular expression.

Params:

regex: the file regular expression
path: if not None, entries are searched from this location, otherwise the current directory is used
depth: the recursion limit (0 based, None for no limit, see the walk function)
files: if False, file entries will not be returned
dirs: if False, directory entries will not be returned
matchall: if False, only file/directory names are evaluated, otherwise the entire path
abspath: if True, returned paths are absolute
exclude: if not None, a regular expression used to exclude files or directories
getmatch: if True, return a list of (path, match_object) pairs
followlinks: if True, symbolic links will be walked (see the walk function)
rexflags: if not None, it will be used as regex compile flags
xrexflags: if not None, it will be used as exclude regex compile flags
expandpath: if True, environment variables and the special character ~ will be expanded in the passed search path

Example:

>>> efind('.*\.nc', '/path/to/data')
['/path/to/data/data_2010-01-01.nc', '/path/to/data/data_2010-01-02.nc', ...]

>>> filelist = efind('data_([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})\.nc', 'data', getmatch=True, abspath=False)
>>> for filepath, matchobj in filelist:
...     print filepath, ':', matchobj.groups()
data/data_2010-01-1.nc : ('2010', '01', '1')
data/data_2010-01-10.nc : ('2010', '01', '10')
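The relationship between the generator form and the list form can be sketched with the standard library. The names `iter_regex_find` and `regex_find` are hypothetical; the sketch assumes that the regex is matched from the start of the name (or of the whole path when matchall is True), and, unlike xefind(), it applies no depth limit.

```python
import os
import re
import tempfile

def iter_regex_find(regex, path='.', matchall=False, exclude=None):
    """Lazily yield (filepath, match_object) for files under path matching regex."""
    include = re.compile(regex)
    skip = re.compile(exclude) if exclude else None
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            filepath = os.path.join(dirpath, name)
            target = filepath if matchall else name
            if skip is not None and skip.match(target):
                continue
            m = include.match(target)
            if m:
                yield filepath, m

def regex_find(*args, **kwargs):
    """List counterpart of the generator, as efind() is to xefind()."""
    return list(iter_regex_find(*args, **kwargs))

# Demo on a temporary directory.
data = tempfile.mkdtemp()
for name in ('data_2010-01-1.nc', 'data_2010-01-10.nc', 'notes.txt'):
    open(os.path.join(data, name), 'w').close()
results = regex_find(r'data_([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})\.nc', data)
groups = [m.groups() for _, m in results]
```

The generator form is useful when the tree is large and the caller may stop early; the list form is more convenient when all results are needed at once.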

xfind(pattern, path=None, depth=0, files=True, dirs=False, matchall=False, abspath=True, exclude=None, followlinks=False, expandpath=True, onerror=None, onfile=None, ondir=None, onmatch=None)

Find paths matching the pattern wildcard.

Params:

pattern: pattern or list of patterns using the special characters *,?,[seq],[!seq] (see the standard module fnmatch)
path: if not None, entries are searched from this location, otherwise the current directory is used
depth: the recursion limit (0 based, None for no limit, see the walk function)
files: if False, file entries will not be returned
dirs: if False, directory entries will not be returned
matchall: if False, only file/directory names are evaluated, otherwise the entire path
abspath: if True, returned paths are absolute
exclude: if not None, a pattern or list of patterns used to exclude files or directories
followlinks: if True, symbolic links will be walked (see the walk function)
expandpath: if True, environment variables and the special character ~ will be expanded in the passed search path

Example:

>>> find('*.nc', '/path/to/data')
['/path/to/data/data_2010-01-01.nc', '/path/to/data/data_2010-01-02.nc', ...]

>>> find(('*.nc', '*.grb'), '/path/to/data', depth=1, exclude=('*-01.nc', '*02.grb'))
['/path/to/data/data_2010-01-02.nc', '/path/to/data/grib/data_2010-01-01.grb', ...]
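How several include and exclude wildcards combine can be sketched with the standard fnmatch module. The helper `iter_wildcard_find` is hypothetical and simplified (file names only, no depth limit, case-sensitive matching); it is not the xfind implementation.

```python
import fnmatch
import os
import tempfile

def _as_list(value):
    """Accept a single pattern or a sequence of patterns."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)

def iter_wildcard_find(pattern, path='.', exclude=None):
    """Lazily yield files under path matching any pattern and no exclude pattern."""
    patterns, excludes = _as_list(pattern), _as_list(exclude)
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if any(fnmatch.fnmatchcase(name, x) for x in excludes):
                continue
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                yield os.path.join(dirpath, name)

# Demo tree mirroring the example above.
data = tempfile.mkdtemp()
os.makedirs(os.path.join(data, 'grib'))
for parts in (('data_2010-01-01.nc',), ('data_2010-01-02.nc',),
              ('grib', 'data_2010-01-01.grb'), ('grib', 'data_2010-01-02.grb')):
    open(os.path.join(data, *parts), 'w').close()

found = [os.path.relpath(p, data)
         for p in iter_wildcard_find(('*.nc', '*.grb'), data,
                                     exclude=('*-01.nc', '*02.grb'))]
```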
