- abstract_metadata(spec_json, meta_path)
- Abstract metadata information from a self-contained umbrella spec into a metadata database.
Args:
spec_json: a dict including the contents from a json file
meta_path: the path of the metadata database.
Returns:
If the umbrella spec is not complete, exit directly.
Otherwise, return None.
- add2db(item, source_dict, target_dict)
- Add the metadata information (source, format, checksum, size, uncompressed_size) about item from source_dict (umbrella specification) to target_dict (metadata database).
The item can be identified through two mechanisms: its checksum attribute, or one source location, which is used when a checksum is not applicable for this item.
If the item is already in the metadata database, do nothing; otherwise, add it, together with its metadata, to the metadata database.
Args:
item: the name of a dependency
source_dict: fragment of an Umbrella specification
target_dict: fragment of an Umbrella metadata database
Returns:
None
- add2spec(item, source_dict, target_dict)
- Extract the metadata information (source, format, checksum, size, uncompressed_size) from source_dict (metadata database) and add this information to target_dict (umbrella spec).
For any piece of metadata information, if it already exists in target_dict, do nothing; otherwise, add it to the umbrella spec.
Args:
item: the name of a dependency
source_dict: fragment of an Umbrella metadata database
target_dict: fragment of an Umbrella specification
Returns:
None
- attr_check(name, item, attr, check_len=0)
- Check and obtain the attr of an item.
Args:
name: the name of the dependency.
item: an item from the metadata database
attr: an attribute
check_len: if set to 1, also check whether the length of the attr is > 0; if set to 0, ignore the length checking.
Returns:
If the attribute check is successful, directly return the attribute.
Otherwise, directly exit.
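The check-then-exit behavior described for attr_check can be sketched as follows. This is a minimal illustration, assuming item is a plain dict; the error messages are hypothetical and the code is not taken from Umbrella itself.

```python
import sys


def attr_check(name, item, attr, check_len=0):
    """Illustrative sketch only; not the actual Umbrella code."""
    # Exit if the attribute is missing altogether.
    if attr not in item:
        sys.exit("the %s attribute of %s is missing" % (attr, name))
    value = item[attr]
    # Optionally require a non-empty value.
    if check_len == 1 and len(value) <= 0:
        sys.exit("the %s attribute of %s is empty" % (attr, name))
    return value
```

A caller would then write something like `source = attr_check("git", item, "source", 1)` and rely on the process exiting if the metadata is incomplete.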
- cal_new_os_id(sec, old_os_id, pac_list)
- Calculate the id of the new OS based on the old_os_id and the package_manager section
Args:
sec: the json object including the package_manager section.
old_os_id: the id of the original os image without any info about package manager.
pac_list: a list of the required package names.
Returns:
md5_value: the md5 value of the string constructed from binding old_os_id and information from the package_manager section.
install_cmd: the package install cmd, such as: yum -y install python
- cctools_download(sandbox_dir, hardware_platform, linux_distro, action)
- Download cctools
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
linux_distro: the linux distro. For Example: redhat6, centos6.
action: the action on the downloaded dependency. Options: none, unpack. "none" leaves the downloaded dependency as it is. "unpack" uncompresses the dependency.
Returns:
the path of the downloaded cctools in the umbrella local cache. For example: /tmp/umbrella_test/cache/d19376d92daa129ff736f75247b79ec8/cctools-4.9.0-redhat6-x86_64
- check_cvmfs_repo(repo_name)
- Check whether a cvmfs repo is installed on the host or not
Args:
repo_name: a cvmfs repo name. For example: "/cvmfs/cms.cern.ch".
Returns:
If the cvmfs repo is installed, returns the string including the mountpoint of cvmfs cms repo. For example: "/cvmfs/cms.cern.ch".
Otherwise, return an empty string.
- check_parrot_binary_support(host_linux_distro)
- Check whether a parrot binary for the host machine is provided by cctools.
Currently, cctools only provides the parrot binary for redhat5-7 and centos5-7.
If the host machine is not any of these, then the user should build cctools themselves.
Args:
host_linux_distro: the linux distro of the host machine. For Example: redhat6, centos6.
Returns:
None
- chroot_mount_bind(dir_dict, file_dict, sandbox_dir, need_separate_rootfs, hardware_platform, distro_name, distro_version)
- Create each target mountpoint under the cached os image directory through `mount --bind`.
Args:
dir_dict: a dict including all the directory mountpoints needed to be created inside the OS image.
file_dict: a dict including all the file mountpoints needed to be created inside the OS image.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
need_separate_rootfs: whether a separate rootfs is needed to execute the user's command.
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
distro_name: the name of the required OS (e.g., redhat).
distro_version: the version of the required OS (e.g., 6.5).
Returns:
If no error happens, returns None.
Otherwise, directly exit.
- chroot_post_process(dir_dict, file_dict, sandbox_dir, need_separate_rootfs, hardware_platform, distro_name, distro_version)
- Remove all the created target mountpoints within the cached os image directory.
It is not necessary to change the mode of the output dir, because only the root user can use the chroot method.
Args:
dir_dict: a dict including all the directory mountpoints needed to be created inside the OS image.
file_dict: a dict including all the file mountpoints needed to be created inside the OS image.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
need_separate_rootfs: whether a separate rootfs is needed to execute the user's command.
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
distro_name: the name of the required OS (e.g., redhat).
distro_version: the version of the required OS (e.g., 6.5).
Returns:
If no error happens, returns None.
Otherwise, directly exit.
- chrootize_user_cmd(user_cmd, cwd_setting)
- Modify the user's command when the sandbox_mode is chroot. This check should be done after `parrotize_user_cmd`.
The cases when this function should be called: sandbox_mode == chroot
Args:
user_cmd: the user's command.
cwd_setting: the current working directory for the execution of the user's command.
Returns:
the modified version of the user's cmd.
- cleanup(filelist, dirlist)
- Cleanup the temporary files and dirs created by umbrella
Args:
filelist: a list including file paths
dirlist: a list including dir paths
Returns:
None
- collect_software_bin(sw_mount_dict)
- Construct the path environment from the mountpoints of software dependencies.
Each software dependency has a bin subdir containing all its executables.
Args:
sw_mount_dict: a dict only including all the software mounting items.
Returns:
extra_path: the paths which are extracted from sw_mount_dict and needed to be added into PATH.
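As a rough illustration of how the bin subdirs could be gathered, the sketch below assumes that the keys of sw_mount_dict are the mountpoints seen by the user's task and that the result is a colon-separated string suitable for PATH. Both points are assumptions made for this example, not confirmed details of Umbrella.

```python
def collect_software_bin(sw_mount_dict):
    """Illustrative sketch only; not the actual Umbrella code."""
    # Assumption: each key is a mountpoint whose executables live in <mountpoint>/bin.
    bin_dirs = [mountpoint.rstrip("/") + "/bin" for mountpoint in sw_mount_dict]
    # Assumption: the caller wants a colon-separated fragment to add to PATH.
    return ":".join(bin_dirs)
```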
- compare_versions(v1, v2)
- Compare two versions, each in the format X.X.X
Args:
v1: a version.
v2: a version.
Returns:
0 if v1 == v2; 1 if v1 is newer than v2; -1 if v1 is older than v2.
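One way to obtain the 0 / 1 / -1 result is to compare the numeric components rather than the raw strings, so that 2.6.9 sorts before 2.6.18. The sketch below assumes every component is a plain integer and pads the shorter version with zeros; the padding is an assumption of this example, not documented Umbrella behavior.

```python
def compare_versions(v1, v2):
    """Illustrative sketch only; not the actual Umbrella code."""
    a = [int(part) for part in v1.split(".")]
    b = [int(part) for part in v2.split(".")]
    # Assumption: a missing component counts as 0, so "2.6" equals "2.6.0".
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    if a == b:
        return 0
    return 1 if a > b else -1
```

Comparing lists of integers avoids the pitfall of string comparison, where "2.6.9" would wrongly rank above "2.6.18".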
- condor_process(spec_path, spec_json, spec_path_basename, meta_path, sandbox_dir, output_dir, input_list_origin, user_cmd, cwd_setting, condorlog_path, cvmfs_http_proxy)
- Process the specification when condor execution engine is chosen
Args:
spec_path: the absolute path of the specification.
spec_json: the json object including the specification.
spec_path_basename: the file name of the specification.
meta_path: the path of the json file including all the metadata information.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
output_dir: the output directory.
input_list_origin: the list of input file paths.
user_cmd: the user's command.
cwd_setting: the current working directory for the execution of the user's command.
condorlog_path: the path of the umbrella log executed on the remote condor execution node.
cvmfs_http_proxy: HTTP_PROXY environment variable used to access CVMFS by Parrot
Returns:
If no errors happen, return None;
Otherwise, directly exit.
- construct_chroot_mount_dict(sandbox_dir, output_dir, input_dict, need_separate_rootfs, os_image_dir, mount_dict)
- Construct the directory mount list and the file mount list for chroot. chroot requires that each target mountpoint be created within the chroot jail.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
output_f_dict: the mappings of output files (key is the file path used by the application; value is the file path the user specifies.)
output_d_dict: the mappings of output dirs (key is the dir path used by the application; value is the dir path the user specified.)
input_dict: the setting of input files specified by the --inputs option.
need_separate_rootfs: whether a separate rootfs is needed to execute the user's command.
os_image_dir: the path of the OS image inside the umbrella local cache.
mount_dict: a dict including each mounting item in the specification, whose key is the access path used by the user's task; whose value is the actual storage path.
Returns:
a tuple including the directory mount list and the file mount list
- construct_docker_volume(input_dict, mount_dict, output_f_dict, output_d_dict)
- Construct the docker volume parameters based on mount_dict.
Args:
input_dict: the setting of input files specified by the --inputs option.
mount_dict: a dict including each mounting item in the specification, whose key is the access path used by the user's task; whose value is the actual storage path.
Returns:
volume_paras: all the `-v` options for the docker command.
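To make the direction of the mapping concrete: docker's `-v` option takes the host path first and the container path second, whereas mount_dict is keyed by the access path with the storage path as the value. The helper below is a hypothetical simplification that handles only mount_dict and ignores the input and output mappings; its name and output format are assumptions of this sketch.

```python
def docker_volume_options(mount_dict):
    """Hypothetical helper; only covers mount_dict, not inputs or outputs."""
    options = []
    for access_path, storage_path in mount_dict.items():
        # docker expects "-v <host path>:<container path>".
        options.append("-v %s:%s" % (storage_path, access_path))
    return " ".join(options)
```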
- construct_env(sandbox_dir, os_image_dir)
- Read env_list inside an OS image and save all the environment variables into a dictionary.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
os_image_dir: the path of the OS image inside the umbrella local cache.
Returns:
env_dict: a dictionary which includes all the environment variables from env_list
- construct_mountfile_cvmfs_cms_siteconf(sandbox_dir, cvmfs_cms_siteconf_mountpoint, parrot_log)
- Create the mountfile if chroot or docker is used to execute a CMS application and the host machine does not have cvmfs installed.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
cvmfs_cms_siteconf_mountpoint: a string in the format of '/cvmfs/cms.cern.ch/SITECONF/local <SITEINFO dir in the umbrella local cache>/local'
parrot_log: the path of the parrot debugging log
Returns:
the path of the mountfile.
- construct_mountfile_easy(sandbox_dir, input_dict, output_f_dict, output_d_dict, mount_dict, cvmfs_cms_siteconf_mountpoint)
- Create the mountfile if parrot is used to create a sandbox for the application and a separate rootfs is not needed.
Note that the order in which items are added matters: items added later are checked first during execution.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
mount_dict: all the mount items extracted from the specification file and possible implicit dependencies like cctools.
input_dict: the setting of input files specified by the --inputs option
cvmfs_cms_siteconf_mountpoint: a string in the format of '/cvmfs/cms.cern.ch/SITECONF/local <SITEINFO dir in the umbrella local cache>/local'
Returns:
the path of the mountfile.
- construct_mountfile_full(sandbox_dir, os_image_dir, mount_dict, input_dict, output_f_dict, output_d_dict, cvmfs_cms_siteconf_mountpoint, parrot_log)
- Create the mountfile if parrot is used to create a sandbox for the application and a separate rootfs is needed.
Note that the order in which items are added matters: items added later are checked first during execution.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
os_image_dir: the path of the OS image inside the umbrella local cache.
mount_dict: all the mount items extracted from the specification file and possible implicit dependencies like cctools.
input_dict: the setting of input files specified by the --inputs option
cvmfs_cms_siteconf_mountpoint: a string in the format of '/cvmfs/cms.cern.ch/SITECONF/local <SITEINFO dir in the umbrella local cache>/local'
parrot_log: the path of the parrot debugging log
Returns:
the path of the mountfile.
- create_docker_image(sandbox_dir, hardware_platform, distro_name, distro_version, tag)
- Create a docker image based on the cached os image directory.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
distro_name: the name of the required OS (e.g., redhat).
distro_version: the version of the required OS (e.g., 6.5).
tag: the tag of the expected docker image. tag is os_id
Returns:
If the docker image is imported from the tarball successfully, returns None.
Otherwise, directly exit.
- create_fake_mount(os_image_dir, sandbox_dir, mount_list, path)
- For each ancestor dir B of path (including path itself), check whether it exists in the rootfs, whether it exists in the mount_list, and
whether it exists in the fake_mount directory inside the sandbox.
If B is inside the rootfs or the fake_mount dir, do nothing. Otherwise, create a fake directory inside the fake_mount.
Reason: every ancestor dir of a path must exist because the `cd` shell builtin does a stat syscall on each level of
the ancestor dirs of a path. Without a mountpoint for every ancestor dir, `cd` would fail.
Args:
os_image_dir: the path of the OS image inside the umbrella local cache.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
mount_list: a list of mountpoints which are already inside the parrot mountlist file.
path: a dir path.
Returns:
mount_str: a string including the mount items which need to be added into the parrot mount file.
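The ancestor walk at the heart of this function can be sketched as below. Both helpers are hypothetical names introduced for illustration; the sketch only reports which ancestor dirs are covered by neither the rootfs nor the mount list, and leaves out the fake_mount bookkeeping and the construction of mount_str.

```python
import os
import posixpath


def ancestor_dirs(path):
    """Hypothetical helper: path itself plus every ancestor except the root."""
    result = []
    while True:
        parent = posixpath.dirname(path)
        # Stop at the root (or at an empty relative remainder).
        if parent == path or path == "":
            break
        result.append(path)
        path = parent
    return result


def missing_ancestors(os_image_dir, mount_list, path):
    """Hypothetical helper: ancestors found neither in the rootfs nor in mount_list."""
    missing = []
    for d in ancestor_dirs(path):
        if d in mount_list:
            continue
        if os.path.exists(os.path.join(os_image_dir, d.lstrip("/"))):
            continue
        missing.append(d)
    return missing
```

Each directory returned by `missing_ancestors` would need a placeholder so that a stat on it succeeds.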
- data_dependency_process(name, id, meta_json, sandbox_dir, action, osf_auth)
- Download a data dependency
Args:
name: the item name in the data section
id: the id attribute of the processed dependency
meta_json: the json object including all the metadata of dependencies.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
action: the action on the downloaded dependency. Options: none, unpack. "none" leaves the downloaded dependency as it is. "unpack" uncompresses the dependency.
osf_auth: the osf authentication info including osf_username and osf_password.
Returns:
dest: the path of the downloaded data dependency in the umbrella local cache.
- data_install(data_spec, meta_json, sandbox_dir, mount_dict, env_para_dict, osf_auth, cwd_setting, name=None)
- Process data section of the specification.
At the beginning of the function, mount_dict only includes items for software and os dependencies. After this function is done, all the items for data dependencies will be added into mount_dict.
Args:
data_spec: the data section of the specification.
meta_json: the json object including all the metadata of dependencies.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
mount_dict: a dict including each mounting item in the specification, whose key is the access path used by the user's task; whose value is the actual storage path.
env_para_dict: the environment variables which need to be set for the execution of the user's command.
osf_auth: the osf authentication info including osf_username and osf_password.
cwd_setting: the current working directory for the execution of the user's command.
name: if name is specified, then only the specified item will be installed. All the other items in the data section will be ignored.
Returns:
None
- decide_instance_type(cpu_cores, memory_size, disk_size, instances)
- Compare the required hardware configurations with each instance type, and return the first matched instance type; return 'no' if no matched instance type exists.
We can rank each instance type in the future, so that when multiple matches exist, the closest matched instance type is returned.
Args:
cpu_cores: the number of required cpus (e.g., 1).
memory_size: the memory size requirement (e.g., 2GB). Not case sensitive.
disk_size: the disk size requirement (e.g., 2GB). Not case sensitive.
instances: the instances section of the ec2 json file.
Returns:
If there is no matched instance type, return 'no'.
Otherwise, returns the first matched instance type.
- dep_build(d, name)
- Build the metadata info of a dependency.
Args:
d: a dependency object
name: the name of the dependency
Returns:
If the dependency comes from a local path, return 1 denoting this dependency has been built up.
Otherwise, return 0 denoting nothing is built up.
- dependency_check(item)
- Check whether an executable exists or not.
Args:
item: the name of the executable to be found.
Returns:
If the executable can be found through $PATH, return 0;
Otherwise, return -1.
- dependency_check_list(item_list)
- Check whether any executable in the item_list does not exist.
Args:
item_list: a list of executables.
Returns:
If all the executables in the item_list can be found through $PATH, return 0;
Otherwise, return -1.
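The two checks above amount to a lookup through $PATH. A minimal sketch using the standard library's `shutil.which` is shown below; whether Umbrella itself uses this call or shells out to `which` is not stated here, so treat this as one possible implementation.

```python
import shutil


def dependency_check(item):
    """Illustrative sketch only: 0 if item is found through $PATH, else -1."""
    return 0 if shutil.which(item) else -1


def dependency_check_list(item_list):
    """Illustrative sketch only: 0 if every executable is found, else -1."""
    for item in item_list:
        if dependency_check(item) == -1:
            return -1
    return 0
```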
- dependency_download(name, url, checksum, checksum_tool, dest, format_remote_storage, action)
- Download a dependency from the url and verify its integrity.
Args:
name: the file name of the dependency. If its format is plain text, then the filename is the same as the archived name. If its format is tgz, the filename should be the archived name with the trailing .tgz/.tar.gz removed.
url: the storage location of the dependency.
checksum: the checksum of the dependency.
checksum_tool: the tool used to calculate the checksum, such as md5sum.
dest: the destination of the dependency where the downloaded dependency will be put.
format_remote_storage: the file format of the dependency, such as .tgz.
action: the action on the downloaded dependency. Options: none, unpack. "none" leaves the downloaded dependency as it is. "unpack" uncompresses the dependency.
Returns:
If the url is a broken link or the integrity of the downloaded data is bad, directly exit.
Otherwise, return None.
- dependency_process(name, id, action, meta_json, sandbox_dir, osf_auth)
- Process each explicit and implicit dependency.
Args:
name: the item name in the software section
id: the id attribute of the processed dependency
action: the action on the downloaded dependency. Options: none, unpack. "none" leaves the downloaded dependency as it is. "unpack" uncompresses the dependency.
meta_json: the json object including all the metadata of dependencies.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
osf_auth: the osf authentication info including osf_username and osf_password.
Returns:
mount_value: the actual storage path of one dependency.
- dir_create(filepath)
- Create the directory for filepath if necessary. If the file already exists, exit directly.
Args:
filepath: a file path
Returns:
Exit directly if any error happens.
Otherwise, returns None.
- ec2_process(spec_path, spec_json, meta_option, meta_path, ssh_key, ec2_key_pair, ec2_security_group, ec2_instance_type, sandbox_dir, output_option, output_f_dict, output_d_dict, sandbox_mode, input_list, input_list_origin, env_option, env_para_dict, user_cmd, cwd_setting, ec2log_path, cvmfs_http_proxy)
- Process the specification when the ec2 execution engine is chosen
Args:
spec_path: the path of the specification.
spec_json: the json object including the specification.
meta_option: the --meta option.
meta_path: the path of the json file including all the metadata information.
ssh_key: the name of the private key file to use when connecting to an instance.
ec2_key_pair: the path of the key-pair to use when launching an instance.
ec2_security_group: the security group id within which the EC2 instance should be run.
ec2_instance_type: the type of an Amazon EC2 instance
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
output_f_dict: the mappings of output files (key is the file path used by the application; value is the file path the user specifies.)
output_d_dict: the mappings of output dirs (key is the dir path used by the application; value is the dir path the user specified.)
sandbox_mode: the execution engine.
input_list: a list including all the absolute path of the input files on the local machine.
input_list_origin: the list of input file paths.
env_para_dict: the environment variables which need to be set for the execution of the user's command.
user_cmd: the user's command.
cwd_setting: the current working directory for the execution of the user's command.
ec2log_path: the path of the umbrella log executed on the remote EC2 execution node.
cvmfs_http_proxy: HTTP_PROXY environment variable used to access CVMFS by Parrot
Returns:
If no errors happen, return None;
Otherwise, directly exit.
- env_check(sandbox_dir, sandbox_mode, hardware_platform, cpu_cores, memory_size, disk_size, kernel_name, kernel_version)
- Check the matching degree between the specification requirement and the host machine.
Currently checks the following items: sandbox_mode, hardware platform, kernel, OS, disk, memory, cpu cores.
Other things that may need to be checked: software and data.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
sandbox_mode: the execution engine.
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
cpu_cores: the number of required cpus (e.g., 1).
memory_size: the memory size requirement (e.g., 2GB). Not case sensitive.
disk_size: the disk size requirement (e.g., 2GB). Not case sensitive.
kernel_name: the name of the required OS kernel (e.g., linux). Not case sensitive.
kernel_version: the version of the required kernel (e.g., 2.6.18).
Returns:
host_linux_distro: the linux distro of the host machine. For Example: redhat6, centos6.
- env_parameter_init(hardware_spec, kernel_spec, os_spec)
- Set the environment parameters according to the specification file.
Args:
hardware_spec: the hardware section in the specification for the user's task.
kernel_spec: the kernel section in the specification for the user's task.
os_spec: the os section in the specification for the user's task.
Returns:
a tuple including the requirements for hardware, kernel and os.
- extract_tar(src, dest, form)
- Extract a tgz file from src to dest
Args:
src: the location of a tgz file
dest: the location where the uncompressed data will be put
form: the format of the tarball, such as tar or tgz
Returns:
None
- func_call(cmd, utils_list=None)
- Execute a command and return the return code, stdout, stderr.
Args:
cmd: the command to execute using the subprocess module.
utils_list: a list of executables used in the cmd
Returns:
a tuple including the return code, stdout, stderr.
- func_call_withenv(cmd, env_dict, utils_list=None)
- Execute a command with a special setting of the environment variables and return the return code, stdout, stderr.
Args:
cmd: the command to execute using the subprocess module.
env_dict: the environment setting.
utils_list: a list of executables used in the cmd
Returns:
a tuple including the return code, stdout, stderr.
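Both func_call and func_call_withenv return the same three-part result, which could be produced along these lines. The name `run_command` is hypothetical, the utils_list check is omitted, and merging env_dict into the current environment (rather than replacing it) is an assumption of this sketch.

```python
import os
import subprocess


def run_command(cmd, env_dict=None):
    """Hypothetical helper: run cmd through the shell, return (rc, stdout, stderr)."""
    env = None
    if env_dict is not None:
        # Assumption: the extra variables are layered on top of os.environ.
        env = dict(os.environ)
        env.update(env_dict)
    p = subprocess.Popen(cmd, shell=True, env=env,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()
    return p.returncode, stdout, stderr
```

Note that on Python 3 the captured stdout and stderr are bytes unless a text mode is requested.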
- get_linker_path(hardware_platform, os_image_dir)
- Return the path of ld-linux.so within the downloaded os image dependency
Args:
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
os_image_dir: the path of the OS image inside the umbrella local cache.
Returns:
If the dynamic linker is found within the OS image, return its fullpath.
Otherwise, returns None.
- get_tgz_size(path)
- Get the uncompressed size of a tgz file
Args:
path: a tgz file path
Returns:
size: the uncompressed size of a tgz file
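The uncompressed size can be computed without unpacking anything by summing the sizes recorded in the archive's member headers. The sketch below uses the standard `tarfile` module and is only one way to do this; it is not taken from the Umbrella source.

```python
import tarfile


def get_tgz_size(path):
    """Illustrative sketch only: total size in bytes of all members of a tgz file."""
    size = 0
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            # Directories and links report a size of 0.
            size += member.size
    return size
```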
- git_dependency_download(repo_url, dest, git_branch, git_commit)
- Prepare a dependency from a git repository.
First check whether dest exists or not: if dest exists, then check out git_branch and git_commit;
otherwise, git clone repo_url, and then check out git_branch and git_commit.
Args:
repo_url: the url of the remote git repository
dest: the local directory where the git repository will be cloned into
git_branch: the branch name of the git repository
git_commit: the commit id of the repository
Returns:
dest: the local directory where the git repository is
- git_dependency_parser(item, repo_url, sandbox_dir)
- Parse a git dependency
Args:
item: an item from the metadata database
repo_url: the url of the remote git repository
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
Returns:
dest: the path of the downloaded data dependency in the umbrella local cache.
- has_docker_image(hardware_platform, distro_name, distro_version, tag)
- Check whether the required docker image exists on the local machine or not.
Args:
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
distro_name: the name of the required OS (e.g., redhat).
distro_version: the version of the required OS (e.g., 6.5).
tag: the tag of the expected docker image. tag is os_id
Returns:
If the required docker image exists on the local machine, returns 'yes'.
Otherwise, returns 'no'.
- has_source(sources, target)
- Check whether the sources includes a url from the specific target.
Args:
sources: a list of url
target: the specific resource url. For example, s3, osf.
Returns:
If a url from the specific target exists, return True.
Otherwise, return False.
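A minimal sketch of this check is shown below. It assumes that a url "comes from" a target when it starts with the target name (for example a url beginning with `osf`); that matching rule is a hypothetical convention chosen for this example, and the real criterion may differ.

```python
def has_source(sources, target):
    """Illustrative sketch only; the prefix-matching rule is an assumption."""
    for url in sources:
        if url.startswith(target):
            return True
    return False
```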
- in_local_group()
- Judge whether the current user's group exists in /etc/group.
Returns:
If the current user's group exists in /etc/group, returns 'yes'.
Otherwise, returns 'no'.
- in_local_passwd()
- Judge whether the current user exists in /etc/passwd.
Returns:
If the current user is inside /etc/passwd, returns 'yes'.
Otherwise, returns 'no'.
- is_dir(path)
- Judge whether a path is directory or not.
If the path is a dir, directly return. Otherwise, exit directly.
Args:
path: a path
Returns:
None
- json2file(filepath, json_item)
- Write a json object into a file
Args:
filepath: a file path
json_item: a dict representing a json object
Returns:
None
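Writing a dict out as JSON is a few lines with the standard `json` module. The indentation and key sorting below are formatting choices made for this sketch, not documented Umbrella behavior.

```python
import json


def json2file(filepath, json_item):
    """Illustrative sketch only: dump a dict to filepath as JSON."""
    with open(filepath, "w") as f:
        # Assumption: human-readable output with stable key order.
        json.dump(json_item, f, indent=4, sort_keys=True)
        f.write("\n")
```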
- launch_ec2_instance(image_id, region, instance_type, ec2_key_pair, ec2_security_group)
- Start one VM instance through Amazon EC2 command line interface and return the instance id.
Args:
image_id: the Amazon Image Identifier.
region: the AWS region to which the AMI specified by image_id belongs. The instance will be launched within the same region.
instance_type: the Amazon EC2 instance type used for the task.
ec2_key_pair: the path of the key-pair to use when launching an instance.
ec2_security_group: the security group id within which the EC2 instance should be run.
Returns:
If no error happens, returns an EC2.Instance object.
Otherwise, directly exit.
- main()
- md5_cal(filename, block_size=1048576)
- Calculate the md5sum of a file
Args:
filename: the name of the file
block_size: the size of each block
Returns:
If the calculation fails for any reason, directly exit.
Otherwise, return the md5 value of the content of the file
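The block_size parameter suggests the file is hashed in chunks, so that large dependencies never have to fit in memory. A sketch of that pattern with `hashlib` follows; the error message is hypothetical and the code is not the Umbrella implementation.

```python
import hashlib
import sys


def md5_cal(filename, block_size=1048576):
    """Illustrative sketch only: md5 hex digest of a file, read block by block."""
    try:
        digest = hashlib.md5()
        with open(filename, "rb") as f:
            # iter() with a sentinel keeps reading until read() returns b"".
            for block in iter(lambda: f.read(block_size), b""):
                digest.update(block)
        return digest.hexdigest()
    except (IOError, OSError) as e:
        sys.exit("md5_cal(%s) failed: %s" % (filename, e))
```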
- meta_search(meta_json, name, id=None)
- Search the metadata information of a dependency in the meta_json
First find all the items with the required name in meta_json.
Then find the one whose id satisfies the requirement.
If no id parameter is provided, then the first matched one will be returned.
Args:
meta_json: the json object including all the metadata of dependencies.
name: the name of the dependency.
id: the id attribute of the dependency. Defaults to None.
Returns:
If one item is found in meta_json, return the item, which is a dictionary.
If no item in meta_json satisfies the requirement, directly exit.
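The two-step lookup (by name, then by id) can be sketched as below. The layout assumed for meta_json, a dict mapping each name to a dict of id to item, is a guess made for illustration and is not confirmed by this listing; the error messages are likewise hypothetical.

```python
import sys


def meta_search(meta_json, name, id=None):
    """Illustrative sketch only; assumes meta_json[name][id] holds the item."""
    if name not in meta_json:
        sys.exit("no metadata found for %s" % name)
    candidates = meta_json[name]
    if id is None:
        # No id requested: return the first candidate, if any.
        for key in candidates:
            return candidates[key]
        sys.exit("no metadata found for %s" % name)
    if id in candidates:
        return candidates[id]
    sys.exit("no metadata found for %s with id %s" % (name, id))
```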
- needCVMFS(spec_json, meta_json)
- For each dependency in the spec_json, check whether cvmfs is needed to deliver it.
Args:
spec_json: the json object including the specification.
meta_json: the json object including all the metadata of dependencies.
Returns:
if cvmfs is needed, return the cvmfs url. Otherwise, return None
- obtain_package(spec_json)
- Check whether this spec includes a package_manager section, which in turn includes a list attr.
Args:
spec_json: the json object including the specification.
Returns:
If a package list is specified in the spec_json, return the package manager name and a list of the required package names.
Otherwise, return None
- obtain_path(os_image_dir, sw_mount_dict)
- Get the path environment variable from envfile and add the mountpoints of software dependencies into it
the envfile here is named env_list under the OS image.
Args:
os_image_dir: the path of the OS image inside the umbrella local cache.
sw_mount_dict: a dict only including all the software mounting items.
Returns:
path_env: the new value for PATH.
- osf_create(username, password, user_id, proj_name, is_public)
- Create an OSF project, and return the project id.
Args:
username: an OSF username
password: an OSF password
user_id: the id of an OSF user
proj_name: the name of the OSF project
is_public: set to 1 if the project is public; set to 0 if the project is private.
Returns:
the id of the OSF project
- osf_download(username, password, osf_url, dest)
- Download the file pointed to by an OSF url to dest.
Args:
username: an OSF username
password: an OSF password
osf_url: the OSF download url
dest: the destination of the OSF file
Returns:
If the osf_url is downloaded successfully, return None;
Otherwise, directly exit.
- osf_upload(username, password, proj_id, source)
- Upload a file from source into the OSF project identified by proj_id.
Args:
username: an OSF username
password: an OSF password
proj_id: the id of the OSF project
source: a file path
Returns:
the OSF download url of the uploaded file
- parrotize_user_cmd(user_cmd, cwd_setting, cvmfs_http_proxy, parrot_mount_file, parrot_ldso_path, use_local_cvmfs, parrot_log)
- Modify the user's command into `parrot_run + the user's command`.
The cases when this function should be called: (1) sandbox_mode == parrot; (2) sandbox_mode != parrot and cvmfs is needed to deliver some dependencies not installed on the execution node.
Args:
user_cmd: the user's command.
cwd_setting: the current working directory for the execution of the user's command.
cvmfs_http_proxy: HTTP_PROXY environment variable used to access CVMFS by Parrot
parrot_mount_file: the path of the mountfile for parrot
parrot_ldso_path: the path of the ld.so file for parrot
use_local_cvmfs: use the cvmfs on the host machine instead of using parrot_run to deliver cvmfs
parrot_log: the path of the parrot debugging log
Returns:
None
- path_exists(filepath)
- Check the validity and existence of a file path.
Args:
filepath: a file path
Returns:
Exit directly if any error happens.
Otherwise, returns None.
- prune_attr(dict_item, attr_list)
- Remove certain attributes from a dict.
If a specific attribute does not exist, skip it.
Args:
dict_item: a dict
attr_list: a list of attributes which will be removed from the dict.
Returns:
None
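Removing keys while tolerating missing ones is a one-liner per attribute with `dict.pop` and a default. The sketch below mutates dict_item in place and returns None, matching the description above; it is an illustration rather than the Umbrella code.

```python
def prune_attr(dict_item, attr_list):
    """Illustrative sketch only: drop each listed key, ignoring absent ones."""
    for attr in attr_list:
        # The default argument prevents a KeyError when attr is absent.
        dict_item.pop(attr, None)
```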
- prune_spec(json_object)
- Remove the metadata information from a json file (which represents an umbrella specification).
Note: the original json file will not be changed by this function.
Args:
json_object: a json file representing an umbrella specification
Returns:
temp_json: a new json file without metadata information
- remove_trailing_slashes(path)
- Remove the trailing slashes of a string
Args:
path: a path, which can be any string.
Returns:
path: the new path without any trailing slashes.
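Since the argument can be any string, the simplest reading is a right-strip of the `/` character, as sketched below. Under this reading a path consisting only of slashes becomes the empty string; whether Umbrella special-cases the root directory is not stated here.

```python
def remove_trailing_slashes(path):
    """Illustrative sketch only: strip every trailing '/' from path."""
    return path.rstrip("/")
```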
- s3_create(bucket_name, acl)
- Create an S3 bucket
Args:
bucket_name: the bucket name
acl: the access control, which can be: private, public-read
Returns:
bucket: an S3.Bucket instance
- s3_download(link, dest)
- Download an S3 file to dest
Args:
link: the link of a s3 object. e.g., https://s3.amazonaws.com/testhmeng/s3
dest: a local file path
Returns:
None
- s3_upload(bucket, source, acl)
- Upload a local file to s3
Args:
bucket: an S3.Bucket instance
source: the local file path
acl: the access control, which can be: private, public-read
Returns:
link: the link of a s3 object
- separatize_spec(spec_json, meta_json, target_type)
- Given an umbrella specification and an umbrella metadata database, generate a self-contained umbrella specification or a metadata database only including the information necessary for the umbrella spec.
If the target_type is spec, then generate a self-contained umbrella specification.
If the target_type is db, then generate a metadata database only including the information necessary for the umbrella spec.
Args:
spec_json: the json object including the specification.
meta_json: the json object including all the metadata of dependencies.
target_type: the type of the target json file, which can be an umbrella spec or an umbrella metadata db.
Returns:
metadata: a json object
- set_cvmfs_cms_siteconf(sandbox_dir)
- Download cvmfs SITEINFO and set its mountpoint.
Args:
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
Returns:
cvmfs_cms_siteconf_mountpoint: a string in the format of '/cvmfs/cms.cern.ch/SITECONF/local <SITEINFO dir in the umbrella local cache>/local'
- software_install(mount_dict, env_para_dict, software_spec, meta_json, sandbox_dir, pac_install_destructive, osf_auth, name=None)
- Install each software dependency specified in the software section of the specification.
Args:
mount_dict: a dict including each mounting item in the specification, whose key is the access path used by the user's task; whose value is the actual storage path.
env_para_dict: the environment variables which need to be set for the execution of the user's command.
software_spec: the software section of the specification
meta_json: the json object including all the metadata of dependencies.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
pac_install_destructive: whether this is to install packages through package manager in destructive mode
osf_auth: the osf authentication info including osf_username and osf_password.
name: if name is specified, then only the specified item will be installed. All the other items in the software section will be ignored.
Returns:
None.
- source_filter(sources, filters, name)
- Filter the download urls of a dependency.
This filtering is necessary because some urls may not be accessible to the
current umbrella runtime. For example, if some urls point to OSF but the
execution node does not have the requests python package installed, all the
download urls pointing to OSF are ignored.
Args:
sources: a list of download urls
filters: a list of protocols which are not supported by the current umbrella runtime.
name: the name of the dependency.
Returns:
If none of the download urls is available, exit directly.
Otherwise, return the first available url.
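The filtering rule can be sketched as follows. The sketch assumes that each entry in filters is a url prefix such as "osf+" and that matching is a simple prefix test; how the real function matches protocols is not specified in this reference, and the urls below are hypothetical.

```python
import sys


def source_filter(sources, filters, name):
    """Return the first url in sources that uses a supported protocol."""
    for url in sources:
        # Assumption: an unsupported protocol is recognized by its url prefix.
        if not any(url.startswith(prefix) for prefix in filters):
            return url
    sys.exit("No usable download url is available for %s" % name)


# Hypothetical urls; the OSF one is skipped because "osf+" is filtered out.
urls = ["osf+https://files.example.org/dep.tar.gz", "http://example.com/dep.tar.gz"]
chosen = source_filter(urls, ["osf+"], "dep")
```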
- spec_build(spec_json)
- Build the metadata information of an umbrella spec
Args:
spec_json: the json object including the specification.
Returns:
count: the count of dependencies whose metadata have been built.
- spec_upload(spec_json, meta_json, target_info, sandbox_dir, cwd_setting, osf_auth=None, s3_bucket=None)
- Upload each dependency in an umbrella spec to the target (OSF or s3), and add the new target download url into the umbrella spec.
The source of the dependencies can be anywhere supported by umbrella: http,
https, git, local, s3, osf. Umbrella always first downloads each dependency into
its local cache, then uploads it from the local cache to the target.
Args:
spec_json: the json object including the specification.
meta_json: the json object including all the metadata of dependencies.
target_info: the info necessary to communicate with the remote target (i.e., OSF, s3)
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
cwd_setting: the current working directory for the execution of the user's command.
osf_auth: the osf authentication info including osf_username and osf_password.
s3_bucket: an S3.Bucket instance
Returns:
None
- specification_process(spec_json, sandbox_dir, behavior, meta_json, sandbox_mode, output_f_dict, output_d_dict, input_dict, env_para_dict, user_cmd, cwd_setting, cvmfs_http_proxy, osf_auth, use_local_cvmfs, parrot_log)
- Create the execution environment specified in the specification file and run the task on it.
Args:
spec_json: the json object including the specification.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
behavior: the umbrella behavior, such as `run`.
meta_json: the json object including all the metadata of dependencies.
sandbox_mode: the execution engine.
output_f_dict: the mappings of output files (key is the file path used by the application; value is the file path the user specifies.)
output_d_dict: the mappings of output dirs (key is the dir path used by the application; value is the dir path the user specified.)
input_dict: the setting of input files specified by the --inputs option.
env_para_dict: the environment variables which need to be set for the execution of the user's command.
user_cmd: the user's command.
cwd_setting: the current working directory for the execution of the user's command.
cvmfs_http_proxy: HTTP_PROXY environment variable used to access CVMFS by Parrot
osf_auth: the osf authentication info including osf_username and osf_password.
use_local_cvmfs: use the cvmfs on the host machine instead of using parrot_run to deliver cvmfs
parrot_log: the path of the parrot debugging log
Returns:
None.
- subprocess_error(cmd, rc, stdout, stderr)
- Print the command, return code, stdout, and stderr; and then directly exit.
Args:
cmd: the executed command.
rc: the return code.
stdout: the standard output of the command.
stderr: standard error of the command.
Returns:
directly exit the program.
- terminate_instance(instance)
- Terminate an instance.
Args:
instance: an ec2.Instance object
Returns:
None.
- transfer_env_para_docker(env_para_dict)
- Transfer the env_para_dict into the docker `-e` options.
Args:
env_para_dict: the environment variables which need to be set for the execution of the user's command.
Returns:
env_options: the docker `-e` options constructed from env_para_dict.
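A minimal sketch of this conversion, assuming env_options is a single string of `-e KEY=VALUE` pairs. The exact output format of the real function (string or list, quoting style) is not stated here, so the shell quoting below is an assumption.

```python
import shlex


def transfer_env_para_docker(env_para_dict):
    """Build a string of docker `-e` options from a dict of environment variables."""
    return " ".join(
        # Assumption: quote each KEY=VALUE pair so values with spaces survive a shell.
        "-e " + shlex.quote("%s=%s" % (key, value))
        for key, value in env_para_dict.items()
    )


env_options = transfer_env_para_docker({"PATH": "/usr/bin", "MSG": "a b"})
```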
- trim_list(origin, s)
- Trim the strings in the set s from the origin list.
Args:
origin: a list of string
s: a set of string
Returns:
final: a new list of string
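Read literally, trim_list returns the items of origin that are not members of s. A minimal sketch under that reading, with hypothetical sample data:

```python
def trim_list(origin, s):
    """Return a new list holding the items of origin that are not in the set s."""
    return [item for item in origin if item not in s]


final = trim_list(["a", "b", "c", "b"], {"b"})
```

The original list is left unchanged and the order of the remaining items is preserved.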
- url_download(url, dest)
- Download url into dest
Args:
url: the url needed to be downloaded.
dest: the path where the content from the url should be put.
Returns:
If the url is downloaded successfully, return None;
Otherwise, directly exit.
- validate_meta(meta_json)
- Validate a metadata db.
The current standard for a valid metadata db is: for each item, the "source" attribute must exist and must not be empty.
Args:
meta_json: a dict object representing a metadata db.
Returns:
If error happens, return directly with the error info.
Otherwise, None.
- validate_spec(spec_json, meta_json=None)
- Validate a spec_json.
Args:
spec_json: a dict object representing a specification.
meta_json: a dict object representing a metadata db.
Returns:
If error happens, return directly with the error info.
Otherwise, None.
- verify_kernel(host_kernel_name, host_kernel_version, kernel_name, kernel_version)
- Check whether the kernel version of the host machine matches the requirement.
The kernel_version format supported for now includes: >=2.6.18; [2.6.18, 2.6.32].
Args:
host_kernel_name: the name of the OS kernel of the host machine.
host_kernel_version: the version of the kernel of the host machine.
kernel_name: the name of the required OS kernel (e.g., linux). Not case sensitive.
kernel_version: the version of the required kernel (e.g., 2.6.18).
Returns:
If the kernel version of the host machine matches the requirement, return None.
If the kernel version of the host machine does not match the requirement, directly exit.
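The two supported kernel_version formats can be checked with a tuple comparison. The helper names parse_version and kernel_version_matches below are hypothetical, the range bounds are assumed to be inclusive, and the sketch returns a boolean instead of exiting as verify_kernel does.

```python
import re


def parse_version(text):
    """Turn the leading dotted-number part of text into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def kernel_version_matches(host_version, requirement):
    """Check host_version against '>=X.Y.Z' or '[X.Y.Z, A.B.C]'."""
    host = parse_version(host_version)
    requirement = requirement.strip()
    if requirement.startswith(">="):
        return host >= parse_version(requirement[2:])
    if requirement.startswith("[") and requirement.endswith("]"):
        low, high = requirement[1:-1].split(",")
        # Assumption: both ends of the range are inclusive.
        return parse_version(low) <= host <= parse_version(high)
    raise ValueError("unsupported kernel_version format: %r" % requirement)


ok = kernel_version_matches("2.6.32-431.el6.x86_64", ">=2.6.18")
```

Comparing tuples of integers avoids the string-comparison pitfall where "2.6.9" would sort after "2.6.18".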
- which_exec(name)
- An implementation of the shell `which` command
Args:
name: the name of the executable to be found.
Returns:
If the executable is found, returns its fullpath.
If PATH is not set, directly exit.
Otherwise, returns None.
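The lookup described above can be sketched as a walk over the directories in PATH. This is an illustration written from the description, not Umbrella's actual code; on Python 3 the standard library's shutil.which offers a similar search.

```python
import os
import sys


def which_exec(name):
    """Return the full path of the executable called name, or None if not found."""
    if "PATH" not in os.environ:
        sys.exit("The PATH environment variable is not set")
    for directory in os.environ["PATH"].split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```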
- workflow_repeat(cwd_setting, sandbox_dir, sandbox_mode, output_f_dict, output_d_dict, input_dict, env_para_dict, user_cmd, hardware_platform, host_linux_distro, distro_name, distro_version, need_separate_rootfs, os_image_dir, os_image_id, cvmfs_cms_siteconf_mountpoint, mount_dict, sw_mount_dict, meta_json, new_os_image_dir, cvmfs_http_proxy, needs_parrotize_user_cmd, use_local_cvmfs, parrot_log)
- Run the user's task with the help of sandbox techniques, which currently include chroot, parrot, and docker.
Args:
cwd_setting: the current working directory for the execution of the user's command.
sandbox_dir: the sandbox dir for temporary files like Parrot mountlist file.
sandbox_mode: the execution engine.
output_f_dict: the mappings of output files (key is the file path used by the application; value is the file path the user specifies.)
output_d_dict: the mappings of output dirs (key is the dir path used by the application; value is the dir path the user specified.)
input_dict: the setting of input files specified by the --inputs option.
env_para_dict: the environment variables which need to be set for the execution of the user's command.
user_cmd: the user's command.
hardware_platform: the architecture of the required hardware platform (e.g., x86_64).
distro_name: the name of the required OS (e.g., redhat).
distro_version: the version of the required OS (e.g., 6.5).
need_separate_rootfs: whether a separate rootfs is needed to execute the user's command.
os_image_dir: the path of the OS image inside the umbrella local cache.
os_image_id: the id of the OS image.
cvmfs_cms_siteconf_mountpoint: a string in the format of '/cvmfs/cms.cern.ch/SITECONF/local <SITEINFO dir in the umbrella local cache>/local'
mount_dict: a dict including each mounting item in the specification, whose key is the access path used by the user's task; whose value is the actual storage path.
sw_mount_dict: a dict only including all the software mounting items.
meta_json: the json object including all the metadata of dependencies.
new_os_image_dir: the path of the newly created OS image with all the packages installed by package manager.
cvmfs_http_proxy: HTTP_PROXY environment variable used to access CVMFS by Parrot
needs_parrotize_user_cmd: whether the user cmd needs to be wrapped inside parrot.
use_local_cvmfs: use the cvmfs on the host machine instead of using parrot_run to deliver cvmfs
parrot_log: the path of the parrot debugging log
Returns:
return_code: the return code of executing the user command
If critical errors happen, directly exit.
|