The Grid and Cloud User Support Environment (gUSE), also known as WS-PGRADE/gUSE (WS-PGRADE standing for Web Service – Parallel Grid Run-time and Application Development Environment), is an open-source science gateway framework that enables users to access grid and cloud infrastructures. gUSE is developed by the Laboratory of Parallel and Distributed Systems (LPDS) at the Institute for Computer Science and Control (SZTAKI) of the Hungarian Academy of Sciences.
A key requirement in the development of gUSE was to handle a very large number of jobs simultaneously, even in the range of millions, without compromising the response time of the user interface. To achieve this level of concurrency, the workflow management back-end of gUSE is implemented as a set of web services, following the Service-Oriented Architecture (SOA) concept.
Science Gateway Framework
Many user communities would like to access several distributed computing infrastructures (DCIs) transparently, without having to learn the peculiar features of each DCI, so that they can concentrate on their scientific applications. For them, a Science Gateway (SG) is the solution: an SG provides an interface between a scientist (or community) and the DCIs. An SG framework, like gUSE, provides a specific set of enabling technologies as well as frontend and backend services that together build a generic gateway. SG frameworks are not specialized for a certain scientific area, so scientists from many different areas can use them. An enabling technology provides the software stack required to develop SG frameworks and SG instances, the latter offering a simplified user interface that is highly tailored to the needs of a given scientific community. Typical examples of such enabling technologies are web application containers (Tomcat, Glassfish, etc.), portal or web application frameworks (Liferay, Spring, etc.), database management systems (MySQL, etc.), and workflow management systems (gUSE itself, MOTEUR, etc.).
SGs can have varying goals. In general, researchers who use gateways can focus on their scientific goals rather than on assembling the required e-Infrastructure. An important goal is to make it easier for scientists to use (national) computing and storage resources, and to create and use collaborative tools for sharing data.
The SG framework can be used by National Grid Initiatives (NGIs) to support small user communities who cannot afford to develop their own customized SG. The gUSE SG framework also provides two Application Programming Interfaces (APIs), namely the Application-Specific Module API and the Remote API, to create application-specific SGs according to the needs of different user communities.
Features
Through WS-PGRADE, gUSE provides a graphical user interface to create and execute workflows on various Distributed Computing Infrastructures (DCIs).
Among many other features, the five main capabilities of gUSE are as follows:
(1) gUSE is a general-purpose SG framework through which users can access more than twenty different DCIs via the DCI Bridge service, and six different data storage types (HTTP, HTTPS, GSIFTP, S3, SFTP, and SRM) via the Data Avenue service. Both DCI Bridge and Data Avenue were developed as part of the WS-PGRADE/gUSE service stack, but they can also be used as independent services by other types of gateways and workflow systems.
(2) WS-PGRADE/gUSE is a workflow-oriented system. It extends the Directed Acyclic Graph (DAG)-based workflow concept with advanced parameter sweep (PS) features provided by special workflow nodes, condition-dependent workflow execution, and workflow embedding support. Moreover, gUSE extends the concrete workflow concept with the concepts of abstract workflow, workflow instance, and template.
(3) WS-PGRADE/gUSE supports the development and execution of workflow-based applications. Users of gUSE define their applications as workflows and can share them with each other by exporting them to the internal Application Repository. Other users can import such applications and execute or modify them in their own user space.
(4) gUSE supports the fast development of SG instances through a customization technology. A gateway can be adapted to a community's requirements regarding computational power, the complexity of its applications, and a user interface tailored to the community's needs and terminology.
(5) The most important design aspect of gUSE is flexibility, which is expressed
- in exploiting parallelism: gUSE enables parallel execution inside a workflow node as well as among workflow nodes. Multiple instances of the same workflow can also be run with different data files.
- in the use of DCIs: gUSE can access various DCIs: clusters, cluster grids, desktop grids, supercomputers, and clouds.
- in data storage access: gUSE workflow nodes can access different data storage services in different DCIs via the Data Avenue Blacktop service. As a result, file transfers between the various storage services and workflow nodes are handled automatically and transparently.
- in security management: For secure authentication it is possible to use users’ personal certificates or robot certificates.
- in cloud access: A large set of different clouds (Amazon, OpenStack, OpenNebula, etc.) can be accessed by WS-PGRADE/gUSE either directly or via the CloudBroker Platform.
- of supported gateway types: gUSE supports different gateway types: general-purpose gateways for national grids (e.g., for the Greek and Italian NGIs), general-purpose gateways for particular DCIs (e.g., the EDGI gateway), general-purpose gateways for specific technologies (e.g., the SHIWA gateway for workflow sharing and interoperation), and domain-specific science gateway instances (e.g., the Swiss proteomics portal, MoSGrid gateway, Autodock gateway, Seismology gateway, and VisIVO).
- in use of workflow systems: Users can access many workflows written in various workflow languages from the SHIWA Workflow Repository and use them as embedded workflows inside WS-PGRADE workflow nodes.
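The workflow model described above, a DAG whose nodes may run as parameter sweeps with parallelism both among and inside nodes, can be illustrated with a small sketch. It is a hypothetical, in-process model rather than gUSE's implementation: `run_workflow`, the node functions, and the convention that a node receives its predecessors' results in alphabetical order of predecessor name are assumptions made for illustration. A real workflow system binds inputs through named connections and submits jobs to DCIs instead of local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(nodes, deps, sweep_nodes=frozenset(), max_workers=4):
    """Execute a DAG of Python functions (hypothetical workflow model).

    nodes: node name -> function.
    deps: node name -> set of predecessor names (list every node, even roots).
    sweep_nodes: nodes that take a single list input and run once per item,
    i.e. a parameter sweep; their per-item results are collected into a list.
    Inputs are passed in alphabetical order of predecessor name (an assumption).
    """
    results = {}
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            batch = sorter.get_ready()  # all nodes whose predecessors are done
            futures = {}
            for name in batch:
                inputs = [results[p] for p in sorted(deps.get(name, ()))]
                if name in sweep_nodes:
                    # Parallelism inside a node: one job instance per input item.
                    (items,) = inputs
                    futures[name] = [pool.submit(nodes[name], x) for x in items]
                else:
                    # Parallelism among nodes: every ready node is submitted at once.
                    futures[name] = pool.submit(nodes[name], *inputs)
            for name in batch:
                f = futures[name]
                results[name] = [x.result() for x in f] if isinstance(f, list) else f.result()
                sorter.done(name)
    return results


# Usage: generate -> square (sweep over 4 items) -> collect
nodes = {
    "generate": lambda: [1, 2, 3, 4],          # produces the sweep inputs
    "square": lambda x: x * x,                  # runs once per input item
    "collect": lambda squares: sum(squares),    # gathers all sweep results
}
deps = {"generate": set(), "square": {"generate"}, "collect": {"square"}}
out = run_workflow(nodes, deps, sweep_nodes={"square"})
print(out["collect"])  # 30
```

Here `generate` and `collect` play the roles of a generator and a collector node around the sweep, and the four `square` instances are independent jobs that the thread pool may run concurrently.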
Architecture
The main goal in designing the multitier architecture of WS-PGRADE/gUSE was to enable versatile access to many different kinds of DCIs and data storage through different kinds of user interfaces. Technically, this access is provided by the DCI Bridge job submission service, which sits in the bottom layer of the gUSE architecture, and by the Data Avenue Blacktop service, an independent service provided by SZTAKI.
DCI Bridge is a web service-based application providing standard access to various DCIs. It connects through its DCI plug-ins to the external DCI resources. When a user submits a workflow, its job components are submitted transparently into the various DCI systems via the DCI Bridge service using its standard OGSA Basic Execution Service 1.0 (BES) interface. As a result, the access protocol and all the technical details of the various DCI systems are totally hidden behind the BES interface. The job description language of BES is the standardized Job Submission Description Language (JSDL). The DCIs supported by DCI Bridge are the following:
- Clusters (PBS, LSF, MOAB, SGE)
- Grids (ARC, gLite, GT2, GT4, GT5, UNICORE, the Extreme Science and Engineering Discovery Environment)
- Supercomputers (e.g., via UNICORE)
- Desktop grids (BOINC)
- Clouds (via the CloudBroker Platform, Google App Engine (GAE), and EC2- and OCCI-based cloud access)
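Since BES jobs are described in JSDL, the following sketch builds a minimal JSDL 1.0 job description with Python's standard `xml.etree.ElementTree`. It covers only the core `JobIdentification` and POSIX `Application` elements; the function name `make_jsdl` and the example values are hypothetical, and any gUSE- or DCI Bridge-specific extensions (for example, for selecting the target DCI or resource) are not shown.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the JSDL 1.0 specification (core and POSIX application).
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def make_jsdl(job_name, executable, arguments=(), output="stdout.txt"):
    """Return a minimal JSDL JobDefinition for a POSIX executable as a string."""
    job_def = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job_def, f"{{{JSDL_NS}}}JobDescription")

    # Human-readable job name.
    ident = ET.SubElement(desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name

    # What to run: executable, arguments, and the file receiving stdout.
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in arguments:  # one Argument element per command-line argument
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = arg
    ET.SubElement(posix, f"{{{POSIX_NS}}}Output").text = output
    return ET.tostring(job_def, encoding="unicode")


print(make_jsdl("hello", "/bin/echo", ["Hello", "gUSE"]))
```

In the architecture described above, a document of this kind would travel inside a BES request to DCI Bridge, whose plug-in for the target DCI then handles the actual submission, so the client never sees the middleware-specific protocol.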
The middle tier of the gUSE architecture contains the high-level gUSE services. The Workflow Storage stores every piece of information needed to define a workflow (graph structure description, input file pointers, output file pointers, executable code, and target DCI of workflow nodes) except the workflow's input files. The local input files and the local output files created during workflow execution are stored in the File Storage. The Workflow Interpreter executes the workflows stored in the Workflow Storage, and the Information System provides users with information about running workflows and job status. Users of WS-PGRADE gateways work in isolated workspaces, i.e., they see only their own workflows. To enable collaboration among these isolated users, the Application Repository stores WS-PGRADE workflows in one of five possible categories, all of which are physically stored as zip files. Gateway users can collaborate via any of these five categories:
- Graph (or abstract workflow): contains only the graph structure of the workflow.
- Workflow (or concrete workflow): contains both the graph structure and the configuration parameters (input file pointers, output file pointers, executable code, and target DCI of workflow nodes).
- Template: a workflow that additionally records, for every modifiable parameter, whether users are allowed to change it. Templates play an important role in the automatic generation of executable workflows in the end-user mode of a WS-PGRADE/gUSE gateway.
- Application: a ready-to-use workflow that also contains all its embedded workflows, so all the information needed to execute the application is stored in the corresponding zip file.
- Project: a workflow that is not yet complete and can be further developed by the person who uploaded it to the Application Repository or by others, which supports collaborative workflow development among several developers.
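The relationship between graphs, concrete workflows, and templates, and the idea that the end-user mode generates an executable workflow from a template, can be sketched as a small data model. Everything here is hypothetical: the `Workflow` and `Template` classes, `instantiate`, `to_zip`, and the single `workflow.json` entry inside the archive are illustrative assumptions and do not describe gUSE's actual repository format.

```python
import io
import json
import zipfile
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Concrete workflow (hypothetical model): graph plus node configuration."""
    edges: list   # (source, target) pairs: the abstract "graph" part
    config: dict  # node -> {parameter: value}, e.g. executable, target DCI


@dataclass
class Template:
    """A workflow plus, per node, the parameters end users may change."""
    workflow: Workflow
    editable: dict = field(default_factory=dict)  # node -> set of parameter names

    def instantiate(self, user_values):
        """Return a new concrete Workflow with user-supplied values applied.

        user_values maps (node, parameter) -> value. Raises ValueError if a
        value targets a parameter the template does not mark as editable.
        """
        config = {node: dict(params) for node, params in self.workflow.config.items()}
        for (node, param), value in user_values.items():
            if param not in self.editable.get(node, set()):
                raise ValueError(f"{node}.{param} is not editable in this template")
            config.setdefault(node, {})[param] = value
        return Workflow(edges=list(self.workflow.edges), config=config)


def to_zip(workflow, category):
    """Pack a workflow and its repository category into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        manifest = {"category": category, "edges": workflow.edges, "config": workflow.config}
        zf.writestr("workflow.json", json.dumps(manifest))
    return buf.getvalue()


# Usage: only "steps" is editable, so the end user cannot change the target DCI.
base = Workflow(
    edges=[("prepare", "simulate")],
    config={"simulate": {"executable": "sim.sh", "target_dci": "cluster-a", "steps": 100}},
)
template = Template(base, editable={"simulate": {"steps"}})
runnable = template.instantiate({("simulate", "steps"): 500})
archive = to_zip(runnable, category="Workflow")
```

Keeping the editable set on the template, rather than on the workflow, lets the same concrete workflow be exposed to end users with different degrees of freedom.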
At the top of the three-tier structure, the presentation tier provides WS-PGRADE, the graphical user interface of the generic SG framework. All functionalities of the underlying services are exposed to users by portlets residing in a Liferay portlet container, which is part of WS-PGRADE. This layer can easily be customized and extended according to the needs of the SG instances derived from gUSE.
Science Gateways based on gUSE
gUSE serves as the framework for several European SGs, including:
- agINFRA Gateway
- Autodock Portal
- AMC e-BioInfra Gateway
- HELIOGate Portal
- MoSGrid Portal
- Verce SG
- VisIVO Gateway
Projects with gUSE
gUSE provides one of the underlying workflow development infrastructures for many research activities in numerous EU FP7 projects. Ongoing EU and national projects using gUSE include:
- VIALACTEA
- agroDAT
- cloudSME
- SCI-BUS