Difference: WilliamBreadenMaddenSandbox (2 vs. 3)

Revision 3 (2014-07-29) - WilliamBreadenMadden

Line: 1 to 1
 
META TOPICPARENT name="Main.WilliamBreadenMadden"

parallel file validation

threading and multiprocessing

Changed:
<
<
  • thread: stream of instructions in a process
  • multithreading: multiple threads running within a process
  • multiprocessing: multiple processes instantiated by the operating system (an error in one process can not bring down another process)

  • multithreading versus multiprocessing: An error in one thread can bring down all the threads of a process while an error in a process can not (easily) bring down another process.
>
>
In computing, a process is an instance of a computer program that is being executed; it contains the program code and its current activity. A thread is a stream of instructions within a process, and multithreading consists of multiple threads running concurrently within a process. Multiprocessing is the instantiation of multiple processes by the operating system. An error in one thread can bring down all the threads of its process, while an error in one process cannot (easily) bring down another process.
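To make the distinction concrete, here is a minimal sketch (illustrative only, not part of the service task code) that runs the same function first in a thread and then in a separate process; the thread shares the parent's process identifier while the process receives its own:

%CODE{"python"}%
# illustrative sketch: a thread runs within the current process,
# while a process is spawned separately by the operating system
import multiprocessing
import os
import threading

def report(label):
    print("%s running in process %s" % (label, os.getpid()))

if __name__ == "__main__":
    thread = threading.Thread(target=report, args=("thread",))
    thread.start()
    thread.join()      # prints the same process ID as the parent
    process = multiprocessing.Process(target=report, args=("process",))
    process.start()
    process.join()     # prints a different process ID
%ENDCODE%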
 

Python Global Interpreter Lock

Line: 123 to 119
  The parallel job processor is a central part of the service task. It is, as its name suggests, the system that manages the processing of jobs in parallel. Core to its functioning is the Python multiprocessing module, which supports the spawning of processes using an interface similar to that of the threading module. In effect, multiprocessing offers an approach to bypassing the Python Global Interpreter Lock by using subprocesses instead of threads.
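As a minimal illustration (generic example code, not the Job Transforms implementation), a process pool distributes CPU-bound work across subprocesses, so the countdowns below run in parallel rather than being serialised by the GIL:

%CODE{"python"}%
# a pool of worker subprocesses; CPU-bound work is not serialised by the GIL
import multiprocessing

def countdown(n):
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    # each countdown runs in its own subprocess, in parallel
    print(pool.map(countdown, [1000000] * 4))
    pool.close()
    pool.join()
%ENDCODE%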
Changed:
<
<
Here is described visually a non-exhaustive illustration of the parallel job processor and associated concepts.

parallel job processor

Figure~\ref{figure:PJP_step_0} illustrates the basic structure of the parallel job processor. Of particular note are the objects

>
>
The figures that follow give a non-exhaustive visual illustration of the parallel job processor and associated concepts. The figure below illustrates its basic structure. Of particular note are the objects
 
  • function,
  • Job,
Line: 137 to 127
 
Changed:
<
<
There are a number of steps involved in the typical usage and operation of the parallel job processor.
>
>
parallel job processor
 
Changed:
<
<
\subsection{Job and JobGroup definitions}
>
>
There are a number of steps involved in the typical usage and operation of the parallel job processor.
 
Changed:
<
<
\begin{figure}[H] \begin{center} \includegraphics[width=\textwidth]{figures/2.jpg} \end{center} \caption{parallel job processor: Job and JobGroup definitions} \label{figure:PJP_step_1} \end{figure}
>
>

Job and JobGroup definitions

  A Job for the parallel job processor consists of
Changed:
<
<
\begin{itemize} \item a work function, \item its arguments and \item other information (such as its timeout specification). \end{itemize}
>
>
  • a work function,
  • its arguments and
  • other information (such as its timeout specification).
 
Changed:
<
<
\subsection{submission}

\begin{figure}[H] \begin{center} \includegraphics[width=\textwidth]{figures/3.jpg} \end{center} \caption{parallel job processor: submission} \label{figure:PJP_step_2} \end{figure}

>
>

submission

  A Job or a JobGroup is submitted to the parallel job processor. The parallel job processor submits all jobs to the process pool it created on initialisation, recording the submission time.
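A rough sketch of this step (hypothetical, simplified names rather than the actual PyJobTransforms implementation) might look as follows, using multiprocessing.Pool.apply_async and recording the submission time:

%CODE{"python"}%
# simplified sketch of submission: apply_async returns an AsyncResult
# ("result getter") and the submission time is recorded
import multiprocessing
import time

def multiply(multiplicand1=1, multiplicand2=1):
    return multiplicand1 * multiplicand2

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    timeStampSubmission = time.time()  # record the submission time
    resultGetter = pool.apply_async(
        func=multiply,
        kwds={"multiplicand1": 2, "multiplicand2": 3}
    )
    print(resultGetter.get(timeout=10))  # 6
%ENDCODE%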
Changed:
<
<
\subsection{checking for results, timeouts and failures}

\begin{figure}[H] \begin{center} \includegraphics[width=\textwidth]{figures/4.jpg} \end{center} \caption{parallel job processor: checking for results, timeouts and failures} \label{figure:PJP_step_3} \end{figure}

>
>

checking for results, timeouts and failures

  Submissions are monitored for timeouts and failures, with exceptions being raised when appropriate. Timeouts are measured at the JobGroup level, rather than at the Job level.
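The following sketch (illustrative names, not the actual implementation) shows the idea of a group-level timeout: the time elapsed since submission is compared with the timeout of the whole group, and the pool is terminated if it is exceeded:

%CODE{"python"}%
# group-level timeout sketch: the elapsed time since submission is
# checked against the timeout for the whole group of jobs
import multiprocessing
import time

def slowFunction(sleepTime=5):
    time.sleep(sleepTime)
    return "done"

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=1)
    groupTimeout = 2  # seconds allowed for the whole job group
    timeStampSubmission = time.time()
    resultGetter = pool.apply_async(func=slowFunction, kwds={"sleepTime": 5})
    try:
        remainingTime = groupTimeout - (time.time() - timeStampSubmission)
        print(resultGetter.get(timeout=remainingTime))
    except multiprocessing.TimeoutError:
        pool.terminate()  # stop all workers once the group has timed out
        print("job group timed out")
%ENDCODE%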
Changed:
<
<
\subsection{getting results}

\begin{figure}[H] \begin{center} \includegraphics[width=\textwidth]{figures/5.jpg} \end{center} \caption{parallel job processor: getting results} \label{figure:PJP_step_4} \end{figure}

>
>

getting results

  Results of jobs returned by the process pool are propagated to the Jobs and to the JobGroup as they become available.
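One way to picture this step (a sketch, not the actual code) is polling each pending result getter with AsyncResult.ready() and propagating each result as soon as it arrives:

%CODE{"python"}%
# sketch of collecting results as they become available
import multiprocessing
import time

def square(x):
    return x * x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    pending = [pool.apply_async(square, (i,)) for i in range(4)]
    while pending:
        for resultGetter in list(pending):
            if resultGetter.ready():          # result now available
                print(resultGetter.get())     # propagate it immediately
                pending.remove(resultGetter)
        time.sleep(0.1)                       # avoid busy waiting
%ENDCODE%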
Changed:
<
<
\subsection{return results}

\begin{figure}[H] \begin{center} \includegraphics[width=\textwidth]{figures/6.jpg} \end{center} \caption{parallel job processor: return results} \label{figure:PJP_step_5} \end{figure}

>
>

return results

  Results of all jobs are stored in order in the results list of the JobGroup. This list is returned by the parallel job processor.
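The ordering guarantee can be sketched as follows (illustrative names): because results are gathered from the result getters in submission order, the returned list is ordered like the submissions, regardless of the order of completion:

%CODE{"python"}%
# results are returned in submission order, not completion order
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    resultGetters = [pool.apply_async(square, (i,)) for i in range(4)]
    results = [resultGetter.get() for resultGetter in resultGetters]
    print(results)  # [0, 1, 4, 9]
%ENDCODE%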
Changed:
<
<
\subsection{usage}

\begin{figure}[H] \begin{center} \includegraphics[width=\textwidth]{figures/7.jpg} \end{center} \caption{parallel job processor: job specification in Python} \label{figure:PJP_job_specification} \end{figure}

>
>

usage

 
Changed:
<
<
Figure~\ref{figure:PJP_job_specification} depicts an example job group for the parallel job processor. In this case, it is a unit test for the system. Here, a job group of two jobs is defined. One of the jobs specifies a hello world function with a sleep time argument and the other specifies a multiplication function with multiplicand arguments. There is a timeout specified for each job.
>
>
The code below depicts an example job group for the parallel job processor. In this case, it is a unit test for the system. Here, a job group of two jobs is defined. One of the jobs specifies a hello world function with a sleep time argument and the other specifies a multiplication function with multiplicand arguments. There is a timeout specified for each job.
 
Changed:
<
<
\newpage
>
>
%CODE{"python"}% ## @brief unit test for working functions # @detail This method is a unit test of the parallel job processor # testing the processing of two simple, working functions. def test_working(self): msg.info("\n\n\n\nPARALLEL JOB PROCESSOR WORKING TEST") jobGroup1 = JobGroup( name = "working functions test", jobs = [ Job( name = "hello world function", workFunction = helloWorld, workFunctionKeywordArguments = {
'sleepTime'
1, }, workFunctionTimeout = 10 ), Job( name = "multiplication function", workFunction = multiply, workFunctionKeywordArguments = {
'multiplicand1'
2,
'multiplicand2'
3 }, workFunctionTimeout = 10 ) ] ) parallelJobProcessor1 = ParallelJobProcessor() parallelJobProcessor1.submit(jobSubmission = jobGroup1) results = parallelJobProcessor1.getResults() self.assertEquals(results, ['hello world', 6])
%ENDCODE%
 
Changed:
<
<
\noindent Figure~\ref{figure:PJP_terminal_output_1} illustrates example (abbreviated) parallel job processor terminal output (for the parallel validation of typical physics data).
>
>
The figure below illustrates example (abbreviated) parallel job processor terminal output (for the parallel validation of typical physics data).
  parallel job processor logging
Changed:
<
<
\section{challenges}
>
>
The video below illustrates some benefits of the use of the parallel job processor -- in this case, for the purpose of making a Monte Carlo estimation of pi.
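For reference, a Monte Carlo estimation of pi parallelises naturally; the sketch below (not the code used in the video) splits the sampling across a pool of processes:

%CODE{"python"}%
# parallel Monte Carlo estimation of pi: sample points in the unit square
# and count those falling within the quarter circle of unit radius
import multiprocessing
import random

def countHits(numberOfSamples):
    hits = 0
    for _ in range(numberOfSamples):
        x = random.random()
        y = random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    samplesPerJob = [250000] * 4
    pool = multiprocessing.Pool(processes=4)
    totalHits = sum(pool.map(countHits, samplesPerJob))
    print(4.0 * totalHits / sum(samplesPerJob))  # approximates pi
%ENDCODE%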

in a pickle: the current difficulties of serial communications between processes

 
Changed:
<
<
The Job Transforms infrastructure was written before there were standardised ways of reliably parallelising processes in Python, so, naturally, there were challenges in adapting Job Transforms to parallisation. One problem encountered involved communications between processes breaking down in a confusing, seemingly inconsistent way. Pools use certain signals to receive termination commands etc.~and Job Transforms was using the same signals. The solution in this case was to render Job Transforms silent while the parallel processor is running. Another problem encountered was that certain objects in Job Transforms cannot be ``pickled'' (serialised in inter-process communications). Serialisation of objects for the purposes of communications is provided by the \href{http://docs.python.org/2/library/pickle.html}{\textcolor{black!100}{Python pickle module}}. Certain objects important to file validations cannot be pickled. Specifically, the legacy validation functions were not workable. The solution was to make new validation functions. While this extended the service task somewhat, it proved to be the most obviously reliable and validated approach -- and was successful.
>
>
Serialisation is the process of converting an object to a bytestream.
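For example (a minimal illustration using the standard library):

%CODE{"python"}%
# serialising an object to a bytestream and back with pickle
import pickle

data = {"fileName": "test.pool.root", "events": 100}
bytestream = pickle.dumps(data)      # object -> bytestream
restored = pickle.loads(bytestream)  # bytestream -> equivalent object
assert restored == data
%ENDCODE%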
 
Changed:
<
<
\section{service task status}
>
>
Serialisation of objects for the purposes of communications is provided by the Python pickle module. Certain objects important to file validations cannot be pickled. Specifically, the legacy validation functions were not workable. The solution was to make new validation functions. While this extended the service task somewhat, it proved to be the most obviously reliable and validated approach -- and was successful.
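As an illustration of the limitation (a generic example, not the actual validation functions): pickle stores functions by reference, so objects such as functions defined at runtime cannot be pickled:

%CODE{"python"}%
# functions are pickled by reference, so a lambda cannot be pickled
import pickle

try:
    pickle.dumps(lambda x: x * x)
except pickle.PicklingError as error:
    print("not picklable: %s" % error)
%ENDCODE%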
 
Changed:
<
<
The main activities in the service task were as follows:
>
>
The serialisation limitations are likely to be addressed in the shorter term by more advanced serialisation methods, such as those offered by dill (https://pypi.python.org/pypi/dill), and in the longer term by improvements in Python parallelisation and GIL handling.
 
Changed:
<
<
\begin{itemize} %\small \item familiarisation with Job Transforms basic usage and code \item starting documentation for new Job Transforms \item general support (lots of coding tasks to support new development) \item systematic investigation of parallel processing methods in Python, principally methods associated with the Python multiprocessing module \item coded multiprocessing code library \item coded \href{https://svnweb.cern.ch/trac/atlasoff/browser/Tools/PyJobTransforms/trunk/python/trfUtils.py}{\textcolor{black!100}{parallel job processor}} \item coded \href{https://svnweb.cern.ch/trac/atlasoff/browser/Tools/PyJobTransforms/trunk/test/test_trfUtilsParallelJobProcessor.py}{\textcolor{black!100}{unit tests for parallel job processor}} (validation for parallel job processor) \item coded new validation functions and associated exception handling procedures \item high-level controls on how much of the CPU resources the parallel job processor may request \item validation of final unit tests (almost complete) \item paper \emph{ATLAS Job Transforms: A Data Driven Workflow Engine}~\cite{Stewart_1} \end{itemize} %\item[\XSolidBrush] bar %\item[] bar
>
>
Dill provides the user with the same interface as pickle and also includes additional features. Its primary use is to send Python objects across a network as a bytestream. It is quite flexible and allows arbitrary user-defined classes and functions to be serialised. Dill is part of pathos, a Python framework for heterogeneous computing. Dill is in the early stages of development.
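A minimal illustration (assuming dill is installed) of dill serialising an object that defeats pickle:

%CODE{"python"}%
# dill offers the same interface as pickle but can serialise,
# for example, lambdas, which pickle cannot
import dill

bytestream = dill.dumps(lambda x: x * x)
square = dill.loads(bytestream)
print(square(3))  # 9
%ENDCODE%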

In order to address the limitations of pickle, separate data validation functions, such as returnIntegrityOfPOOLFile, are provided by the module PyJobTransforms.trfFileValidationFunctions. The integrity functions defined in this package are associated with the arg classes for the various data types in the module PyJobTransforms.trfArgClasses. Specifically, the name of the validation function appropriate for a given type of data is defined as a data member of the class for that data type. In the case of POOL files, the specification is as follows:

%CODE{"python"}% class argPOOLFile(argAthenaFile):

integrityFunction = "returnIntegrityOfPOOLFile" %ENDCODE%

  characterising impact
 