Multiprocessing in Python – Forking a process

Multiprocessing in Python - Forking a process

Introduction

In this article you will see how to implement the Forking in Python, one of the important and fundamental aspects of the Linux programming. This operation is very important when you need to handle multiple processes simultaneously through a single programming. So forking is an important topic to know regarding the multiprocessing in Python.

Multiprocessing con Python - Forking a process

Forking

In the programming world, you have a fork when a process creates a perfect copy of itself in memory. The calling process is generally called parent process, while the process will be copied for the child process .

Forking is done in memory, creating a new address space, in which all the memory segments are completely copied from parent process. From that point onwards, the performance of the parent and child processes become completely independent of each other, so as to have two different PID (Process IDentifier).

Forking multiprocessing

Fork in Python

In Python you can apply forking thanks to the fork() function belonging to the os module (see more here).

When a Python process invokes fork () function, this creates a copy of the process. This copy (child process) gets all the data and the code directly from the parent process, to then be carried out as a completely independent process getting its own PID from the operating system.

However, the first thing you need to take into account when you use the fork() function, is that the program is split into two perfect copies, each of which will perform the same operations. Generally this is certainly not what you want, indeed, you need that each of the two processes perform slightly different operations. To do this you can leverage the value returned by fork() function.

The following example shows you the operation of the fork function, getting pid of parent and child processes from terminal.

import os

print("PID parent {}".format(os.getpid()))
pid = os.fork()
print("PID child {}".format(pid))

In fact, thanks to the return value of the fork () function you can find out which process will be finding. If the return value is 0 then it means you are in the child process. If it is a positive value, then you will find yourself in the parent process, the positive value obtained is simply the PID of the child process. Finally, if the return value is negative, then it means that something went wrong and you must somehow handle the error.

In this case then you can differentiate depending on the program behavior if you are in the child or parent process. You can do this using just the value returned by fork().

import os

pid = os.fork()

if pid > 0:
    print("This is written by the parent process {}".format(os.getpid()))
else:
    print("This is written by the child process {}".format(os.getpid()))

From this example you can see how to divide the tasks to be performed, using an IF-ELSE construct.

Esempio

Here is a more complex example. For example, you want to start more child processes. In this case you have to introduce the function

os._exit (0)

This is called in the child process to avoid that in turn generate other child processes to end up in uncontrolled cycles.

import os

NUM_PROC  = 5

for process in range(NUM_PROC):
    pid = os.fork()
    if pid > 0: 
        print("This is the parent process {}".format(os.getpid()))
    else: 
        print("This is the child process {}".format(os.getpid()))) 
        os._exit(0)
print("Parent process is closing")

By running the code you will see that the parent and child will generate all the required 5 child processes and then will close, in the meantime 5 child processes will run their operations and then will close themselves.

But as you can see the parent process is the first to close. If you wanted to make sure that the parent process remains on hold while all its child processes perform their task and will close. How can you make sure that the parent process closes last?

There is a special function in Python

os.waitpid(pid, 0)

that awaits the termination of a process (identified by pid) before passing to subsequent instructions.

Enter it in the code above

import os

NUM_PROC  = 5
children = []

for process in range(NUM_PROC):
    pid = os.fork()
    if pid > 0: 
        print("This is the parent process {}".format(os.getpid()))
        children.append(pid)
    else: 
        print("This is the child process {}".format(os.getpid()))) 
        os._exit(0)

for i, proc in enumerate(children):
    os.waitpid(proc, 0)

print("Parent process is closing")

This time you needed an array that contained all of the pid of the child process values. By running the code you the desired result.

forking code python 02

Conclusions

In this article you’ve seen how to build and manage multiprocessing from a single code, thanks to the forking technique. This technique is one of the possible techniques used in Python to get the multiprocessing.[:



Leave a Reply