[OSTEP] study note-Virtualization : Process

yong
Dec 9, 2020

Background Knowledge

CPU
Central processing unit

Register
A small, fast storage location inside the CPU; it serves as the CPU's temporary workspace

‘Array of pointers’ vs ‘Pointer to an array’
Array of pointers
An array whose elements are pointers to a given data type
[data type] *[name of array][number of elements], e.g. int *p[10]

Pointer to an array
A single pointer that points to a whole array of a given data type
[data type] (*[name of pointer])[number of elements], e.g. int (*p)[10]
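
To make the difference between the two declarations concrete, here is a minimal sketch. The function names are made up for illustration; only the two declaration forms come from the notes above.

```c
#include <stddef.h>

/* Array of pointers: three separate int variables, reached through
 * an array whose elements are each an int*. */
int sum_via_array_of_pointers(void)
{
    int a = 1, b = 2, c = 3;
    int *ptrs[3] = { &a, &b, &c };   /* array of 3 pointers to int */

    int sum = 0;
    for (size_t i = 0; i < 3; i++)
        sum += *ptrs[i];             /* pick element i, then dereference it */
    return sum;
}

/* Pointer to an array: one pointer that refers to a whole int[3]. */
int sum_via_pointer_to_array(void)
{
    int values[3] = { 1, 2, 3 };
    int (*p)[3] = &values;           /* one pointer to an array of 3 ints */

    int sum = 0;
    for (size_t i = 0; i < 3; i++)
        sum += (*p)[i];              /* dereference first, then index */
    return sum;
}
```

Note where the parentheses go: `*ptrs[i]` indexes first and dereferences second, while `(*p)[i]` dereferences first and indexes second.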

The way UNIX manages file descriptors
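When a file is opened, UNIX assigns it the lowest-numbered free file descriptor, searching from zero. Descriptor 0 is STDIN_FILENO and descriptor 1 is STDOUT_FILENO, so if a process closes STDOUT_FILENO and then calls open(), the new file receives descriptor 1 and becomes the destination of standard output.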
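
The lowest-free-descriptor rule can be observed directly. The sketch below assumes a single-threaded process and that /dev/null exists; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Opens /dev/null, closes it, and opens it again.
 * Returns 1 if the second open() got the same descriptor number that
 * was just freed, 0 if it did not, and -1 on error. */
int reopened_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;

    /* 'first' was the lowest free slot; closing it frees that slot again. */
    close(first);

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

Because nothing else opens or closes descriptors between the two calls, the second open() should land on the same number as the first.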

Kernel pipes
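A pipe is an in-kernel buffer that behaves like a queue: bytes written to one end are read from the other end in the same order (first in, first out).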
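
The queue behaviour of a pipe can be sketched within a single process. The function name and the messages are made up for illustration; pipe(), write() and read() are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

/* Writes two messages into a pipe and reads them back.
 * Returns 0 if the bytes come out in the order they went in, -1 otherwise. */
int pipe_preserves_order(void)
{
    int fds[2];                       /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) < 0)
        return -1;

    if (write(fds[1], "first ", 6) != 6 || write(fds[1], "second", 6) != 6) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                    /* no writers left, so the reader sees EOF */

    char buf[32];
    size_t total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf + total, sizeof buf - 1 - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    buf[total] = '\0';

    return strcmp(buf, "first second") == 0 ? 0 : -1;
}
```

In a shell pipeline the two ends belong to different processes, but the ordering guarantee is the same.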

Virtualization

The OS appears to run many programs at once, but in fact it runs one process at a time on a single CPU (or a few CPUs), switching between them quickly. This technique is called time sharing. The tradeoff is performance: the more processes share the CPU, the slower each one runs.

To virtualize the CPU, the OS needs both low-level machinery, called mechanisms, and high-level intelligence, called policies.

Definition of Process

An abstraction of a running program.
A process is characterized by its machine state: everything the program can read or update while it runs, such as its memory and registers.

Common process APIs of a modern OS are as follows:
Create, Destroy, Wait, Status, and Miscellaneous control (ways to control a process other than killing or waiting for it, such as suspending and resuming it)

Process Creation

  1. Load the program's code and static data into the address space of the process.
    (Early OSes loaded everything at once; modern OSes load lazily, bringing in pieces of code or data only as they are needed during execution.)
  2. Allocate memory for the run-time stack and the heap.
  3. Perform other initialization tasks, particularly those related to I/O (input/output).
  4. Start the program at its entry point, namely main(); the OS transfers control of the CPU to the newly created process.

Process State

Running
The process is running on a processor, i.e. executing instructions.

Ready
The process is ready to run, but the OS has chosen not to run it at the moment.

Blocked
The process has performed an operation that keeps it from running until some other event takes place.
For example, when a process initiates an I/O request it becomes blocked, and another process can use the CPU in the meantime, which keeps the CPU busy.

Scheduled / Descheduled
A process is scheduled when it moves from ready to running, and descheduled when it moves from running to ready.

Zombie (final) state
The process has exited but has not been cleaned up yet.
This is called the zombie state in UNIX-based systems.

Data Structure

Process list
The list of all processes in the system, also known as the task list

PCB
Process Control Block: the per-process structure that holds the information about one process, also called a process descriptor
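
The process states and the PCB can be sketched together as a small C structure. This is a hypothetical, heavily simplified layout: the names `proc_state`, `pcb`, `transition_allowed` and `pcb_set_state` are invented for illustration and are not taken from any real kernel, which would track far more (registers, memory maps, open files, and so on).

```c
#include <stdbool.h>

/* The states named in these notes. */
enum proc_state { READY, RUNNING, BLOCKED, ZOMBIE };

/* A toy process control block (assumed fields only). */
struct pcb {
    int pid;
    enum proc_state state;
    int exit_status;              /* meaningful once state == ZOMBIE */
};

/* Encodes the transitions described in the notes:
 * ready -> running (scheduled), running -> ready (descheduled),
 * running -> blocked (e.g. I/O started), blocked -> ready (event occurred),
 * running -> zombie (process exited). */
bool transition_allowed(enum proc_state from, enum proc_state to)
{
    switch (from) {
    case READY:   return to == RUNNING;
    case RUNNING: return to == READY || to == BLOCKED || to == ZOMBIE;
    case BLOCKED: return to == READY;
    case ZOMBIE:  return false;   /* final state */
    }
    return false;
}

/* Changes the state only if the transition is a legal one. */
bool pcb_set_state(struct pcb *p, enum proc_state to)
{
    if (!transition_allowed(p->state, to))
        return false;
    p->state = to;
    return true;
}
```

In this sketch a blocked process cannot go straight back to running; it has to become ready first and wait to be scheduled again.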

Process Optimization

Two policies:

  1. Switch to another process when the current process issues an I/O request, to keep CPU utilization high.
  2. When the I/O completes, run that process again immediately.
    CPU utilization is key to efficiency (less total time to finish all processes). The sooner a process finishes, the sooner its resources, such as memory, are freed for other processes.

Process APIs

Three system calls are used to create and manage processes:
fork(), exec(), wait()

  1. fork
    fork() creates a new process (the child) that is an almost exact copy of the caller. The child does not start at main(); it continues from the point where fork() returns. fork() returns 0 in the child and the child's PID in the parent.
  2. exec
    exec() is used to run a different program in the current process.
    It loads the code and static data of the named program over the current ones and re-initializes the rest of the address space, such as the heap and stack.
    It does not create a new process.
  3. wait
    wait() and waitpid() make the calling process (usually the parent) wait until another process (usually its child) has finished.
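
The three calls are typically used together. The following is a minimal sketch, not a full shell: the helper name `run_and_wait` is made up, and execv() is picked here as one member of the exec family.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at 'path' with arguments 'argv' in a child process.
 * Returns the child's exit status, or -1 if fork()/waitpid() failed or
 * the child did not exit normally. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed, no child was created */

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execv(path, argv);
        _exit(127);               /* reached only if execv() failed */
    }

    /* Parent: block until that specific child has finished. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with `char *args[] = { "sh", "-c", "exit 7", NULL };`, calling `run_and_wait("/bin/sh", args)` should return 7 on a system where /bin/sh exists.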

Reasons for using fork, exec, and wait

Why split process creation into fork and exec?
Because creating the new process and giving it a program to run are separate steps, the environment of the process can be altered in between, which enables a variety of features (e.g. changing file descriptors).
In particular, the separation lets the shell run code after fork() but before exec(), which is how it implements useful features such as redirection and pipes (ch 5, p.6).
With this kind of setup, the child's output to stdout (e.g. from printf) is redirected to a designated file or file descriptor.
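
Here is a sketch of how a shell might redirect a child's standard output in the gap between fork() and exec(). The helper name `run_redirected` is hypothetical. It uses dup2() to point descriptor 1 at the file explicitly; closing STDOUT_FILENO and then calling open() would have the same effect, as long as descriptor 0 is in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs 'path' with 'argv' in a child whose stdout goes to 'outfile'.
 * Returns the child's exit status, or -1 on failure. */
int run_redirected(const char *outfile, const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child, after fork() and before exec(): rearrange descriptors. */
        int fd = open(outfile, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            _exit(126);
        if (dup2(fd, STDOUT_FILENO) < 0)   /* descriptor 1 now refers to the file */
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);

        execv(path, argv);                 /* the new program inherits descriptor 1 */
        _exit(127);                        /* reached only if execv() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The program that gets exec'd needs no changes: it simply writes to stdout, unaware that descriptor 1 now points at a file.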

Other opinions exist, however.
For example, a recent paper by systems researchers from Microsoft, Boston University, and ETH in Switzerland details some problems with fork(), and advocates for other, simpler process creation APIs such as spawn().

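Why use wait(), waitpid()?
Because the CPU scheduler is complex, we usually cannot assume which process (parent or child) will run first after fork(); the order appears non-deterministic. wait() and waitpid() let the parent explicitly wait until the child has finished, which makes the ordering predictable.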

Process control in UNIX

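UNIX shells map certain keystroke combinations to signals, which the signals subsystem delivers to the currently running process. For example, control-c sends SIGINT (interrupt), which normally terminates the process, and control-z sends SIGTSTP (stop), which pauses it. A process that wants to catch a signal uses the signal() system call to register a handler.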
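
Catching a signal can be sketched as follows. The names `on_sigusr1` and `catch_own_signal` are made up, and the sketch registers the handler with sigaction(), the more portable relative of signal(). It assumes a single-threaded process in which SIGUSR1 is not blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_sigusr1 = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    got_sigusr1 = 1;
}

/* Installs a handler for SIGUSR1, sends that signal to this process,
 * and returns 1 if the handler ran, 0 if it did not, -1 on error. */
int catch_own_signal(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    got_sigusr1 = 0;
    if (kill(getpid(), SIGUSR1) < 0)   /* same call the kill command relies on */
        return -1;
    return got_sigusr1;
}
```

Without a registered handler, the default action for SIGUSR1 is to terminate the process, which is why the handler is installed before the signal is sent.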

User
On a system that many people use at the same time, who may send signals to which process has to be restricted.
This is what the concept of a user is for: users have full control over their own processes, but generally cannot signal processes owned by others, which prevents malicious signals.
A system also generally needs someone who can administer it. In UNIX-based systems this is the superuser, or root.

Useful tools for process managing

ps
allows you to see which processes are running

top
displays the processes of the system and how much CPU and other resources they are eating up

kill, killall
send arbitrary signals to processes, most commonly to terminate them

CPU meters
give a quick-glance view of the load on your system
