Remote System Call

From Virtualsquare
Revision as of 18:24, 27 December 2012 by Renzo (Talk | contribs)


Introduction

System calls are special functions that the Operating System (OS) provides to applications; applications use them to interact with the resources and services offered and managed by the OS.

The basic idea behind Remote System Call (RSC) is to let a process execute locally while all its system calls are diverted to a remote machine, where they are actually executed. The executable of the application does not have to be changed; in other words, developers do not have to rewrite their applications to use this functionality.

Possible applications of RSC

Two applications of RSC have been identified:

  1. Use of remote resources and services: system calls let an application interact with the resources and services of the OS; if they are executed on a remote machine, the application interacts with the resources and services of the remote OS.
  2. Use of local applications as if they were installed on the remote machine: consider a user who needs a particular application on the remote machine, where it is not installed and where the user lacks the privileges to install it, while the same application can be installed on the local machine. Using RSC, the user can run the application locally and it behaves as if it were running on the remote machine.

Project Structure

Two steps are necessary to execute a system call remotely:

  1. Intercept the system call generated by a process.
  2. Execute the intercepted system call remotely.

RSC does not want to force developers to change their applications: the executable must remain intact, so the system calls have to be intercepted from outside the executable. Once intercepted, they can be executed on another machine in the second step. For the interception of the system calls the RSC project uses UMView; in this way the whole first step was handled without writing a single line of code.

Three components have been developed for the second step:

  • RSC module: a UMView module that interfaces the RSC library functions with the core of UMView, allowing the remote execution of the system calls generated by the processes and intercepted by the core.
  • remote server: a simple server that receives the requests sent by the RSC module and passes them to the server-side functions of the RSC library.
  • RSC library: the core of the project. It implements the Remote Procedure Call (RPC) model for the system call functions. The library is used by both the module and the server and takes care of all aspects of the remote execution of system calls.

The figure shows UMView, the three components developed, and how these four elements interact. A process inside UMView generates a system call, which is intercepted by the core; through the RSC module, the stub specific to that system call (provided by the library) is called. The stub is similar to an RPC stub: it creates the request, sends it and waits for the server's answer. When a request reaches the server, the server reads it and passes it to the server-side stub provided by the library. This stub unmarshals the request, invokes the system call (which is executed by the remote Operating System) and creates a response that is returned to the server. The server sends it back to the client stub, which reads it and unmarshals its contents; the data are passed, through the RSC module, to the UMView core, which copies them into the memory of the process. The process believes it has executed a system call locally, but the call was completely remotized and executed on another machine.
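The round trip described above can be sketched in a single program, with a socketpair standing in for the network and a forked child playing the remote server. This is only an illustration under stated assumptions: the message layout, the `demo_*` names and the choice of access(2) as the remotized call are hypothetical and are not taken from the RSC library.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wire format for this sketch, NOT the real RSC messages. */
struct demo_req  { uint32_t call_id; int32_t mode; uint32_t path_len; };
struct demo_resp { int32_t ret; int32_t err; };
enum { DEMO_ACCESS = 1 };

/* Server-side stub: unmarshal the request, run the real system call,
 * marshal the result into a response. */
static void demo_server_handle(int fd)
{
    struct demo_req req;
    struct demo_resp resp = { -1, EINVAL };
    char path[256];

    if (read(fd, &req, sizeof req) != (ssize_t)sizeof req)
        return;
    if (req.call_id == DEMO_ACCESS && req.path_len > 0 &&
        req.path_len <= sizeof path &&
        read(fd, path, req.path_len) == (ssize_t)req.path_len &&
        path[req.path_len - 1] == '\0') {
        resp.ret = access(path, req.mode);   /* executed by the "remote" OS */
        resp.err = (resp.ret < 0) ? errno : 0;
    }
    if (write(fd, &resp, sizeof resp) < 0)
        return;
}

/* Client-side stub: same arguments as access(2) plus the connection fd.
 * It blocks until the answer has arrived, as UMView requires. */
static int demo_client_access(int fd, const char *path, int mode)
{
    struct demo_req req;
    struct demo_resp resp;

    assert(path != NULL);
    req.call_id = DEMO_ACCESS;
    req.mode = mode;
    req.path_len = (uint32_t)strlen(path) + 1;   /* include the final '\0' */

    if (write(fd, &req, sizeof req) != (ssize_t)sizeof req ||
        write(fd, path, req.path_len) != (ssize_t)req.path_len ||
        read(fd, &resp, sizeof resp) != (ssize_t)sizeof resp) {
        errno = EIO;
        return -1;
    }
    if (resp.ret < 0)
        errno = resp.err;                        /* caller sees the remote errno */
    return resp.ret;
}

/* Run one call end to end; the forked child acts as the remote server. */
static int demo_roundtrip(const char *path, int mode)
{
    int sv[2], ret, saved;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        demo_server_handle(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    ret = demo_client_access(sv[0], path, mode);
    saved = errno;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    errno = saved;
    return ret;
}
```

Calling `demo_roundtrip("/", F_OK)` goes through both stubs and returns whatever access(2) returned in the child. The sketch ignores architecture conversion, which the real library has to handle.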

Usage

This section explains how to use the RSC module and the server to execute system calls remotely. The RSC module is named um_rsc.so and the server rsc_server.

First Step

The first step is to start the server on the remote machine:

[prompt]$ rsc_server

Without options, the server binds to localhost and waits for connections on ports 8050 and 8051 (the RSC library needs two separate connections to work correctly). These values can be changed with the following options:

  • -a ADDRESS, --address ADDRESS. Binds the server address to ADDRESS (default value is localhost)
  • -p PORT, --port PORT. Changes the port for the normal traffic to PORT (default value is 8050)
  • -e PORT, --es_port PORT. Changes the event subscription port to PORT (default value is 8051)

Second Step

Now that the server is ready, the module can be loaded into UMView. Inside UMView type:

[prompt inside UMView]$ um_add_service um_rsc.so

Once loaded, the module tries to contact a server running locally and to establish two connections, one on port 8050 and one on port 8051. When the connections are established, all the system calls intercepted by UMView are managed by the module and executed on the server machine.

As with the server, the default server address and ports used by the module can be changed; the options are:

  • sa=ADDRESS. The Server Address option sets the address of the server to ADDRESS (default value localhost).
  • sp=PORT. The Server Port option sets the server port for normal traffic to PORT (default value 8050).
  • essp=PORT. The Event Subscription Server Port option sets the server port for event subscription to PORT (default value 8051).

For example, if the server address is example.com and the server waits for normal traffic on port 9090 and uses the default port for event subscription traffic, the module can be loaded in the following way:

[prompt inside UMView]$ um_add_service um_rsc.so,sa=example.com,sp=9090

The options are listed after the module without spaces between them and are separated by commas. Each option has the following format:

<option name>=<option value>

with no space between the option name and the equals sign or between the equals sign and the option value.
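The option format can be illustrated with a small parser. This is a sketch under assumptions, not the module's actual code: the `demo_*` names are hypothetical, and it assumes the whole argument string (module name included) is handed to the parser. Only the option names and default values come from the text above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three module options. */
struct demo_rsc_opts {
    char sa[256];   /* server address */
    int  sp;        /* port for normal traffic */
    int  essp;      /* port for event subscription */
};

/* Parse "um_rsc.so,sa=HOST,sp=PORT,essp=PORT".
 * Returns 0 on success, -1 on a malformed or unknown option. */
static int demo_parse_opts(const char *arg, struct demo_rsc_opts *o)
{
    char buf[512], *tok, *save = NULL;

    assert(arg != NULL && o != NULL);

    /* Defaults documented for the module. */
    strcpy(o->sa, "localhost");
    o->sp = 8050;
    o->essp = 8051;

    if (strlen(arg) >= sizeof buf)
        return -1;
    strcpy(buf, arg);

    tok = strtok_r(buf, ",", &save);              /* first token: module name */
    while (tok != NULL && (tok = strtok_r(NULL, ",", &save)) != NULL) {
        char *eq = strchr(tok, '=');

        if (eq == NULL || eq == tok || eq[1] == '\0')
            return -1;                            /* must be name=value */
        *eq = '\0';
        if (strcmp(tok, "sa") == 0) {
            if (strlen(eq + 1) >= sizeof o->sa)
                return -1;
            strcpy(o->sa, eq + 1);
        } else if (strcmp(tok, "sp") == 0) {
            o->sp = atoi(eq + 1);
        } else if (strcmp(tok, "essp") == 0) {
            o->essp = atoi(eq + 1);
        } else {
            return -1;                            /* unknown name, e.g. "sa " */
        }
    }
    return 0;
}
```

Because names are compared exactly, an option written with spaces around the equals sign is rejected, which matches the rule stated above.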

The RSC Library

The RSC library is the most important element of the project: it implements the RPC model for a small set of procedures, the system calls. The library was designed and developed specifically for this project, mainly because of the need for good performance: even a simple application such as ls or echo generates more than 100 system calls, and all of them must be managed by the library. This work required a fast, light and optimized library; a generic RPC implementation would not have been the right choice.

Features

The most important features of the library are:

  • Three services. The library provides three different services:
    • system call remote execution. This is the main service offered by the library, which takes care of all aspects of remotization: creation of requests and responses, data marshalling and unmarshalling, and execution of the system call. The library manages about seventy system calls, namely those managed by UMView.
    • event subscription and blocking system call management. Blocking system calls are functions that perform blocking read/write operations on file descriptors. These system calls can endanger the correct execution of the client and the server, so they need special management.
    • ioctl request management. The ioctl system call needs particular management by the library. The expression ''ioctl request'' does not refer to a request message sent by a peer but to the name of the second argument of the system call.
  • Communication protocol independence. The library does not specify which communication protocol to use; this decision is left to the users of the library. For example, the RSC module and server use TCP.
  • Support for four different architectures. The library supports four architectures: x86, x86_64, PowerPC and 64bit PowerPC. To let these architectures talk to each other, the data representation must be converted from one architecture to another; for this reason a specific library, the Architecture Conversion Library (LibAConv), was written for this project.

Structure

The following figure represents the internal structure of the library. The structure is divided into two parts depending on the role in the communication: the client is the party that requests the remote execution of a system call (in this specific case, the RSC module); the server is the remote party that executes the system call.

Each side is divided into three levels. The strict division depicted in the figure is used to simplify the explanation; in the implementation it is partially lost for performance reasons.

For each client there are two communication channels with the server: one carries the data of the system call remote execution and ioctl request services, the other is used for the event subscription service. Why the latter needs a separate communication channel is explained in the specific section.

Both sides have the same number of levels which are, from the top to the bottom:

  • services level: it contains the implementations of the three services offered by the library.
  • marshalling/unmarshalling level: it contains the marshalling/unmarshalling functions used to create the request and response messages, to serialize/deserialize the system call arguments and to convert the data from one architecture to another. The Architecture Conversion Library is used to convert the system call arguments between two different architectures.
  • communication/dispatching level: on the client side this level contains the communication functions used to write requests to and read responses from the communication channel. On the server side the communication functions are kept outside the library, in the server; the server's third level contains the dispatching functions, which take the data read by the server and send them to the correct service handler. There are two dispatching functions, one for each communication channel with the client.

On the client side the entry point of the library is located at the services level, so the interface lets the client interact directly with the three services. On the server side, by contrast, the interface is located at the dispatching level: the data read by the server have no meaning for it, so it needs to pass them to the dispatching functions, which can inspect their structure and locate the right service to manage them.

As said before, the strict division into three levels depicted in the figure is not fully respected in the implementation, so it is difficult to describe each level separately in depth. For this reason, the library is not explained layer by layer but by describing the problems it had to face and how it tries to solve them. These problems are:

  • UMView client-side limits
  • Support of different architectures
  • Blocking system call and event subscription
  • Ioctl requests
  • System call remotization

Interface

This section shows the interface that the library provides to the user. The interface is divided into two groups of functions: the client-side and the server-side functions.

Client-side

The client-side functions all share the prefix "rscc_" to underline that they belong to this side of the communication. They are listed below:

/*   INITIALIZATION   */
int rscc_init(int client_fd, int event_sub_fd, struct reg_cbs **rc, enum arch c_arch, enum arch s_arch);

/*   RSC STUBS  */
int rscc_accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
...
int rscc_bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);
int rscc_chdir(char *path);
int rscc_chmod(char *path, mode_t mode);
int rscc_chown(char *path, uid_t owner, gid_t group);
...
int rscc_link(char *oldpath, char *newpath);
int rscc_listen(int sockfd, int backlog);
int rscc_lseek(int fildes, off_t offset, int whence);
int rscc_lstat64(char *path, struct stat64 *buf);
int rscc_mkdir(char *pathname, mode_t mode);
int rscc_mount(char *source, char *target, char *filesystemtype, unsigned long int mountflags, void *data);
...
int rscc_ioctl(int d, int request, void *arg);

/*   EVENT SUBSCRIPTION   */
int rscc_es_send_req(struct reg_cbs *reg_cbs, int server_fd, int event_sub_fd, 
                     int how, void (* cb)(), void *arg);

/*   IOCTL SUPPORT   */
u_int32_t rscc_check_ioctl_request(int request);

The interface is declared in the file rsc_client.h; an application that wants to interface itself with the library needs to include only this header.

Initialization function

The rscc_init() function initializes the client-side module of the library. It takes the following arguments:

  • client_fd: a file descriptor for the connection between module and server. It is used to:
    • send the system call remote execution requests and receive the corresponding responses.
    • query the server about the support of a specific ioctl request.

    In short, this file descriptor lets the library interact with the communication medium configured externally by the library users.

  • event_sub_fd: a file descriptor for the communication channel needed for the correct management of blocking system calls and event subscription. As with the previous descriptor, it is initialized outside the library.
  • rc: a pointer to a struct reg_cbs. This structure stores the information about the subscribed events for the registered system calls. The structure is passed to rscc_init() because that function initializes it; in this way its pointer can later be passed to the event subscription functions.
  • c_arch and s_arch: two constants representing the client and server architectures. These constants are required for the correct marshalling and unmarshalling of the data. The four accepted values are defined by LibAConv: ACONV_X86, ACONV_X86_64, ACONV_PPC and ACONV_PPC_64. The library does not obtain these constants by itself; they must be provided by the caller, so it is up to the module and the server to determine their own architecture and that of the other peer. LibAConv provides a function called aconv_get_host_arch() which returns the architecture of the caller.

System call remote execution functions

The previous function list could not include all the functions of this type: there is one function for each system call, so the list would be very long. The interface was designed to meet a UMView requirement: the virtualization functions of a module must have the same interface as the original ones. For this reason the declarations of these functions match those of the system calls in return type and in number and type of input arguments. The functions are named rscc_<system call name>().

Their task is to fully manage the remotization process: on the one hand they create the request, serialize the data and send the request; on the other hand they read the response and process it. In short, these functions implement the client stubs; the user can call them as if they were the original system calls, passing the same arguments.

Event subscription and ioctl request functions

The function for event subscription is rscc_es_send_req(); its task is to ask whether the descriptor event_sub_fd is ready for the event how. If the answer is positive, the function returns the how value (which is positive), otherwise zero. In the latter case, the descriptor is monitored by the server until the event occurs; when this happens, the function cb is called on the client side with the argument arg. The argument server_fd is the file descriptor of the connection with the server; the reg_cbs data structure is used internally to store the information about the registered descriptors.

The function for ioctl request support is rscc_check_ioctl_request(). It queries the server to find out whether it supports the given ioctl request. The return value can be: zero if an error occurs, IOCTL_UNMANAGED if the request is not supported by the server, or a positive integer if the request is supported. The positive value encodes both the size and the role inside the system call of the third argument of ioctl; its format is compatible with the one required by UMView.

Server-side

The server-side functions all share the prefix "rscs_" to underline that they belong to this side of the communication. The interface is defined in the header rsc_server.h and is much smaller than the client-side one: the whole interface consists of five functions:

/*   INITIALIZATION   */
int rscs_init(enum arch server_arch);

/*   RSC and IOCTL REQUESTS   */
struct iovec *rscs_manage_request(int client_arch, void *request);

/*   IOCTL SUPPORT   */
void rscs_ioctl_register_request(int request, u_int32_t rw, u_int32_t size);

/*   EVENT SUBSCRIPTION   */
struct rsc_es_ack *rscs_es_manage_msg(int esfd, void *data);
struct rsc_es_resp *rscs_es_event_occurred(int esfd, int mfd, int event); 


Initialization function

The initialization function is simpler than the client-side one: it stores the server architecture constant server_arch and initializes the internal data structures, nothing more. The function does not take the client architecture as input because the server could manage more than one client, each with a different architecture; nor does it take the file descriptors of the connections with the client, since I/O is not managed by the library.

System call remote execution functions

Unlike the client-side interface, here a single function manages all the system call remote execution requests; its name is rscs_manage_request(). The idea was to provide the cleanest and simplest solution for the user of the RSC library. The function takes as input the request message received (request) and the client architecture (client_arch), and returns a pointer to a struct iovec with only one element, the response message. Using this data type lets the caller access both the data and its size. The server-side interface is very simple: it does not require the user to know the internal structure of the messages.

rscs_manage_request() manages two kinds of requests: system call remote execution requests and ioctl request queries; it therefore manages all the requests coming from one of the two communication channels with the client. For the first kind of message, the function identifies the specific system call, calls the right unmarshalling function, executes the system call and creates the response, which is returned to the caller. For the second kind, the function determines whether the specific ioctl request is supported by the server and returns a response message with the result of the query.

Event subscription and ioctl request functions

At server-side there are two functions for the event subscription:

  • rscs_es_manage_msg() is the most important event subscription function. It takes as input the client message (data) and the file descriptor of the connection with the client (esfd); it runs the test on the file descriptor contained in the request and returns the appropriate acknowledgment message to send to the client.
  • rscs_es_event_occurred() must be called by the server when the event occurs for the file descriptor mfd; the file descriptor esfd identifies the connection with the client. The function updates its internal information and returns the message to send to the client.

The ioctl request messages sent by the client are managed by the function rscs_manage_request(), as described in the previous section. The interface provides another function to support ioctl requests: rscs_ioctl_register_request(), which the server must use to register the ioctl requests it wants to support. It must be called during the server's initialization, but after the invocation of rscs_init(), because the latter initializes the list used internally to keep track of the registered requests. One invocation of the function registers one ioctl request; it takes as input the ioctl request (request), the size in bytes of the third argument (size) and its role inside the system call (rw); the latter argument can take the following values: IOCTL_R, IOCTL_W and IOCTL_RW.

UMView client-side limits and communication functions

The limits imposed by UMView show up in the third level of the library architecture. This level is the only one that differs between the two sides of the communication: on the client side it contains the communication functions used to write and read data to and from the communication channel, whereas on the server side these functions are located outside the library, in the server. The reason for this difference is a limit imposed by UMView.

On the client side, when a system call is captured by the UMView core, the core calls the function that the RSC module provides to manage that system call; this function is the stub provided by the RSC library. The UMView core expects the system call to be fully managed by the module function, so the stub has to create the request, send it completely and wait for the answer before returning control to the core. The stub cannot create the request and return to the core saying "call me when I can send the data without blocking" or, after sending the request, "call me when there are incoming data for me": if the stub returns, the core considers the management of that system call finished. For this reason the client side of the RSC library uses blocking communication functions.

The scenario is different on the server side: the server was developed expressly for this project, so there was more freedom in the design. The server cannot use blocking communication functions, because while it is blocked in a read or write it cannot serve other clients; for this reason the server is left free to manage these I/O operations in the way that suits it best.

The server reads the data from the channel and writes back the data returned by the library. It does not know the internal structure of these data; they are opaque to it. The task of the dispatching functions offered by the library is to inspect the structure of the data and pass them to the right service handler. There are two dispatching functions, one for each communication channel with the client.

Client side communication functions

The communication functions used are an improved version of the default read/write and readv/writev functions. They are blocking functions: they take as input the data to send or the buffer to fill and its size, and they do not return until all the data has been sent or read. The four functions are defined in the src/utils.c file and their declarations are:

 int write_n_bytes(int fd, void *buffer, int nbytes);
 int read_n_bytes(int fd, void *buffer, int nbytes);

 int readv_n_bytes(int fd, struct iovec *vector, size_t count, int nbytes);
 int writev_n_bytes(int fd, struct iovec *vector, size_t count, int nbytes);

write_n_bytes() and read_n_bytes() are very simple functions; they take the same number and type of arguments as the write() and read() functions: the file descriptor fd of the connection, the buffer with the data and the size nbytes of the data to read or write. The functions call read()/write() in a loop until all the data have been completely received or sent.
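The loop described above might look like the following minimal sketch. It is not the code in src/utils.c: the `demo_` names, the EINTR handling and the decision to return a short count at end of file are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Keep calling write() until nbytes have been sent.
 * Returns nbytes on success, -1 on error. */
static int demo_write_n_bytes(int fd, const void *buffer, int nbytes)
{
    const char *p = buffer;
    int done = 0;

    assert(buffer != NULL && nbytes >= 0);
    while (done < nbytes) {
        ssize_t n = write(fd, p + done, (size_t)(nbytes - done));
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        done += (int)n;
    }
    return done;
}

/* Keep calling read() until nbytes have been received.
 * Returns the bytes read (less than nbytes only at end of file), -1 on error. */
static int demo_read_n_bytes(int fd, void *buffer, int nbytes)
{
    char *p = buffer;
    int done = 0;

    assert(buffer != NULL && nbytes >= 0);
    while (done < nbytes) {
        ssize_t n = read(fd, p + done, (size_t)(nbytes - done));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                 /* peer closed the channel */
        done += (int)n;
    }
    return done;
}

/* Usage example: two separate writes are collected by one read_n call.
 * Returns 0 on success. */
static int demo_pipe_echo(void)
{
    int p[2], ok;
    char buf[11];

    if (pipe(p) < 0)
        return -1;
    ok = demo_write_n_bytes(p[1], "hello ", 6) == 6 &&
         demo_write_n_bytes(p[1], "world", 5) == 5 &&
         demo_read_n_bytes(p[0], buf, 11) == 11 &&
         memcmp(buf, "hello world", 11) == 0;
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```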

writev_n_bytes() and readv_n_bytes() are similar to the other two, but they call writev()/readv() in a loop instead of write()/read(). They take as input: the file descriptor fd of the connection, the vector to read from or write into, the number of elements in the vector (count) and the total amount of data to read or write, expressed in bytes (nbytes). This last value could be computed by the function by iterating over the elements and adding up their sizes, but to save an iteration it is supplied by the caller.

Unlike write_n_bytes() and read_n_bytes(), writev_n_bytes() and readv_n_bytes() are two macros; a single function implements both of them:

typedef int (*rwv_fun )(int filedes, const struct iovec *vector, size_t count);
int rwv_n_bytes(rwv_fun fun, int fd, struct iovec *vector, size_t count, int nbytes);

#define readv_n_bytes(fd, vector, count, nbytes)  rwv_n_bytes((rwv_fun)readv, fd, vector, count, nbytes)
#define writev_n_bytes(fd, vector, count, nbytes) rwv_n_bytes((rwv_fun)writev, fd, vector, count, nbytes)

This function is rwv_n_bytes(); it takes the same arguments as writev_n_bytes() and readv_n_bytes() plus fun, of type rwv_fun. This argument is a pointer to the I/O function used to transmit or receive the data: readv for readv_n_bytes() and writev for writev_n_bytes().
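A possible shape for such a shared function is sketched below; it is an illustration, not the project's implementation. The `demo_` names are hypothetical, and the function pointer type here follows the POSIX prototypes of readv()/writev() (ssize_t return, int count) so that no cast is needed, which differs from the rwv_fun typedef shown above. The sketch also assumes it may modify the vector in place to skip data already transferred.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Common shape of readv(2) and writev(2). */
typedef ssize_t (*demo_rwv_fun)(int fd, const struct iovec *vector, int count);

/* Call fun in a loop until nbytes have been transferred.
 * The vector is advanced in place after a partial transfer. */
static int demo_rwv_n_bytes(demo_rwv_fun fun, int fd, struct iovec *vector,
                            int count, int nbytes)
{
    int done = 0;

    assert(fun != NULL && vector != NULL);
    while (done < nbytes && count > 0) {
        ssize_t n = fun(fd, vector, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                               /* end of file */
        done += (int)n;
        /* Skip the elements that were transferred completely... */
        while (count > 0 && (size_t)n >= vector->iov_len) {
            n -= (ssize_t)vector->iov_len;
            vector++;
            count--;
        }
        /* ...then trim the one that was transferred only in part. */
        if (count > 0) {
            vector->iov_base = (char *)vector->iov_base + n;
            vector->iov_len -= (size_t)n;
        }
    }
    return done;
}

#define demo_readv_n_bytes(fd, v, c, n)  demo_rwv_n_bytes(readv, fd, v, c, n)
#define demo_writev_n_bytes(fd, v, c, n) demo_rwv_n_bytes(writev, fd, v, c, n)

/* Usage example: send "abc" + "defg" through a pipe and read the 7 bytes
 * back into buffers split differently (2 + 5). Returns 0 on success. */
static int demo_pipe_roundtrip(void)
{
    int p[2], ok;
    char a[] = "abc", b[] = "defg", x[2], y[5];
    struct iovec out[2] = { { .iov_base = a, .iov_len = 3 },
                            { .iov_base = b, .iov_len = 4 } };
    struct iovec in[2]  = { { .iov_base = x, .iov_len = 2 },
                            { .iov_base = y, .iov_len = 5 } };

    if (pipe(p) < 0)
        return -1;
    ok = demo_writev_n_bytes(p[1], out, 2, 7) == 7 &&
         demo_readv_n_bytes(p[0], in, 2, 7) == 7 &&
         memcmp(x, "ab", 2) == 0 && memcmp(y, "cdefg", 5) == 0;
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```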

Support of different architectures

The RSC library needs to support the four architectures supported by UMView:

  • x86
  • x86_64
  • PowerPC
  • 64bit PowerPC

These architectures differ in several respects; the four most important for the development of the RSC library can be grouped into two sets:

  • Representation of the system calls:
    • each system call can be identified by a different numeric constant in different architectures
    • the number and type of system calls can differ in two different architectures.
  • Data representation:
    • different endianness
    • different sizes of some data types

This section describes how the RSC library tries to manage and solve these problems.

Representation of the system calls

Each system call is identified within a specific architecture by a numeric constant defined in the unistd.h header file; this file lists the constants for all the system calls supported by the architecture. The constants have the following name format:

__NR_<system call name>

so, for example, the constants for the system calls read, getpid and mkdir are, respectively, __NR_read, __NR_getpid and __NR_mkdir. In the rest of this documentation these constants are referred to as __NR_*.

The problem is that the numeric value associated with a specific system call changes from one architecture to another: for example, the value of __NR_mount is 21 on x86 and 165 on x86_64. The first problem was therefore how to identify a given system call on each of the four architectures. The solution was to define a list of architecture-independent constants and to use them to identify the system calls. These constants are called __RSC_* and have the same name format as the __NR_* ones:

__RSC_<system call name>

so, as with __NR_*, there are __RSC_read, __RSC_getpid, __RSC_mount, and so on.

A second problem was the heterogeneity of the system calls sets in different architectures, in particular:

  • some system calls are defined on some architectures but not on others. On x86, some system calls exist in two versions, one 32bit and one 64bit: for example stat and stat64, getdents and getdents64, statfs and statfs64, and so on. On x86_64 only the 64bit version of these system calls exists, so an x86 client cannot execute a stat remotely on an x86_64 machine. The solution is provided by UMView, so the problem did not have to be managed in the library. UMView provides a unification mechanism that transforms the 32bit system call generated by the process into the 64bit one; the module therefore always has to manage the 64bit version, and the problem is solved at the source.
  • some system calls are defined differently on two distinct architectures. The problem concerns the network system calls. On x86 and PowerPC there is a single system call for all network operations (socket, bind, accept, connect, ...): __NR_socketcall. This system call takes as its first argument a constant that defines the specific network operation to execute; these constants have the name format SYS_<network operation>, for example SYS_SOCKET, SYS_BIND, SYS_ACCEPT, SYS_CONNECT, and so on. On x86_64 there is a different __NR_* constant and system call for each network operation: __NR_socket, __NR_bind, __NR_accept, __NR_connect, ... The RSC library takes the same approach as the x86_64 architecture and defines a different __RSC_* constant for each network system call. The conversion functions provided take care of this problem during the conversion from __NR_* to __RSC_* and vice versa.

These constants are defined in the include/rsc_consts.h header file, inside the rsc_constant enumeration; this enumeration also defines three other constants: __RSC_FIRST identifies the first __RSC_* of the enumeration, __RSC_LAST the last one, and __RSC_ERROR represents an invalid constant. The library also provides the functions to convert the __RSC_* constants into __NR_* and vice versa:

enum rsc_constant nr2rsc(int32_t nr_const, int32_t sys, enum arch arch);
struct nr_and_sys *rsc2nr(enum rsc_constant rsc_const, enum arch arch);

The function nr2rsc() takes as input a __NR_* constant nr_const of architecture arch and returns the corresponding __RSC_* constant. If arch is x86 or PowerPC and nr_const is __NR_socketcall, the correct SYS_* constant must be passed in sys to get the right __RSC_* value; in all other cases, the constant NO_VALUE (defined in the header) can be used for sys.

The function rsc2nr() does the opposite operation: it converts the __RSC_* constant rsc_const into the __NR_* constant of architecture arch. The return value is a pair:

struct nr_and_sys {
  int32_t nr;
  int32_t sys;
};

If the __NR_* constant into the nr field is __NR_socketcall, the sys field contains the appropriate SYS_* constant, otherwise its value is NO_VALUE.

To speed up the conversion process, the two functions use arrays defined in the src/include/nr_to_rsc.h and src/include/rsc_to_nr.h header files. Each file contains one array per architecture; the arrays in the first file are indexed by the __NR_* constants and those in the second file by the __RSC_* constants.
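The table-driven conversion can be illustrated with a reduced example covering two architectures and five calls. Everything named `demo_`/`DEMO_` is hypothetical and the real headers are organized differently; in particular, this sketch uses a single table and a linear scan for the nr-to-rsc direction, whereas the library keeps a second set of arrays indexed by __NR_*. The numeric values are the Linux system call numbers for x86 and x86_64 (socketcall is 102 on x86, with SYS_SOCKET = 1 and SYS_BIND = 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NO_VALUE (-1)

enum demo_arch { DEMO_X86, DEMO_X86_64, DEMO_ARCH_COUNT };
enum demo_rsc {
    DEMO_RSC_ERROR = -1,
    DEMO_RSC_read, DEMO_RSC_getpid, DEMO_RSC_mount,
    DEMO_RSC_socket, DEMO_RSC_bind,
    DEMO_RSC_COUNT
};

struct demo_nr_and_sys { int32_t nr; int32_t sys; };

/* One row per architecture, indexed by the architecture-independent constant. */
static const struct demo_nr_and_sys demo_rsc_to_nr[DEMO_ARCH_COUNT][DEMO_RSC_COUNT] = {
    [DEMO_X86] = {
        [DEMO_RSC_read]   = {   3, DEMO_NO_VALUE },
        [DEMO_RSC_getpid] = {  20, DEMO_NO_VALUE },
        [DEMO_RSC_mount]  = {  21, DEMO_NO_VALUE },
        [DEMO_RSC_socket] = { 102, 1 },   /* socketcall + SYS_SOCKET */
        [DEMO_RSC_bind]   = { 102, 2 },   /* socketcall + SYS_BIND */
    },
    [DEMO_X86_64] = {
        [DEMO_RSC_read]   = {   0, DEMO_NO_VALUE },
        [DEMO_RSC_getpid] = {  39, DEMO_NO_VALUE },
        [DEMO_RSC_mount]  = { 165, DEMO_NO_VALUE },
        [DEMO_RSC_socket] = {  41, DEMO_NO_VALUE },
        [DEMO_RSC_bind]   = {  49, DEMO_NO_VALUE },
    },
};

/* Architecture-independent constant -> (nr, sys) pair; NULL if unknown. */
static const struct demo_nr_and_sys *demo_rsc2nr(enum demo_rsc rsc, enum demo_arch arch)
{
    assert((int)arch >= 0 && (int)arch < DEMO_ARCH_COUNT);
    if ((int)rsc < 0 || (int)rsc >= DEMO_RSC_COUNT)
        return NULL;
    return &demo_rsc_to_nr[arch][rsc];
}

/* (nr, sys) pair of a given architecture -> architecture-independent constant. */
static enum demo_rsc demo_nr2rsc(int32_t nr, int32_t sys, enum demo_arch arch)
{
    int i;

    assert((int)arch >= 0 && (int)arch < DEMO_ARCH_COUNT);
    for (i = 0; i < DEMO_RSC_COUNT; i++) {
        const struct demo_nr_and_sys *e = &demo_rsc_to_nr[arch][i];
        if (e->nr == nr && e->sys == sys)
            return (enum demo_rsc)i;
    }
    return DEMO_RSC_ERROR;
}
```

With these tables, an x86 `bind` (socketcall 102 with sys 2) maps to `DEMO_RSC_bind`, which maps back to system call 49 with no sys value on x86_64.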

Data representation

The four architectures have different:

  • endianness: can be big-endian or little-endian.
  • architecture bit: can be 32 or 64 bit.

Based on these criteria it is possible to classify the four architectures in the following way:

  • x86: little endian, 32 bit.
  • x86_64: little endian, 64 bit.
  • PowerPC: big endian, 32 bit.
  • 64bit PowerPC: big endian, 64 bit.

The endianness problem is well known, so it was not difficult to manage. The architecture bit problem is less well known and needed further study; in particular, it was important to understand how the different number of bits influences the data representation. Linux systems use the LP64 standard[1], which lists the sizes of the different data types on a 64bit architecture; the only data types whose size changes compared to 32bit architectures are long and pointer types, the others remain unchanged.
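The two conversions involved, byte order and width of long, can be sketched as follows. This is not LibAConv code: the `demo_` names are hypothetical, and rejecting a 64bit value that does not fit into 32 bits is an assumption about how such a conversion could behave.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int demo_host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);       /* lowest-addressed byte of the value */
    return first == 1;
}

/* Reverse the byte order of a 32bit value. */
static uint32_t demo_swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Convert a host-order value so that its bytes are laid out big-endian,
 * as a PowerPC peer expects. */
static uint32_t demo_to_big_endian32(uint32_t v)
{
    return demo_host_is_little_endian() ? demo_swap32(v) : v;
}

/* A 'long' is 64 bits under LP64 and 32 bits on a 32bit architecture.
 * Narrow it for a 32bit peer; returns -1 if the value does not fit. */
static int demo_long64_to_long32(int64_t in, int32_t *out)
{
    assert(out != NULL);
    if (in < INT32_MIN || in > INT32_MAX)
        return -1;
    *out = (int32_t)in;
    return 0;
}
```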

To guarantee that the content of the data does not change between two different architectures, a library that converts the data in the right way is needed. This library was developed for this project and is named Architecture Conversion Library (LibAConv). Before developing a specific library for this purpose, some research was done to find an existing library for the task. The library taken into consideration was the XDR implementation used by Sun RPC.

External Data Representation (XDR)

XDR is the acronym of External Data Representation, a data representation standard described in RFC 1832. The standard defines a canonical, architecture-independent data representation format, and the library provides the functions needed to translate data between the two forms, from canonical to architecture-dependent and vice versa. For each data type there is one function that does this job; some examples are:

bool_t xdr_char(XDR *xdrs, char *cp)
bool_t xdr_u_char(XDR *xdrs, unsigned char *ucp)
bool_t xdr_int(XDR *xdrs, int *ip)
bool_t xdr_u_int(XDR *xdrs, unsigned *up)
bool_t xdr_long(XDR *xdrs, long *lip)
bool_t xdr_u_long(XDR *xdrs, unsigned long *lup)
bool_t xdr_short(XDR *xdrs, short *sip)
bool_t xdr_u_short(XDR *xdrs, unsigned short *sup)

Each function takes two arguments: a structure storing the information needed for the conversion (xdrs) and a pointer to the data to convert; the return value is a boolean: true on success, false otherwise. The user can combine these functions recursively to support his own data structures. To determine where to save the converted data, or where to read the canonical data from, the XDR library uses the concept of "stream". There are three streams:

  • Standard I/O stream. Encodes/decodes data to/from a file (a standard I/O FILE stream). It uses the standard I/O functions for the read/write operations.
  • Record (TCP/IP) stream. It is similar to the previous stream, but specific to TCP connections. It encodes/decodes data to/from a TCP connection; it is possible to specify the size of the input and output buffers and the callback functions to call when data arrives and when it is sent.
  • Memory stream. Encodes/decodes data to/from a block of memory allocated by the user.

For each stream there is a specific creation function that initializes the XDR structure passed to the conversion functions seen before. This structure stores information about the direction of the stream (encoding or decoding) and its internal state.

Problems and Limits of XDR solution

It was not possible to use the XDR library inside the RSC project for the following reasons:

  • Double conversion overhead. The use of a canonical format has many advantages: data can be sent and received without worrying about the architecture on the other side of the communication, and the library can easily be extended to new architectures. The major problem is that each datum, to be usable, needs two conversions (from the sender architecture to the canonical format and from the latter to the receiver architecture); this is a considerable overhead when there is a lot of data to convert, as happens in RSC.
  • Lack of an appropriate 32-to-64 bit conversion and vice versa. The conversion function for the long data type does not handle the 32-to-64 and 64-to-32 bit conversions correctly. The xdr_long() function treats long only as a 32 bit value; the library provides a function called xdr_hyper() to manage 64 bit integers, but it handles only 64 bit values and does not convert them to 32 bit. Some tests were made encoding a 64 bit long into canonical form with xdr_hyper() and decoding it with xdr_long(), but they did not work: the 64 bit value was smaller than 2^32-1, yet it was not decoded correctly. In conclusion, the XDR library does not provide a function that manages the 32-to-64 and 64-to-32 conversions.
  • Inadequacy of the streams proposed. The streams proposed by the library do not fit the needs of the RSC library. The TCP and I/O streams force the marshalling/unmarshalling step to be combined with the sending/receiving step; in addition, the TCP stream forces the use of a specific communication protocol, while the RSC library wants to be independent of the protocol used. The only remaining choice is the memory stream, which lets the user specify the memory block to use and so separates the two phases. The major problem of this stream is that it is not possible to determine the right amount of memory to allocate: the canonical representation of a data type can occupy more space than the architecture-dependent one, but the XDR library provides no tool to know this difference. The caller can end up allocating less memory than needed (and the conversion fails) or more than needed (and space is wasted).


How LibAConv tries to resolve XDR problems and limits

This section shows how the Architecture Conversion Library tries to resolve the problems highlighted in the previous section:

  • Double conversion overhead. LibAConv converts the data directly from the representation of the source architecture to that of the destination architecture, without using a canonical format; in this way the data needs only one conversion step (not two) to be usable in the destination architecture. The drawbacks are the need to know the architecture on the other side of the communication and a more difficult extension of the library to other architectures; these drawbacks were considered less important than the speedup gained by the single-step conversion.
  • Lack of an appropriate 32-to-64 bit conversion and vice versa. LibAConv correctly manages the conversion of the long, unsigned long and pointer types from 64 to 32 bit and vice versa. In the 64 to 32 bit conversion, the library manages these types in the following way:
    • long: if the value fits in a 32 bit long, it is copied into the 32 bit long. If the value is positive and greater than the maximum positive value representable in a 32 bit long, that maximum is used. If the value is negative and smaller than the minimum negative value representable in a 32 bit long, that minimum is used.
    • unsigned long: if the value fits in a 32 bit unsigned long, it is copied into it. If the value is greater than the maximum unsigned value representable on 32 bit, that maximum is used.
    • pointer: its size is reduced from 64 to 32 bit, but no operation is performed on the address, because it has no meaning in a different address space.
    The opposite case (32 to 64 bit conversion) is simpler: LibAConv only extends the number of bits from 32 to 64 and, in the signed long case, extends the sign.
  • Inadequacy of the streams proposed. LibAConv doesn't use the XDR stream concept. For each data type there are two functions: the conversion function and the size function. The first takes a pointer to the data to convert, the source and destination architectures, and a pointer to the destination memory where the converted data is stored. The destination memory is allocated by the caller, who can therefore control where the data is placed. The right amount of memory to allocate is given by the size function, which takes the source and destination architectures and returns the size in bytes of that data type in the destination architecture; in this way the memory can be allocated exactly, without waste.
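The 64-to-32 and 32-to-64 bit rules listed above can be sketched with fixed-width integers. This is a minimal illustration of the clamping behaviour described in the text, not the LibAConv source; the demo_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 64 bit long -> 32 bit long: copy when the value fits, otherwise clamp
 * to the largest/smallest value a 32 bit long can hold. */
int32_t demo_long64_to_long32(int64_t v)
{
  if (v > INT32_MAX)
    return INT32_MAX;
  if (v < INT32_MIN)
    return INT32_MIN;
  return (int32_t)v;
}

/* 64 bit unsigned long -> 32 bit unsigned long: copy or clamp to the maximum. */
uint32_t demo_ulong64_to_ulong32(uint64_t v)
{
  return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/* 32 -> 64 bit: plain widening. The signed conversion extends the sign,
 * the unsigned one fills the upper bits with zeros. */
int64_t demo_long32_to_long64(int32_t v)
{
  return (int64_t)v;
}

uint64_t demo_ulong32_to_ulong64(uint32_t v)
{
  return (uint64_t)v;
}
```

Clamping loses information for out-of-range values, but it keeps the sign and the ordering of the value, which a plain truncation of the upper 32 bits would not.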

Architecture Conversion Library (LibAConv)

The Architecture Conversion Library converts data from its representation in a source architecture to the one in a destination architecture, without wasting memory or using an intermediate format.

The library code is stored inside two files: the header (librsc/include/aconv.h) and the source file (librsc/src/aconv.c). The architectures supported are the ones supported by UMView: x86, x86_64, PowerPC and 64 bit PowerPC; they are represented inside the library by four constants:

#define ACONV_32BIT         0x20
#define ACONV_64BIT         0x40

#define ACONV_LITTLEE         0x01
#define ACONV_BIGE            0x02

enum arch {
  ACONV_ARCH_ERROR    = -1,
  ARCH_FIRST    = (ACONV_32BIT | ACONV_LITTLEE),
  ACONV_X86     = ARCH_FIRST,
  ACONV_X86_64  = (ACONV_64BIT | ACONV_LITTLEE),
  ACONV_PPC     = (ACONV_32BIT | ACONV_BIGE),
  ACONV_PPC_64  = (ACONV_64BIT | ACONV_BIGE),
  ARCH_LAST = ACONV_PPC_64
};

As you can see, each constant stores both the endianness and the word size of the specific architecture.
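Since each architecture constant is the bitwise OR of a word size flag and an endianness flag, both properties can be recovered with a mask. The sketch below repeats the definitions from the listing above; the demo_* helpers are hypothetical and are not part of the library interface described in the text.

```c
#include <assert.h>

#define ACONV_32BIT   0x20
#define ACONV_64BIT   0x40
#define ACONV_LITTLEE 0x01
#define ACONV_BIGE    0x02

enum arch {
  ACONV_ARCH_ERROR = -1,
  ARCH_FIRST       = (ACONV_32BIT | ACONV_LITTLEE),
  ACONV_X86        = ARCH_FIRST,
  ACONV_X86_64     = (ACONV_64BIT | ACONV_LITTLEE),
  ACONV_PPC        = (ACONV_32BIT | ACONV_BIGE),
  ACONV_PPC_64     = (ACONV_64BIT | ACONV_BIGE),
  ARCH_LAST        = ACONV_PPC_64
};

/* 0x20 is 32 and 0x40 is 64, so masking yields the word size directly. */
int demo_arch_bits(enum arch a)
{
  assert(a != ACONV_ARCH_ERROR);
  return a & (ACONV_32BIT | ACONV_64BIT);
}

int demo_arch_is_little_endian(enum arch a)
{
  assert(a != ACONV_ARCH_ERROR);
  return (a & ACONV_LITTLEE) != 0;
}

/* A byte swap is needed only when the endianness flags differ. */
int demo_needs_byte_swap(enum arch from, enum arch to)
{
  return demo_arch_is_little_endian(from) != demo_arch_is_little_endian(to);
}
```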

The library supports 47 data types (complete list); these are the types used by the system calls supported by the RSC library. For each data type, two functions are provided:

  • data type size function
  • data type conversion function

The functions of the first type calculate the size occupied by the data in the destination architecture; the user can use this value to allocate a block of memory for the converted data without wasting space. Some examples of this kind of function are:

int aconv_char_size(enum arch from, enum arch to);
int aconv_short_size(enum arch from, enum arch to);
int aconv_int_size(enum arch from, enum arch to);
int aconv_long_size(enum arch from, enum arch to);
int aconv_longlong_size(enum arch from, enum arch to);
...
int aconv_mode_t_size(enum arch from, enum arch to);
int aconv_loff_t_size(enum arch from, enum arch to);
...
int aconv_struct_stat64_size(enum arch from, enum arch to);
...
int aconv_string_size(char *s, enum arch from, enum arch to);
int aconv_array_size(enum arch from, enum arch to, int elnum, aconv_size_fun size_fun);
int aconv_bytes_size(int bytenum, enum arch from, enum arch to);

The name format is the following: aconv_<type name>_size. The return value is the number of bytes occupied by the data type in the destination architecture. The functions usually take two arguments, the constants describing the source and destination architectures; three functions need more than two arguments:

  • aconv_string_size: calculates the size of a string, so it takes the string as input.
  • aconv_array_size: calculates the size of an array. The array can contain elements of any type, so the function takes the size function for the element type (size_fun) and the number of elements (elnum).
  • aconv_bytes_size: calculates the size of an array of bytes, so it takes the number of bytes in the array.
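How a size function and the array variant fit together can be sketched as follows. The demo_* functions only imitate the signatures listed above; their bodies are assumptions made for the example (for instance, that the size of long depends only on the word size of the destination), not the library code.

```c
#include <assert.h>
#include <stddef.h>

#define ACONV_32BIT   0x20
#define ACONV_64BIT   0x40
#define ACONV_LITTLEE 0x01
#define ACONV_BIGE    0x02

enum arch {
  ACONV_ARCH_ERROR = -1,
  ACONV_X86        = (ACONV_32BIT | ACONV_LITTLEE),
  ACONV_X86_64     = (ACONV_64BIT | ACONV_LITTLEE),
  ACONV_PPC        = (ACONV_32BIT | ACONV_BIGE),
  ACONV_PPC_64     = (ACONV_64BIT | ACONV_BIGE)
};

/* Hypothetical equivalent of the aconv_size_fun type. */
typedef int (*demo_size_fun)(enum arch from, enum arch to);

/* int keeps the same size on all four architectures. */
int demo_int_size(enum arch from, enum arch to)
{
  (void)from;
  (void)to;
  return 4;
}

/* long follows the word size of the destination architecture. */
int demo_long_size(enum arch from, enum arch to)
{
  (void)from;
  return (to & ACONV_64BIT) ? 8 : 4;
}

/* The array size is the element size in the destination architecture
 * multiplied by the number of elements. */
int demo_array_size(enum arch from, enum arch to, int elnum,
                    demo_size_fun size_fun)
{
  assert(elnum >= 0 && size_fun != NULL);
  return elnum * size_fun(from, to);
}
```

A caller would typically pass the result to malloc() and then hand the buffer to the matching conversion function.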

The conversion functions convert a data type from a source to a destination architecture; some examples are:

int aconv_char(char *c, enum arch from, enum arch to, void *p);
int aconv_int(int *i, enum arch from, enum arch to, void *p);
int aconv_short(short *i, enum arch from, enum arch to, void *p);
int aconv_long(long *l, enum arch from, enum arch to, void *p);
int aconv_longlong(long long* l, enum arch from, enum arch to, void *p);
...
int aconv_mode_t(mode_t *n, enum arch from, enum arch to, void *p);
int aconv_loff_t(loff_t *n, enum arch from, enum arch to, void *p);
...
int aconv_struct_stat64(struct stat64 *s, enum arch from, enum arch to, void *p);
...
int aconv_string(char *s, enum arch from, enum arch to, void *p);
int aconv_array(void *a, enum arch from, enum arch to, int elnum, void *p, 
                aconv_size_fun size_fun, aconv_fun aconv_fun);
int aconv_bytes(void *b, enum arch from, enum arch to, void *p, int bytenum);

The name format is: aconv_<type name>. The functions take four arguments: the pointer to the data to convert, the source (from) and destination (to) architectures, and a pointer to the output memory (p); the latter must be allocated using the corresponding size function. The functions for arrays and byte arrays need more arguments:

  • aconv_array: needs to know the number of elements (elnum) and how to convert each single element. The latter information is provided by two arguments: the size function (size_fun) and the conversion function (aconv_fun) for that data type.
  • aconv_bytes: can be used to convert a memory block referenced through a void pointer or whose content cannot be known. This function can be considered a special case of aconv_array where the type of the single element is known, so the only information needed is the number of elements (bytenum).

The return value can be one of the following three constants:

  • ACONV_OK: the conversion has been successful.
  • ACONV_UNNEC: the conversion is not necessary because source and destination architectures are the same.
  • ACONV_ERROR: an error occurred, so the conversion failed.
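The three outcomes can be illustrated with a sketch of a conversion function for a 32 bit int. The numeric values of the DEMO_ACONV_* constants, the demo_* names and the choice of leaving the output untouched when no conversion is needed are assumptions made for this example; the text does not specify them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ACONV_32BIT   0x20
#define ACONV_64BIT   0x40
#define ACONV_LITTLEE 0x01
#define ACONV_BIGE    0x02

enum arch {
  ACONV_ARCH_ERROR = -1,
  ACONV_X86        = (ACONV_32BIT | ACONV_LITTLEE),
  ACONV_X86_64     = (ACONV_64BIT | ACONV_LITTLEE),
  ACONV_PPC        = (ACONV_32BIT | ACONV_BIGE),
  ACONV_PPC_64     = (ACONV_64BIT | ACONV_BIGE)
};

/* Hypothetical values: the real ones are not given in the text. */
#define DEMO_ACONV_OK     0
#define DEMO_ACONV_UNNEC  1
#define DEMO_ACONV_ERROR (-1)

/* Reverse the byte order of a 32 bit value. */
uint32_t demo_bswap32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
         ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* int is 4 bytes on all four architectures, so only the byte order
 * can change. p must point to at least 4 bytes. */
int demo_aconv_int(const int32_t *i, enum arch from, enum arch to, void *p)
{
  uint32_t v;

  if (i == NULL || p == NULL ||
      from == ACONV_ARCH_ERROR || to == ACONV_ARCH_ERROR)
    return DEMO_ACONV_ERROR;
  if (from == to)
    return DEMO_ACONV_UNNEC;   /* same architecture: p is left untouched */

  memcpy(&v, i, sizeof v);
  if ((from ^ to) & (ACONV_LITTLEE | ACONV_BIGE))  /* endianness differs */
    v = demo_bswap32(v);
  memcpy(p, &v, sizeof v);
  return DEMO_ACONV_OK;
}
```

Returning a distinct code when source and destination are the same lets the caller skip the copy and use the original buffer directly.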

LibAConv provides one more function, aconv_get_host_arch(). It takes no arguments and returns one of the four architecture constants, representing the architecture of the calling machine (or ACONV_ARCH_ERROR if that architecture is not supported).

Blocking system call and event subscription

A blocking system call is a function that performs blocking operations on a file descriptor, usually reads or writes. A blocking system call is dangerous both for the server and for the module: the server is blocked in the execution of the call and cannot do anything else, such as serving the other clients; on the client side the whole virtual machine is blocked, because the stub is waiting for the server's response and does not return control to the UMView core until it arrives. Blocking calls are therefore a problem not only for the RSC library but also for UMView; UMView already provides a solution, which was reused and adapted to work with RSC.

A UMView problem

UMView cannot allow a module to block inside a system call, because the whole virtual machine would block: modules are dynamically loaded libraries, so the blocking module function is called by the UMView core itself. The solution is the following: before calling the module function, the core asks the module whether the next system call is going to block; if not, the module function is called normally, otherwise the process that issued the system call is suspended. The core wakes the process when the module calls the callback function registered by the core; this call tells the core that the file descriptor is ready and the module function can be called without blocking. The function used by UMView to query the module is defined in the module interface:

static long event_subscribe(void (* cb)(), void *arg, int fd, int event);

It takes a pointer to the callback function (cb), an opaque argument passed to the callback function (arg), and the file descriptor (fd) to test for the specific event (event). The event can be reading (CB_R) or writing (CB_W). The core uses this function to query the module: event_subscribe() acts as a poll with zero timeout, that is, it tests the descriptor immediately for the given event and returns a positive or negative response; in the negative case the callback function is registered and will be called when the descriptor becomes ready for that event. event_subscribe() can also be used to cancel a previous registration of descriptor fd for event event, by calling it with cb equal to NULL; the descriptor is unregistered and the function returns a positive/negative response giving the state of the descriptor just before the deregistration.
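The "poll with zero timeout" behaviour can be sketched with the POSIX poll() call. The helper name demo_fd_ready_now() is hypothetical, and the mapping between CB_R/CB_W and POLLIN/POLLOUT is an assumption made for the example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Test fd for the given events without blocking.
 * Returns 1 if fd is ready now, 0 if it is not, -1 on error. */
int demo_fd_ready_now(int fd, short events)
{
  struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
  int n = poll(&pfd, 1, 0);   /* timeout 0: return immediately */

  if (n < 0)
    return -1;
  return n > 0 && (pfd.revents & events) != 0;
}
```

For example, the read end of an empty pipe is not ready for POLLIN until something is written to the write end, while the write end is ready for POLLOUT from the start.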

The RSC solution

The RSC library extends the solution provided by UMView to correctly manage blocking system calls. These calls are executed remotely on the server, so there must be a mechanism that lets the client query the server about the state of a file descriptor. Three usage scenarios have been identified:

  1. The module queries the server, the descriptor is ready, the server sends back a positive answer and the module passes this answer to UMView.
  2. The module queries the server, the descriptor is not ready, the server sends back a negative answer and the module passes this answer to UMView. In the meantime the server monitors the descriptor and, when it becomes ready, sends a response to the module; the module has registered the callback function, which is called when the response from the server is received.
  3. The module wants to unregister the file descriptor; it communicates this to the server, which tests the descriptor and sends back the response; the module returns this result to UMView.

These three scenarios involve two different types of communication between client and server:

  • synchronous. This kind of interaction happens when the module sends a query or a deregistration request: it sends the request and blocks until the server response arrives. The exchange happens inside event_subscribe(), which has to return a positive/negative answer to the UMView core, so it must wait for the server response before returning control to UMView.
  • asynchronous. This kind of interaction happens in the second scenario: the file descriptor is not ready, so the server has to monitor it and send a response when it becomes ready. On the client side, this response cannot be received by event_subscribe(), because it has already returned a negative answer.

The synchronous interaction is easy to manage: taken alone, a blocking read inside event_subscribe() would be sufficient, with the function waiting until the server response arrives. The asynchronous interaction is more problematic:

  • UMView doesn't provide a mechanism to receive the response from the server; UMView doesn't even know that this server exists, because it is hidden behind the module interface.
  • The module is only a set of functions called by the UMView core; there is no function that can continuously wait for the server response. If one existed, it would be invoked by the UMView core, and this would block the core until the server's answer was received.

The solution is to adopt a client-side thread that waits for the response and calls the callback function. The thread is independent from the UMView core, so it can block without compromising the operation of the virtual machine. Using a thread that continuously tries to read data from the communication channel has two implications:

  • a communication channel different from the one used by "normal traffic" (remote system call invocations and ioctl requests) is necessary. If a single channel were used, all the responses sent by the server would be read by the thread and not by the specific system call stub, so the stub would block forever and could not complete the management of the system call (or ioctl request).
  • event_subscribe() can only send requests and cannot read the server responses, otherwise its read would compete with the thread's. So event_subscribe() only sends the requests and all the responses are read by the thread; a synchronization mechanism between the function and the thread is necessary.

In conclusion, the analysis of the three scenarios has highlighted three subjects: the module, the server and the thread. The next section introduces the protocol used by the RSC library to manage the blocking system call problem.

The Protocol Messages

The protocol uses four different kinds of messages, defined in include/rsc_messages.h; this header defines all the kinds of messages used by the RSC library. The messages are:

  • the query message is used by the module to ask the server whether the file descriptor fd is ready for the event how:
    struct rsc_es_req {
      RSC_ES_COMMON_FIELDS
      int fd;
      int how;
    }__attribute__((packed));
    
  • the deregistration message is used by the module to cancel a previous registration of event how for file descriptor fd:
    struct rsc_es_dereg {
      RSC_ES_COMMON_FIELDS
      int fd;
      int how;
    }__attribute__((packed));
    
  • the acknowledgement message is used by the server to answer immediately to a query or deregistration message sent by the client. The message contains the result (response) of the test made on file descriptor fd for event how:
    struct rsc_es_ack {
      RSC_ES_COMMON_FIELDS
      u_int8_t response;
      int fd;
      int how;
    }__attribute__((packed));
    

    The field response can take four values (plus the initial value ACK_NOT_INIT):

    enum event_sub_ack {
      ACK_NOT_INIT = -1,
      ACK_FD_READY = 1,  /* The fd is ready */
      ACK_FD_REG,     /* The fd was not ready, so is monitored by the server */
      ACK_FD_DEREG_READY, /* The fd was ready and it has been deregistered */
      ACK_FD_DEREG_NOT_READY /* The fd wasn't ready and has been deregistered */
    };
    

    Two are used to answer a query message (ACK_FD_READY and ACK_FD_REG) and the other two to answer a deregistration message (ACK_FD_DEREG_READY and ACK_FD_DEREG_NOT_READY).

  • the callback execution message is used by the server to inform the module that the event how has occurred on file descriptor fd; when the thread receives the message, it can call the callback function:
    struct rsc_es_resp {
      RSC_ES_COMMON_FIELDS
      int fd;
      int how;
    }__attribute__((packed));
    

All four message structures share a common macro (RSC_ES_COMMON_FIELDS):

#define RSC_ES_COMMON_FIELDS u_int8_t type;

It defines the field containing the type of the message:

enum event_sub_type {
  EVENT_SUB_REQ = 1,
  EVENT_SUB_ACK, 
  EVENT_SUB_RESP,
  EVENT_SUB_DEREG
};
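Because every message starts with the one-byte type field, the receiver can decide how to interpret a buffer by looking at its first byte. The sketch below repeats the query message definition (using uint8_t in place of u_int8_t for portability); demo_make_req() and demo_msg_name() are hypothetical helpers written for this example and do not appear in the RSC sources described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSC_ES_COMMON_FIELDS uint8_t type;

enum event_sub_type {
  EVENT_SUB_REQ = 1,
  EVENT_SUB_ACK,
  EVENT_SUB_RESP,
  EVENT_SUB_DEREG
};

struct rsc_es_req {
  RSC_ES_COMMON_FIELDS
  int fd;
  int how;
} __attribute__((packed));

/* Build a query message for descriptor fd and event how. */
struct rsc_es_req demo_make_req(int fd, int how)
{
  struct rsc_es_req m;

  memset(&m, 0, sizeof m);
  m.type = EVENT_SUB_REQ;
  m.fd = fd;
  m.how = how;
  return m;
}

/* Classify a received buffer by its leading type byte. */
const char *demo_msg_name(const void *buf, size_t len)
{
  assert(buf != NULL);
  if (len < 1)
    return "truncated";
  switch (*(const uint8_t *)buf) {
  case EVENT_SUB_REQ:   return "query";
  case EVENT_SUB_ACK:   return "acknowledgement";
  case EVENT_SUB_RESP:  return "callback execution";
  case EVENT_SUB_DEREG: return "deregistration";
  default:              return "unknown";
  }
}
```

The packed attribute removes the padding after the type byte, so the query message occupies exactly one byte plus two ints on the wire.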

The Protocol

For clarity, the description of the protocol is divided into two parts: the query part and the deregistration part. On the client side, the module and the thread use a generic list to store the information about the registered file descriptors; access to this data structure is controlled by a mutual exclusion mechanism that is not specified by the protocol. Another thing the protocol does not specify is the coordination mechanism used by module and server to synchronize their dialog.

The query part

This section describes the part of the protocol covering the query issued by the module.

  • module: event_subscribe() is called with file descriptor fd, event event, callback function cb and the argument arg to pass to the callback.
  • module: checks in its local list whether fd has already been registered for event; if so, it returns a positive answer; otherwise it continues.
  • module: inserts fd, event, cb and arg in its local list.
  • module: creates a query message and sends it.
  • module: waits until the thread notifies it that the acknowledgment message from the server has arrived.
  • server: receives the query message.
  • server: immediately executes a non-blocking test to see if fd is ready for event (for example using poll() with a zero timeout).

For clarity, the two cases (positive and negative answer to the test) are described separately. If fd is ready:

  • server: the result of the test is positive.
  • server: creates an acknowledgment message with response equal to ACK_FD_READY and sends it.
  • thread: reads the acknowledgment message.
  • thread: accesses the local list and searches for the entry previously inserted by the module (using the fd and event fields inside the ACK message).
  • thread: stores the server response in the entry found.
  • thread: unblocks the module.
  • module: accesses the list element and reads the answer.
  • module: returns a positive result to the caller.

This completes the management of the positive answer to the server test. This case is simpler than the negative one, which is explained now:

  • server: the result of the test is negative.
  • server: creates an acknowledgment message with response equal to ACK_FD_REG (to inform the module that the file descriptor has been registered) and sends it.
  • server: monitors the file descriptor for event.
  • thread: reads the acknowledgment message.
  • thread: accesses the local list and searches for the entry previously inserted by the module (using the fd and event fields inside the ACK message).
  • thread: stores the server response in the entry found.
  • thread: unblocks the module.
  • module: accesses the list element and reads the answer.
  • module: returns a negative result to the caller.
  • ...
  • server: fd is ready for event.
  • server: creates a callback execution message and sends it.
  • thread: reads the server message.
  • thread: accesses the local list and searches for the entry previously inserted by the module.
  • thread: gets the callback function cb and the argument arg from that element.
  • thread: sets a flag inside the entry to record that the callback function for that entry has been called.
  • thread: calls cb with argument arg.

Note that in this protocol the thread does not remove the entry from the local list; it only sets a flag recording that the callback function has already been invoked for that entry. The entry can be removed only by a deregistration; the flag prevents the thread from executing the callback function a second time if it receives another callback execution message after the first one.
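The last steps of the thread, including the flag that prevents a second invocation, can be sketched as follows. The entry layout mirrors the fields named in the protocol, but the callback is given the prototype void (*)(void *) for the example, demo_handle_resp() and demo_count_cb() are hypothetical names, and the locking of the shared list is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct reg_cb {
  int fd;
  int how;
  void (*cb)(void *);   /* assumption: the callback receives arg */
  void *arg;
  int ack;
  int cb_executed;
};

/* Example callback: counts how many times it has been called. */
void demo_count_cb(void *arg)
{
  ++*(int *)arg;
}

/* Handle a callback execution message for (fd, how).
 * Returns 1 if the callback was invoked, 0 otherwise. */
int demo_handle_resp(struct reg_cb *v, int nentry, int fd, int how)
{
  assert(v != NULL || nentry == 0);
  for (int i = 0; i < nentry; i++) {
    if (v[i].fd != fd || v[i].how != how)
      continue;
    if (v[i].cb_executed || v[i].cb == NULL)
      return 0;            /* already notified: ignore the duplicate */
    v[i].cb_executed = 1;  /* the entry stays in the list */
    v[i].cb(v[i].arg);
    return 1;
  }
  return 0;                /* no registration for this fd and event */
}
```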

The deregistration part

This section describes the part of the protocol covering the deregistration.

  • module: event_subscribe() is called with file descriptor fd, event event, and the callback function equal to NULL.
  • module: searches the local list for the entry describing the registration of fd for the event event; if it finds that entry it continues, otherwise it returns.
  • module: creates a deregistration message and sends it.
  • module: waits until the thread notifies it that the acknowledgment message from the server has arrived.
  • server: receives the deregistration message.
  • server: immediately executes a non-blocking test to see if fd is ready for event (for example using poll() with a zero timeout).
  • server: creates an acknowledgment message with response equal to ACK_FD_DEREG_READY if the test result was positive, otherwise ACK_FD_DEREG_NOT_READY, and sends it.
  • thread: reads the acknowledgment message.
  • thread: accesses the local list and searches for the entry previously inserted by the module (using the fd and event fields inside the ACK message).
  • thread: stores the server response in the entry found.
  • thread: unblocks the module.
  • module: accesses the list element and reads the answer.
  • module: removes that element from the list.
  • module: returns the server answer to the caller.

The RSC library interface

The description of the functions provided by the RSC library will be divided into two parts: client side and server side.

Client side

On the client side there is the code that manages the thread and the messages sent by the module. Module and thread communicate through a shared list defined in include/rsc_client.h; its structure is the following:

struct reg_cbs {
  struct reg_cb *v;
  int size;
  int nentry;
};

It contains the array (v) of struct reg_cb, the size of the array (size) and the number of entries it holds (nentry). The elements of the array v have the following structure:

struct reg_cb {
  int fd;
  int how;
  void (* cb)();
  void *arg;
  
  int ack; /* Is the value of the ACK received. It's initialized to -1  */
  int cb_executed; /* True if the callback has been already executed, false otherwise */
};

The first four fields are the arguments passed to event_subscribe(); the last two are the value of the acknowledgment received by the thread from the server and the flag recording whether the callback function has already been executed. The functions that manage this list are defined in the src/registered_callbacks.c and src/include/registered_callbacks.h files; they initialize a list and add or remove elements.

Module and thread share this list; to guarantee mutual exclusion while accessing it, a pthread mutex (reg_cbs_mutex) is used. To synchronize the module and the thread, as required by the protocol, a pthread condition variable is used.
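The hand-off between the thread and the module can be sketched with a mutex and a condition variable. This is a generic illustration of the pattern, not the code of src/event_subscription.c: the demo_* names, the single-entry structure and the use of -1 as "no acknowledgment yet" (taken from the comment on the ack field) are assumptions for the example.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_ACK_NOT_INIT (-1)

struct demo_entry {
  int ack;   /* DEMO_ACK_NOT_INIT until the thread stores the answer */
};

pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  demo_cond  = PTHREAD_COND_INITIALIZER;

/* Thread side: store the server answer in the entry and wake the module. */
void demo_deliver_ack(struct demo_entry *e, int ack)
{
  pthread_mutex_lock(&demo_mutex);
  e->ack = ack;
  pthread_cond_broadcast(&demo_cond);
  pthread_mutex_unlock(&demo_mutex);
}

/* Module side: block until the thread has stored the answer.
 * The loop re-checks the condition to cope with spurious wakeups. */
int demo_wait_ack(struct demo_entry *e)
{
  int ack;

  pthread_mutex_lock(&demo_mutex);
  while (e->ack == DEMO_ACK_NOT_INIT)
    pthread_cond_wait(&demo_cond, &demo_mutex);
  ack = e->ack;
  pthread_mutex_unlock(&demo_mutex);
  return ack;
}
```

Because the waiting side checks the ack field under the mutex before sleeping, the answer is not lost even if the thread delivers it before the module starts waiting.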

The rscc_es_init() function initializes the client-side part of the event subscription module: it creates the shared list, the mutex, the condition variable and the thread. The function is called by the global client-side initialization function rscc_init() and is defined in src/event_subscription.c; this file defines all the functions (both client and server side) for the event subscription management.

The thread is implemented by the function rscc_es_thread(), which implements the thread part of the protocol: the body of the function is an infinite loop whose first operation is a blocking read used to wait for incoming data; the data received is managed as described by the protocol.

The functions seen so far are all static; the event subscription interface exposes a single public function, rscc_es_send_req(). This function implements the module part of the protocol; its declaration is:

int rscc_es_send_req(struct reg_cbs *reg_cbs, int server_fd, int event_sub_fd, int how, void (* cb)(), void *arg);

The arguments can be divided into two groups:

  • the shared list (reg_cbs) and the file descriptor of the connection with the server (server_fd)
  • the arguments received by event_subscribe() (event_sub_fd, how, cb and arg)

The function is meant to be called inside the module's event_subscribe(), which is why it takes the same arguments.

Server side

The server side offers two public functions, declared in the include/rsc_server.h header file:

struct rsc_es_ack *rscs_es_manage_msg(int esfd, void *data);
struct rsc_es_resp *rscs_es_event_occurred(int esfd, int mfd, int event);

These two functions implement the server part of the protocol. On the server side, as seen before, the communication functions are outside the RSC library, so these functions take the data read as input and return a message to be sent. The first function must be called when a query or deregistration message arrives from the client, the second when a monitored file descriptor becomes ready.

The rscs_es_manage_msg() function takes the file descriptor of the connection with the client (esfd) and the data read (data); it determines the kind of message contained in data, executes an immediate test using a poll with timeout set to zero, creates an acknowledgment message based on the poll result and returns it.
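The decision made after the immediate test can be sketched as a small mapping from the message type and the test result to the acknowledgment code. The two enums are the ones defined by the protocol; demo_choose_ack() is a hypothetical helper for this example and is not claimed to exist in the library.

```c
#include <assert.h>

enum event_sub_type {
  EVENT_SUB_REQ = 1,
  EVENT_SUB_ACK,
  EVENT_SUB_RESP,
  EVENT_SUB_DEREG
};

enum event_sub_ack {
  ACK_NOT_INIT = -1,
  ACK_FD_READY = 1,
  ACK_FD_REG,
  ACK_FD_DEREG_READY,
  ACK_FD_DEREG_NOT_READY
};

/* Pick the acknowledgment for a query or deregistration message,
 * given the outcome of the zero-timeout test on the descriptor. */
enum event_sub_ack demo_choose_ack(int msg_type, int fd_ready)
{
  switch (msg_type) {
  case EVENT_SUB_REQ:
    return fd_ready ? ACK_FD_READY : ACK_FD_REG;
  case EVENT_SUB_DEREG:
    return fd_ready ? ACK_FD_DEREG_READY : ACK_FD_DEREG_NOT_READY;
  default:
    return ACK_NOT_INIT;   /* not a message the server should answer */
  }
}
```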

The rscs_es_event_occurred() function takes the file descriptor of the connection with the client (esfd), the monitored file descriptor (mfd) and the event that occurred (event), then creates the appropriate callback execution message and returns it.

Both functions use a list data structure to keep track of the registered descriptors. The list is called rscs_es_list, it is initialized by the server-side event subscription initialization function rscs_es_init() and its type is struct list. This type is:

struct list {
  void **v;
  int size;
  int nentry;
};

and it is defined in src/include/generic_list.h. It is a generic list data structure: the fields have the same meaning as in the struct reg_cbs used on the client side, the only difference being that the elements of the array v are generic void pointers. The module provides functions to create a list, to add or remove elements and to search inside it. The user can use this data structure freely; he only has to define the structure of the element and the compare() function used by the search function to compare the elements of the list.
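A search over such a list, driven by a user-supplied comparison function, can be sketched as follows. The struct list definition is the one shown above; the comparison convention (non-zero means "match"), the demo_* names and the element layout are assumptions for the example, since the text does not give the real signatures.

```c
#include <assert.h>
#include <stddef.h>

struct list {
  void **v;
  int size;
  int nentry;
};

/* Hypothetical comparison type: returns non-zero when element matches key. */
typedef int (*demo_compare_fun)(const void *element, const void *key);

/* Linear search: returns the index of the first match, or -1. */
int demo_list_search(const struct list *l, demo_compare_fun compare,
                     const void *key)
{
  assert(l != NULL && compare != NULL);
  for (int i = 0; i < l->nentry; i++)
    if (compare(l->v[i], key))
      return i;
  return -1;
}

/* Example element: a registration identified by connection, fd and event. */
struct demo_el {
  int esfd;
  int mfd;
  short event;
};

int demo_el_compare(const void *element, const void *key)
{
  const struct demo_el *a = element;
  const struct demo_el *b = key;

  return a->esfd == b->esfd && a->mfd == b->mfd && a->event == b->event;
}
```

Keeping the element type behind void pointers lets the same list code serve different modules; only the comparison function knows the concrete layout.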

The event subscription module defines the element structure as follows:

struct rscs_es_listel {
 /* event subscriber fd, used to distinguish between two different 
  * event subscriber with same mfd and event */
 int esfd;  
 int mfd; /* monitored fd */
 short event;
 /* Show if the event 'event' is occurred or not */
 u_int8_t state; 
};

Each list entry stores the connection and monitored file descriptors, the event and the state of the entry. The state shows whether the event has occurred or not (respectively RSCEM_EV_OCCURRED or RSCEM_EV_NOT_OCCURRED): the entry is not removed from the list when the monitored event occurs, but only when a deregistration message for that file descriptor and event is received from the client.

ioctl request management

The ioctl request management is the third service provided by the library. The expression "ioctl request" does not mean a request message sent by one of the communicating nodes; it refers to the second argument of the system call, called request. Because of this argument, the ioctl system call needs special handling by UMView and by the library, for the reasons explained below.

The ioctl system call:

int ioctl(int fd, int request, void *arg)

can manipulate the underlying device parameters of special files. The first argument (fd) is the file descriptor of the special file, the second (request) is the request identifying the parameter to manipulate and the third (arg) is an opaque argument associated with the specific request.

The main problem is that there is no complete, official list of all the valid values of the request argument, so it is not possible to handle every specific case a priori. A related problem concerns the argument arg: its type, its size in bytes and its role inside the system call (it can be an input, output or input/output argument) depend on the specific request. Since there is no complete list of the values allowed for request, neither the size nor the role of arg can be known a priori, yet this information is necessary to correctly create and interpret remote system call execution requests and responses.

A UMView problem

The ioctl request management is also a UMView problem: UMView needs to know the size and the role of arg to correctly copy the data from and to the process memory. UMView adopts the same technique used for the event subscription: it asks the module what to do. The function checkfun(), defined in each module, is used by the UMView core to find out whether a specific system call is managed by that module or not; in the specific case of the ioctl system call, UMView uses this function to learn whether the module supports the specific ioctl request. The module's answer can be positive or negative. A positive answer contains a numeric value representing the size and the role of the arg associated with the specific request; in this way UMView learns that the module supports the specific ioctl request and has the information needed to manage it correctly. A similar solution is used by the RSC library.

The RSC solution

As for the event subscription, here too UMView tries to get the necessary information from the module. Some UMView modules perform a specific operation, for example they virtualize the network or the file system, and for this reason they manage only a small set of ioctl requests. The RSC module provides a different virtualization service: it manages all the system calls, so it should manage all the ioctl requests but, as said before, a complete request list does not exist, so there is the problem of how to manage them.

The solution provided by the RSC library lets the server register the ioctl requests it wants to support and lets the client query the server to learn whether a given ioctl request is supported; as UMView asks the module, so the RSC module asks the server. With this solution, support can be extended to new requests by registering them in the server, without changing the module or the library code. Since this solution needs a remote query, it was decided to support the functionality at library level.

Unlike the event subscription problem, the ioctl request management did not require a specific protocol: when it is necessary to query the server, the client sends the request and waits for the response. If the response is negative, the specific ioctl request is not managed by the server and so the ioctl call cannot be managed; otherwise the server sends the client the information about the associated arg (its size and its role inside the ioctl call), which is used to create and interpret the requests and responses exchanged by the client and the server. The communication between client and server is synchronous, so it was possible to use the same communication channel used to send and receive the remote system call messages; in fact, while the RSC is waiting for the server's response, UMView is blocked and cannot manage other system calls. The set of ioctl requests supported by the server cannot change at runtime, so it was possible to introduce a client-side cache that saves the server's responses, minimizing network traffic.

The messages

There are two types of messages: the query sent by the client and the response sent back by the server. They are defined in include/rsc_messages.h. The query is defined as:

struct ioctl_req_header {
  REQ_HEADER 
  int32_t req_ioctl_request;
} __attribute__((packed));

The important field is req_ioctl_request, a 32-bit integer holding the ioctl request being queried. The macro REQ_HEADER defines the set of fields shared by ioctl request queries and RSC execution requests; it is defined as follows:

#define REQ_HEADER        u_int32_t req_size; int8_t req_type; 

It contains two fields: the size in bytes of the request (req_size) and the multiplexing field (req_type), used to distinguish between an ioctl request query (RSC_IOCTL_REQ) and an RSC execution request (RSC_SYS_REQ).

The server's response is defined as:

struct ioctl_resp_header {
  RESP_HEADER 
  u_int32_t resp_size_type;
} __attribute__((packed));

As for the request, the structure contains a macro (RESP_HEADER) defining the common response fields, followed by the result of the query (resp_size_type). The macro RESP_HEADER contains the same fields as REQ_HEADER:

#define RESP_HEADER       u_int32_t resp_size; int8_t resp_type;

the size in bytes of the response (resp_size) and the multiplexing field (resp_type); the values allowed for the latter are RSC_SYS_RESP for the response to a system call remote execution and RSC_IOCTL_RESP for the response to an ioctl request query. The resp_size_type field is a 32-bit integer that stores the server's answer to the client's query: if the server does not manage the specific ioctl request its value is IOCTL_UNMANAGED, otherwise the field encodes two pieces of information about the third argument of the system call:

  • its role inside the system call: it can be an input argument (IOCTL_R), an output argument (IOCTL_W) or an input/output argument (IOCTL_RW). This information is encoded in the 4 most significant bits.
  • its size in bytes: this information is encoded in the remaining 28 bits. A specific macro, called IOCTL_LENMASK, can be used as a bit mask to extract this information from the 32-bit field.

The format of resp_size_type is compatible with the one expected by UMView, so it can be returned to UMView without any translation.
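
The bit layout can be illustrated with a short sketch. Only the split (4 most significant bits for the role, 28 bits for the size) comes from the text; the numeric values chosen here for IOCTL_R, IOCTL_W and IOCTL_RW are assumptions, and the helper functions are hypothetical.

```c
#include <stdint.h>

#define IOCTL_LENMASK 0x0FFFFFFFu          /* low 28 bits: size in bytes */
#define IOCTL_R       0x10000000u          /* assumed value */
#define IOCTL_W       0x20000000u          /* assumed value */
#define IOCTL_RW      (IOCTL_R | IOCTL_W)  /* assumed value */

/* Hypothetical helper: combines role and size into one 32-bit field. */
uint32_t ioctl_pack(uint32_t rw, uint32_t size)
{
    return (rw & ~IOCTL_LENMASK) | (size & IOCTL_LENMASK);
}

/* Hypothetical helper: size in bytes of the third ioctl argument. */
uint32_t ioctl_len(uint32_t size_type)
{
    return size_type & IOCTL_LENMASK;
}

/* Hypothetical helper: role bits of the third ioctl argument. */
uint32_t ioctl_role(uint32_t size_type)
{
    return size_type & ~IOCTL_LENMASK;
}
```

With this layout a size of up to 2^28 - 1 bytes can be described, and the role can be tested with a single mask operation.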

The cache

The cache data structure is implemented in the src/rsc_client.c file as a simple queue built on a doubly linked list. This first implementation does not include a specific or efficient replacement algorithm to minimize cache misses, because the main aim was to have a working cache; the choice of the best algorithm can be made later. Entries are inserted at the head and, when there is no more room, removed from the tail. The C definition is the following:

struct ioctl_cache {
  struct ioctl_cache_el *first;
  struct ioctl_cache_el *last;
  int size;
  int nentry;
};

It contains pointers to the first (head) and last (tail) elements of the list (first and last), the maximum size allowed (size) and the number of elements currently stored (nentry). The field size is introduced to avoid unbounded growth of the queue: when nentry is equal to size, the last element is removed. A single cache entry is defined as follows:

struct ioctl_cache_el {
  int request;
  u_int32_t size_type;
  struct ioctl_cache_el *prev;
  struct ioctl_cache_el *next;
};

It stores the ioctl request (request) and the response sent by the server (size_type); the entry also contains two pointers to the previous and next elements of the queue.

The cache can be initialized with the function:

struct ioctl_cache *ioctl_cache_init(int size);

It takes as input the maximum size of the queue and returns a pointer to the cache. A function to add a new entry to the cache is provided:

void ioctl_cache_add(struct ioctl_cache *cache, int request, u_int32_t size_type);

It creates a new entry, saves the request and size_type input arguments in it, and adds the entry to cache. If there is no room in the cache, ioctl_cache_add() removes the last element from it. The last function provided searches the queue:

u_int32_t ioctl_cache_search(struct ioctl_cache *cache, int request);

It searches cache for the entry whose request field is equal to request; it returns the size_type field of the entry if it is found, 0 otherwise.

All these functions are static: they are used internally by the RSC library and are not exposed to the user.
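
The behaviour described above can be sketched as follows. The structures and signatures follow the text, but the function bodies are a reconstruction under assumptions and not the code of src/rsc_client.c: the sketch assumes a maximum size of at least 1, uses uint32_t in place of u_int32_t, and leaves the functions non-static so that they can be exercised on their own.

```c
#include <stdint.h>
#include <stdlib.h>

struct ioctl_cache_el {
    int request;
    uint32_t size_type;
    struct ioctl_cache_el *prev;
    struct ioctl_cache_el *next;
};

struct ioctl_cache {
    struct ioctl_cache_el *first;
    struct ioctl_cache_el *last;
    int size;
    int nentry;
};

struct ioctl_cache *ioctl_cache_init(int size)
{
    struct ioctl_cache *c = calloc(1, sizeof(*c));
    if (c != NULL)
        c->size = size;
    return c;
}

void ioctl_cache_add(struct ioctl_cache *cache, int request, uint32_t size_type)
{
    struct ioctl_cache_el *el = calloc(1, sizeof(*el));
    if (el == NULL)
        return;
    el->request = request;
    el->size_type = size_type;

    /* Cache full: drop the oldest entry, which sits at the tail. */
    if (cache->nentry == cache->size && cache->last != NULL) {
        struct ioctl_cache_el *old = cache->last;
        cache->last = old->prev;
        if (cache->last != NULL)
            cache->last->next = NULL;
        else
            cache->first = NULL;
        free(old);
        cache->nentry--;
    }

    /* New entries always enter at the head. */
    el->next = cache->first;
    if (cache->first != NULL)
        cache->first->prev = el;
    else
        cache->last = el;
    cache->first = el;
    cache->nentry++;
}

uint32_t ioctl_cache_search(struct ioctl_cache *cache, int request)
{
    struct ioctl_cache_el *el;
    for (el = cache->first; el != NULL; el = el->next)
        if (el->request == request)
            return el->size_type;
    return 0;   /* not found */
}
```

Because a search does not move the entry it finds, the eviction order is first-in first-out rather than least-recently-used, which matches the simple queue described in the text.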

The RSC library interface

On the client side, the RSC library provides a single function:

u_int32_t rscc_check_ioctl_request(int request);

This function can be used by the client to ask the server whether the given ioctl request is supported. The return value is 0 if an internal error occurred, otherwise it is the resp_size_type field contained in the response message sent back by the server. The function queries the cache before sending the request to the server, and sends the request if, and only if, the cache holds no entry for it.

On the server side too, the library provides only one function:

void rscs_ioctl_register_request(int request, u_int32_t rw, u_int32_t size);

This function can be used by the server to register an ioctl request that it wants to support. The function takes as input the ioctl request (request) and the two pieces of information about the third argument of the system call:

  • its role inside the system call (rw): input (IOCTL_R), output (IOCTL_W) or input/output argument (IOCTL_RW).
  • its dimension in bytes (size)

This information is stored in a generic list data structure (the one described for the event subscription) initialized by rscs_init(), the server-side initialization function of the RSC library. The query request and response are managed by the function rscs_manage_request(), which handles both ioctl request and RSC execution messages; it is explained in detail in the next section, which describes the management of system call remote execution.
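
A simplified sketch of the registration and lookup logic is shown below. It is not the library implementation: it uses a fixed-size table instead of the generic list, the function names carry a sketch_ prefix to mark them as hypothetical, and the numeric values of the IOCTL_* constants (including IOCTL_UNMANAGED) are assumptions.

```c
#include <stdint.h>

#define IOCTL_LENMASK   0x0FFFFFFFu   /* low 28 bits, as described in the text */
#define IOCTL_R         0x10000000u   /* assumed value */
#define IOCTL_W         0x20000000u   /* assumed value */
#define IOCTL_UNMANAGED 0u            /* assumed value */
#define SKETCH_MAX_REQS 32            /* hypothetical table capacity */

struct sketch_ioctl_reg {
    int request;
    uint32_t size_type;
};

static struct sketch_ioctl_reg sketch_table[SKETCH_MAX_REQS];
static int sketch_count;

/* Records that 'request' is supported, with the role and size of arg. */
int sketch_register_request(int request, uint32_t rw, uint32_t size)
{
    if (sketch_count == SKETCH_MAX_REQS)
        return -1;
    sketch_table[sketch_count].request = request;
    sketch_table[sketch_count].size_type =
        (rw & ~IOCTL_LENMASK) | (size & IOCTL_LENMASK);
    sketch_count++;
    return 0;
}

/* Answer to a client query: the packed field, or IOCTL_UNMANAGED. */
uint32_t sketch_lookup_request(int request)
{
    int i;
    for (i = 0; i < sketch_count; i++)
        if (sketch_table[i].request == request)
            return sketch_table[i].size_type;
    return IOCTL_UNMANAGED;
}
```

A server would call the registration function once per supported request at start-up, which fits the statement that the set of supported requests cannot change at runtime.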

System Call Remote Execution

This section describes the internal structure of the stubs and functions provided by the RSC library to execute system calls remotely. For each system call there are two different stubs, one for each side of the communication; their aim is to manage all the steps needed to execute the specific call remotely. The internal structure of all the stubs is very similar in the number and types of functions used; what changes is the marshalling and unmarshalling of the specific input arguments, since each system call has its own number of arguments, each with its own type. With respect to the marshalling and unmarshalling of messages, the system calls can be divided into four groups depending on whether they have pointer arguments and on the role of those pointers in the system call:

  • system calls without pointer arguments: these are the simplest system calls to manage; it is sufficient to fill the message structure with the arguments and send it. There is no marshalling/unmarshalling.
  • system calls with input pointer arguments: input pointer arguments point to a memory area used as input by the system call. That memory is not changed by the system call, it is only read; for this reason this kind of pointer is also called a read pointer. The pointed memory must be included in the request message so that it can be read by the remote system call.
  • system calls with output pointer arguments: output pointer arguments point to a memory area used as output by the system call. That memory is not read by the system call, it is only written; for this reason this kind of pointer is also called a write pointer. The pointed memory need not be included in the request message, but it must be included in the server's response.
  • system calls with input/output pointer arguments: input/output pointer arguments point to a memory area used as both input and output by the system call. The memory is read by the system call and its contents are changed by it; for this reason this kind of pointer is also called a read/write pointer. This case can be seen as a combination of the previous two: the pointed memory must be included both in the request message and in the response.

The presence or absence of pointer arguments strongly influenced the design choices, in particular those made to guarantee efficient marshalling and unmarshalling operations.

The next section presents the message structure; the following sections show the internal structure of the client and server stubs.

Messages format

The request message is used by the client stub to request the remote execution of a specific system call and to provide all the information the server needs to execute it. Every system call has its own request message, because the message contains one field for each system call argument; for example, the structure of the accept request is:

struct accept_req {
  SYS_REQ_HEADER
  int sockfd;
  struct sockaddr *addr;
  socklen_t *addrlen;
} __attribute__((packed)); 

The structure fields can be divided into two different groups:

  • fields shared by all the request messages. The fields inside this group are defined by the macro SYS_REQ_HEADER:
    #define SYS_REQ_HEADER    REQ_HEADER u_int16_t req_rsc_const;
    
    which, in turn, consists of another macro (REQ_HEADER) and an unsigned 16-bit integer (req_rsc_const). The req_rsc_const field is a multiplexing field: it stores the __RSC_* constant that uniquely identifies the specific system call. Using this field, the server can tell which system call is to be executed remotely and can correctly interpret the fields specific to that system call. The macro REQ_HEADER has already been described in the ioctl request section; it contains the fields common to system call remote execution requests and ioctl request messages.
  • fields specific to the given system call. There is one field for each system call argument, including the pointer ones. The pointer fields waste some memory, because their values are meaningful only inside the client address space and become meaningless when the message reaches the server; they are nevertheless included because they make the unmarshalling process easier (as shown in the next sections).

The response message is created by the server stub to inform the client stub that the system call has been executed and to send back some information. Unlike the request message, there is a single response message for all the system calls. Its definition is:

struct sys_resp_header {
  RESP_HEADER 
  u_int16_t resp_rsc_const; 
  int32_t resp_retval; 
  int32_t resp_errno;
} __attribute__((packed));

The message starts with the macro RESP_HEADER, containing the fields shared by ioctl request and system call remote execution response messages; these fields have been described in the ioctl request section. The other fields are specific to the system call remote execution response message: resp_rsc_const is the multiplexing field, containing the __RSC_* constant of the system call executed; resp_retval is the value returned by the system call; and resp_errno is the error number set by the system call to report which kind of error occurred, if any.

Messages creation

This section explains the message creation and the marshalling/unmarshalling processes for request and response messages.

Request creation

The creation of a request message consists of two steps: the initialization of the message fields and the marshalling of the memory pointed to by input or input/output pointers (if there are any). The marshalling step is therefore executed only if the system call has input or input/output pointers; it consists of appending the memory pointed to by these arguments to the end of the request message, so that the complete message sent by the client stub is composed of the message header followed by all the input and input/output memory areas. The memory areas are appended to the header in the order of the system call arguments, with one exception: when the size of a memory area is stored in another memory area, pointed to by a later pointer argument. An example is given by the accept system call: the size of the memory pointed to by the second argument (addr) is given by the value pointed to by the third argument (addrlen). If addr were marshalled before addrlen, the server could not correctly unmarshal the addrlen value, because it would not know where the addr memory ends and the addrlen memory begins. For this reason the addrlen memory is marshalled before the addr memory, so that the server, which knows the size of the addrlen memory (it is sizeof(socklen_t)), can determine where the addr memory starts and how large it is.

Including the pointer fields in the request structure (as described in the previous section) is important in the unmarshalling step. The following figure:


shows a request message for the readlink system call, which takes three arguments: an input pointer to the string containing the symbolic link path (path), an output pointer to the buffer where the system call will save its result (buf) and an integer with the size of the latter (bufsiz). When the message reaches the server the pointer values no longer have any meaning, because the address space has changed. They matter not for the value they store but for the space they occupy: in the unmarshalling phase their content is updated to point to:

  • the memory appended to the request header, if they are input pointers.
  • newly allocated memory, if they are output pointers.

In this way it is not necessary to allocate new memory for auxiliary structures.
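
The pointer update can be sketched for the readlink case as follows. The structure layout is an assumption modelled on the accept_req structure shown earlier, and the two functions are hypothetical stand-ins for the per-call adjustment functions of the library; the sketch covers only the simple case in which the output buffer is freshly allocated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: header fields followed by one field per argument. */
struct sketch_readlink_req {
    uint32_t req_size;
    int8_t   req_type;
    uint16_t req_rsc_const;
    char    *path;      /* input pointer */
    char    *buf;       /* output pointer */
    size_t   bufsiz;
} __attribute__((packed));

/* Input pointer: re-aimed at the bytes appended right after the header. */
void sketch_adjust_read_pointers(struct sketch_readlink_req *req)
{
    req->path = (char *)req + sizeof(*req);
}

/* Output pointer: re-aimed at newly allocated memory of bufsiz bytes.
 * Returns 0 on success, -1 if the allocation fails. */
int sketch_adjust_write_pointers(struct sketch_readlink_req *req)
{
    req->buf = malloc(req->bufsiz);
    return req->buf != NULL ? 0 : -1;
}
```

After both adjustments the fields of the request can be passed to the system call as they are, which is why the otherwise useless pointer slots are kept in the message.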

The marshalling process described so far is valid only if the client and server architectures are the same; otherwise a data conversion step must be introduced during the creation of the request. The fields shared by all the requests (req_size and req_rsc_const) are always converted to network byte order using the htonl and htons functions, even if the two architectures are the same. The fields specific to the system call, instead, are converted using the Architecture Conversion Library (LibAConv); this library is used only if the two architectures are different, otherwise the data are sent as they are. With this library the data are already in the destination architecture's representation when they arrive, so the server, in its unmarshalling step, only needs to convert the network byte order fields and update the pointer fields.
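
The conversion of the shared header fields can be sketched as follows. The structure mirrors the fields produced by the SYS_REQ_HEADER macro; the structure name and the two helper functions are hypothetical, and uint32_t/uint16_t are used in place of u_int32_t/u_int16_t.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Same fields as SYS_REQ_HEADER, written out as a structure. */
struct sketch_sys_req_header {
    uint32_t req_size;
    int8_t   req_type;
    uint16_t req_rsc_const;
} __attribute__((packed));

/* Sender side: multi-byte fields go to network byte order. */
void sketch_req_header_hton(struct sketch_sys_req_header *h, uint32_t size,
                            int8_t type, uint16_t rsc_const)
{
    h->req_size = htonl(size);
    h->req_type = type;                 /* a single byte needs no conversion */
    h->req_rsc_const = htons(rsc_const);
}

/* Receiver side: the same fields come back to host byte order. */
void sketch_req_header_ntoh(struct sketch_sys_req_header *h)
{
    h->req_size = ntohl(h->req_size);
    h->req_rsc_const = ntohs(h->req_rsc_const);
}
```

Converting these fields unconditionally lets the receiver read the size and the system call constant before it knows anything else about the message.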

Response creation

The creation of a response message mirrors that of the request message and has two steps: the filling of the header fields and the marshalling of the memory pointed to by output or input/output pointers. As for the request marshalling, the memory areas are appended to the end of the response header. The header fields are all converted to network byte order, even if the sending and receiving architectures are the same; LibAConv is used on the appended memory areas if, and only if, the two architectures differ.

Client side

The RSC library client-side interface provides a stub for each supported system call; the stub takes care of all the phases needed to execute the system call remotely. The structure of a stub is the same for each system call and is shown in the following code, from which the error management code has been removed for clarity:

int rscc_<system call name>(< ... system call arguments list ...>){
  struct sys_resp_header resp_header;
  int nwrite, nread;
  int nbytes;
  struct iovec *v;
  int iovec_count;

  /* Creates the request message 'v' */
  v = rscc_create_<system call name>_request(&nbytes, &iovec_count, < ... system 
                                               call arguments list ...>);
    
  /* Sends the message... */
  nwrite = writev_n_bytes(rsc_sockfd, v, iovec_count, nbytes);

  /* ... and waits the response */
  nread = read_n_bytes(rsc_sockfd, &resp_header, sizeof(struct sys_resp_header));

  /* Deserializes the response */
  v = rscc_manage_<system call name >_response(&resp_header, &iovec_count, &nbytes, 
                                               < ... system call arguments list ...>);

  /* Reads the remaining data, if there are */
  if(v != NULL) {
    nread = readv_n_bytes(rsc_sockfd, v, iovec_count, nbytes);
  }

  errno = resp_header.resp_errno;
  return resp_header.resp_retval;
}

The steps performed by the stub are:

  • request creation. The function with this purpose is rscc_create_<system call name>_request(). It:
    • allocates the memory for the request message.
    • fills the header fields, converting some of them to network byte order (all of them, except the system call arguments).
    • uses LibAConv for the conversion of the remaining fields and of the data stored in the memory areas pointed to by the input and input/output pointer arguments, if and only if the two architectures are different.
    • appends the memory areas to the request header.

    The message returned by the function is an array of struct iovec. This kind of vector allows the message to be built incrementally: the first element points to the request header and each following element points to one of the memory areas to send. If the sending and receiving architectures are the same, this structure allows the memory areas to be read directly from the input arguments of the stub, without building a monolithic message and copying the memory contents into it. If the two architectures are different, a monolithic memory area is allocated to hold the converted data (system call arguments and memory areas).

    The function takes as input the system call arguments and the two integer pointers nbytes and iovec_count, in which it saves, respectively, the total size in bytes of the data contained in the vector and the number of elements of the vector.

  • sending the message. The function writev_n_bytes() is used; it executes as many writev() calls as are needed to send all the data.
  • waiting for the response. The function read_n_bytes() is used; it executes as many read() calls as are needed to read the whole response header.
  • unmarshalling of the response. This step is done by the rscc_manage_<system call name >_response() function. The previous step read only the header of the response, not the whole message: some output or input/output buffers may be appended, but their presence and size depend on the specific system call. If there are no more data to read, the function only converts the header fields from network to local byte order; otherwise it also creates a struct iovec array in which each element is one of the output or input/output pointers passed to the stub. In this way the readv() done in the following step can read the data from the network directly into the memory areas provided by the caller of the stub, without unnecessary memory allocation. This solution also works if the two architectures are different, because the data are converted by the server and arrive from the network already in the local representation.
  • reading of the remaining data. If the function of the previous step returns a null value, there are no more data to read; otherwise the readv_n_bytes() function is called, which reads the data directly from the network into the memory areas provided by the stub caller.
  • returning. The errno value is updated with the value inside the response and the return value of the remote system call is returned.
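
The sending step relies on repeating writev() until every byte has left. A minimal sketch of such a loop is given below; it is a reconstruction under assumptions, not the library's writev_n_bytes(), and it carries a sketch_ prefix for that reason. Note that it modifies the struct iovec array in place while it advances.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Returns nbytes when everything was written, -1 on error. */
ssize_t sketch_writev_n_bytes(int fd, struct iovec *v, int count, size_t nbytes)
{
    size_t left = nbytes;

    while (left > 0 && count > 0) {
        ssize_t n = writev(fd, v, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: simply retry */
            return -1;
        }
        if (n == 0)
            return -1;              /* no progress: give up */
        left -= (size_t)n;

        /* Skip the elements that were written completely. */
        while (count > 0 && (size_t)n >= v->iov_len) {
            n -= (ssize_t)v->iov_len;
            v++;
            count--;
        }
        /* Trim the element that was written only in part. */
        if (count > 0 && n > 0) {
            v->iov_base = (char *)v->iov_base + n;
            v->iov_len -= (size_t)n;
        }
    }
    return left == 0 ? (ssize_t)nbytes : -1;
}
```

A reading counterpart would follow the same pattern with readv(), additionally treating a return value of 0 as end of stream.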

This is the typical structure of a client stub; only one stub is slightly different, the one for the ioctl system call. This stub adds a call to rscc_check_ioctl_request() before creating the request message, to find out whether the server supports the given ioctl request. If the server does not support the request, the stub returns immediately with the value -1; otherwise the system call is executed remotely as usual. The value returned by the query function is passed to the request creation and response management functions, so that they can, respectively, create a correct request message and correctly interpret the response message.

Server side

The server side provides a single function to manage all the client requests:

struct iovec *rscs_manage_request(int client_arch, void *request);

It manages two kinds of requests: remote execution requests and ioctl request queries; to know which kind of message has arrived, it uses the req_type field contained in the header. In both cases the function does not perform I/O operations: it returns the data to be sent to its caller, which sends them as it wishes. The following code shows the internal structure of rscs_manage_request(); the error management code has been removed for clarity:

struct iovec*rscs_manage_request(int client_arch, void *request) {
  struct iovec*ret_data;
  struct req_header *req_hd;
 
  /* I get header  and I convert the size to local byte order */
  req_hd = (struct req_header *)request;
  req_hd->req_size = ntohl(req_hd->req_size);

  /* I identify the kind of request */
  if( req_hd->req_type == RSC_IOCTL_REQ) {
    
    /* I search if the given ioctl request is supported by the server */
    ret_data = rscs_manage_ioctl_request((struct ioctl_req_header *)request);

  } else if( req_hd->req_type == RSC_SYS_REQ) {
    struct sys_req_header *req_hd;
    struct sys_resp_header *resp_hd;
    rscs_pre_exec pre_exec_f;
    int ret;
    rscs_exec exec_f;
    rscs_post_exec post_exec_f;

    req_hd = (struct sys_req_header *)request;

    /* I convert the __RSC_* constant into local byte order */
    req_hd->req_rsc_const = ntohs(req_hd->req_rsc_const);

    /* I get the three management functions for the given system call */
    pre_exec_f = rscs_pre_exec_table[req_hd->req_rsc_const];
    exec_f = rscs_exec_table[req_hd->req_rsc_const];
    post_exec_f = rscs_post_exec_table[req_hd->req_rsc_const];

    /* I execute the unmarshalling and I create the response message */
    resp_hd = pre_exec_f(request, client_arch);

    /* I execute the system call */
    ret = exec_f(request);

    /* I execute the marshalling of the response */
    ret_data = post_exec_f(request, resp_hd, ret, errno, client_arch);
  } else {

    /* Bad request type */
    ret_data = NULL;
  }

  /* I return the response to the caller */
  return ret_data;
}

The steps performed by the function are:

  • identification of the message type, using the req_type field inside the request:
    • ioctl request query. The function rscs_manage_ioctl_request() is called. It checks whether the request named in the query is supported by the server and builds the response with the result.
    • system call remote execution request. The steps performed are:
      • identification of the system call to manage. The type of the system call is contained in the req_rsc_const header field, which is converted to the local byte order.
      • identification of the specific management functions. As on the client side, on the server side there are management functions for each system call, with the only difference that on the server side there are three of them: one for the unmarshalling of the request, one for the execution of the system call and one for the marshalling of the response. A pointer to each of these functions is stored in a specific array (respectively rscs_pre_exec_table, rscs_exec_table and rscs_post_exec_table) indexed by the __RSC_* constant.
      • request unmarshalling. This task is done by the function pre_exec_f(), which performs the following steps:
        • updates the input and input/output pointers to point to the data appended after the request header. To do so, it uses the function <system call name>_adjust_read_pointers().
        • allocates the space for the response message. The allocated memory area includes the space for the header plus the space for the memory pointed to by output or input/output pointers (if there are any).
        • updates the output and input/output pointers using the function <system call name>_adjust_write_pointers().

        Creating the response message in this phase is an optimization. If client and server share the same architecture, the function <system call name>_adjust_write_pointers() updates the output and input/output pointers to point to the memory areas inside the response message just created; in this way, after the execution of the system call, the data written by it are already in the right place, ready to be sent. If the pointer is an input/output pointer, it is also necessary to copy the data from the request to the response so that the system call can read them. If the two architectures are different this optimization is not possible and a new memory area must be allocated to store the output data, because LibAConv uses the response memory to store the converted data.

        When this function returns, the data inside the request are ready to be used and the response has been created; a pointer to the response is returned by the function.

      • system call execution. This task is done by the exec_f() function. It calls the syscall() function, passing the arguments contained in the request message and the appropriate __NR_* constant, which is computed from the __RSC_* constant and the server architecture. The return value of exec_f() is the return value of the system call; the errno value is not explicitly returned and must be read from the global variable.
      • response marshalling. The function post_exec_f() takes care of this step. It:
        • fills the response header fields and converts them to network byte order.
        • if the two architectures are the same and there are output or input/output data, leaves the data where they are, since they need no marshalling (thanks to the <system call name>_adjust_write_pointers() function). Otherwise, if the two architectures are different, calls the LibAConv functions and appends the converted data after the response header.

        The function returns a pointer to a struct iovec with only one element: the response message and its size.

The response message is returned by the rscs_manage_request() function to its caller. The use of a struct iovec allows a single structure to carry both the message to be sent and its size.
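
The table lookup used to reach the management functions can be sketched as follows. Everything in this example is hypothetical: the enumeration stands in for the __RSC_* constants, a single table stands in for the three real ones, and the bounds check is an addition that does not appear in the simplified code shown above.

```c
#include <stddef.h>

typedef int (*sketch_exec)(void *request);

/* Stand-ins for the __RSC_* constants. */
enum { SKETCH_RSC_FIRST, SKETCH_RSC_SECOND, SKETCH_RSC_COUNT };

static int exec_first(void *request)  { (void)request; return 10; }
static int exec_second(void *request) { (void)request; return 20; }

/* One function pointer per system call, indexed by its constant. */
static const sketch_exec sketch_exec_table[SKETCH_RSC_COUNT] = {
    [SKETCH_RSC_FIRST]  = exec_first,
    [SKETCH_RSC_SECOND] = exec_second,
};

/* Returns 0 and stores the handler result in *ret, or -1 if the
 * constant is out of range or has no handler. */
int sketch_dispatch(unsigned int rsc_const, void *request, int *ret)
{
    if (rsc_const >= SKETCH_RSC_COUNT || sketch_exec_table[rsc_const] == NULL)
        return -1;
    *ret = sketch_exec_table[rsc_const](request);
    return 0;
}
```

Since the constant arrives from the network, validating it before indexing the table seems a reasonable precaution in any real dispatcher.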

IDL and stub compiler

The RSC library is a particular implementation of the Remote Procedure Call (RPC) model, specialized for a particular set of functions: the system calls. There are many differences between RSC and RPC. One of them concerns the ability given to users to make their own functions remotely callable, thanks to two tools provided by the RPC model: the Interface Description Language (IDL) and the stub compiler. The first describes the function to call remotely (its declaration, the data types of its arguments, their roles inside the function, and so on); the second takes the interface description as input and generates various pieces of code, such as the stubs. In this way users can make their own functions remote in a very simple way. RSC is different: it does not provide these tools to the user, who can execute remotely only the system calls supported by the library and must work on the library code to support other calls.

This is not entirely true: RSC does provide a small IDL and stub compiler, but they were developed for the developers of the library and not for its users; they made it possible to generate automatically the repetitive and tedious parts of the code.

All the code of the stub compiler is contained in the librsc_templates/ directory: the program/ subdirectory contains the program code, and the input/ subdirectory contains the input files used to generate the RSC library.

Usage

The stub compiler is rsc_file_gen.rb. It is a Ruby script, so it can be invoked as:

[prompt]$ ./rsc_file_gen.rb
or
[prompt]$ ruby rsc_file_gen.rb

The stub compiler takes four arguments:

  1. the file containing the Interface Description of the system calls (librsc_templates/input/syscalls_rsc.list).
  2. the directory containing the four unistd.h header files, one for each supported architecture (librsc_templates/input/unistd_files/).
  3. the template directory, containing the template files (librsc_templates/input/templates/).
  4. the output directory, used as the base directory in which the generated code is saved (librsc/).

For convenience, a small makefile, librsc_templates/Makefile, has been written, so it is enough to run it to call the stub compiler with the right arguments. The Ruby script generates one output file for each input template; the template specifies where the output file must be saved relative to the base output directory. For each template, the compiler checks whether the output file already exists; if so, it saves the new copy only if it differs from the old one, otherwise it does nothing. When the compiler saves a new copy of an output file, the old copy is kept in the same directory but renamed to:

.<original name>_<seconds since epoch>_<day of month>-<month name>-<year>_<hour>-<minutes>

Note: the backup is a hidden file (its name starts with a dot).
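The naming scheme can be reproduced with strftime(). The compiler itself is a Ruby script, so the following C sketch only illustrates the format; backup_name() is a hypothetical helper, and the choice of the full month name and of zero-padded fields is an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Builds ".<name>_<epoch>_<day>-<month name>-<year>_<hour>-<minutes>" into out.
 * 'when' is the epoch value and 'tm' its broken-down form.
 * Returns 0 on success, -1 on an empty name or a buffer that is too small. */
int backup_name(char *out, size_t outsz, const char *name,
                time_t when, const struct tm *tm)
{
    char stamp[64];

    if (name == NULL || strlen(name) == 0)
        return -1;
    /* Assumption: full month name (%B) and two-digit day, hour and minutes. */
    if (strftime(stamp, sizeof stamp, "%d-%B-%Y_%H-%M", tm) == 0)
        return -1;

    int n = snprintf(out, outsz, ".%s_%lld_%s", name, (long long)when, stamp);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

The seconds since the epoch keep two backups made within the same minute from colliding, while the readable date makes the files easy to recognize in a directory listing.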

Interface Description Language (IDL)

The Interface Description Language (IDL) defines the information that the stub compiler and the templates need about the system calls for which code is generated. Some of the system calls defined in the syscalls_rsc.list file are shown below:

1. __RSC_access | const char *pathname{R}, int mode | unistd.h, sys/syscall.h
2. __RSC_chdir | const char *path{R} | unistd.h, sys/syscall.h
3. __RSC_chmod | const char *path{R}, mode_t mode | sys/types.h, sys/stat.h, sys/syscall.h
4. __RSC_getdents64 | unsigned int fd, struct dirent64 *dirp{W}[count], unsigned int count | unistd.h,  
   linux/types.h, linux/unistd.h, errno.h, sys/syscall.h, dirent.h
5. __RSC_getxattr | const char *path{R}, const char *name{R}, void *value{W}[size], size_t size | sys/types.h, 
   sys/syscall.h
6. __RSC_mount | const char *source{R}, const char *target{R}, const char *filesystemtype{R}, 
   unsigned long int mountflags, const void *data{R}=act_as_a_string= | sys/mount.h, sys/syscall.h
7. __RSC_read | int fd, void *buf{W}[count]<retval>, size_t count | unistd.h, sys/syscall.h
8. __RSC_accept | int sockfd, struct sockaddr *addr{W}[addrlen], socklen_t *addrlen{RW} | sys/types.h, 
   sys/socket.h, sys/syscall.h, linux/net.h
9. __RSC_bind | int sockfd, const struct sockaddr *my_addr{R}[addrlen], socklen_t addrlen | sys/types.h, 
   sys/socket.h, sys/syscall.h, linux/net.h

The file must contain one row per system call; in this listing some long rows have been split for layout reasons. The numbers at the beginning of the rows are not part of the file: they show the original row numbers.

The format of the file is the following:

  • one row for each system call.
  • blank rows are discarded by the compiler.
  • comment rows start with a # at the beginning of the line.

Each system call row is divided into three groups separated by the pipe ("|") character. The meaning of the groups is:

  1. The __RSC_* constant representing the system call.
  2. The system call arguments declaration plus some special tags used to describe them.
  3. The header files, separated by commas, that define the data types of the system call arguments.

The IDL defines some special tags that provide additional information about the pointer arguments of the system call; these tags must be specified after the variable name, in the following order:

  • {R} or {W} or {RW} (MANDATORY). The pointer can be a read ({R}), write ({W}) or read/write ({RW}) pointer. An example is shown in the first line of the list, for the pathname argument.
  • After the first tag, there are two groups of different tags:
      • [pointed memory size variable] (MANDATORY for some types of pointers). For some pointer types the size of the pointed memory cannot be deduced from the type, either because the type does not carry this information (as with void *) or because the actual size is bigger or smaller than the one specified by the type (for example, the size of the accept argument struct sockaddr *addr is given by the third argument, addrlen). An example is given at line 8:
        struct sockaddr *addr{W}[addrlen]
        

        the tags specify that addr is a write pointer and that its size is not given by its type but by the value of the argument addrlen.

        For some other types, such as strings, this tag is unnecessary because the length of a string can be obtained automatically with the strlen() function.

      • <retval> (OPTIONAL). This tag tells the stub compiler that the size of the pointed memory after the system call execution is given by its return value. An example is given at line 7:
        void *buf{W}[count]<retval>
        

        the buf argument of the read system call is a write pointer, its size is given by the count variable and its size after the system call execution is given by the return value of the call. This tag is used for optimization: for example, it makes it possible to send back only the retval bytes actually read instead of all count bytes, when the first value is smaller than the second.

      • =act_as_a_string= (OPTIONAL). It is used only for the data pointer of the mount system call (line 6); this argument is a void pointer but it acts as a string, so this tag tells the stub compiler to handle the pointer as a char * instead of a void *.
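The tag order described above can be checked with a small parser. The real parser is the Ruby class in file_parser.rb; the C sketch below is only an illustration, and idl_parse_tags(), struct idl_tags and the 32-character limit on the size variable name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum idl_dir { IDL_NONE, IDL_READ, IDL_WRITE, IDL_READ_WRITE };

struct idl_tags {
    enum idl_dir dir;       /* {R}, {W} or {RW}; IDL_NONE for untagged arguments */
    char size_var[32];      /* name inside [...], empty string if absent */
    bool size_from_retval;  /* true when <retval> is present */
    bool act_as_string;     /* true when =act_as_a_string= is present */
};

/* Advances *p past 'lit' if the text at *p starts with it. */
static bool skip_prefix(const char **p, const char *lit)
{
    size_t len = strlen(lit);
    if (strncmp(*p, lit, len) != 0)
        return false;
    *p += len;
    return true;
}

/* Parses the tags that follow the variable name of one argument declaration.
 * Returns 0 on success, -1 if the tags are malformed or out of order. */
int idl_parse_tags(const char *arg, struct idl_tags *out)
{
    memset(out, 0, sizeof *out);

    const char *p = strchr(arg, '{');
    if (p == NULL)
        return 0;                       /* plain value argument, no tags */

    if (skip_prefix(&p, "{RW}"))     out->dir = IDL_READ_WRITE;
    else if (skip_prefix(&p, "{R}")) out->dir = IDL_READ;
    else if (skip_prefix(&p, "{W}")) out->dir = IDL_WRITE;
    else return -1;

    if (*p == '[') {                    /* optional size variable */
        const char *end = strchr(p, ']');
        if (end == NULL)
            return -1;
        size_t len = (size_t)(end - p - 1);
        if (len == 0 || len >= sizeof out->size_var)
            return -1;
        memcpy(out->size_var, p + 1, len);
        out->size_var[len] = '\0';
        p = end + 1;
    }
    if (skip_prefix(&p, "<retval>"))
        out->size_from_retval = true;
    if (skip_prefix(&p, "=act_as_a_string="))
        out->act_as_string = true;

    return *p == '\0' ? 0 : -1;         /* anything left over is an error */
}
```

For the row of read, idl_parse_tags("void *buf{W}[count]<retval>", &t) yields a write pointer whose size variable is count and whose final size comes from the return value.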

The templates

The templates are stored in the librsc_templates/input/templates/ directory. The stub compiler searches this directory for files with the extension ".c" or ".h" and uses them as templates, so a new template must have one of these extensions.

A template is C code mixed with Ruby code: the C code is copied as is into the output file, while the Ruby code is interpreted by the compiler. The concept is very similar to a PHP page, with C in place of HTML and Ruby in place of PHP: just as the HTML of a PHP page is output unchanged and the PHP code is interpreted, the C code of a template is output unchanged and the Ruby code is interpreted.

The Ruby standard library provides a module called ERB, which implements a generic templating system for embedding Ruby code in text; ERB is used to process the templates. ERB recognizes the following tags:

  • <% Ruby code %> inserts Ruby code inside the template.
  • <%= Ruby expression %> evaluates the Ruby expression and writes its value to the output file.
  • <%# comment %> is a comment; its content is not evaluated by ERB.

All the code outside these tags is outputted as is by the parser.

A well documented template is the client.c file which generates the librsc/src/rsc_client.c file.

A generic structure of a template is the following:

<% @@librsc_relative_path = "/src/" %>                      +
<% @@filename = "rsc_client.c" %>                           | 1
<% @@overwrite_existing_copy = true %>                      +

<% nr_all.each_umview do |syscall| %>                       +
--- C code ---                                              | 2
<% end %>                                                   +

The first group of lines defines some global variables that tell the stub compiler whether and where to generate the output file. These variables are:

  • @@librsc_relative_path (MANDATORY): the relative path where the file is saved. The path is relative to the output directory given as fourth argument to the compiler.
  • @@filename (MANDATORY): the name of the output file. The name of the template can differ from that of the output file.
  • @@overwrite_existing_copy (OPTIONAL): if it is true, the stub compiler does not generate the output file when it finds an existing copy, even if that copy differs from the new one. The default value of the option is false, so normally the output file overwrites the existing copy.

After the initialization lines comes the second group of Ruby code. The stub compiler provides each template with an object of class Syscall::ConstantList called nr_all; this class implements an array of system call objects of class Syscall::Constant, each of which describes a single system call. To create the array, the stub compiler parses the four unistd.h files and creates an entry for each system call found, removes the duplicate entries and then merges this information with that provided by the ID file; the result is a list of system calls, some of which are used by UMView (those present in the ID file) and some not. The each_umview method is an iterator over the system calls used by UMView, discarding the others; the syscall variable inside the do-end block is the current element yielded by the iterator. Inside the block it is possible to write C code and use the methods and attributes of syscall to generate the code specific to that system call.

The stub compiler

The stub compiler is the Ruby script written to generate some of the RSC library files automatically. Its name is rsc_file_gen.rb and it is contained in the librsc_templates/program directory. This directory has the following structure:

  • Rakefile.rb: it's a Makefile written in Ruby.
  • rsc_file_gen.rb: it's the stub compiler.
  • /src: this directory contains the three modules developed for the compiler: c.rb, file_parser.rb and syscall_constant.rb.
  • /tests: it contains the c_test.rb file, which is a test file for the c.rb module.

The Rakefile.rb is a Makefile written in Ruby; the supported tasks are (to list them, type rake -T):

rake clean    # Remove any temporary products.
rake clobber  # Remove any generated file.
rake doc      # Create the library's documentation
rake test     # Execute the library tests

The default task is doc, so running rake without arguments builds the modules' documentation from the comments inside the code, like javadoc. The documentation is generated in the doc/ directory. To remove the temporary products or run the modules' tests type, respectively, rake clean or rake test.

The three modules developed for the stub compiler are:

  • c.rb: defines two classes, C::Type and C::Argument. The first defines a generic C type, the second a generic system call argument.
  • file_parser.rb: defines a single class called Parser, which parses the unistd.h and ID files.
  • syscall_constant.rb: defines three classes, Syscall::Arch, Syscall::Constant and Syscall::ConstantList. The first is a very simple class representing a computer architecture; the second represents a system call and contains all the information needed by the templates; the last represents a list of Syscall::Constant objects.

The structure of the stub compiler is very simple, it:

  • parses the input arguments
  • creates three Syscall::ConstantList objects by parsing the three unistd.h files. These lists are nr_x86, nr_x86_64 and nr_ppc. The code to parse the 64-bit PowerPC unistd.h still has to be added.
  • merges the three lists of the previous point into nr_all
  • sets a back reference to nr_all in the three lists and sets their architecture
  • parses the ID file and saves the generated list into umview_rscs
  • flags the system calls used by UMView inside the nr_all list
  • iterates over the template directory and for each template:
    • reads the template file, creates the ERB parser and parses the template. The result of this parsing is saved into the result variable.
    • checks whether an old copy of the output file exists: in both cases it saves the parsing result but, if an old copy exists, it first renames the existing file as a backup.

The RSC Module and Server

This section presents the other two components of the project: the module and the server. They are described in a single section because they are not complex enough to need two independent sections.

The purpose of the two components is very similar: both use the RSC library functions to provide the remote execution of the system calls. The module interfaces these functions with UMView; the server manages the different clients and the I/O operations for the library.

Before describing these two applications, the next section covers a common topic: the establishment of a connection and the initial handshake.

Connection establishment and handshaking

Module and server communicate through two TCP connections established by the module, one connection for the event subscription (port 8051) and one for the remaining traffic (port 8050). TCP was chosen to guarantee reliable delivery of the data: the RSC library is independent of the particular protocol used, so it does not provide any service for packet retransmission, reordering or duplicate detection.

After the connections are set up, the module starts a very small handshake to send its architecture to the server and receive the server's. The message exchanged is defined in the file handshake.h, present in both the module and server directories; the structure of this message is:

struct handshake {
  enum arch arch;
};

It contains only one field representing the architecture, which holds one of the four LibAConv constants. The module sends the handshake message with its architecture, then waits for the handshake sent by the server. When the handshake is complete, the module can initialize the RSC library.
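The exchange can be sketched as follows. The source does not specify the wire encoding of the message, so this sketch assumes that the architecture travels as a 32-bit integer in network byte order; the enum values are placeholders for the LibAConv constants, and handshake_send(), handshake_recv() and module_handshake() are hypothetical names.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Placeholder values: the real constants are defined by LibAConv. */
enum arch { ARCH_UNKNOWN = 0, ARCH_X86, ARCH_X86_64, ARCH_PPC, ARCH_PPC_64 };

/* Writes exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Reads exactly len bytes; fails if the peer closes the connection early. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends one handshake message (assumed encoding: 32 bits, network order). */
int handshake_send(int fd, enum arch a)
{
    uint32_t wire = htonl((uint32_t)a);
    return write_all(fd, &wire, sizeof wire);
}

/* Receives one handshake message and decodes the architecture. */
int handshake_recv(int fd, enum arch *a)
{
    uint32_t wire;
    if (read_all(fd, &wire, sizeof wire) != 0)
        return -1;
    *a = (enum arch)ntohl(wire);
    return 0;
}

/* Module side: send the local architecture, then wait for the server's. */
int module_handshake(int fd, enum arch mine, enum arch *server)
{
    if (handshake_send(fd, mine) != 0)
        return -1;
    return handshake_recv(fd, server);
}
```

Sending a fixed-width value in network order keeps the handshake readable by both sides before either knows the other's architecture, which is the whole point of the exchange.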

The RSC module

The UMView module developed for this project resides in the module/ directory. The module file is um_rsc.c; the other files are:

  • handshake.h: defines the handshake data structure.
  • parse_arg.c: provides an argument parser for the module.
  • utils.c and utils.h: define and implement the I/O functions. These functions are the same ones used by the library and described in the "Client side communication functions" section. They are redefined here because the RSC library does not export them to its users: they are for internal use.

This section describes the implementation of the module interface, to show how the module connects UMView with the RSC library. The main function is _um_mod_init(), which is called by UMView when the module is loaded and is responsible for initializing the module and filling the struct service. The tasks performed by this function are:

  • parsing of the module options. The options are parsed by a small parser written specifically for the module. UMView passes them to the module as a single string from which the information must be extracted.
  • connection with the server and RSC library initialization. This is done by init_client(): it establishes the two connections with the server, obtains the host architecture, performs the handshake and initializes the RSC library. The last operation is done by invoking the rscc_init() function.
  • initialization of the struct service. At this point the module is connected to the server and the RSC library is initialized, so the struct service must be filled; this structure is the interface between the module and UMView and lets UMView access the services provided by the module. The initialization fills the following fields:
    • name and code with the values "Remote System Call" and 0xF9; the 0xFX range of codes is for experimental modules.
    • checkfun with the function rsc_checkfun().
    • event_subscribe with the function rsc_event_subscribe().
    • syscall and socket with the stubs provided by the RSC library. The memory for the two tables must be allocated and the tables filled with the library functions.
    At the end of these operations, the function add_service() is called and the service structure is added to the UMView list.

Although the functions rsc_checkfun() and rsc_event_subscribe() have the rsc_ prefix, they are not library functions: they are implemented inside the module. Their code is the following (for clarity the debug and error management code has been removed):

static long rsc_event_subscribe(void (* cb)(), void *arg, int fd, int how) {
  return rscc_es_send_req(reg_cbs, event_sub_fd, fd, how, cb, arg);
}

static epoch_t rsc_checkfun(int type, void *arg) {
  if( type == CHECKSOCKET) {
    return 1;
  } else if(type == CHECKPATH) {
    char *path = arg;
    return (strncmp(path, "/lib", 4) != 0 &&
        strncmp(path, "/bin", 4) != 0);
  } else if(type == CHECKIOCTLPARMS) {
    return rscc_check_ioctl_request(((struct ioctl_len_req *)arg)->req);
  } else {
    return 0;
  }
}

The rsc_event_subscribe() function manages the event subscription and its only task is to call the RSC library function rscc_es_send_req(); the value returned by the latter has the same format as the one required by UMView. The rsc_checkfun() function implements the choice function used by UMView to decide which module will manage a given system call. The module manages all the network calls: if type is CHECKSOCKET the function returns 1. If type is CHECKPATH, the function accepts every system call working on a path that does not start with /lib or /bin. Finally, if type is CHECKIOCTLPARMS, the library function rscc_check_ioctl_request() is called, which asks the server whether it supports the specific ioctl request.

It has been decided not to manage the system calls that work with paths starting with /lib or /bin, so as not to virtualize the programs located in the latter directory; the /lib directory is also excluded because it contains the dynamically loaded libraries used by these programs. This choice is not due to a limit of the RSC library but to the wish to run the local version of a program instead of the remote one: if the module managed the /bin path, the program executed would be the one in the remote bin directory. That would not be a problem, because UMView can copy remote programs to the local host and execute them locally, so that remote programs too run locally with their system calls remotized; this happens, for example, with the programs in /usr/bin. This choice was made only to show how the module and UMView can manage the execution of both local and remote applications.
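The CHECKPATH rule can be isolated in a small predicate to see which paths end up on the remote side. The comparison is copied from rsc_checkfun(); the function name path_is_remote() is hypothetical.

```c
#include <string.h>

/* Returns 1 if the module would manage the path (remote execution),
 * 0 if the path is left to the local system. Only the first four
 * characters are compared, exactly as in rsc_checkfun(). */
int path_is_remote(const char *path)
{
    return strncmp(path, "/lib", 4) != 0 &&
           strncmp(path, "/bin", 4) != 0;
}
```

Because only four characters are compared, the rule is a prefix match: /lib64 and any other name beginning with /lib or /bin also stay local, while /usr/bin and /usr/lib are handled remotely.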

The server

The tasks of the server are to manage the different clients, to pass the received data to the RSC library functions and to send back the responses. Its development did not require any particularly important or original choices: it is a classic server with a main loop and a poll() call to manage the file descriptors in a non-blocking way.

The server's files are located in the server/ directory, which contains:

  • rsc_server.c: contains the server code.
  • gdebug.c: contains some debug functions used in the code.
  • handshake.h: defines the handshake data structure.
  • pollfd_info.c and pollfd_info.h: define the data structure used by the server to manage the different clients and their states. The files define also some functions to manage this structure.

At startup, the server parses its options, performs some initialization and then enters the main loop. In the initialization step, the server:

  • creates and configures the two listening sockets.
  • gets its architecture.
  • initializes the RSC library and its event subscription service.
  • registers the ioctl requests that it wants to support. The server supports about seventy requests.
  • initializes the data structure used to store the client information.

After this initialization step the server is ready to enter the main loop; as said before, its structure is the classic one of a server built on the select()/poll() system calls. The server calls poll() and waits to be woken up, then checks whether there are new connections or data to read or write. The data read are passed to the RSC functions, which return the data to be sent back; the server stores these data and sends them when the descriptors are ready.

The server uses non-blocking read()/write(), so incoming and outgoing data must be buffered until they are fully received or sent. The RSC library functions require complete input messages, so the server must read a message completely before passing it to them. The first field of the message header is a 32-bit integer holding the total size of the message, so the server needs to read only four bytes to know how long the message is. Once the message has been read completely, it is passed to rscs_manage_request() or rscs_es_manage_msg(), depending on the connection it arrived on. These two functions return the response messages, which are buffered by the server and sent when the descriptors are ready.
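The incremental reading of a length-prefixed message can be sketched as a small state holder that performs one read() per poll() wakeup. This is not the server code: struct msg_reader and reader_step() are hypothetical, the size field is assumed to be in network byte order and to include the header itself, and MAX_MSG_BYTES is an arbitrary safety cap.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIZE_FIELD_BYTES 4u
#define MAX_MSG_BYTES    (1u << 20)   /* arbitrary cap, not taken from the source */

struct msg_reader {
    unsigned char  hdr[SIZE_FIELD_BYTES]; /* size field, filled incrementally */
    unsigned char *data;                  /* whole message, allocated once the size is known */
    uint32_t       n;                     /* bytes received so far */
    uint32_t       tot;                   /* total message size, 0 while unknown */
};

/* Performs a single read() on fd.
 * Returns 1 when the message is complete, 0 when more data is needed,
 * -1 on error, on a closed connection or on an invalid size. */
int reader_step(struct msg_reader *r, int fd)
{
    unsigned char *dst;
    size_t want;

    if (r->n < SIZE_FIELD_BYTES) {        /* still reading the size field */
        dst  = r->hdr + r->n;
        want = SIZE_FIELD_BYTES - r->n;
    } else {                              /* reading the rest of the message */
        dst  = r->data + r->n;
        want = r->tot - r->n;
    }

    ssize_t got = read(fd, dst, want);
    if (got < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) ? 0 : -1;
    if (got == 0)
        return -1;                        /* peer closed the connection */
    r->n += (uint32_t)got;

    if (r->tot == 0 && r->n == SIZE_FIELD_BYTES) {
        uint32_t wire;
        memcpy(&wire, r->hdr, sizeof wire);
        r->tot = ntohl(wire);
        if (r->tot < SIZE_FIELD_BYTES || r->tot > MAX_MSG_BYTES)
            return -1;
        r->data = malloc(r->tot);
        if (r->data == NULL)
            return -1;
        memcpy(r->data, r->hdr, SIZE_FIELD_BYTES);  /* keep the header in the message */
    }
    return (r->tot != 0 && r->n == r->tot) ? 1 : 0;
}
```

Doing one read() per call matches a level-triggered poll() loop: if bytes are still pending, the descriptor is reported readable again on the next iteration, so no inner loop is needed.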

For the event subscription, if the immediate test done by the library fails, the server must monitor the file descriptor for the given event: this operation is not done by the RSC library but is left to the server. To learn the test result, the server must inspect the response inside the acknowledgment message returned by rscs_es_manage_msg(): if the response is negative the server, besides sending back the ack, must add the given file descriptor to its poll() set and monitor it for the given event; when the descriptor becomes ready, the server calls the rscs_es_event_occurred() function and sends back the response it returns.

The data structure used to store the clients' information is struct pollfd_info, defined in the header pollfd_info.h:

struct pollfd_info {
  struct pollfd *pollfd;
  /* The number of used entries into 'pollfd' and 'clients' */
  int nfds;
  /* The size of 'pollfd' and 'clients'*/
  int size;
  /* The i-th file descriptor in 'pollfd' belongs to the i-th client 
   * in 'clients' */
  struct client **clients;
};

The field pollfd contains the array given as input to the poll() function; clients is an array of pointers to the data structure storing the client data, which is defined as:

struct client {
  /* The file descriptor associated with the client */
  int fd;
  /* The client architecture */
  enum arch arch;
  /* The client type and state */
  enum client_type type;
  enum client_state state;
  struct buffer *rbuf; /* reading buffer */
  struct buffer *wbuf; /* writing buffer */
  int esfd_index;
};

The client architecture is one of the LibAConv constants; the type can assume one of the following values:

enum client_type {
  REQ_RESP = 1,
  EVENT_SUB,
  SUBSCRIBED_FD
};

There are three types of client entry: the ioctl request or system call remote execution client (REQ_RESP), the event subscription client (EVENT_SUB) and the monitored file descriptor (SUBSCRIBED_FD). The state field describes the connection state of the client and can assume the following values:

enum client_state {
  WAITING_ARCH = 1,
  SENDING_ARCH,
  CONN_READING_HDR,
  CONN_READING_BODY,
  CONN_SENDING_RESP
};

The first two are used during the handshake and the last three when reading and sending data. The read()/write() operations are non-blocking, so it may be necessary to invoke them several times to send or receive a whole message; for this reason the state of the file descriptor must be tracked. The fields rbuf and wbuf are the read and write buffers:

struct buffer {
  struct msg *first;
  struct msg *last;
};
struct msg {
  void *data;    /* the pointer to data */
  unsigned int n; /* number of byte of data into the buffer */
  unsigned int tot; /* total number of bytes to read/write */
  struct msg *next;
};

They are implemented as linked lists; each element is a struct msg, which holds the data together with the total size of the data and the amount already processed.

The pollfd_info module provides functions to create the client and pollfd_info data structures, to add and remove entries from the pollfd_info and to enqueue and dequeue data from the buffers.
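A minimal sketch of the enqueue and dequeue operations on the two structures shown above follows. The structure definitions are the ones from the text; the function names buff_enq() and buff_deq() and their exact signatures are assumptions, not the module's actual interface.

```c
#include <stdlib.h>

struct msg {
    void *data;        /* the pointer to data */
    unsigned int n;    /* number of bytes already read/written */
    unsigned int tot;  /* total number of bytes to read/write */
    struct msg *next;
};

struct buffer {
    struct msg *first;
    struct msg *last;
};

/* Appends a message of 'tot' bytes at the tail of the queue.
 * Returns the new element, or NULL if the allocation fails. */
struct msg *buff_enq(struct buffer *b, void *data, unsigned int tot)
{
    struct msg *m = calloc(1, sizeof *m);
    if (m == NULL)
        return NULL;
    m->data = data;
    m->tot  = tot;
    if (b->last != NULL)
        b->last->next = m;
    else
        b->first = m;
    b->last = m;
    return m;
}

/* Removes the element at the head of the queue and returns its data
 * pointer (now owned by the caller), or NULL if the queue is empty. */
void *buff_deq(struct buffer *b)
{
    struct msg *m = b->first;
    if (m == NULL)
        return NULL;
    void *data = m->data;
    b->first = m->next;
    if (b->first == NULL)
        b->last = NULL;
    free(m);
    return data;
}
```

Keeping both first and last makes the two operations constant time: responses are appended at the tail as the library produces them and sent from the head when poll() reports the descriptor writable.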
