With purelibc, a process can trace the system calls that it generates itself.
Pure_libc converts glibc from a libc+system interfacing library into a libc-only library.
Pure_libc is not complete yet. Stdio is implemented on top of the fopencookie call. Due to current limitations of fopencookie, freopen may not work properly when reopening files other than std{in,out,err}.
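To illustrate how fopencookie lets stdio traffic be routed to user code, here is a minimal sketch. It is not taken from purelibc's sources: the struct byte_counter type and the functions counter_write and open_counting_stream are hypothetical names introduced only for this example. The stream discards its data and merely counts the bytes that stdio hands to the write hook.

```c
#define _GNU_SOURCE          /* fopencookie is a GNU extension */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical cookie: counts the bytes written through the stream. */
struct byte_counter {
    size_t bytes;
};

/* Write hook: stdio calls this function instead of issuing write(2). */
static ssize_t counter_write(void *cookie, const char *buf, size_t size)
{
    struct byte_counter *bc = cookie;

    (void)buf;               /* the data is discarded in this sketch */
    bc->bytes += size;
    return (ssize_t)size;    /* report that everything was consumed */
}

/* Open a write-only stream whose output is routed to counter_write. */
static FILE *open_counting_stream(struct byte_counter *bc)
{
    cookie_io_functions_t io = {
        .read  = NULL,
        .write = counter_write,
        .seek  = NULL,
        .close = NULL,
    };

    bc->bytes = 0;
    return fopencookie(bc, "w", io);
}
```

A caller would use the returned FILE * with the usual stdio functions (fputs, fprintf, fflush, fclose); the hook sees the data when the stream buffer is flushed.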
Usage
This function:
sfun _pure_start(sfun pure_syscall,sfun pure_socketcall,int flags);
starts the syscall tracing. All the system calls of the program are converted into calls to the pure_syscall function. pure_socketcall is meaningful only on architectures where all the Berkeley socket calls reach the kernel through one shared system call (__NR_socketcall). If pure_socketcall is not NULL, purelibc calls it for each Berkeley socket call. If pure_socketcall is NULL and __NR_socketcall is defined, purelibc calls
pure_syscall(__NR_socketcall,socketcall_id,argv)
(purelibc mimics the same call received by the kernel).
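As a rough illustration of what a handler receives in the multiplexed case, the sketch below decodes a socketcall id and its argument block. It assumes the Linux socketcall(2) convention, where the ids come from <linux/net.h> and argv points to an array of unsigned long holding the real arguments. The helper names socketcall_name and socketcall_domain are hypothetical and are not part of purelibc.

```c
#include <linux/net.h>   /* SYS_SOCKET, SYS_BIND, ...: socketcall ids */
#include <stddef.h>

/* Map a socketcall id to a readable name (only a few ids are covered). */
static const char *socketcall_name(long int id)
{
    switch (id) {
    case SYS_SOCKET:  return "socket";
    case SYS_BIND:    return "bind";
    case SYS_CONNECT: return "connect";
    case SYS_LISTEN:  return "listen";
    case SYS_ACCEPT:  return "accept";
    default:          return "other";
    }
}

/* For SYS_SOCKET the argument block is {domain, type, protocol}.
 * Returns the domain, or -1 for any other socketcall id. */
static int socketcall_domain(long int id, const unsigned long *argv)
{
    return id == SYS_SOCKET ? (int)argv[0] : -1;
}
```

A pure_syscall handler could call these helpers when it sees sysno == __NR_socketcall, taking the id and the argument block from its first two arguments.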
FLAGS:
PUREFLAG_STDIN, PUREFLAG_STDOUT, PUREFLAG_STDERR: libc opens the standard streams before purelibc starts, so without these flags stdio calls on the standard streams (e.g. getchar, printf) are not traced. These flags force _pure_start to reopen the stdio standard streams so that calls on them are traced. PUREFLAG_STDALL is a shortcut for (PUREFLAG_STDIN|PUREFLAG_STDOUT|PUREFLAG_STDERR).
RETURN_VALUE:
_pure_start returns a pointer to the original libc syscall function. This pointer must be stored in a global variable and must be used whenever the program needs to bypass purelibc and send a system call directly to the kernel.
WARNING:
The libc 'syscall(2)' function itself gets diverted to the pure_syscall function, too, so it cannot be used to bypass purelibc.
The following test program prints the number of each system call before actually calling it. It is a 'cat'-like copy from stdin to stdout; at EOF it prints "hello world":
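The forwarding pattern implied by the return value can be sketched in isolation as follows. This is only an approximation that builds without purelibc: sfun_t is a local redeclaration of what the handler signature suggests sfun looks like, and native_syscall is initialized to libc's syscall as a stand-in, whereas under purelibc it would hold the pointer returned by _pure_start. The names counting_sc and traced_calls are hypothetical.

```c
#define _GNU_SOURCE
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed shape of purelibc's sfun, redeclared so that the sketch
 * builds without purelibc.h. */
typedef long int (*sfun_t)(long int sysno, ...);

/* Stand-in: under purelibc this global would store the value
 * returned by _pure_start instead of libc's syscall. */
static sfun_t native_syscall = syscall;

/* Number of system calls that went through the handler. */
static unsigned long traced_calls;

/* Handler: count the call, then forward it unchanged. */
static long int counting_sc(long int sysno, ...)
{
    va_list ap;
    long int a[6];
    int i;

    va_start(ap, sysno);
    for (i = 0; i < 6; i++)          /* always read six long arguments */
        a[i] = va_arg(ap, long int);
    va_end(ap);

    traced_calls++;
    return native_syscall(sysno, a[0], a[1], a[2], a[3], a[4], a[5]);
}
```

Because the handler always reads six arguments, a direct caller should pass six long values, for example counting_sc(SYS_getpid, 0L, 0L, 0L, 0L, 0L, 0L).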
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdlib.h>
#include <purelibc.h>

/* original libc syscall function, as returned by _pure_start */
static sfun _native_syscall;
static char buf[128];

static long int mysc(long int sysno, ...){
    va_list ap;
    long int a1,a2,a3,a4,a5,a6;
    va_start (ap, sysno);
    /* print the syscall number on stderr, bypassing purelibc */
    snprintf(buf,128,"SC=%ld\n",sysno);
    _native_syscall(__NR_write,2,buf,strlen(buf));
    a1=va_arg(ap,long int);
    a2=va_arg(ap,long int);
    a3=va_arg(ap,long int);
    a4=va_arg(ap,long int);
    a5=va_arg(ap,long int);
    a6=va_arg(ap,long int);
    va_end(ap);
    /* forward the call to the kernel unchanged */
    return _native_syscall(sysno,a1,a2,a3,a4,a5,a6);
}

int main(void) {
    int c;
    _native_syscall=_pure_start(mysc,NULL,PUREFLAG_STDALL);
    while ((c=getchar()) != EOF)
        putchar(c);
    printf("hello world\n");
    return 0;
}
To run this example just compile it and link it together with the library in this way:
$ gcc -o puretest puretest.c -lpurelibc
If you installed the purelibc library in /usr/local/lib, you need to add this directory to the library search path (csh syntax shown; in sh or bash use export LD_LIBRARY_PATH=/usr/local/lib):
$ setenv LD_LIBRARY_PATH /usr/local/lib
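Unfortunately, purelibc does not work when it is loaded as a dynamic library with dlopen.
The following example works around the problem: if purelibc is installed but not yet loaded, the program sets LD_PRELOAD and executes itself again. More specifically: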
- It is possible to use purelibc to track the calling process and all the dynamic libraries loaded at run time.
- The code does not depend on purelibc. If you run it on a host without purelibc, it still works, although it cannot track its system calls.
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <purelibc.h>

static sfun _native_syscall;
static char buf[128];

static long int mysc(long int sysno, ...){
    va_list ap;
    long int a1,a2,a3,a4,a5,a6;
    va_start (ap, sysno);
    snprintf(buf,128,"SC=%ld\n",sysno);
    _native_syscall(__NR_write,2,buf,strlen(buf));
    a1=va_arg(ap,long int);
    a2=va_arg(ap,long int);
    a3=va_arg(ap,long int);
    a4=va_arg(ap,long int);
    a5=va_arg(ap,long int);
    a6=va_arg(ap,long int);
    va_end(ap);
    return _native_syscall(sysno,a1,a2,a3,a4,a5,a6);
}

int main(int argc,char *argv[]) {
    int c;
    sfun (*_pure_start_p)();
    void *handle;
    /* does pure_libc exist ? */
    if ((_pure_start_p=dlsym(RTLD_DEFAULT,"_pure_start")) == NULL &&
            (handle=dlopen("libpurelibc.so",RTLD_LAZY))!=NULL) {
        char *path;
        dlclose(handle);
        /* get the executable from /proc */
        asprintf(&path,"/proc/%d/exe",getpid());
        /* preload the pure_libc library */
        setenv("LD_PRELOAD","libpurelibc.so",1);
        printf("pure_libc dynamically loaded, exec again\n");
        /* reload the executable */
        execv(path,argv);
        /* reached only if execv fails */
        free(path);
    }
    if ((_pure_start_p=dlsym(RTLD_DEFAULT,"_pure_start")) != NULL) {
        printf("pure_libc library found: syscall tracing allowed\n");
        _native_syscall=_pure_start_p(mysc,NULL,PUREFLAG_STDALL);
    }
    while ((c=getchar()) != EOF)
        putchar(c);
    printf("hello world\n");
    return 0;
}
To run this example just compile it and link it with the dl library in this way:
$ gcc -o puretest2 puretest2.c -ldl
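The run-time check at the heart of this example, asking the dynamic linker whether a symbol is already available, can be isolated as a small helper. This is a sketch only; symbol_loaded is a hypothetical name, and on older glibc versions the program must be linked with -ldl as in the command above.

```c
#define _GNU_SOURCE          /* RTLD_DEFAULT is a GNU extension */
#include <dlfcn.h>
#include <stddef.h>

/* Return 1 if `name` can be resolved in the objects already loaded
 * into the process (main program, shared and preloaded libraries). */
static int symbol_loaded(const char *name)
{
    return dlsym(RTLD_DEFAULT, name) != NULL;
}
```

With this helper, the test in the example amounts to symbol_loaded("_pure_start"): it is false on the first run when purelibc is not preloaded, and it is expected to become true after the program re-executes itself with LD_PRELOAD set.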
A simple virtualization based on PureLibc
PureLibc can be used to implement virtualization. View-OS uses it to virtualize the system calls generated by the modules and the libraries used by the modules. In this way *mview (the programs that actually implement View-OS) have an efficient implementation of module nesting.
The following minimal example shows a virtualization based on PureLibc. It virtualizes the file /etc/passwd: once the module is loaded, a process that opens /etc/passwd gets /etc/hosts instead. This is the file xchange.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdlib.h>
#include <purelibc.h>
#include <dlfcn.h>

static sfun _native_syscall;
static char hosts[]="/etc/hosts";

static long int mysc(long int sysno, ...){
    va_list ap;
    long int a1,a2,a3,a4,a5,a6;
    va_start (ap, sysno);
    a1=va_arg(ap,long int);
    a2=va_arg(ap,long int);
    a3=va_arg(ap,long int);
    a4=va_arg(ap,long int);
    a5=va_arg(ap,long int);
    a6=va_arg(ap,long int);
    va_end(ap);
    /* only the open system call is checked here */
    if (sysno == __NR_open) {
        char *path=(char *)a1;
        /* replace the path argument when it names /etc/passwd */
        if (a1 && strcmp(path,"/etc/passwd")==0)
            a1=(long int) hosts;
    }
    return _native_syscall(sysno,a1,a2,a3,a4,a5,a6);
}

/* runs automatically when the shared library is loaded */
void __attribute__ ((constructor)) init_test (void) {
    _native_syscall=_pure_start(mysc,NULL,PUREFLAG_STDALL);
}
This source can be compiled in this way:
gcc -shared -fPIC -o xchange.so xchange.c
This virtualizer can be tested by preloading it:
export LD_PRELOAD=libpurelibc.so:/tmp/xchange.so
Replace /tmp with the absolute path of the directory that contains the shared library of the virtualizer.
After LD_PRELOAD is set, the shell works as usual, but when a process tries to open /etc/passwd it gets /etc/hosts instead. This behavior can be tested with commands like cat /etc/passwd or vi /etc/passwd. Note that cat < /etc/passwd prints the real file, because it is the shell that opens the file. In a subshell the virtualization also applies to files opened by redirection.
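The path substitution performed by the virtualizer can also be written as a standalone helper, which makes it easy to cover more than one system call. The sketch below is an assumption-laden variant, not the View-OS code: redirect_path and rewrite_open_args are hypothetical names, and the __NR_openat branch assumes a libc that implements open() through the openat system call (as recent glibc versions do), in which case the path is the second argument rather than the first.

```c
#include <string.h>
#include <sys/syscall.h>

static char replacement[] = "/etc/hosts";

/* Return the path that should be opened in place of `path`. */
static const char *redirect_path(const char *path)
{
    if (path != NULL && strcmp(path, "/etc/passwd") == 0)
        return replacement;
    return path;
}

/* Rewrite the path argument of open-like system calls in place.
 * a1 and a2 point to the first two system call arguments. */
static void rewrite_open_args(long int sysno, long int *a1, long int *a2)
{
#ifdef __NR_open             /* not defined on every architecture */
    if (sysno == __NR_open)
        *a1 = (long int)redirect_path((const char *)*a1);
#endif
#ifdef __NR_openat           /* openat: the path is the second argument */
    if (sysno == __NR_openat)
        *a2 = (long int)redirect_path((const char *)*a2);
#endif
}
```

A handler such as mysc could call rewrite_open_args(sysno, &a1, &a2) just before forwarding the call to the native syscall function.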
FAQ on purelibc
- Q: What are the applications of purelibc?
- A: Several applications need to track or virtualize the system calls generated by a process. Purelibc is different from ptrace(2) because the tracking process is the process itself. In many cases this makes virtualization very effective and efficient, as there are no context switches between processes. Virtual Square designed purelibc to support nesting of virtualization modules. With purelibc it is possible to take a library designed to interact with the system and wrap it so that it can be used for other purposes.
- Q: Why is purelibc a library based on glibc instead of a different (patched) version of glibc?
- A: purelibc intercepts calls to glibc functions and provides alternative implementations based on glibc. Keeping this code updated takes only a minor effort. A patch for glibc would have required a huge amount of work to keep it up to date with mainstream glibc development.
- Q: Is purelibc computationally expensive? In other words, does purelibc slow down the execution of programs?
- A: The cost paid for virtualization is just one extra function call per system call, with no context switch. The slowdown of applications is usually negligible.
- Q: Is purelibc transparent to applications and libraries?
- A: It should be: it has been designed to be transparent. Unfortunately it is not complete, and some of the features used to implement purelibc have limitations. This is the case for freopen. The fopencookie call used to virtualize stdio does not provide a way to redefine open files. The current implementation of freopen is able to redefine the std{in,out,err} files (which is the most common use of freopen), but when applied to other files purelibc's freopen creates a different file.
- Q: Could glibc provide a virtualization feature like purelibc?
- A: It could, but currently it does not. If glibc eventually provides a way to virtualize the system calls generated by glibc itself and by other libraries, purelibc will no longer have a reason to exist. Integrating purelibc into glibc is not easy, because all the glibc functions are tightly coupled to the function that sends system calls to the kernel. That function is implemented in assembly and is architecture dependent.