From Virtualsquare
Revision as of 19:25, 27 December 2012 by Renzo (Talk | contribs)



Inter Process Networking (IPN)


IPN is an Inter Process Communication service. It uses the same programming interface and protocols used for networking. Processes using IPN are connected to a "network" (many to many communication). The messages or packets sent by a process on an IPN network can be delivered to many other processes connected to the same IPN network, potentially to all the other processes. Different protocols can be defined on the IPN service. The basic one is the broadcast (level 1) protocol: all the packets get received by all the processes but the sender. It is also possible to define more sophisticated protocols. For example it is possible to have IPN sockets dispatching packets using the Ethernet protocol (like a Virtual Distributed Ethernet - VDE switch), or Internet Protocol (like a layer 3 switch). These are just examples, several other policies can be defined.
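The broadcast rule described above can be illustrated with a toy user-space model. This is only a sketch of the delivery rule, not IPN code: the function name `broadcast_recipients` and the array-of-flags representation are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the broadcast (level 1) policy: mark every node except the
 * sender as a recipient. Returns the number of recipients.
 * deliver[] must have room for n_nodes entries. */
size_t broadcast_recipients(size_t n_nodes, size_t sender, bool deliver[])
{
    size_t count = 0;

    for (size_t i = 0; i < n_nodes; i++) {
        deliver[i] = (i != sender);   /* everyone but the sender */
        if (deliver[i])
            count++;
    }
    return count;
}
```

A more sophisticated policy (for example an Ethernet switch) would replace the test `i != sender` with a lookup on the destination address.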


The Berkeley socket Application Programming Interface (API) was designed for client-server applications and for point-to-point communication. It has no support for broadcast or multicast domains.

IPN updates the interface by introducing a new protocol family (PF_IPN or AF_IPN). PF_IPN is similar to PF_UNIX but for IPN the Socket API calls have a different (extended) behavior.

   #include <sys/socket.h>
   #include <sys/un.h>
   #include <sys/ipn.h>
   int sockfd = socket(AF_IPN, socket_type, protocol);

creates a communication socket. The only socket_type defined is SOCK_RAW; other socket types are reserved for future extensions. A socket cannot be used to send or receive data until it is connected (using the "connect" call). The protocol argument defines the policy used by the socket. Protocol IPN_BROADCAST (1) is the basic policy: a packet is sent to all the recipients except the sender itself. The policy IPN_ANY (0) can be used to connect or bind to a pre-existing IPN network regardless of its policy. (2 will be IPN_VDESWITCH and 3 IPN_VDESWITCHL3.)

The address format is the same as that of PF_UNIX (a.k.a. PF_LOCAL); see the unix(7) manual page.
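Since the address is a struct sockaddr_un, filling it in might look like the following minimal sketch. The helper name `ipn_make_addr` is hypothetical. The address family is passed as a parameter so that the code compiles without the IPN header; with IPN installed it would presumably be AF_IPN.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill a sockaddr_un with a file system path, as done for PF_UNIX.
 * 'family' would be AF_IPN on a system providing the IPN header
 * (assumption: it is taken as a parameter here to stay self-contained).
 * Returns 0 on success, -1 if the path does not fit in sun_path. */
int ipn_make_addr(struct sockaddr_un *addr, sa_family_t family,
                  const char *path)
{
    size_t len = strlen(path);

    if (len >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = family;
    memcpy(addr->sun_path, path, len + 1);  /* include the trailing NUL */
    return 0;
}
```

The resulting structure is what "bind" and "connect" receive, cast to (struct sockaddr *) with sizeof(struct sockaddr_un) as the length.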

   int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

This call creates an IPN network if it does not exist, or joins an existing network (just for management) if it already exists. The policy of the network must be consistent with the protocol argument of the "socket" call. A new network has the policy defined for the socket. "bind" or "connect" operations on existing networks fail if the policy of the socket is neither IPN_ANY nor the same as that of the network. (A network should not be created by an IPN_ANY socket.) An IPN network appears in the file system as a unix socket. The execution permission (x) on this file is required for "bind" to succeed (otherwise -EPERM is returned). Similarly, the read and write permissions (rw) permit the "connect" operation for reading (receiving) or writing (sending) packets respectively. When a socket is bound (but not connected) to an IPN network, the process does not receive or send any data, but it can call "ioctl" or "setsockopt" to configure the network.
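The policy-matching rule for "bind" and "connect" on an existing network can be written as a one-line predicate. This is a sketch of the rule as stated in the text, not the kernel's code: the function name `ipn_policy_allows` is hypothetical, and the constants are redefined locally with the values listed above (they would normally come from the IPN header).

```c
#include <stdbool.h>

/* Protocol numbers as listed in the text; normally provided by the IPN
 * header, redefined here only to keep the sketch self-contained. */
enum {
    IPN_ANY         = 0,
    IPN_BROADCAST   = 1,
    IPN_VDESWITCH   = 2,
    IPN_VDESWITCHL3 = 3
};

/* True if a socket created with 'socket_proto' may bind or connect to an
 * existing network whose policy is 'network_proto'. */
bool ipn_policy_allows(int socket_proto, int network_proto)
{
    return socket_proto == IPN_ANY || socket_proto == network_proto;
}
```

A management tool that must work with any network would therefore create its socket with IPN_ANY, while a process that creates the network should name the policy explicitly.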

  int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

This call connects a socket to an existing IPN network. The socket can be already bound (through the "bind" call) or unbound. Unbound connected sockets receive and send data but they cannot configure the network. The read or write permission on the socket (rw) is required to "connect" the channel and read/write respectively. When "connect" succeeds, and provided the socket has appropriate permissions, the process can send packets and receive all the packets sent by other processes and delivered to it by the network policy. The socket can receive data at any time (like a network interface), so the process must be able to handle incoming data (using select/poll or multithreading). Higher level protocols can also prevent the reception of unexpected messages by design. This is the case for networks used with exactly one sender: all the other processes simply receive the data and the sender never receives any packet. It is also possible to have sockets with different roles by assigning read permission to some and write permission to others. If data overrun occurs, either data is lost or the sender is blocked, depending on the LOSSY or LOSSLESS setting of the network (see below). When both calls are used, bind must be called before connect. The correct sequences are:

  • socket+bind: just for management
  • socket+bind+connect: management and communication
  • socket+connect: communication without management

The calls "accept" and "listen" are not defined for AF_IPN, because there is no server. All communication takes place among peers.
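Because a connected socket can receive data at any time, a process typically waits for readiness before reading. The following is a minimal sketch using poll and read on an arbitrary file descriptor. The helper name `recv_with_timeout` and the choice of reporting a timeout as -1 with errno set to ETIMEDOUT are assumptions of this example, not part of IPN.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for data on fd, then read one packet into buf.
 * Returns the number of bytes read, or -1 on error.
 * A timeout is reported as -1 with errno set to ETIMEDOUT. */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    /* Retry if a signal interrupts poll (the full timeout restarts). */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;
    if (ready == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);
}
```

The same helper works on any descriptor that supports poll, so it can be tried on a pipe or a unix socket before being used on an IPN socket.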

Data can be sent and received using read, write, send, recv, sendto, recvfrom, sendmsg, recvmsg.

Socket options and flags.

These options can be read with getsockopt and set with setsockopt.

There are two different kinds of options: network options and node options. The former define the structure of the network and must be set prior to bind. It is not currently possible to change these options on an existing network. When a socket is bound and/or connected to an existing network, getsockopt returns the current values of the options. Node options define parameters of the node. These must be set prior to connect.

Network Options

(These options can be set prior to bind/connect)

IPN_SO_FLAGS: This tag sets or gets the network flags.

  • IPN_FLAG_LOSSLESS: this flag defines the behavior in case of network overloading or data overrun, i.e. when some processes are too slow in consuming the packets for the network buffers. When the network is LOSSY (the flag is cleared), packets are dropped in case of buffer overflow. A LOSSLESS (flag set) IPN network blocks the sender if the buffer is full. LOSSY is the default behavior.

IPN_SO_NUMNODES: max number of connected sockets (default value 32)

IPN_SO_MTU: maximum transmission unit, i.e. the maximum size of packets (default value 1514, Ethernet frame, including VLAN).

IPN_SO_MSGPOOLSIZE: size of the buffer (number of pending packets, default value 8). This option has two different meanings depending on the LOSSY/LOSSLESS behavior of the network. For LOSSY networks, this is the maximum number of pending packets of each node. For LOSSLESS networks, this is the global number of pending packets in the network. When the same packet is sent to many destinations, it is counted just once.

IPN_SO_MODE: this option specifies the permissions to use when the socket is created on the file system. It is modified by the process's umask in the usual way. The permissions of the created socket are (mode & ~umask).
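The difference between LOSSY and LOSSLESS behavior when the pool is full can be pictured with a toy user-space counter. This is only an illustrative model under simplifying assumptions (a single counter standing for either the per-node or the global pool), not the kernel's data structure. The names `pending_queue`, `send_result` and `model_send` are hypothetical.

```c
#include <stdbool.h>

enum send_result { SEND_QUEUED, SEND_DROPPED, SEND_WOULD_BLOCK };

/* One pool of pending packets: per node for LOSSY networks,
 * global for LOSSLESS networks (see IPN_SO_MSGPOOLSIZE). */
struct pending_queue {
    int pending;   /* packets currently waiting to be consumed */
    int capacity;  /* pool size, 8 by default according to the text */
};

/* Model what happens to one more packet offered to the pool. */
enum send_result model_send(struct pending_queue *q, bool lossless)
{
    if (q->pending < q->capacity) {
        q->pending++;
        return SEND_QUEUED;
    }
    /* Pool full: LOSSLESS blocks the sender, LOSSY drops the packet. */
    return lossless ? SEND_WOULD_BLOCK : SEND_DROPPED;
}
```

In this model a slow reader on a LOSSY network loses packets once its own pool is full, while on a LOSSLESS network the sender has to wait until a packet is consumed.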
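As a worked example of the (mode & ~umask) rule, the following sketch computes the permissions a new socket file would get. The function name `ipn_effective_mode` is hypothetical. The umask is passed explicitly rather than read from the process, to keep the example free of side effects.

```c
#include <sys/types.h>

/* Permissions actually given to the socket file:
 * the requested mode with the umask bits cleared. */
mode_t ipn_effective_mode(mode_t mode, mode_t mask)
{
    return mode & ~mask;
}
```

For instance, a requested mode of 0777 with a umask of 022 yields 0755: everyone can bind (x) and receive (r), but only the owner can send (w).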

(Options for bound/connected sockets)

IPN_SO_CHANGE_NUMNODES: changes the number of IPN network ports at run time.

Node Options

IPN_SO_PORT: (default value IPN_PORTNO_ANY) This option specifies the port number to which the socket must be connected. When it is IPN_PORTNO_ANY, the port number is decided by the service. There can be network services where different ports have different definitions (e.g. different VLANs for ports of virtual Ethernet switches).

IPN_SO_DESCR: This is the description of the node. It is a string with maximum length IPN_DESCRLEN. It is used only by debugging tools.

IPN_SO_HANDLE_OOB: The node is able to manage Out Of Band protocol messages.

IPN_SO_WANT_OOB_NUMNODES: The socket wants OOB messages to notify changes in the number of writers and readers (requires IPN_SO_HANDLE_OOB).

TAP and GRAB nodes for IPN networks

It is possible to connect IPN sockets to virtual and real network interfaces using specific ioctl requests, provided the user has permission to configure the network (e.g. the CAP_NET_ADMIN POSIX capability). A virtual interface connected to an IPN network is similar to a tap interface (provided by the tuntap module). A tap interface appears as an Ethernet interface to the hosting operating system; all the packets sent and received through the tap interface are received and sent by the application that created it. An IPN virtual network interface appears in the same way, but the packets are received and sent through the IPN network and delivered according to the policy (BROADCAST acts as a basic hub for the connected processes). It is also possible to *grab* a real interface. In this case the closest analogue is the Linux kernel Ethernet bridge. When a real interface is connected to an IPN network, all the packets received from the real network are also injected into the IPN network, and all the packets sent by the IPN network through the real network 'port' are sent on the real network.

ioctl is used for creation or control of TAP or GRAB interfaces.

    int ioctl(int d, int request, .../* arg */);

A list of the request values currently supported follows.

  • IPN_CONN_NETDEV: (struct ifreq *arg). This call creates a TAP interface or implements a GRAB on an existing interface and connects it to a bound IPN socket. The field ifr_flags can be IPN_NODEFLAG_TAP for a TAP interface, IPN_NODEFLAG_GRAB to grab an existing interface. The field ifr_name is the desired name for the new TAP interface or is the name of the interface to grab (e.g. eth0). For TAP interfaces, ifr_name can be an empty string. The interface in this latter case is named ipn followed by a number (e.g. ipn0, ipn1, ...). This ioctl must be used on a bound but unconnected socket. When the call succeeds, the socket gets the connected status, but the packets are sent and received through the interface. Persistence applies only to interface nodes (TAP or GRAB).
  • IPN_SETPERSIST: (int arg). If (arg != 0) it gives the interface the persistent status: the network interface survives and stays connected to the IPN network when the socket is closed. When (arg == 0) the standard behavior is resumed: the interface is deleted or the grabbing is terminated when the socket is closed.
  • IPN_JOIN_NETDEV: (struct ifreq *arg). This call reconnects a socket to an existing persistent node. The interface can be defined either by name (ifr_name) or by index (ifr_index). If there is already a socket controlling the interface this call fails (EADDRNOTAVAIL).
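Preparing the struct ifreq argument for IPN_CONN_NETDEV might look like the sketch below. The helper name `ipn_prepare_ifreq` is hypothetical. The node flag (IPN_NODEFLAG_TAP or IPN_NODEFLAG_GRAB) is taken as a parameter because its numeric value comes from the IPN header, which this example does not assume to be installed.

```c
#define _DEFAULT_SOURCE   /* glibc: make struct ifreq visible in strict modes */
#include <net/if.h>
#include <string.h>

/* Prepare the argument for ioctl(fd, IPN_CONN_NETDEV, ifr).
 * 'name' is the desired TAP name, the interface to grab (e.g. "eth0"),
 * or "" to let IPN choose a name such as ipn0 for a TAP interface.
 * 'node_flag' would be IPN_NODEFLAG_TAP or IPN_NODEFLAG_GRAB.
 * Returns 0 on success, -1 if the name does not fit in ifr_name. */
int ipn_prepare_ifreq(struct ifreq *ifr, const char *name, short node_flag)
{
    size_t len = strlen(name);

    if (len >= IFNAMSIZ)
        return -1;
    memset(ifr, 0, sizeof(*ifr));
    memcpy(ifr->ifr_name, name, len + 1);  /* include the trailing NUL */
    ifr->ifr_flags = node_flag;
    return 0;
}
```

The prepared structure would then be passed to ioctl on a bound but unconnected IPN socket, as described for IPN_CONN_NETDEV above.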

There are also some ioctl requests that a system administrator can use to set or clear persistence on existing IPN interfaces. These calls apply to unbound sockets.

  • IPN_SETPERSIST_NETDEV: (struct ifreq *arg). This call sets the persistence status of an IPN interface. The interface can be defined either by name (ifr_name) or by index (ifr_index).
  • IPN_CLRPERSIST_NETDEV: (struct ifreq *arg). This call clears the persistence status of an IPN interface. The interface is specified as in the opposite call above. The interface is deleted (TAP) or the grabbing is terminated when the socket is closed, or immediately if the interface is not controlled by a socket. If the IPN network had the interface as its sole node, the IPN network is terminated, too.

When unloading the ipn kernel module, all the persistent flags of interfaces are cleared.

IPN networks connected to Character Devices

(on SVN. Jul. 2, 2009) It is also possible to define character devices connected to IPN networks. When a process opens a character device connected to an IPN network, read and write operations on the device operate like receive or send operations on a socket. Protocol submodules can identify when the network nodes communicate by device, and also which is the device involved. In this way protocol submodules provide different services on specific devices.

IPN uses ioctl to define/undefine/configure a character device or a range of character devices connected to an IPN network.

The ioctl tag IPN_REGISTER_CHRDEV can be used to define or allocate one or more character devices. The argument of IPN_REGISTER_CHRDEV is a pointer to a structure chrdevreq:

     struct chrdevreq {
       unsigned int major;
       unsigned int minor;
       int count;
       char name[64];
     };

Major and minor identify the device or the first device of the range. If major is zero, IPN dynamically allocates a major number for the device (the field is updated by ioctl, so the assigned major number can be read after the call). When major is nonzero, IPN registers that device; an error occurs if the major is already registered by another device. count is the number of devices requested: IPN assigns a range of minor numbers from minor to minor+count-1. name is the name of the device. IPN_REGISTER_CHRDEV works on an IPN socket already bound to an IPN network, requires the CAP_MKNOD capability, defines the sysfs nodes, and is compatible with udev.

IPN_UNREGISTER_CHRDEV is the tag for unregistering a device range. It takes no arguments and must be run on a bound socket: the device or device range of the IPN network is released. It is not possible to unregister just some of the devices; the allocated range must be unregistered as a whole.

Normally, when the last process of an IPN network closes its socket (or the descriptor related to a device of the IPN network), the network is deleted. It is possible to define an IPN network (with devices) to be "persistent": the IPN network survives even when no processes are connected. IPN_CHRDEV_PERSIST is the tag that sets or clears the persistence of an IPN network. It requires an "int" argument: the network becomes persistent if the argument is nonzero, non-persistent otherwise.

IPN_JOIN_CHRDEV is the ioctl tag to bind to the IPN network associated with a character device. IPN_JOIN_CHRDEV works on an unbound IPN socket and requires the CAP_MKNOD capability. The argument for IPN_JOIN_CHRDEV is a pointer to a struct chrdevreq. It is the only way to change the persistence of an IPN network when the socket has been removed from the file system.
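The fields of struct chrdevreq and the resulting minor range can be sketched as follows. The structure is repeated from the text so that the example is self-contained (it would normally come from the IPN header). The helper names `chrdevreq_init` and `chrdevreq_last_minor` are hypothetical.

```c
#include <string.h>

/* As described in the text; normally provided by the IPN header. */
struct chrdevreq {
    unsigned int major;
    unsigned int minor;
    int count;
    char name[64];
};

/* Fill a request for 'count' devices starting at (major, minor).
 * major == 0 asks IPN to allocate a major number dynamically.
 * Returns 0 on success, -1 if count < 1 or the name is too long. */
int chrdevreq_init(struct chrdevreq *req, unsigned int major,
                   unsigned int minor, int count, const char *name)
{
    size_t len = strlen(name);

    if (count < 1 || len >= sizeof(req->name))
        return -1;
    memset(req, 0, sizeof(*req));
    req->major = major;
    req->minor = minor;
    req->count = count;
    memcpy(req->name, name, len + 1);  /* include the trailing NUL */
    return 0;
}

/* Last minor number of the range: minor + count - 1. */
unsigned int chrdevreq_last_minor(const struct chrdevreq *req)
{
    return req->minor + (unsigned int)req->count - 1;
}
```

For example, a request with minor 10 and count 4 covers minors 10 to 13. The filled structure would be passed by pointer to ioctl with the IPN_REGISTER_CHRDEV tag on a bound IPN socket.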

Related Work.

IPN is able to give a unifying solution to several problems and creates new opportunities for applications. See also:

Several existing tools can be implemented using IPN sockets:

  • VDE. Level 2 service implements a VDE switch in the kernel, providing a considerable speedup.
  • Tap (tuntap) networking for virtual machines
  • Kernel ethernet bridge
  • All the applications which need multicasting of data streams, like tee, jack.

A continuous stream of data (like audio/video/midi etc.) can be sent on an IPN network and several applications can receive the broadcast just by joining the channel.

It is possible to write programs that forward packets between different IPN networks running on the same or on different systems, extending the IPN network in the same way as cables extend Ethernet networks by connecting switches or hubs together. (VDE cables are examples of such programs.)
