Netlink overview and its strace parsers

Mon 12 September 2016 by Saruta

This project is part of the Google Summer of Code 2016. I strongly advise you to take a look at this program.

Background

This API appeared in Linux 2.0 (1996) as an IOCTL but was rewritten for Linux 2.2 (1998) as a socket address family. So Netlink is almost 20 years old.

The original API was made by Alexey Kuznetsov.

But what is Netlink?

Netlink socket family is a Linux kernel interface used for inter-process communication (IPC) between both the kernel and userspace processes, and between different userspace processes, in a way similar to the Unix domain sockets. (wikipedia)

The main protocols are:

  • NETLINK_ROUTE: for manipulating the route subsystem
  • NETLINK_SOCK_DIAG: for debugging sockets
  • NETLINK_NETFILTER: for netfilter (current Linux firewall)
  • NETLINK_GENERIC: a generic protocol API

Some famous programs use these protocols:

  • iproute2: NETLINK_ROUTE
  • iptables: NETLINK_NETFILTER
  • strace: NETLINK_SOCK_DIAG
  • ...

Userland API

Because some code is better than a long text, here is a basic SOCK_DIAG request:

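#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/netlink_diag.h>
#include <linux/sock_diag.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
    /* A request is a netlink header followed by the protocol payload. */
    struct {
        struct nlmsghdr nlh;
        struct netlink_diag_req ndr;
    } req = {
        .nlh = {
            .nlmsg_len = sizeof(req),
            .nlmsg_type = SOCK_DIAG_BY_FAMILY,
            .nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST
        },
        .ndr = {
            .sdiag_family = AF_NETLINK,
            .sdiag_protocol = NDIAG_PROTO_ALL,
            .ndiag_show = NDIAG_SHOW_MEMINFO
        }
    };
    /* No destination address: the request goes to the kernel. */
    sendto(fd, &req, sizeof(req), MSG_DONTWAIT, NULL, 0);
    return 0;
}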

Work in strace

Our goal is to extend the parsing of the following syscalls: send, sendto, recv, recvfrom, sendmsg, sendmmsg, recvmsg and recvmmsg.

Get the address family

The first step in decoding Netlink is to check that the data really is a Netlink message. For that, the address family is needed.

sendmsg/sendmmsg/recvmsg/recvmmsg syscalls

In this case, the address family is contained in the sockaddr pointed to by msg_name, the first field of the msghdr (the second parameter of these syscalls):

/* from bits/socket.h */
struct msghdr {
    void *msg_name;
    socklen_t msg_namelen;
    [...]
};
/* from linux/netlink.h */
struct sockaddr_nl {
    __kernel_sa_family_t nl_family;
    [...]
};
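
As a minimal sketch of that lookup, the helper below reads the family from msg_name. The name msghdr_family is hypothetical and this is not the strace code: a tracer first has to copy the msghdr and the sockaddr out of the traced process, which is left out here.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper (not strace code): return the address family stored
 * in msg_name, or AF_UNSPEC when no usable address is attached.
 */
int msghdr_family(const struct msghdr *msg)
{
    const struct sockaddr *sa = msg->msg_name;

    /* Every sockaddr variant starts with the family field. */
    if (sa == NULL || msg->msg_namelen < sizeof(sa->sa_family))
        return AF_UNSPEC;
    return sa->sa_family;
}
```

A decoder would only go on to parse the buffer as Netlink when this returns AF_NETLINK.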

sendto/recvfrom syscalls

With sendto and recvfrom, however, the address family is not directly accessible. A quick workaround is to read an extended attribute of the socket:

getxattr("/proc/{PID}/fd/{FD}", "system.sockprotoname", buf, bufsize - 1);

Handle multi-part messages

To decode Netlink, every part of a message has to be decoded, because most messages are multi-part.

Here is a basic layout of a message:

[Figure: netlink_format]
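
To illustrate how the parts of a multi-part message follow each other in one buffer, here is a small sketch that walks them with the NLMSG_OK and NLMSG_NEXT macros from linux/netlink.h. The function count_nlmsgs is a hypothetical name used only for this example; a real decoder would print each header instead of counting.

```c
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: walk every netlink message found in the first `len`
 * bytes of `buf` and return how many complete messages were seen.
 * `buf` must be suitably aligned for struct nlmsghdr.
 */
int count_nlmsgs(const void *buf, size_t len)
{
    const struct nlmsghdr *nlh = buf;
    int remaining = (int) len;
    int count = 0;

    /* NLMSG_OK checks that a whole header and its payload fit. */
    while (NLMSG_OK(nlh, remaining)) {
        count++;
        if (nlh->nlmsg_type == NLMSG_DONE)
            break;              /* end of a multi-part dump */
        /* NLMSG_NEXT skips the aligned length and updates `remaining`. */
        nlh = NLMSG_NEXT(nlh, remaining);
    }
    return count;
}
```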

First draft of the parser

Here is a first draft of the parser's result:

recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000},
    msg_namelen=12, msg_iov=[{[{{len=76, type=0x14 /* NLMSG_??? */,
    flags=NLM_F_MULTI, seq=1468324219, pid=31831}, "\2\10\200\376\1"...},
    {{len=88, type=0x14 /* NLMSG_??? */, flags=NLM_F_MULTI, seq=1468324219,
    pid=31831}, "\2\30\200\0\2"...}], 32768}], msg_iovlen=1,
    msg_controllen=0, msg_flags=0}, 0) = 164

Okay, the result is very ugly. Let's find the protocol to get more information.

Now, let's get the protocol

I had two ideas:

  • Keep a table of fd/protocol pairs, filled in when the socket syscall is traced, as other projects do
  • Parse /proc/net/netlink and keep a cache

But there is a very simple idea...

A netlink SOCK_DIAG request can be made.

NETLINK_SOCK_DIAG is a protocol divided into four kinds of requests, one for each address family that can be debugged: INET, UNIX, NETLINK and PACKET. Here, we want the protocols of the netlink sockets, so the NETLINK family can be used to reduce the amount of data.

Here is the layout of a basic netlink diag request and of its answer:

/* from linux/netlink_diag.h */

struct netlink_diag_req {
    __u8 sdiag_family;
    __u8 sdiag_protocol;
    [...]
};

struct netlink_diag_msg {
    __u8 ndiag_family;
    __u8 ndiag_type;
    __u8 ndiag_protocol;
    [...]
};

Problem: as the SOCK_DIAG_BY_INODE request is still unimplemented, the kernel answers with one message for each inode in the requested family.

Solution: Maintain a cache (in socketutils.c)
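
The idea of such a cache can be sketched as follows. This is not the code from socketutils.c: the names proto_entry, cache_store and cache_lookup, the table size and the direct-mapped layout (a colliding inode simply overwrites the previous entry) are all assumptions chosen to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/netlink.h>
#include <linux/netlink_diag.h>

#define CACHE_SIZE 256          /* arbitrary size chosen for this sketch */

/* Hypothetical cache entry: socket inode -> netlink protocol number. */
struct proto_entry {
    uint32_t inode;
    int protocol;
    int used;
};

static struct proto_entry cache[CACHE_SIZE];

/* Remember the inode/protocol pair carried by one SOCK_DIAG answer. */
void cache_store(const struct nlmsghdr *nlh)
{
    const struct netlink_diag_msg *msg;
    struct proto_entry *e;

    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*msg)))
        return;                 /* too short to hold a netlink_diag_msg */
    msg = NLMSG_DATA(nlh);
    e = &cache[msg->ndiag_ino % CACHE_SIZE];
    e->inode = msg->ndiag_ino;
    e->protocol = msg->ndiag_protocol;
    e->used = 1;
}

/* Return the cached protocol for `inode`, or -1 when it is unknown. */
int cache_lookup(uint32_t inode)
{
    const struct proto_entry *e = &cache[inode % CACHE_SIZE];

    return (e->used && e->inode == inode) ? e->protocol : -1;
}
```

With this in place, one dump fills the table and later lookups for other file descriptors need no new request until an unknown inode shows up.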

Extend the basic parser

So now, the parser can be extended.

  • The type can be decoded. In netlink, types below NLMSG_MIN_TYPE are reserved for control messages, while types >= NLMSG_MIN_TYPE are protocol-dependent.
  • The flags can be decoded. Some flag bits are shared between two sets, the GET request modifiers (such as NLM_F_DUMP) and the NEW request modifiers (such as NLM_F_CREATE); the type of the message tells which set applies.
  • Call the netlink protocol parser if one exists; otherwise print the buffer with printstr.
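
The first point can be sketched like this. The function nl_control_type_name is a hypothetical name, not a strace function; it only separates the reserved control types from the protocol-dependent ones.

```c
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: name the reserved control types; return NULL for
 * protocol-dependent types so the caller can hand them to a per-protocol
 * decoder.
 */
const char *nl_control_type_name(unsigned int type)
{
    if (type >= NLMSG_MIN_TYPE)
        return NULL;            /* protocol-dependent, e.g. RTM_NEWADDR */

    switch (type) {
    case NLMSG_NOOP:    return "NLMSG_NOOP";
    case NLMSG_ERROR:   return "NLMSG_ERROR";
    case NLMSG_DONE:    return "NLMSG_DONE";
    case NLMSG_OVERRUN: return "NLMSG_OVERRUN";
    default:            return "NLMSG_???";   /* reserved but unassigned */
    }
}
```

A NULL result is the cue to look the type up in the table of the socket's protocol, which is how 0x14 becomes RTM_NEWADDR on a NETLINK_ROUTE socket.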

Here is an output example:

recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000},
    msg_namelen=12, msg_iov=[{[{{len=76, type=RTM_NEWADDR,
    flags=NLM_F_MULTI, seq=1468326943, pid=11443}, "\2\10\200\376\1"...},
    {{len=88, type=RTM_NEWADDR, flags=NLM_F_MULTI, seq=1468326943,
    pid=11443}, "\2\30\200\0\2"...}], 32768}], msg_iovlen=1,
    msg_controllen=0, msg_flags=0}, 0) = 164

The output is a little bit better but it's not that beautiful...

Attribute format

To explain the global format of a Netlink message, the best idea is to show an image.

[Figure: nlmsg]
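
In code, each attribute is a small header (length and type) followed by its payload and padded to a 4-byte boundary. The sketch below walks such a sequence using struct nlattr and the NLA_* macros from linux/netlink.h. The function find_nlattr is a hypothetical name for this example.

```c
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: find the first attribute of type `wanted` in a
 * buffer of `len` bytes that holds a sequence of netlink attributes.
 * Returns NULL when it is absent or the buffer is malformed.
 * `buf` must be suitably aligned for struct nlattr.
 */
const struct nlattr *find_nlattr(const void *buf, size_t len,
                                 unsigned int wanted)
{
    const char *p = buf;

    while (len >= NLA_HDRLEN) {
        const struct nlattr *nla = (const struct nlattr *) p;
        size_t step;

        if (nla->nla_len < NLA_HDRLEN || nla->nla_len > len)
            return NULL;        /* truncated or corrupted attribute */
        /* The upper bits of nla_type are flags, not part of the type. */
        if ((nla->nla_type & NLA_TYPE_MASK) == wanted)
            return nla;
        /* Each attribute is padded to a 4-byte boundary (NLA_ALIGNTO). */
        step = NLA_ALIGN(nla->nla_len);
        if (step >= len)
            break;              /* the last attribute may lack padding */
        p += step;
        len -= step;
    }
    return NULL;
}
```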

Route parser

Here is the global format of a rtmsg:

[Figure: rtnl_format]

Output:

recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000},
    msg_namelen=12, msg_iov=[{{len=88, type=RTM_NEWADDR, flags=NLM_F_MULTI,
    seq=1468325023, pid=1411}, {ifa_family=AF_INET, ifa_prefixlen=24,
    ifa_flags=IFA_F_PERMANENT, ifa_scope=0, ifa_index=2},
    ifa_address="192.168.103.171", ifa_local="192.168.103.171",
    ifa_broadcast="192.168.103.255", ifa_label="enp3s0",
    ifa_flags=IFA_F_PERMANENT,
    ifa_cacheinfo={ifa_prefered=INFINITY_LIFE_TIME,
    ifa_valid=INFINITY_LIFE_TIME, cstamp=14650061, tstamp=14650061}},
    32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 164

This is much better!
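
To give an idea of what decoding such a message involves, here is a reduced sketch that extracts only the IFA_ADDRESS attribute of an IPv4 RTM_NEWADDR message. The function newaddr_to_string is a hypothetical name and the strace parser handles far more cases. The macros come from linux/rtnetlink.h and linux/if_addr.h.

```c
#include <stdio.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/*
 * Hypothetical helper: format the IFA_ADDRESS attribute of an RTM_NEWADDR
 * message into `out`. Returns 0 on success, -1 if the message is not an
 * IPv4 RTM_NEWADDR or carries no address.
 */
int newaddr_to_string(struct nlmsghdr *nlh, char *out, socklen_t outlen)
{
    struct ifaddrmsg *ifa;
    struct rtattr *rta;
    int len;

    if (nlh->nlmsg_type != RTM_NEWADDR ||
        nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifa)))
        return -1;

    ifa = NLMSG_DATA(nlh);
    if (ifa->ifa_family != AF_INET)
        return -1;

    /* The attributes start right after the fixed ifaddrmsg header. */
    rta = IFA_RTA(ifa);
    len = IFA_PAYLOAD(nlh);
    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
        if (rta->rta_type != IFA_ADDRESS)
            continue;
        if (RTA_PAYLOAD(rta) < (int) sizeof(struct in_addr))
            return -1;
        return inet_ntop(AF_INET, RTA_DATA(rta), out, outlen) ? 0 : -1;
    }
    return -1;
}
```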

Sock_diag parser

Let's do the same for the Socket diag protocol:

sendto(3<NETLINK:[SOCK_DIAG:23038]>, {{len=36, type=SOCK_DIAG_BY_FAMILY,
    flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {sdiag_family=AF_NETLINK,
    sdiag_protocol=NDIAG_PROTO_ALL, ndiag_ino=0,
    ndiag_show=NDIAG_SHOW_MEMINFO, ndiag_cookie={0, 0}}}, 36, MSG_DONTWAIT,
    NULL, 0) = 36

The <NETLINK:[SOCK_DIAG:23038]> annotation is a new strace feature, enabled with -yy, which prints protocol-specific information associated with socket file descriptors.

Conclusion

And now?

  • Handle as many netlink attributes as possible (400 to 500 for route alone).
  • Handle nested attributes.
  • Handle unbound sockets.
  • Handle ancillary data at msg level (cmsg).

Repo: github

