Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

Category: Programming
Author: W. Richard Stevens, Bill Fenner, Andrew M. Rudoff
4.7
All Stack Overflow 41
This Year Stack Overflow 7
This Month Stack Overflow 8

Comments

by anonymous   2019-07-21

1) I'm not sure why you wouldn't want multiple "threads" instead of "processes".

But should you require a pool of worker processes (vs. "worker threads"), then I would recommend:

2) The master process binds, listens ... and accepts all incoming connections

3) Use a "Unix socket" to pass the accepted connection from the master process to the worker process.

4) As far as "synchronization" - easy. The worker simply blocks reading the Unix socket until there's a new file descriptor for it to start using.

5) You can set up a shared memory block for the worker to communicate "busy/free" status to the master.

Here's a discussion of using a "Unix domain socket":

  • Sending file descriptor over UNIX domain socket, and select()

  • http://www.lst.de/~okir/blackhats/node121.html

Stevens "Network Programming" is also an excellent resource:

  • http://www.amazon.com/Unix-Network-Programming-Volume-Networking/dp/0131411551/
by anonymous   2019-07-21

This may not be the specific answer you're looking for, but it may be the place you could find it. I'm currently reading:

Unix Network Programming, Volume 1: The Sockets Networking API, 3rd Edition

And it has a lot of examples of multi-threaded, non-blocking servers and clients. The authors also explain much of the reasoning and the trade-offs involved in choosing between the different methods.

Hope that helps...

by anonymous   2019-07-21

A socket establishes an "endpoint", which consists of an IP address and a port:

  • http://www.gsp.com/cgi-bin/man.cgi?topic=socket

Yes, a single socket is specific to a single host/port combination.

READING RECOMMENDATION:

Beej's Guide to Network Programming:

  • http://beej.us/guide/bgnet/

Unix Network Programming: Stevens et al:

  • http://www.amazon.com/Unix-Network-Programming-Volume-Networking/dp/0131411551
by anonymous   2019-07-21

If you're looking to write a socket server, a good starting point is Dan Kegel's C10k article from a few years back:

http://www.kegel.com/c10k.html

I also found Beej's Guide to Network Programming to be pretty handy:

http://beej.us/guide/bgnet/

Finally, if you need a great reference, there's UNIX Network Programming by W. Richard Stevens et al.:

http://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551/ref=dp_ob_title_bk

Anyway, to answer your question, the main difference between Apache and Nginx is that Apache uses one thread per client with blocking I/O, whereas Nginx is single-threaded with non-blocking I/O. Apache's worker pool does reduce the overhead of starting and destroying processes, but it still makes the CPU switch between several threads when serving multiple clients. Nginx, on the other hand, handles all requests in one thread. When one request needs to make a network request (say, to a backend), Nginx attaches a callback to the backend request and then works on another active client request. In practice, this means it returns to the event loop (epoll, kqueue, or select) and asks for file descriptors that have something to report. Note that the system call in the main event loop is actually a blocking operation, because there's nothing to do until one of the file descriptors is ready for reading or writing.

So that's the main reason Nginx and Tornado are efficient at serving many simultaneous clients: there's only ever one process (thus saving RAM) and only one thread (thus saving CPU from context switches). As for epoll, it's just a more efficient version of select. If there are N open file descriptors (sockets), it lets you pick out the ones ready for reading in O(1) instead of O(N) time. In fact, Nginx can use select instead of epoll if you compile it with the --with-select_module option, and I bet it will still be more efficient than Apache. I'm not as familiar with Apache internals, but a quick grep shows it does use select and epoll -- probably when the server is listening to multiple ports/interfaces, or if it does simultaneous backend requests for a single client.

Incidentally, I got started with this stuff trying to write a basic socket server and wanted to figure out how Nginx was so freaking efficient. After poring through the Nginx source code and reading those guides/books I linked to above, I discovered it'd be easier to write Nginx modules instead of my own server. Thus was born the now-semi-legendary Emiller's Guide to Nginx Module Development:

http://www.evanmiller.org/nginx-modules-guide.html

(Warning: the Guide was written against Nginx 0.5-0.6 and APIs may have changed.) If you're doing anything with HTTP, I'd say give Nginx a shot because it's worked out all the hairy details of dealing with stupid clients. For example, the small socket server that I wrote for fun worked great with all clients -- except Safari, and I never figured out why. Even for other protocols, Nginx might be the right way to go; the eventing is pretty well abstracted from the protocols, which is why it can proxy HTTP as well as IMAP. The Nginx code base is extremely well-organized and very well-written, with one exception that bears mentioning. I wouldn't follow its lead when it comes to hand-rolling a protocol parser; instead, use a parser generator. I've written some stuff about using a parser generator (Ragel) with Nginx here:

http://www.evanmiller.org/nginx-modules-guide-advanced.html#parsing

All of this was probably more information than you wanted, but hopefully you'll find some of it useful.

by anonymous   2019-07-21

I think what you're attempting is achievable, but it will certainly be difficult and time-consuming if you're starting from zero. I'm not an expert in network programming, but I know enough to appreciate the complications you would encounter.

If you'd like some background on network programming then I would recommend the book Unix Network Programming, even if you're not familiar with UNIX. Although it's very technical and details C code, it gives very good background on sockets and related topics, since the idea of sockets originated on UNIX.

If I were doing something similar then I'd be tempted to try and look at some open-source projects such as listed here or even the Yahoo Messenger SDK. Googling 'open source p2p' might give some pointers too though I imagine a lot of it is in C/C++.

by sureaboutthis   2018-10-04
Advanced Programming in the Unix Environment - Stevens, Rago [0]

Unix Network Programming - Stevens [1]

[0] https://www.amazon.com/Advanced-Programming-UNIX-Environment...

[1] https://www.amazon.com/Unix-Network-Programming-Sockets-Netw...

by anonymous   2017-08-20

There is a Unix domain socket-based mechanism for transferring file descriptors (such as sockets - which cannot be memory mapped, of course) between processes - using the sendmsg() system call.

You can find more in Stevens (as mentioned by Curt Sampson), and also at Wikipedia.

You can find a much more recent question with working code at Sending file descriptor by Linux socket.

by anonymous   2017-08-20

I suggest reading this post and then deciding for yourself based on your own knowledge of your code whether or not it's OK to ignore the ECONNRESET. It sounds like your Node app may be trying to write to the closed connection (heartbeats being sent?). Proper closing of the connection from your C# app would probably take care of this, but I have no knowledge of C#.

You may have a problem if you get lots of users and if ECONNRESET causes the connection to go into TIME_WAIT. That will tie up the port for 1-2 minutes. You would use netstat on Linux to look for that but I'm sure there is an equivalent Windows app.

If you really want to get into the nitty gritty of socket communications I suggest the excellent Unix Network Programming, by Stevens.

by anonymous   2017-08-20

Linux is the most accessible and has the most mature desktop functionality. BSD (in its various flavours) has less userspace baggage and would be easier to understand at a fundamental level. In this regard it is more like a traditional Unix than a modern Linux distribution. Some might view this as a good thing (and from certain perspectives it is) but it will be more alien to someone familiar with Windows.

The main desktop distributions are Ubuntu and Fedora. These are both capable systems but differ somewhat in their userspace architecture. The tooling for the desktop environment and the default configuration for system security work a bit differently on Ubuntu than they do on most other Linux or Unix flavours, but this is of little relevance to development. From a user perspective either of these would be a good start.

From the perspective of a developer, all modern flavours of Unix and Linux are very similar and share essentially the same developer tool chain. If you want to learn about the system from a programmer's perspective there is relatively little to choose between them.

Most unix programming can be accomplished quite effectively with a programmer's editor such as vim or emacs, both of which come in text mode and windowing flavours. These editors are very powerful and have rather quirky user interfaces - unusual, but contributing significantly to the power of the tools. If you are not comfortable with these tools, this posting discusses several other editors that offer a user experience closer to common Windows tooling.

There are several IDEs such as Eclipse that might be of more interest to someone coming off Windows/Visual Studio.

Some postings on Stackoverflow that discuss linux/unix resources are:

  • What are good linux-unix books for an advancing user

  • What are some good resources for learning C beyond K&R

  • Resources for learning C program design

If you have the time and want to do a real tour of the nuts and bolts Linux From Scratch is a tutorial that goes through building a linux installation by hand. This is quite a good way to learn in depth.

For programming, get a feel for C/unix from K&R and some of the resources mentioned in the questions linked above. The equivalents of Petzold, Prosise and Richter in the Unix world are W. Richard Stevens' Advanced Programming in the Unix Environment and Unix Network Programming vols. 1 and 2.

Learning one of the dynamic languages such as Perl or Python if you are not already familiar with these is also a useful thing to do. As a bonus you can get good Windows ports of both the above from Activestate which means that these skills are useful on both platforms.

If you're into C++ take a look at Qt. This is arguably the best cross-platform GUI toolkit on the market and (again) has the benefit of a skill set and tool chain that is transferable back to Windows. There are also several good books on the subject and (as a bonus) it also works well with Python.

Finally, Cygwin is a unix emulation layer that runs on Windows and gives a substantially unix-like environment. Architecturally, Cygwin is a port of glibc and the C runtime (the GNU tool chain's base libraries) as an adaptor on top of Win32. This emulation layer makes it easy to port unix/linux apps onto Cygwin. The platform comes with a pretty complete set of software - essentially a full linux distribution hosted on a Windows kernel. It allows you to work in a unix-like way on Windows without having to maintain a separate operating system installation. If you don't want to run VMs, multiple boots or multiple PCs, it may be a way of easing into unix.

by anonymous   2017-08-20

Edited because I had to leave a meeting when I originally submitted this, but wanted to complete the information.

Half of that material is learning about development in a Unix-like environment, and for that, I'd recommend a book since it's tougher to filter out useful information from the start.

I'd urge you to go to a bookstore and browse through these books:

Additionally, you will want to learn about ldd, which is like Dependency Walker on Windows. It lists a target binary's dependencies, if it has any.

And for Debugging, check out this StackOverflow thread which talks about a well written GDB tutorial and also links to an IBM guide.

Happy reading.

by anonymous   2017-08-20

Your question calls for more than a single Stack Overflow answer. You can find good ideas in these books:

Basically what you're trying to build is a reactor. You can find open source libraries implementing this pattern, for instance:

  • http://www.cs.wustl.edu/~schmidt/ACE.html
  • http://pocoproject.org/

If you want your handlers to be able to do more processing, you could give them a reference to your TCPServer and a way to register a socket for the following events:

  • read, the socket is ready for read
  • write, the socket is ready for write
  • accept, the listening socket is ready to accept (reported as readable by select)
  • close, the socket is closed
  • timeout, the time given to wait for the next event expired (select allows you to specify a timeout)

So that the handler can implement all kinds of protocols half-duplex or full-duplex:

  • In your example there is no way for a handler to answer the received message. This is the role of the write event: to let a handler know when it can send on the socket.
  • The same is true for the read event. It should not be in your main loop but in the socket read handler.
  • You may also want to add the possibility to register a handler for an event with a timeout so that you can implement timers and drop idle connections.

This leads to some problems:

  • Your handler will have to implement a state-machine to react to the network events and update the events it wants to receive.
  • Your handler may want to create and connect new sockets (think of a Web proxy server, an IRC client with DCC, an FTP server, and so on). For this to work it must be able to create a socket and register it in your main loop. This means the handler may now receive callbacks for either of two sockets, so there should be a parameter telling the callback which socket fired. Alternatively, you can implement one handler per socket and have them communicate through a queue of messages. The queue is needed because the readiness of one socket is independent of the readiness of the other: you may read something on one while not yet being able to send it on the other.
  • You will have to manage the timeouts specified by each handler, which may differ. You may end up with a priority queue for timeouts.

As you can see, this is no simple problem. You may want to reduce the generality of your framework to simplify its design (for instance, by handling only half-duplex protocols like simple HTTP).