If you are building it yourself, pick a signaling protocol first. SIP is probably the easiest, since it can be done peer to peer, and the RFCs (RFC 3261 for SIP itself) are freely available at http://www.ietf.org. I would assume the computer has some sort of DSP resources to take the voice samples and wrap them in RTP, UDP and IP headers.
You can find some SIP source code for Linux systems at http://www.vovida.org.
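
To give a feel for what the signaling looks like on the wire, here is a rough Python sketch that builds a minimal SIP INVITE with an SDP body offering G.711 mu-law audio. It is not taken from the Vovida code; the function name build_invite, the addresses and the port numbers are all made up for illustration, and a real user agent also needs responses, ACK, retransmissions and so on.

```python
import uuid

def build_invite(caller, callee, local_ip, sip_port=5060, rtp_port=40000):
    """Return a minimal SIP INVITE as text (hypothetical helper, illustration only)."""
    # SDP body: offer one audio stream, RTP/AVP payload type 0 (G.711 mu-law).
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {local_ip}",
        "s=call",
        f"c=IN IP4 {local_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",
    ]) + "\r\n"
    user = caller.split("@")[0]
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        # The branch parameter must start with the magic cookie z9hG4bK.
        f"Via: SIP/2.0/UDP {local_ip}:{sip_port};branch=z9hG4bK{uuid.uuid4().hex}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{callee}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:{sip_port}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    # A blank line (CRLF CRLF) separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp

if __name__ == "__main__":
    # Example addresses from the documentation ranges, not real hosts.
    print(build_invite("alice@192.0.2.10", "bob@192.0.2.20", "192.0.2.10"))
```

For a peer to peer call you would send that text in a UDP datagram straight to the other machine's SIP port (5060 by default) and wait for its reply.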
For the RTP details you will have to read the RTP RFC (RFC 3550).
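
As a starting point, this is roughly what wrapping voice samples in an RTP header looks like in Python. It is a minimal sketch of the fixed 12-byte header from RFC 3550 only (no CSRC list, padding or header extension), and it assumes G.711 mu-law audio at 8000 Hz sent in 20 ms frames of 160 bytes. The names build_rtp_packet and packetize are my own, not from any library.

```python
import struct

def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=0, marker=False):
    """Prepend the fixed 12-byte RTP header to one frame of audio."""
    byte0 = 2 << 6  # version 2, no padding, no extension, zero CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                 # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,             # 16-bit sequence number, wraps around
        timestamp & 0xFFFFFFFF,   # 32-bit timestamp in sample units
        ssrc & 0xFFFFFFFF,        # 32-bit stream (source) identifier
    )
    return header + payload

def packetize(audio, ssrc, frame_bytes=160):
    """Split encoded audio into frames and yield one RTP packet per frame.

    Assumes one byte per sample (G.711), so the timestamp advances by
    frame_bytes for every packet.
    """
    seq, timestamp = 0, 0
    for offset in range(0, len(audio), frame_bytes):
        frame = audio[offset:offset + frame_bytes]
        yield build_rtp_packet(frame, seq, timestamp, ssrc)
        seq += 1
        timestamp += frame_bytes
```

Each packet then goes out in a UDP datagram with socket.sendto(), and the operating system adds the UDP and IP headers for you. A real sender should start the sequence number and timestamp at random values, as the RFC recommends; they start at zero here only to keep the example easy to follow.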