The elevator pitch for PlanMixPlay is simple: a more engaging way of delivering an audio/video performance. More than eight years in the making, PlanMixPlay is still in development and is lining up a third prototype party in Berlin, Germany; the prior two took place in Tokyo, Japan and Melbourne, Australia.

A significant source of inspiration for PlanMixPlay is 2manydjs and their ‘Under the Covers’ tour. To me, it remains a watershed moment in live audio/video performance. However, at the time of writing, these tightly interconnected audio/video performances remain the exception rather than the rule.

PlanMixPlay aims to pull impressive visual performances off the exclusive, festival-sized DJ dining table and onto the weekend club circuit.

Furthermore, it serves as an experimental platform for trying out novel live interaction patterns and tightly connected audio/video output.

Quick Facts

  • 8-year ongoing project
  • Live prototype parties
    • 14-06-2014 – Tokyo, Japan
    • 20-08-2022 – Melbourne, Australia
    • TBD – Berlin, Germany
  • My largest single-handedly designed application, consisting of:
    • GUI library (soon to be replaced)
    • Gesture detection engine
    • Custom real-time audio engine (leveraging BASS)
    • Custom real-time visuals rendering engine
  • Built with modern C++11 – C++17
  • Used in over 85 live performances
  • Variable immediacy
  • Has its own website.

Connecting audio and video

Early on-site studies and interviews revealed that, in a weekend club setting, DJs and VJs rarely – if ever – communicate prior to a performance. In the rare circumstance where they do, it’s typically to ensure a thematic overlap, as opposed to ongoing synchronization of audio and video. Furthermore, numerous clubs place DJs and VJs so far apart that easy communication during a performance is effectively impossible.

Several studies have investigated the effects of non-synchronized audio and video. Humans vary in how out of sync they perceive visual and audio elements to be, depending on the audio/video content as well as on whether the audio arrives early or late relative to the video. In general, perceived overall quality improves the more tightly audio and video are synchronized. In other words, well-synchronized audio and video is always preferable and stimulates audiences the most.

A high benchmark for a DJ performance with audio-synchronized visuals is the ‘Under the Covers’ live tour performed by 2manydjs. Having contracted visual artists to animate the cover art for a wide music selection, 2manydjs used a set of Pioneer DVJ-1000s to perform fully synchronized audio/video shows. The result is a striking and memorable performance that has few peers.

The DVJ-1000s enabled 2manydjs to play back audio with attached video files with the same ease as standard CDJs allow a DJ to play and mix music. In other words, 2manydjs could effectively perform as ‘standard’ DJs while the DVJ-1000s automatically produced visual output as a ‘side effect’ of their performance. This visual output was incredibly well synchronized and unique, particularly for first-time audience members.

Their approach does have some significant drawbacks, however:

  • The visuals are hand-crafted to suit one specific song (or occasionally two pre-mixed songs) and require considerable effort to create. Most people have neither the artistic skills nor the resources to pay artists to create custom-made visuals for a large selection of songs.
  • Because each visual is custom-made for one (or sometimes two) songs, it cannot be effectively re-used for any other song.
  • Not only are the Pioneer DVJ-1000s – used by 2manydjs – expensive, they have also been discontinued.

The success of their ‘Under the Covers’ tour highlights the value of synchronized audio/video performances, and improving live audio/video performances is PlanMixPlay’s core focus.

PlanMixPlay features – implemented and planned – to support this include:

  • Implemented – Word-granular timing and visualization (plus a support tool).
  • Implemented – Multiple generative visualizers reacting to a range of audio frequencies.
  • Partially implemented – Artists’ compositional choices visualized as part of the visual output.
  • Planned – Audio stem separation to better visualize distinct parts of a song.

Two additional unique features of PlanMixPlay – Variable immediacy and remote audience feedback integration – are detailed below.

Variable Immediacy

Variable immediacy is the ability to separate action from reaction, i.e. to extend the time between an event and the reaction it causes. In a performance context, this connection is usually immediate: a DJ scratches a record and an audible record-scratch noise is emitted via the attached speakers; a drummer strikes a drum and the sound is immediately audible from the drum itself.

A closely connected term is liveness, perhaps most deeply explored by Auslander. Liveness, in a performance context, often refers to performances with a short (fast) action-to-reaction delay that utilize instruments maximizing expressiveness. Expressiveness is critical: without it, one could use the definition to claim that a CD player is the epitome of liveness.

Sergi Jordà contributes a concrete formula defining the efficiency of a musical instrument as follows:

(1)   \begin{equation*} MusicInstrEffic_{correct} = \frac{MusicOutputComplexity \times PerformerFreedom}{ControlInputComplexity} \end{equation*}


(2)   \begin{equation*} PerformerFreedom = PerfFreedMovement \times PerfFreedChoice \end{equation*}

In very brief terms, his formula states that a musical instrument’s efficiency is maximized when its output complexity and the performer’s freedom are maximized while its control input complexity is minimized. In other words, an efficient musical instrument allows great flexibility in both output and performer freedom, yet makes it as easy as possible for the performer to supply this input.
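As a toy illustration – using hypothetical, unit-less scores of my own choosing, not values from Jordà – compare a CD player with a guitar:

\begin{equation*} Effic_{CD} = \frac{10 \times (0.1 \times 5)}{1} = 5 \qquad Effic_{guitar} = \frac{8 \times (9 \times 9)}{7} \approx 93 \end{equation*}

Despite the CD player’s trivially low input complexity, its near-zero freedom of movement collapses the numerator, while the guitar’s vastly greater performer freedom dominates – matching the intuition that a CD player is not the epitome of liveness.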

With all that out of the way, we can finally return to variable immediacy. It enables performers to separate themselves from liveness to a point they are comfortable with. For example, imagine that you’re a novice guitar player. Strumming a single note is difficult at first, but eventually possible to complete reliably and in a timely fashion. As the number of notes to hit grows, and their complexity increases (chords), playing the guitar becomes far more difficult. Learning to do this expediently and accurately is part and parcel of learning to play a guitar. But what if you could separate yourself from time? What if you could (seamlessly) move into the future, play a few chords, and return to right now to continue playing? That is the core of variable immediacy: separating performers from time, allowing them to build and perform more complex compositions.

Remote audience integration

The majority of live performance instruments do not integrate audience feedback, partly because the most prolific and popular live performance instruments cemented their dominance at a time when this wasn’t feasible. That integrating audience feedback hasn’t seen much traction in modern times can be attributed to several reasons:

  • Many live performance instruments offer no straightforward way to do it. How would a guitar, drum, or turntable display any kind of feedback?
  • Not all artists perform remotely. In local performances, there is little need for instruments to support audience feedback; the artists can easily observe and hear the live audience directly.
  • A questionable value proposition. Some artists would describe their relationship with the audience as a love/hate relationship. While general – mostly positive – feedback is appreciated, granular or negative feedback is less so. Some artists see little value in a tight feedback loop during a performance.

While the first two reasons are not debatable, the third one is. I believe the rising trend of remote broadcasting/streaming will continue, yet the landscape for remote audience integration has barely shifted in the past decade. Admittedly, it’s not an easy problem to solve: the bigger the audience, the more difficult it is to digitally condense a shared sentiment in real time. But if we look at popular contemporary solutions, such as a scrolling chat window, clearly there are improvements to be made. A scrolling chat scales incredibly poorly and leaves the performer with a limited impression of the audience’s sentiment.

I think there’s incredible potential in integrating audience feedback directly into a performance interface. There are – of course – many pitfalls to avoid, as well as careful considerations to make in terms of how to balance and best encourage interaction between the performer and the audience. But it’s a source of potential that I think is worth exhaustively exploring.


Numerous individuals were involved in both testing and providing helpful feedback as part of this project. I’d like to thank Phil Huey, Makoto Nakajima, David Roy, Ryan Modality, and Jan Rod for helping with early interviews, participating in later user studies, and taking part in the first private in-house party use of PlanMixPlay. Thank you also to Mark Jackson for participating in the latter event.