By Jeff Tapper
MPEG-DASH introduced open standards into HTTP streaming but did not include the client application as part of its standards. Online video player developers have been left to fend for themselves in determining how best to build the client-side applications that consume DASH content. This article discusses the current state of online video, delves into the DASH standard, explores the challenges of building a DASH player, and, finally, walks through the basics of implementing the open source Dash.js player.
The Current State of Online Video
Online video is a rapidly growing sector of Internet traffic, with studies showing it comprised over 50% of global internet traffic in 2011.[1] Unfortunately for everyone involved in the video delivery industry, there is a wide and disparate set of formats in which the content must be delivered in order to be available on all of the devices that will be consuming it.
Today, there are many different standards that are used for delivery, many of them proprietary to specific vendors. In order to make the same content available to computers, phones, tablets, gaming consoles, set-top boxes and other connected devices, it is often necessary to deliver the same content in five or more formats.
One convergence has happened in this landscape over the past few years, as more companies are moving away from persistent connection protocols towards HTTP streaming. While HTTP traffic often has more overhead and latency than persistent TCP connections, it also offers efficiencies, such as the wide variety of caching options available for HTTP traffic. Additionally, most firewalls will allow HTTP traffic but are more likely to block persistent connections.
HTTP streaming takes the source content and segments it into discrete chunks of data. These individual chunks can then be served from any web server, delivered over HTTP, and handed individually to the video player at the client side. The client needs to know how to reassemble these segments into a video stream.
To capitalize on the desire for HTTP streaming, many different companies developed their own proprietary solutions. Among these are HTTP Live Streaming (HLS) from Apple, Microsoft Smooth Streaming (MSS) and HTTP Dynamic Streaming (HDS) from Adobe. Each of these implementations uses a different way to describe the content (the manifest files) and a different format for its segments. Therefore, each client needed to implement specific logic to support any of these formats. There has been a recent effort in the industry to add support for these protocols to other companies’ devices (for instance, Microsoft released an MSS plugin for Adobe’s Open Source Media Framework (OSMF), which allows an MSS stream to play in Adobe’s Flash Player[2]), but the landscape remains such that no one standard is broadly supported across most devices.
In April 2009, the Moving Picture Experts Group (MPEG) issued a call for proposals for an HTTP streaming standard. After receiving fifteen different proposals, MPEG collaborated with industry experts and other related standards groups, including the Third Generation Partnership Project (3GPP), to develop the Dynamic Adaptive Streaming over HTTP (DASH) standard known as MPEG-DASH.[3] The scope of the MPEG-DASH standard was limited to the Media Presentation Description (MPD) format of the manifest files as well as the segmentation standards for the server.
The MPEG-DASH specification was a huge step forward, but it was extremely broad in its reach. In order to remain relevant in future years, the standard deliberately does not endorse any specific codecs and allows for inclusion or exclusion of most elements in the MPD file.
In order to help speed adoption of MPEG-DASH, the DASH Industry Forum[4] proposed a reduced specification, which was limited to a single codec (AVC/H.264) and narrowed the options available within the MPD. This specification is known as DASH-264.[5] As part of the DASH Industry Forum, Digital Primates was charged with building the DASH-264 reference client.
How to Play a DASH Stream
In order to play an MPEG-DASH stream, there are several steps the client software must undertake.
- The process always starts with a request for the manifest (MPD) file. The manifest is provided as an XML document, which must be parsed.
- Once parsed, the client needs to determine what segments are available and where to find them. The manifest provides a wealth of additional information, including which codecs are used for the audio and video segments, how much content should be buffered at the client and the overall length of the content. With each file downloaded, the client needs to compare the size of the file against the time it took to download. This ratio is the bits per second (BPS) of each download, and it can be used as part of the logic to determine which bitrate the client can support.
- Once the client knows where to find the segments and has an initial estimate of the bitrate the client can support, a choice is made as to which representation of the content should be used. (Each discrete bitrate of the content is supplied as a separate representation.) The initialization segments are requested from the server for the appropriate representation and the download begins. The first few segments of content are also requested from the server for the selected representation.
- Before a client can play any content, the player must be primed with the initialization segments. Since different representations may be encoded with different profiles, the initialization segment prepares the player for the encoding of the content that follows. Although the DASH specification allows the media segments to contain the initialization data, making them self-initializing segments, no DASH stream we have encountered to date is set up this way; all of them require external initialization.
- DASH segmentation provides for independent audio and video segments of the content. Depending on the platform the client is built on, it may be necessary to first MUX the audio and video together. Other platforms allow for content to be buffered to the video player as discrete audio and video segments. If MUXing is necessary, the downloaded content is MUXed together once the audio and video segments with corresponding timestamps have been downloaded. The content is then handed to the buffer of the video player to begin playback.
- As the stream plays, the client needs to monitor the player and record metrics on the player’s performance, including the ratio of buffer filling to emptying, the number of frames dropped, the actual framerate of the playing content, and others. This information, combined with the BPS measured on each download, is used to determine if the current bitrate is appropriate to continue for the client or if the client should request a higher or lower quality stream.
- As long as there is more content to play, the client continues to request segments, with the ability to switch representations at each request. This provides for a seamless adaptive stream for the end user.
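The throughput measurement and representation choice described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the logic of any particular player: the function names, the sample representations and the 0.8 safety factor are assumptions made for the example. Only the idea of comparing bytes downloaded against download time, and the bandwidth value carried by each representation, come from the text.

```javascript
// Hypothetical helper: estimate throughput (bits per second) from one
// finished download, given its size in bytes and its duration in ms.
function measureBitsPerSecond(bytesDownloaded, downloadMillis) {
  if (downloadMillis <= 0) {
    throw new RangeError("downloadMillis must be positive");
  }
  return (bytesDownloaded * 8) / (downloadMillis / 1000);
}

// Hypothetical helper: pick the highest-bandwidth representation that fits
// within a safety fraction of the measured throughput. If none fits, fall
// back to the lowest-bandwidth representation.
function chooseRepresentation(representations, measuredBps, safetyFactor = 0.8) {
  const sorted = [...representations].sort((a, b) => a.bandwidth - b.bandwidth);
  let choice = sorted[0];
  for (const rep of sorted) {
    if (rep.bandwidth <= measuredBps * safetyFactor) {
      choice = rep;
    }
  }
  return choice;
}

// Made-up representations, as they might look after parsing an MPD.
const sampleRepresentations = [
  { id: "video-low", bandwidth: 500000 },
  { id: "video-mid", bandwidth: 1500000 },
  { id: "video-high", bandwidth: 4000000 },
];

// A 1,000,000-byte segment that took 2 seconds to download.
const measuredBps = measureBitsPerSecond(1000000, 2000);
const selected = chooseRepresentation(sampleRepresentations, measuredBps);
console.log(measuredBps, selected.id);
```

A real client would typically smooth the measurement over several downloads and combine it with the playback metrics before switching, rather than reacting to a single segment.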
Considerations for Playing a Live Stream
In many ways, playing a live stream is remarkably similar to playing a video on demand (VOD) stream. However, there are a few key differences that must be understood when building a client application for a live stream. From a playback perspective, a user will start watching a live stream from the most recent point (the live point) of the stream, whereas a user watching a VOD stream will start at the beginning of the content.
One of the biggest challenges in building a client application for live is actually determining what the most recent available segment is. As content is constantly being encoded and segmented, the most recent segment is constantly changing. Other HTTP streaming technologies solve this problem by constantly updating the manifest and forcing the client to refresh the manifest frequently. DASH takes a different approach in that a live stream can be done without necessitating a refresh to the manifest. There is a value in the MPD file that indicates to the client how often the client should request a new version of the manifest. In many cases, this value is measured in hours or days.
The unfortunate reality is that the DASH specification does not specify whether or not a stream is live. To handle this, the player needs to be built to allow implementers to externally indicate if the stream should be treated as a live stream or not.
At its simplest, the Live Point of a stream is computed by subtracting the availabilityStartTime attribute of the MPD from the current time (NOW – MPD.@availabilityStartTime). There are a number of other values in the manifest which can affect this.
In reality, it gets more complex, in that the manifest may separately specify a suggestedPresentationDelay and a minBufferTime. So, to take those into account, the formula becomes NOW – availabilityStartTime – suggestedPresentationDelay – minBufferTime. The result must be checked to make sure it is greater than NOW – timeShiftBufferDepth, or the client will end up looking for a segment which is not available. Once the live edge is calculated, a check needs to be made against the manifest to find the closest segment which starts before that live edge time.
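As a rough sketch of the calculation above, the following assumes the MPD has already been parsed into a plain object whose duration attributes have been converted to seconds, and that the segment list is sorted by start time. The function names and the object shapes are hypothetical. The sketch also expresses both the live edge and the time-shift window on one timeline (seconds since availabilityStartTime), which is one interpretation of the check described in the text.

```javascript
// Hypothetical helper: compute the live edge in seconds since
// availabilityStartTime, following the formula in the text.
function computeLiveEdgeSeconds(mpd, nowMillis) {
  const elapsed = (nowMillis - mpd.availabilityStartTime.getTime()) / 1000;
  const target = elapsed - mpd.suggestedPresentationDelay - mpd.minBufferTime;
  // Earliest time still inside the time-shift window.
  const windowStart = Math.max(0, elapsed - mpd.timeShiftBufferDepth);
  // Keep the result inside [windowStart, elapsed].
  return Math.min(elapsed, Math.max(target, windowStart));
}

// Hypothetical helper: find the last segment that starts at or before
// the given time. Segments are assumed sorted by startTime (seconds).
function findSegmentForTime(segments, timeSeconds) {
  let match = null;
  for (const segment of segments) {
    if (segment.startTime <= timeSeconds) {
      match = segment;
    } else {
      break;
    }
  }
  return match;
}

// Made-up values: a stream that became available 100 seconds ago.
const startMillis = Date.UTC(2013, 0, 1, 0, 0, 0);
const liveMpd = {
  availabilityStartTime: new Date(startMillis),
  suggestedPresentationDelay: 10,
  minBufferTime: 4,
  timeShiftBufferDepth: 60,
};
// Fifty 2-second segments, numbered from 1.
const liveSegments = Array.from({ length: 50 }, (_, i) => ({
  number: i + 1,
  startTime: i * 2,
}));

const liveEdge = computeLiveEdgeSeconds(liveMpd, startMillis + 100000);
const liveSegment = findSegmentForTime(liveSegments, liveEdge);
console.log(liveEdge, liveSegment.number);
```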
Another thing to note for live playback is whether or not the content allows for digital video recorder (DVR) functionality, which would allow the user to pause the live stream and continue where they left off, or seek through the available content prior to the live point (DVR window). If the manifest indicates that DVR functionality is supported, the client must provide the necessary user interface to allow the end user to interact with the content.
One challenge with the DVR functionality is that the specification does not provide any definitive way to indicate if this functionality is available for a stream. To date, we have used the timeShiftBufferDepth value from the manifest for this. The value of timeShiftBufferDepth indicates how much time is guaranteed to exist “before” the live point. This, in theory, is the “DVR window”, as it’s the amount of time that’s guaranteed to exist in which the user may seek.
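One way a client might turn timeShiftBufferDepth into a seekable range is sketched below. The helper names and the parsed-MPD object shape are assumptions for illustration, with durations already converted to seconds and times expressed as seconds since availabilityStartTime.

```javascript
// Hypothetical helper: derive the DVR window from a parsed MPD.
// Returns null when no positive timeShiftBufferDepth is present, which
// this sketch treats as "no DVR functionality".
function getDvrWindow(mpd, nowMillis) {
  const depth = mpd.timeShiftBufferDepth;
  if (typeof depth !== "number" || !(depth > 0)) {
    return null;
  }
  const elapsed = (nowMillis - mpd.availabilityStartTime.getTime()) / 1000;
  return { start: Math.max(0, elapsed - depth), end: elapsed };
}

// Hypothetical helper: keep a requested seek target inside the window.
function clampSeek(dvrWindow, requestedSeconds) {
  return Math.min(dvrWindow.end, Math.max(dvrWindow.start, requestedSeconds));
}

// Made-up values: 100 seconds into a stream with a 60-second window.
const dvrStartMillis = Date.UTC(2013, 0, 1, 0, 0, 0);
const dvrMpd = {
  availabilityStartTime: new Date(dvrStartMillis),
  timeShiftBufferDepth: 60,
};
const dvrWindow = getDvrWindow(dvrMpd, dvrStartMillis + 100000);
console.log(dvrWindow, clampSeek(dvrWindow, 10), clampSeek(dvrWindow, 75));
```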
Building the Dash.js Client
When we were chosen to build a reference client for the DASH Industry Forum, there was discussion about the platform on which the client should be built. To allow for as many people to be able to use it as possible, a consensus was reached that it should be a desktop platform available across multiple operating systems. Eventually it was decided that the reference client would be built to run in web browsers that support the MediaSource Extensions to the W3C specification for browsers, leading to the inception of the DASH.js project.
DASH.js was designed to be a freely available open source application that could serve the needs of a reference client for the DASH Industry Forum as well as be a usable sample client that could be extended and deployed across the web. As such, the architecture had a few key goals.
- Provide a lab quality reference client that offers a wealth of diagnostic information.
- Be able to accurately play any stream that is compliant with the DASH-264 specification.
- Provide an extensible framework so other participants can change / enhance the client as their needs dictate.
- Provide the client under a permissive open source license to allow for it to be freely used and modified.
- Provide a framework to accept submissions back into the project from participants wishing to contribute.
The DASH.js project was established on GitHub in the second half of 2012. All of the code is submitted under the 3-clause BSD license.
The code for the project is divided into two main packages: streaming, which contains all the core classes for the player, and the dash package, which contains the classes specific to the DASH specification.
MediaPlayer is the core class that implementers are most likely to use. It contains core methods such as play and pause. This class is also passed the URL to the MPD file, as well as a reference to the HTML video tag in which the content will be played.
The context class provides a mapping for the Dependency Injection framework. It specifies what concrete classes should be used in place of specific classes listed. The use of Dependency Injection in this application allows for easier unit tests, as individual pieces of the application can be tested in isolation.
Injected into the MediaPlayer, the Stream class interacts with the ManifestLoader which is responsible for loading/refreshing the Manifest, listening to the HTML Video element for events and creation of
Responsible for all interactions with the
BufferController feeds segments to the SourceBuffer, and checks the length of the buffer as long as the content is playing. The BufferController also captures many metrics used in the
ManifestLoader handles the actual loading of the manifest, and returns parsed objects from it.
FragmentLoader is responsible for loading the requested fragments sequentially.
ABRController uses the available metrics to determine if the current rendition being played should be changed. The metrics are applied against a set of rules to make the decision of whether or not to change the bitrate.
The rules package contains the current rules available within the dash.js player. Current rules include the InsufficientBufferRule and others. Rules that implementers want used in their player are added to the BaseRules collection. Rules are envisioned as one of the key extension points that contributors will be able to add to as their needs arise.
Used by the
Parser handles actual parsing of the MPD into JSON objects to be used by the system.
Injected into the
DashHandler is responsible for deciding which segment should be loaded next. The getSegmentRequestForTime() method takes a time and a quality, and returns a URL for the appropriate segment. The other most commonly used method is getNextSegment(), which returns the URL for the next segment in sequence of the same quality.
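The behaviour of those two methods can be illustrated with a simplified stand-in. The class below is not the DashHandler implementation: it assumes fixed-duration segments addressed through a URL template using the $RepresentationID$ and $Number$ identifiers, and the class name, constructor options and example.com URL are hypothetical.

```javascript
// Simplified stand-in for a segment handler, assuming fixed-duration
// segments and a URL template with $RepresentationID$ and $Number$.
class SketchSegmentHandler {
  constructor({ baseUrl, template, segmentDuration, startNumber = 1 }) {
    this.baseUrl = baseUrl;
    this.template = template;
    this.segmentDuration = segmentDuration; // seconds
    this.startNumber = startNumber;
    this.lastIndex = -1;
    this.lastRepresentationId = null;
  }

  // Fills the template for a zero-based segment index.
  buildUrl(index, representationId) {
    const number = String(this.startNumber + index);
    return this.baseUrl + this.template
      .replace("$RepresentationID$", () => representationId)
      .replace("$Number$", () => number);
  }

  // Returns the URL of the segment containing `time` (seconds) for the
  // given quality, and remembers the position for getNextSegment().
  getSegmentRequestForTime(time, representationId) {
    this.lastIndex = Math.floor(time / this.segmentDuration);
    this.lastRepresentationId = representationId;
    return this.buildUrl(this.lastIndex, representationId);
  }

  // Returns the URL of the following segment at the same quality.
  getNextSegment() {
    if (this.lastIndex < 0) {
      throw new Error("call getSegmentRequestForTime() first");
    }
    this.lastIndex += 1;
    return this.buildUrl(this.lastIndex, this.lastRepresentationId);
  }
}

const handler = new SketchSegmentHandler({
  baseUrl: "https://example.com/media/",
  template: "video_$RepresentationID$_$Number$.m4s",
  segmentDuration: 2,
});
const firstUrl = handler.getSegmentRequestForTime(7.5, "720p");
const nextUrl = handler.getNextSegment();
console.log(firstUrl, nextUrl);
```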
Putting it Together
- Create an instance of the MediaPlayer class, and pass an instance of the DASHContext class to its constructor.
- Call the startUp() method of the MediaPlayer class. This will ensure the necessary dependencies are injected into the class.
- Call the isLive() method of the MediaPlayer class to indicate whether the stream to be played is a live or VOD stream.
- Call the attachSource() method to pass the URL of the MPD file to be used.
- Set the autoPlay property to indicate if the stream should start playing as soon as the manifest is parsed, or if it should wait for the user to click a play button.
- Call the attachView() method of the class to indicate the HTML element in which the content will be played.
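The set-up steps above might look roughly like this in code. The method and property names are taken from the description in this article, and the assumption that isLive() accepts a boolean is ours; exact names, casing and namespaces in a given release of the library may differ, so check the project documentation. To keep the sketch runnable without the library, the classes are passed in as parameters and exercised with stand-ins that only record calls. The MPD URL is made up.

```javascript
// Sketch of the set-up sequence. MediaPlayer and DASHContext are passed
// in so that real classes or stand-ins can be used interchangeably.
function setUpPlayer({ MediaPlayer, DASHContext }, videoElement, mpdUrl, live) {
  const player = new MediaPlayer(new DASHContext());
  player.startUp();             // inject dependencies
  player.isLive(live);          // assumption: takes a boolean
  player.attachSource(mpdUrl);  // URL of the MPD file
  player.autoPlay = true;       // start once the manifest is parsed
  player.attachView(videoElement);
  return player;
}

// Stand-in classes that only record calls. They are NOT the library.
const recordedCalls = [];
class StubContext {}
class StubPlayer {
  constructor(context) {
    this.context = context;
    recordedCalls.push("constructor");
  }
  startUp() { recordedCalls.push("startUp"); }
  isLive(value) { recordedCalls.push("isLive:" + value); }
  attachSource(url) { recordedCalls.push("attachSource:" + url); }
  attachView(element) { recordedCalls.push("attachView"); }
}

// In a browser, videoElement would come from document.querySelector("video").
const fakeVideoElement = {};
const stubPlayer = setUpPlayer(
  { MediaPlayer: StubPlayer, DASHContext: StubContext },
  fakeVideoElement,
  "https://example.com/live.mpd",
  true
);
console.log(recordedCalls.join(" -> "));
```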
If autoPlay is false, playback begins when the play() method is called. This starts the internal process to play the stream.
- A Stream object is created and passed the MPD URL.
- The Manifest is loaded and parsed.
- BufferControllers are created. One BufferController is created for audio and another for video.
- Duration of the MediaSource is set using the value from the manifest. Infinity is set for a live stream.
- If stream is live, calculate segment for live edge.
- Play is called on the HTML video element.
- BufferManager checks duration of video still in buffer.
- ABRController checks if bitrate switch is necessary before next segment is loaded.
- The DashHandler is used to get the URL for the next segment.
- FragmentLoader uses URL to request next fragment.
- Repeat until done.
While playing video with DASH is more complex than a simple progressive download, the DASH.js player simplifies the process. With just a few lines of code, it is possible to have a fully functional adaptive bitrate player up and running.
1. Cisco Visual Networking Index: Forecast and Methodology, 2011-2016 https://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360_ns827_Networking_Solutions_White_Paper.html
4. DASH Industry Forum https://www.dashif.org