12 Tone Analysis (Music Analysis Technology)

There is a growing need for high-quality musical metadata (data characteristics) to support new ways of enjoying music, including advanced music search and recommendation. Conventional manual metadata assignment is costly and can lead to other problems, such as data inconsistency.
Sony has developed a unique 12 Tone Analysis system that automatically extracts a variety of metadata, including beat, chord progression, song structure, genre, instruments and mood, by using signal processing and statistical processing to analyze musical waveforms. This technology has been used in Sony's GIGA JUKE and Rolly, and also in VAIO software.
Figure 1: 12 Tone Analysis

Applications Based on 12 Tone Analysis

With 12 tone analysis, metadata can be applied to all songs automatically. The following are some examples of the types of applications that can be implemented using automatically extracted metadata.

  • Searching for songs with specific characteristics (fast, bright, etc.)

  • Searching for songs with similar metadata to find songs similar to one's favorites

  • Continuous playback of just the chorus (main part of the song) sections of multiple songs

  • Automatic creation of slideshows, etc., based on the mood of songs

  • Automatic classification of radio shows into music and talk

How 12 Tone Analysis Works

With 12 tone analysis, music is analyzed through the following processes.

Time-Tone Analysis
The 12 tone analysis process begins with a two-dimensional analysis of the song based on time and tone. There are 12 tones per octave (the semitones that contain the familiar do-re-mi scale). Performing this analysis first makes it easier to extract the information that subsequent processes need, including the timing and strength of sound onsets, timbre, and chord structures.

The filters developed for 12 tone analysis allow rapid high-precision analysis from bass to treble.
Figure 2: Example of Time-Tone Analysis
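As a rough illustration of the time-tone idea, the following Python sketch measures the energy at each semitone for successive frames of a waveform, then folds the octaves together into 12 tone values. It is not Sony's filter design, which this page does not describe: the Goertzel recurrence, the frame size, the note range, and all function names are assumptions chosen to keep the example short.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def tone_frequency(midi_note):
    """Frequency in Hz of a MIDI note number (equal temperament, A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def goertzel_power(frame, freq, sample_rate):
    """Signal power near `freq` within one frame, using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def time_tone_map(samples, sample_rate, frame_size=2048, low_note=48, high_note=83):
    """Return one row per frame; each row holds the power at every semitone."""
    notes = range(low_note, high_note + 1)
    rows = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rows.append([goertzel_power(frame, tone_frequency(n), sample_rate)
                     for n in notes])
    return rows

def fold_to_12_tones(row, low_note=48):
    """Sum the octaves so that each of the 12 tones gets a single energy value."""
    chroma = [0.0] * 12
    for i, power in enumerate(row):
        chroma[(low_note + i) % 12] += power
    return chroma

# Usage: one second of a 440 Hz sine wave should land on the tone A.
rate = 8000
wave = [math.sin(2.0 * math.pi * 440.0 * n / rate) for n in range(rate)]
rows = time_tone_map(wave, rate)
chroma = fold_to_12_tones(rows[0])
print(NOTE_NAMES[chroma.index(max(chroma))])  # prints A
```

Each row of `rows` corresponds to one column of a time-tone image like the one in Figure 2. A production system would likely use overlapping, windowed frames and filters tuned separately for bass and treble.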


Analysis Based on Musical Theory
Using the two-dimensional image obtained through this analysis, a variety of signal processing and detection steps are then carried out to detect features based on musical theory: beat elements (tempo, rhythm, and bars), chord progression, key, and song structure.
Figure 3: Example of Automatic Detection of Chord Progression

Figure 4: A Song Structure


Previously, element technologies such as chord detection and song structure detection were treated separately. With 12 tone analysis, all detection processes are integrated, so each estimate can draw on the results of the other detectors. This makes detection considerably more accurate.
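One common way to detect chords from 12 tone energies is template matching, and the sketch below also shows how a second detection result (an estimated key) can feed back into the choice of chord. It illustrates the integration idea only: the templates, the cosine score, the `key_bonus` value, and the function names are assumptions, not the method Sony uses.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
CHORD_INTERVALS = {"": (0, 4, 7), "m": (0, 3, 7)}  # major and minor triads

def chord_templates():
    """Build a 12-element 0/1 template for each of the 24 major/minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in CHORD_INTERVALS.items():
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = template
    return templates

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has no energy."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def detect_chord(chroma, key_notes=None, key_bonus=0.1):
    """Pick the best-matching triad for one 12-tone energy vector.

    If `key_notes` (a set of tone indices from a separate key detector) is
    given, chords built entirely from notes of that key get a small bonus.
    """
    best_name, best_score = None, float("-inf")
    for name, template in chord_templates().items():
        score = cosine(chroma, template)
        tones = {i for i, value in enumerate(template) if value}
        if key_notes is not None and tones <= key_notes:
            score += key_bonus
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Usage: energy on C, E and G is a C major triad.
print(detect_chord([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]))  # prints C

# With only C and G sounding, C and Cm match equally well; an estimated
# key of C minor tips the decision toward Cm.
c_minor_key = {0, 2, 3, 5, 7, 8, 10}
print(detect_chord([1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], c_minor_key))  # prints Cm
```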

Feature Extraction
The results of time-tone analysis and analyses based on musical theory are next used to extract features that can be used to classify songs. The 12 tone analysis system brought to market by Sony uses several dozen highly independent features to support the classification of music from various perspectives.

Metadata Estimation
Finally, the features obtained through these musical analysis processes are used to estimate metadata, examples of which are listed below. The resulting metadata can be used for song searching and other purposes.

Examples of Metadata

  • Perceived speed: The speed of the music as perceived by the human ear. This is distinct from tempo, since the perceived speeds of songs may differ because of rhythm patterns and other factors, even if the tempo is identical.

  • Perceived energy: The energy of the music as perceived by the human ear. A quiet song will seem to have less energy, while a bright and lively song will seem more energetic.

  • Genre: Whether or not the song fits a particular genre, such as rock, jazz or classical.

  • Instrumental sound: Whether or not the music includes particular instruments, such as piano, bass or guitar.

  • Mood: Whether or not the song fits particular mood keywords, such as "bright" or "refined."

With 12 tone analysis, the large body of statistical data collected for each of the several dozen metadata items is subjected to statistical analysis and machine learning, resulting in highly accurate metadata estimation.
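The page does not say which learning method is used, so the following is only a minimal sketch of estimating one yes/no metadata item (whether a song fits the mood keyword "bright") from a feature vector. It assumes a nearest-centroid classifier, two made-up features, and invented training values; `train` and `fits_keyword` are hypothetical names.

```python
import math

def centroid(vectors):
    """Average a list of equal-length feature vectors, column by column."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(examples):
    """`examples` maps True/False (fits the keyword or not) to feature vectors."""
    return {fits: centroid(vectors) for fits, vectors in examples.items()}

def fits_keyword(model, features):
    """Return True if the features lie closer to the 'fits' centroid."""
    return math.dist(features, model[True]) < math.dist(features, model[False])

# Hypothetical training data: [perceived speed, perceived energy], both 0..1.
bright_examples = {
    True: [[0.9, 0.8], [0.8, 0.9]],
    False: [[0.2, 0.1], [0.1, 0.3]],
}
model = train(bright_examples)
print(fits_keyword(model, [0.7, 0.9]))  # prints True
print(fits_keyword(model, [0.3, 0.2]))  # prints False
```

A real system would train one such estimator per metadata item on far more labeled songs and on several dozen features rather than two.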

Classification of Music and Talk

Some musical analysis technologies can be used to distinguish between music and talk with considerable accuracy. A music/talk recognition system based on 12 tone analysis would extract basic features from the results of time-tone analyses of radio programs. After learning from actual radio broadcasts, the system would then be able to label recordings as music or talk at specific intervals.

Because 12 tone analysis classifies music using a large number of features that have been optimized for music, it is able to provide extremely accurate sorting of music that has been prone to recognition errors with earlier systems, such as rap and music with wide variation in volume.

Finding Similar Songs

When earlier systems are used to search for similar songs on the basis of metadata distance, the resulting songs are not always similar. This is because metadata distance does not necessarily reflect similarities in music as perceived by human beings.

With 12 tone analysis, the extracted features are first converted into a form that closely reflects perceived similarity, using experimental data on how similar human listeners actually judge songs to be. Song similarity is then estimated using the converted features. When similar songs are detected in this way, the results more closely match human perception.
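The effect of converting features before measuring distance can be sketched as follows. This example assumes the simplest possible conversion, a per-feature weight, and the weight values are placeholders; in practice they would be fitted to human similarity ratings, and the actual conversion used by 12 tone analysis is not described here.

```python
import math

def convert(features, weights):
    """Rescale each feature by how much it matters to perceived similarity."""
    return [w * f for w, f in zip(weights, features)]

def most_similar(query, library, weights, top_n=3):
    """Rank songs in `library` by distance to `query` in the converted space."""
    q = convert(query, weights)
    ranked = sorted(
        library,
        key=lambda name: math.dist(q, convert(library[name], weights)),
    )
    return ranked[:top_n]

# Hypothetical two-feature vectors for a query song and two candidates.
query = [0.5, 0.5]
library = {"song_a": [0.5, 0.9], "song_b": [0.7, 0.5]}

# Plain distance on the raw features ranks song_b first.
print(most_similar(query, library, [1.0, 1.0]))  # ['song_b', 'song_a']

# If listeners turn out to be far more sensitive to the first feature,
# the converted features rank song_a first instead.
print(most_similar(query, library, [3.0, 0.2]))  # ['song_a', 'song_b']
```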

The Future of the Technology

Sony will continue to develop technology to improve the accuracy of automatic metadata assignment. Another goal is the proposal of new applications based on metadata.




Copyright 2012 Sony Corporation