Reading MP3 Tag Information with Python

2023-08-08 23:19:22

What Is ID3

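MP3 is the most popular audio file format; its full name is MPEG-1 Audio Layer III. The format itself, however, has no way to store descriptive information about the audio content, such as the song title, performer, or album.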

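To fill that gap, Eric Kemp proposed ID3 in 1996 as part of his Studio3 project. ID3 stands for "Identify an MP3". The idea is to append a small block of data to the end of the audio file, holding information such as the song title, artist, and album. To make the block easy to detect, its length is fixed at 128 bytes. This version is known as ID3v1.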

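In 1997 Michael Mutschler made a small adjustment to the format: the Comment field was shortened from 30 to 28 bytes, and the two freed bytes hold a zero separator and a track number. This version is known as ID3v1.1.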

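In 1998, led by Martin Nilsson and Michael Mutschler, a group of contributors started work on ID3v2. Its structure is completely different from ID3v1: the data is no longer fixed-length, it moves from the end of the file to the beginning, and Unicode support is introduced. The first version of ID3v2 was ID3v2.2, and ID3v2.4 was released in 2000.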

ID3v1

The ID3v1 tag is appended after the audio data and is 128 bytes long. Each text field holds at most 30 characters.

Field layout:

Song Title	30 characters
Artist	30 characters
Album	30 characters
Year	4 characters
Comment	30 characters
Genre	1 byte

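The fields are always preceded by the three characters TAG, which together with the fields above adds up to exactly 128 bytes (3 + 30 + 30 + 30 + 4 + 30 + 1). If a field such as Artist holds fewer than 30 characters, the remainder is padded with zero bytes.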
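The fixed layout above maps directly onto a `struct` format string. The following is a minimal sketch, not the script from the repository: the names `ID3V1_LAYOUT` and `unpack_id3v1` are made up for illustration, and the text fields are deliberately left as raw bytes because ID3v1 does not say which encoding they use.

```python
import struct

# Fixed ID3v1 layout: 3-byte "TAG" marker, three 30-byte text fields,
# a 4-byte year, a 30-byte comment and a 1-byte genre index.
ID3V1_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")


def unpack_id3v1(block):
    """Split a 128-byte ID3v1 block into raw fields, or return None."""
    if len(block) != ID3V1_LAYOUT.size:
        return None
    marker, title, artist, album, year, comment, genre = ID3V1_LAYOUT.unpack(block)
    if marker != b"TAG":
        return None
    raw = {"title": title, "artist": artist, "album": album,
           "year": year, "comment": comment}
    # Drop the zero padding; the text bytes are left undecoded on purpose.
    fields = {name: value.rstrip(b"\x00") for name, value in raw.items()}
    fields["genre"] = genre
    return fields


# Build a synthetic tag so the sketch runs without an MP3 file.
# struct pads each short "30s" value with zero bytes, just like ID3v1 does.
sample = ID3V1_LAYOUT.pack(b"TAG", b"Song", b"Artist", b"Album", b"1999", b"", 17)
print(ID3V1_LAYOUT.size, unpack_id3v1(sample))
```

With a real file, the block would be the last 128 bytes, read after `f.seek(-128, 2)`.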

ID3v2

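ID3v2 is a block of data placed in front of the audio data. Each individual item (the song title, for example) is called a frame. Frames can hold any kind of data; a single frame can be up to 16 MB and the whole tag up to 256 MB. Text can be stored as Unicode, which avoids the garbled-text problem.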

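Placing the tag before the audio data has another advantage: when the file is streamed, the song information arrives first and can be shown to the user right away.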
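The 256 MB limit follows from how the ID3v2 header stores the tag size: four bytes with only the low 7 bits of each byte in use (a "synchsafe" integer), giving 28 bits, and 2^28 bytes is 256 MB. Here is a rough sketch of reading the 10-byte header under that assumption. The function names are illustrative and are not part of the script discussed in this post.

```python
def synchsafe_to_int(four_bytes):
    """Decode a synchsafe integer: 7 useful bits per byte, high bit always 0."""
    value = 0
    for byte in four_bytes:
        if byte & 0x80:
            raise ValueError("high bit set; not a synchsafe integer")
        value = (value << 7) | byte
    return value


def read_id3v2_header(header):
    """Parse the 10-byte ID3v2 header; return None if the marker is missing."""
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    return {
        "version": (header[3], header[4]),       # major version, revision
        "flags": header[5],
        "size": synchsafe_to_int(header[6:10]),  # tag size without the header
    }


# Synthetic header for an ID3v2.3.0 tag whose body is 257 bytes long.
print(read_id3v2_header(b"ID3\x03\x00\x00\x00\x00\x02\x01"))
```

With a real file, the header would be the first 10 bytes, and the audio data would start `size + 10` bytes in when no footer is present.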

Some of its features:

The ID3v2 tag is a container format, just like IFF or PNG files, allowing new frames (chunks) as evolution proceeds.
Residing in the beginning of the audio file makes it suitable for streaming.
Has an 'unsynchronization scheme' to prevent ID3v2-incompatible players to attempt to play the tag.
Maximum tag size is 256 megabytes and maximum frame size is 16 megabytes.
Byte conservative and with the capability to compress data it keeps the files small.
The tag supports Unicode.
Isn't entirely focused on musical audio, but also other types of audio.
Has several new text fields such as composer, conductor, media type, BPM, copyright message, etc. and the possibility to design your own as you see fit.
Can contain lyrics as well as music-synced lyrics (karaoke) in almost any language.
Is able to contain volume, balance, equalizer and reverb settings.
Could be linked to CD-databases such as CDDB and FreeDB.
Is able to contain images and just about any file you want to include.
Supports enciphered information, linked information and weblinks.

Reading ID3 Information with Python

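I wrote a Python script that reads ID3v1 information. Two issues came up in practice:
1. ID3v1 has no field recording the text encoding, so the same MP3 can show garbled text when played on different systems. I plan to write a separate article on how to detect the encoding.
2. iTunes appears to prefer the ID3v2 information when it is present.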
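To make the first issue concrete: the same bytes decode to different text depending on which encoding the player assumes. One simple workaround, sketched below using only the standard library, is to try a list of candidate encodings in order. The function name and the candidate list are assumptions for illustration. The order matters, because a wrong multi-byte codec can sometimes decode without raising an error, so this is a heuristic and not real detection.

```python
def decode_text(raw, encodings=("utf-8", "gbk", "big5")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never fails,
    # but the result may well be garbled.
    return raw.decode("latin-1"), "latin-1"


title_bytes = "中文".encode("gbk")     # what a GBK system would have written
print(decode_text(title_bytes))        # recovered via the gbk candidate
print(title_bytes.decode("latin-1"))   # what a latin-1 player would display
```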

I have put the script on GitHub; if you are interested, you can find it at https://github.com/cocowool/py-id3.

```python
# Read ID3v1 tag information
import chardet


def parse(fileObj, version='v1'):
    """Return the ID3v1 tag of a file opened in binary mode, or False."""
    fileObj.seek(0, 2)
    # An ID3v1 tag is exactly 128 bytes, so a shorter file cannot hold one
    if fileObj.tell() < 128:
        return False

    fileObj.seek(-128, 2)
    tag_data = fileObj.read()
    if tag_data[0:3] != b'TAG':
        return False

    return getTag(tag_data)


# Detect the encoding and decode
def decodeData(bin_seq):
    result = chardet.detect(bin_seq)
    # chardet reports encoding=None when it cannot make a guess
    if result['encoding'] and result['confidence'] > 0:
        try:
            return bin_seq.decode(result['encoding'])
        except (UnicodeDecodeError, LookupError):
            pass
    return 'Decode Failed'


# Get ID3v1 tag data
def getTag(tag_data):
    STRIP_CHARS = b"\x00"
    tags = {}

    tags['title'] = tag_data[3:33].strip(STRIP_CHARS)
    if tags['title']:
        tags['title'] = decodeData(tags['title'])

    tags['artist'] = tag_data[33:63].strip(STRIP_CHARS)
    if tags['artist']:
        tags['artist'] = decodeData(tags['artist'])

    tags['album'] = tag_data[63:93].strip(STRIP_CHARS)
    if tags['album']:
        tags['album'] = decodeData(tags['album'])

    # The year is left as raw bytes
    tags['year'] = tag_data[93:97].strip(STRIP_CHARS)

    tags['comment'] = tag_data[97:127].strip(STRIP_CHARS)
    # @TODO Need to analyze comment to verify v1 or v1.1
    if tags['comment']:
        tags['comment'] = decodeData(tags['comment'])

    # Genre is a one-byte index into the ID3v1 genre list
    tags['genre'] = ord(tag_data[127:128])

    return tags


# Set ID3v1 tag data
def setTag():
    pass
```
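The TODO in the script, telling ID3v1 apart from ID3v1.1, could be handled along the following lines. This is only a sketch with a hypothetical helper name. It relies on the ID3v1.1 convention that the 29th byte of the comment field is zero and the 30th byte is a non-zero track number, and it has to run on the raw 30-byte field (`tag_data[97:127]`) before any stripping.

```python
def split_comment(comment_field):
    """Split the raw 30-byte comment field into (comment, track)."""
    if len(comment_field) != 30:
        raise ValueError("expected the raw 30-byte comment field")
    if comment_field[28] == 0 and comment_field[29] != 0:
        # ID3v1.1: 28-byte comment, zero separator, 1-byte track number
        return comment_field[:28].rstrip(b"\x00"), comment_field[29]
    # Plain ID3v1: all 30 bytes belong to the comment, no track number
    return comment_field.rstrip(b"\x00"), None


v11_field = b"live take".ljust(28, b"\x00") + b"\x00\x05"
print(split_comment(v11_field))
```

A plain ID3v1 comment that happens to use all 30 bytes is returned unchanged with `None` as the track.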
