Last updated: 2019-04-18 16:02:13 +0100

Upstream URL: git clone



Contents of README follows

Music Management Scripts

For each artist directory in Music/Commercial/*, see if we have a cached albums file. If so, loop through the cached albums and see if anything inside the artist directory matches (using <code></code>). If not, report the album as missing.

For each artist directory in Music/Commercial/*, invoke <code>check_on_metalarchive</code>. If that fails, invoke <code></code>. If both fail, report the artist as not being found.

Look up the given artist on, by fetching the URL "". Cache the result, and if it's a 404 page, return an error code.

Search for the given artist on and cache the results. If no results are found, return an error code. If multiple results are found, and the artist name contains a country code (e.g. "Foo (Ger)"), narrow down the results to that country. If there still isn't a unique result, return an error code.

If a unique result is found, fetch and cache that artist's discography page from metal-archives, then return a success code.
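The narrowing step described above can be sketched as follows. This is only an illustration: the shape of a search result (a dict with "name" and "country" keys) and the function name `pick_unique` are assumptions, not taken from the scripts themselves.

```python
def pick_unique(results, country=None):
    """Return the single matching result, or None if there is no unique match.

    `results` is a list of dicts with hypothetical "name" and "country" keys.
    `country` is the canonical country name taken from the artist directory,
    or None if the directory name carried no country code.
    """
    # Only narrow by country when the search was ambiguous and we have a hint.
    if len(results) > 1 and country is not None:
        results = [r for r in results if r["country"] == country]
    # Zero results, or still more than one, both count as failure.
    return results[0] if len(results) == 1 else None
```

A caller would treat `None` as the "error code" case and otherwise go on to fetch the discography page.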

Python script to find duplicate files. For any stdin lines of the form "COMPARE", check if the filenames "foo" and "bar" have a suffix which appears to be audio ("ogg", "mp3", "aac", etc.); if so, and they both have the same suffix, then invoke <code>avconv</code> to get the CRC of their audio data. Cache the CRCs, and if they match, print out a command which can be used to move one into a "DUPES" directory.
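A rough sketch of the suffix check and CRC cache is given below. Several things here are assumptions: the input fields are taken to be tab-separated ("COMPARE", then the two filenames), the suffix list beyond "ogg", "mp3" and "aac" is a guess, and `crc_of` is a hypothetical callable standing in for the <code>avconv</code> invocation. The sketch returns the matching pairs rather than printing a move command.

```python
import os

# "ogg", "mp3" and "aac" come from the description; the rest are assumptions.
AUDIO_SUFFIXES = {"ogg", "mp3", "aac", "flac", "wav", "m4a", "opus"}


def suffix(path):
    """Lowercased filename extension without the leading dot."""
    return os.path.splitext(path)[1].lstrip(".").lower()


def comparable(a, b):
    """True if both names share the same audio-looking suffix."""
    return suffix(a) in AUDIO_SUFFIXES and suffix(a) == suffix(b)


def find_dupes(lines, crc_of, cache=None):
    """Return (a, b) pairs whose audio CRCs match.

    `crc_of` is a hypothetical function mapping a path to the CRC of its
    audio data; each path is looked up at most once thanks to `cache`.
    """
    cache = {} if cache is None else cache

    def cached(path):
        if path not in cache:
            cache[path] = crc_of(path)
        return cache[path]

    dupes = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[0] != "COMPARE":
            continue  # ignore anything that isn't a COMPARE line
        _, a, b = fields
        if comparable(a, b) and cached(a) == cached(b):
            dupes.append((a, b))
    return dupes
```

Passing the CRC function in as an argument keeps the sketch testable without any audio tools installed.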

Look for non-music files in the Music directory. For example, image files, text files and rubbish left behind by inferior operating systems.

Given an artist directory name, like "Foo (Ger)", echoes the artist name and canonical country name, delimited by tabs, e.g. "Foo"
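As an illustration of the splitting described above, here is a minimal sketch. The `COUNTRIES` table and the function names are hypothetical; the real script presumably knows many more codes, and its canonical names may differ.

```python
import re

# Hypothetical mapping from directory-name codes to canonical country names.
COUNTRIES = {"Ger": "Germany", "Swe": "Sweden", "Fin": "Finland"}


def split_artist(dirname):
    """Split "Foo (Ger)" into ("Foo", "Germany").

    Names without a recognised trailing code are returned unchanged,
    with None as the country.
    """
    match = re.fullmatch(r"(.*\S)\s*\((\w+)\)", dirname)
    if match and match.group(2) in COUNTRIES:
        return match.group(1), COUNTRIES[match.group(2)]
    return dirname, None


def tab_delimited(dirname):
    """Format the result as a single tab-delimited line."""
    name, country = split_artist(dirname)
    return name if country is None else name + "\t" + country
```

Only codes found in the table are treated as countries, so a directory like "Foo (Live)" is left alone rather than being split.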

Run the "fdupes" command to find duplicate files, once per artist directory.

Compare the names of artist directories using <code></code>, to try and find duplicates. False positives (e.g. "Master" and "Masterplan") can be written to a text file ".allowed_artist_dupes", and will be ignored on subsequent runs.

Run <code></code> on all directories found inside each artist directory. This is useful for finding duplicate album directories.

For each artist directory, list the files it contains and send through <code></code> to find possible duplicates. Send these through <code></code> to check which are actual duplicates.


Look for any file with a name which indicates it's a full album (i.e. one large audio file, when we'd prefer a directory of individual tracks).

Look through the Music/Commercial directory for files which we know should be in Music/Free (e.g. those from OCRemix, Newgrounds, etc.).

Given an artist name, an album name and a YouTube playlist URL, creates the appropriate directory (if needed) and extracts the audio from the playlist tracks using <code>youtube-dl</code>. <code>tag_album_dir</code> is invoked afterwards.

Uses TaskSpooler to queue up download and tagging processes, if available.

Uses <code></code> to compare each line of stdin with those which came before, looking for possible duplicate filenames.

Like <code></code>, but produces output in a format which is more easily parsed. Runs through stdin either forwards or backwards, with a 50/50 chance of each. This allows more duplicates to be spotted, without requiring any state and without going through the input twice, which could list both entries of a pair as duplicates. For example, if "foo1" and "foo2" appear in the input, we only want one to be flagged as a potential duplicate, since removing duplicates should always leave one copy in place.
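The single-pass, random-direction idea can be sketched like this. The similarity test is a stand-in: `difflib.SequenceMatcher` with an arbitrary threshold of 0.7 is an assumption, not the comparison tool the script actually uses, and the `rng` parameter exists only so the coin flip can be controlled.

```python
import random
from difflib import SequenceMatcher


def possible_dupes(lines, rng=random, threshold=0.7):
    """Flag each line that resembles one which came before it.

    The input is reversed with probability 0.5, so over several runs
    either member of a similar pair may be the one that gets flagged,
    but never both in the same run.
    """
    items = [line.rstrip("\n") for line in lines]
    if rng.random() < 0.5:
        items.reverse()

    flagged = []
    for i, item in enumerate(items):
        for earlier in items[:i]:
            # Stand-in similarity measure; the threshold is arbitrary.
            if SequenceMatcher(None, earlier, item).ratio() >= threshold:
                flagged.append((item, earlier))
                break  # flag each line at most once
    return flagged
```

Because each line is only compared with earlier ones, the first member of a pair is never reported, which is what guarantees one copy is always left in place.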

Loop through a bunch of music directories. For each, look for files which don't appear in Music/Commercial and move them over. Those files which appear in both are output in a format suitable for checking by <code></code>.

"Top-level" script, invoking a bunch of others. Doesn't invoke anything which requires Internet access, or anything which involves guesswork (e.g. duplicate finders)

Look for album directories which appear to have rubbish like "(Disc 1)" in their name, and output commands which will clean them up. This can greatly reduce duplicates.
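One way such a cleanup command might be generated is sketched below. The pattern (a trailing "(Disc N)" or "(CD N)" in round or square brackets) and the use of a plain `mv` are assumptions; the real script may recognise other forms of rubbish.

```python
import re
import shlex

# Assumed pattern: a trailing "(Disc 1)", "[CD 2]" and similar.
DISC = re.compile(r"\s*[\(\[]\s*(?:disc|cd)\s*\d+\s*[\)\]]\s*$", re.IGNORECASE)


def rename_command(album_dir):
    """Return a shell command renaming the directory, or None if it is clean."""
    cleaned = DISC.sub("", album_dir)
    if cleaned == album_dir or not cleaned:
        return None
    # shlex.quote keeps names with spaces and brackets safe for the shell.
    return "mv {} {}".format(shlex.quote(album_dir), shlex.quote(cleaned))
```

Emitting commands instead of renaming directly leaves room to review them first, for instance when the cleaned name already exists.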

Look for cruft in filenames from various music sources, for example and, and output commands which will strip this cruft.

Look for dodgy whitespace in filenames, for example double spaces. Output commands which will remove such dodginess.

Report empty files and directories.
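A minimal sketch of such a report, assuming "empty" means a zero-byte file or a directory with no entries at all (the function name `find_empty` is made up for this example):

```python
import os


def find_empty(root):
    """Return sorted paths of zero-byte files and entry-less directories."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory is empty if it holds neither files nor subdirectories.
        if not dirnames and not filenames:
            empty.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```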

Remove characters from the argument which aren't letters and make the result lowercase. This increases the chance of spotting duplicates, e.g. if they have different capitalisation or punctuation.
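For illustration, the same simplification in Python might look like this. Whether the real script counts non-ASCII letters as letters is not stated; this sketch uses `str.isalpha`, which does.

```python
def simplify(name):
    """Drop everything except letters, then lowercase the result."""
    return "".join(c for c in name if c.isalpha()).lower()
```

With this, names such as "AC/DC" and "acdc" simplify to the same string and can be compared directly.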


Given a directory path ending in "FOO/BAR", uses <code>ffmpeg</code> to set all of the "Artist" tags of the contents to "FOO" and all of the "Album" tags to "BAR".

Uses <code>avconv</code> to ensure the audio data isn't changed, by comparing the CRC before and after.
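The path handling can be sketched as below. This assumes the tagging tool is ffmpeg and that tags are written while copying the audio stream untouched into a new file; the function names and the exact command line are illustrative, not taken from the script.

```python
import os


def artist_and_album(album_dir):
    """Split a path ending in "FOO/BAR" into ("FOO", "BAR")."""
    album_dir = os.path.normpath(album_dir)  # drops any trailing slash
    artist_dir, album = os.path.split(album_dir)
    return os.path.basename(artist_dir), album


def tag_command(src, dst, artist, album):
    """Build an ffmpeg argument list which retags without re-encoding.

    "-codec copy" copies the streams as they are, so the audio data
    (and hence its CRC) should be the same before and after.
    """
    return [
        "ffmpeg", "-i", src,
        "-metadata", "artist=" + artist,
        "-metadata", "album=" + album,
        "-codec", "copy",
        dst,
    ]
```

The argument list could be handed to `subprocess.run`, with the CRC comparison done on `src` and `dst` before replacing the original.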