Thursday, January 26, 2006

Calling All Techs!!! RIAA Defends Its "Investigation", Says Metadata Shows Illegal Copying

In its opposition to defendant John Doe Number 8's motion to vacate the RIAA's "ex parte discovery order", filed today in Atlantic Recording v. Does 1-25, the RIAA has submitted a declaration of Jonathan Whitehead -- an RIAA Vice President -- that the metadata in John Doe Number 8's shared files folder shows that illegal copying took place:

Declaration of Jonathan Whitehead
Exhibit A
Exhibit B

This would appear to be in contradiction to the earlier affidavit of computer programmer Zi Mei in support of the motion.

Any input from the tech community would be of interest. The response to the Whitehead declaration is due February 7th.



indiefeed said...

Chris MacDonald here from IndieFeed. I'm not so sure this is a technical question so much as an identification question. The metadata tag in question appears to be the "comments" field where ripping entities provide information about their software, or the individual who does the ripping. So to the extent that your collection has various and sundry comment "signatures" you could establish that the possessor of the file obtained the files from various and sundry sources, debunking any claim that the person lawfully transcoded the file from their cd to their own music management system. It's an interesting argument but it would seem to make sense...

tinfoil said...

tinfoil from here. It would appear to me that they are attempting to use ID3 information as proof that this person has illegal material. I assume that is what they are calling the metadata, at least. Alas, it's trivial to mislabel or change ID3 information. There are ID3 tag editors available, both free and commercial. It's very poor proof indeed!

However, they do have the SHA hashes for each file, so if they have his computer and the files are still on it, they can compare the files in their list to the files on his hard drive, and if they find an MP3 with a matching hash, he's pretty much up shiat creek.
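The comparison described above can be sketched in a few lines. This is only an illustration, assuming the listed hashes are hex-encoded SHA-1 digests; the function names `sha1_of_file` and `find_matches` are hypothetical and do not come from any tool named in the filings.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matches(folder, listed_hashes):
    """Return {file name: digest} for files under folder whose SHA-1 is listed."""
    wanted = {h.lower() for h in listed_hashes}
    matches = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = sha1_of_file(path)
            if digest in wanted:
                matches[path.name] = digest
    return matches
```

A match only shows that a file on the drive is byte-for-byte identical to a listed file; it says nothing by itself about where that file came from.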

Still, I find it very hard to believe that they verified each one of these files. 300+ MP3s is roughly a gig or so, and on a conventional 1Mb ADSL connection with a roughly 300kb upstream, it would have taken them like 3-4 days to grab the files, assuming that the RIAA was getting 100% of this person's upstream.
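The download-time estimate is very sensitive to the units assumed. As a rough check only, the sketch below assumes 1 GB of files and reads "300kb" as 300 kilobits per second, fully available and with no protocol overhead. Under those assumptions the transfer takes hours rather than days; a multi-day figure would require an effective rate roughly a tenth of that. The function name `transfer_hours` is made up for this illustration.

```python
def transfer_hours(total_bytes, upstream_bits_per_second):
    """Hours needed to send total_bytes over a link of the given bit rate."""
    seconds = (total_bytes * 8) / upstream_bits_per_second
    return seconds / 3600


# Assumed figures: 1 GB of files, 300 kilobits per second upstream, no overhead.
hours = transfer_hours(1_000_000_000, 300_000)
print(f"{hours:.1f} hours")  # prints "7.4 hours"
```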

CodeWarrior said...

Although I am not a lawyer, nor do I play one on television, much of the testimony seemed to be based not so much on hard science as on hearsay and speculation of the grossest nature.

Just having a file that is marked as a "shared folder" that has a large number of files, proves nothing. I could create a folder called "My Shared Folder" and copy a thousand files from my hard drive to it, and it would not be prima facie evidence that the files were per se, copyright infringements.

Of course, if they are using the DMCA as their legal authority, they need to provide proof of copyright registration to sue under the DMCA.

Having metatags that say "Jazzy D ripped this from So and So's album" is, of course, hearsay without conclusive proof of such unauthorized copying.

I noted tinfoil's comment (shout out to tinfoil, how's it going, my friend) and agree with the poor quality of proof aspect.

In closing, I noted the "signature" of Jonathan Whitehead. It looks ambiguously either like a poorly formed star, which this wannabe star may actually aspire to be, and hence his association with the RIAA, or it could be a giant "A". I wonder what a giant "A" could stand for, given this gentleman's personality.


CodeWarrior said...

One further comment. In looking at the programmer's testimony, and the silly exhibits A and B which Jonathan Whitehead presented, it is clear to me which has the greater convincing weight, and it is certainly NOT the man who signs his statement with a Giant Star / Giant A.

The difference in expertise is like trying to decide who knows more about the construction of light bulbs, Charlene Tilton, or the chief engineers at Westinghouse.

Larry Rosenstein said...

I agree with Chris MacDonald. Zi Mei describes the technical aspects of metadata, including that it is optional (does not affect playing the tracks) and easily modified.

It becomes a legal question whether having tracks with a lot of different metadata characteristics is proof of infringement.

An analogy might be someone who has a collection of books with hand-written notes in the margins, all with different handwriting. Is that proof that the books are stolen?

jaded said...

In his declaration, Mr. Whitehead indicates that 'Doe 8 is a user of the Limewire system' and that 'all of the Doe Defendants in this case are users of the Gnutella network'. This is interesting in light of the fact that all of the Exhibits associated with the original filing were screen-shots of Kazaa, which is not, from what I understand, a Gnutella network. It is its own network, I believe. This is an inconsistency that cannot be resolved with the material available, but does suggest there is some confusion on the part of the RIAA.

Beyond that, the User Log that is attached to the current declaration is a fabricated document (i.e., I do not believe that it is a report that can be readily produced by LimeWire, or probably even Kazaa). The only way that an association can be made between a user's IP and the list of songs is to have a screen capture showing a download occurring from Doe 8 with Doe 8's IP address and nickname clearly visible (ergo the comment above re the depiction of Kazaa screen captures in the initial filing). The other piece that would be needed is a screen capture that associates Doe 8's nickname with the list of shared songs that are included in the attachment to the exhibit. I don't believe that LimeWire even provides a way for a user to dump (i.e., print) a copy of what files they might have in their own share folder. Given that, there obviously would be no way to easily print, or capture, the files shared by another. If there is a utility that the RIAA agents have used to automate this, it has not been disclosed and subjected to scrutiny.

Beyond that, there is the issue of metadata (and the hash codes, which were not discussed in this declaration). There is certainly a wealth of circumstantial evidence that suggests that the files might have been obtained from other sources, but there is no conclusive evidence. All of the evidence points to the use of a Comment field that anyone can alter at will. Now, would anyone not associated with a particular website type that website's URL into a file's ID3 comment field? One can only speculate. Is it possible? ABSOLUTELY. Anyone can do so.

There are, however, tell-tale markings that encoders do introduce into a file that serve to identify which encoder was used, but this is not metadata and only available by perusal of the binary file itself (there are some tools out there that attempt to do that - it's not an exact science). One must have physical copies of the actual files to do this level of investigation.

What has not been shown, and would be a simple matter to do, is an indication of exactly which files it is that the RIAA had downloaded from that list (one would expect that this file would be the same file shown to be downloaded from the above screen capture). If this file has none of the attributes that Mr. Whitehead alludes to in his declaration, then nothing is proven. Absent that, all that exists is a listing that purports to represent a number of files that contain arbitrarily changeable metadata (for the most part - bit rate and length are not changeable, obviously).

But what is there to prove that an arbitrary file that the RIAA has is, in fact, the specific file that was downloaded by the RIAA agents? There will be a time-tag associated with the created file, but that is also something that can be readily altered. There is, in fact, NOTHING that can be used to show conclusively that any file that the RIAA might have in its possession came from the alleged Doe 8. Even the hash code, which presumably would be shown as being the same as others' shared files, could come from any one of those sources (or even be fabricated, as discussed below).

WRT the hash code, as has been previously indicated and captured in Zi's declaration, any rip/encode of a CD track with the same version of an encoder and quality setting (etc., etc.) should result in files with the same hash code. [I've not tried this myself, but am sufficiently intrigued to contemplate doing so.] Further, going to commonly available on-line metadata repositories would result in the same data being inserted as metadata which, once again, should result in the same hash code. Unique aspects of a file's metadata, such as the Comment field discussed above, should also result in the same hash code if entered identically between the two files.
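The underlying property is that a hash depends only on the bytes of the file, not on who produced them. The toy sketch below illustrates just that property; `make_file` is a hypothetical stand-in for "audio plus a comment field", not a real MP3 writer, and it does not test whether any particular encoder actually produces identical output on two machines.

```python
import hashlib


def make_file(audio: bytes, comment: str) -> bytes:
    """Toy stand-in for an MP3: audio bytes followed by a comment field."""
    return audio + b"COMMENT:" + comment.encode("utf-8")


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


audio = bytes(range(256)) * 4  # placeholder for encoded audio frames

rip_one = make_file(audio, "ripped by example")
rip_two = make_file(audio, "ripped by example")        # made separately, same bytes
rip_three = make_file(audio, "ripped by someone else")  # one field differs

print(sha1_hex(rip_one) == sha1_hex(rip_two))    # True: same bytes, same hash
print(sha1_hex(rip_one) == sha1_hex(rip_three))  # False: a different comment changes the hash
```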

recordjackethistorian said...

The only thing that is clear is that this is a list of *someone's* songs. Lists like this are available quite freely and having a list doesn't mean he had the material on his computer. If I have an Ikea catalogue does that mean I have Ikea furniture? Of course not.

The other thing that is striking here is that these are indeed ID3 tags, but Windows users don't always use these tags. The custom for Windows users is to use the file name as metadata instead of the MPEG layer 3 audio tags. Some people are assiduous about these things, others don't care.

Jonathan Whitehead betrays his ignorance of how things really work by making assumptions that any experienced computer user would know to be unfounded.

I'm a Mac user myself but I know these things about how my Windows using friends usually do things. This is certainly not it. Where is the list of files actually found on his computer? That might be more convincing than a catalogue of a collection from an unknown source.

I have a dessert dish with a piano keyboard border on one side from a restaurant in Montreux, Switzerland. Does that mean I've been to Montreux? Could you accuse me of stealing such a plate because they do not ordinarily sell them? In actual fact, it was a gift from a friend who was traveling there and obtained it legally.

As has been pointed out above, "file generated for user" seems very suspicious to me. A user name would be the standard way of addressing a user of the system. It would be a trivial process for the RIAA to have created this document themselves. It's a list, nothing more. It proves nothing.

Good luck with this!

silencer said...

As a former help moderator for and overnet, both filesharing programs, this is my input:

The RIAA and MPAA have thousands of people flooding the internet's peer-to-peer networks with dead mp3 and movie files - in other words they may be labeled as you show here, but there may not actually BE a music waveform that is playable as the music it supposedly represents via the label.

A user can download thousands of files that appear to be copyrighted songs, have the right ID tags and length yet only contain static. I had heard that this practise (file faking) is illegal in the USA, as a form of fraud, but that is another story. The other issue here is that a person may attempt to download music, then cancel. The log will still show the attempt, and this may be all this list is, if these even ARE music files.

The ONLY way to prove infringement is if 1) no licence was ever purchased (has this person ever owned these songs on cassette or CD? If so, then she has a license and can have a digital copy) and 2) they can produce a VALID WAVEFORM. IF they can show a perfect waveform match between the song on the hard drive and a waveform from a copyrighted CD, then you have a fingerprint match, and can start asking how that waveform (playable as the actual song) got there.

No waveform, no proof there was ever a song transferred.

I can give you an empty bag and write 'cocaine' on the bag, but it is still just an empty bag no matter what 'list' or metadata it is recorded on.

Whirling Blade said...

Hi folks. By way of very brief background, I have been a professional computer consultant for over 20 years. I am no expert on the current crop of file-sharing systems, but I have done extensive development of other distributed multi-user applications. The point I would like to make relates to the "UserLog", and the behavior of multi-user systems in general.

In a system with multiple users, everyone has an account name. This is true whether you're talking about Oracle databases, or Slashdot. It's not uncommon for people to have *different* usernames on different programs. For example, I am "Darkmoth" on Slashdot, and I am "WhirlingBlade" on Blogger.

To the system in question, I am that account name, and generally speaking, I am referred to consistently by that name. It is highly unusual for a multi-user system to make a generic reference to a particular user, not only because it is jarring to the user's experience, but also because it's confusing in any context where two users may interact.

If Doe 8 was a member of either Limewire or Kazaa, then they certainly must have user accounts with those two services. Any interface with, or reports from those services, should typically include the account name.

Exhibit A begins: "Log for User at address...". If this log were generated by the Kazaa or Limewire software, it would almost certainly refer to the actual account name rather than the generic term "User", unless the account name was literally "User". I do not rule out the possibility that this output has been edited to hide the account name. However, this seems unlikely given that the IP address is still visible.

It is also possible that this is the output of some tool that the Limewire Admins use, and not something the average user can access. The point about account identity still stands, and if anything it's even more important for a sysadmin to know WHO his reports are referring to. Again, it's highly unlikely that the report would be generated for "User".

The title "UserLog" is also unusually generic. In most applications a log is a record of events, whether it be network errors, webserver access, or user login times. Generally the title of the log is indicative of the purpose of the log. A "UserLog" would almost certainly contain information about users (probably timestamped), not about *files*. And this is certainly a list of files.

If I had to guess, this looks like a custom program that simply reads a PC's folder and lists the files with their tags. The title is a bit misleading, as it does not seem to be a "log" of anything. It seems to be the contents of a folder on Doe 8's machine.

I also noticed a couple of oddities:

1) The inclusion of the IP address - Was this the IP address the computer had at the time this report was run? Notice the port number (a standard Limewire port). A port number is only relevant when you are interacting with internet software.

If you're reading a report from your webserver, you might see :80 as the port. Bittorrent might be :45121 or whatever. The inclusion of the port number implies that this report was run in the context of Limewire, yet as I have pointed out, it does not seem to fit the signature of an actual report FROM Limewire. I strongly believe the port number may have been included simply for effect.

2) The contents of this "share" folder - If you look near the bottom of page 50 in exhibit A, you will see several files that seem unlikely candidates for sharing. Specifically, the "Limewire.exe" file, and the "Limewire On Startup.lnk" file.

I do not know the structure of Limewire, but it is not at all common for an application to keep its executables in the same folder as the files it manipulates. Winamp doesn't do that, nor did Morpheus, or Napster, or Windows Media Player. Typically, there is a specific folder for sharing that contains only media.

It's possible that a novice user might set the "share" folder to be the same as the installation folder, but Doe 8 doesn't seem like a novice user.

In addition, the "Limewire On Startup.lnk" file is a Windows shortcut, and I cannot think of *any* circumstances under which you would share a shortcut over a p2p network. If anything, it looks like that file was moved from where it should have been (the Startup folder) to this "share" folder.

Those are my observations, some of which mirror questions you folks have already voiced. It's very late here, so I apologize for the no doubt numerous spelling errors I've probably made.

In summary, I think the operation of this "UserLog" report-generator is highly questionable, as is the source of its input. If I were you I'd want to know what it is, and when, where and how it was run.

Good Luck!

firstaid said...

To me it does not matter what files are on an "IP", because there is no proof in the "IP". The RIAA can see the "IP" the file is on, but they can not see who is downloading it, as every "IP" on a network can be another small little ISP. This can be done through a proxy at home, through wireless routers that have been hacked into at a personal residence, by a computer that has been zombied and taken control of to download, and by other means as well. They can not show proof; in other words the RIAA can not meet the burden of proof on this and their request for subpoena should be thrown out of court. So, under current standards the RIAA is using this to harass people that are completely innocent. They get the IP and connect it to someone who could be completely innocent, and launch a lawsuit against someone who could be completely innocent, causing them considerable hardship in doing so. I leave you with this thought.

How would you like people taking control of your computer to download files and then have to pay for court battles from the RIAA and such? You are being sued and facing jail time as well, and have done nothing wrong at all.
Lawyers are telling people to settle that are not even guilty, because it would be cheaper now to settle than to pay the Lawyers and possibly face jail time and when the RIAA has no tangible proof at all.

This is happening, folks, and needs to be stopped NOW!!!

Do not focus on the hashes; focus on the legality of the subpoena. There should never be a request for a subpoena unless the burden of proof has been met, and in this and all cases there is no way to do that.
Has this ever been challenged in court? Did a judge actually approve this type of subpoena knowing the RIAA was going to start suing people that could be completely innocent?
Just my $.02

Alex H said...

As you guys have already established, it is possible for two people to create identical files using the same ripping program with the same settings.

If a music file is popular on p2p networks and someone makes their own copy from a CD they bought, putting that new (identical) rip in a shared folder will make the p2p client (e.g. LimeWire) record the users as having a file with the SHA1 hash XXXXetc.

When another user searches for a key word like the title of the song, both the old and popular file and the new file will identify as one and the same to the searcher.

Once you find another "source" on a p2p network, you are able to download metadata from them. Some p2p clients do this automatically.

So yes, it is possible that a user could have their metadata filled in automatically, just by having their own legally ripped file in a shared folder.

As far as the Kazaa/LimeWire thing goes, they are completely separate programs and networks: A Kazaa user can't download from a LimeWire user and a LimeWire user can't download from a Kazaa user.

Kazaa uses the FastTrack network to communicate with other Kazaa users.

LimeWire uses the Gnutella network to communicate with other LimeWire users.

This is a very basic concept and anyone who doesn't know the difference between these clients and networks can't claim to have any knowledge of modern peer-to-peer technologies.

In case this ends up coming up in court somewhere:

1) The Gnutella network is open source - anyone can code a client that connects to this network, provided they follow the published specifications.

2) As the original Gnutella code was released under the General Public License, anyone using it as a base to work from would have to follow the conditions set out in the GPL.

3) Anyone not following the GPL in writing their software would (obviously) be breaking the licence under which Gnutella was released.

4) If I remember correctly, GPL-breakers forfeit their right to use the GPLed code.

5) The vast majority of developers in the p2p community in general and in the Gnutella community specifically would never endorse a "closed source" client as a well written application.

To sum up those points, it's possible that the RIAA operatives have no legal right to use any software they've built to track people's downloading habits, and even if they do, they would need to display the code publicly for anyone to take the software's results seriously.

Alex H

Whirling Blade said...

Hello again. First of all I'd like to thank Mr. Beckerman and you, Zi, for tackling this important issue. After an evening of reading, I'd like to specifically address Whitehead's contention that "metadata contains a wealth of information" (Declaration, page 3 bottom).

In the Index of Litigation Documents, there is a cross examination of a Gary Alan Millin, CEO of MediaSentry (Exhibit E). According to the document, MediaSentry is a Digital Rights Management company which offers several services to their clients.

Question 104 establishes that one of MediaSentry's services is something called MediaDecoy, which involves putting "fake" music files up for distribution. It also establishes that "this technique puts out [...] what appear to be copies of copyright work that don't work".

Question 105 asks "when one of these is put out, why don't people just ignore it?" The answer to this question is, "They don't know what's in the file until they listen to it.". Questions 106 through 108 amplify this answer, with the final answer to Question 108 being "the only way to tell [fake files and real files] apart is to listen to them".

This seems to indicate that not only is metadata unreliable, it is irrelevant. This is documented evidence that there are files on p2p networks whose contents have nothing to do with their names, much less the various tags that float along with them. According to the testimony, MediaDecoy tries to "overwhelm file trading communities with non-working versions of your copyrighted material.", so it's not even an uncommon thing.

Much of Whitehead's presentation involves the particulars of metadata contents. Those particulars are exactly what a good "fake" file would contain. No one is going to download a fake with the comment "Property of RIAA", but they may download one with the comment "Ripped by Reloaded, props to Joe and Lil Trooth!".

In fact, based on the testimony of this DRM expert, a list of file names (as in the UserData report) is in some ways meaningless, as you must listen to the contents of the files to tell what the files ACTUALLY contain. I could certainly record myself singing "New York, New York" and change the Artist field to "Frank Sinatra"! Presumably, a good hash is also sufficient identification, but that is only my supposition. Without a matching list of hashes from the owners of the copyrighted works, the files haven't even been *identified*.

Also, Alex H makes a great point about the GPL license. You may have every legal right to see the source code of the "UserLog" program, which I now suspect is MediaScan.

ben said...

File list (including hashing) information is made available to supernodes/ultrapeers when a client connects to the service.

The RIAA could either pose as a supernode or simply just perform asynchronous searches for content. Searches return hashing information for files as well as the ip addresses which are reported to have these files.

All this can be obtained without even downloading the files in question (at least on Gnutella). Take a look at the Mutella source code for an example of this.

Now I think we are approaching this issue wrong. Courts deal in facts, and the MPAA/RIAA is telling the courts that their evidence is fact.

So why don't we tackle their arguments head on? Let's set up a testbed to disprove their arguments.

If we can prove that:
(1) Metadata is in no way associated with the encoded audio stream in mp3 files.

(2) Metadata is in no way unique and can be readily changed with free software such as Winamp.

(3) Metadata can be duplicated when the same music cd is ripped using the same software on different computers.

(4) Hashing produces the same IDs when two different copies of the same music CD are ripped on different machines. Which means hashing can be replicated on demand.

These assertions can be easily proved in a court of law, so why don't we do it? Hell, make video evidence of it using two different computers.
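Points (1) and (2) can be illustrated with a short sketch. It assumes the simple ID3v1 layout, a fixed 128-byte tag at the end of the file beginning with "TAG" and holding a 30-byte comment field; the files in the exhibits may well use ID3v2, which is structured differently. All function names here are hypothetical, and the "audio" is placeholder bytes rather than a real MP3 stream.

```python
TAG_SIZE = 128            # ID3v1: fixed-size tag at the very end of the file
COMMENT = slice(97, 127)  # 30-byte comment field inside the tag


def _field(text: str, width: int) -> bytes:
    """Encode text into a fixed-width, NUL-padded field."""
    return text.encode("latin-1")[:width].ljust(width, b"\x00")


def build_tag(title: str, artist: str, comment: str) -> bytes:
    """Build a 128-byte ID3v1 tag (album and year left empty, genre unset)."""
    return (b"TAG" + _field(title, 30) + _field(artist, 30) + _field("", 30)
            + _field("", 4) + _field(comment, 30) + b"\xff")


def split_id3v1(data: bytes):
    """Return (audio, tag); tag is None when no ID3v1 tag is present."""
    if len(data) >= TAG_SIZE and data[-TAG_SIZE:-TAG_SIZE + 3] == b"TAG":
        return data[:-TAG_SIZE], data[-TAG_SIZE:]
    return data, None


def get_comment(data: bytes) -> str:
    """Read the comment field from a file that carries an ID3v1 tag."""
    _, tag = split_id3v1(data)
    return tag[COMMENT].rstrip(b"\x00").decode("latin-1")


def set_comment(data: bytes, comment: str) -> bytes:
    """Rewrite only the comment field, leaving every audio byte untouched."""
    audio, tag = split_id3v1(data)
    if tag is None:
        raise ValueError("no ID3v1 tag found")
    return audio + tag[:97] + _field(comment, 30) + tag[127:]


audio = bytes(range(256)) * 8  # placeholder for the encoded audio stream
original = audio + build_tag("Some Song", "Some Artist", "ripped by A")
edited = set_comment(original, "ripped by B")

print(get_comment(original), "->", get_comment(edited))
print(split_id3v1(edited)[0] == audio)  # True: the audio bytes are unchanged
```

The edit changes the tag and therefore the hash of the whole file, but the audio portion stays byte-for-byte the same.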

Once our arguments are proved to be fact a court can't ignore them.

Alex H said...

I have a few things, so I'll try to break them up:

First point - Supernodes

To start with, are we talking about supernodes or ultrapeers? "Supernode" is a term used on Kazaa's FastTrack. "Ultrapeer" is a term used in the context of Gnutella. They basically do the same job, but there are obviously some differences considering that they are components of completely separate networks.

A Gnutella ultrapeer acts like a temporary server and acts as a clearing house for search queries made by users (such as a keyword search). Your PC may be "chosen" by the network to act as an ultrapeer (provided your PC is capable of handling lots of search data) or you can elect to make yourself an ultrapeer.

As a search query is passed from a user to an ultrapeer, which passes the query on to other (regular) nodes, the ultrapeer basically gets to see what you're searching for.

This has been exploited by network spammers - as soon as you send a query for "A really cool song.mp3" the spammer uses that info to return a result for the exact thing you searched for. The spammer will have set up a number of nodes to act as sources for the spam file, which will return a "Yes! I have 'A really cool song.mp3'" message directly to the user.

Basically they rename their spam files (like viruses) on the fly. Users are tricked into thinking the spam file is the file they want and download it.

So yes, it is possible to hack up an ultrapeer to find out what people are searching for. This does not mean the ultrapeer is aware of what people actually download, it just means the ultrapeer knows what people are searching for. Whether they find it and download it or not is a completely different matter.

Second Point - Validity of evidence

Without knowing exactly how RIAA has been collecting their evidence, that evidence could never be called anything but suspect.

It is well known that the RIAA (through companies like MediaSentry) pump fake files onto the p2p network. They obviously have some skill in that area, but for a company that makes their money through deceiving people, I personally would be skeptical of any information they present.

I can't emphasise this enough: in the p2p world, nobody takes your claims seriously unless you can prove the result is accurate by showing off the source code.

The RIAA saying to a court that they have "evidence" is like someone saying "trust me, I'm a doctor" - nobody could take them seriously without getting a second (or third, or fourth) opinion from an independent source. The RIAA will need to get their methods validated by someone other than their subcontractors.

It is quite possible that they "got the right guy", but by using dodgy information gathering that would be by pot luck rather than "evidence" by any legal standard.

This leads to my next point:

GPL issues

As far as I know, it is flat out illegal to knowingly break a licence agreement like the GPL - you give up any sort of right to use the software code.

I'm not a lawyer, but I would be interested to see what the RIAA's position is if it turns out they have been gathering "evidence" with tools they have no right to use (kind of like a private investigator who breaks into your house to snoop around).



Metadata literally means "data about data" or "information about information". There should be no expectation that one piece of data (like metadata) is any more reliable than another (like a file).

We know that files can be unreliable because the RIAA themselves have demonstrated it by creating fake files, so it is just as likely that the metadata is inaccurate too.

I really don't know how you can prove that a file transfer took place "because the metadata said it did". Metadata doesn't contain things like "Transferred from IP 185.678.etc.etc. at 10:57PM".

Metadata is information about a file's contents, not its history.

On the whole, metadata should be regarded as:

* Easily faked
* Unreliable
* Not capable of showing a history

If you want to get really technical, head over to the FrostWire IRC channel or the LimeWire IRC channel - they are both filled with the people who actually write the Gnutella code and they'll be able to tell you in more detail.

Alex H

Raystream said...

OK, after reading everyone's comments here I am lost as to how no one knows where the "hash" values come from exactly. They come from an application that the RIAA is using... it is in conjunction with FISMA (the Federal Information Security Management Act of 2002). That is an Act that mandates the methods, and the dos and don'ts, for the Federal Government regarding digital evidence.

1. The log file they have has been created by them. It is also fake. In the FISMA Act it clearly shows the proper way that a Log file must be created, and they have not followed it.

2. Digital Screen Shots do not hold up in court. The main reason is because they can be easily tampered with. What they would have had to do is literally photograph a computer screen.

3. The hash values are an attempt to follow the "Mirror Image Backup" (Evidence grade, non tampered proof) method. The problem here is to create the hash values they need every file. The hash values are fake!

Please go through the following PDFs and you will find the proper ways the Government must follow for digital evidence. ->

If you want to win this you have to go right after the source and validity of everything. Honestly, as a certified Security Expert you need to do this to win...

1. Presuming you are still in NY you need to go to the best CSI unit around and by whatever legal means get in contact with a Computer Forensics Expert.

2. After discussing with the Expert what points show the validity of evidence you need to drag the Expert into court to testify by whatever legal means.

3. The Expert needs to testify that by using the FISMA Act that none of the evidence has A Chain of Custody, none of the Evidence (Digital) can stand up to court without proper documentation. And that the evidence is falsified.

The RIAA isn't that stupid when they put the HASH values in there. They were trying to follow their International Piracy Raids. Where as during those they had the FEDs involved and had to follow a Chain of Custody, and properly document and seize information. The HASH values are a fall back to proving validity... as I said earlier they are from the Mirror Image Back-up (bit-stream backup) method.

Everything from their evidence is a bocked from there experience with International Raids. They tried to take what they did their and by strictly doing everything over the internet apply it. They failed... they falsified information, and as far as I am concerned should go to jail once it has been proven.

I would gladly go to court and testify since I am a Security Expert, but I can't at this time. So I urge you... if you want to win this case, instead of beating around the bush grab a Computer Forensics Expert currently working in the Government or a third party that day to day MUST follow the FISMA Act.

p.s. This is what has to happen in every case from the RIAA. You can't deny the Federal Government's methods because they are fail proof.

Also the RIAA can not create the Chain of Custody, etc. They have to have it done by a Third Party that has no interest in the case. So Media Sentry could not. And being that they didn't then by using the FISMA Act the case will be thrown out of court.

ben said...


Each client generates the hash IDs from its own file list. It then uploads this file list information, including the hash values, which ultrapeers pass around.

So there is obviously a fatal flaw there. The protocol assumes each client generates the correct hashes for their shared files.

So it's impossible for the RIAA, or anyone else for that matter, to determine the contents of another client's shared files unless they actually download them and manually verify them.
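As a rough illustration of the point above, here is a minimal Python sketch of what "download and verify" means in practice: hash the bytes actually received and compare the result with the value the remote client advertised. The function names and sample data are hypothetical, and the `urn:sha1:` Base32 form is assumed here as the way Gnutella clients usually write these hashes; nothing in this sketch is taken from the RIAA's own tooling.

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Return a urn:sha1 string (Base32 of the 20-byte SHA-1 digest)."""
    digest = hashlib.sha1(data).digest()
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def matches_advertised(downloaded: bytes, advertised_urn: str) -> bool:
    """True only if the bytes actually received hash to the advertised value."""
    return sha1_urn(downloaded).lower() == advertised_urn.strip().lower()


# Hypothetical stand-ins for a real transfer and for real search results.
real_bytes = b"bytes actually received from the remote host"
honest_claim = sha1_urn(real_bytes)
bogus_claim = sha1_urn(b"some other file entirely")

print(matches_advertised(real_bytes, honest_claim))  # True
print(matches_advertised(real_bytes, bogus_claim))   # False
```

Until the second step is done, the advertised hash is only a claim made by whatever software answered the query.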

Alex H said...

I still can't work out how they got that "UserLog".

As far as I know, LimeWire doesn't provide anything like that, and the "Matching files" stuff at the top is certainly something that's been added in by the RIAA investigators.

Whitehead says nothing about how the RIAA actually obtained their "evidence", except that they apparently did.

Alex H said...

By the way, so far I've found 7 of the songs listed in Exhibit A on, a free metadata database.

Alex H said...

Oh, and I also managed to create my own mp3 rip with the same SHA1 as another track that is quite popular on the Gnutella network. The source was a CD I bought in Barnes & Noble, which has an "Imported" sticker on it, and I don't know which other countries got the same imported version of the CD.

I found the ripping and encoding apps used (from the metadata) on the existing file and used the same versions of those apps to rip and encode my own mp3 file. All the metadata was available for the (existing) file, so I just used it to rip and encode with the same settings. It was pretty easy actually.

I moved my new rip into a shared folder and got my p2p app to "acquire metadata". It searched the network for a bit and now my new rip has all the same metadata attached to it as the existing file that was already on the network.

Just in case anyone is interested.

Tsu Dho Nimh said...

I have some comments on the declaration:

1. He claims to be able to "competently testify as to the facts" ... but unlike any expert witness deposition I have seen, does not detail what makes him competent about those facts. Unless he's a geek, with a specific area of expertise, he's not competent. He's just a suit. Get the "Kernighan" deposition in the SCO vs IBM case for an example of a real expert's deposition.

2. He claims to have personally "supervised, directed or reviewed" the results ... what does that entail? Did he REVIEW these results, or just tell a flunky to get something?

3. What was their PROCESS for reviewing? Did they test to make sure it didn't get false positives?

4. Is there any innocent way that this folder could have been shared? If Doe8 was using MSFT, I think it shares with world+dog by default.

5. How is the "sheer number of files" indicative of anything except that there were lots of files?

6. How were the files they downloaded kept? Is there a chain of evidence, and secure, digitally signed copies of the download, or could any over-eager flunky with a meta-tag editor get to them?

7. They "undertook an expedited review" of the files IN RESPONSE TO the motion. Had they reviewed them before then? What were the results of the preliminary review?

8. The broad range of software and comments does tend to indicate that Doe8 was DOWNLOADING songs, which might not be illegal according to USC-17 (despite what the deposition says), but they have to prove he was making them available for download, deliberately.

Tsu Dho Nimh said...

Zi -
It may be true that the RIAA is not a government, but they MUST be able to explain exactly how they got their evidence, and show that it could not have been falsified or altered while it was in their possession.
What was their method, what software did they use, etc.

"I moved my new rip into a shared folder and got my p2p app to "aquire metadata"." And this is what a LOT of people do, because typing all that metadata is a pain in the butt!

Bacardi said...

What effect will GhostSurf have on the internet addresses gathered by the RIAA?

I am not a user of p2p so I cannot test this. I do use GhostSurf but I don't know how effective it is.

Dogeron said...

Hi All,

The file that took my interest was _GEAREXT.WO_IDENT.TXT on page 47 of the "User Log".
I've checked around and no one's been able to confirm what this file actually is, but all search responses seem to link to trojan/malware removal sites....this one for instance.

Now assuming that the user log is genuine - strange that a directory listing has no file sizes - could this be an indication of a compromised machine?


A Lag said...

_GEAREXT looks like some part of a CD/DVD mastering program. Not sure what, though, but someone posted what was supposedly a copy of it to some forum.

The SHA1 hashes are pretty problematic here; they pretty well uniquely identify a user. Also, the metadata gives mention of a number of release groups, etc. which seem to indicate an unauthorized source for those files.

While it's true that one can edit ID3 tags, it's a bit of a stretch to suppose that someone flagged all of their files to make them look like they infringed upon someone's copyright. Also, if they can prove that the SHA1 hashes match something they know to be infringing :(

However, there's little mention as to how they got this metadata and whether all of it is accurate? While that file is pretty damning, I simply do not know enough about the utility they used to generate that. Can they produce actual copies of the files? Where did they get the SHA1 from? Did they SHA1 the file themselves or use data reported by the network?

Note here that they COULD, in theory, have just taken the metadata out of the search network, never *actually* validating any of it themselves. In fact, I'm starting to suspect it's exactly what they did the more I think about it. In that case, they still have to link that metadata to that actual user's computer. Just because the network reports it does NOT mean that you can necessarily trust the network. I've heard tell of others having sharing programs pestering unrelated computers for files because a DHCP lease or something expired and the person who was *actually* sharing the files appeared under a new IP, while the sharing network was unaware of this.

In other words, unless you can show that the metadata was fabricated, you're better off forcing them to prove the link from the metadata to the actual contents of Doe's computer. This metadata does NOT do this, although they claim to have actually downloaded at least one song & listened to the entire thing.

Make sure that they listened to the *downloaded* song, not merely some file with the same SHA1 that they never got from that person's computer. It does NOT establish that they even had the file just because the sharing network claimed they did and they already had a copy to listen to.

TreadingThroughEntropy said...

I don't know that this will help you any, but you might consider pushing to identify what version of Limewire the John Doe was using. More recent versions have Bitzi metadata lookup, and at some point Limewire integrated iTunes support.

My work environment is a college with 550+ laptop enabled students. As students start arriving in studios, iTunes will automatically start inventorying playlists and libraries for those who have the software set to share. During peak times as many as fifty laptops' files will be listed. If Limewire aggregates this information as part of its iTunes support, App. A may in fact be a list of music on other machines.

Lev Abalkin said...

I don't have any special competence: IANAL, never used p2p, but I used computers long enough that you might find my guesses useful.

It appears to me that Exhibit A is generated using information freely available on the public Internet. My only knowledge of LimeWire comes from a Wikipedia article here. According to that article, it does use the Gnutella peer-to-peer protocol. It also uses the SHA1 algorithm to "ensure that downloaded data is uncompromised", so I would not be surprised if SHA1 values were made available by the network.

Here is my guess of how Exhibit A was generated. They used undisclosed software to connect to a public IP address (I am not sure port 6349 has any significance). At the time when they connected, a computer associated with that IP address was running LimeWire and was advertising the contents of the "shared folder". The metadata listed in Exhibit A is not necessarily stored within each individual file; it is just information provided by the LimeWire node to advertise what it makes available.

I would take Exhibit A and the affidavit very seriously. It is very likely that this is not a fake, but competent evidence, and that they do have all the necessary chain of custody documentation to prove that copyrighted material could be downloaded from the IP address on 9/1/2005 at 12:10:38 AM EDT.

It appears to me that they not only obtained a listing of available files from that IP address, but also downloaded a sample of files and found copyrighted music inside. Thus, assuming that they hired competent experts to do the work, they will have proof that copyrighted material was served by a given internet node on a given day.

It remains to be seen if they will be able to associate an IP address with a real person responsible for infringement.

The IP address appears to belong to Mediacom Communications Corporation, which is an ISP. As someone already commented, this address is probably in rotation and handed out by DHCP to different computers at different times. It is not clear if Mediacom will be willing and able to determine which one of their customers was using that address on January 9.

Assuming the worst, there will be proof that Doe 8 indeed was the customer with that address at that time, but unless he or she comes forward with a confession, I don't see how anyone can prove that he/she controlled the computer in question. The computer could be a zombie, indeed used for illegal purposes, with Doe 8 as the victim, not the offender.

Good Luck!

Lev Abalkin said...

You should definitely push for full disclosure of the investigative techniques, but the RIAA will probably stonewall your requests until the last possible moment and will try to overwhelm you at trial.

I would recommend that you find someone who can reproduce RIAA results. All you need is to set up a closed network and install p2p software. You don't need to use real songs, but I guess as long as you don't connect to public Internet it will be ok to "rip" a CD and place the result in a "shared folder" (IANAL, you are). Once you've done that, you will know exactly what the "crime scene" should look like.

Lev Abalkin said...

Ok, I did my homework on the Gnutella protocol and the Exhibit A does make sense. According to Meta Information Searches on the Gnutella Network by Sumeet Thadani, Lime Wire LLC, "Each file in a user’s library can be associated with multiple sets of meta-data tags ... – each of which is an XML document (a document is thus a set of related information about the file)."

The entries in Exhibit A appear to match the Schema for the "audio element." Note that the genre names that are clearly arbitrary follow the Gnutella specification. I would guess that RIAA experts used a perl script to parse the XML stream and produce a text "UserLog". The document does not contain any clues as to what scripting language they used. I said "perl" because it would be particularly easy to write this kind of program in perl.
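To show how little work such a script would take, here is a minimal sketch in Python rather than perl. The sample XML is invented for illustration and only loosely modeled on an "audio" element with attributes; the attribute names, the `xml_to_log` helper, and the output layout are all assumptions, not the actual LimeWire schema or the format of Exhibit A.

```python
import xml.etree.ElementTree as ET

# Invented sample data; attribute names are assumptions for illustration.
SAMPLE_XML = """<audios>
  <audio title="Example Song" artist="Example Artist" album="Example Album"
         genre="Rock" bitrate="128"/>
  <audio title="Another Song" artist="Another Artist" genre="Pop"/>
</audios>"""

FIELDS = ("title", "artist", "album", "genre", "bitrate")


def xml_to_log(xml_text: str) -> str:
    """Flatten each <audio> element into a few lines of plain text."""
    root = ET.fromstring(xml_text)
    lines = []
    for number, audio in enumerate(root.iter("audio"), start=1):
        lines.append(f"File {number}")
        for name in FIELDS:
            value = audio.get(name)
            if value is not None:  # skip attributes the client did not send
                lines.append(f"  {name}: {value}")
    return "\n".join(lines)


print(xml_to_log(SAMPLE_XML))
```

Note that a script like this only reformats whatever the remote client reports; it never touches the audio files themselves.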

Dogeron said...


Sorry for the red herring in my earlier post; one has to try ;)

I had read an article that one of the hashing algorithms had been compromised, but couldn't remember where.....I've now found the article here.....

I leave it to the legal eagles to decide how best to use this........

Hope this helps.


Ausage said...

I am not a lawyer and I look at this case from a distinctly Canadian, not US, point of view. That said, it appears to me that this case bears remarkable similarities to the RIAA case here that was "Dismissed without prejudice" by the Federal Court of Appeal in Canada.

First, there is no specific allegation of a crime or tort. Nowhere does anyone testify that "I connected to John Doe's computer, I downloaded CopyOfSongA, I listened to CopyOfSongA, CopyOfSongA was an infringing copy of Song A (copyright certificate attached)."

The declaration of Jonathan Whitehead is mostly hearsay. He states that he has personally directed, supervised or reviewed the investigations, but nowhere does he indicate direct knowledge of the facts.

Exhibit A provides a list of the songs Doe #8 is alleged to have downloaded and made available for distribution, but nowhere does he state how Exhibit A was created, by whom, with what program, or when it was created.

He states that songs were downloaded, but again omits to mention which songs, who downloaded them, who listened to them, when, and from which IP address.

Mr. Whitehead discusses at great length the "Technical Analysis" of the metadata, but again fails to indicate who the "expert" making this analysis was, when it was made, etc. It is therefore a hearsay analysis of hearsay data.

I hope this helps...

Another note from the Canadian case. Several of the ISPs involved made strong arguments that the lack of timeliness of the data cast into doubt the accuracy of any information which might be removed from their log files, possibly endangering the privacy of third parties.

jaded said...

A couple more observations and then an overarching question.

The Gnutella network, and its specification, is not something that is done under the auspices of the GNU organization. They are careful to point out that they have their own network (GNUnet) and a set of truly open-source clients that use GNUnet. Further, the entities that attempt to maintain order on the Gnutella protocols clearly state that: 'The XML spec should not be integrated. We just specify where there is XML. The XML format that limewire uses is not a part of Gnutella. Anyone can use any XML format.' Therefore, any use of XML metadata under the Gnutella network appears to be solely up to the discretion of the individual client provider. The Gnutella protocol standards include a 'Browse Host' function that, again depending on the client, is capable of returning XML metadata for a shared folder on a client's 'puter with a specified IP address using a series of Gnutella messages. It would be a relatively simple matter to write a script (as suggested above) to take that data and create a 'log' (inappropriate use of the word as discussed above). It would appear that getting the data can be done quite readily and people shouldn't get hung up on the GNU appellation.

Question is: In this digital age, where anything and everything can be crafted/created/counterfeited/'shopped'/etc, what constitutes undeniable/incontrovertible evidence that something that one is alleging is so? It seems to me that this is more of a legal question than a technical one.

Alex H said...

@ Zi

Regarding the metadata, I had some issues and can't find the links I made to the files.

This however is the link to one of the files listed near the top of Exhibit A:

The last bit of the URL is the SHA1 hash, so you can go through the list pretty quickly by just copying the SHA1 hash from Exhibit A and substituting it in the URL.

Regarding the SHA1 duplication, buy two copies of the same CD. Rip and encode one of them, then use the same tools to rip and encode from the second one. You can try it on two different computers if you like, but provided you use the same tools on the same settings, there is a good chance you'll create two files with the same SHA1 hash.
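The reason this works is that SHA1 depends only on the bytes of the file, not on who produced them or where. Here is a minimal Python sketch of that property, using placeholder bytes in place of real MP3 audio and a hand-built 128-byte ID3v1 tag at the end of the file. The `id3v1_tag` helper and all the sample values are hypothetical; whether two real rips come out byte-identical depends on the ripper and encoder, which this sketch does not model.

```python
import hashlib


def id3v1_tag(title: str, artist: str, comment: str) -> bytes:
    """Build a 128-byte ID3v1 tag (album and year blank, genre 255 = unset)."""
    def field(text: str, size: int) -> bytes:
        return text.encode("latin-1")[:size].ljust(size, b"\x00")

    tag = (b"TAG" + field(title, 30) + field(artist, 30) + field("", 30)
           + field("", 4) + field(comment, 30) + bytes([255]))
    assert len(tag) == 128
    return tag


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


audio = bytes(range(256)) * 64  # placeholder bytes, not real MP3 frames

# Two independent "rips" that happen to produce identical bytes.
rip_one = audio + id3v1_tag("Song", "Artist", "ripped by tool X")
rip_two = audio + id3v1_tag("Song", "Artist", "ripped by tool X")
# Same audio, but one tag field edited after the fact.
edited = audio + id3v1_tag("Song", "Artist", "ripped by tool Y")

print(sha1_hex(rip_one) == sha1_hex(rip_two))  # True
print(sha1_hex(rip_one) == sha1_hex(edited))   # False
```

So a matching SHA1 shows that two files have the same bytes; by itself it says nothing about which machine created them.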

raybeckerman said...

Great job, guys!!!!!

On behalf of all those folks out there being pushed around by the RIAA, thanks for your help in fighting back.

raybeckerman said...

By the way, if the judge sets an oral argument date, I'll post it in the directory of upcoming court dates. It would take place at 40 Centre Street in Manhattan.

Court proceedings are open to the public, but be sure to allow extra time to go through the metal detectors.

Oualline said...

I am an expert computer engineer currently working for
St. Bernard Software. We make systems to filter
the Internet to control the use of P2P file sharing
(among other things).

Metadata is defined as data which describes a data set.
In music it defines things like the name of the song,
artist, the type of music, who made it, and other information.

There are other types of metadata around. One type which
is familiar to most people is the card catalog information
available at your local library. For example, for every
book in their collection, the San Diego County Library
system publishes metadata containing the following:

Title of the book
Publisher's name and location
Date published

All of this metadata is available on line to anyone who wishes
to download it from the Internet.

Even though all this data is available on the Internet the San Diego
County Library is not engaged in massive illegal copying of books.
Metadata and actual data are different things. Being able to get
a card catalog entry on-line does not mean you can download the book.

Now the San Diego County Library does not have a "share" folder. Instead
it has a series of buildings called branch libraries in which you'll find
copyrighted books. You can even go into a library and walk out with one
of these books. However, to the best of my knowledge this activity is
perfectly legal.

The library also maintains a "UserLog" (although they don't call it that) showing
who checked out a book and when. However, even the existence of this "UserLog"
does not mean that the library is engaged in massive copyright violations.

In paragraph 6 of Mr. Whitehead's declaration, he appears to conclude that because
there are a large number of files in Doe 8's share folder, that Doe 8
must have obtained them illegally. There are more books in a single branch
of the San Diego County Library than in Doe 8's share folder. Using Mr. Whitehead's
logic, all of these books must have been obtained illegally.

In paragraph 8 of Mr. Whitehead's declaration, he states that "The metadata shows that
many of the files in Doe 8's "share" folder have been downloaded from other users
without authorization...." The metadata Mr. Whitehead described contains no
information concerning the legality or illegality of the files. The MP3 standard
contains no defined metadata specification to be used to hold the legal state
of the file.

In paragraph 9 of Mr. Whitehead's declaration, he seems to imply that
some of the songs contain metadata which indicates that they came from
sources known to supply pirated songs. The San Diego County Library
contains many books which were published in China. China is famous for
ignoring copyright laws and producing pirated books. If we follow
Mr. Whitehead's logic, we must conclude that every book in the San Diego County
Library which came from China is illegal.

Also in paragraph 9, Mr. Whitehead implies that the software used to make the
sound files indicates that they were pirated. The San Diego County Library
has some books which were produced on a copy machine. Copy machines are
frequently used by pirates to illegally copy books. Therefore according
to Mr. Whitehead's logic, all books in the San Diego County Library which
were produced on a copy machine are illegal.

In paragraph 10, Mr. Whitehead states that some of the metadata in the "share" folder
contains the name of people or organizations accused by the FBI of piracy.
As an attorney Mr. Whitehead should know that there's a vast difference between
an accusation and a conviction. Also, I'm not a lawyer, but since when is a
press release from the FBI considered evidence?

In an effort to combat piracy, companies are inserting fake songs into
file sharing networks. These songs contain the same metadata as real
songs but the actual music is white noise or other useless sounds.
In order to be mistaken for the real thing it is imperative that the
fake song's metadata look just like a real song's metadata. These fake
songs are designed to be downloaded and shared. So it is entirely
possible for Doe 8 to have a "share" folder full of mostly fakes
and void of any infringing material.

Another form of fake file is a sales pitch disguised as a song. Frequently
someone will take an audio file describing how to apply for a low mortgage
rate or how to buy certain pills cheap and add metadata to it that makes it
look like a real song. Does Doe 8 have a collection of songs or a collection
of sleazy advertisements? You can't tell from the metadata.

Mr. Whitehead's declaration leaves out an important fact. Legitimate files
can have the metadata seen by Mr. Whitehead.

Metadata is like the cover of a book. The title, author, publisher, and other
information on the book can describe the book itself. But it doesn't have to.
You can put the same cover over a blank book, a public domain book,
an advertisement, or a book of gibberish. If you don't actually look
into the book you can't tell what's inside.
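The book cover analogy can be made concrete with a short Python sketch. It reads an ID3v1 tag, which is simply the last 128 bytes of a file, from two made-up files: one whose body is silence and one whose body is arbitrary filler. Both files and the `read_id3v1` helper are hypothetical; the only assumption taken from outside the discussion is the standard ID3v1 field layout.

```python
def read_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes, or None if absent."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None

    def text(start: int, size: int) -> str:
        raw = tag[start:start + size].split(b"\x00")[0]
        return raw.decode("latin-1").strip()

    return {
        "title": text(3, 30),
        "artist": text(33, 30),
        "album": text(63, 30),
        "year": text(93, 4),
        "comment": text(97, 30),
    }


# The same "cover" attached to two completely different bodies.
cover = (b"TAG" + b"Hit Single".ljust(30, b"\x00")
         + b"Famous Band".ljust(30, b"\x00") + b"\x00" * 30
         + b"2005" + b"\x00" * 30 + b"\xff")

silence = b"\x00" * 4096 + cover
filler = bytes(range(256)) * 16 + cover

print(read_id3v1(silence) == read_id3v1(filler))  # True
print(silence[:-128] == filler[:-128])            # False
```

The tag reader never looks at the body, which is why identical metadata can sit on top of entirely different contents.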

In other words, Mr. Whitehead states that the music in the "shared" folder
uses the same metadata tags that are used by pirates therefore Doe 8 is
a pirate. He omits the fact that these same tags are used legitimately.
Just because some people who use these metadata tags are pirates
does not make them all pirates.

Finally, having music that someone intended to share does not mean that
you intend to share it. I have downloaded many copies of Linux using
the "bittorrent" file sharing software. All of these copies were
intended to be shared by their producers. My copies I keep in a private
directory on a system hidden from the Internet and I share with no one.

Oualline said...

I should point out that the comments I just posted are personal and do not reflect the opinion of the company I work for.