Thursday, June 05, 2008

Study finds that record company methods for detecting infringement are inconclusive

As reported by the New York Times, an academic study out of the University of Washington has found the record industry's methods of detecting infringement among BitTorrent users to be "inconclusive". I would appreciate input from the technical community, in our "comments" section, on the extent to which these findings would be applicable to MediaSentry's supposed "detection" of infringement among FastTrack users, as opposed to BitTorrent users, since every single lawsuit of which I am aware involves the FastTrack or Gnutella protocols, rather than BitTorrent. Thanks to my many friends who alerted me to this article and study. -R.B.

The Inexact Science Behind DMCA Takedown Notices
By Brad Stone
June 5, 2008
New York Times Technology Section

A new study from the University of Washington suggests that media industry trade groups are using flawed tactics in their investigations of users who violate copyrights on peer-to-peer file sharing networks.

Those trade groups, including the Motion Picture Association of America (M.P.A.A.), Entertainment Software Association (E.S.A.) and Recording Industry Association of America (R.I.A.A.), send universities and other network operators an increasing number of takedown notices each year, alleging that their intellectual property rights have been violated under the Digital Millennium Copyright Act.

Many universities pass those letters directly on to students without questioning the veracity of the allegations. The R.I.A.A. in particular follows up some of those notices by threatening legal action and forcing alleged file-sharers into a financial settlement.

But the study, released today by Tadayoshi Kohno, an assistant professor, Michael Piatek, a graduate student, and Arvind Krishnamurthy, a research assistant professor, all at the University of Washington, argues that perhaps those takedown notices should be viewed more skeptically.
Complete article

The underlying study: "Challenges and Directions for Monitoring P2P File Sharing Networks – or – Why My Printer Received a DMCA Takedown Notice" By Michael Piatek, Tadayoshi Kohno, and Arvind Krishnamurthy (PDF)

Commentary & discussion:

Electronic Frontier Foundation
Linha Defensiva (Portuguese)


matt said...

On first glance, it appears that the monitoring methods for BitTorrent are very different from the monitoring methods used in these cases.

It appears that DMCA notices over suspected BitTorrent users are generated without downloading anything from the user. They just look at the list of IP addresses the tracker says are participating. My guess is that the process is mostly automated, so it's no surprise the process generates so many garbage complaints.

In the cases discussed on this blog, involving Gnutella, eMule, FastTrack, etc., the RIAA/MPAA or their agents claim to have downloaded and verified at least some of the content. Whether the verification was performed by a human is unclear, since MediaSentry is uncooperative with discovery requests, but it probably was. That would distinguish it from the DMCA spam mill that is the subject of this study.
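The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example addresses (taken from the 192.0.2.0/24 documentation range), and the idea of a "delivered data" set are hypothetical and do not describe any investigator's actual tooling.

```python
def flag_from_tracker_list(tracker_peers):
    """List-only approach: flag every address the tracker reports."""
    return set(tracker_peers)


def flag_after_download(tracker_peers, delivered_data):
    """Verified approach: flag only addresses that actually sent file data.

    delivered_data is a hypothetical set of addresses from which the
    monitor successfully downloaded at least part of the file.
    """
    return {ip for ip in tracker_peers if ip in delivered_data}


# Hypothetical swarm: two real peers plus a printer whose address was
# registered with the tracker but which never transfers anything.
tracker_peers = ["192.0.2.10", "192.0.2.11", "192.0.2.250"]
delivered_data = {"192.0.2.10"}

print(sorted(flag_from_tracker_list(tracker_peers)))
print(sorted(flag_after_download(tracker_peers, delivered_data)))
```

Under the list-only approach the printer at 192.0.2.250 is flagged along with everyone else; under the verified approach only the address that actually delivered data is.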

Nohwhere Man said...

Having given the paper a quick read, IMHO it was fairly well written. Nothing jumped out at me as bad science; the descriptions and technical conclusions all make sense, and are quite likely defensible.

It would be interesting to set up a 'honey-pot' node (using maybe a printer or a network monitoring box), wait for a takedown notice, and say "see you in court". It would be even more interesting to see the discovery request for the hard disk of a printer.

(30 yrs of messing about with computers)

Justin Olbrantz (Quantam) said...

I was considering submitting this story to you. I posted it on my blog a couple hours ago, with the following commentary:

This was actually a study I've been wanting to see done for some time. The other study that I think is very important but has not yet been done is to determine empirically how, on a system like eDonkey, where users search all peers for a certain file, the number of requests a single computer gets for a single file varies with the popularity of the file. The basis of this investigation is the claim by RIAA and others that users could be sharing thousands or millions of copies of each copyrighted work, therefore constitutional limitations on civil damage awards do not apply.

Clearly, files that are popular (e.g. the latest hit song) will be downloaded more (in total) than files which are unpopular. But does this mean any single computer will upload popular files significantly more often than unpopular files? I believe the answer is no, because the more popular files are not only downloaded more, they are also available from more computers. In theory, the increase in demand is accompanied by a proportionate increase in supply, keeping the ratio invariant regardless of demand. According to this belief, I have argued on forums (one example here) that most of the people the RIAA has sued have, according to simple probability, not uploaded more than a single copy of each file, on average (so about $0.70 of damage per file, if you assume 1 download = 1 lost sale, which itself is highly suspect).
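A toy calculation makes the supply-and-demand argument concrete. It is a sketch, not a measurement: it assumes that every downloader keeps sharing the file afterwards and that requests are spread evenly over all sharers. The function name and the download counts are hypothetical.

```python
def average_uploads_per_sharer(downloads: int, initial_sharers: int = 1) -> float:
    """Mean copies uploaded per sharing host under a toy model.

    Assumption (not from any measurement): every downloader keeps
    sharing the file afterwards, so supply grows with demand.
    """
    if downloads < 0 or initial_sharers < 1:
        raise ValueError("downloads must be >= 0 and initial_sharers >= 1")
    sharers = initial_sharers + downloads  # each download adds one sharer
    return downloads / sharers


PRICE_PER_TRACK = 0.70  # dollars, the per-file figure used above

for label, downloads in [("obscure file", 500), ("hit song", 1_000_000)]:
    avg = average_uploads_per_sharer(downloads)
    print(f"{label}: {avg:.4f} uploads per sharer, "
          f"about ${avg * PRICE_PER_TRACK:.2f} per file")
```

In this model the average stays below one upload per sharer no matter how popular the file is, because total uploads can never exceed the number of hosts that ended up with a copy. If many downloaders stop sharing right away, the average for those who remain would be higher, which is why the empirical study suggested above would matter.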

Anonymous said...

I'm not even close to an expert on the FastTrack network's protocol. It's a proprietary protocol, so I wasn't able to find much on it; you'll want to try to have the inner workings of the protocol clarified somehow. What's posted below is based on what I was able to find on the 'net about the protocol.

FastTrack reverse engineering doc

It seems that when performing a search on the FastTrack network your local KaZaA client will send its request to a supernode that aggregates the available files of all of its clients. The search is then performed on that supernode, and passed on to other supernodes that it knows about; at no point during a search operation does your client actually connect to the peers that are reported to have files available.

Furthermore, it looks like the only time that your client connects to other peers is when it's trying to download a file from said peer.

Of particular interest is that it looks like packet types 0x20 and 0x21 are used by your client to request a list of files from a peer. The important thing to note about this packet pair is that it's between you and the supernode; at no point does it contact the peer you want the file list of.

Based on this, the cached information about what a client is sharing is discarded as soon as the client disconnects from the supernode. However, without knowing the frequency of keep-alive pings, it's impossible to know how long it takes a supernode to identify that a client is no longer connected. If keep-alive pings are only sent every X minutes, then the stale data about a client will stick around on a supernode for at most X minutes after an unclean disconnect (killing off the KaZaA program without giving it a chance to properly close its TCP connections -- such as by a power outage, turning off your computer, your router/computer crashing, etc) from the supernode.

So, with that all in mind, the "mistimed reports" scenario of section 4.2 of the UWashington study is certainly plausible; especially in a University residence environment where DHCP leasing of IP addresses could potentially quickly reassign an IP when someone disconnects. How plausible depends very much on the amount of time between keep-alive pings from supernodes, though. The plausibility of this scenario is further increased if the RIAA investigators don't actually try to download the offending file from the peer; if they're just testing whether the IP address is still around via a ping that tells them nothing about whether the person's connected to the FastTrack network.
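The timing window can be written down as a small check. This is a hypothetical model, not FastTrack's documented behavior: it assumes a supernode drops a stale entry at most one keep-alive interval after an unclean disconnect, and that DHCP hands the address to a new machine a fixed delay after the disconnect. All names and numbers are made up for illustration.

```python
def could_be_misattributed(seconds_since_disconnect: float,
                           keepalive_interval: float,
                           dhcp_reassign_delay: float) -> bool:
    """True if a supernode listing observed this long after an unclean
    disconnect could point at an IP address that already has a new owner.

    Assumptions (hypothetical): the stale entry survives for at most one
    keep-alive interval, and the address is reassigned
    dhcp_reassign_delay seconds after the disconnect.
    """
    entry_still_listed = seconds_since_disconnect < keepalive_interval
    address_reassigned = seconds_since_disconnect >= dhcp_reassign_delay
    return entry_still_listed and address_reassigned


# Observation 2 minutes after the disconnect, address reassigned after 1 minute:
print(could_be_misattributed(120, keepalive_interval=300, dhcp_reassign_delay=60))
print(could_be_misattributed(120, keepalive_interval=90, dhcp_reassign_delay=60))
```

With a 5-minute keep-alive interval the stale listing overlaps the new owner of the address; with a 90-second interval it does not. The risk exists only when the keep-alive interval is longer than the DHCP reassignment delay.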

This all makes knowing more about FastTrack very important. To wit:
- How long does it take for a supernode to identify that a client has disconnected?
- Is it definitely the case that you only connect to another peer when trying to download from them?

It's also important to know the RIAA investigator's methodology. At the least:
- Given that search results aren't returned directly from each peer, and that a peer's file list isn't sent directly from said peer (unless the peer is a supernode and they connected directly to it), how do they verify that an IP address is sharing files? Ping/traceroute? Connecting through KaZaA and downloading?
- Do they record whether an offending peer is a supernode or an ordinary client node? If it's a supernode, and they got the file list directly from it, then the UWashington scenario is pretty much impossible. But, if it's an ordinary node then it's possible.


Rick Boatright said...

Sadly, Ray, this article has nothing at all to do with the various Gnutella variant (eMule, FastTrack, LimeWire, etc.) p2p programs.

In the BitTorrent p2p programs, any one computer may well be part of a "swarm" of computers and may or may _not_ actually participate in the act of being downloaded from.

On the other hand, the Gnutella variant programs establish a one-to-one relationship between the downloader and the uploader.

It's totally different technology.

Alter_Fritz said...

Well, it actually does not need expensive and dryly serious studies to come to such a conclusion.

Some oversimplified picture
found via let one reach similar conclusions too, I guess. ;-)

Justin Olbrantz (Quantam) said...

"It would be interesting to set up a 'honey-pot' node (using maybe a printer or a network monitoring box), wait for a takedown notice, and say "see you in court". It would be even more interesting to see the discovery request for the hard disk of a printer."

LOL @ setting up a honeypot and then suing for filing false DMCA notices. That would be beyond godly. And I bet it would very quickly bring an end to mass-mailing of DMCA notices.

kdsde said...

Those (upcoming) lawyers among you readers of this blog that already defend the real innocent dolphins like Mrs. Andersen, Mrs. Santangelo, and Mrs. Lindor these days against MAFIAA, while they are still reaping their "extortion" money from the p2p system of choice in use in 2004-2007, should be aware of the following:

Something that might be extremely noteworthy for you lawyers and could become important information in litigation in 3 or 4 years from now for alleged wrongdoings in 2008 by your "then clients":

Oh, and what also should be noted with respect to false positives of printers stealing Indiana Jones:

The tracker software (open tracker) that is used by one of the largest trackers in the world is known to have a setting that makes it report to peers bogus IPs that are not actually in the swarm and are not doing ANYthing even remotely resembling copyright infringement.

The German programmer of that software calls that feature “Perfect Deniability”

Everybody receiving a takedown notice should be made aware of that “defense”

Maybe has switched that feature to “on” too?
— Posted by kdsde

Anonymous said...

The study itself demonstrates that, in the real world, IP addresses can easily be spoofed: you can appear to be some other device on the network with little effort. This concept is applicable to more than just the BitTorrent discussed in the article. By extension, although not covered in the article, all you have to do to make your own computer appear to be that of your despised dorm rival down the hall is to reset your MAC address (easily done for most network adapters, and doable through having a simple router if you can't figure out how to do it on your own hardware) to match his MAC address. To your university, your computer now appears to be his computer when IP addresses are assigned and logs written. Spoofing of IP addresses is no longer some theoretical concept, as in: "Yes, it can happen, but how likely is it really, Your Honor?"

Justin Olbrantz, you make the excellent point that, in fact, the record companies may have suffered no damages at all by downloading because there is no indication at all that any download has ever equated to any lost sale. Other factors can explain completely the downturn in record sales (the rise of DVD games, the drop in quality of the music being sold, the insanely stubborn high prices of music CDs, the downturn in the economy, all competing for a limited pool of spending) without blaming all, or any, of it on filesharing.


StephenH said...

I believe that the RIAA and MPAA need to learn that IP address logs do not identify people the same way that DNA does, and the idea of having automated bots identify candidates for DMCA notices is bad, because they can easily reach innocent users.

This paper clearly shows the errors RIAA and others made when sending takedown notices, and how they can easily reach innocent victims. I personally believe that one should have at least some recourse rights if a DMCA notice reached an innocent victim. Personally if it were me, I would abolish the DMCA altogether.

The DMCA has done nothing positive for technological innovation.