For a generation, federal sentencing policy-makers have been preoccupied with the ideal of national uniformity: the ideal that federal judges in Milwaukee should sentence the same way as federal judges in Maine and Miami. I’m a long-time skeptic of this ideal; since most of the impact of most crime is local, why shouldn’t local needs and values determine the punishment? But even I am troubled by judge-to-judge disparities within a single federal courthouse. The random assignment of a case to one judge instead of another should not govern the punishment.
Although there has been plenty of anecdotal evidence of such local disparity, it has been very hard to quantify because of a longstanding agreement between the U.S. Sentencing Commission and the federal judiciary that blocks the release of judge-specific sentencing data. However, thanks to a great deal of painstaking effort by the Transactional Records Access Clearinghouse (TRAC), it is now possible to analyze the sentencing practices of individual judges.
Earlier this year, TRAC made waves with a public announcement of which districts had the greatest inter-judge disparity. However, TRAC’s methodology was sharply criticized, and with good reason. More recently, TRAC published a new and improved version of its report at 25 Fed. Sent. Rep. 6 (2012).
So, which cities have the greatest disparities?
According to TRAC, the five cities with the largest drug sentencing disparities are: (1) Atlanta, (2) Chicago, (3) Dallas, (4) New Orleans, and (5) Philadelphia. The five cities with the largest white-collar disparities are: (1) Chicago, (2) Baltimore, (3) Portland, Oregon, (4) Atlanta, and (5) Grand Rapids. Chicago and Atlanta, of course, stand out as the only two cities on both lists.
These rankings are based on a simple comparison of the median sentences of the most severe and most lenient judges in the courthouse. For instance, the median drug sentence of the most severe judge in Atlanta was 144 months, while the median of the most lenient was 54 months. The difference of 90 months was sufficient to put Atlanta at the top of the drug disparity list, significantly ahead of second-place Chicago at 72 months.
The new version of the TRAC report is published alongside some very thoughtful and perceptive commentary by Paul Hofer (25 Fed. Sent. Rep. 37). (Paul is a sentencing policy analyst for the Federal Public Defenders and — full disclosure — a fellow editor of the Federal Sentencing Reporter.)
As Paul points out, TRAC’s ranking methodology still seems unsatisfactory. In essence, TRAC’s approach pays attention to only two judges in a courthouse: the most severe and the most lenient. This might be okay if every federal courthouse were like Milwaukee’s, with its four Article III judges. But bigger cities may have several times as many sentencers, which means that focusing on just two may give a very misleading picture of what is happening in the courthouse. For all that can be gleaned from the TRAC report, it may be that all but one of the judges in Atlanta are sentencing in a consistent fashion; that one outlier, however, would be enough to push the entire courthouse to the top of the disparity list.
The ranking methodology, moreover, is skewed against big cities: the more judges there are in a city, the more likely it is that there will be one extreme outlier. For that reason, it is not surprising that places like Atlanta and Chicago stand out on the disparity list.
It is more curious that Grand Rapids shows up on the white-collar list. One imagines there is an interesting story behind that. Then, too, bigness does not doom a city to a high rank; Los Angeles is only tenth on the drug disparity list, and New York does not appear at all in the top ten.
More useful than the city rankings may be TRAC’s overall assessment of local disparity in the federal system. TRAC found that 61 percent of federal courthouses had statistically significant differences in the median sentence between the most and least severe judges. This is a notable finding that does move the disparity debate beyond anecdote.
Still, the finding needs to be evaluated with caution. For one thing, most federal courthouses were excluded from this bottom-line result because they did not have at least two judges who met the qualifying criteria (e.g., sentenced at least fifty defendants between 2007 and 2011). Obviously, there cannot be measurable disparity unless there are at least two judges to compare, so TRAC’s decision makes sense. But policy-makers should nonetheless bear in mind that the 61 percent figure paints an incomplete picture of the federal system, leaving out many smaller courthouses (such as the one-judge Green Bay courthouse here in Wisconsin) where disparity is not a problem.
Additionally, policy-makers should bear in mind that the 61 percent figure reflects all courthouses that meet a rather minimal disparity threshold. Again, larger courthouses are disadvantaged by TRAC’s methodology. A single outlier among a couple of dozen sentencing judges in a courthouse, while certainly regrettable, may not be a sufficiently serious problem to warrant significant, systemwide reforms.
Paul Hofer also highlights a couple of additional caveats. First, prosecutorial charging and plea-bargaining decisions obviously have a big impact on sentences, and these decisions are not necessarily made in consistent ways from judge to judge. For instance, if Prosecutor A thinks it is a pain to try cases in front of Judge X, but enjoys appearing before Judge Y, Prosecutor A may be more inclined to offer generous plea deals in her cases with X than with Y. The resulting disparity would be a concern, but it could not be effectively addressed (and might even be exacerbated) by cracking down on judicial sentencing discretion.
Second, to quote Paul,
[T]ests of statistical significance assume independent, random assignment of each subject to each condition. Although cases are assigned randomly, sentencing data are for defendants. Thus sentences are not the result of strictly random assignment. If a judge happens to draw a large, multidefendant case that results in markedly more or less severe sentences than average, what appears to be a judge effect may actually be the result of this case assignment fluke. (41)
Paul’s article also includes some broader reflections on the politics of disparity data. Unfortunately, purported new demonstrations of disparity typically become occasions for politicians to demand new restrictions on judicial discretion and stronger requirements that judges adhere to the sentencing guidelines. However, the politicians usually ignore the role of prosecutor-created disparity, which (as suggested above) may actually be made worse by some judge-focused reforms. Politicians also typically assume that guidelines compliance is necessarily desirable. There are, however, some things that may be worse than disparity, including excessive severity. Thus, for instance, I find it hard to fault the judges who began to use their enhanced discretion under United States v. Booker (2005) to impose below-guidelines sentences in crack cases, even at the cost of greater judge-to-judge disparity. Those judges were softening what Congress itself recognized as a major injustice five years later, when it passed the Fair Sentencing Act of 2010. Good sentencing requires a balancing of various considerations; there are no absolutes, not even uniformity within the courthouse.