Trust, But Verify - My Take on Sources
Original Date of Publication: Tuesday, April 13, 2010 4:32:53 PM
I always sound defensive when I talk about this, so let me make one thing clear - I am not any smarter or any harder working when I research than anyone else. I have experience, though. I have been burned so badly on sources, and I have learned a very harsh lesson I have never forgotten.
My first SOM projects were in both football and baseball, and they went pretty well. For 1957 I had a compilation right from the HOF that was solid out of the box. For the 1983 USFL I had their guide, and to get special teams materials I reconstructed team summaries from the box scores. There were only 72 boxes to enter, so this was easy. There were not many mistakes. Most of the stuff was complete, and therefore simple.
Then came the greatest mess-up I have ever made: my first attempt at the 1992 Japanese season you see here later in the blog. I had good stuff for 1976, but for 1992 there was one mistake in the league summary, a listed batting average for the league of .261 instead of .251. Since I start to frame a league using the averages, this little typo was devastating to 1992. I got to the very end, and I had four hundred batters and a hundred and sixty pitchers all individualized, but try as I might, when I ran the season it would not work. The normalization was off. I could not get the teams to balance.
In baseball, you can pretty easily approximate sabermetrically the defensive outcomes for teams from their pitching lines, sum them, and compare these to the like summary of the total offensive numbers. And the key thing here, Fred's First Law of Doing Baseball Teams, is that the Sabermetrics Have to Match (and the same is true in football) in order to proceed. No league carded from incomplete data will work (compile tabulations correctly) when you fire it up as a simulation.
I did not do this for 1992 at first. Only after I had chucked four months of solid work and started over did I reconstruct every team, and then I found that one typo - ONE typo (!) - that had killed the season’s simulation space. I watched it compile replay after replay that failed. It was a harsh lesson, and 1992 waited another two years for me to finish it (Lenny Durrant would remember this).
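For anyone who wants to see what that kind of early check could look like, here is a minimal Python sketch. It is not the process actually used on 1992. The team lines are invented placeholders (only the .261 and .251 figures come from the story above), and the function names and tolerance are assumptions made up for illustration. The idea is simply to rebuild the league average from the team lines and compare it with the listed figure before carding anything.

```python
# Illustrative only: these team lines are invented, NOT the real 1992 data.
# Each tuple is (at_bats, hits) for one hypothetical team.
TEAM_LINES = [
    (4000, 1000),
    (4000, 1010),
    (4000, 1002),
]

def league_average(team_lines):
    """Recompute the league batting average from team at-bats and hits."""
    at_bats = sum(ab for ab, _ in team_lines)
    hits = sum(h for _, h in team_lines)
    return hits / at_bats

def listed_average_matches(team_lines, listed, tolerance=0.0005):
    """True if the listed league average agrees with the recomputed one.

    The tolerance (half a point of average) is an assumed rounding allowance.
    """
    return abs(league_average(team_lines) - listed) <= tolerance

print(round(league_average(TEAM_LINES), 3))         # recomputed average
print(listed_average_matches(TEAM_LINES, 0.261))    # the typo'd listing
print(listed_average_matches(TEAM_LINES, 0.251))    # the corrected listing
```

With these placeholder lines the typo'd .261 fails the check and .251 passes, which is the whole point: the listed summary figure gets tested against the parts before it is used to frame the league.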
I have a list of source types that I have posted before, and I will revisit it here in order to clarify how I think about data. Simply put, I don’t have the time or energy to revisit every mistake Pro Football Reference has made or will make every time someone gets a game result they don't like and cannot confirm it there. Their data constantly changes. This sounds defensive to customers and I know that, but there has to be some “change control” process here, or this interaction will not work well for either side.
But on the other hand, and the Brammer guys know this after the Great Gino Marchetti Mistake of 1958, where my head rotated a full 360 degrees in front of the whole tournament field when I saw the cards for the first time, I will strike with great force on obvious typos or key mistakes that are confirmed. I think they were wondering if I was actually going to convert all ~100 kg of my mass to pure energy right there in that room.
Here goes:
*Game accounts I can watch or listen to (nothing beats film);
If I can see players on the field in a game film from a certain date, or a good radio game account tells me something, I take that as read. Nothing else is as good as that.
*Game accounts from a coach;
Coaches' notes tend to be well researched. Coaches are also much better sources than players.
*Anything from the HOF;
If I request something from them and they say - this is the best we have - that source is going to tend to trump any other materials I might have. That's not to say they cannot make mistakes, though. For many years, in their list of record team offensive leaders in passing, they had the Rams' 1950 figure of 3709 yards, over 300 yards per game, as the record. They thought this was the net figure. Well, that is the LA gross passing figure. I found in a statistical guide from the era (the fifties were a boom time for written promotional annuals) that the Rams had actually lost 151 yards on sacks, so the net is lower. Even the HOF was surprised when I pointed this out to them.
*The Sporting News or a good PFRA article
The Sporting News has a very high standard for stats, usually. Good PFRA materials can be useful, mostly as background. Stats can come from good or bad sources there.
*Local newspapers
Just under TSN, as they are not always standardized on how they report materials.
*Listed articles from inserts, programs, or lineup cards;
Since these are written BEFORE the game is played, lineups reported there may NOT account for injuries or late scratches. A lot of times these can be good for raising questions, looking for alternative positions, and confirming certain items you have seen elsewhere.
*Team summaries (it depends on the team, Cleveland or the Giants definitely, the Cardinals not so much);
There are just some teams that are very sensitive to their history and maintain a very high standard in protecting it. They may have dedicated staff to assist people who ask about their history. The Miami Dolphins and Cleveland Browns are relentless; both teams have collections of materials that have been written about their teams and franchises. The Giants have an archivist, and so do the Patriots. Some do not seem to cherish their history as much, or have moved so often their materials have been lost. Arizona is in this camp, although I think they are getting better. I usually use team sources to confirm; to me they do not act as a primary source.
*Encyclopedias, Media Guides, Yearbooks;
I can give some good and bad examples of encyclopedias. On that Forum right now someone is asking about the '63 Chargers. My Neft and Cohen has no blocked punt data for the entire 1963 AFL. But it is highly unlikely an entire league punted over five hundred times in their third season without a blocked punt. My point is that if other sources parrot that data, it adds a confusing element to the story. When I see this I will look elsewhere for that data, or assume, if there is none, that every team had a blocked punt.
For the WFL my primary source is Maher and Speck's WFL Encyclopedia, but I have had to correct it. It is not bad, and for the data they had, it is probably excellent. It whips the more "professional" USFL Guides. But if I had edited it, I would have checked the simple things: do the compiled team offensive numbers match the totals of all the quarterbacks? Do the points for and against each team equal the sums of their scores? These are stats 101 items and on occasion they come up short.
-At one point, after a simple cleanup, the passing stats on the offensive side came up 59 yards short of the defensive side, out of 19,000-plus passing yards. Boy, I searched a while for that. It turned out this was the exact passing yardage for Jim Ettinger, third QB for the San Antonio Wings. I looked at the game accounts and he was a backup who came in for one last drive in a blowout game against the Charlotte Hornets. This was missed in the summary; they only compiled Johnny Walton's starting numbers. That kind of a mistake - the game's final drive, in garbage time - is very easy to make and entirely understandable. I've made it in my own writeups. In finding it I learned something about both teams.
-1975 Tommy Reamon is listed in this guide as having 144 carries for 278 yards for an average of 1.9 yards per carry and 5 TD with a longest of 44 yards. That might be the most ineffectual line ever posted by a starting halfback for a winning team, and what a strat card THAT would be.
Since the totals for the running backs for 6-5 Jacksonville with this figure come up exactly 200 yards short of the team figure, it is not hard to see that this line ought to be 144 carries for 478 yards. You check the game accounts and sure enough, the summary figure is off. Too bad for George Mira, the starting QB, who would have had a Manning or Brady like card to compensate for that, carrying a winning team with no halfback.
-The summary line for 1974 Birmingham has one PR TD but Willie Smith has two in his individual line.
What I am saying is even good summary data needs to be checked on occasion. Make sure the offense matches the defense. Trust, but verify.
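The Tommy Reamon check above can be sketched in a few lines of Python. This is only an illustration of the reconciliation idea, not a tool I actually use. Reamon's 144 carries and 278 (listed) or 478 (corrected) yards come from the example above; the "other rushers" line and the team total are hypothetical placeholders chosen so the gap comes out to the 200 yards described, and the function names are made up.

```python
# Reamon's line is from the text above. Everything marked HYPOTHETICAL is a
# placeholder, not real 1975 Jacksonville data.
REAMON_LISTED = ("Tommy Reamon", 144, 278)      # (name, carries, yards) as listed
REAMON_CORRECTED = ("Tommy Reamon", 144, 478)   # after checking game accounts
OTHER_RUSHERS = ("All other rushers", 300, 1200)  # HYPOTHETICAL combined line
TEAM_RUSHING_YARDS = 1678                         # HYPOTHETICAL team total

def team_gap(team_total, individual_lines):
    """Team total minus the sum of the individual yardage lines."""
    return team_total - sum(yards for _, _, yards in individual_lines)

def yards_per_carry(carries, yards):
    """Average per carry, rounded to one decimal as a guide would list it."""
    return round(yards / carries, 1)

# With the listed line, the individuals come up short of the team figure.
print(team_gap(TEAM_RUSHING_YARDS, [REAMON_LISTED, OTHER_RUSHERS]))
# With the corrected line, the individuals and the team total agree.
print(team_gap(TEAM_RUSHING_YARDS, [REAMON_CORRECTED, OTHER_RUSHERS]))
# The average is the other tell: compare the listed and corrected lines.
print(yards_per_carry(144, 278), yards_per_carry(144, 478))
```

A nonzero gap does not say which line is wrong, only that something is. A gap that is a round number like 200, next to a line with an implausible average, is a strong hint about where to look in the game accounts.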
*A book written about the season (although The First Season and Instant Replay were excellent Packer accounts).
These are better for background (who started at what position and then got benched, etc.) than as a primary source. Instant Replay was a play-by-play account, though; even without the Lions' names, that's a pretty good source.
*Player accounts (which are generally worthless for gaming purposes because players are not investigative journalists and they sometimes confuse years and teammates);
I read a PFRA article about the 1950 Rams where the interviewee was Tom Fears. Fears would have been in his 70s, and while he had some nice insights, it took me about twenty seconds to figure out that while the interviewer had asked specifically about the 1950 team, Fears had answered with observations from 1951 and 1954 as well. Some of the observations he made just did not occur in that season, and some teammates should have come in later in the narrative.
Players played, and their discussions of the game and opponents are very valuable, but taken as reporters, I do not think in general the details matter as much to them as they would to us as potential design inputs.
*Internet sources, which may pull confusing and conflicting materials from any of the above with no attribution. I trust nfl.com on modern materials, maybe not so much on historical materials without checking another source.
Here we get to the crux of the matter - the vast majority of people who are looking at the details start on the internet, with, say, PFR, and end there, but these sources to me are execrable at best. I will - put simply - NOT - accept them as the final answer.
They are hand entered, often unsupported or poorly so, and they will pull information from any or all of the above without attribution, so there is no way to know how consistent they have been in their research. If you want proof just enter in the offensive data for all of the players from 1958. Now get the NFL's Official Guide and count the mistakes. We trip over this in checking cards: longest runs, dots, these get missed here.
I compiled playbooks from nfl.com last year and that is sometimes no better. They listed Bryant Johnson of the Lions as a tight end for half the season. He is and was a WR, and he has many cards in the game as a WR. This was a mistake. The moment I see these kinds of inconsistencies I am going to pull up team blogs to settle disputes, find a newspaper account of the position struggle, or start watching those games myself. As good as they are, they are not the final answer.
The final answer usually is a balance of information from more than one place. People who seek certitude may not like this observation, but it is my take on the situation as I see it. If you have an observation drawn from the bottom of the above list, it is good to also check whether something higher up the list squares with it.
Fred
Edit - 1963 San Diego Chargers
The question is usually "is this a mistake?" To me this means - was this entered in error?
It looks to me that the SOM card data tracks nicely from the team results vs known punts in Neft and Cohen. If you subtract the punter's results from the team results you get the blocked punts:
1963 AFL Team / Neft and Cohen difference / SOM blocks
Boston-2 (SOM 12 block)
Buffalo-1 (SOM 12 block)
Denver-3 (SOM 11 block)
Houston-0 (SOM no block on card)
KC-1 (SOM 12 block)
NY-1 (SOM 12 block)
Oak-1 (SOM 12 block)
SD-1 (SOM 12 block)
So the 1963 Charger block result is not a "mistake". Clearly there are no typographical errors here. If Houston had a block - that could be a typo, in this context. Now whether or not the choice of that source (Neft and Cohen) was "mistaken" - that is another question. If it is, well, we have the same issue then for other seasons.
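The comparison in that list can be written out as a small Python check. This is just a sketch of the reasoning, with made-up function and variable names. The numbers are the ones listed above: the Neft and Cohen difference (team punts minus punter punts) and the block entry shown on each SOM card, with None standing for "no block on card".

```python
# Values copied from the list above.
NEFT_COHEN_DIFF = {
    "Boston": 2, "Buffalo": 1, "Denver": 3, "Houston": 0,
    "KC": 1, "NY": 1, "Oak": 1, "SD": 1,
}
# Block entry as listed for each SOM card; None means no block on the card.
SOM_BLOCK = {
    "Boston": 12, "Buffalo": 12, "Denver": 11, "Houston": None,
    "KC": 12, "NY": 12, "Oak": 12, "SD": 12,
}

def inconsistent_teams(diffs, card_blocks):
    """Teams where the card disagrees with the source.

    A team is flagged if the source implies at least one blocked punt but the
    card has no block, or the source implies none but the card has one.
    """
    flagged = []
    for team, diff in diffs.items():
        has_block_on_card = card_blocks[team] is not None
        if (diff > 0) != has_block_on_card:
            flagged.append(team)
    return flagged

print(inconsistent_teams(NEFT_COHEN_DIFF, SOM_BLOCK))  # empty list: cards track the source

# The "what if Houston had a block" case from the text:
houston_with_block = dict(SOM_BLOCK, Houston=12)
print(inconsistent_teams(NEFT_COHEN_DIFF, houston_with_block))
```

Note what this does and does not test. It tests whether the cards were entered consistently with the chosen source. It says nothing about whether the source itself is right, which is the separate question raised above.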
I know I have used similar results in a pinch for other fifties teams. I also know N&C is not perfect - if you tried to use the same trick for 1959, the punters and team results match perfectly, indicating no blocks by this method, which as I have said elsewhere is very unlikely. And I have the 1960 Record and Rules Manual ('59 data), and Philadelphia, Green Bay, and LA had blocked punts.
Lee Segall put it best - a man with one watch knows the time, the man with two watches is never quite sure. Sometimes you have to say - I trust this source and it is what I am going to go with. This may just be an area where there is going to be uncomfortable disagreement - complete and accurate old football and baseball data is by its nature a pain to get. But having done this for a while, I'm inclined to trust a consistent approach.
I wish we could get to the point where people could trust those who are doing this work. That does not seem to be the way this customer base prefers to handle things.
Fred
1-16-2026
Addendum: you’ll hear me say something is a slowly changing dimension. As an example, your address is not a fact, it’s a slowly changing dimension. At the moment I confirm it, your address is a fact. But addresses change, you may not have the same address in six months and you may not have lived there ten years ago. I have to also stipulate if it’s your home address, work address, ship to address, etc.
Your birthday is a fact; it’s not changing. I don’t have to state any context alongside your birthday.
The problem is that to make teams you have to settle on a set of facts, from different sources in some cases, and make the determination that a season can be modeled. I have everything I need to do a 1974, 1975, or 1976 CFL season, except that sacks aren’t official in the CFL until 1981 either. So unless I can find something from their Hall of Fame that can stand in for these numbers with some degree of reliability, that kind of project is on standby.
For something like sacks, which are unofficial data prior to 1982 in the NFL, any compilation is likely to be a slowly changing dimension, meaning it changes over time as more people look at it. When I first did a 1972 Miami set for Glenn in 1989 (!) I used data right from the Dolphins media guide. I got 1978 Pittsburgh’s data from the Steelers. I see now, looking in the game, that these changed; it’s not a big deal because pass rush points allocations have some leeway anyway and those team totals in the game seem right. But dimensions like these always have a caveat - they were the best representation at the time the season was made.
If you can’t state this caveat, well, what happens if you base a determination on just an internet source? Say some guy is shown with x sacks when originally you had him at zero, and so you revise the season - but then the number gets changed back to zero on the internet a year later. This can happen; it has happened. You’d be chasing your tail with revisions because the source itself is unreliable. This is what change control means.
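For readers who have not run into the term, here is a minimal Python sketch of what treating a number as a slowly changing dimension can look like. Everything in it is hypothetical: the player, the sack counts, the dates, the source labels, and the names `Observation` and `value_as_of` are all invented for illustration. The point is only that each value is stored with its source and the date it was confirmed, so you can always answer "what did we believe when the season was made?"

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Observation:
    """One recorded value, with the source and the date it was confirmed."""
    subject: str
    value: int
    source: str
    as_of: date

def value_as_of(history, subject, when):
    """Most recent observation for a subject on or before a given date.

    Returns None if nothing had been recorded by then.
    """
    candidates = [o for o in history if o.subject == subject and o.as_of <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o.as_of)

# HYPOTHETICAL history: a sack total that flips on an internet source.
history = [
    Observation("Player A sacks", 0, "team media guide", date(1989, 1, 1)),
    Observation("Player A sacks", 3, "internet source", date(2020, 1, 1)),
    Observation("Player A sacks", 0, "internet source", date(2021, 1, 1)),
]

made = value_as_of(history, "Player A sacks", date(1990, 6, 1))
print(made.value, made.source)   # what the season was built on
later = value_as_of(history, "Player A sacks", date(2020, 6, 1))
print(later.value, later.source)  # what a revision would have chased
```

Nothing is overwritten in this scheme. A revision becomes a deliberate decision to prefer a newer observation, not something that happens every time a website changes.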
Fred (again)
