Over the past two weeks I’ve made much progress on the next stage of my sfadb.com site — the compiling of all awards references and citations references into overall scores and rankings. One product of this effort will be Top 100 lists of SF novels, and fantasy/horror novels: ultimate lists based on data crunching of thousands of awards records and citation records. (These aren’t actually my primary goals; I’m more interested in developing some cool timelines, of these top 100s spread across the decades, and more elaborated timelines of top ranked books and short fiction within each calendar year.)
These are concepts I’ve had in mind for nearly 20 years… though they began as ideas about short fiction, to expand the crude rankings of reprint statistics of stories in anthologies, with data about awards. (Bill Contento’s Locus Index on the one hand, and Aurel Guillemette’s 1993 book on the other hand.)
Almost two years ago, in January 2016 (http://www.markrkelly.com/Blog/2016/01/25/syllabuses-and-sfadb-com-rankings/), I described how my approach would be similar to the Open Syllabus Explorer project, which compiled the syllabuses of thousands of colleges to see which titles were most mentioned. As I’ve worked on my own project, I’ve decided to drop the fourth step I described then: I will not scale the highest score in my rankings up to 100%, but will instead display the actual percentage for each title, that is, actual scores (or points) against possible scores (or points), given the year any particular book was published and, as the process has developed, the awards or citation sources that the book was actually eligible for.
And I’ll actually display a tickertape of abbreviated links, for each title, of all potential scorers and actual scorers. If some top ranked title, say, DUNE, gets only a combined percent score of 75%, then what were the 25% others who didn’t award or cite it? You will be able to see.
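As a rough illustration of the percentage-plus-tickertape idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the source codes (AwdA, ListC, and so on), the weights, the year ranges, and the use of parentheses to mark sources that could have scored a title but didn't. It is not the actual sfadb.com code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    code: str        # abbreviated label shown in the ticker (hypothetical)
    weight: int      # points for a win or citation (hypothetical)
    first_year: int  # first publication year the source covers
    last_year: int   # last publication year the source covers


def eligible(source, pub_year):
    """A source is a potential scorer only for years it covers."""
    return source.first_year <= pub_year <= source.last_year


def score_with_ticker(pub_year, cited_by, sources):
    """Return (percent, ticker) for one title.

    cited_by is the set of source codes that awarded or cited the title.
    """
    possible = [s for s in sources if eligible(s, pub_year)]
    actual = [s for s in possible if s.code in cited_by]
    possible_pts = sum(s.weight for s in possible)
    actual_pts = sum(s.weight for s in actual)
    percent = 100.0 * actual_pts / possible_pts if possible_pts else 0.0
    # Actual scorers appear bare; potential scorers that passed are in parentheses.
    ticker = " ".join(s.code if s in actual else f"({s.code})" for s in possible)
    return percent, ticker


# Made-up sources, purely for illustration.
SOURCES = [
    Source("AwdA", 3, 1953, 2017),
    Source("AwdB", 3, 1966, 2017),
    Source("ListC", 1, 1900, 1990),
    Source("ListD", 1, 1950, 2010),
]

percent, ticker = score_with_ticker(1965, {"AwdA", "ListC"}, SOURCES)
print(f"{percent:.0f}%  {ticker}")
```

In this toy case a 1965 title is not measured against AwdB, which only starts covering 1966, so its score is 4 of 5 possible points, and the ticker shows that ListD is the one source that passed on it.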
The process of developing these rankings has involved quite a bit of back and forth about the significance of ordering steps, about compiling points, adjusting them against potential points, and so on. One firm decision I’ve made is that the universe of books to be ranked must be divided by genre: science fiction vs. fantasy/horror. Mostly this is because the many awards and citation sources are heavily weighted toward SF. Ranking every book against all the sources of awards and citations that I’ve compiled would place very few fantasy/horror novels in any combined top 100. It makes much more sense to consider the genres as two separate realms. So far, I’ve worked through the steps for ranking SF titles first. I’m thinking that perhaps the F/H titles might be scaled in some way so that a merge of all genres would make sense. And similarly, eventually, for short fiction. (I have in mind, eventually, producing pages of rankings for individual authors that would merge book and short fiction rankings, to indicate for each author which are their truly best regarded works, of any length.)
I’m making much progress, as I’ve said, and this progress entails new insights about the process and what it means, almost daily. It helps to take long walks in the woods. Over the past two days I’ve developed steps for integrating ‘series’ titles into the overall rankings, most prominently The Foundation Trilogy and The Book of the New Sun. These are working out; on examination the results look intuitively correct, but I need to make sure they are statistically correct, given the mass of data I am crunching.
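One way a series merge could work is sketched below in Python. The allocation rule is an assumption chosen for the example, not necessarily the rule used on the site: a source counts as a possible scorer for the series if it was eligible for any volume (or for the series as a whole), and as an actual scorer if it cited any of them. The source codes and weights are made up.

```python
def merge_series(records):
    """Merge book records into one series record.

    Each record is a dict with 'possible' and 'actual' sets of source codes.
    Assumed rule: take the union of both sets across all records.
    """
    possible = set().union(*(r["possible"] for r in records))
    actual = set().union(*(r["actual"] for r in records))
    # Guard: a source can only score if it was a possible scorer.
    return {"possible": possible, "actual": actual & possible}


def percent(record, weights):
    """Actual points as a percentage of possible points."""
    possible_pts = sum(weights[s] for s in record["possible"])
    actual_pts = sum(weights[s] for s in record["actual"])
    return 100.0 * actual_pts / possible_pts if possible_pts else 0.0


# Hypothetical weights and records, for illustration only.
WEIGHTS = {"A": 3, "B": 3, "C": 1, "D": 1}
volume_1 = {"possible": {"A", "B", "C"}, "actual": {"A"}}
volume_2 = {"possible": {"B", "C", "D"}, "actual": {"C"}}
omnibus = {"possible": {"C", "D"}, "actual": {"D"}}

series = merge_series([volume_1, volume_2, omnibus])
print(sorted(series["actual"]), percent(series, WEIGHTS))
```

Using sets means a source that cites both a single volume and the whole series is counted once, which is one simple way to avoid double-counting points after a merge.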
As of today, the process for integrating the thousands of awards records and citation records I’ve compiled over the past 20 years, into a set of definitive top 100 lists, is as follows.
1. Tallying: raw counts of number of awards, number of citation sources
2. Weighing: total points from awards and sources, where points are weighted by significance of source
3. Scoping: separating SF and fantasy; not scoring many books against awards they’re not eligible for
4. Scaling: compute maximum possible points by genre and category
5. Merging: merging multiple book records into a single series record, and allocating actual and possible points appropriately
6. Scoring: calculate percentage of actual points to possible points
7. Ranking: sort records in descending order by score
8. Tracking: expanding records to examine which sources do and don’t contribute to every point total, for output onto the site
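To make the flow of these steps concrete, here is a compact Python sketch covering tallying, weighing, scoping, scaling, scoring and ranking. Series merging and tracking are left out for brevity. All titles, source codes, weights, genres and year ranges are invented for the example, and the tie-break by title is an arbitrary choice; this is an illustration under those assumptions, not the real pipeline.

```python
from collections import defaultdict

# Hypothetical sources: weight (step 2), genres covered (step 3), years covered.
SOURCES = {
    "AwdA": {"weight": 3, "genres": {"sf"}, "years": (1950, 2017)},
    "AwdB": {"weight": 3, "genres": {"fh"}, "years": (1975, 2017)},
    "ListC": {"weight": 1, "genres": {"sf"}, "years": (1900, 1990)},
    "ListD": {"weight": 1, "genres": {"sf", "fh"}, "years": (1900, 2017)},
}

BOOKS = [
    {"title": "Book One", "genre": "sf", "year": 1970},
    {"title": "Book Two", "genre": "sf", "year": 2000},
    {"title": "Book Three", "genre": "fh", "year": 1980},
]

# Raw award and citation records as (title, source code) pairs.
RECORDS = [
    ("Book One", "AwdA"), ("Book One", "ListD"),
    ("Book Two", "AwdA"), ("Book Two", "AwdB"),  # AwdB is out of scope for SF
    ("Book Three", "ListD"),
]


def rank(books, records, genre):
    # Step 1, tallying: collect the sources that named each title.
    cited = defaultdict(set)
    for title, code in records:
        cited[title].add(code)

    rows = []
    for book in books:
        if book["genre"] != genre:
            continue  # step 3, scoping: one genre at a time
        # Steps 3 and 4: sources this book was eligible for, and their total points.
        possible = {
            code for code, s in SOURCES.items()
            if genre in s["genres"] and s["years"][0] <= book["year"] <= s["years"][1]
        }
        actual = cited[book["title"]] & possible
        max_pts = sum(SOURCES[c]["weight"] for c in possible)
        pts = sum(SOURCES[c]["weight"] for c in actual)  # step 2, weighing
        # Step 6, scoring: percentage of actual to possible points.
        score = 100.0 * pts / max_pts if max_pts else 0.0
        rows.append((book["title"], score))

    # Step 7, ranking: descending by score, ties broken by title.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


sf_ranking = rank(BOOKS, RECORDS, "sf")
fh_ranking = rank(BOOKS, RECORDS, "fh")
print(sf_ranking)
print(fh_ranking)
```

Because each book is measured only against the sources it was eligible for, a recent title with fewer possible scorers is not penalized relative to an older one, which is the point of scoring by percentage rather than by raw points.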
I think I can finish final rankings of SF and F/H books within another couple weeks, certainly by the end of this year. But doing the same for short fiction is a separate task, for next year.