On the dangers of numerologist pettifoggers

I read with surprise, and then horror, the statements by the president of the Camera civile di Prato (the local civil bar association), Duccio Balestri, published in «Il Tirreno» on 15/04/2020. Rattling off numbers is in no way a sign of having understood their meaning and, more precisely, their deadly implications. Balestri mistakes effect for cause: if the mortality rate has remained comparable to the seasonal average in central and southern Italy, that is due to three factors. The first is that the virus did not spread significantly outside the outbreaks in the North. As a brilliant Basel-based scientist has shown, the simple exercise of counting the obituaries in «L'Eco di Bergamo» clearly demonstrates that the mortality rate in Bergamo is many times higher than average. Put in plainer terms, in three weeks as many people died as would normally die in six months. Considering the quarantine regime, and therefore the reduced chances of dying from other causes (road and workplace accidents, and so on), I would ask Balestri if he could kindly tell us what these people died of. The second factor is closely tied to the first: the virus has not caused many deaths outside the outbreak areas because of the quarantine. In short, the quarantine has very likely kept the outbreaks contained, preventing them from spreading and causing a generalized increase in mortality. Third: by focusing on the mortality rate in the first three months of 2020, Balestri ignores that deaths appear in the statistics with a delay of several weeks, namely the time it takes for infected people to fall ill and finally die, often after days of agony. Given that as of March 1st the reported number of positive cases was just 1,577 and the deaths 34, it is obvious that the national three-month mortality rate does not yet reflect the effects of the virus.
If instead one uses the weekly data broken down by region and age group, the numbers Balestri seems so comfortable with show incontrovertibly that across northern Italy people, elderly and not, are dying at at least twice the normal rate. Finally, the rhetorical device of discounting deaths with comorbidities (that is, pre-existing conditions) does not deserve a detailed rebuttal; suffice it to say that it sounds like a pettifogger's argument made on the backs of those who have lost their loved ones.
Now that the quarantine period will change into a new normal made of face masks and safety distances, we must not in any way let ourselves be misled by those who use numbers clumsily and irresponsibly. It has been proven that this virus kills when it is not taken seriously. Instead of giving credence to numerologists, we should give space to public health experts, who can help set up containment measures and widespread surveillance. It will be a long and exhausting effort, but one well within our means. We just need to stop listening to self-appointed experts who fail to understand that numbers are not merely to be recited, but understood.

Anecdotal evidence of the effectiveness of the Italian lockdown

Are Italians complying with the lockdown aimed at reducing the spread of COVID-19? It’s yesterday’s news that aggregated cell tower location data has been used by Lombardy’s governor to threaten harsher measures for those who are out and about without a legitimate reason. According to this data, movements of more than 500 meters have dropped by more than 60%. This agrees with a previous study, in which a team of Italian scientists used location data from users who had opted in to a program called “Data for good”, showing that movements between counties decreased by at least 50% nationwide since the lockdown. This perhaps demonstrates how a granular data approach can be used to track the effectiveness of public health policies with limited compromises to individual privacy.

Turns out even website traffic data might be sufficient to draw similar conclusions. I happen to manage a small website that helps people automate a bureaucratic task in Italy; since it has had consistent traffic over the last few months, it might be a good dataset to look at. The assumption is that, with people forced at home and less involved in bureaucratic tasks, they would visit the website less after the lockdown. That is indeed the case:

Another question is whether cities with a larger number of cases would see a bigger drop in visits; looking at four major cities (one near the epicenter, the others less affected) in this data, that does not seem to be the case:
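As a minimal sketch of this kind of comparison (the visit counts and city labels below are made up for illustration, not the actual analytics behind this post):

```python
# Hypothetical daily visit counts; a real analysis would pull these
# from the website's analytics export.
from statistics import mean

def relative_drop(before, after):
    """Fractional reduction in mean daily visits after the lockdown."""
    return 1 - mean(after) / mean(before)

# Two illustrative windows of daily visits, pre- and post-lockdown.
visits = {
    "city near epicenter": ([120, 115, 130, 125], [60, 55, 58, 62]),
    "less affected city":  ([200, 210, 190, 205], [100, 95, 105, 98]),
}

for city, (before, after) in visits.items():
    print(f"{city}: visits down {relative_drop(before, after):.0%}")
```

With made-up numbers like these, both cities show a comparable ~50% drop, which is the kind of pattern described above.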

It would be interesting to model what fraction of the population would need to agree to voluntary tracking in order to enact finer-grained lockdown strategies aimed at confirmed cases and their close contacts. Given how privacy is valued in the EU, this might be a difficult task, although things might have changed already.

Fall trips (2019)

Social media posts should be impermanent

It’s probably fair to say that social media is at a turning point in western societies. The perception of the balance between its benefits and its risks has shifted considerably since its involvement in the outcomes of the Brexit referendum, the US presidential elections and other nasty things around the world[1]. Although many people rightfully point out that these events can only partly be blamed on social media and its use by political parties and foreign agents, these companies nonetheless face a choice: reassure users and governments with meaningful changes, or face harsher regulations[2]. Some of them have already started to change, with “spin” like Apple championing its focus on privacy and Google recently announcing that users will be able to routinely delete data older than three months from its servers.

I am absolutely in favor of features like these; like many people who have benefited from the web in terms of free access to data, communities and social relations, I find it difficult to completely “go native” on technology, and I welcome the opportunity to have a little more control over the information tech companies have access to.

Many have commented on the self-destructing artwork by Banksy, which is both an example of exercising control over one’s persona and, at the same time, not. Others have taken this concept to the extreme, where control actually meant disappearing entirely. The “pictures for sad children” webcomic disappeared leaving nothing but a video of a cloudy sky behind, and street artist Blu deleted all of his works in the city of Bologna as a form of protest.

Blu deleting one of his walls in Bologna

While you can still find the deleted comic strips and photos of the erased graffiti online, the possibility of control puts these artists in an enviable position from the perspective of social media users. I’d argue that giving even just the illusion of control would put the big internet companies in a more favorable position in the eyes of the public. For those of us who lack the motive or courage to resort to the “nuclear option” and disappear completely, the solution might be making all social media posts self-destruct after a few months. The fact that more and more hacks are being developed to do just that[3] is an indication that in the future we may get to be like Banksy for the proverbial 15 minutes.
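The core of such a self-destruct policy is simple to sketch. What follows is a generic illustration, not the actual script mentioned in the footnotes; the thresholds are examples, and any real version would wire this up to a platform’s API to fetch and delete posts:

```python
# Sketch of a generic self-destruct policy: keep only the newest N posts,
# and among those, drop anything older than a given age. The post dicts
# are hypothetical stand-ins for whatever a real API client returns.
from datetime import datetime, timedelta

def select_expired(posts, keep_latest=500, max_age_days=180, now=None):
    """Return the posts that should self-destruct."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_age_days)
    ordered = sorted(posts, key=lambda p: p["created_at"], reverse=True)
    expired = ordered[keep_latest:]                  # beyond the keep limit
    expired += [p for p in ordered[:keep_latest]
                if p["created_at"] < cutoff]         # kept, but too old
    return expired
```

Run periodically (say, once a day), this keeps an account converging on a bounded, recent slice of its history.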

[1] – including the election of the first “true” populist government in western Europe

[2] – whether the GDPR nudged these companies ahead of these scandals is perhaps an open question

[3] – e.g. I use a modified version of this script to keep “only” 500 tweets and 145 favorites on my account at any given time. A similar hack might be available for Facebook.

Can a biologist fix a smartphone?

About a month ago two unrelated events happened:

Now that my smartphone's touchscreen right side is having hardware issues and I can't write Os and Ms anymore I can finally appreciate Georges Perec's 300-page lipogram

Luckily, the Fairphone 2 is designed to be easy to repair; I ordered a new screen for about 100 Euros[2] and, once it arrived, I could swap it in in under 5 minutes without having to unscrew a single bolt. While I was waiting for the new screen, I tried to see whether I could still use the phone without its screen attached. Given that non-phone devices running Android exist, I wasn’t too surprised to see that the phone booted up fine and even allowed me to send WhatsApp messages using the web interface, meaning that it was connecting to wifi/cellular.

At the beginning of the presentation I poked fun at our own experiment by referencing a seminal opinion piece called “Can a biologist fix a radio?”. In it, the author challenges the way much of molecular biology is conducted; that is, by taking each gene out and looking for the ones that make the biological system “stop working”. The radio analogy shows how sequentially removing each resistor wouldn’t necessarily lead to a proper understanding of how a radio works, absent a more formal electrical engineering approach (or systems biology, at the other end of the analogy).

Even though I did manage to fix my smartphone without having to fully understand how it works, the fact that it was still “working” without its screen points to another potential pitfall in biological studies: are we able to tell when a system is broken (or no longer “fit”, in biological jargon)? If I had chosen the ability to connect to a wifi or cellular network and exchange data as a (fair enough!) measure of the fitness of my phone in the “laboratory setting” of my desk, I would not have realized that it wouldn’t be of much use for browsing the web or sending a message when away from my computer. The situation would be even worse if I used the green LED that turns on while charging, which would miss problems with the antenna, the camera, and many other components, including software. We can intuitively tell what the “fitness function” of man-made artifacts is, while for biological systems we can only take educated guesses and hope that our measurements are relevant outside the lab.

Anyway, in our study we used colony size on solid agar as the fitness measure for each gene knockout; the study is now available as a preprint.

[1] with financial support from an eLife ECR travel grant

[2] that’s 80 Euros plus an expensive 20 Euros shipping charge

Murcia (2018)

Molecular biology is the new radioactivity

Culture plays an important role in shaping our society. Throughout history, stories, poems and songs have commented on the world around them, but they have also inspired change through self-reflection. This seems to partly satisfy a basic human need for a “structured” way to interpret our world. More recently, the existence of communications media able to reach virtually everyone in the world has allowed some forms of culture to dominate. The prime example is so-called pop culture: mostly movies, but also comic books, books and, more recently, even memes.

If one assumes that pop culture as a whole can be a reflection of how its creators perceive our world and its future, looking at overarching trends might prove interesting. For instance, how is pop culture perceiving science and its influence on society? An easy starting point is a hallmark of the last century which still has a strong influence on pop culture: World War II. Every year a number of movies use it as a setting, either to revive some sort of nationalistic pride or for plain old “entertainment”. I believe there’s something more to it, related to the huge influence the war has had on essentially all branches of science, from rocketry to cryptography. One development in particular has haunted pop culture for decades: nuclear physics. Even a superficial survey of a few manga should make clear how nuclear explosions and fallout have been sublimated in many forms. But it doesn’t stop in Japan; American comic books from the 50s up to the 90s have featured superheroes whose origins are overwhelmingly due to the effects of radioactivity. It’s easy to conclude that the fear and anticipation of its impact on society were translated into pop culture. Whether this influence stopped after the Chernobyl and Three Mile Island incidents is perhaps an open question.

Can you tell what this looks like? (from Akira)

What is pop culture now perceiving as the current scientific bleeding edge? I’m tempted to say that molecular biology is the new radioactivity. Reflecting the gradual takeover of the biological sciences by molecular biology in the decades after the discovery of the structure of DNA, pop culture has started to display increasing anticipation and fear about its impact on society. The hallmark of this shift is most likely the ’93 movie Jurassic Park, showcasing the power and ultimate hubris of “tinkering with mother nature”. The very same “radioactive” American superheroes have gradually been rebooted (multiple times over), their origins rewritten to stem from some failed experiment with human/animal/alien DNA.

Bingo! Dino DNA

If dinosaurs and spider-men are perhaps very crude examples, two recent works of pop culture have caught my attention for being slightly more sophisticated. The first is the movie Annihilation, which by itself is a great exercise in distilling and focusing an existing book. In it, Alex Garland (writer/director) poses a simple question: what would a cancer look like if it came in the shape of an extra-terrestrial entity? How would plants and animals change when absorbed by a tumor the size of a natural park? Despite a few sentences here and there that will certainly make a hardcore molecular biologist cringe, the movie brilliantly succeeds in portraying the irrational and senseless threat of cancer.

The second example is unlikely to have made the international stage, and its connection to molecular biology is rather comical for those following genomics discussions on twitter: the Italian TV series “The Miracle” (“Il Miracolo”). Incidentally, the series also comments on the uncertainties of the EU experiment: on the verge of an “Italexit” referendum, the prime minister is informed that a statue of the Virgin Mary weeping liters of blood every hour has been found[1]. Apart from the effect that revealing this unexplained phenomenon would have on the world and on the referendum itself, the first question that comes to mind is: “whose blood is it?”. Instead of crossing the sequence data with government and public repositories (as law enforcement is increasingly doing), the authors came up with a dodgy online service that could predict… facial features from DNA. I have no clue as to whether the authors were aware of the “facial features prediction” paper, but its critics would be happy to know that it didn’t work so well in this TV series. I’ll leave it to the reader to figure out which other controversial molecular biology technique ended up working here.

[1] it sounds very silly, but Italy has an established history of weeping Virgin Marys, and of prime ministers gambling their careers on a referendum

Methods section driven reproducibility

A cornerstone of the scientific method has always been the ability to draw the same conclusions from the execution of different experiments. I would very much like to say that there is a consensus in the scientific community on what to call such a process, but unfortunately that doesn’t seem to be the case. The terms “reproducibility”, “replicability” and “robustness” are often used interchangeably, and different people might rank them differently depending on how they interpret them. Luckily, a recent paper cleverly proposed to stick to “reproducibility” to describe the process as a whole and to name its different flavors by adding a prefix. In short, Goodman et al. indicate the following kinds of reproducibility in science (the short summaries are mine):

  • Methods reproducibility: giving sufficient details about the experimental procedure and the processing of the data so that the “exact same” results can be obtained
  • Results reproducibility: carrying out an independent study with “very similar” procedure and obtaining “close enough” results as the original study
  • Inferential reproducibility: drawing “qualitatively” similar results from independent studies or a reanalysis of the same data

In the specific area of computational biology, the requirements to meet these three objectives can be more precisely defined:

  • Methods reproducibility: providing “machine code” that gives exactly the same output given the same input
  • Results reproducibility: providing all the relevant details about the algorithms used so that they can be re-run/reimplemented and give quantitatively similar results on the same or different data
  • Inferential reproducibility: providing an interpretation of the results of an experiment so that it can be qualitatively compared with another study

It’s easy to see that the latter flavor of reproducibility is the most valuable, as reaching the same conclusions using different data or even completely different experimental strategies can sometimes provide further support by itself. Needless to say, it is also the one that requires the most work and resources to achieve.
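The difference between the first two flavors can be stated as a toy check; the 5% tolerance below is an arbitrary example, not a standard from any particular field:

```python
# Toy illustration: methods reproducibility demands bit-identical output
# from the same code on the same input; results reproducibility demands
# quantitative agreement, within tolerance, from an independent rerun or
# reimplementation.
import math

def methods_reproducible(original, rerun):
    """Same code, same input: the outputs must match exactly."""
    return original == rerun

def results_reproducible(original, reimplementation, rel_tol=0.05):
    """Independent implementation: agreement within a relative tolerance."""
    return math.isclose(original, reimplementation, rel_tol=rel_tol)
```

A reimplementation that lands within tolerance passes the second check while almost certainly failing the first, which is exactly the distinction at stake in the next paragraph.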

Methods reproducibility has become pretty fashionable in computational biology; many journals explicitly request that authors deposit all computer code as supplementary material. The extreme case is providing either VMs or so-called containers to ensure that the specific computing environment does not alter the final result, leading to perfect methods reproducibility. This is an important thing to aspire to, especially to avoid scientific fraud (or bona fide errors), and many people have proposed technologies to make it relatively easy to achieve. Despite all this, I believe that in many cases the emphasis should be on achieving better results reproducibility over perfect methods reproducibility. This usually comes in the form of none other than the good old methods section of a paper[1]. If the algorithms used in an experiment are explained in sufficient detail, it becomes (relatively) trivial to reimplement them and produce very similar results on different data, thus reproducing (in the “results” sense) the original paper. More interestingly, writing an implementation of an algorithm from scratch is a great exercise and a great way to properly understand how a method works, not to mention an opportunity to improve it. In fact, I recently had to reimplement some algorithms that were very well described in other papers’ methods sections (part of this, and the whole of this with some help)[2]. In the process I understood the algorithms better and ended up making improvements and extensions. It has also convinced me that reimplementing an algorithm from a paper could be an interesting part of a computational biology class. All of this is simply not possible through methods reproducibility alone, unless a thorough inspection of the source code is made, which in many cases can be a true nightmare.
Even the most advanced container technology or programming language will eventually fade, but a well-written couple of paragraphs will continue on for a long time.
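As a small worked example of this exercise (Pearson correlation stands in here for whichever algorithm a methods section spells out; it is not one of the methods mentioned above):

```python
# Reimplementing a method directly from its written description, the way
# one would from a well-written methods section, and then checking
# results-level agreement against known cases.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, written from the textbook formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A few lines of prose fully specify the computation, and checking the reimplementation against a reference on shared inputs is what “results reproducibility” amounts to in practice.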

[1] or the documentation of your software package, or a chapter of a book or an interactive blog post

[2] our former colleague Omar was particularly good at reimplementing existing methods to make them more user friendly and extensible, like SIFT or motif-x

Mexico (2017)

An alternative cover for “Teoria della classe disagiata”

An alternative cover for “Teoria della classe disagiata” by Raffaele Alberto Ventura. In the English version of this post I tried to explain what this essay is about and why it would be interesting to see it translated, so that it gets discussed as widely as possible, at least in Europe. Luckily, better writers than me have read and commented on the book, so rather than translating my muddled explanation I’ll just leave a few links for those who want to dig deeper. Happy reading!

Original cover