I suppose that a necessary prerequisite for us to be intelligent beings is that we do not forget every piece of information, data or knowledge we ever come into possession of. In other words, remembering (retaining) is important. On the other hand, since we are in some way finite (our memory and processing capacities are), forgetting (i.e. purging some of what we come into possession of) is unavoidable. Forgetting ((The thoughts in this post are a result of taking part in an interdisciplinary workshop which took place in Schaffhausen, Switzerland. Lawyers, historians, philosophers, technologists, engineers, archivists and psychologists, all in all under 20 individuals, gathered to discuss forgetting and remembering in the digital age. The workshop was organized by a research group of Information Law experts of the Univ. of St Gallen. Format, group size and disciplinary plurality made for an interesting and intensive two days.)) in this sense is mostly undesirable. In the long run we are guaranteed to have more forgotten information than remembered, since the past has an infinite supply from the future. Interestingly, our inability to forget certain pieces of information is also undesirable, and often detrimental, when it comes to erroneous knowledge or traumatic memories. Forgetting can also be immoral (the Germans should never forget National Socialism). Remembering can usually not be immoral, although when tied to forgiveness it could be (we can legitimately ask how long we will have to remind the Germans of their past). In the science fiction film Eternal Sunshine of the Spotless Mind we learn that the downside of forgetting something undesirable is that we forget its properties, including that it was harmful, thus exposing ourselves to repeating the same mistakes.

In science, forgetting is a powerful tool. I will discuss some examples from Mathematics (Category Theory, Invariant Theory, Game Theory) and some from Data Science (data aggregation and visualization).

In Category Theory, a mathematical field which provides what popular lingo would call a helicopter view of mathematics, one views mathematics as a process in which we map one mathematical field (its objects and its morphisms) onto another. The tool with which we do this is called a functor, and functors can have properties such as faithful or forgetful. The main point is that we use functors to forget complexity, thus translating intractable problems into solvable ones. We then work in a new category where we solve problems, which in turn solve some of the problems in the category we started from. If we then want more, we may construct other functors. In other words, functors are a strategy of forgetting in order to gain an advantage. Invariants work similarly in mathematics. If we want to classify some objects we typically compute some algebraic “signature” they have. Whenever the signatures are different, the objects are different too. If the signatures are the same, our computations are inconclusive. When this happens for two different geometric objects we need a new tool that forgets less of the original structure. These are two examples where we strategically use forgetting in order to gain an advantage. What we do is forget enough of the intractable original structure to translate it into one that is suitable for computations, but not so much as to empty it of all content.
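
The logic of invariants described above can be made concrete with a small sketch. It uses graphs rather than geometric objects, and the two signatures chosen (the degree sequence and the number of connected components) as well as all function names are assumptions made purely for illustration. The first signature forgets a lot and is sometimes inconclusive; the second forgets less and separates a pair the first one cannot.

```python
from collections import Counter


def degree_sequence(edges):
    """Coarse signature: sorted vertex degrees.

    Forgets vertex names and which vertices are actually joined.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())


def component_count(edges):
    """Number of connected components, found by depth-first search."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = set()
    count = 0
    for start in adjacency:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return count


def finer_signature(edges):
    """Signature that forgets less: degrees plus number of components."""
    return (degree_sequence(edges), component_count(edges))


def compare(edges_a, edges_b, signature=degree_sequence):
    """Different signatures prove the objects differ; equal ones prove nothing."""
    if signature(edges_a) != signature(edges_b):
        return "different"
    return "inconclusive"


# Hypothetical example graphs, given as edge lists.
path = [(1, 2), (2, 3), (3, 4)]
star = [("a", "b"), ("a", "c"), ("a", "d")]
hexagon = [(i, (i + 1) % 6) for i in range(6)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

print(compare(path, star))                               # different
print(compare(hexagon, two_triangles))                   # inconclusive
print(compare(hexagon, two_triangles, finer_signature))  # different
```

The hexagon and the pair of triangles both have six vertices of degree two, so the coarse signature cannot tell them apart, while the finer one can because it remembers connectivity.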

In Appendix A of this Game Theory paper you may see a proof that, in a repeated fixed game between two players, the one who attempts to devise a strategy based on a longer memory has no advantage over a more forgetful player. This is another case where forgetting is harmless, if not advantageous.

Forgetting is a common strategy in the data world as well. Every time we aggregate some variables we forget details in order to have a better view of the basic picture. Data visualization is another form of strategic forgetting. For example, if we have a large body of emails, drawing the directed or undirected graph of who sent email to whom, and labeling each edge with the number of emails exchanged, is a form of forgetting all but certain information, which helps us “understand” the data in an initial approach. Macroeconomics is the application of forgetful techniques to Microeconomics.
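
The email example above can be sketched in a few lines. The messages, field names and function names below are made up for illustration; the point is only that the aggregation keeps sender, recipient and a count, and forgets subjects, bodies and ordering.

```python
from collections import Counter

# Hypothetical raw data: each message carries far more than the graph keeps.
emails = [
    {"sender": "alice", "recipient": "bob", "subject": "Budget", "body": "Draft attached."},
    {"sender": "bob", "recipient": "alice", "subject": "Re: Budget", "body": "Looks fine."},
    {"sender": "alice", "recipient": "bob", "subject": "Lunch", "body": "Noon?"},
    {"sender": "carol", "recipient": "alice", "subject": "Minutes", "body": "See below."},
]


def directed_edge_counts(messages):
    """Edge (sender, recipient) labeled with the number of emails sent."""
    return Counter((m["sender"], m["recipient"]) for m in messages)


def undirected_edge_counts(messages):
    """Edge {a, b} labeled with the number of emails exchanged either way."""
    return Counter(frozenset((m["sender"], m["recipient"])) for m in messages)


directed = directed_edge_counts(emails)
undirected = undirected_edge_counts(emails)

print(directed[("alice", "bob")])                  # 2
print(undirected[frozenset(("alice", "bob"))])     # 3
```

The undirected version forgets one thing more than the directed one, namely who wrote to whom, which is why its count for alice and bob is larger.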

I hope that I have managed to sketch some of the subtleties of remembering and forgetting so far. In our digital age the dramatic drop in the cost of data storage allows us to save enormous amounts of raw micro data. In fact we save so much data that we on occasion seem to get a glimpse of what a computable universe would be like. At the time of writing this note I was following the manhunt in Boston after the bombing of the city’s marathon. As photo and video data from hundreds of cameras and smartphones came in, the two culprits were identified not long after the dramatic event, giving us the feeling that we can do effective forensics of any event anywhere, turning the world into a computable universe in the sense of digital physics.

I think it does not take much to convince ourselves that we cannot possibly save all the data that there is, because that would mean that we would need a universe large enough to fit all of its past, an idea we can easily dismiss as impossible. So no matter how much data we save, we will always need to make choices about what to retain and what to forget. On the other hand, the more data we save the more it looks like the stream of noise this data came from. It is like viewing an expressionist painting from closer and closer: from a distance you “see” a picture whose broad strokes allow you to complete the missing details, thus creating an illusion of precision. As you get closer to the painting you see more fine brush strokes and detail, but that amount of precision turns the detailed brush strokes into noise: you see no meaningful picture any more. As we save more and more data, our ability to make sense of it all will increasingly rely on techniques of forgetting (aggregating, visualizing etc.), which will have to be carried out by robots and automata. Two things are important to keep in mind in this context. Firstly, as we will be parsing the growing volumes of data with machines, which selectively forget some aspects to allow us an expressionist understanding of the total, we will increasingly run the risk of powerful nonlinearities: displaying a top ten list helps make the list permanent. Google’s psychic feature is a prime example of this. Secondly, as we create more and more aggregates of the data (in order to also respect the privacy of individuals), it will be like writing more and more equations on the space of all individual attributes. It will not take long before we over-determine the space of solutions and are thus able to solve for the individual. This means that eventually the sensitive attributes will be computable. The normative and legal relevance is evident.
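
The second point, that enough aggregates eventually let us solve for the individual, can be illustrated with a toy sketch. The three salaries, the choice of published group totals and the function name are all hypothetical, and the solver is plain Gauss-Jordan elimination that assumes the published totals are consistent with each other. Each published aggregate is one linear equation in the hidden individual values.

```python
from fractions import Fraction


def solve_for_individuals(memberships, totals):
    """Recover individual values from group totals, if they are pinned down.

    memberships[i][j] is 1 if person j is counted in aggregate i, else 0.
    Returns the unique solution, or None if the aggregates published so far
    do not determine every individual. Assumes consistent totals.
    """
    n = len(memberships[0])
    rows = [[Fraction(x) for x in row] + [Fraction(t)]
            for row, t in zip(memberships, totals)]
    pivot_row = 0
    for col in range(n):
        pivot = next((r for r in range(pivot_row, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            return None  # this person's value is not yet determined
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                scale = rows[r][col]
                rows[r] = [x - scale * y
                           for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return [rows[i][n] for i in range(n)]


# Hidden salaries of three people (hypothetical): 50, 70, 90.
# Only group totals are published, never an individual value.
two_aggregates = ([[1, 1, 1], [1, 1, 0]], [210, 120])
three_aggregates = ([[1, 1, 1], [1, 1, 0], [0, 1, 1]], [210, 120, 160])

print(solve_for_individuals(*two_aggregates))    # None
print(solve_for_individuals(*three_aggregates) == [50, 70, 90])  # True
```

With two totals the individuals are still hidden; the third, seemingly harmless total is the equation that makes every salary computable.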

A final remark on the exponential growth of the data we remember today. If we center our viewpoint on our own present, changes look dramatic like never before. This is deceptive. Exponential functions have an important property, which is called scale invariance. This basically means that while the past looks boring in a scaling suitable to our present, a scaling chosen to understand a point of the boring past will reveal that changes were just as dramatic back then. For example, the impact of the invention of the printing press by Gutenberg in Mainz in 1450 on information replication and dissemination was about as disruptive as computing is today. Stated in reverse, current changes will pale in comparison to future ones. This means that, as much as it is true that the amounts of data we remember are huge, they are also tiny. There is another way in which the amount of data we remember is insufficient. Despite all the data we save, we are still unable to know what is happening in the economy in real time. We learn a first approximation of the quarterly GDP with months of delay, and the number changes for years afterwards.
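
The scale invariance remark above can be checked numerically with a small sketch. The growth rate, window length and function names are arbitrary assumptions; the property is read here in the sense that moving the viewpoint along an exponential curve only rescales it, so every window looks the same once it is measured against its own endpoint.

```python
import math


def growth(t, rate=0.5):
    """Exponential growth curve; the rate is an arbitrary illustrative choice."""
    return math.exp(rate * t)


def rescaled_window(start, length=4, rate=0.5):
    """Values over a window, divided by the value at the window's end.

    This is the curve as seen by an observer whose "present" is start + length.
    """
    end = start + length
    return [growth(start + k, rate) / growth(end, rate)
            for k in range(length + 1)]


distant_past = rescaled_window(start=0)
recent_past = rescaled_window(start=40)

# The two observers see the same shape, up to floating point error.
same_shape = all(math.isclose(a, b, rel_tol=1e-9)
                 for a, b in zip(distant_past, recent_past))
print(same_shape)  # True
```

Seen from its own present, the window starting at 0 is exactly as steep as the one starting at 40, even though in absolute terms the earlier values are vanishingly small next to the later ones.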