What to pseudonymize?¹ ²

As a general rule, all personal data should be pseudonymised. Directive 95/46/EC of the European Union (hereinafter the "Data Protection Directive") defines personal data as "any information relating to an identified or identifiable natural person". An identifiable person is one who can be identified, directly or indirectly, not only by means of an identification number but also by one or more factors specific to his or her physical, physiological, mental, economic, cultural or social identity. Special care must therefore be taken to pseudonymise not only obviously personal information, but also information that can enable a person to be identified via secondary characteristics, e.g. locations or behaviour patterns.
In general terms, the Data Protection Directive aims to ensure the protection of fundamental rights and freedoms of individuals, in particular privacy, with regard to the processing of personal data.

The definition of "personal data" consists of four elements, and examining each of them in turn shows what the term covers. These four elements are:

1. any information
2. relating to
3. an identified or identifiable
4. natural person

1. Any information

With the wording "any information", the Data Protection Directive signals that the term "personal data" is to be understood as broadly as possible, i.e. it calls for a wide interpretation. "Personal data" can thus be any kind of information about a person. This can be objective information, such as body height or medical findings, but also subjective information, such as opinions or assessments by and about a person. Assessments that companies make about customers or employees, such as "Titius is a reliable borrower" or "Titius is a good employee and deserves a promotion", are examples of this. Such information does not have to be true or proven to be considered personal data.

2. Relating to

In general, information relates to a person when it is information about that person. In many situations, this link is easy to establish: the information in personnel files or medical records clearly refers to a specific person, and the same applies to video or audio recordings. However, there are also many situations in which it is not easy to determine whether certain information relates to a person. In principle, in these cases, "data relates to an individual if it refers to the identity, characteristics or behaviour of an individual or if such information is used to determine or influence the way in which that person is treated or evaluated³."

An example of this is the location monitoring of taxis to improve the quality of service. Even if the monitoring serves solely this purpose, it can have an indirect impact on the drivers. Suppose a taxi company tracks its taxis by GPS from its headquarters in order to assign each customer to the nearest taxi, saving time and fuel. In principle, it is the data on the vehicles that is of interest, not the data on the drivers; the purpose of the collection is not to assess or monitor whether drivers take the shortest routes or obey speed limits. Nevertheless, the same system could be used to do exactly that. Because this processing can therefore have a significant impact on individuals, the data concerned may be classified as "personal data" relating to the drivers and should accordingly be subject to data protection rules.
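The dispatch scenario can be sketched in a few lines of Python to show how data collected about vehicles turns into information about drivers. Everything here is hypothetical: the coordinates, vehicle IDs, shift roster and function names are invented for illustration, and the distance uses the standard haversine approximation rather than any particular dispatch system's method.

```python
import math

# Hypothetical GPS fixes, keyed by vehicle ID only (no driver names).
positions = {
    "TAXI-1": (48.137, 11.575),
    "TAXI-2": (48.150, 11.560),
    "TAXI-3": (48.120, 11.600),
}

# Hypothetical shift roster kept elsewhere in the company.
roster = {"TAXI-1": "Driver A", "TAXI-2": "Driver B", "TAXI-3": "Driver C"}


def distance_km(a, b):
    """Approximate great-circle distance (haversine formula) in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_taxi(customer, fleet):
    """Intended purpose: pick the vehicle closest to the customer."""
    return min(fleet, key=lambda vid: distance_km(fleet[vid], customer))


# Intended use: dispatching. Only vehicle IDs are involved.
vehicle = nearest_taxi((48.140, 11.570), positions)

# Possible further use: joining with the roster turns the same position
# data into information about where a named driver was.
driver_locations = {roster[vid]: pos for vid, pos in positions.items()}
```

Nothing in `positions` names a driver; the link to individuals is created solely by the join with `roster`, which is why such vehicle data may still relate to the drivers.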

3. An identified or identifiable

A natural person is considered "identified" if he or she is distinguished from all other members of a group of persons, i.e. singled out unambiguously. If a data collection contains several persons with the same name and no further information is available that makes one of them uniquely identifiable, these persons are not "identified", since it cannot be determined to which natural person a given data record refers. An "identifiable" person, by contrast, is a natural person who has not yet been identified but whose identity it is in principle possible to establish.

Article 2(a) of the Data Protection Directive sets out the general characteristics of "personal data" in this respect. Such data exist when a natural person "can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity". A person can be identified directly by his or her name, or indirectly by a telephone number, car registration number or insurance number. Identification is equally possible through a combination of significant criteria, such as place of residence, occupation or age group, which narrows down the group to which the person belongs. The information needed for identification therefore depends on the context: while a common family name is not sufficient to identify a person within a national population, within a school class it may well be.

The category of "indirectly identified or identifiable persons" thus refers to identification by means of unique combinations of characteristics. How many characteristics such a combination involves does not matter: depending on the context, a few characteristics may suffice, or a very large number may be required. Even if the existing data do not allow an immediate conclusion about specific persons, these persons can still be identifiable, because the existing information, combined with other information, could make a clear identification possible. It is irrelevant whether the data controller holds this additional information or whether it is available to the controller at all.
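The effect of combining characteristics can be illustrated with a short Python sketch that counts how often each combination of place of residence, occupation and age group occurs. The records, field names and function are hypothetical. A combination that occurs only once singles out one person even though no single field does; this group-size count is the idea behind k-anonymity, a technique named here for illustration and not one prescribed by the Directive.

```python
from collections import Counter

# Hypothetical records without names, containing only "significant criteria".
records = [
    {"residence": "Linz", "occupation": "teacher", "age_group": "30-39"},
    {"residence": "Linz", "occupation": "teacher", "age_group": "30-39"},
    {"residence": "Linz", "occupation": "nurse", "age_group": "30-39"},
    {"residence": "Graz", "occupation": "teacher", "age_group": "50-59"},
]


def unique_combinations(rows, keys):
    """Return the attribute combinations that occur exactly once in rows.

    A record whose combination is unique can single out one person,
    even though no single attribute identifies anyone on its own.
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


quasi_identifiers = ["residence", "occupation", "age_group"]
singled_out = unique_combinations(records, quasi_identifiers)
```

With these records, residence alone already singles out the person from Graz (`unique_combinations(records, ["residence"])` returns `[("Graz",)]`), mirroring the family-name example: whether a characteristic identifies someone depends on the group it is evaluated against.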

"Whereas to determine whether a person is identifiable", as defined in the Data Protection Directive, "account should be
taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person." This means that the mere hypothetical possibility to identify a person is not sufficient to consider the person identifiable. If this possibility does not exist or is negligible, then the person is not to be classified as "identifiable" and the corresponding information is not considered "personal data". However, this classification must be seen as a dynamic process. What is not possible at the current time, with the current state of the technology, may well be feasible at a later date. Consequently, the potential development of the technical possibility should be taken into account in relation to the time period during which the data is stored. For example, if the data is kept for one month and identification is almost impossible during this period, the data is not considered "personal data". However, if the retention period is 10 years and identification could become possible during this period, this possibility should be taken into account by the data controller. In case of a future possibility of identification, the data shall be considered as personal data from that moment on.

4. Natural person

The provisions of the Data Protection Directive apply to natural persons. The right to the protection of personal data therefore applies to all people generally, not just to nationals or residents of a particular country. A more precise definition of legal personality is found in the civil law of the individual Member States. In general, it refers to the capacity to be a party to legal relationships, which begins at birth and ends at death. Personal data is therefore data relating to identified or identifiable living persons.

Since it is not always possible to determine whether a person is still alive, it is simpler in practice to treat data on deceased persons in the same way as data on living persons, rather than processing the two groups separately. Furthermore, information about deceased persons can also relate to living persons. For example, information about a hereditary disease of a deceased father can allow conclusions to be drawn about his children. If information about a deceased person also relates to a living person and is personal data about that living person, it may therefore fall indirectly under the protection of data protection rules.

Since the definition of "personal data" refers in principle only to human beings, i.e. natural persons, legal persons fall outside the scope of the Data Protection Directive, and its protection does not apply to them. An exception arises where information on a legal person relates to a natural person; in such cases, the data must be considered personal data. An example is a company whose name is derived from the name of a natural person.

Some EU Member States (e.g. Italy, Austria and Luxembourg) have extended the scope of their national laws implementing the Data Protection Directive to the processing of information on legal persons.

As in the case of deceased persons, it may be easier in practice to process all information in accordance with data protection rules, rather than differentiating between legal and natural persons.

Limits of the scope of application

Although the definitions of "personal data" and "processing" in the Data Protection Directive are intentionally broad, it does not follow that every case in which specific individuals could hypothetically be identified is a case to which the Directive applies. Its scope is restricted, for example, where data are processed manually and are not part of a structured filing system. The purpose of use may also restrict its application, for example where data are processed by a natural person exclusively in the course of personal or family activities.

Even if the processing of personal data falls within the scope of the Data Protection Directive, not all of its provisions necessarily apply to the specific situation. A number of provisions of the Directive provide a high degree of flexibility to strike a balance between the protection of data subjects' rights and the legitimate interests of controllers, third parties and the public interest, if any.⁴

The protection offered by the Data Protection Directive is therefore directed against forms of processing that typically allow "easy access to the data", and against the risks that go along with them. For example, if the names of individuals in a dataset are replaced by unique codes (e.g. name → X4321), there is a risk of re-identification if someone gains access to the key used for the coding. Such possibilities must therefore be taken into account. In principle, the test here is "whether the persons can be identified taking into account all the means likely reasonably to be used by the controller or any other person". Relevant considerations include the risk of a hacking attack, the likelihood that a member of the organisation responsible for the dataset discloses the key, and the feasibility of indirect identification. These considerations then determine whether the information should be classified as "personal data".
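Replacing names with codes while keeping a separate key can be sketched in Python. This sketch assumes a keyed hash (HMAC-SHA256 from the standard library) rather than encryption or a random lookup table; the key, field names and helper functions are hypothetical. It shows why the key matters: whoever holds it can recompute the code for candidate names and thereby reverse the pseudonymisation.

```python
import hashlib
import hmac


def pseudonymize(name: str, key: bytes) -> str:
    """Replace a name with a code derived from it via HMAC-SHA256.

    The code is truncated to 16 hex characters for readability; a shorter
    code would raise the chance of two names receiving the same code.
    """
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "X" + digest[:16].upper()


def reidentify(code: str, candidates: list[str], key: bytes) -> list[str]:
    """Re-identification as performed by anyone who holds (or obtains) the key."""
    return [name for name in candidates if pseudonymize(name, key) == code]


# Hypothetical key; in practice it would be stored separately from the data.
secret_key = b"hypothetical-secret-key"

dataset = [{"name": "Titius", "credit_rating": "reliable"}]
pseudonymized = [
    {**row, "name": pseudonymize(row["name"], secret_key)} for row in dataset
]
```

Because the code is derived deterministically from the name, the same person always receives the same code. This keeps records linkable across datasets, which is useful for analysis, but it also means that anyone who obtains the key and can guess candidate names can recover the original names.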
If the codes are not unique, but, for example, the same code number is used for different persons in different cities and for data from different years (i.e. the coding only distinguishes persons within one city in one year), the persons can only be identified if it is known to which year and city the data refer. If this additional information is no longer available and cannot be recovered with reasonable effort, the data could be considered as not relating to identifiable persons, and data protection rules would consequently not apply.
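The case of codes that are unique only within one city and one year can be sketched as follows. The key tables, names and the code `C17` are hypothetical; the sketch only shows that the same code resolves to a single person when the city and year are known, and to several candidates when they are not.

```python
# Hypothetical key tables: each code is unique only within one (city, year) pair.
key_tables = {
    ("Vienna", 2019): {"C17": "Titius"},
    ("Vienna", 2020): {"C17": "Gaius"},
    ("Salzburg", 2019): {"C17": "Seius"},
}


def resolve(code, city=None, year=None):
    """Return every person a code could refer to, given the known context.

    Passing None for city or year means that piece of context is unknown.
    """
    matches = []
    for (table_city, table_year), table in key_tables.items():
        if city is not None and table_city != city:
            continue
        if year is not None and table_year != year:
            continue
        if code in table:
            matches.append(table[code])
    return sorted(matches)


print(resolve("C17", "Vienna", 2019))  # ['Titius']
print(resolve("C17"))                  # ['Gaius', 'Seius', 'Titius']
```

If the city and year can no longer be determined with reasonable effort, a record carrying only `C17` cannot be attributed to one person.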

As has been shown, there are situations in which information is not to be classified as "personal data". This is the case if the information does not relate to an individual or if the individual is not considered to be identified or identifiable. If the information does not fall under the definition of "personal data", the Data Protection Directive does not apply.

However, if the Data Protection Directive does not apply, national data protection laws may apply in certain circumstances.

Examples of personal data

Direct and indirect information

name and first name
date of birth or age
place of birth
home address
e-mail address
identification numbers
- national insurance number
- tax identification number
- health insurance number
- identity card number
- matriculation number
- etc.
ethnic and cultural origin
political, religious and philosophical beliefs
health
sexuality
trade union membership
location data
IP address
cookie identifier
advertising identifier of a mobile phone
credit card data
bank details
- account numbers
- credit information
- account balances
- etc.
property characteristics
- vehicle and property ownership
- land register entries
- vehicle registration plates
- registration data
- etc.
customer number
information affecting private and family life in the strict sense of the term
all kinds of activities, such as those related to employment, economic or social behaviour
biometric data
- biological characteristics
- physiological characteristics
- facial features
- gender
- skin, hair and eye colour
- stature
- clothing size
- fingerprints
- eye retina
- face shape
- voice
- hand geometry
- vein structure
- reproducible actions
- etc.
special skills or other behavioural characteristics
- handwritten signature
- keystroke dynamics
- characteristic gait
- manner of speaking
- etc.
data on consumers, patients, customers, employees, etc.
information on drug prescriptions
- identification number of the medicinal product
- name
- active substance content
- manufacturer
- sales price
- new pack or refill
- reasons for use
- reasons for not using generics
- first and last name of prescribing doctor
- telephone number
- This applies whether the information takes the form of an individual prescription or of patterns recognisable across several prescriptions. Even if the patient is anonymous, such information can be personal data about the prescribing doctor.
- etc.
value judgements
- school and work certificates
- etc.

Formats or carriers of the information

information in alphabetical, numerical, graphic, photographic, aural or any other form
information on paper and digital information
Sound and image data in particular are to be considered personal data because they may constitute information about a natural person.

Links

EU Commission → What is personal data?
EU Commission → What personal data is considered sensitive?

¹ The following information is based on Opinion 4/2007 on the concept of personal data of the Article 29 Working Party of the European Union.
² The text on this page is intended only as an overview and introduction to the topic of "personal data" and its protection and is not a legally binding treatise. In this sense, the author assumes no liability and makes no claim to completeness or accuracy.
³ Article 29 Working Party document WP 105: "Working document on data protection issues related to RFID technology", adopted on 19.1.2005, p. 8. For detailed information, see Opinion 4/2007 on the concept of personal data, p. 9 ff.
⁴ For further information, see Opinion 4/2007 on the concept of personal data of the Article 29 Working Party of the European Union, p. 5.
