Privacy concerns with big data from probe vehicle systems
As the Internet becomes the basic infrastructure for digital communication, one of the most important issues we need to address urgently is building a global service infrastructure for society based on our activities and mobility.
Cutting-edge network technologies continue to evolve at remarkable speed, creating an environment in which we carry out various activities transparently rather than only from fixed locations. In such a mobility-aware environment, infrastructure and services based on mobility have become a necessity for transparent and optimal activities. With this focus on mobile activity, collecting, sharing, and representing valuable probe information from vehicles over the Internet has become a hot topic both in research and in deployments in many countries.
Probe vehicle systems are designed to collect and share valuable information from vehicles (called probe data) via an information infrastructure. By building the probe vehicle system on an open communication platform such as the Internet, the system can be kept independent of the underlying layers and can provide a variety of services. Organically consolidated probe data yield socially useful information, such as traffic congestion, accident, and environmental information, and are therefore the subject of high societal expectations. The probe vehicle system has become a new trend in the service deployment of ITS (Intelligent Transport Systems) to enhance car telematics. Furthermore, the shift toward devices that need not be fixed to equipment inside the vehicle, such as PNDs (Portable Navigation Devices) or smartphones, as probe vehicle terminals has made them a new focus of the 'BIG DATA from Vehicle' sharing infrastructure. Using smartphones in probe vehicle systems makes it possible to create a mobility-aware service platform that is not limited to vehicles but covers any activity related to mobility, enabling broad innovation beyond ITS alone.
General pictures of probe vehicle systems
A vehicle has more than one hundred sensors. If the information from those sensors can be collected, it can provide a secure foundation of useful information for society. A probe vehicle system collects vehicle sensor data via a communication infrastructure. Probe vehicle systems have been researched, studied and examined, and their architecture, common data format and interfaces were standardized as ISO 22837:2009 by ISO/TC204/WG16. The main components of a probe vehicle system are defined as listed below. The relationships among these components are illustrated in Figure 1.
• Vehicle sensor:
It is a device on a vehicle that senses conditions inside and/or outside the vehicle or that detects actions that the driver takes, such as turning on/off headlights or windshield wipers, applying the brakes, etc.
• Probe data:
Vehicle sensor information that is processed, formatted, and transmitted to a land-based center for processing to create a good understanding of the driving environment. Probe data include probe data elements and probe messages.
• Probe data element:
An item of data included in a probe message, typically from onboard sensors. Systems in the vehicle may do some processing on the sensor reading to convert it into a suitable form for transmission.
• Core data element:
Core data elements are basic descriptive elements intended to appear in every probe message. They consist of a time stamp and a location stamp describing the time and place at which the vehicle sensor reading was made.
• Probe message:
It is the result of transforming and formatting one or more probe data elements into a form suitable to be delivered to the onboard communication device for transmission to a land-based center. It is emphasized that a probe message should not contain any information that identifies the particular vehicle from which it originated or any of the vehicle’s occupants, directly or indirectly.
• Processed probe data:
The result of fusing and analyzing data from probe data messages in combination with other data.
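The definitions above can be made concrete with a small data structure. The following is a minimal sketch, assuming hypothetical field names and a hypothetical allow-list of probe data elements; it is not the ISO 22837 schema. It shows the key design point: every probe message carries the core data elements (time stamp and location stamp), while onboard processing drops anything not intended for transmission, and the message type has no field that identifies the vehicle or its occupants.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class CoreDataElements:
    """Time stamp and location stamp that accompany every probe message."""
    timestamp: datetime
    latitude: float
    longitude: float


@dataclass(frozen=True)
class ProbeMessage:
    """Hypothetical probe message: core data plus sensor-derived elements.

    There is deliberately no field for a vehicle ID, VIN, or driver identity.
    """
    core: CoreDataElements
    elements: dict[str, float] = field(default_factory=dict)


# Hypothetical allow-list of probe data elements (not taken from ISO 22837).
ALLOWED_ELEMENTS = {"speed_kmh", "wiper_on", "brake_applied", "outside_temp_c"}


def make_probe_message(sensor_readings: dict, when: datetime,
                       lat: float, lon: float) -> ProbeMessage:
    # Onboard processing: keep only allowed elements and convert them to a
    # uniform numeric form; identifiers such as a VIN are silently dropped.
    elements = {k: float(v) for k, v in sensor_readings.items()
                if k in ALLOWED_ELEMENTS}
    return ProbeMessage(CoreDataElements(when, lat, lon), elements)
```

For example, passing readings that include a `"vin"` key yields a message whose `elements` contain only the allowed sensor values.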
Motivation – probe vehicle systems and privacy
In the International Standard ISO 22837, a probe vehicle system by definition does not include any personal data within the probe data. The probe vehicle system consists of vehicles that collect and transmit probe data and land-based centers that carry out probe processing. Probe processing builds an accurate understanding of the overall roadway and driving environment by fusing and analyzing probe data sent from multiple vehicles together with data from other sources. In other words, the probe vehicle system processes the data statistically to generate useful information. Therefore, a probe vehicle system requires identification of neither the vehicle nor the data subject; that is, probe data and probe messages require 'anonymity'.
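To illustrate why statistical probe processing needs no identity, here is a minimal sketch, assuming that each probe message has already been map-matched to a hypothetical road-link ID and reduced to a speed. The center only ever computes aggregates; the `min_samples` threshold is an added assumption (not from the standard) that suppresses links reported by too few messages, since an "average" over a single report would describe one vehicle.

```python
from collections import defaultdict
from statistics import mean


def mean_speed_per_link(messages, min_samples=3):
    """Average speed per road link from anonymous probe messages.

    `messages` is an iterable of (link_id, speed_kmh) pairs; no vehicle
    identifier is needed because only the aggregate is reported.
    Links with fewer than `min_samples` reports are suppressed.
    """
    speeds = defaultdict(list)
    for link_id, speed in messages:
        speeds[link_id].append(speed)
    return {link: mean(values)
            for link, values in speeds.items()
            if len(values) >= min_samples}
```

With reports `[("link-101", 40), ("link-101", 50), ("link-101", 60), ("link-202", 30)]`, only `link-101` is published, with a mean of 50 km/h.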
However, personal information may be handled in many different ways in a probe vehicle system. For example, consider a probe vehicle system that does not include any personal data within the probe message but uses personal information to authenticate the data subject when collecting probe data. In this case, even though their personal information is not contained in the collected data, data subjects cannot provide vehicle data with complete peace of mind unless there is a mechanism to protect their personal information. In addition, a probe message necessarily contains the location and time of the transmitting vehicle, and where a vehicle has been may itself become personal information. A vehicle is closely related to the data subject, and the vehicle's trips reveal the activity history of its owner. Furthermore, a particular vehicle may be identified from the characteristics of its probe data and the places where the data were collected. Identifying a vehicle may lead to disclosure of the data subject's personal information and a loss of privacy.
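The risk that location and time alone can single out a vehicle can be checked with a simple count. The sketch below is an illustration under stated assumptions: trip records are hypothetical dictionaries whose origin has been coarsened to a grid cell and whose departure time has been coarsened to an hour. Any (origin cell, departure hour) combination shared by fewer than `k` trips is a quasi-identifier: an observer who knows that a person leaves a certain cell at that hour can pick out that person's trip even though the record carries no ID.

```python
from collections import Counter


def rare_combinations(trips, k=2):
    """Return (origin_cell, departure_hour) pairs shared by fewer than k trips.

    Such pairs can act as quasi-identifiers: anyone who knows where and when
    a person usually departs can link the "anonymous" trip to that person.
    """
    counts = Counter((t["origin_cell"], t["departure_hour"]) for t in trips)
    return {key for key, n in counts.items() if n < k}
```

A system could coarsen locations or times further until this set is empty for the chosen `k`, trading granularity for anonymity.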
Moreover, novel applications that use probe data from smartphones or other nomadic devices, such as ecological services and concierge services, have been integrated into many aspects of our lives beyond the ITS area. These applications can help bring recognition to probe vehicle systems and have the economic effect of expanding markets for sensor data such as probe information. Much remains to be done to share valid data among different probe vehicle systems, including typical ones, in order to enhance these applications. Coordinating services with probe data collected by smartphones or other nomadic devices could cause problems due to mismatches in the type, reliability, and granularity of probe data among existing systems.
ISO/TC204/WG16 published an international standard on personal data protection in probe vehicle systems, 'ISO 24100 Intelligent transport systems – Basic principles for personal data protection in probe vehicle information services', in 2010. As in the OECD guidelines for personal data protection, ISO 24100 stipulates that data which cannot identify an individual directly but can do so indirectly should be regarded as personal data and treated as a target of protection under the standard. To protect privacy, a vehicle should not be identifiable from the collected data. However, authentication of the data subject is necessary to protect a probe vehicle system from threats.
One solution for privacy protection is to use a kind of random code. When a vehicle sends probe data with the issued code, the probe data center knows that the data are valid because the code was signed by the center. The data subject's privacy is protected by the random code, so the data subject cannot be 'traced'. However, this approach has one problem: the data subject can never be traced by the probe vehicle system, even when this is required (e.g., by the authorities). This requirement is called 'Traceability'. As a way to satisfy 'Traceability', anonymous authentication schemes, such as the various cryptographic anonymous credential schemes, have been proposed. However, such a scheme is applied to every transaction, so the probe data are completely unlinked. This state is called 'Unlinkability'. 'Unlinkability' is good for the data subject's privacy, but it still poses a problem for the probe vehicle system: measuring link travel time requires consecutive data from the same vehicle. In addition, complete anonymity requires a high cost that is impractical for a probe vehicle system. Therefore, to meet the requirements of both the data subject's privacy and the probe vehicle system, an 'affordable' anonymity for probe vehicle systems needs to be defined.
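The paper does not say how the center signs a random code without learning which vehicle holds it. One well-known way is a Chaum-style RSA blind signature, used below as a swapped-in technique; it is a sketch, not the author's method. The vehicle authenticates (e.g., with personal information) and sends only a blinded version of its random code; the center signs it; the vehicle unblinds the signature. Later, probe data carrying the code and signature verify as valid, but the center kept no mapping from code to vehicle, which is exactly why it cannot provide 'Traceability'. The key below uses two Mersenne primes and unpadded textbook RSA: it is far too small and insecure, and all function names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy RSA key for the probe data center (insecure, illustration only).
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse needs Python 3.8+


def code_to_int(code: bytes) -> int:
    """Map a random code to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(code).digest(), "big") % N


def vehicle_blind(code: bytes) -> tuple[int, int]:
    """Vehicle side: hide the random code behind a random blinding factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    blinded = (code_to_int(code) * pow(r, E, N)) % N
    return blinded, r


def center_sign(blinded: int) -> int:
    """Center side: sign after authenticating the vehicle; never sees the code."""
    return pow(blinded, D, N)


def vehicle_unblind(blind_sig: int, r: int) -> int:
    """Vehicle side: remove r, leaving an ordinary signature on the code."""
    return (blind_sig * pow(r, -1, N)) % N


def center_verify(code: bytes, signature: int) -> bool:
    """Accept probe data whose code the center signed, without knowing who holds it."""
    return pow(signature, E, N) == code_to_int(code)
```

Typical use: `code = secrets.token_bytes(16)`, then `blinded, r = vehicle_blind(code)`, `sig = vehicle_unblind(center_sign(blinded), r)`, after which `center_verify(code, sig)` holds. If the vehicle reuses one code for many messages, those messages become linkable; a fresh code per message gives 'Unlinkability' at the cost of link travel time measurement.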
Concept of anonymity in probe vehicle systems
Fundamentally, almost all probe vehicle systems process the data statistically to generate useful information. This means that the probe vehicle system needs to identify neither the vehicle nor the data subject. On these assumptions, anonymity in a probe vehicle system can be defined as follows:
• Anonymity is the state in which the data subject cannot be identified.
• Data that contain information that can identify the data subject directly are not anonymous.
• Whether data that contain information that can identify the data subject indirectly are anonymous depends on the knowledge of the observer.
Furthermore, information that can identify the data subject indirectly (e.g., user ID, device ID, provisional communication ID…) can be characterized as follows:
• Indirectly identifying information about the data subject is a 'pseudonym' unless its link to the data subject is widely known.
• A 'pseudonym' can be regarded as providing anonymity to the public when it is managed appropriately.
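What "managed appropriately" might look like can be sketched with a keyed hash. In this minimal sketch, a hypothetical pseudonym manager holds a secret key and derives each vehicle's pseudonym as an HMAC of a vehicle ID and a time epoch (both hypothetical inputs). To the public the pseudonym reveals nothing; within one epoch it stays constant, so consecutive probe data can be linked; it changes between epochs; and only the key holder can resolve it back to a vehicle, for example when an authority requires it.

```python
import hashlib
import hmac


def pseudonym(manager_key: bytes, vehicle_id: str, epoch: int) -> str:
    """Pseudonym that is stable within one epoch and changes between epochs."""
    msg = f"{vehicle_id}|{epoch}".encode()
    return hmac.new(manager_key, msg, hashlib.sha256).hexdigest()[:16]


def resolve(manager_key: bytes, candidate_ids, observed: str, epoch: int):
    """Only the key holder can map a pseudonym back to a vehicle ID."""
    for vid in candidate_ids:
        if hmac.compare_digest(pseudonym(manager_key, vid, epoch), observed):
            return vid
    return None  # unknown vehicle, wrong epoch, or wrong key
```

The epoch length is the design knob here: a longer epoch gives the system longer consecutive data, while a shorter one limits how much activity history any single pseudonym exposes.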
A general probe vehicle system does not require 'perfect' anonymity in the technical or conceptual sense. It is reasonable for probe vehicle systems to adopt a level of anonymity that is accepted by society and the market. This concept is often introduced in ITS framework projects in Europe and the United States of America. The SeVeCom report 'Secure Vehicle Communication, Deliverable 2.1; Security Architecture and Mechanisms for V2V/V2I' defined two categories of anonymity in order to allow for a diversity of anonymity.
• Total anonymity where a participant in an IVC (Inter-Vehicular Communication) system remains completely anonymous, i.e., no information that could identify that participant can be gained by other parties.
• Resolvable anonymity is the same as total anonymity with the exception that under certain, well defined circumstances others may be able to identify the otherwise anonymous entity.
Based on this classification, this study divides anonymity into three categories: 'total anonymity', 'resolvable anonymity', and 'identity'. 'Total anonymity' is the state in which the data subject can never be identified. The anonymity discussed for probe vehicle systems is mostly 'resolvable anonymity', because many probe vehicle systems require an authentication method for security, as well as traceability and linkability over a certain period of time for high-quality services. 'Resolvable anonymity' is the state in which only an observer with a close relationship to the data subject can identify the individual.
This paper proposes two additional classifications for probe vehicle systems: 'pseudonymity' and 'common anonymity'. 'Pseudonymity' is the form of 'resolvable anonymity' in which the consecutiveness of the data subject is recognized. Consecutiveness means that the system can tell that data come from the same data subject even though it cannot identify the individual. For example, when a group of probe data contains the same member ID and that member ID cannot directly identify anyone, the state is called 'pseudonymity'. In either case, if the link to the individual cannot easily be established, is not generally known, and cannot directly identify the data subject, the data have sufficient anonymity for almost all probe vehicle systems. The other category, 'common anonymity', is the state in which probe data cannot be linked. For example, a system that behaves like total anonymity except when it detects a malicious attack or receives a request from the data subject provides 'common anonymity'.
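The practical difference between 'pseudonymity' and 'common anonymity' shows up in link travel time, which the earlier discussion of 'Unlinkability' identified as needing consecutive data. The sketch below assumes hypothetical probe records of the form (pseudonym, node, timestamp in seconds), where the nodes mark the two ends of a road link. Under pseudonymity the entry and exit records share a pseudonym and can be paired; under common anonymity every record carries a fresh identifier and no pair can be formed. For simplicity it keeps only the last time each pseudonym was seen at a node.

```python
from collections import defaultdict


def link_travel_times(messages, entry_node, exit_node):
    """Travel times between two nodes, computed per pseudonym.

    `messages` are (pseudonym, node, timestamp_s) tuples. An entry can be
    paired with an exit only when both records carry the same pseudonym.
    """
    seen = defaultdict(dict)
    for pid, node, t in messages:
        seen[pid][node] = t  # keeps the last sighting per node
    return sorted(
        nodes[exit_node] - nodes[entry_node]
        for nodes in seen.values()
        if entry_node in nodes and exit_node in nodes
    )
```

With pseudonymous records `("p1", "A", 0), ("p1", "B", 120), ("p2", "A", 10), ("p2", "B", 190)` the result is `[120, 180]`; if each record instead has its own unlinkable identifier, the result is an empty list.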
'Pseudonymity' allows probe data to be collected continuously while preserving the anonymity of the data subject. 'Common anonymity' is almost the same as total anonymity; the only difference lies in deniability for the data subject. Deniability is the ability to prove to a third party that one did not perform a given act. Where 'total anonymity' is needed, 'un-deniability' is required of the probe vehicle system. However, in most cases this is unrealistic in terms of cost-effectiveness. Figure 2 shows the flowchart for classifying anonymity in probe vehicle systems.
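Figure 2 itself is not reproduced here, but the categories described in the text can be expressed as a small decision function. This is one reading of the definitions above, not a transcription of the flowchart; the three boolean inputs are hypothetical properties a system designer would assess. Directly identifying data are 'identity'; data that no one can ever resolve are 'total anonymity'; resolvable data are 'pseudonymity' if consecutive data can be linked and 'common anonymity' otherwise.

```python
from enum import Enum


class Anonymity(Enum):
    IDENTITY = "identity"
    TOTAL = "total anonymity"
    PSEUDONYMITY = "pseudonymity"
    COMMON = "common anonymity"


def classify(directly_identifying: bool, resolvable: bool,
             linkable: bool) -> Anonymity:
    # Data that directly identify the data subject are not anonymous at all.
    if directly_identifying:
        return Anonymity.IDENTITY
    # Nobody, under any circumstances, can resolve the data subject.
    if not resolvable:
        return Anonymity.TOTAL
    # Resolvable anonymity splits on whether consecutive data can be linked.
    if linkable:
        return Anonymity.PSEUDONYMITY
    return Anonymity.COMMON
```

For instance, a system that attaches a stable member ID to probe data and can resolve it only through a managed key falls under `classify(False, True, True)`, i.e., pseudonymity.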
Probe vehicle systems have become a new focus of the 'BIG DATA from Vehicle' sharing infrastructure. Basically, a probe vehicle system processes data statistically to generate useful information, so it does not require identification of the data subject. On the other hand, many probe vehicle systems need consecutive groups of data for high-quality services. Moreover, perfect anonymity requires a high cost that is unrealistic in terms of cost-effectiveness.
This paper analyzes privacy concerns related to probe vehicle systems and proposes a classification of affordable anonymity for probe vehicle systems that allows for a diversity of anonymity.
I gratefully acknowledge the comments contributed by Michiko IZUMI (HOSEI University) and Hiroshi ITO (Japan Automobile Research Institute). I would like to thank the members of the KEIO NUS CUTE Center and the WIDE project, especially the members of the InternetCAR WG. In addition, special thanks go to the members of ISO/TC204/WG16 for their practical advice and support.
U. Keisuke, S. Hideki, and M. Jun, "The InternetCAR network architecture: Connect vehicles to the internet using IPv6", ITST 2005, June 2005, pp. 187–190.
W. Huber, M. Ladke, and R. Ogger, "Extended floating car data for acquisition of traffic information", Proc. of the 6th World Congress on ITS, Toronto, Canada, 1999.
International Organization for Standardization (ISO), "ISO 22837:2009 Vehicle Probe Data for Wide Area Communication", International Standard, 2009.
Masaaki S, Michiko I, Hideki S, Keisuke U, Jun Murai, "Threat analysis and protection methods of personal information in vehicle probing system", The Third International Conference on Wireless and Mobile Communications (ICWMC), March 2007.
International Organization for Standardization (ISO), "ISO 24100:2010 Intelligent transport systems — Basic principles for personal data protection in probe vehicle information services", International Standard, 2010.
The Organisation for Economic Co-operation and Development, "OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data", adopted on 23 September 1980.
SeVeCom, http://www.sevecom.org (February 20, 2012).
PRECIOSA, http://www.preciosaproject.org (February 20, 2012).
EVITA, http://evita-project.org (February 20, 2012).
Antonio Kung, "Secure Vehicle Communication, Deliverable 2.1; Security Architecture and Mechanisms for V2V/V2I", SeVeCom, August 2007.