Computer Vision - Understanding and Using New Data Sources to Address Urban and Metropolitan Freight Challenges

Value Proposition – Computer Vision Video Analytics

What is the data source?

Computer vision is the process of using an image sensor to capture images, then using a computer processor to analyze these images and extract information of interest (Guan et al, 2012). A simple computer vision system can detect the physical presence of objects within its field of view by identifying visual attributes such as shape, size, or color. Advances in computer processing power and the refinement of algorithmic approaches have enabled rapid progress in computer vision in recent years (Sivaraman and Manubhai 2013). More sophisticated computer vision systems can identify or classify an object according to the requirements of an application. For example, a transportation-related computer vision system could identify types of vehicles. The key advantage of computer vision in transportation applications is its non-intrusiveness: computer vision systems do not need devices to be embedded in, printed on, or attached to the objects targeted for detection. In this way, computer vision has operational advantages over other data collection technologies such as radio-frequency identification (RFID) tags, barcodes, and wireless access points, which require complementary equipment to be installed on or within a vehicle. Furthermore, computer vision systems are comparatively easy to upgrade, because only the camera sensors need to be replaced, not devices on or within vehicles.

Automatic license plate recognition (ALPR) was one of the pioneering applications of computer vision for intelligent transportation systems (ITS), and is used in highway tolling as an enforcement aid to complement tolling transponders. Computer vision technology also plays many other roles in improving productivity and safety in traffic management and other transportation operations. For example, surveillance cameras with machine vision functionality have been mounted along freeways and at main intersections for public safety, traffic incident detection, ramp metering, and traffic signal timing.

What challenges do the data address?

Computer vision is an important tool for vehicle identification and classification. Thus, the data can provide insights on congestion, last mile access, last 50 foot access, and truck parking challenges.

Computer vision also has a significant role in advanced driver assistance systems (ADAS), particularly for heavy vehicles such as semitrailers and other trucks (Guan et al, 2012). Since commercial carriers are generally more cognizant of the costs of crashes and their impact on business reputation and the bottom line, they have been the most aggressive adopters to date of lane departure warning and forward collision warning systems, many of which include computer vision technologies.

Why is it new?

Computer vision systems are relatively new to the field of transportation for two reasons: the cost and speed of computer equipment, and the need for research and development of algorithms specific to transportation applications. To classify objects properly, computers require sophisticated algorithms and computing hardware powerful enough to run those algorithms quickly. Computer vision for transportation has become possible only as computing hardware has fallen in price while increasing in power. At the same time, specialized algorithms tailored to transportation applications (such as identifying specific vehicles or license plates) had to be developed, tested, and refined. Because of this complex development process, reliable transportation-relevant computer vision algorithms have been available for only a few years.

How are the data captured?

Two core sensor technologies are the foundation for nearly all computer vision systems (Guan et al, 2012). Complementary Metal-Oxide-Semiconductor (CMOS) and Charge-Coupled Device (CCD) sensors are used in most camera-enabled devices. CMOS is noted for its better performance, battery life, and heat dissipation, and is thus mostly used in portable electronic devices such as cell phones. CCD has a similar price point to CMOS and usually has better sensitivity and image quality, and is thus used in most other cameras.

Once visual data is acquired by the image sensors, it is processed and analyzed by software to enhance image quality and extract key features for either object detection or identification as required by the application. Visual data can be in the form of a single still image, multiple images, or consecutive image sequences, also known as video.

To minimize computational complexity, many applications of machine vision segment video data into a background image and a moving object image. Background images tend to be motionless over a long period of time, and moving object images only contain foreground objects. Change detection (Kim et al., 2001; Foresti et al., 1999) is the simplest method for video segmentation. The figure below shows detection and tracking of moving vehicles by using computer vision.

Detection and Tracking of Moving Vehicles, Hadi et al, 2014
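The change detection idea described above can be sketched in a few lines. The example below is a minimal illustration, not the method used in the cited studies: it assumes grayscale frames stored as nested lists, a fixed intensity threshold of 30, and a synthetic scene, all of which are hypothetical choices made for the example. The function names `detect_changes` and `bounding_box` are likewise illustrative.

```python
def detect_changes(background, frame, threshold=30):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def bounding_box(mask):
    """Return (row_min, row_max, col_min, col_max) of changed pixels, or None if none changed."""
    hits = [
        (r, c)
        for r, row in enumerate(mask)
        for c, changed in enumerate(row)
        if changed
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), max(rows), min(cols), max(cols))


# Synthetic 6x8 grayscale scene (assumed values): a uniform road surface.
background = [[100] * 8 for _ in range(6)]

# Current frame: the same scene with a brighter 2x3 "vehicle" placed in it.
frame = [row[:] for row in background]
for r in range(2, 4):
    for c in range(3, 6):
        frame[r][c] = 200

mask = detect_changes(background, frame)
changed_pixels = sum(sum(row) for row in mask)
box = bounding_box(mask)
print(changed_pixels, box)
```

A fixed threshold like this is sensitive to lighting changes, which is one reason practical systems typically update the background image over time rather than storing it once.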

When more than one camera is used, the technique is known as stereo vision, and the differing views of an object from the two cameras are used to determine its distance and three-dimensional features. Since a full spectrum of information may be analyzed and retrieved from an image (e.g., colors, shapes, patterns, and depths), computer vision systems have long been viewed by technologists as a promising approach for multi-purpose data acquisition or sensing.
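One common way to recover depth from a stereo pair is the textbook relation Z = f * B / d for two identical, rectified cameras, where f is the focal length in pixels, B is the baseline (the distance between the cameras), and d is the disparity (the horizontal shift of the object between the two images). The sketch below applies that relation. The source does not specify a particular stereo method, so the formula choice, the function name, and the numeric values are assumptions for illustration only.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate distance to an object (meters) for a rectified stereo pair.

    Uses Z = f * B / d. A smaller disparity means the object is farther away.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 1000-pixel focal length, cameras mounted 0.5 m apart.
depth_near = depth_from_disparity(1000.0, 0.5, 50.0)  # large shift between images
depth_far = depth_from_disparity(1000.0, 0.5, 10.0)   # small shift between images
print(depth_near, depth_far)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters much more for distant objects than for nearby ones.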

What are policy considerations in its use?

Regulatory Environment: Facilitates data access and use.
Ownership: Not tightly controlled.
Privacy: Current applications typically protect private information.

No specific regulatory law addresses computer vision systems. When cameras are used to focus on particular vehicles for applications such as classification, image resolution tends to be the minimum required, which does not raise privacy concerns because vehicle signage or lettering is not visible.

The digital pictures are often enhanced by license plate recognition, in which the image of a license plate is converted to text and compared against a database of license plate numbers and letters associated with particular vehicles and their owners. Most of the decisional law regarding privacy and the recording of license plates has not found a license plate to be private information. The argument is that a license plate cannot be private because it is affixed to the exterior of the vehicle, where it can be seen by anyone (Glancy, 2004). If a camera were to capture an image of the face of a driver or passenger, however, the privacy of the individual photographed could come into question. Because computer vision technologies are usually tailored to specific types of applications (e.g., classification), agencies have ways to mitigate such concerns.

What are institutional considerations in its use?

Capacity: Special skills and significant computing resources required to work with the data.
Stewardship: The volume and nature of data require significant storage and management capabilities.
Equity: Data and analyses are representative of those vicinities, road segments, and types of vehicles that have been sampled; however, it is easy to develop samples across the entire transportation network.

The accurate interpretation of images from a computer vision-based system requires a sophisticated understanding of application requirements and the ability to accommodate variation in a number of environmental variables, such as light levels. The development of complex algorithms for different applications often requires new analytical approaches, as well as extensive software development skills specifically related to machine vision. These skillsets are specialized, and usually outside the scope of skills available at a transportation agency. As a result, agencies may need to partner with private sector vendors of machine vision software, or with academic institutions whose staff or students have the experience needed to create or modify machine vision software.

Computer vision technology requires significant computing resources (BITRE, 2014). Data processing and analytics in computer vision systems are usually intensive and require large amounts of computational resources and memory. For example, a simple camera with 800 x 600 pixel resolution captures more than one megabyte per second without image compression (and image compression algorithms themselves require additional computational resources). For many ITS applications, this large volume of data needs to be processed and analyzed in a timely manner. For applications that are not time sensitive, the data needs to be stored for post-processing. It is therefore no surprise that many vision-based systems are equipped with significant processing memory and data storage.
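The arithmetic behind the uncompressed data volume can be checked with a short calculation. Only the 800 x 600 resolution comes from the text above; the bytes per pixel (1 for grayscale, 3 for 24-bit color) and the frame rate of 30 frames per second are assumptions chosen for illustration, as are the function names.

```python
def raw_frame_bytes(width_px, height_px, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width_px * height_px * bytes_per_pixel


def raw_rate_mb_per_s(width_px, height_px, bytes_per_pixel, frames_per_second):
    """Uncompressed video data rate in megabytes (10**6 bytes) per second."""
    frame_bytes = raw_frame_bytes(width_px, height_px, bytes_per_pixel)
    return frame_bytes * frames_per_second / 1_000_000


# 800 x 600 is from the text; pixel depth and frame rate are assumed.
gray_frame = raw_frame_bytes(800, 600, 1)         # 8-bit grayscale
color_frame = raw_frame_bytes(800, 600, 3)        # 24-bit color
gray_rate = raw_rate_mb_per_s(800, 600, 1, 30)    # grayscale at 30 frames/s
color_rate = raw_rate_mb_per_s(800, 600, 3, 30)   # color at 30 frames/s

print(gray_frame, color_frame)
print(gray_rate, color_rate)
```

Under these assumptions a single color frame already exceeds one megabyte, so a continuous stream comfortably exceeds the one megabyte per second cited above, and an hour of footage runs to tens of gigabytes before compression.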

What are technical considerations in its use?

Completeness: Data gaps exist due to functionality issues.
Accuracy: Limitations due to functionality issues.
Verifiability: Information extraction algorithms hinder verifiability.
Dynamism: Time from capture to analysis is lengthened due to enormity of data processing requirements.
Durability: The many promising applications ensure its future stability as a source of data.

Computer vision systems suffer from limitations in completeness and accuracy. The systems may not function, or may function poorly, under some lighting conditions (e.g., dusk, dawn, darkness) and in inclement weather such as rain, snow, or fog (Guan et al, 2012). Another technical challenge in urban environments is partially or fully hidden vehicles in dense traffic; for example, a car partially blocked from view on the far side of a semi-truck may not be recognized (Bush, 2011). Since many transportation applications of machine vision operate outdoors, they can be very susceptible to illumination variation such as shadows or low light. For example, in segmenting foreground objects from the background, color and shape are often the main attributes compared against an existing stored image, but both can be strongly affected by illumination conditions and viewing angles. Illumination variation further complicates the design of robust algorithms because the shadows being cast change. For example, an otherwise functional vehicle tracking algorithm may fail because of the frequent alternation between direct light and the shadows of high-rise buildings in urban downtowns. Illumination variation remains the main obstacle to robust computer vision-based ITS applications. Pairing a traditional camera with other sensors, such as radar or especially an infrared-enabled camera, can improve object and pedestrian detection accuracy in a variety of lighting conditions (Iwasaki et al, 2013).

Verifiability of the source data is an issue, particularly when sophisticated machine learning algorithms have been applied to extract the required information, because the output depends on the algorithm's "success rate" in correctly identifying vehicles. Historically, most validation and verification has been performed manually, which can be prohibitively labor-intensive.
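A manual verification pass of this kind usually reduces to comparing the algorithm's output against human-assigned labels for a sample of vehicles. The sketch below shows that comparison in its simplest form. The labels are invented for illustration and are not results from any study, and the function name `success_rate` is hypothetical.

```python
def success_rate(algorithm_labels, manual_labels):
    """Fraction of sampled vehicles where the algorithm agrees with the manual label."""
    if len(algorithm_labels) != len(manual_labels):
        raise ValueError("both label lists must cover the same sample")
    if not manual_labels:
        raise ValueError("sample must contain at least one vehicle")
    correct = sum(a == m for a, m in zip(algorithm_labels, manual_labels))
    return correct / len(manual_labels)


# Made-up sample of eight vehicles, labeled by a human reviewer and by the algorithm.
manual = ["truck", "car", "car", "truck", "bus", "car", "truck", "car"]
algorithm = ["truck", "car", "truck", "truck", "bus", "car", "car", "car"]

rate = success_rate(algorithm, manual)
print(f"{rate:.0%} of sampled vehicles classified correctly")
```

The cost noted above comes from producing the manual labels, not from the comparison itself, so the size of the verified sample is typically the limiting factor.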

In terms of dynamism, the time from capture to processing is dominated by the enormity of data processing requirements. Machine vision as a transportation data source will have excellent durability because of the many applications of computer vision systems for traffic management, weigh-in-motion commercial vehicle inspections, security, parking, border control and other transportation purposes.