What is the Mapbox Vision SDK?
The Vision SDK uses highly efficient neural networks to process imagery directly on a user’s mobile or embedded device, turning any connected camera into a second set of eyes for your car. In doing so, the Vision SDK enables the following user-facing features:
- Augmented reality navigation with turn-by-turn directions
- Classification and display of regulatory and warning signs
- Object detection for vehicles, pedestrians, road signs, and traffic lights
- Semantic segmentation of the roadway into 12 different classes (full list below)
- Distance detection that indicates spacing to lead vehicle
- Lane detection that indicates home lane, total number of lanes, and direction of travel
What can I do with the Vision SDK that I couldn’t do before?
The Vision SDK is a lightweight, multi-platform solution for live location. Situational context is interpreted with AI-powered image processing that is fast enough to run in real time, yet efficient enough to run entirely on-device on today’s smartphones. The Vision SDK comprises a suite of driving features that previously required specialized hardware and software systems to run.
The Vision SDK brings the developer into the driver’s seat with a comprehensive set of real-time navigation and mapping features that fit into a single mobile or embedded application. The Vision SDK unlocks augmented reality turn-by-turn directions, classification of road signs, detection and segmentation of various road features, customizable safety alerts, and more. Live interpretation of road conditions with connectivity to the rest of the Mapbox ecosystem gives drivers access to fresher, more granular driving information than ever before.
How does the Vision SDK tie into the rest of Mapbox’s products and services?
Mapbox’s live location platform incorporates dozens of different data sources to power our maps. Map data originates from sensors as far away as satellites and as close as street-level imagery. Conventionally, collected imagery requires extensive processing before a map can be created or updated. The innovation of the Vision SDK is its ability to process live data on distributed sensors, keeping up with our rapidly changing world. Changes to the driving environment are detected on the spot and uploaded to make low-latency, low-bandwidth updates to the living map.
What is the difference between the Vision SDK and the Vision Teaser?
The Vision Teaser is a reference app that we are sharing with customers and developers to give a sneak peek of the AI-powered computer vision features that we are building. The Vision SDK allows you to bring all of the functionality showcased in the Vision Teaser into your own applications.
How do I get started with Vision SDK?
To begin developing with the Vision SDK, you will need to apply for access on our landing page. When you receive access, we will share the location of our beta repository with you directly, along with instructions on how to get up and running. We are constantly adding customers to the beta, but space is temporarily limited.
What are the components of the Vision SDK?
There are three components to the Vision SDK: VisionCore, VisionSDK, and VisionAR.
- VisionCore is the core logic of the system, including all machine learning models; it exists as a compiled library for each platform with a user-facing API.
- VisionSDK is a framework written in the platform’s native language (Kotlin for Android, Swift for iOS) that encapsulates VisionCore usage and handles platform-dependent tasks. It calls VisionCore.
- VisionAR is a native framework that depends on the Mapbox Navigation SDK. It takes information from the specified navigation route, transfers it to VisionCore via VisionSDK, receives instructions on how to display the route, and finally renders the route on top of the camera frame using native graphics tools.
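To make the layering concrete, here is a hypothetical Swift sketch of how the three pieces relate. The type and method names are illustrative only; the real interfaces are documented in the beta repository:

```swift
// Hypothetical sketch of the three-layer architecture; names are
// illustrative, not the SDK's actual API.

// VisionCore: the compiled library that runs the ML models.
struct VisionCore {
    func process(frame: [UInt8]) -> [String] { return [] }            // e.g. detections
    func routeGeometry(for route: [String]) -> [String] { return [] } // AR drawing data
}

// VisionSDK: a native wrapper (Swift on iOS, Kotlin on Android) that
// owns the camera and platform tasks and forwards frames to VisionCore.
final class Vision {
    let core = VisionCore()
    func handle(frame: [UInt8]) -> [String] {
        return core.process(frame: frame)
    }
}

// VisionAR: takes a route from the Mapbox Navigation SDK, asks
// VisionCore (through VisionSDK) how to display it, and renders the
// result over the camera frame with native graphics APIs.
final class VisionAR {
    let vision: Vision
    init(vision: Vision) { self.vision = vision }
    func render(route: [String], over frame: [UInt8]) {
        let geometry = vision.core.routeGeometry(for: route)
        // ...draw `geometry` on top of `frame` here...
        _ = geometry
    }
}
```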
What platforms are supported by the Vision SDK so far?
- iOS:
  - iOS 11.2 and newer
  - Swift 4.1
  - Recommended devices for development: iPhone 7 // 7+ // 8 // 8+ // X
- Android:
  - Android 6 (API 23) and higher
  - Qualcomm Snapdragon 650 // 710 // 8xx with OpenCL support
  - Recommended devices for development:
    - Samsung Galaxy S8, S8+ // S9, S9+ // Note 8
    - Xiaomi Mi 6 // 8
    - HTC U11, U11+ // U12, U12+
    - OnePlus 5 // 6
What other platforms will the Vision SDK support?
Support for other Android chipsets, as well as embedded Linux, is coming by the end of the year.
Do I need to use my data plan to utilize the Vision SDK?
Yes. Augmented reality navigation and automated feature extraction require connectivity to work. However, the neural networks used to run classification, detection, and segmentation all run on-device without needing cloud resources.
How much data does the Vision SDK use?
The Vision SDK beta uses at most 30 MB of data per hour. For reference, this is less than half of what Apple Music uses on the lowest quality setting.
Can the Vision SDK read all road signs?
The latest version of the Vision SDK recognizes over 80 of the most common road signs, including speed limits (5-120 mph or km/h), regulatory signs (merges, turn restrictions, no passing, etc.), warning signs (traffic signal ahead, bicycle crossing, narrow road, etc.), and many others. The Vision SDK does not read individual letters or words on signs (OCR); rather, it learns to recognize each sign type holistically. As a result, it generally cannot interpret guide signs (e.g. “Mariposa St. Next Exit”).
What type of road network data is Mapbox getting back from the Vision SDK?
The Vision SDK sends back limited telemetry data from the device, including road feature detections and certain road imagery, at a data rate not to exceed 30 MB per hour. This data is used to improve our products and services, including the Vision SDK.
Where can I use the Vision SDK?
- Semantic segmentation, object detection, and following distance detection will work on virtually any road.
- The core functionality of the augmented reality navigation with turn-by-turn directions is supported globally. AR navigation with live traffic is supported in over 40 countries, covering all of North America, most of Europe, Japan, South Korea, and several other markets.
- Sign classification is currently optimized for North America, with some limited support in other regions. Sign classification for other regions is under development.
What is “classification”?
In computer vision, classification is the process by which an algorithm identifies the presence of a feature in an image. For example, the Vision SDK classifies whether there are certain road signs in a given image.
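As a rough illustration, classification output might be consumed like the hypothetical Swift sketch below; the types and the confidence threshold are assumptions for illustration, not the SDK’s actual API:

```swift
// Hypothetical shape of a classification result: for each frame, the
// classifier reports which sign types it believes are present, with a
// confidence score. Names are illustrative, not the SDK's actual API.
enum SignType {
    case speedLimit(mph: Int)
    case noPassing
    case trafficSignalAhead
}

struct SignClassification {
    let sign: SignType
    let confidence: Double // 0.0 ... 1.0
}

func handle(_ results: [SignClassification]) {
    // Act only on high-confidence classifications (threshold is made up).
    for result in results where result.confidence > 0.8 {
        if case .speedLimit(let mph) = result.sign {
            print("Speed limit sign classified: \(mph) mph")
        }
    }
}
```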
What is “detection”?
In computer vision, detection is similar to classification, except that instead of only identifying whether a given feature is present, a detection algorithm also identifies where in the image the feature occurs. For example, the Vision SDK detects vehicles in each image and indicates where it sees them with green “bounding boxes.”
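Detection results therefore carry a location as well as a label; a hypothetical Swift sketch (illustrative names, not the SDK’s actual API) might look like:

```swift
import CoreGraphics

// Unlike classification, each detection carries a bounding box locating
// the feature within the frame. Names are illustrative.
struct Detection {
    let label: String       // e.g. "car", "pedestrian", "traffic_light"
    let boundingBox: CGRect // pixel coordinates in the camera frame
    let confidence: Double
}

// The number of detections varies from frame to frame.
func describe(_ detections: [Detection]) {
    for d in detections {
        print("\(d.label) at (\(d.boundingBox.origin.x), \(d.boundingBox.origin.y)), " +
              "confidence \(d.confidence)")
    }
}
```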
What is “segmentation”?
In computer vision, segmentation is the process by which each pixel in an image is assigned to a category, or “class”. For example, the Vision SDK analyzes each frame of road imagery and paints the pixels different colors corresponding to the underlying class. The Vision SDK supports the following classes: roadway, painted roadway (e.g. crosswalk), lane boundaries, traffic lights, traffic signs, vehicles, cyclists, pedestrians, buildings, vegetation, sky, and other.
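Painting a segmented frame is then a per-pixel lookup from class to color, as in this hypothetical Swift sketch (the labels and colors are illustrative assumptions, not the SDK’s actual palette):

```swift
// Hypothetical segmentation output: one class label per pixel. Painting
// the frame is a per-pixel lookup into a color table.
typealias RGB = (r: UInt8, g: UInt8, b: UInt8)

let classColors: [String: RGB] = [
    "roadway": (128, 64, 128),
    "lane_boundary": (255, 255, 0),
    "vehicle": (0, 0, 255),
    "sky": (70, 130, 180),
    // ...one entry for each supported class...
]

// mask[y][x] holds the class assigned to that pixel.
func colorize(mask: [[String]]) -> [[RGB]] {
    return mask.map { row in
        row.map { label in classColors[label] ?? (0, 0, 0) }
    }
}
```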
What is the difference between detection and segmentation?
Detection identifies discrete objects (e.g., individual vehicles) and draws a bounding box around each one it finds. The number of detections changes from one image to the next, depending on what appears. Segmentation, on the other hand, goes pixel by pixel and assigns each pixel to a category; for a given segmentation model, the same number of pixels is classified and colored in every image. Features from segmentation can take any shape describable on a 2D pixel grid, while features from object detection are indicated with boxes.
How accurate is the feature extraction?
Accuracy of the feature extraction depends on several factors. As with a navigation app, your phone’s GPS is used to figure out where you are when a given image is captured, and the accuracy of that location estimate depends on your surroundings: it is best outside with a lot of sky visible, and generally poor indoors, in tunnels, or when you are surrounded by tall buildings. Once the Vision SDK has an estimate of its location and orientation (its “pose”), it estimates the location of detected objects based on the properties of the camera being used (for example, the focal length of the lens). When traveling on a flat surface, the locations of different parts of an image, (x, y) in pixels, correspond closely to locations in the real world (latitude, longitude). Under good GPS conditions (a decent amount of sky visible), the Vision SDK can determine the location of a detected object or feature to within a few meters. This accuracy is still being improved; in the future, existing data from Mapbox’s location platform will refine your location estimate further.
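The flat-road projection mentioned above can be made concrete with a little pinhole-camera arithmetic. This minimal Swift sketch uses made-up camera parameters; the focal length, mounting height, and horizon row are assumptions, not values from the SDK:

```swift
// With a pinhole camera at a known height looking parallel to a flat
// road, the image row of an object's ground-contact point maps to its
// forward distance. All numbers below are illustrative assumptions.
let focalLengthPx = 1_000.0 // lens focal length, in pixels
let cameraHeightM = 1.3     // camera height above the road, in meters
let horizonRow = 540.0      // image row where the horizon projects

// Distance to a point on the road seen at image row `v`; rows above
// the horizon cannot intersect the ground plane ahead.
func groundDistance(atRow v: Double) -> Double? {
    guard v > horizonRow else { return nil }
    return focalLengthPx * cameraHeightM / (v - horizonRow)
}

// A lead vehicle whose rear tires appear at row 700 is roughly
// 1000 * 1.3 / (700 - 540) ≈ 8.1 meters ahead.
if let d = groundDistance(atRow: 700) {
    print("lead vehicle is about \(d) meters ahead")
}
```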
Can I rely on the Vision SDK to make driving decisions?
No. The Vision SDK is designed to provide context to aid driving, but does not replace any part of the driving task. During beta, feature detection is still being tested.
Can I use the Vision SDK to make my car drive itself?
No. The Vision SDK can be used to issue safety alerts and provide augmented reality navigation instructions and other features, but does not make any driving decisions.
What is the best way for my users to mount their phones when using the Vision SDK?
The Vision SDK works best when your phone is mounted either to the windshield or the dashboard of your car with a good view of the road. We’ve tested a lot of mounts; some things to consider when choosing and setting one up:
- Shorter mounts generally vibrate less. Mounting to the windshield or to the dashboard itself are both options.
- The Vision SDK works best when the phone is near or behind your rear-view mirror, but note your local jurisdiction’s limits on where mounts may be placed.
- Make sure the phone’s camera view is unobstructed (you will be able to tell with any of the video screens open).
Will the Vision SDK drain my battery?
The Vision SDK consumes CPU, GPU, and other resources to process road imagery on the fly. As with any other navigation or video application, we recommend keeping your phone plugged in if you are going to use it for more than 30 minutes at a time.
Will my phone get hot if I run the Vision SDK for a long time?
The phone will get warmer as the application consumes a decent amount of resources. However, we have not run into any heat issues with moderate-to-heavy use.
Will the Vision SDK work in countries that drive on the left?
Yes. Semantic segmentation, object detection, and following distance detection work on virtually any road, regardless of which side of the road traffic drives on.
Does the Vision SDK work in the rain and/or snow?
Yes. Just like human eyes, though, the Vision SDK performs better the more clearly it can see.
Does the Vision SDK work at night?
The Vision SDK works best under good lighting conditions. However, it does function at night, depending on how well the road is illuminated. In cities with ample street lighting, for example, the Vision SDK still performs quite well.