Dangerous failures in modern cars

Imagine that you are driving your dream car along a beautiful highway and entering a curve. Suddenly, one of the suspension sensors stops responding. What happens next? And how much of that outcome was decided in advance by the developers of the car’s safety-critical software?

Most of the time, embedded systems work without major disruptions. Ideally, the device, or the part of the system we are responsible for, would never experience any disturbance at all. Unfortunately, the real world is different: errors can occur in any system, and sooner or later they will.

In safety-critical projects, this issue deserves far more attention than it might seem. Besides the obvious means of transport (cars, airplanes), such projects cover energy, medicine and many other fields where reliability and safety are paramount.

In safety-critical embedded systems, the most common problems are communication errors. These can occur on typical network links (Ethernet) or on connections to external subsystems such as sensors. When such an error occurs, it must first be detected and then handled. Of course, one could decide in advance that the system always switches to “safe mode” in this case, but that is often not the best choice.
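One common way to detect a lost signal is a deadline check. If no valid frame has arrived from a subsystem within the expected time, the channel is treated as faulty. Below is a minimal sketch in C. It assumes the platform supplies a free-running millisecond tick. The `sensor_monitor_t` type and the `monitor_*` functions are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-channel monitor. Timestamps come from an assumed
 * free-running millisecond tick that wraps around at 2^32. */
typedef struct {
    uint32_t last_rx_ms;  /* arrival time of the last valid frame */
    uint32_t timeout_ms;  /* maximum allowed gap between frames */
} sensor_monitor_t;

void monitor_init(sensor_monitor_t *m, uint32_t now_ms, uint32_t timeout_ms)
{
    m->last_rx_ms = now_ms;
    m->timeout_ms = timeout_ms;
}

/* Call whenever a frame that passed its integrity checks arrives. */
void monitor_frame_received(sensor_monitor_t *m, uint32_t now_ms)
{
    m->last_rx_ms = now_ms;
}

/* Call periodically: returns true once the sensor has been silent
 * for longer than the allowed timeout. */
bool monitor_timed_out(const sensor_monitor_t *m, uint32_t now_ms)
{
    /* Unsigned subtraction yields the correct elapsed time even
     * after the tick counter wraps around. */
    return (uint32_t)(now_ms - m->last_rx_ms) > m->timeout_ms;
}
```

The elapsed time is computed with unsigned subtraction, so the check stays correct when the tick counter overflows. Deciding what to do once `monitor_timed_out` returns true is the handling question the rest of this article is about.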

Let’s take a sports car as an example and consider one of its sensors. From the algorithm’s point of view, it does not matter which sensor we pick. It could be one used for active suspension control, engine management, the turbocharger or the braking system.

First, you have to define what “safe mode” actually means. If the signal does not arrive within the specified time, how should the system react? Should the fuel mixture supply to the engine be cut off completely? If so, what if that happens at a critical moment, e.g. while overtaking? Suppose the suspension position sensor does not respond in time. Can the suspension mode be changed without surprising the driver with a sudden change in the car’s handling? Such questions keep multiplying, and system architects and developers should try to anticipate as many scenarios as possible.

Before anything else, however, the developer has to answer a fundamental question: is it possible to react to the error without significantly disturbing the operation of the system? And when can you say with certainty that the situation is critical and the system must enter “safe mode”?

Consider the communication error mentioned above. If, for some reason, information from a subsystem is missing, what can be done? Ideally, we would know whether we are dealing with a temporary or a permanent failure. Unfortunately, no one can predict the future, so the algorithm must be designed to handle both cases. A permanent failure might be a damaged cable or a fault in the subsystem itself. In that case, there is nothing to do but enter “emergency mode” in the previously planned manner. For a temporary failure, a different procedure can be developed.

The simplest mechanism is to pass the previous value on to the system and wait for the next sample. Failures can be counted, and once a certain threshold is exceeded, a fatal error is reported. But another question arises: what if correct and incorrect samples are interleaved? A counter that is reset by every correct sample may then never reach the threshold. The simple counting algorithm therefore has to become one that counts errors within a time window. As long as the number of errors in the window does not exceed the set threshold, normal operation can continue.
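Here is a minimal sketch that combines both ideas. The last valid value is held and passed on. Meanwhile, a ring buffer records whether each of the last `WINDOW_LEN` samples failed. The window length, the threshold and all names are hypothetical. In a real project they would be derived from the signal’s requirements, and the fatal flag would trigger the previously planned safe or emergency mode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_LEN 16u  /* hypothetical: number of samples in the window */

typedef struct {
    bool     failed[WINDOW_LEN]; /* ring buffer: did each recent sample fail? */
    size_t   head;               /* next slot to overwrite */
    unsigned error_count;        /* number of true entries in failed[] */
    unsigned max_errors;         /* errors tolerated within the window */
    int32_t  last_good;          /* value held while samples are missing */
    bool     fatal;              /* latched once the threshold is exceeded */
} error_window_t;

void window_init(error_window_t *w, unsigned max_errors, int32_t initial_value)
{
    for (size_t i = 0; i < WINDOW_LEN; ++i) {
        w->failed[i] = false;
    }
    w->head = 0;
    w->error_count = 0;
    w->max_errors = max_errors;
    w->last_good = initial_value;
    w->fatal = false;
}

/* Process one sampling period. 'valid' is false when the sample is missing
 * or corrupt. Returns the value to pass on to the rest of the system. */
int32_t window_update(error_window_t *w, bool valid, int32_t sample)
{
    /* The oldest outcome leaves the window: remove it from the count. */
    if (w->failed[w->head]) {
        w->error_count--;
    }
    /* Record the newest outcome in its place. */
    w->failed[w->head] = !valid;
    if (!valid) {
        w->error_count++;
    }
    w->head = (w->head + 1u) % WINDOW_LEN;

    if (w->error_count > w->max_errors) {
        w->fatal = true;  /* caller switches to the planned emergency mode */
    }

    if (valid) {
        w->last_good = sample;
    }
    return w->last_good;  /* on a missing sample, the previous value is held */
}
```

An error is forgotten once it falls out of the window. As a result, rare glitches spread over a long period do not accumulate, while a burst of interleaved failures still trips the threshold.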

The solution above seems to cover the problem, but it relies on one assumption: the input signal changes slowly. In other words, the difference between successive values stays within the assumed limits. Only then will passing on the previous value leave the rest of the system essentially undisturbed. What about fast-changing signals? Here, the ratio of the required response time to the sampling period becomes important. If we can afford to work one sample “behind”, we can use polynomial interpolation. The missing sample is then “substituted” with a value calculated from historical data and the latest sample, which gives very good results. From experience, I do not recommend attempting extrapolation, as it can introduce a large error into the system.
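As an illustration, assume uniform sampling and a quadratic polynomial. The degree is our assumption; the article does not specify one. Suppose sample k is missing. The output is delayed by one period so that sample k+1 is available, and the polynomial through samples k-2, k-1 and k+1 is evaluated at k. Working through the Lagrange basis for these three points gives the closed form y[k] ≈ -y[k-2]/3 + y[k-1] + y[k+1]/3. The sketch below computes this with a general Lagrange evaluator. The function names are hypothetical, and an embedded target might use fixed-point arithmetic instead of `double`.

```c
#include <stddef.h>

/* Evaluate the Lagrange interpolating polynomial through the n points
 * (t[i], y[i]) at time t0. All t[i] must be distinct. */
double lagrange_eval(const double *t, const double *y, size_t n, double t0)
{
    double result = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double basis = 1.0;  /* L_i(t0): 1 at t[i], 0 at every other t[j] */
        for (size_t j = 0; j < n; ++j) {
            if (j != i) {
                basis *= (t0 - t[j]) / (t[i] - t[j]);
            }
        }
        result += y[i] * basis;
    }
    return result;
}

/* Hypothetical helper: reconstruct missing sample y[k] from two historical
 * samples y[k-2], y[k-1] and the latest sample y[k+1], which is available
 * because the output runs one sampling period behind. Times are expressed
 * in sampling periods relative to k. */
double fill_missing_sample(double y_km2, double y_km1, double y_kp1)
{
    const double t[3] = { -2.0, -1.0, 1.0 };
    const double y[3] = { y_km2, y_km1, y_kp1 };
    return lagrange_eval(t, y, 3u, 0.0);
}
```

The evaluation point t0 = 0 lies between the known samples, so this is interpolation. Evaluating the same polynomial beyond the latest sample would be extrapolation, which the author advises against.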

These considerations cover only one of the many problems that must be addressed when designing “safety-critical” systems. Each such algorithm must be matched to the requirements set for it. Narrow as this example is, it outlines the challenges a development team faces in this kind of work.

A good practice is to analyse, before writing any code, the many cases the software will have to deal with. Anticipate a wide range of scenarios, including those that seem very unlikely. Plan the entire operating strategy, especially error handling.

Many companies that outsource “safety-critical” projects also supply fairly detailed error-handling requirements. At Codelab, however, in addition to implementing the scope ordered by our clients, we also try to support them with the experience we have gained over many years of working on this type of project. Such cooperation leads to better and, most importantly, safer solutions.
