Computer system errors and failures are very expensive in time, in money and, in some cases, in human lives [1]. As society turns to computer automation to improve efficiency, it has to ask (1) itself whether or not it is worth it. Human operators are prone to errors, so society mitigates that risk by requiring them to undergo special training, compensating them well for their successes, and sanctioning them with fines, malpractice suits and even prison time for their failures. Computer systems aren't treated the same way: they are neither rewarded nor punished, but that's because (2) threatening a program with jail isn't going to affect its performance. However, a computer system can still be scrutinized on its qualifications, and the decision of whether or not it should get the job can be based on how well it can do that job.
(3) It is still theoretically possible to create a completely reliable software system. However, such a system would have to be very simple, with a small, finite number of states, so that it could be tested exhaustively. In other words, a system simple enough to verify completely would be impractical for real work, while a system complex enough to be useful is practically impossible to verify.
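To illustrate what "a small, finite number of states" buys you, here is a minimal sketch in Python. The toy turnstile machine and its safety property are hypothetical examples of my own, not taken from the reading; the point is only that when the state and input spaces are tiny, every combination can be checked:

```python
from itertools import product

# Hypothetical toy system: a turnstile with two states and two inputs.
STATES = {"locked", "unlocked"}
INPUTS = {"coin", "push"}

def step(state, event):
    """Transition function of the toy turnstile."""
    if state == "locked" and event == "coin":
        return "unlocked"
    if state == "unlocked" and event == "push":
        return "locked"
    return state  # every other combination leaves the state unchanged

# Exhaustive check: because both spaces are small and finite, we can test
# every (state, input) pair, not just a sample of them.
for state, event in product(STATES, INPUTS):
    nxt = step(state, event)
    assert nxt in STATES, f"left the state space: {state} --{event}--> {nxt}"
    # Safety property: pushing a locked turnstile never unlocks it.
    assert not (state == "locked" and event == "push" and nxt == "unlocked")

print("All", len(STATES) * len(INPUTS), "state/input combinations verified.")
```

Real systems have vastly larger state spaces, which is exactly why this kind of exhaustive check stops being possible.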
Mellor defines the reliability of a computer system "as the probability that it will not fail" during a period of time under certain conditions. Strictly speaking, this probability is not reliability itself but a quantity that can be used to measure it. For such a rating to be credible to the general public, it would probably have to be measured and certified by independent bodies or standards commissions. Perhaps such a body should exist, and every so-called critical system should be required to have its reliability rating measured and published. The general user would then be more aware of how reliable a system is and could make an informed choice about how much to depend on it. (4)
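To make that definition concrete, one common simplification is to treat reliability as a survival probability under a constant failure rate, i.e. an exponential model where R(t) = exp(-t / MTBF). That modelling choice and the sample numbers below are my own assumptions for illustration, not Mellor's:

```python
import math

def reliability(t_hours, mtbf_hours):
    """Probability of surviving t hours without failure,
    assuming a constant failure rate (exponential model)."""
    failure_rate = 1.0 / mtbf_hours           # lambda
    return math.exp(-failure_rate * t_hours)  # R(t) = e^(-lambda * t)

# Hypothetical published rating: mean time between failures of 10,000 hours.
mtbf = 10_000
for mission in (1, 24, 720, 8_760):  # 1 hour, 1 day, 1 month, 1 year
    print(f"R({mission:>5} h) = {reliability(mission, mtbf):.4f}")
```

A published rating of this kind is what a certifying body could attach to a critical system, so that a user could see at a glance how likely it is to run for a given period without failing.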
Unfortunately this is not currently feasible, because there is no good (or even agreed-upon) way to evaluate software reliability before it goes live. Evaluating a system by running a (5) real-world simulation should not be acceptable: the simulation would be subject to the same biases and assumptions the developers held when they built the system in the first place, so there would always be the danger that the designer forgot to take some situation into account.
For now, the best way to improve quality is to make sure the system has as few errors as realistically possible when it is released. Ideally, software would be released with no errors at all. However, verifying a complex program empirically by testing each and every state is impossible, as it would take more time than the entire life span of the universe [1]. The next best step is exhaustive testing of each line of code, but this is expensive and time-consuming.
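A back-of-the-envelope calculation shows why testing each and every state is hopeless for anything non-trivial. The figures below (a modest 256 bits of internal state, a billion test cases per second) are illustrative assumptions of mine, not numbers from [1]:

```python
# Rough state-explosion arithmetic (illustrative numbers only).
state_bits = 256                  # a modest amount of internal state
states = 2 ** state_bits          # number of distinct states to test
tests_per_second = 1_000_000_000  # one billion test cases per second

seconds_needed = states / tests_per_second
age_of_universe_s = 4.35e17       # roughly 13.8 billion years, in seconds

print(f"States to test:       {states:.3e}")
print(f"Seconds needed:       {seconds_needed:.3e}")
print(f"Ages of the universe: {seconds_needed / age_of_universe_s:.3e}")
```

Even with these generous testing speeds, the number of ages of the universe required is astronomically large, which is why testing can only ever sample the state space rather than cover it.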
In the end it may be up to the end-users to decide how much they are willing to spend on development to improve quality and reliability. There are no easy answers here, and the solution is essentially a trade-off between cost and risk.
References:
[1] Forester, Tom, and Perry Morrison. "Unreliable Computers." Chapter 5 in Computer Ethics: Cautionary Tales and Ethical Dilemmas in Computing, pages 105-129. Cambridge, MA: The MIT Press, 1994.
I agree with you that there is a trade-off between cost and risk. From an ethical standpoint, we can ask how much risk is acceptable. For many other products, such as cars and prescription drugs, there are clearly defined standards of acceptable and unacceptable risk (whether these are the "right" standards is a different question). I cannot decide, for example, that I want to buy a car whose brakes have not been tested. I think for life-critical applications, standards such as these would be appropriate for software; the spending habits of end-users should not be the only determinant. On the other hand, perhaps for non-critical software reliability is not such an important concern.
(1) Society cannot ask; only its members can.
(2) Personifying software in this way is an unusual way of looking at it! Usually the designer or programmer is the one who's scrutinized for a position.
(3) Actually, it is only possible to prove that the code satisfies the design requirements and performs in a way that is "correct", that is, according to specifications. Those proofs alone do not imply that the system will be reliable: for example, the system requirements could be correct but incomplete, or the physical components of the system running the software could fail.
(4) I like this idea - something like CSA ratings for software. It would be especially useful for libraries of functions so developers and users could have confidence in applications built with third-party components.
(5) What do you mean by a "real-world simulation"? Do you mean a simulation in an artificial environment, or a closely monitored test in a controlled "real-world" environment, or something else? How closely the test environment matches the real environment in which the software will be used would make a big difference to the usefulness of the test as a predictor of the software's performance. The problem of designer biases that you mention could be reduced by having independent testers.
9/10