Anyhow, thanks for posting! Saved this for a deeper dive later, simply because it seems like an exceptionally well written and nicely illustrated paper.
Nice post! Yes, I did notice that, with:
- the addition of the match statement in Python 3.10 in ~2022
- MyPy, which has been around since ~2014 I think
You can now write OCaml/Rust-ish code in Python :) I can definitely see that some people will find it ugly, and I might have had that opinion 10 years ago.
But I find it is pretty fun and easy to write little compilers and evaluators in this style. The style depends on the problem you’re solving, maybe even more than the language you’re writing in.
A few months ago I hand-translated a Rust program from BurntSushi to Python, and it’s all statically typed. It works pretty well, and the code is pretty much a direct translation, one-for-one.
https://github.com/andychu/rsc-regexp/tree/master/py (the Rust program being in turn hand-translated from a C program)
This is also kinda how Oils is written, with the added benefit that we automatically translate the program to C++, for a predictable 30x-50x speedup.
We use a schema language Zephyr ASDL to express algebraic data types, not the verbose dataclass stuff. And we have an idiom for switch using the with statement.
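To make the “OCaml/Rust-ish” point a bit more concrete, here’s a toy sketch of the style being described (my own example, not code from Oils or the linked repo): algebraic data types as dataclasses plus a match-based evaluator, i.e. the plain-Python version of what a schema language like ASDL would otherwise generate.

```python
# Toy example of the style: algebraic data types via dataclasses,
# pattern matching via the 3.10 match statement.
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: 'Expr'
    right: 'Expr'

@dataclass
class Mul:
    left: 'Expr'
    right: 'Expr'

Expr = Union[Num, Add, Mul]

def eval_expr(e: Expr) -> int:
    match e:
        case Num(value=v):
            return v
        case Add(left=l, right=r):
            return eval_expr(l) + eval_expr(r)
        case Mul(left=l, right=r):
            return eval_expr(l) * eval_expr(r)
        case _:
            raise TypeError(f'unexpected node: {e!r}')

print(eval_expr(Add(Num(1), Mul(Num(2), Num(3)))))  # 7
```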
A bit off-topic perhaps, but I browsed quickly through the code you linked and lines 283-312 caught my eye: Maybe a dumb question :), but what exactly does the following part do?
out: Optional['State']
Was not aware that you can also provide a string to Optional. Does it somehow “forward-declare” the type State you define a few lines later and the string gets later matched to the actual type?
That’s right. And with the new type statement you don’t have to use strings to do forward declarations.
Technically it hasn’t been needed since 3.7, because you can enable from __future__ import annotations, which makes all annotations lazy. Although for type aliases you need to combine it with typing.TYPE_CHECKING.
Thank you!
I didn’t know about the ASDL stuff and how it’s used in CPython, thanks for sharing!
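A minimal sketch pulling the above together (my own toy example): the quoted forward reference, the lazy-annotations route via the __future__ import, and the 3.12 type statement, shown commented out since it needs a newer Python.

```python
# Toy sketch of the three options discussed above.
from __future__ import annotations  # 3.7+: all annotations become lazy strings
from dataclasses import dataclass
from typing import Optional

@dataclass
class State:
    # The quoted form works even without the __future__ import:
    out: Optional['State'] = None
    # With lazy annotations the unquoted form is fine too, even though
    # State isn't fully defined yet at this point:
    prev: Optional[State] = None

# Python 3.12+ only: the `type` statement creates a lazily evaluated alias,
# so no quoting is needed for forward references there either.
# type MaybeState = State | None

print(State(out=State()))
```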
Interesting article, thanks for posting! (Hadn’t noticed the email yet…)
The article and the CWE-comment from alilleybrinker reminded me of the typical taxonomies used in safety engineering/dependable computing literature and standards for errors/faults/failures. You might already be aware of all this, but this paper by Avizienis et al (PDF) is pretty good (see section 3.2). It’s quite high level/not SW-specific though, maybe there’s some more work out there to further categorize/define different types of bugs? John Knight’s book “Fundamentals of dependable computing for software engineers” also discusses the taxonomy of Avizienis et al a bit, but doesn’t go much further, he mainly categorises all SW faults as either design faults or “byzantine” faults (just another type of design fault).
Got all excited for an open Simulink-alternative, but it’s unfortunately a cloud-based simulation product. They seem to have the requisite ML-buzzwords included. The claims about accelerating simulation with their approach, if true, are quite impressive. Wonder if the actual users have a traditional model as a validation backup in case the ML-accelerated model shows something interesting/behaves unexpectedly. Seems that it would be sensible to somehow check the results if this is going to be used for anything remotely critical like pharmaceutical modeling as they mention in the article.
Thanks for posting! The technical side of things is quite interesting, but the writing style of the author is rather pompous.
“What if there were a way to automatically generate C-code, complete with formal proofs of correctness, removing any need to trust fallible human developers?” … and what about the fallible humans writing the specs?
Here’s a link directly to their paper (he also provides other references at the bottom): https://gopiandcode.uk/pdfs/CySuSLik-icfp21.pdf
I’ll add that there have been many specification errors in verified programs. Mutation testing was finding issues in them, too. We’ll need many eyeballs on them. Checking specs/proofs in general math is a social process, but mathematicians don’t like computer-focused math. So, in a way, we’re moving the problems from code nobody was reviewing or analyzing to an esoteric language fewer people will check.
It’s still great work due to the ubiquity of C, good tools for separation logic, removing attack surface in the spec-to-code gap, and pushing synthesis forward. Might as well keep putting good stuff together to see what happens.
As far as the general problem goes, I’m a bigger fan of doing stuff like this with notations like ASMs that a wider audience could use and review. Or languages like ML or SPARK with C generated from them. If algorithmic, a side benefit is one can retarget static analyzers and automated provers to them to confirm some properties without really getting into formal logic.
There are so many memory unsafety bugs. Most of these are caused by selecting memory-unsafe languages, and C is the worst offender by far.
Weird. Interesting, but weird. Thanks for posting. Don’t really “get it”, though, and unsure whether it’s worth further time investment.
I watched a couple of the introductory videos [1, 2] they have on the page. From the first one I somehow got a cybernetics vibe. In both they keep giving examples from quantum physics and biology to ovens and robotics. But then they always claim that this stuff somehow provides a foundation for fundamental physics. The paper which apparently kicked it off [3] seems to be (based on a very brief browse) about marrying physics and computation together somehow. Is this trying to be a new foundation for physics, or is this a general theory of everything, or the next iteration of physicists implying that all other disciplines are just physics? Can anyone give a brief explanation for an engineer?
[1] https://www.youtube.com/watch?v=8DH2xwIYuT0
[2] https://www.youtube.com/watch?v=0zeT2npYf18
[3] http://constructortheory.org/portfolio/the-philosophy-of-constructor-theory/
My sense is that this is proposed as a new foundation for physics (similar to the situation in mathematics where category theory and homotopy type theory are proposed as alternatives to set theory as a foundation for mathematics). So you would use constructor theory as a framework for defining physical theories. It is suggested that constructor theory is general enough to be used as a framework for theories outside of just physics.
The deep connexion with computation and information theory is interesting. So it ought to have applications in computer science. Maybe it will turn out to be an alternative to category theory: both constructor theory and category theory are built on top of algebra.
The difference is that category theory is a well-established field, and using it as a foundation has gotten us interesting mathematical insights in general. Neither seems to be true here.
Thank you for the clarification. That makes some sense. I feel there’s the typical danger here of proposing a very general model of everything, which in the end doesn’t bring anything new to the table. But I guess if it’s a lens through which you can describe and try to understand some general underlying patterns then it will have served its purpose.
Just throwing some ideas out there, the following is likely complete nonsense:
On the perpetually growing reading list I have Whitehead’s Process and Reality. I wonder if some parallels could be made between the focus here on transformations, or constructions, and Whitehead’s view of entities as processes.
Another interesting perspective might be to hear how this theory would explain “self-organisation due to entropy” as in this article: https://www.quantamagazine.org/digital-alchemist-sharon-glotzer-seeks-rules-of-emergence-20170308/ . Is entropy a constructor?
i have used the following style for state-machine implementation to great advantage. for larger more complicated cases i have found that HSM or hierarchical-state-machines are also a pretty useful toolkit to have around.
Searching for HSMs, I stumbled across this description of behavior trees, which are new to me: https://web.stanford.edu/class/cs123/lectures/CS123_lec08_HFSM_BT.pdf. Neat construction.
Would recommend this pre-print book for a thorough discussion of behaviour trees.
Edit: And this old paper from Harel (PDF) for hierarchical state charts.
yes, i am familiar with harel’s paper, and have used that as a basis for design & development of hsm’s used @day-job. thanks for the book link on behavior trees. looks pretty interesting.
as a side note, for parsing, specifically network protocols where things are strictly (a very loose definition of strict fwiw) specified through rfc’s etc., i have relied mostly on ragel. would love to hear your (and other folks’ as well!) experiences/wisdom in this regard as well.
Interesting article. I know you’re not the author, but do you have any idea what these companies use for ‘proof engineering’? Is it mostly TLA+?
Galois itself has a lot of projects on Github: https://github.com/GaloisInc/ , would recommend checking those out. They’ve built quite a few tools themselves (seems to be mostly in Haskell), and seem to work quite a bit with Coq and SMT solvers like Z3.
Every now and then I’ve used LTSpice (free to use, but not open source) to do circuit simulations for work projects (industrial control systems). I first learned the GUI during my studies. Not sure at exactly which point I found out that actually the graphical models are translated into text before they are fed to the simulator, but figuring this out opened quite a few possibilities.
Recently, for a work project, we had to figure out whether a particular circuit design would end up in an unsafe state as the result of different fault modes or combinations of faults. The circuit was complex enough that it didn’t really yield to a normal “stare at the circuit diagram and think about what could happen”-style FMEA. We ended up building fault simulations into the model and then adding parameters to switch different faults on and off. Because the whole model was just a single text file, it was then possible with a simple search-and-replace-style script to change the parameters of the simulation. This allowed us to programmatically simulate all the interesting fault modes, combinations of them, and combinations with different circuit control states (you can call the simulator with a SPICE netlist from the command line and it dumps the simulation results to a .raw file, which is binary, but there are libraries to read it), and for all the simulations dump the results into a bunch of figures and tables for easy analysis. We easily found a couple of unsafe fault modes which were very unintuitive and would not have been found without the simulation. Coding the whole thing initially took quite a few hours, but combined with a couple of circuit revisions it ended up saving a lot of tedious manual work.
Of course similar things could be done with other simulators, but usually there’s some clunky API I would have to work through; with text-based models, creating your own tools and scripts on top is rather nice.
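Roughly, the workflow described above looks something like this (a sketch, not our actual script; the netlist name, the placeholder parameter names and the simulator binary/batch flag are made up for illustration):

```python
# Sketch: parameterise a netlist template via search and replace, then run the
# simulator in batch for every fault combination. Assumes the template contains
# placeholder tokens like {R_SENSE_OPEN} and that the simulator binary
# ("ltspice" here) accepts a batch flag; both are stand-ins for illustration.
import itertools
import subprocess
from pathlib import Path

TEMPLATE = Path('circuit_template.net').read_text()
FAULTS = ['R_SENSE_OPEN', 'Q1_SHORT', 'RELAY_STUCK']  # hypothetical fault switches

for combo in itertools.product([0, 1], repeat=len(FAULTS)):
    netlist = TEMPLATE
    for name, value in zip(FAULTS, combo):
        netlist = netlist.replace('{%s}' % name, str(value))  # search and replace

    run_name = 'fault_' + ''.join(map(str, combo))
    net_file = Path(run_name + '.net')
    net_file.write_text(netlist)

    # Results land in <run_name>.raw, to be read later with a .raw-file library
    # and dumped into figures/tables for analysis.
    subprocess.run(['ltspice', '-b', str(net_file)], check=True)
```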
In another case we were trying to simulate another circuit which used a couple of ICs. We found a SPICE-model for one of the ICs online, the model was dated to 1993. After only a minor modification it was possible to run it on the 2020 simulator, which is by itself quite a feat. It didn’t quite contain all the functions that the modern version of the IC had, but since it was text-based and not encrypted (like some manufacturers like to do), we were able to reverse engineer the model and found out you could simulate the modern version of the IC at least to some approximation by combining the building blocks provided by the 1993 library.
Had the model from 1993 been in a binary format for some GUI program, I doubt we could’ve run it or figured out how to open the thing at all to be able to figure out what it was actually doing.
To summarize, it sounds like LTSpice’s use of plain text files provided these benefits:
- fault parameters could be changed with a simple search-and-replace script,
- simulations could be run and post-processed in batch from the command line, and
- even decades-old models stayed readable enough to reverse engineer and adapt.
Yes. I have a tendency to write too long-winded, thanks for the TL;DR. :)
“Where nuclear fission – the splitting of atoms to release energy – has proven incredibly powerful but insanely destructive when things go wrong…”
I disagree. The Chernobyl meltdown (which is what I’m assuming the author is referring to) occurred solely because of human error. In addition to the reactor being poorly designed (it lacked a proper containment vessel, for pete’s sake), its operators were performing an unauthorized experiment. They also shut down the coolant system and ignored computer warnings and the protests of a technician at the scene.
EDIT: well, it looks like the experiment was authorized. See below.
How is human error not “when things go wrong”?
The author implied that this was a fundamental fault of nuclear-fission based reactors. I might have misunderstood though…
The experiment was authorized, and that’s the problem: the supervisors were deaf to the safety concerns, and refusing to proceed would have cost the operators their jobs, if not their party membership and with it any prospect of finding a decent job until the breakdown of the USSR.
The factor that made so many disasters possible is that operators didn’t have the authority to veto an unsafe action. At least once, a high-ranking officer who insisted on an unsafe action died himself.
Not necessarily related to the posted article, but I wanted to reply regarding your statement of the meltdown occurring solely because of human error.
Nuclear systems are rather complex, and so are the accidents that occur with them. Traditionally it was typical to explain these things with human errors, but in a more modern safety view there might be human errors, yet they’re usually only one factor in a larger chain. Usually the operators are trying to do their best in the given situation. The question should be asked: what led to them making the mistake? This could be stress, the quality of the HMI, pressure from higher-ups, etc. Slide 9 in this presentation sums it up very nicely. For further discussion you can look up Nancy Leveson’s book (free download under “Open Access”) or Sidney Dekker’s writings.
There’s also a bit older reference regarding TMI, but a very enlightening one nonetheless: Charles Perrow’s Normal Accidents. I read it over the last holidays, and it was still very interesting, even if at some points the age (original from 1984) showed a bit. Can’t recommend the Y2K discussion at the end, though.
For a better discussion about Chernobyl, I found a few articles which provide a bit more modern view on the accident on a quick glance:
The latter one is unfortunately behind a paywall, and I couldn’t find a freely accessible version, but there’s a certain site, one might call it a hub for science, where one can find it with the DOI-link..
I personally agree that given the current climate situation we should keep using existing nuclear plants and improving the technology while figuring out alternatives. But after reading e.g. Perrow’s book, I would say the risk associated with these things isn’t necessarily exaggerated, and we should take greater care with them than we have historically done.
If I recall correctly, very little contamination was released at the Three Mile Island accident (compared to the Chernobyl accident, anyway – any amount of radiation is harmful). Also, the plant’s operators were engaged in some dangerous experiments, which led to the meltdown.
The fallout from Three Mile Island was political/cultural, rather than radioactive. It really shook the public’s faith in the security of nuclear power.
Which, I think, is rather sad. IMHO nuclear fission is a great alternative to coal-based power (it produces much, much, much less toxic waste per year, for one thing), but because of accidents like this, which have been exaggerated by some environmentalists and the media, it just isn’t being used to its full potential.
I understand that, but I barely trust engineers to make good choices when it comes to software, so how am I ever going to trust software that handles a nuclear power plant?
And I certainly wouldn’t live near one located in most countries in today’s world.
I’m sure the distribution of software projects has several underlying distributions, so saying it’s just “log normal” underestimates things. E.g. some projects have high levels of technical risk; i.e. some tasks are just not feasible, and the time taken to prove they’re not feasible is not finite.
I’m sure the distribution is truncated fat-tailed, i.e. the only reason the mean time is finite is that a lot of projects get truncated at the far right due to budget limitations.
The whole field suffers from the Ludic Fallacy. Projects aren’t a game operating to strict rules. Shit of every conceivable flavour happens in real life all the time.
There is a fundamental disconnect between contracts, “I promise to deliver X by hard deadline Y in return for payment Z” and estimating “If I do X, how long will it take me and how much will it cost?”
Business is fundamentally a betting game because “how long will it take me” and “how much will it cost” are fundamentally random variables.
There is irreducible risk in collapsing a probabilistic estimate into a hard contract.
You can reduce that risk, but you cannot remove it.
If you’re uncomfortable with that you need to be in a different game.
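As a toy illustration of that last point (numbers entirely made up): even if the estimate really were a tidy lognormal, a hard deadline still carries a chunk of irreducible risk.

```python
# A toy sketch (my own made-up numbers) of the estimate-vs-contract gap: even a
# well-behaved lognormal estimate leaves residual risk once collapsed into a
# hard deadline.
from math import log
from statistics import NormalDist

mu, sigma = log(30), 0.6   # "about 30 days" as the median, with an assumed spread
deadline = 45              # the hard date promised in the contract

p_miss = 1 - NormalDist(mu, sigma).cdf(log(deadline))
print(f"P(actual duration exceeds the {deadline}-day deadline) = {p_miss:.1%}")
```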
I’ve started both The Black Swan and Antifragile a couple of times but didn’t manage to get through either of them. One problem was Taleb’s writing style, but also a lack of background knowledge in statistics. (I don’t have a very good understanding of statistics beyond what they teach during engineering programs at a university.) Even trying to get through part 2.1 of the linked doc proved somewhat difficult.
You seem to be quite knowledgeable in these topics. Would you have any pointers on where one could possibly even get started, if the goal were to self-study the required background knowledge for this?
His core point is that most of the “background knowledge in statistics” you’d get in the average stats course at uni is focused on that small subset of problems that are amenable to analysis.
Preferably those with closed-form solutions, preferably teachable in a course that has to be endured.
(Typically by engineering or economics or psychology students or whatever, who actually want to be studying something else, but this damn stats course is a prerequisite.)
Alas, as he hammers home almost to the point of tedium, most of reality does not satisfy the requirements for those classical statistical methods to be appropriate.
Which leaves us somewhat empty-handed….
Fortunately Taleb does try to hand us some digestible takeaways. I’d focus on the section “2.3 the main consequences” and google around to try to make sense of each point.
Alas Science traditionally has the “All Other Things being Equal” caveat…. which in an industrial software project is never ever true, which makes drawing valid conclusions very hard. https://leanpub.com/leprechauns/read
The assumption in all traditional stats is the data isn’t truncated. Real world projects are always truncated in several dimensions. (Insufficient money, CPU power, project becomes irrelevant, …..)
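A quick toy simulation of that truncation point (mine, not Taleb’s): draw “project durations” from a fat-tailed distribution, cap them at a budget limit the way cancelled projects get cut off, and look at how tame the surviving data looks.

```python
# Toy illustration of truncation bias: the mean of the capped data badly
# understates the mean of the underlying fat-tailed distribution.
import random

random.seed(0)
alpha = 1.1        # tail exponent; smaller = fatter tail (made-up value)
budget_cap = 50.0  # projects longer than this get killed (made-up value)
n = 100_000

raw = [random.paretovariate(alpha) for _ in range(n)]
capped = [min(x, budget_cap) for x in raw]

print(f"mean of the true durations:      {sum(raw) / n:6.2f}")
print(f"mean of the truncated durations: {sum(capped) / n:6.2f}")
print(f"share of projects hitting the cap: {sum(x >= budget_cap for x in raw) / n:.2%}")
```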
“They could have designed a two-channel system. Or they could have tested the value of angle of attack on the ground,” said Lemme. “I don’t know why they didn’t.”
Partially for the same reason the FAA deferred certification to Boeing: deadlines and management pressure. I highly doubt this is the end of the story; there are deeper issues here.
I think they also touched on another point in the article (disclaimer: I’m not an expert in safety regulation for aviation, but I assume the principles are rather similar to those in industrial systems). The safety analysis of the MCAS system rated the risk associated with the system too low. The single-channel system fulfilled the regulatory requirements for the lower estimated risk level; had the risk analysis classified the associated risk higher, they would’ve had to build this particular system with a two-channel architecture. To be honest, I was surprised when I read that it was a single-channel system, I was under the impression that mostly everything in aeroplanes is two- or three-channel. :)
But this was also just a single problem in the whole engineering process. I spotted this article on the System Safety mailing list, and some people there have rightly noted that the problem is with the overall systems engineering in this case.
The point of outsourcing the certification to Boeing was also a surprise to me. I’ve heard that major corporations can pull such tricks off, where they can argue that a separate organisation of the company is sufficiently independent of the development team to be able to achieve the required independence when executing the risk analysis and “third-party” review. But somehow the way it was done here seemed quite “unclean”.
A STAMP investigation would surely be interesting, we’ll see if that happens, I have the feeling that it’s still not that widely used even in this particular industry.
Thanks for posting! The article was interesting, even if at some points it felt a bit overly dramatic.
If someone wants to know a bit more about the technical side, there’s a report from Dragos available and a short talk from Schneider Electric on Youtube.
There are some interesting details related to the incident: the whole thing wouldn’t have been possible if a physical switch on the safety PLC had been used to set the device to ‘run’ mode (which prevents software changes), as instructed by the manufacturer, instead of being left in ‘program’ mode.
Selfishly, I’d recommend my talk on YouTube over Schneider’s. I was on site for the incident response and led the investigation of the controls systems side of the breach. Our team took over from Julian’s after they identified that there was malware present on the engineering workstation.
Part of me says Cool…. part of me says it’s so fuzzy and so glossing over details as to be provably useless. (Look at any system which produces chaotic behaviour).
This seems like it might be a good use for it….
ie. Use systems theory to identify all contributory factors and take actions on all of them instead of a single scapegoated “root cause”.
But then there’s that niggle about vagueness and fuzziness and how it doesn’t fit with any software-intensive system I work with….
What’s good about the presentation in this handbook is that the STPA method is presented as a step-by-step process, without much of the fuzzy stuff. Leveson & co. claim to have reached pretty good results with the methodology, usually exceeding what was reached by FMEAs or HAZOPs. Leveson typically argues for replacing the old methodology completely with STPA. A couple of presentations I’ve seen about it in an industrial setting usually suggest using it for a high-level analysis in combination with e.g. FMEAs for low-level stuff. The hope there is that you catch the more complicated causal scenarios (possibly involving humans with “fuzzy” behaviour) but for circuit-board-level stuff remain with the traditional methods.
You might be interested in a recent PhD thesis from Uni Stuttgart, where they combined STPA into a systems engineering approach for software-intensive systems in an automotive setting. These slides provide a nice concise summary of that work. The approach uses STPA to identify system specifications, which are then translated into something more formal for model-checking / software testing.
I’ve had a similar feeling when reading Leveson’s stuff. I think, however, that the analysis method they’ve built is powerful, also outside the domain of safety engineering. Maybe especially when incorporated in a similar way as suggested in the thesis above, where it’s used to formulate specifications early on, as writing specs might often be the fuzzier part of the systems engineering process?
One thing, which would’ve been a cool addition, would’ve been a control graph of the system being analysed. There’s a lot of literature with examples of STPA/CAST-analyses for safety, but very few for security, and all of them are related to analysing a system around some physical process. Would’ve been interesting to see your take on it here. There’s a bit about extending STPA for Security e.g. here and here.
Another nice and a bit more practical resource for learning about STPA (The hazard analysis method based on STAMP) is the STPA Handbook (pdf) Leveson and Thomas published last year. Leveson’s 2012 book, at least for me, was at points a bit long winded (nonetheless a very good book), but I’ve found the handbook a better reference for analyses. It also covers a lot of typical common mistakes etc.
If I may nitpick a bit, STAMP is an accident model, STPA and CAST are analysis methods based on it :) Leveson extended on ideas from Jens Rasmussen, as she talks about e.g. in this publication. If you’re interested, check e.g. this influential paper from Rasmussen.
PS. The link to Leveson’s homepage at the end of the article is not working.
I’ll try to take a more detailed look later, but it would be interesting to see how their approach scales to something actually useful. The system they are using as an example is rather simple, and they mention having used a 150 s time limit for the solver. I guess this is only for the marginal/unstable cases with gain >= 1, for which the solver apparently timed out? Wonder how much time the solver needed for the proven cases.
While searching, found a follow-up paper with an interesting comparison of different model checkers for similar stuff from 2015. Hopefully you weren’t planning to post this next. :)
I didn’t have it! So, thanks. I actually wasn’t looking for Simulink/Stateflow stuff this year. I just stumble on it periodically because of how I word searches. I might spend more time looking for them next year due to importance in embedded. Are there any open tools that are comparable to those and high quality enough for professionals?
EDIT: Also interesting is that it’s two women. I tried to find out what their other work was to see what’s industrially applicable. Both had pages at Bristol, here and here, with Araiza-Illan now at ARTC. It’s heavy on industrial application. Araiza-Illan also posted four hours of verification proceedings to YouTube. Interestingly, Eder worked at XMOS on power-aware design. As Bristol is Cadence’s lead university per that page, she’s probably also feeding ideas into their verification groups. If not, she at least started with real experience, per this interview, which I’m still watching right now.
So, you found not only a paper but two women doing interesting work with strong ties to industry and/or real-world experience. Extra interesting. Also, now on my list to potentially hire if I get into verification. :)
Re: Open alternatives - This is a topic I’ve wanted to look into lately. Probably a large portion of the normal data analysis people might do in plain old Matlab would be easily replaceable with Python and the common analysis libraries for it.
As for Simulink/Stateflow:
I’ve been meaning to play around with OpenModelica, as I would really like to have an open alternative available if I don’t have access to Simulink in the future. The tool has been in development for a long time (still very active, with EU-funded research projects developing it further), has a consortium backing it with major players involved, and has been used as a basis for industrial tools. It’s the only thing I would seriously consider as a realistic open source alternative. There’s also JModelica, which is another Modelica-language-based open source tool from a Swedish company (apparently they base their commercial tools on the open core developed with a couple of unis).
Simulink was always a nice tool for building models of dynamic systems, continuous or discrete, as the graphical notation makes sense for the sort of stuff in control engineering or signal processing, where you would anyway draw block diagrams or similar charts to try to clarify things. However, at least in my experience, building any sort of non-trivial software-like logic with the graphical blocks always resulted in a horrible mess. The Stateflow addition with hierarchical state machines has been a really nice addition to help with that aspect (before moving to software-in-the-loop with generated code or custom code through the C-interface). I think OpenModelica should have an equally good system/solver for the typical control/signal processing stuff (I read somewhere that the solver should be somehow even better/faster than the Simulink-solvers due to a different approach internally), I’m not sure how well supported the FSM-stuff is. At least there should also be a C-interface, so it should fill most of my use cases. For use with HW-in-the-loop-simulators, both systems support the FMI/FMU-standards, so ideally that should also be covered, but I’ve only worked with dSpace, which is rather heavily integrated with Matlab/Simulink.
I don’t know how the usability of the graphical editor with OpenModelica is. You could always just write Modelica directly, but somehow I was always put off by the syntax of the language. :)
I submitted OpenModelica here before. I don’t know much about it past the description sounding really cool and general-purpose. Yeah, expanding it to have parity with or better methods than Simulink/Stateflow makes sense. I didn’t know about FMI/FMU. Looking at this article, it says FMI is a Modelica project. It has a nice pros/cons section comparing it to Simulink where improvements might be made.
I’m not sure about FMU’s. It looks like they’re like modules or executables produced as a result of doing FMI. Like class or jar files in Java maybe. Or am I pretty far off?
re syntax. Add it to list of Better Language extracts to and integrates with Supported Language. :)
I also have pretty much no experience actually using the FMI standard, as I’ve been stuck with Simulink (and the S-functions also mentioned in that Wikipedia article). My understanding is that the FMI standard specifies a way to describe the interfaces (IO points) and the internal logic, and FMUs are then models/modules implemented according to this standard. Modelica is going for an object-oriented approach, so I think thinking of FMUs as classes is probably a good analogy.
Typically, at least in the S-function world, and I think it’s the same here, one defines the time derivative of each continuous state within the model; these are fed back to the solver/simulator executing the model, which handles the numerical integration and feeds the integrals back to the FMU (see p. 69 here). For discrete stuff in the S-function world one defines a step function, which is executed by the simulator every x seconds to solve the next outputs. The FMI standard seems to have an event-based approach, but “time-based events” probably boil down to the same thing.
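A minimal sketch of that division of labour (my own toy code, not the FMI or S-function API): the model only reports state derivatives, and a stand-in “solver” owns the integration loop and feeds the states back.

```python
# The model side: given time, states and input, return the state derivatives.
def model_derivatives(t, x, u):
    # first-order lag: dx/dt = (u - x) / tau   (tau is a made-up parameter)
    tau = 0.5
    return [(u - x[0]) / tau]

# The simulator side: a crude forward-Euler integrator standing in for the
# real solver that would call the model repeatedly.
def euler_solver(f, x0, u, t_end, dt):
    t, x = 0.0, list(x0)
    while t < t_end:
        dx = f(t, x, u)
        x = [xi + dxi * dt for xi, dxi in zip(x, dx)]
        t += dt
    return x

print(euler_solver(model_derivatives, [0.0], u=1.0, t_end=3.0, dt=0.01))
```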
I tried to dig up some meaningful examples outside the actual spec, but they were a bit hard to find.
This paper has an example of an FMU-description in Modelica: http://www.ep.liu.se/ecp/056/003/ecp1105603.pdf (I assume then this could be used to generate the XML/C-interface described in the FMI-standard?)
It looks pretty complicated haha. I’ll probably have to try OpenModelica or Simulink some time in the future on small, realistic examples to learn how they work. Especially the thinking process behind it. Probably different than what I’m used to. Thanks for the examples, though. The transmission example particularly shows me how they use these tools.
It’s now happened a couple of times that I’m surprised by texts claiming that developers haven’t heard of state machines. At least for me the concept was introduced in an embedded systems course at uni, and it has been quite a common thing during my working life in that area (I’ve worked mainly with different embedded/PLC-based systems for machine controls), both in the control logic as well as in the fieldbus communication (e.g. the CANopen state machine comes to mind). It’s often been a rather natural representation when working with anything physical which has different modes and moving parts etc., just like the demo here.
Currently I’m working sometimes with the Stateflow extension of Matlab Simulink, which offers a nice utility for hierarchical state machines, though behind a rather expensive paywall. I had not heard of this language before, and it looks like they have code generators for different languages, nice. I’ll try to give it a go at some point. Thanks for posting!
Now if only there were nice integration with something like OpenModelica to enable easily sketching out control logic against a process model..
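For what it’s worth, here’s the kind of toy table-driven state machine I mean (my own example, loosely modelled on CANopen NMT states, not taken from the linked demo):

```python
# Table-driven state machine: a dict from (state, event) to next state.
TRANSITIONS = {
    ("initialising", "boot_complete"): "pre_operational",
    ("pre_operational", "start"): "operational",
    ("operational", "stop"): "stopped",
    ("stopped", "start"): "operational",
    ("operational", "enter_pre_operational"): "pre_operational",
}

def step(state: str, event: str) -> str:
    # Unknown events are silently ignored, which is one common (if debatable) policy.
    return TRANSITIONS.get((state, event), state)

state = "initialising"
for event in ["boot_complete", "start", "stop", "start"]:
    state = step(state, event)
    print(event, "->", state)
```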
This is an interesting piece of work. I had the chance to see a presentation from AbsInt at a conference (for industry in safety-critical application areas, more industry people than academics) earlier this year. They presented mainly the same results as discussed in this paper. The company is doing the commercialisation of the tool.
For the discussion about which parts are formally proven or part of the “Trusted Computing Base” and which not, I think slide 34 from this presentation summarises it quite nicely.
If I understood it right, I think the main argument here is that for industrial usage, especially in x-critical applications, CompCert can provide one trusted part in the whole toolchain to produce an executable which is correct with high confidence. Key benefits are the reduced need for testing/validation (esp. of the compiler itself) as well as the possibility to use compiler optimisations, which are usually not allowed, or only at low levels, as part of the compiler certification/validation for safety-critical use. Of course the toolchain doesn’t remove the need for testing the end result, but at least they could trust the compiler itself.
They use several additional tools together with CompCert to achieve the high confidence in the end. Like they discuss in the paper, CompCert doesn’t do pre-processing itself. They use a “proven-in-use” gcc-version for that, which they additionally test themselves for the subset allowed by their coding rules (derived from MISRA C). The coding rules are then enforced by another tool, so they can have high confidence that the pre-processing is correct. They use another tool to check for runtime errors and undefined behavior. Lastly, they again use “proven-in-use” gcc for assembling and linking, but the translation is checked again with another tool.
CompCert uses a “proven-in-use” tool (CIL) to map the C source output from the “proven-in-use” gcc to its lower-level language Clight; then it generates assembler, followed by “proven-in-use” gcc to assemble and link the code.
What is wrong with the gcc’s “proven-in-use” code generator?
The paper is not just about CompCert, it’s about how the main authors used the compiler in a specific safety-critical application as part of the toolchain, and what benefits it had in that case and how they managed the non-proven parts of the toolchain, where CompCert is just a single building block.
The problem in safety-critical applications, where the produced system is certified by a third party (e.g. TÜV or Exida) for some specific use according to some standard (e.g. IEC 61508), is that the tools used need to be validated somehow. The problem then with gcc is that it was not developed with such a use case in mind, so the user of the compiler needs to somehow validate that the compiler is doing what he expects it to do. The other option is to purchase an expensive compiler which is developed and certified for safety-related use. Some manufacturers are also offering certificates when using their libraries with a specific piece of their hardware, with a specific compiler and a specific language subset, which they have pre-validated for the user.
In this case, by using CompCert as part of the toolchain, they had to do less validation for the compiler and were able to use higher compiler optimisation settings, as they had the proofs to show that, with reasonably high confidence, they can be trusted. Since the other parts of the toolchain were not formally verified, they used different measures to qualify them for the use case, which was equivalent to SIL 3 acc. to IEC 61508 if I’m not mistaken. As a reference point, that criticality level usually means that if the system fails, it’s possible that several people are injured or die. Normally for such a use case one has to put a lot of verification effort into the tools used to prove to the third party that the tools are suitable for the job and with a high level of confidence don’t introduce errors into the end product.
Again, at least that’s my understanding of the matter.
Edit: couple of typos.
Edit 2: Forgot to answer your first remark. In my understanding, the gcc at the beginning is used only for the preprocessing (.c files -> .i files), so at the first entry point to CompCert it’s still plain old C. If I understand right, the first step CompCert itself does is to translate the C to Clight, but I’m not really an expert on the internal workings, I’m more of a potential user of the tool.
I understand about compiler validation. As far as I know CompCert has not passed a compiler validation suite.
The C semantics pass of CompCert is based on CIL, which last I checked fails the C validation suites.
What does this claim mean: “The CompCert front-end and back-end compilation passes are all formally proved to be free of miscompilation errors”?
What is a miscompilation error? Does it mean the source of CompCert does not contain any compilation errors? Since the semantic analysis of CompCert is based on a tool that has not been proved correct in anyway, I don’t see how this claim relates to C source.
Interesting text about compiler validation. I wonder if and how the validation you write about differs from the tool qualification done today for e.g. IEC 61508.
I have to admit I didn’t completely understand your previous comment, as I wasn’t aware of CIL. As mentioned, I’m not really an expert, but I tried to do some digging.
There’s a paper from 2012 where it’s mentioned that CompCert used an adapted CIL library for parsing. In this paper they don’t mention CIL and only discuss using gcc as a preprocessor. There seems to be some documentation about the current compiler structure in this paper from 2017 (check esp. part 3) and in more detail about the steps in the current docs. The 2017 paper also discusses the semantic preservation. At least I don’t find any references to CIL there. Maybe there’s an answer to your question there, I don’t feel qualified to answer it. :)
The slide has a nice breakdown of steps but is a little misleading. The formal methodists sometimes count anything proven as not being in the TCB, or as untrusted. They had me in that habit a while back before I realized how much they screwed up. The truth is all of that is in the TCB. They’ve just applied formal proof to the stuff they’re not counting. It can still be wrong due to spec errors, like the 3 or so found in Csmith testing. There’s also stuff it doesn’t model, esp. during runtime, that can cause failures. Bit flips in RAM or bit rot on HDs are examples. The best they can say is certain parts inspire higher confidence they work. Whereas, this is an example of qualifying verification claims the right way.
“CompCert can provide one trusted part in the whole toolchain to produce an executable which is correct with high confidence.”
Good descriptions about its implementation aspects. Yeah, it’s supposed to be one link in the chain. There was another slideshow I saw showing everything that might be used in a DO-178B evaluation. They had something for requirements, a modeling tool for high-level design, something like Astree Analyzer for checking code, a test generator, and then maybe CompCert kicks in at last part. One tool in a whole, assurance-producing chain.
Yes, I realised it was actually a bit backwards in that slide after I posted the comment. Here is a version of the same thing they used in the presentation at the conference where I participated. If you’re interested in the slideset, I can somehow share it over PM, it’s in principle under copyright, so I don’t really want to post it somewhere openly.
They also used Astrée in this paper; the whole toolchain is summarised in Figure 6. In the presentation I saw they also discussed the modeling tools etc. briefly. What’s discussed in this paper is, funnily, mostly the toolchain where AbsInt can sell you solutions for the problem. :)
Edit:
Re: HW Failures during runtime
Yes, they didn’t mention that much in the paper. I would assume that in this case they also use redundancy, ECC-memory and all manners of self-tests implemented in software to combat and self-diagnose all the failures that could happen in the electronics during operation.
“Here is a version of the same thing they used in the presentation at the conference where I participated.”
That’s much better. It just says what parts they did a proof on without any false claims. Yeah, send it to me in case I need it for someone.
“What’s discussed in this paper is funnily mostly the toolchain, where AbsInt can sell you solutions for the problem. :)”
You’d think they’re motivated primarily by money instead of public good. This is the thing I hate most about CompCert: it was assigned to a private company as proprietary software. INRIA’s brightest built one of the most amazing tools in the field of verification, only to have it partly locked up. Then, all the academics contributing to CompCert, which is free for them and maybe GPL stuff, are contributing to a proprietary product instead of an open one. Had they given it to C0 or something totally free, then all the improvements in that slide would be totally free, with companies optionally selling them. CompCert should’ve been Apache, LGPL, or something. It’s too important.
After one company did one in weeks, I got it in my head to eventually attempt an open project using ASMs as an intermediate form, since they have a ton of potential. They’ve already been used as an IR in both hardware and software, but not free or well-developed. If using them, one can target all the analyses and transforms on them, with code generated afterwards or hand-optimized with an equivalence check. As prior work shows, the equivalence check can happen at the assembly level.
Last time I checked, CompCert did not do any formal analysis comparing its behavior and the semantics of the C language. The formal verification stuff is all back end, i.e., code generation.
Using the CompCert proof ‘methodology’, I could translate Visual Basic to the CompCert low level code (from which machine code is generated) and claim to have a formally verified Visual Basic compiler (i.e., no connection to the formal semantics of any language is required).
I thought I told you or you told me that CompCert uses its own dialect of C that they formalized. It’s a subset with specific interpretations of anything with wiggle room. Most of their work centers on Clight as a starting point. Most efforts targeting CompCert target it. One could check the MISRA-C to Clight translation by eye and with equivalence tests if concerned.
I do think they should be more clear on exactly what they support at various times.
It’s possible my wording is wrong. I’m not sure how different something has to be from the original before it qualifies as its own version or dialect. Usually, once a non-standard change is introduced, people often refer to it as something different, sometimes using “dialect.”
CompCert C has never been C itself. It was always its own easier-to-verify thing. For example, its memory model has evolved over time. These verifiable subsets of C not only have a subset of functionality: they expect it to be used in specific ways/semantics that facilitate their analyses and algorithms.
Don’t these subsets that have specific, unusual interpretations and usages become new dialects? Or is my own interpretation of what a dialect is off?
Note: That was first link I could find about this in DDG. Original was on Lobsters but its search isn’t giving it up.
The CompCert memory model is shared with C, from what you linked. Also, from the CompCert front page:
“The main result of the project is the CompCert C verified compiler, a high-assurance compiler for almost all of the C language (ISO C99), generating efficient code for the PowerPC, ARM, RISC-V and x86 processors.”
A dialect is different; a subset is a subset. Also see the words ‘almost all’.
CompCert C supports all of ISO C 99, with the following exceptions:
- switch statements must be structured as in MISRA-C; unstructured “switch”, as in Duff’s device, is not supported.
- longjmp and setjmp are not guaranteed to work.
- Variable-length array types are not supported.
There is a specific mention that they cover all of MISRA C 2004, which is already quite good considering that this tool might be especially interesting for industry application in safety-critical context, like in the linked paper.
I could be recalling incorrectly what made it a dialect or something about C which I’m far from an expert on. I’ll say subset for now unless I can cite a difference deserving the word dialect.
Clight looks like it might be a very small subset of C (I have not checked). It might also qualify as a very small subset of Java and perhaps other languages.
Since Clight is defined by its creators, whatever it does is, by default, correct. So what exactly does a proof of the correctness of Clight mean?
The implementation does what the spec says on all inputs.
In CompCert’s case, they mainly want to specify the meaning of the languages and the operations (esp. transformations) they perform on them, and prove they have equal meaning after those operations. The Csmith testing empirically confirmed high quality, at least for those types of tests. It found a few spec errors, while non-verified compilers had piles of failures.
Personally, I’d like to see a testing suite of every feature, common combinations (esp. with nesting and aliasing) and random combos. Compare how each compiler handles programs generated from those templates.
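A rough sketch of that template-based differential-testing idea (mine, nothing to do with Csmith): generate small programs from expression templates, then feed the identical source to each compiler under test and diff the outputs.

```python
# Generate tiny C programs from expression templates; the compile-and-compare
# step would then run the same source through each compiler under test.
import random

random.seed(1)
OPS = ["+", "-", "*"]

def rand_expr(depth: int) -> str:
    if depth == 0:
        return str(random.randint(0, 100))
    op = random.choice(OPS)
    return f"({rand_expr(depth - 1)} {op} {rand_expr(depth - 1)})"

def make_program(n_exprs: int = 3) -> str:
    body = "\n".join(
        f'  printf("%d\\n", {rand_expr(3)});' for _ in range(n_exprs)
    )
    return "#include <stdio.h>\nint main(void) {\n" + body + "\n  return 0;\n}\n"

print(make_program())
# Next step (not shown): compile the same file with each compiler/optimization
# level under test, run the binaries, and diff their outputs.
```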
you can formally verify C code with things like frama-c or verifast… But once you have verified code, you need a verified compiler to convert it to assembly; that is where compcert comes in. If you compile verified code with some buggy compiler it might do you little good.
Frama-C and VeriFast are essentially static checkers of source code. They have no means of formally verifying C source against the requirements contained in the C Standard.
Where is the proof showing that CompCert implements the semantics specified in the C Standard? There could be a fault in CompCert’s handling of C semantics (just like in other compilers) which causes it to generate incorrect code.
Frama-C WP plugin absolutely allows you to write formal specifications of your code, and checks the C code matches that specification with relation to the C standard. https://frama-c.com/wp.html
The whole point of compcert is that it is a formally verified translator of the C standard to assembly…
Obviously there could be bugs in the compcert proofs, but they are probably less likely than many bugs in other compilers.
Where is the proof showing that CompCert implements the semantics specified in the C Standard?
I don’t see anything in the frama-c link that makes any claim about a “relation to the C Standard”. In fact, the word ‘standard’ does not appear in the link.
In frama-c, you write your own specifications and the tool checks the source code against what you have written. No connection to the C standard provided or required.
Please provide a link to the files involving proofs against the C Standard, I cannot find them.
The ANSI-C just comes from the fact that it targets ANSI C. ACSL itself is just a language for formally specifying behavioral properties of code written in ANSI C. The book is here with tons of examples.
The only one I know of that can check C code against its specification is KCC, which RV-Match is built on. That’s because it’s pretty much the only complete, open specification of C that can catch undefined behavior and be used as a push-button tool (if their docs are honest). It also passed the parts of the GCC Torture Test that their semantics support, and RV-Match is getting good results on benchmarks. So, you could say it’s the only formal semantics with significant, empirical evidence on top of claims in formal specs/proofs.
From 1989, probably not authored by @bollu. :)
Interesting article, thanks for posting!
Kept staring at this for a while. For the rest of us dummies, this is because, for unsigned integers with integer division, x*y > MAX <=> x > MAX/y. So the test is a shorter form of (y > 0) && (x > UINTMAX/y).
Well, yes, but it’s specifically short for y != 0 && … to prevent division by 0.
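For anyone who wants to convince themselves, a quick brute-force check of that equivalence for small values (floor division, positive integers):

```python
# Brute-force check: for positive integers with floor division,
# x*y > MAX  iff  x > MAX // y.
MAX = 1000
ok = all(
    (x * y > MAX) == (x > MAX // y)
    for x in range(1, 200)
    for y in range(1, 200)
)
print(ok)  # True
```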
Thanks for posting! The technical side of things is quite interesting, but the writing style of the author is rather pompous.
“What if there were a way to automatically generate C-code, complete with formal proofs of correctness, removing any need to trust fallible human developers?” .. and what about the fallible humans writing the specs?
Here’s a link directly to their paper (he also provides other references at the bottom): https://gopiandcode.uk/pdfs/CySuSLik-icfp21.pdf
I’ll add that there’s been many specification errors in verified programs. Mutation testing was finding issues in them, too. We’ll need mamy eyeballs on them. Checking specs/proofs in general math is a social process but mathematicians don’t like computer-focused math. So, in a way, we’re moving the problems from code nobody was reviewing or analyzing to an esoteric language fewer people will check.
It’s still great work due to the ubiquity of C, good tools for seperation logic, removing attack surface in spec-to-code gap, and pushing synthesis forward. Might as well keep putting good stuff together to see what happens.
Far as general problem, I’m a bigger fan of doing stuff like this with notations like ASM’s that wider audience could use and review. Or languages like ML or SPARK with C generated from them. If algorithmic, a side benefit is one can retarget static analyzers and automated provers to them to confirm some properties without really getting into formal logic.
There are so many memory unsafety bugs. Most of these are caused by selecting memory-unsafe languages, and C is the worst offender by far.
Weird. Interesting, but weird. Thanks for posting. Don’t really “get it”, though, and unsure whether it’s worth further time investment.
I watched a couple of the introductory videos [1,2] they have on the page. From the first one I somehow got a cybernetics-vibe. In both they keep giving examples from quantum physics, biology to ovens and robotics. But then they always refer that this stuff somehow provides a foundation for fundamental physics. The paper, which apparently kicked it off [3], seems to be (based on a very brief browse) about marrying physics and computation together somehow. Is this trying to be a new foundation for physics, or is this a general theory of everything, or the next iteration of physicists implying that all other disciplines are just physics? Can anyone give a brief explanation for an engineer?
[1] https://www.youtube.com/watch?v=8DH2xwIYuT0
[2] https://www.youtube.com/watch?v=0zeT2npYf18
[3] http://constructortheory.org/portfolio/the-philosophy-of-constructor-theory/
My sense is that this is proposed as a new foundation for physics (similar to the situation in mathematics where category theory and homotopy type theory are proposed as alternatives to set theory as a foundation for mathematics). So you would use constructor theory as a framework for defining physical theories. It is suggested that constructor theory is general enough to be used as a framework for theories outside of just physics.
The deep connexion with computation and information theory is interesting. So it ought to have applications in computer science. Maybe it will turn out to be an alternative to category theory: both constructor theory and category theory are built on top of algebra.
The difference is that category is a well established field, and using it as a foundation has gotten us interesting mathematical insights in general. Neither seems to be true here.
Thank you for the clarification. That makes some sense. I feel there’s the typical danger here of proposing a very general model of everything, which in the end doesn’t bring anything new to the table. But I guess if it’s a lens through which you can describe and try to understand some general underlying patterns then it will have served its purpose.
Just throwing some ideas out there, the following is likely complete nonsense:
On the perpetually growing reading list I have Whitehead’s Process and Reality. I wonder if some parallels could be made between the focus here on transformations, or constructions, and Whiteheads’ views of entities as processes.
Another interesting perspective might be to hear how this theory would explain “self-organisation due to entropy” as in this article: https://www.quantamagazine.org/digital-alchemist-sharon-glotzer-seeks-rules-of-emergence-20170308/ . Is entropy a constructor?
i have used the following style for state-machine implementation to great advantage. for larger more complicated cases i have found that HSM or hierarchical-state-machines are also pretty useful toolkit to have around.
Searching for HSMs, I stumbled across this description of behavior trees, which are new to me: https://web.stanford.edu/class/cs123/lectures/CS123_lec08_HFSM_BT.pdf. Neat construction.
Would recommend this pre-print book for a thorough discussion of behaviour trees.
Edit: And this old paper from Harel (PDF) for hierarchical state charts.
yes, i am familiar with harel’s paper, and have used that as a basis for design & development of hsm’s used @day-job. thanks for the book link on behavior trees. looks pretty interesting.
as a side note, for parsing, specifically network protocols where things are strictly (a very loose definition of strict fwiw) specified through rfc’s etc., i have relied mostly on ragel. would love it hear your (and other folks as well!) experiences/wisdom in this regard as well.
Interesting article. I know you’re not the author, but do you have any idea what these companies use for ‘proof engineering’? Is it mostly TLA+?
Galois itself has a lot of projects on Github: https://github.com/GaloisInc/ , would recommend to check those out. They’ve built quite a few tools themselves (seems to be mostly in Haskell), and seem to work quite a bit with Coq and SMT-Solvers like Z3.
Every now and then I’ve used LTSpice (free to use, but not open source) to do circuit simulations for work projects (industrial control systems). I first learned the GUI during my studies. Not sure at exactly which point I found out that actually the graphical models are translated into text before they are fed to the simulator, but figuring this out opened quite a few possibilities.
Recently for a work project we had to figure out whether a particular circuit design would end up in an unsafe state as the result of different fault modes or combinations of faults. The circuit was complex enough that it didn’t really yield to normal “stare at circuit diagram and think what could happen”-style FMEA. We ended up building fault simulations into the model and then adding parameters to switch different faults on and off. Because the whole model was just a single text file, it was then possible with a simple search-and-replace-style script to change the parameters of the simulation. This allowed us to programmatically simulate (you can call the simulator with a SPICE netlist through the command line and it dumps the simulation results to a .raw file, which is binary, but there are libraries to read it) all the interesting fault modes, combinations of them, and combinations with different circuit control states, and for all the simulations dump the results into a bunch of figures and tables for easy analysis. We easily found a couple of unsafe fault modes which were very unintuitive and would not have been found without the simulation. Coding the whole thing initially took quite a few hours, but combined with a couple of circuit revisions it ended up saving a lot of tedious manual work.
Of course, similar things could be done with other simulators, but usually there’s some clunky API I would have to work through; with text-based models, creating your own tools and scripts on top is rather nice.
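To make the scripting side concrete, here’s a stripped-down sketch of the kind of driver script I mean. The netlist file name, the fault-switch parameter names and the simulator command line are all placeholders; the exact executable and flags depend on your LTSpice version and platform:

```python
import itertools
import subprocess
from pathlib import Path

# Hypothetical netlist template containing tokens like {FAULT_SHORT_R1}
TEMPLATE = Path("circuit_template.net").read_text()

# Made-up fault switches: "0" = fault off, "1" = fault on
FAULTS = {
    "FAULT_SHORT_R1": ["0", "1"],
    "FAULT_OPEN_Q2": ["0", "1"],
}

# Assumed batch-mode invocation; adjust to your LTSpice install.
SIM_CMD = ["ltspice", "-b"]

for combo in itertools.product(*FAULTS.values()):
    netlist = TEMPLATE
    for name, value in zip(FAULTS, combo):
        netlist = netlist.replace("{" + name + "}", value)  # plain text substitution

    run_name = "run_" + "_".join(combo)
    net_file = Path(run_name + ".net")
    net_file.write_text(netlist)

    subprocess.run(SIM_CMD + [str(net_file)], check=True)
    # The simulator writes <run_name>.raw next to the netlist; parse it with
    # whatever .raw reader library you prefer and dump the interesting traces
    # into plots/tables here.
```

The nice part is that the “API” is just text files and a command line, so the whole fault matrix fits in a few dozen lines of glue.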
In another case we were trying to simulate another circuit which used a couple of ICs. We found a SPICE-model for one of the ICs online, the model was dated to 1993. After only a minor modification it was possible to run it on the 2020 simulator, which is by itself quite a feat. It didn’t quite contain all the functions that the modern version of the IC had, but since it was text-based and not encrypted (like some manufacturers like to do), we were able to reverse engineer the model and found out you could simulate the modern version of the IC at least to some approximation by combining the building blocks provided by the 1993 library.
Had the model from 1993 been in a binary format for some GUI program, I doubt we could’ve run it, or even opened it at all to figure out what it was actually doing.
To summarize, it sounds like LTSpice’s use of plain text files provided these benefits:
Yes. I have a tendency to write too long-winded, thanks for the TL;DR. :)
I disagree. The Chernobyl meltdown (which is what I’m assuming the author is referring to) occurred solely because of human error. In addition to the reactor being poorly designed (it lacked a proper containment vessel, for pete’s sake), its operators were performing an unauthorized experiment. They also shut down the coolant system and ignored computer warnings and the protests of a technician at the scene.
EDIT: well, it looks like the experiment was authorized. See below.
How is human error not “when things go wrong”?
The author implied that this was a fundamental fault of nuclear-fission based reactors. I might have misunderstood though…
The experiment was authorized, and that’s the problem: the supervisors were deaf to the safety concerns, and refusing to proceed would have cost the operators their jobs, if not their party membership and with it any prospect of finding a decent job until the breakdown of the USSR. The factor that made so many disasters possible is that operators didn’t have the authority to veto an unsafe action. At least once, a high-ranking officer who insisted on an unsafe action died himself.
Not necessarily related to the posted article, but I wanted to reply regarding your statement about the meltdown occurring solely because of human error.
Nuclear systems are rather complex, and so are the accidents that occur with them. Traditionally it was typical to explain these things with human error, but in a more modern safety view there might be human errors, yet they are usually only one factor in a larger chain. Usually the operators are trying to do their best in the given situation. The question that should be asked is: what led to them making the mistake? This could be stress, the quality of the HMI, pressure from higher-ups, etc. Slide 9 in this presentation sums it up very nicely. For further discussion you can look up Nancy Leveson’s book (free download under “Open Access”) or Sidney Dekker’s writings.
There’s a somewhat older reference regarding TMI, but a very enlightening one nonetheless: Charles Perrow’s Normal Accidents. I read it over the last holidays, and it was still very interesting, even if at some points the age (original from 1984) showed a bit. Can’t recommend the Y2K-discussion at the end though..
For a better discussion about Chernobyl, I found a few articles which provide a bit more modern view on the accident on a quick glance:
Three Decades after Chernobyl: Technical or Human Causes?
Chernobyl - System Accident or Human Error?
The latter one is unfortunately behind a paywall, and I couldn’t find a freely accessible version, but there’s a certain site, one might call it a hub for science, where one can find it with the DOI-link..
I personally agree that given the current climate situation we should keep using existing nuclear plants and improving the technology while figuring out alternatives. But after reading e.g. Perrow’s book, I would say the risk associated with these things isn’t necessarily exaggerated; we should take greater care with them than we have historically done.
There have been many other destructive events caused by fission reactors: Fukushima Daiichi and Three Mile Island are the first two that spring to mind, but Wikipedia has a whole list: List of civilian nuclear accidents.
Both of the specific disasters I mentioned were destructive in ways that it sounds like this hydrogen-boron reactor won’t be capable of.
If I recall correctly, very little contamination was released in the Three Mile Island accident (compared to the Chernobyl accident, anyway – any amount of radiation is harmful). Also, the plant’s operators were engaged in some dangerous experiments, which led to the meltdown.
The fallout from Three Mile Island was political/cultural, rather than radioactive. It really shook the public’s faith in the security of nuclear power.
Which, I think, is rather sad. IMHO nuclear fission is a great alternative to coal-based power (it produces much, much, much less toxic waste per year, for one thing), but because of accidents like this, which have been exaggerated by some environmentalists and the media, it just isn’t being used to its full potential.
I understand that but I barely trust engineers with making good choices when it comes to software, how am I ever going to trust software that handles a nuclear power plant?
And I certainly wouldn’t live near one located in most countries in today’s world.
What are examples of bad choices made by engineers when it comes to software?
Some would argue the persistent choice of languages with potential footguns like buffer over/underflows.
Every language has its own footguns.
Required reading for this should be Taleb’s Black Swan book and The Statistical Consequences of Fat Tails.
A couple of observations….
I’m sure the distribution of software project durations is a mix of several underlying distributions, so saying it’s just “log normal” underestimates things. E.g. some projects have high levels of technical risk, i.e. some tasks are just not feasible, and the time taken to prove they’re not feasible is not finite.
I’m sure the distribution is truncated fat-tailed, i.e. the only reason the mean time is finite is that a lot of projects get truncated at the far right due to budget limitations.
The whole field suffers from the Ludic Fallacy. Projects aren’t a game operating to strict rules. Shit of every conceivable flavour happens in real life all the time.
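To illustrate the truncation point with a toy Monte Carlo (the Pareto “project duration” model below is entirely made up for illustration, not fitted to any real data):

```python
import random

random.seed(0)

def project_duration(alpha: float = 1.1, scale: float = 4.0) -> float:
    """Toy fat-tailed 'project duration' in weeks (Pareto draw)."""
    return scale * random.paretovariate(alpha)

N = 100_000
BUDGET_CAP = 52.0  # pretend projects get cancelled after a year

raw = [project_duration() for _ in range(N)]
observed = [min(d, BUDGET_CAP) for d in raw]  # cancellations hide the far tail

print("mean we observe (truncated):  ", sum(observed) / N)
print("mean of the underlying draws: ", sum(raw) / N)
print("share of projects hitting cap:", sum(d >= BUDGET_CAP for d in raw) / N)
```

With a tail index that close to 1 the sample mean never really settles down, and the budget cap makes the observed average look far more benign than the underlying process.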
There is a fundamental disconnect between contracts, “I promise to deliver X by hard deadline Y in return for payment Z” and estimating “If I do X, how long will it take me and how much will it cost?”
Business is fundamentally a betting game because “how long will it take me” and “how much will it cost” are fundamentally random variables.
There is irreducible risk in collapsing a probabilistic estimate into a hard contract.
You can reduce that risk, but you cannot remove it.
If you’re uncomfortable with that you need to be in a different game.
Thanks for the link.
I’ve started both The Black Swan and Antifragile a couple of times but didn’t manage to get through either one of them. One problem was Taleb’s writing style, but also a lack of background knowledge in statistics. (I don’t have a very good understanding of statistics outside of what they teach during engineering programs at a university.) Trying to get through even part 2.1 of the linked doc proved somewhat difficult.
You seem to be quite knowledgeable in these topics. Would you have any pointers, where one could possibly even get started, if the goal would be to self-study the required background knowledge for this?
What statistical techniques are useful for software engineering data?
Practical statistics books for software engineers
Sigh. Yes.
His core point is that most of the “background knowledge in statistics” you’d get in the average stats course at uni is focused on that small subset of problems that are amenable to analysis.
Preferably those with closed-form solutions, preferably teachable in a course that has to be endured.
(Typically by engineers or economics or psychology students or whatever that actually want to be studying something else, but this damn stats course is a prerequisite.)
Alas, as he hammers home almost to the point of tedium, most of reality does not satisfy the requirements for those classical statistical methods to be appropriate.
Which leaves us somewhat empty-handed….
Fortunately Taleb does try to hand us some digestible takeaways. I’d focus on the section “2.3 the main consequences” and google around to try to make sense of each point.
Certainly I’d also keep an eye on what Derek Jones is doing (add his blog to your RSS feed (I have)) https://lobste.rs/s/srgwbv/why_software_projects_take_longer_than#c_u9ia1a
Alas Science traditionally has the “All Other Things being Equal” caveat…. which in an industrial software project is never ever true, which makes drawing valid conclusions very hard. https://leanpub.com/leprechauns/read
The assumption in all traditional stats is the data isn’t truncated. Real world projects are always truncated in several dimensions. (Insufficient money, CPU power, project becomes irrelevant, …..)
Partially for the same reason the FAA deferred certification to Boeing: deadlines and management pressure. I highly doubt this is the end of the story; there are deeper issues here.
I look forward to the final STAMP investigation.
I think they also touched on another point in the article (disclaimer: I’m not an expert in safety regulation for aviation, but I assume the principles are rather similar to those in industrial systems). The safety analysis of the MCAS system classified the risk associated with the system too low. The single-channel system fulfilled the regulatory requirements for the lower estimated risk level; had the risk analysis classified the associated risk higher, they would’ve had to build this particular system with a two-channel architecture. To be honest, I was surprised when I read that it was a single-channel system, I was under the impression that mostly everything in aeroplanes is two- or three-channel. :)
But this was also just a single problem in the whole engineering process. I spotted this article on the System Safety mailing list; some people there have rightly noted that the problem is with the overall systems engineering in this case.
The point of outsourcing the certification to Boeing was also a surprise to me. I’ve heard that major corporations can pull such tricks off, where they can argue that a separate organisation of the company is sufficiently independent of the development team to be able to achieve the required independence when executing the risk analysis and “third-party” review. But somehow the way it was done here seemed quite “unclean”.
A STAMP investigation would surely be interesting, we’ll see if that happens, I have the feeling that it’s still not that widely used even in this particular industry.
Thanks for posting! The article was interesting, even if at some points it felt a bit overly dramatic.
If someone wants to know a bit more about the technical side, there’s a report from Dragos available and a short talk from Schneider Electric on Youtube.
There are some interesting details related to the incident; the whole thing wouldn’t have been possible if a physical switch on the safety PLC had been used to set the device to ‘run’ mode (which prevents software changes), as instructed by the manufacturer, instead of it being left in ‘program’ mode.
Selfishly, I’d recommend my talk on YouTube over Schneider’s. I was on site for the incident response and led the investigation of the controls systems side of the breach. Our team took over from Julian’s after they identified that there was malware present on the engineering workstation.
I’m always conflicted about Systems Theory….
Part of me says Cool…. part of me says it’s so fuzzy and so glossing over details as to be provably useless. (Look at any system which produces chaotic behaviour).
This seems like it might be a good use for it….
i.e. Use systems theory to identify all contributory factors and take actions on all of them instead of a single scapegoated “root cause”.
But then there’s that niggle about vagueness and fuzziness and how it doesn’t fit with any software-intensive system I work with….
What’s good about the presentation in this handbook is that the STPA method is presented as a step-by-step process, without much of the fuzzy stuff. Leveson & co. claim to have reached pretty good results with the methodology, usually exceeding what was reached by FMEAs or HAZOPs. Leveson typically argues for replacing the old methodology completely with STPA. A couple of presentations I’ve seen about it in an industrial setting usually suggest using it for a high-level analysis in combination with e.g. FMEAs for low-level stuff. The hope there is that you catch the more complicated causal scenarios (possibly involving humans with “fuzzy” behaviour) but for circuit-board-level stuff remain with the traditional methods.
You might be interested in a recent PhD thesis from Uni Stuttgart, where they combined STPA into a systems engineering approach for software-intensive systems in an automotive setting. These slides provide a nice concise summary of that work. The approach uses STPA to identify system specifications, which are then translated into something more formal for model-checking / software testing.
I’ve had a similar feeling when reading Leveson’s stuff. I think, however, that the analysis method they’ve built is powerful, also outside the domain of safety engineering, maybe especially when incorporated in a similar way as suggested in the thesis above, where it’s used to formulate specifications early on, as writing specs might often be the fuzzier part of the systems engineering process?
This was an interesting read. Thanks!
One thing, which would’ve been a cool addition, would’ve been a control graph of the system being analysed. There’s a lot of literature with examples of STPA/CAST-analyses for safety, but very few for security, and all of them are related to analysing a system around some physical process. Would’ve been interesting to see your take on it here. There’s a bit about extending STPA for Security e.g. here and here.
Another nice and a bit more practical resource for learning about STPA (The hazard analysis method based on STAMP) is the STPA Handbook (pdf) Leveson and Thomas published last year. Leveson’s 2012 book, at least for me, was at points a bit long winded (nonetheless a very good book), but I’ve found the handbook a better reference for analyses. It also covers a lot of typical common mistakes etc.
If I may nitpick a bit, STAMP is an accident model, STPA and CAST are analysis methods based on it :) Leveson extended on ideas from Jens Rasmussen, as she talks about e.g. in this publication. If you’re interested, check e.g. this influential paper from Rasmussen.
PS. The link to Leveson’s homepage at the end of the article is not working.
Wow, that is a really good handbook. You should submit it as a lobsters link!
Also, link fixed =)
(I know I should include a control graph but I’m kinda procrastinating on making one :/)
Interesting paper, thanks for posting.
I’ll try to take a more detailed look later, but it would be interesting to see how their approach scales to something actually useful. The system they are using as an example is rather simple, and they mention having used a 150 s time limit for the solver. I guess this is only for the marginal/unstable cases with gain >= 1, for which the solver apparently timed out? I wonder how much time the solver needed for the proven cases.
While searching, found a follow-up paper with an interesting comparison of different model checkers for similar stuff from 2015. Hopefully you weren’t planning to post this next. :)
I didn’t have it! So, thanks. I actually wasn’t looking for Simulink/Stateflow stuff this year. I just stumble on it periodically because of how I word searches. I might spend more time looking for them next year due to importance in embedded. Are there any open tools that are comparable to those and high quality enough for professionals?
EDIT: Also interesting is that it’s two women. I tried to find out what their other work was to see what’s industrially applicable. Both had pages at Bristol, here and here, with Araiza-Illan now at ARTC. It’s heavy on industrial application. Araiza-Illan also posted four hours of verification proceedings to YouTube. Interestingly, Eder worked at XMOS on power-aware design. As Bristol is Cadence’s lead university per that page, she’s probably also feeding ideas into their verification groups. If not, she at least started with real experience per this interview, which I’m still watching right now.
So, you found not only a paper but two women doing interesting work with strong ties to industry and/or real-world experience. Extra interesting. Also, now on my list to potentially hire if I get into verification. :)
Re: Open alternatives - This is a topic I’ve wanted to look into lately. Probably a large portion of the normal data analysis people might do in plain old Matlab would be easily replaceable with Python and the common analysis libraries for it.
As for Simulink/Stateflow: I’ve been meaning to play around with OpenModelica, as I would really like to have an open alternative available if I don’t have access to Simulink in the future. The tool has been in development for a long time (still very active, with EU-funded research projects developing it further), has a consortium backing it with major players involved, and has been used as a basis for industrial tools. It’s the only thing I would seriously consider as a realistic open source alternative. There’s also JModelica, which is another Modelica-language-based open source tool from a Swedish company (apparently they base their commercial tools on the open core developed with a couple of universities).
Simulink was always a nice tool for building models of dynamic systems, continuous or discrete, as the graphical notation makes sense for the sort of stuff in control engineering or signal processing where you would anyway draw block diagrams or similar charts to try to clarify things. However, at least in my experience, building any sort of non-trivial software-like logic with the graphical blocks always resulted in a horrible mess. The Stateflow extension with hierarchical state machines has been a really nice addition to help with that aspect (before moving to software-in-the-loop with generated code or custom code through the C interface). I think OpenModelica should have an equally good system/solver for the typical control/signal processing stuff (I read somewhere that the solver should be somehow even better/faster than the Simulink solvers due to a different approach internally), but I’m not sure how well supported the FSM stuff is. At least there should also be a C interface, so it should cover most of my use cases. For use with HW-in-the-loop simulators, both systems support the FMI/FMU standards, so ideally that should also be covered, but I’ve only worked with dSpace, which is rather heavily integrated with Matlab/Simulink.
I don’t know how the usability of the graphical editor with OpenModelica is. You could always just write Modelica directly, but somehow I was always put off by the syntax of the language. :)
I submitted OpenModelica here before. I don’t know much about it past the description sounded really cool and general-purpose. Yeah, expanding it to have parity with or better methods than Simulink/Stateflow makes sense. I didn’t know about FMI/FMU. Looking at this article, it says FMI is Modelica project. It has a nice pro’s/con’s section comparing to Simulink where improvements might be made.
I’m not sure about FMU’s. It looks like they’re like modules or executables produced as a result of doing FMI. Like class or jar files in Java maybe. Or am I pretty far off?
re syntax. Add it to list of Better Language extracts to and integrates with Supported Language. :)
I also have pretty much no experience actually using the FMI standard, as I’ve been stuck with Simulink (and the S-functions also mentioned in that Wikipedia article). My understanding is that the FMI standard specifies a way to describe the interfaces (IO points) and the internal logic, and FMUs are then models/modules implemented according to this standard. Modelica is going for an object-oriented approach, so I think thinking of FMUs as classes is probably a good analogy.
Typically, at least in the S-function world, and I think it’s the same here, one defines the time derivative of each continuous state within the model; these are fed back to the solver/simulator executing the model, which handles the numerical integration and feeds the integrals back to the FMU (see p. 69 here). For discrete stuff in the S-function world one defines a step function, which is executed by the simulator every x seconds to compute the next outputs; the FMI standard seems to have an event-based approach, but “time-based events” probably boil down to the same thing.
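As a toy illustration of that split between model and solver (this is just the general pattern in plain Python, not the actual FMI or S-function API; the first-order lag model is made up):

```python
from dataclasses import dataclass

@dataclass
class FirstOrderLag:
    """Toy continuous model: tau * dy/dt = u - y."""
    tau: float = 0.5
    y: float = 0.0  # the single continuous state

    def derivative(self, u: float) -> float:
        # The model only reports dy/dt; it never integrates anything itself.
        return (u - self.y) / self.tau

def simulate(model: FirstOrderLag, u: float, t_end: float, dt: float = 0.001) -> float:
    """The 'simulator' side: a crude fixed-step Euler integrator that feeds
    the integrated state back into the model at every step."""
    t = 0.0
    while t < t_end:
        dydt = model.derivative(u)
        model.y += dydt * dt  # numerical integration lives in the solver, not the model
        t += dt
    return model.y

print(simulate(FirstOrderLag(), u=1.0, t_end=2.0))  # step response, approaches 1.0
```

Real solvers of course use variable-step methods and event handling, but the division of labour is the same.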
I tried to dig up some meaningful examples outside the actual spec, but they were a bit hard to find.
This paper has an example of an FMU-description in Modelica: http://www.ep.liu.se/ecp/056/003/ecp1105603.pdf (I assume then this could be used to generate the XML/C-interface described in the FMI-standard?)
There are some tutorial slides here.
It looks pretty complicated haha. I’ll probably have to try OpenModelica or Simulink some time in the future on small, realistic examples to learn how they work. Especially the thinking process behind it. Probably different than what I’m used to. Thanks for the examples, though. The transmission example particularly shows me how they use these tools.
It’s now happened a couple of times that I’m surprised by texts claiming that developers haven’t heard of state machines. At least for me the concept was introduced in an embedded systems course at uni, and it has been quite a common thing during my working life in that area (I’ve worked mainly with different embedded/PLC-based systems for machine controls), both in the control logic as well as in the fieldbus communication (e.g. the CANopen state machine comes to mind). It’s often been a rather natural representation when working with anything physical which has different modes and moving parts etc., just like the demo here.
Currently I’m working sometimes with the Stateflow extension of Matlab Simulink, which offers a nice utility for hierarchical state machines, though behind a rather expensive paywall. I had not heard of this language before, and it looks like they have code generators for different languages, nice. I’ll try to give it a go at some point. Thanks for posting!
Now if there only would be nice integration with something like OpenModelica to enable easily sketching out control logic against a process model..
This is an interesting piece of work. I had the chance to see a presentation from AbsInt at a conference (for industry in safety-critical application areas, more industry people than academics) earlier this year. They presented mainly the same results as discussed in this paper. The company is doing the commercialisation of the tool.
For the discussion about which parts are formally proven or part of the “Trusted Computing Base” and which not, I think slide 34 from this presentation summarises it quite nicely.
If I understood it right, the main argument here is that for industrial usage, especially in x-critical applications, CompCert can provide one trusted part in the whole toolchain to produce an executable which is correct with high confidence. Key benefits are a reduced need for testing/validation (esp. of the compiler itself) as well as the possibility to use compiler optimisations, which are usually not allowed, or only at low levels, as part of compiler certification/validation for safety-critical use. Of course the toolchain doesn’t remove the need for testing the end result, but at least they could trust the compiler itself.
They use several additional tools together with CompCert to achieve the high confidence in the end. Like they discuss in the paper, CompCert doesn’t do pre-processing itself. They use a “proven-in-use” gcc-version for that, which they additionally test themselves for the subset allowed by their coding rules (derived from MISRA C). The coding rules are then enforced by another tool, so they can have high confidence that the pre-processing is correct. They use another tool to check for runtime errors and undefined behavior. Lastly, they again use “proven-in-use” gcc for assembling and linking, but the translation is checked again with another tool.
CompCert uses a “proven-in-use” tool (CIL) to map the C source output from the “proven-in-use” gcc to their lower-level language Clight; then they generate assembler, followed by “proven-in-use” gcc to assemble and link the code.
What is wrong with gcc’s “proven-in-use” code generator?
The paper is not just about CompCert, it’s about how the main authors used the compiler in a specific safety-critical application as part of the toolchain, and what benefits it had in that case and how they managed the non-proven parts of the toolchain, where CompCert is just a single building block.
The problem in safety-critical applications, where the produced system is certified by a third party (e.g. TÜV or Exida) for some specific use according to some standard (e.g. IEC 61508), is that the tools used need to be validated somehow. The problem then with gcc is that it was not developed with such a use case in mind, so the user of the compiler needs to somehow validate that the compiler is doing what they expect it to do. The other option is to purchase an expensive compiler which is developed and certified for safety-related use. Some manufacturers also offer certificates for using their libraries with a specific piece of their hardware, a specific compiler, and a specific language subset which they have pre-validated for the user.
In this case, by using CompCert as part of the toolchain, they had to do less validation for the compiler and were able to use higher compiler optimisation settings, as they had the proofs to show that, with reasonably high confidence, these can be trusted. Since the other parts of the toolchain were not formally verified, they used different measures to qualify them for the use case, which was equivalent to SIL 3 acc. to IEC 61508 if I’m not mistaken. As a reference point, that criticality level usually means that if the system fails, it’s possible that several people are injured or die. Normally for such a use case one has to put a lot of verification effort into the tools used to prove to the third party that the tools are suitable for the job and with a high level of confidence don’t introduce errors into the end product.
Again, at least that’s my understanding of the matter.
Edit: couple of typos.
Edit 2: Forgot to answer your first remark. In my understanding, the gcc in the beginning is used only for the preprocessing (.c-files -> .i-files), so there in the first entry point to CompCert, it’s still plain old C. If I understand right, the first step CompCert itself does is to translate the C to Clight, but I’m not really an expert on the internal workings, I’m more of a potential user for the tool.
I understand about compiler validation. As far as I know CompCert has not passed a compiler validation suite.
The C semantics pass of CompCert is based on CIL, which last I checked fails the C validation suites.
What does this claim mean: “The CompCert front-end and back-end compilation passes are all formally proved to be free of miscompilation errors”? What is a miscompilation error? Does it mean the source of CompCert does not contain any compilation errors? Since the semantic analysis of CompCert is based on a tool that has not been proved correct in anyway, I don’t see how this claim relates to C source.
Typo in my earlier response Clight -> Cminor
Interesting text about compiler validation. I wonder if and how the validation you write about differs from the tool qualification done today for e.g. IEC 61508.
I have to admit I didn’t completely understand your previous comment, as I wasn’t aware of CIL. As mentioned, I’m not really an expert, but I tried to do some digging.
There’s a paper from 2012 where it’s mentioned that CompCert used an adapted CIL library for parsing. In this paper they don’t mention CIL and only discuss using gcc as a preprocessor. There seems to be some documentation about the current compiler structure in this paper from 2017 (check esp. part 3) and in more detail about the steps in the current docs. The 2017 paper also discusses the semantic preservation. At least I don’t find any references to CIL there. Maybe there’s an answer to your question there, I don’t feel qualified to answer it. :)
The slide has a nice breakdown of steps but is a little misleading. The formal methodists sometimes count anything that’s been proven as not being in the TCB, or as untrusted. They had me in that habit a while back before I realized how much they screwed up. The truth is, all of that is in the TCB. They’ve just applied formal proof to the stuff they’re not counting. It can still be wrong due to spec errors, like the 3 or so found in Csmith testing. There’s also stuff it doesn’t model, esp. during runtime, that can cause failures. Bit flips in RAM or bit rot on HDs are examples. The best they can say is that certain parts inspire higher confidence that they work. Whereas, this is an example of qualifying verification claims the right way.
“CompCert can provide one trusted part in the whole toolchain to produce an executable which is correct with high confidence. “
Good descriptions about its implementation aspects. Yeah, it’s supposed to be one link in the chain. There was another slideshow I saw showing everything that might be used in a DO-178B evaluation. They had something for requirements, a modeling tool for high-level design, something like Astree Analyzer for checking code, a test generator, and then maybe CompCert kicks in at last part. One tool in a whole, assurance-producing chain.
Yes, I realised it was actually a bit backwards in that slide after I posted the comment. Here is a version of the same thing they used in the presentation at the conference where I participated. If you’re interested in the slideset, I can somehow share it over PM, it’s in principle under copyright, so I don’t really want to post it somewhere openly.
They also used Astrée in this paper; the whole toolchain is summarised in Figure 6. In the presentation I saw they also discussed the modeling tools etc. briefly. What’s discussed in this paper is funnily mostly the toolchain, where AbsInt can sell you solutions for the problem. :)
Edit: Re: HW Failures during runtime
Yes, they didn’t mention that much in the paper. I would assume that in this case they also use redundancy, ECC-memory and all manners of self-tests implemented in software to combat and self-diagnose all the failures that could happen in the electronics during operation.
That’s much better. It just says what parts they did a proof on without any false claims. Yeah, send it to me in case I need it for someone.
“What’s discussed in this paper is funnily mostly the toolchain, where AbsInt can sell you solutions for the problem. :)”
You’d think they’re motivated primarily by money instead of the public good. This is the thing I hate most about CompCert: it was assigned to a private company as proprietary software. INRIA’s brightest built one of the most amazing tools in the field of verification, only to have it partly locked up. Then all the academics contributing to CompCert, which is free for them and maybe GPL stuff, are contributing to a proprietary product instead of an open one. Had they given it to C0 or something totally free, then all the improvements in that slide would be totally free, with companies optionally selling them. CompCert should’ve been Apache, LGPL, or something. It’s too important.
After one company did one in weeks, I got it in my head to eventually attempt an open project using ASMs as an intermediate form, since they have a ton of potential. They’ve already been used as IRs in both hardware and software, but not free or well-developed. If using them, one can target all the analyses and transforms at them, with code generated afterwards or hand-optimized with an equivalence check. As prior work shows, the equivalence check can happen at the assembly level.
Last time I checked, CompCert did not do any formal analysis comparing its behavior and the semantics of the C language. The formal verification stuff is all back end, i.e., code generation.
More CompCert soap powder advertising.
Using the CompCert proof ‘methodology’, I could translate Visual Basic to the CompCert low level code (from which machine code is generated) and claim to have a formally verified Visual Basic compiler (i.e., no connection to the formal semantics of any language is required).
I thought I told you or you told me that CompCert uses its own dialect of C that they formalized. It’s a subset with specific interpretations of anything with wiggle room. Most of their work centers on Clight as a starting point. Most efforts targeting CompCert target it. One could check the MISRA-C to Clight translation by eye and with equivalence tests if concerned.
I do think they should be more clear on exactly what they support at various times.
Incorrect; a subset is not a dialect. CompCert implements a very large subset…
It’s possible my wording is wrong. I’m not sure how different something has to be from the original before it qualifies as its own version or dialect. Usually, once a non-standard change is introduced, people refer to it as something different, sometimes using “dialect.”
CompCert C has never been C itself. It was always its own easier-to-verify thing. For example, its memory model has evolved over time. These verifiable subsets of C not only have a subset of functionality: they expect it to be used in specific ways/semantics that facilitate their analyses and algorithms.
Don’t these subsets that have specific, unusual interpretations and usages become new dialects? Or is my own interpretation of what a dialect is off?
Note: That was first link I could find about this in DDG. Original was on Lobsters but its search isn’t giving it up.
The compcert memory model is shared with C from what you linked. Also from the compcert front page
A dialect is different, a subset is a subset. also see the words ‘almost all’ .
There is a bit more information available here.
There is a specific mention that they cover all of MISRA C 2004, which is already quite good considering that this tool might be especially interesting for industry application in safety-critical context, like in the linked paper.
I could be recalling incorrectly what made it a dialect or something about C which I’m far from an expert on. I’ll say subset for now unless I can cite a difference deserving the word dialect.
Clight looks like it might be a very small subset of C (I have not checked). It might also qualify as a very small subset of Java and perhaps other languages.
Since Clight is defined by its creators, whatever it does is, by default, correct. So what exactly does a proof of the correctness of Clight mean?
The proofs in CompCert typically mean:
The specs possess a specific property.
The implementation does what the spec says on all inputs.
In CompCert’s case, they mainly want to specify the meaning of the languages and the operations (esp. transformations) they perform on them, and prove they have equal meaning after those operations. The Csmith testing empirically confirmed high quality for at least those types of tests. It found a few spec errors, while non-verified compilers had piles of failures.
Personally, I’d like to see a testing suite of every feature, common combinations (esp. w/ nesting and aliasing), and random combos, then compare how each compiler handles programs generated from those templates.
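A crude sketch of what I mean by comparing compilers on generated programs; gcc and clang are used here just as examples, and the “generator” is a trivial stand-in for something like Csmith (the template deliberately avoids undefined behaviour, so any disagreement would point at a compiler bug):

```python
import random
import subprocess
import tempfile
from pathlib import Path

def gen_program(seed: int) -> str:
    """Generate a tiny C program from a fixed template."""
    rng = random.Random(seed)
    a, b, shift = rng.randint(0, 100), rng.randint(1, 100), rng.randint(0, 7)
    # Unsigned arithmetic and a small shift: well-defined behaviour, so the
    # printed value must be identical across conforming compilers.
    return (
        "#include <stdio.h>\n"
        "int main(void) {\n"
        f"    unsigned int a = {a}u, b = {b}u;\n"
        f'    printf("%u\\n", (a * b) << {shift});\n'
        "    return 0;\n"
        "}\n"
    )

def compile_and_run(compiler: str, source: Path, workdir: Path) -> str:
    exe = workdir / f"a_out_{compiler}"
    subprocess.run([compiler, str(source), "-O2", "-o", str(exe)], check=True)
    return subprocess.run([str(exe)], capture_output=True, text=True).stdout

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    for seed in range(20):
        src = workdir / "case.c"
        src.write_text(gen_program(seed))
        outputs = {cc: compile_and_run(cc, src, workdir) for cc in ("gcc", "clang")}
        if len(set(outputs.values())) != 1:
            print("compilers disagree on seed", seed, outputs)
```

A real suite would generate far richer programs (nesting, aliasing, etc.) from templates, but the compile-run-compare loop stays the same.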
You can formally verify C code with things like Frama-C or VeriFast… but once you have verified code, you need a verified compiler to convert it to assembly; that is where CompCert comes in. If you compile verified code with some buggy compiler, it might do you little good.
Frama-C and VeriFast are essentially static checkers of source code. They have no means of formally verifying C source against the requirements contained in the C Standard.
Where is the proof showing that CompCert implements the semantics specified in the C Standard? There could be a fault in CompCert’s handling of C semantics (just like in other compilers) which causes it to generate incorrect code.
The Frama-C WP plugin absolutely allows you to write formal specifications of your code and checks that the C code matches that specification in relation to the C standard. https://frama-c.com/wp.html
The whole point of CompCert is that it is a formally verified translator from the C standard to assembly… Obviously there could be bugs in the CompCert proofs, but they are probably less likely than the many bugs in other compilers.
The proofs are in the git repository…
I don’t see anything in the frama-c link that makes any claim about “relation to the C Standard”. In fact, the word ‘standard’ does not appear in the link.
In frama-c, you write your own specifications and the tool checks the source code against what you have written. No connection to the C standard provided or required.
Please provide a link to the files involving proofs against the C Standard, I cannot find them.
What language do you think you are checking against? ACSL (what WP checks the C code against) stands for ANSI/ISO C Specification Language.
What language do you think it checks against if not standard C? Here is the wiki page - https://en.wikipedia.org/wiki/ANSI/ISO_C_Specification_Language
Are you claiming they invented their own language that is not C to check the spec against?
The “ANSI C” just comes from the fact that it targets ANSI C. ACSL itself is just a language for formally specifying behavioral properties of code written in ANSI C. The book is here with tons of examples.
The only one I know of that can check C code against its specification is KCC, which RV-Match is built on. That’s because it’s pretty much the only complete, open specification of C that can catch undefined behavior and be used as a push-button tool (if their docs are honest). It also passed the parts of the GCC Torture Test that their semantics support, and RV-Match is getting good results on benchmarks. So you could say it’s the only formal semantics with significant, empirical evidence on top of claims in formal specs/proofs.