Dramatically Reducing Software Vulnerabilities [pdf] (nist.gov)
132 points by Dowwie on Oct 5, 2016 | 71 comments


One of the many recurring stupid ideas for startups I have had is a software company that does not write code. Instead it edits and removes code. This has been called "negative coding."

Few people write small programs anymore. UNIX, the OS that runs the internet and on which the "cloud" depends, is a time-capsule full of small programs. What this might mean is left as an exercise for the reader.

Today, large programs are celebrated without question.

Gratuitous complexity can be disguised as "productivity" and pitched as value creation. At the very least it can be a source of fascination for futurists. Any liabilities created in the process can be easily explained away.

Among the many things that can be done with this output, one is to make small programs from the larger ones. As far as I can tell, this is not being done.

For myself, I have taken large, open source programs and gutted all the needless features in order to produce the small programs I want.

My goal is less complexity. Simplicity is a loaded term. For myself, this process of incrementally reducing complexity "creates value". It is an intellectual exercise, there is a sense of productivity, and yet almost no code is written.

Software companies are full of developers who cut, paste and compile. Today many of them are working with open sourced code. The software company I aim to start is made up of "programmers" who also work with such open sourced code; they only cut and compile.

EDIT: Whenever I see the "Who's hiring" thread I always wish I could put in an entry for this. The vision I have is a group of "early employees" with day jobs. Absent connections and name dropping no VC would ever fund this. No one will be paid to do this "negative coding" work until we secure our first government or Fortune 100 contract and they see the results. And even then I could not guarantee the results would be appropriately valued. But were this fictional company to succeed, those early employees could claim they were an integral part of a monumental paradigm shift.

EDIT 2: The fictional company does negative coding on single programs to reduce them in size or to break them up into smaller programs and is not limited to the embedded market or driven by resource constraints. Selecting parts of an open source OS distribution to create a new distribution is a different idea and of course many companies have been doing that for many years.


Google has a big "negative coding" project with BoringSSL, a fork of OpenSSL that tries to remove tons of code:

https://www.imperialviolet.org/2015/10/17/boringssl.html

Seems like it's been a useful concept.


Maybe you should apply to CERIAS's Poly2 project, which has been getting paid for negative coding for years. ;)

http://projects.cerias.purdue.edu/poly2/

Especially see Section 3.6 of the main paper:

https://www.acsa-admin.org/2003/papers/72.pdf

It's an old technique of mine, too. Just edit out all that shit so it returns with an error message telling me what the hackers were trying to use with whatever access they got and from where. :)


> Of the many recurring stupid ideas for startups I have had is a software company that does not write code. Instead it edits and removes code. This has been called "negative coding."

This is prevalent in the embedded OS market with vendors that sell commercial Linux licenses. Many will take a stable Yocto distribution and strip out unnecessary pieces to reduce its weight to suit customer specifications.

Oftentimes the OS vendor will then have to go in and add code to meet standard/certification requirements set by the customer. They'll also generally have to add certain libraries and snippets of code for hardware/middleware-specific integration.

I'm not an embedded engineer, but from my convos with engineers/execs I have to believe it's exceedingly rare that a company could simply strip out pieces from an open source distribution, not add anything proprietary, and sell the resulting code to a customer.


You know. I like refactoring broken code... You know, just in case you're looking for someone.


A personal joke of mine: when I was young I liked to write code; nowadays I find deleting code much more satisfying.

Less code = less complexity, fewer side effects, less maintenance, and fewer bugs.

Sure, you have to do it without losing any useful functionality. As software ages, it becomes bloated: there is a lot of copy'n'paste, and features stop being useful because the business has changed.


People who believe in these ideas are a minority, it seems. Every 'aptitude safe-upgrade' tells me that more and more disk space is required to install the packages.

The OpenBSD project seems to be doing the right thing (as they always did, imho) by deleting a lot of code from the base and not letting it rot.


I even have a tag line for you: "Do better with less"


Ironically, that's the jQuery slogan [1]: Write less, do more

[1] https://jquery.com/


...while relying on more.

Using frameworks/libraries often can reduce the amount of code one has to write to perform a certain task, but it also can greatly increase the total complexity of the system. The OP is talking about reducing total complexity, which may mean increasing local complexity slightly.


Shut up and take my money! :)


Money doesn't cause this; it's caused by people raised during the microwave generation, and people who don't know about Julia Childs.


>gutted all the needless features in order to produce the small programs I want.

Which is fine for you to do of course but it is probably inaccurate to assign no value to the code you are removing. Someone took the time to write it. People don't spend their time writing code for no reason. That code may have no value to you personally but it has value to someone.


Some consultants do the same thing; they might not be always removing code, but it does happen. I personally did some of this in regards to optimisation, it's always fun to see your clients' reactions when you dramatically cut down the resource usage of some massively bloated system they were using.


Interestingly, no one mentioned a project like BusyBox.


This is probably not ok to do, but one of suckless.org's criteria for software that sucks less seems to be fewer lines of code. You might think their software sucks in other obvious ways though. I don't of course; I wouldn't dare.


They should have pasted this diagram all over the document: https://en.wikipedia.org/wiki/Automata_theory

I'm disappointed that the "Formal Methods" section makes only a passing reference to Finite State Machines. FSMs are arguably the easiest formal methodology to understand and use, and probably the most prevalent. That's especially worth emphasizing given how difficult and opaque a lot of formal methods remain even to people with four-year CS bachelor's degrees.

This old thread https://news.ycombinator.com/item?id=2949543 had lots of interesting discussion of FSMs, particularly their current widespread use in (sometimes mission-critical) embedded systems. An ongoing problem seems to be a lack of robust first-class support for FSMs in everyday programming languages, which can limit debugging, static analysis, standard tooling, etc.

Disclaimer: I know nothing about security or formal methods.


People making arguments about automata theory in security are often affiliated with the LANGSEC program/project/movement.

http://langsec.org/

There are lots of compelling ideas there (and LANGSEC researchers would tend to agree that if a functionality can be implemented correctly as an FSM, that would be a safer option).

A related phenomenon people talk about is unexpected Turing-completeness: people have been able to prove that many different parts of computing are Turing-complete -- things that were never intended to be programming languages. LANGSEC people and others refer to the negative security implications of some of this as "weird machines": you really didn't want Turing-completeness, but you got it by accident or by default anyway, and an attacker who can corrupt control flow or other kinds of state may then be able to perform arbitrary operations. For security people an example may be return-oriented programming, but there are evidently others that can be thought of in the same way.


Glad someone mentioned this. LANGSEC encourages you to think about computational power as privilege, and the principle of least privilege will guide you to constructing machines that are easier to reason about, leaving less room for ambiguity.


This is fascinating. Do you have any examples or stories of such 'weird machines'?


The concept is presented in

http://www.langsec.org/papers/Bratus.pdf

I think people have found a number of other, well, weird examples. I found

https://www.usenix.org/conference/woot15/workshop-program/pr...

via Google Scholar and there are more papers on this theme if you search for "weird machines" there. It seems like a helpful way to think about this issue!


At a higher level than the other guy, and not precisely the same thing:

http://blog.checkpoint.com/2016/02/02/ebay-platform-exposed-...



many thanks, peoples


High-assurance security from the '80s-'90s used them too, in both formal specifications and coding approach. It was more like looking at the kernel as a giant state machine, but you can do protocols or app components, too. Many will complain about the unreadability of C code. However, a proper spec or DSL for state machines makes them extremely readable, with the C code autogenerated. They're also extremely easy to formally verify, as you pointed out.

On page 15, you can see Rockwell-Collins combining model-driven development with Z, SPARK, and state machines. It's a nice example of the power of an interacting state-machine approach combined with some other elements.

https://www.nsa.gov/resources/everyone/digital-media-center/...


If each state is a function that simply waits for events and returns the next state, the C code doesn't look that bad.
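A minimal sketch of that style in C, purely for illustration (the turnstile events and state names are made up, not from any of the linked papers): each state is a function that takes an event and returns the next state, and the driver "engine" is a single line.

    #include <stdio.h>

    typedef enum { EV_COIN, EV_PUSH } event_t;

    struct state;
    typedef struct state (*state_fn)(event_t ev);
    struct state { state_fn next; };

    static struct state st_locked(event_t ev);
    static struct state st_unlocked(event_t ev);

    static struct state st_locked(event_t ev) {
        /* A coin unlocks the turnstile; pushing while locked does nothing. */
        return (struct state){ ev == EV_COIN ? st_unlocked : st_locked };
    }

    static struct state st_unlocked(event_t ev) {
        /* Pushing through re-locks it; extra coins are ignored. */
        return (struct state){ ev == EV_PUSH ? st_locked : st_unlocked };
    }

    int main(void) {
        event_t script[] = { EV_PUSH, EV_COIN, EV_PUSH };
        struct state s = { st_locked };
        for (size_t i = 0; i < sizeof script / sizeof script[0]; i++)
            s = s.next(script[i]);        /* the whole engine is this one line */
        printf("ended %s\n", s.next == st_locked ? "locked" : "unlocked");
        return 0;
    }

Because every transition funnels through that one loop, logging and analysis of the transition table stay straightforward, which is part of why the verification story for this style is so clean.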


It doesn't look bad versus regular C. You still have the C-specific risks in the module and at module interfaces that other languages reduce. It's also harder to parse, analyze, and compile in a secure way. With SPARK & formal tools, one can get a strong analysis that the code will always meet the spec. The ability to fully do that for C has only come online in the past few years. It appears to take more work, too.


"Lack of robust first-class support" is no reason not to become well-versed in them. Static analysis is just as possible with them as with any other style in the chosen toolset.

Debugging is particularly straightforward - events cross states, yo. :) All you need is logging. Event detection code is particularly amenable to rigorous verification.

Toolchains like ObjecTime and Rational Rose had direct support for "coding using graphic representations of FSM" but the tools themselves were somewhat unwieldy and expensive and they sort of dinosaured.


I wonder if this (or something like it) will be the tipping point for governments to introduce more regulations for software? We already have regulations like PCI-DSS for dealing with payment cards; maybe this'll end up producing regulations for dealing with security?

The conclusion suggests better specification, better testing and more resilient architectures are the strategies that could make a difference. Perhaps software engineering will have to agree on a set of "best" practices that make security problems less likely to occur?

(typos)


Regulations in software tend to encourage compliance theater rather than delivering the spirit of their intent in my experience. It isn't that difficult to conform to regulations technically with an awful and careless implementation, and I've seen it happen many times. It is "best effort", in part because it is obscenely onerous or impossible to ensure some properties in a strict sense.

Best practices also drift over time and vary with context. A challenge is that there are very old, very large code bases that, as a practical matter, will never be modernized, because doing so is tantamount to a rewrite, with all of the bug creation and cost implied.

Robust software is definitely possible today even in languages like C++, but the development process for software that rarely fails isn't that compatible with the popular "fast iteration" development model nor the current computer science skills of most software engineers.


All of this is true, which is why effective policy interventions would punish bad outcomes, not prescribe methodology.

Make bad software expensive and software quality will eventually increase.


Bad software is already expensive. This is just poorly understood.

It's simply not fair to non-specialists to expect them to have a working knowledge of how cost works in software.


Fines and vulnerability to class actions make the costs far more visible (and increase it besides). If non-specialists routinely make bad decisions because they don't understand how cost works, then the solution is to make the cost of bad software dead obvious.


They did before and it worked until they made it unwork:

http://lukemuehlhauser.com/wp-content/uploads/Bell-Looking-B...

I wrote a post on Schneier's blog describing a few ways security evaluation has been done plus how I would do it:

https://www.schneier.com/blog/archives/2014/04/friday_squid_...

My model can be used for regulations or liability. Another proposal I had was enforcing a minimum set of assurance activities proven to knock out defects. Likely a subset of the techniques in this post:

https://news.ycombinator.com/item?id=10734477

Note that the U.S. government has already given prior guidance on robust software that mentioned some of those with specific tools. A number of companies in safety-critical fields are doing things like formal specifications for precise requirements, review of various items in the lifecycle, automated generation of tests covering every path, and implementation in Ada/SPARK to knock out classes of errors there. There are also tools like Softbound+CETS, plus minor modifications to processors, that make most classes of errors or attacks impossible. So, it's not theoretical so much as something practical that many avoid for convenience or extra profit. ;)


I can definitely see more standards and certifications coming to pass, which later distill down to one or two primary certifications that everybody aspires to get. Currently there are multiple privacy/security standards (PCI, HIPAA, FERPA, etc). We'll probably have to get more opinions from more sensitive-data industries, and then for them all to find the common ground.

But how would we incentivize businesses properly to be [WHATEVER] compliant, and pay for the regular security scans? Social proof / public shame hasn't worked yet, despite MANY high profile examples. We may have to impose some sort of carrot/stick combo that says: "If your infrastructure consists of N# of networked devices, you must meet these standards. If you fail to, and get caught in a data breach (like what happens with PCI), fines that could put you out of business get levied. Additionally, if you show proof of your security certification every year, you get a tax break."


Regulatory enforcement.

Even though PCI DSS is enforced by a private regulatory body, it still carries weight. HIPAA has HHS, which does investigate and does drive corrective action. Both of these bodies scare the crap out of a lot of companies. I've worked in both sectors (banking/credit, healthcare) and the fear is palpable.

If the CFPB had the ability to enforce regulatory requirements for handling of PII you bet your ass companies would fall in line. Nobody wants trouble with the feds.


These regulations have done a lot of good, but they sometimes seem to devolve into checklists (not to stigmatize checklists, which are valuable tools!), often without corresponding understanding and capability.

I've seen that particularly in the way that PCI rules led people to get certificates and turn on HTTPS, without knowing what a certificate, private key, public key, TLS, or HTTPS were or what they were for. While I prefer that to the alternative of no HTTPS at all -- which is probably what we would have seen without the rules! -- it's still a bit disappointing and concerning that for so many organizations it only got to the "have to satisfy PCI rules by doing this weird arbitrary thing" stage.


The U.S. Federal Government already has [NIST-800-53](https://en.wikipedia.org/wiki/NIST_Special_Publication_800-5...) (for software), and [FedRAMP](https://en.wikipedia.org/wiki/FedRAMP) (for cloud infrastructures). The regulations and certifications are there, it's a matter of implementing them properly in a way that actually ensures real-world security, not just that there's a huge, 200-300 pg. document somewhere describing a secure state of an application.

But this is what OpenControl is meant to solve. https://github.com/opencontrol/


> "best" practices

I'd rather a professional guild make the call on what other capital-E Engineering disciplines call "the state of the art." We do have something like that, a document published under the IEEE called SWEBOK [0]... but it needs greater industry adoption and the establishment of a profession that governs our ethics, IMHO. Presently it's nice to know, but my employer pretty much dictates what corners I will cut and what mistakes are acceptable, etc. Since there's no liability involved and no actuaries adjusting insurance rates based on our technology and process choices, my employer doesn't have an immediate dollar figure that gives consequence to those risky decisions.

[0] https://en.wikipedia.org/wiki/Software_Engineering_Body_of_K...


Although it does mention some risks of using 3rd party code/components in passing under 'Understanding and Controlling Dependencies' (835-), I would have expected them to make a stronger case for OWASP 2013 Top 10 A9, 'using components with known vulnerabilities'[1], type of analysis. This applies to almost any kind of modern application, not just web apps, but especially to those written in native languages (C/C++/...), as we increasingly build our applications by bringing in varying amounts of 3rd party code, open source or commercial. As with traditional static code analysis, more or less mature open source[2][3][4] and commercial tooling exists to perform static binary and source code analysis to discover and track 'offending' 3rd party code, which does make it practical to include this kind of analysis in sdl/sdlc/devops workflows.

[1] https://www.owasp.org/index.php/Top_10_2013-A9-Using_Compone... [2] http://www.binaryanalysis.org/en/home [3] https://www.owasp.org/index.php/OWASP_Dependency_Check [4] https://github.com/OWASP/SafeNuGet


This is an interesting read. Despite this, I still cannot understand what a Formal Method is.

It just says a method is 'proven'? How? Can someone please develop a piece of software, document/video themselves doing it and explain what/why they are doing parts of the project?

Basically walk a plebeian like me through making a program using formal methods.


It's a slide deck, not a paper, but you might find this presentation from my class helpful: http://www.dwheeler.com/secure-class/presentations/Secure-So...

A specific example of software developed using formal methods is Tokeneer: http://www.adacore.com/sparkpro/tokeneer/

There's a severe lack of publicly-worked examples. Most formal methods applications are private, which inhibits improvement and use. I'm an advocate for the creation of more "open proofs" where the tools, software being proved, and the proofs are all available as open source software. If we had more open proofs then it'd be a lot easier to understand, apply, and improve them.


Ha! I just posted your FLOSS page in the same thread. I appreciate the work you did on high-assurance SCM and the FLOSS summaries. I've been posting them for years in security forums to try to get some adoption. People are still ignoring it, with the reproducible-builds fad being a recent example of missing the bigger picture [which was already fleshed out in the high-assurance field with methods that work, FOSS tools, and industrial examples].

Still an uphill battle but recent results are amazing. Do check out Myreen et al's work and COGENT's filesystem paper if you haven't seen them:

https://www.cl.cam.ac.uk/~mom22/

https://ts.data61.csiro.au/projects/TS/cogent.pml


I'll definitely give this a look.

I've been making posts every time this comes up and no one has been able to answer, or even quantify, what 'formal methods' were. The last time someone tried to sell me on them, it was accompanied by the pitch that "it means that your code proves itself that it's correct and that you will never have a bug ever".

Naturally I laughed at that; I can spot someone peddling bullshit when I see it.

Do you mind if I email you about this powerpoint after giving it a look?


I believe the capability of Ada to be formally proven was one of the reasons it has seen use in defence and real-time systems.


Well, there was the mandate, plus the language itself was better than others at safety & programming in the large. The bigger reason, though, was that it was systematically designed to eliminate error, from the syntax to memory handling to integrations. I mean, really thorough. There's a good description of all that below, in case anyone wants to cherry-pick safety features for use in a simpler language.

http://www.adacore.com/knowledge/technical-papers/safe-and-s...


A formal method means that the software can be formally proved to be correct (just as in logic).

seL4 microkernel is formally verified, see: Comprehensive Formal Verification of an OS Microkernel, http://www.nicta.com.au/pub?doc=7371

It can be very costly to verify a program after the fact; 250 lines of code per man-year is expensive. Program derivation is another way to do it, but it's also a slow and significantly more complex process. Type systems, compilers and linters perform 'verification lite': they check statically for some easy-to-catch problems.


A Formal Method means to create a Formal System and construct a formal proof.

https://en.wikipedia.org/wiki/Formal_system

As a small example, take Hoare logic, which takes every single command (aka line of code) in your program and turns it into a Hoare triple. A Hoare triple looks like {precondition} command {postcondition}.

As an example let's prove the output is not 476.

    int numb;
    .....
    numb = numb * 2;  
    numb = numb + 1; 
    print numb
For print numb to print 476 we have the precondition that numb must be 476.

{numb = 476} print numb {We don't care what happens after you print}

If numb is 476 on the line above, then the line before this one must have a postcondition identical to this line's precondition.

{} numb = numb + 1; {numb = 476}

For this line to be true it must have a precondition that numb = 475

so {numb = 475} numb = numb + 1; {numb = 476}

Again we chain this line's precondition to the previous line's postcondition:

{} numb = numb * 2; {numb = 475}

Given that numb is defined as an integer, there is no possible precondition capable of making this postcondition true. We have formally proven this program is incapable of printing 476. Further, since the stuff in the brackets can be as complex as we like, we could have proven this program is incapable of printing any even number.

https://en.wikipedia.org/wiki/Hoare_logic
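To connect that to something executable: here is the same program written as C with the proved property stated as an assertion (the function name is mine, and I'm treating int as the wrapping machine int the example assumes). Since 2*numb + 1 is always odd even under wraparound, the assertion can never fire, which is exactly what the backward chain of triples establishes.

    #include <assert.h>
    #include <stdio.h>

    static int transform(int numb) {
        numb = numb * 2;      /* postcondition: numb is even             */
        numb = numb + 1;      /* postcondition: numb is odd              */
        assert(numb != 476);  /* follows from "numb is odd": 476 is even */
        return numb;
    }

    int main(void) {
        printf("%d\n", transform(237));  /* prints 475, never 476 */
        return 0;
    }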


Doesn't that rely on often-incorrect assumptions about the underlying hardware? Doesn't this change based on the environment the system is going to be put in? There are things people are saying are "guaranteed" about formal systems that just aren't true.

To have a formal system actually be "correct" it needs to have never touched "incorrect" software or hardware. This contains, but is not limited to, a "correct" compiler and hardware implementation.

This is, I'd say, often not the case.


Right, whenever you try to use mathematics/logic to reason about the real world, you risk that some of your assumptions may be wrong. Every formally verified project will have some set of trusted assumptions. But hopefully this can still dramatically reduce software vulnerabilities---for example, systems more often fail because of bugs in the program logic than bugs in the x86 processor, so assuming a specification for the hardware and proving the program correct with respect to that should still be helpful.

(As an aside, hardware and compilers are actually comparatively tractable things to verify--the specifications are well-defined, and they are used by lots of clients so the effort to verify them pays off better. I think in the future, we are more likely to have unproven application programs running on top of proven compilers & hardware than the opposite.)


Here are the two best books on it that I know of that are available for free:

http://adam.chlipala.net/cpdt/cpdt.pdf

https://www.cis.upenn.edu/~bcpierce/sf/current/index.html

Dependent types are a more lightweight formal method. They're pretty useful, as Chlipala et al showed in many developments. The Idris and ATS languages use them for systems. Software Foundations is about the full thing.

Tried to find you an easy intro. Harder than I thought. However, this PowerPoint is an excellent intro to formal specifications in terms of what they are, why you want them, and many examples.

http://www.cs.nott.ac.uk/~pszrq/files/1FSPintroduction.ppt

Once you see that, it becomes obvious that proving the implementation matches such specs is beneficial. It took an hour to find an intro to that which was worthwhile and easy to digest. Here's one:

http://www2.imm.dtu.dk/~aeha/Railway/index/TSnotat.pdf

Anyone following along that wants a list of tools, esp open-source, should check out Wheeler's reference on high-assurance FLOSS:

http://www.dwheeler.com/essays/high-assurance-floss.html


I'm about to start reading "The Little Prover" (https://mitpress.mit.edu/books/little-prover) which sounds like a good intro.


The Amazon reviews on it look good. I like how it's very incremental and more manual than modern provers. Both are advantages, as the beginner will get a glimpse of what they're doing and why. I might order a copy.


Take your software.

Rigorously define security goals ("no user can see another user's file; no input can cause this handler to access variables other than this specific list").

Convert all or, more likely, part of your program into a model; for instance, predicate logic.

Run analysis tools to verify that the model achieves the security goals. Once expressed in a mathematical model, "proving" it becomes a search problem, achievable with code. Intuition and cleverness can, if your model is accurate and complete, be eliminated.

It's sort of the software security equivalent of teaching a computer to play chess.
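A toy sketch of that flow in C, not from the report: the access rule, names, and file layout below are invented, but the shape is real. CBMC is one actual tool that works this way: it treats the entry function's parameters as unconstrained inputs and searches for any assignment that violates an assert(), up to a bound.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the goal "no user can see another user's file". */
    typedef struct {
        int  owner_id;
        bool world_readable;
    } file_t;

    static bool can_read(int user_id, file_t f) {
        return f.world_readable || user_id == f.owner_id;
    }

    /* The proof obligation: for every user and every non-shared file,
       access implies ownership. Run under a bounded model checker, e.g.
       `cbmc policy.c --function check_policy`, the parameters are treated
       as arbitrary inputs, so the assertion is checked for all of them. */
    void check_policy(int user_id, file_t f) {
        if (!f.world_readable && can_read(user_id, f))
            assert(user_id == f.owner_id);
    }

    int main(void) {
        /* Ordinary compilation still works; this exercises one concrete case. */
        file_t f = { .owner_id = 7, .world_readable = false };
        check_policy(7, f);
        printf("ok\n");
        return 0;
    }

The search either finds no violating input (the "proof") or hands back a concrete counterexample trace, which is the chess-engine flavor described above.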


This was already started by 18F. If you haven't heard of them, they are the first U.S. federal gov digital consultancy, and the primary force behind a lot of security/certification improvements already.

https://github.com/opencontrol/ https://github.com/opencontrol/compliance-masonry


This stuff is neat, but it's more of a formal methods approach to security management than it is to reducing software vulnerabilities.

To apply formal methods to software development, you do one of two things:

* You transform code in conventional programming languages into machine-readable formats understood by model checkers

* You write your code in a language that itself admits to being model checked.

I appreciate this link, though; I was unaware of what 18F was doing here.


I'm pretty sure what you're talking about exists in the OpenControl repos somewhere-- I'm just not sure which one (sorry). My former 18F colleague was recently talking about translating controls about password specifications into actual automated tests on the codebase and/or app.

The reason all that "formal methods" stuff is in there as well (and indeed it's what the more prominent repos in there are about) is so that we can eventually connect the dots between effective security practices in code and the management of the security controls on the administrative side, and at least until the former catches up to the software methods, take the manual work of generating the copious documentation regarding all of this out of the developers' hands. Case in point: https://compliance.cloud.gov/ (<- all auto-gen, AFAIK)


> Can someone please develop a piece of software, document/video themselves doing it and explain what/why they are doing parts of the project?

iirc I think Leslie Lamport might've been thinking about making just such a series on TLA+ on the ML... TLA+ is a software specification system he wrote that was popularized by Amazon and Microsoft. It's open and the tools, while rough, are free to use.

If he doesn't go through with it I've been looking for something to teach and I'd like almost nothing more than to teach TLA+ to more people.


I'm no expert but the way I understand it is you are mathematically proving certain things about the software. Basically you prove that there won't be a division by zero or buffer overflow or whatever in a certain function.

Usually there are extensions to programming languages that let you annotate these proofs and software can check it for you.

As with unit tests, you only get these assurances if you write out the proofs. Historically, writing them has been really time consuming, but the paper claims it is getting much more convenient.
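For a concrete flavor of such an annotation extension: ACSL, the contract language checked by Frama-C, puts the specification in special comments, and the tool then tries to prove both that the body honors the contract and that callers establish the preconditions. The function below is a minimal made-up sketch, not something from the paper:

    #include <limits.h>

    /*@ requires denom != 0;
        requires !(num == INT_MIN && denom == -1);
        assigns \nothing;
    */
    int safe_div(int num, int denom)
    {
        /* With the preconditions above, a prover can discharge the obligations
           that this division never divides by zero and never overflows. */
        return num / denom;
    }

SPARK, mentioned elsewhere in the thread, plays a similar role for Ada.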


Some of the methods they list (like SAT solvers and model checkers) mostly obviate the actual proving part for simple statements. Instead you only need to input what you want to prove, and the computer attempts to find a proof or disproof. For many of the things you would want to prove in software, this is sufficient.


Here is a paper by the AWS team that describes their approach to formal methods and which you could find 'tangible' : http://research.microsoft.com/en-us/um/people/lamport/tla/fo...


Not exactly what you're looking for, but maybe a good starting point

http://cs.stackexchange.com/questions/13785/formal-program-v...


Recent article and discussion: https://news.ycombinator.com/item?id=12612856 (A Hoare Logic for Rust)


Program proofs can get arbitrarily clever/complicated.

But one of the fundamental maneuvers is to examine the full input space of a given function, and assert that the full output space of the function has certain properties, which then may get fed into later uses of the results of that function to further assert things.

The "full input space" of a function is generally what you'd expect it to be, but the "full output space" of a function is worth discussing. It is not just the return values of the function in the sense the local language means, it is also the full state of visible changes to the environment, any exceptions, etc. So, for instance, in the psuedocode function

     // "int" is the conventional wrapping machine int here
     function sample (int x, int y) int {
         if (x == 25) {
             throw DontLike25Error("I can't stand the number 25")
         }
         if (y == 0) {
             external yZeros; // imagine I'm importing a global variable
             yZeros++;
         }
         return x + y;
     }
A human programmer will conceptualize this as "a function that sums two numbers with a weird bug", and probably ignore the yZeros entirely. A prover sees this as a function that "returns the exception DontLike25Error OR returns a new yZeros and the wrapped sum of the two input values". You generally conceive of exceptions as a return value, and global variable updates as a new value, etc.

So, if you want to prove a function will never divide by zero, it is at least conceptually a case of iterating over all possible inputs and verifying that the property holds that a DivideByZero exception is never returned:

     function divide (num float, denom float) float {
         if (denom == 0) {
             return 0; // actually a bad idea, but bear with me
         }
         return num / denom;
     }
We can now easily prove this never divides by zero, because we can examine all the possible nums and denoms and verify we always get a "float".

At least... if == 0 correctly detects all possible zeros, including the local negative zero if applicable, and doesn't return true for any other possible float value, and if the division operator doesn't yield a DivisionByZero for any other value accidentally or deliberately. And our proof system would still have to bear in mind the possibility of underflow (a non-zero numerator may still yield a 0 if the denominator is sufficiently large) or possibly an Inf overflow if the denominator is sufficiently small.
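(Those caveats are easy to check concretely. A quick C demo with arbitrarily chosen values, just to illustrate the paragraph above:)

    #include <stdio.h>

    int main(void) {
        double tiny = 1e-300, huge = 1e300;
        /* Non-zero numerator, huge denominator: the true quotient (~1e-600)
           is below the smallest subnormal double, so it rounds to exactly 0. */
        printf("%g\n", tiny / huge);   /* prints 0 */
        /* Tiny denominator: the quotient overflows to infinity without ever
           raising a divide-by-zero error. */
        printf("%g\n", huge / tiny);   /* prints inf */
        /* Negative zero compares equal to zero, so `denom == 0` does catch it. */
        printf("%d\n", -0.0 == 0.0);   /* prints 1 */
        return 0;
    }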

And with that paragraph, you begin to see both the problems with creating proof systems, and using them. We programmers want to slop down a division operator and assume that it's going to divide two numbers and yield a third number and we don't want to care about the exceptions. A proof system makes you care. As the program scales up, this only gets more interesting; the previous paragraph was just about one usage of the division operator. Pretty much every operator and function you've ever used comes with a similarly intense list of caveats, exceptions, and weirdnesses if you want to be really rigorous about what happens in your code. Especially including things like "allocating memory" or "allocating disk" as functions, which even before we're trying to prove things are often so difficult to deal with we programmers just punt entirely.

From these simple building blocks, you can start building your way up to more interesting assertions. But this is the foundation that the whole system is built on.

It's also worth pointing out that from this point of view, a prover is doing what even the sloppiest programmer is doing in their head anyhow. You have to at least to some extent be thinking about what goes into a function and exactly what comes out. You just probably aren't considering anywhere near all the options. But you must be thinking like this in some manner, no matter how incompletely and sloppily, to be a programmer at all. Proving things about code is less foreign than it should seem at first.

You can also even see how dynamic languages can start being difficult to deal with at scale if you don't apply a lot of discipline external to the dynamic language itself, because dynamically-typed languages have very "large" functions when you view them this way; with no way to constrain the inputs, it can be difficult to "range across all possible inputs" conceptually. This has perfectly practical real effects for things like tracing optimizing interpreters, which must in principle always be verifying incoming types to functions in a way a static language doesn't need to, no matter how optimized that check may be in practice.

You can put your toe into this world simply by learning how to use your type system to its maximum power. Type systems are in general not all that powerful compared to what a full prover can prove. (Not even Haskell's.) But it's a start, and it's relatively easy, and has good bang for the buck.

Incidentally, once you start looking at your code this way, it becomes obvious why functional programming is much easier to prove things about; in the purest case, you can't get surprise global variable updates and you don't get random unions of what can be returned by a function with things like "exceptions". You've got many fewer cases to worry about because the functions are what their type signature says they are, instead of their type signature unioned with a large set of exceptions unioned with arbitrary IO unioned with arbitrary global variable updates unioned with arbitrary code updates (in dynamic languages). However, this pretty image is often not fully realized; even Haskell can throw exceptions anywhere, even if it is discouraged. For math reasons, Haskell programmers say anything can be "bottom", which is this additional "crash" value that Haskell doesn't eliminate.


I find this hilarious in light of what we know about the NSA. All solid recommendations, but they are bandaids when compared to the sheer weight of capital that is being misused by hoarding known vulns. Unfunded mandates are fine for grandstanding.


I strongly oppose vulnerability hoarding, but I'd point out that if we don't think it's easy to make political or cultural progress against it quickly, a technical alternative to make people safer is actually finding ways to make safe software. Plus, if "bugs are dense" advocates are right, we need major qualitative improvements to software development in order to make much of an impact (because they argue that different hoarders, or even different from-scratch researchers, would primarily know about different bugs).

So it's not unreasonable for NIST (which actually wants people to enjoy computer security) to investigate what could be done in this area!

Edit: but it's true that it would be awesome to see government do something to make these improvements actually happen -- and a research report probably isn't that.


Incidentally, BSIMM 7 (https://www.bsimm.com/download/) also came out this month. Whereas BSIMM is descriptive and OpenSAMM (http://www.opensamm.org/) is prescriptive, I'm curious to see how NIST tackles the secure development lifecycle.


Goal:

The goal of this report is to present a list of specific approaches that have the potential to make a dramatic difference reducing vulnerabilities – by stopping them before they occur, by finding them before they are exploited or by reducing their impact.

Impetus:

In response to the February 2016 Federal Cybersecurity Research and Development Strategic Plan, NIST was asked to identify ways to dramatically reduce software vulnerabilities. NIST worked with the software assurance community to identify five promising approaches. This report presents some background for each of the approaches, along with a summary statement of the maturity of the approach and the rationale for why it might make a dramatic difference. Further reading is provided for each approach. Hopefully other approaches will be identified in the future.

Current Cybersecurity Approaches:

1. Formal Methods - how do you check your program while writing code? (static analysis, model checkers, pragmas, model-based programming, and more recently contracts)

2. System Level Security - how do you partition components in a system? (containers/hypervisors/virtualization with smart API calls)

3. Additive Software Analysis - how do you analyze completed software to check for problems? (expression and exchange standards, better IDE plugins, advanced static analysis tools)

4. Mature Development Standards - how can you modify and reuse tried and tested code? (find, understand, learn, and combine the most relevant frameworks)

5. Moving Target Defense & Artificial Diversity - how can you minimize your program's attack surface? (compile-time diversification, cryptography, OS interface scrambling)

6. On Metrics - how do we even define and measure a bug? ("There are far too many proposed measures to evaluate or even list here. We can state that, as alluded to above, metrics and measures should be firmly based on well-established science and have a rational foundation in metrology to have the greatest utility.")

Government Calls to Action:

1. Engage the research community (funding, contests and prizes, infrastructure)

2. Education & training for the dev community (school curriculum, training programs)

3. Policy, Legal, etc. (procurement standards, liability for failures, insurance, standards, code repositories)

---

In my opinion, this is a good piece to skim through as you'll probably come across 1 or 2 practices/fields you hadn't heard of before. The recommendations are (intentionally) vague.

Functional security is already taken very seriously in many life and safety-critical industries. One potentially-troubling trend we are seeing in the automotive/aerospace & defense industry is the consolidation of traditionally disparate systems into heterogeneous, multicore processors in order to save on hardware costs. These systems are relatively complex, and have the potential to mix safety-critical environments (brakes, steering, acceleration ECUs running an RTOS) with application environments (infotainment system running Linux). Partitioning is big here. There are well-defined and accepted standards for the majority of industrial automation, transportation, medical, automotive, and aerospace & defense targeted systems.

Data security and privacy is not taken seriously by the majority of consumer corporations (and even some government agencies), and won't be until the legal repercussions of failing to protect client information are a serious threat to the companies that build and maintain these databases and systems.


To back up your point, this was a great blog post on the fatal consequences of failure in mission-critical car software: http://embeddedgurus.com/state-space/2014/02/are-we-shooting...

After reading it, one may think: Why the fuck is there any dynamic memory allocation (stack) going on there? Why isn't the car software implemented as one big pre-allocated hierarchical state machine? You wouldn't expect such execution dynamism in hard circuits or even FPGAs. But in any case I think non-deterministic systems (especially complex CPUs + programs) should be outlawed for such mission-critical signal paths.


You got me as to why this is considered remotely acceptable. This being said, you sort of need a stack for a context switch (or possibly an ISR) in many RTOS offerings.

This also being said, I've had engineers react rather violently to the very concept of FSMs. I can't say why that is.


The F# fan in me likes that ML and F# are explicitly mentioned in section 3.4.2, "Better Code".



