"Running leading LLMs across the eleven systems shows that LLMs are great at producing correct TLA+ syntax but struggle to ensure conformance and appropriate invariants."
zihotki 5 hours ago [-]
I think that's what author implies by this sentence in the intro:
> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.
nyrikki 2 hours ago [-]
As TLA+ uses state machines that can define infinite state spaces, checking arbitrary temporal logic formulas is undecidable in the general case.
Using a TLC model checker to verify invariants or properties, finding a valid counterexample can scale non-elementarily.
In fact, in some situations it can end up being Ackermann-complete[0], but even recursively enumerable problems can be Tower-complete.
I would say that unless you focus on more detailed temporal logic pitfalls you may have issues.
So IMHO this will stay a use case specific solution, that you choose based on context.
Even a common solution like adding Circumscription causes counterintuitive changes [1][2].
IMHO, if you want to use TLA+ as a primary method, you will need some depth or be ready to abandon it by time boxing etc…
Remember that we know the open domain frame problem in [2] is equal to HALT, it will not universally apply.
It is just another tool that works well when it works well.
I found Quint to be a good pairing with LLM. Easier syntax to learn so you are actually collaborating instead of relying on hope.
oa335 53 minutes ago [-]
What are the downsides as compared to TLA+?
esafak 2 hours ago [-]
What are you doing with it? Any write up?
arpinum 2 hours ago [-]
data pipelines and data repair.
aerodexis 4 hours ago [-]
I did this myself a few weeks ago and the technique that helped me the most was to compare the TLA output against the race-conditions I could construct by hand. I worked iteratively and unit-tested the model by constructing a model that didn't have certain race protection mechanisms and validating that the model generated the expected race.
You can also work backwards from the races it generates and ensure that they're real races.
esafak 3 hours ago [-]
What did you do to get the TLA output?
aerodexis 3 hours ago [-]
I was working from a design-doc, not code.
"look at @design-document and generate a TLA+ specification for the interactions between local and remote"
sigbottle 2 hours ago [-]
Sorry, I thought this would've been a method for finding invariants, not one just for expressing them? I guess I should think about TLA+ as ultimately some kind of solver - give it a configuration, it tells me if it's defined well or not, the point is to make sure I'm not making mistakes, but not necessarily automated innovation?
vbernat 1 hours ago [-]
Isn't this line incorrect?
BB == b > 1 /\ b' = b - 2 /\ w' = w + 1 \* Picked 2 black
It should b > 2, otherwise you'll get in an invalid state.
Jtsummers 16 minutes ago [-]
The state would be valid. If b > 1 then there are at least two (b > 1 is equivalent to b >= 2 here since b must be a non-negative integer). Remove two (b' = b - 2) and add one white (w' = w + 1). If b == 2, then we still end in a valid state, it'll just be (b = 0 & w' = w + 1).
leoqa 4 hours ago [-]
Unfortunately the benefit of TLA+ is the act of modeling your system painstakingly. The actual checker helps confirm your hypothesis, etc. But skipping the modeling and outsourcing it is not ideal. I’ve always struggled reasoning about models my team mates wrote, and will often have to mentally go through the process of arriving at the same abstractions/invariants etc before I can understand it.
UltraSane 3 hours ago [-]
Exactly. I view a complete TLA+ specification as a kind of metalanguage that can be used with LLMs to generate code from.
UltraSane 3 hours ago [-]
TLA+ should be generated by hand as a extremely detailed spec for LLMs to use to generate code
relativeadv 3 hours ago [-]
right? I read the kleppman post sometime ago about formal verification taking off but i could never square away who verifies the verifier.
nextos 2 hours ago [-]
That might emerge as one of the main tasks of future software engineers, writing formal specifications by hand.
It could be the case that Tony Hoare was right, just too early.
"Running leading LLMs across the eleven systems shows that LLMs are great at producing correct TLA+ syntax but struggle to ensure conformance and appropriate invariants."
> It’s still your responsibility to understand your system and define what “correctness” means, and you need a high-level understanding of temporal logic.
Using a TLC model checker to verify invariants or properties, finding a valid counterexample can scale non-elementarily.
In fact, in some situations it can end up being Ackermann-complete[0], but even recursively enumerable problems can be Tower-complete.
I would say that unless you focus on more detailed temporal logic pitfalls you may have issues.
So IMHO this will stay a use case specific solution, that you choose based on context.
Even a common solution like adding Circumscription causes counterintuitive changes [1][2].
IMHO, if you want to use TLA+ as a primary method, you will need some depth or be ready to abandon it by time boxing etc…
Remember that we know the open domain frame problem in [2] is equal to HALT, it will not universally apply.
It is just another tool that works well when it works well.
[0] https://arxiv.org/abs/2104.13866
[1] https://arxiv.org/abs/2407.20822
[2] https://en.wikipedia.org/wiki/Circumscription_(logic)
You can also work backwards from the races it generates and ensure that they're real races.
"look at @design-document and generate a TLA+ specification for the interactions between local and remote"
It could be the case that Tony Hoare was right, just too early.