Stream: Beginner Questions

Topic: Parser for Isabelle


view this post on Zulip Minh D (Feb 21 2024 at 05:20):

I am wondering if there is a way to parse the syntax of Isabelle/Isar into an AST that I can implement in Isabelle/jEdit, I know that there is a parser in Isabelle/ML in the source of Isabelle, but how can I see the actual parsed tree?

view this post on Zulip Fabian Huch (Feb 21 2024 at 08:17):

There is no parse tree - parsing is done via parsing combinators without ever creating one. As the syntax is not static but user-extensible, parsing is dependent on the theory context.

Not sure what you're trying to implement but you can view the markup tree already in jEdit in the Sidekick panel (select isabelle-markup).

view this post on Zulip Manuel Eberl (Feb 21 2024 at 10:18):

Just for the sake of completeness: There is an AST for the inner syntax. But Isar is part of the outer syntax, which indeed does not have an AST.

view this post on Zulip Minh D (Feb 21 2024 at 13:49):

Thank you both for the answers.

As the syntax is not static but user-extensible, parsing is dependent on the theory context.

As I am new to this, what do you mean by user-extensible? Isn't the syntax for Isabelle/Isar is already mentioned in the Isabelle/Isar reference manual?

There is an AST for the inner syntax.

How can I get to produce the AST for the inner syntax?

view this post on Zulip Manuel Eberl (Feb 21 2024 at 15:46):

At almost any point in an Isabelle theory, you can open an ML block and do some stuff there. Such as: declaring new commands (like proof, lemma, by, qed, fun), new methods (like simp, auto) etc., and when you do that you can specify a parser (in the form of a parser combinator) to parse the arguments of that command or method.

view this post on Zulip Manuel Eberl (Feb 21 2024 at 15:46):

The "built-in" Isar commands like "have", "also", etc. are not special in that regard. You could define your own Isar-style commands in your theories, and indeed some people do.

view this post on Zulip Manuel Eberl (Feb 21 2024 at 15:47):

The reference manual just lists the canonical pre-defined Isar commands.

view this post on Zulip Manuel Eberl (Feb 21 2024 at 15:48):

The only thing that is really fixed (as far as I am aware) is the tokenisation.

view this post on Zulip Manuel Eberl (Feb 21 2024 at 15:54):

I actually have no idea how to look at the inner syntax AST. I just know it exists because there are parse/print translations (kind of "hooks" that can manipulate syntax after parsing) that operate on it. Chapter 8 of the reference manual has some information about this. But this is quite obscure internal stuff that only very few Isabelle users actually care about.

view this post on Zulip Yutaka Nagashima (Feb 21 2024 at 23:16):

Dear @Minh D

Do you already know what you intend to do with the inner syntax ASTs?

I have some experience in analyzing them to build AI tools for Isabelle. Perhaps I am familiar with the APIs you are searching for.

view this post on Zulip Minh D (Feb 22 2024 at 02:42):

Thank you @Manuel Eberl for the detailed explanation, I finally see what it means to say "user-extensible" syntax for Isabelle.
@Yutaka Nagashima Thank you, it would be great if I can understand how to work around with the inner syntax AST, can I DM you?

view this post on Zulip Yutaka Nagashima (Feb 22 2024 at 20:07):

@Minh D
Sure. I have a full-time job, so please be aware that my replies may be slow.

view this post on Zulip Chengsong Tan (Mar 18 2024 at 08:33):

Yutaka Nagashima said:

Dear Minh D

Do you already know what you intend to do with the inner syntax ASTs?

I have some experience in analyzing them to build AI tools for Isabelle. Perhaps I am familiar with the APIs you are searching for.

Dear @Yutaka Nagashima

Could I get some help from you as well, please?
I have similar questions of needing a parser for Isabelle to do some source-to-source transformation.
For instance, some search-and-replace cannot be intuitively implemented using regex only, such as replacing occurrences of variables that are not inside a lambda expression.
Could I DM you about the details?
Thanks in advance!

Best wishes,
Chengsong

view this post on Zulip Manuel Eberl (Mar 18 2024 at 09:15):

There is some prior work on that. Some students at TUM were working on building refactoring tools for Isabelle/jEdit. As I recall they used the markup generated by PIDE, i.e. the same stuff Isabelle/jEdit uses for highlighting and annotations as well. But I honestly don't know anything about this stuff. Maybe @Fabian Huch knows?

view this post on Zulip Fabian Huch (Mar 18 2024 at 15:19):

I don't know of any students doing that.

view this post on Zulip Manuel Eberl (Mar 19 2024 at 16:54):

Hm, someone ought to know. Sounds like the kind of thing you'd have supervised. Maybe ask Tobias.

view this post on Zulip Yutaka Nagashima (Mar 19 2024 at 23:12):

Chengsong Tan said:

Yutaka Nagashima said:

Dear Minh D

Do you already know what you intend to do with the inner syntax ASTs?

I have some experience in analyzing them to build AI tools for Isabelle. Perhaps I am familiar with the APIs you are searching for.

Dear Yutaka Nagashima

Could I get some help from you as well, please?
I have similar questions of needing a parser for Isabelle to do some source-to-source transformation.
For instance, some search-and-replace cannot be intuitively implemented using regex only, such as replacing occurrences of variables that are not inside a lambda expression.
Could I DM you about the details?
Thanks in advance!

Best wishes,
Chengsong

Sure, you can send me direct messages. It might take a bit longer to respond since I have a full-time job.

However, if you are working on a non-confidential project, I think it's better to have our conversation in public. That way, others can also read the messages. They might learn something new or may know something that I don't.

view this post on Zulip Chengsong Tan (Mar 26 2024 at 22:47):

(deleted)

view this post on Zulip Chengsong Tan (Mar 26 2024 at 22:51):

Yutaka Nagashima said:

Chengsong Tan said:

Yutaka Nagashima said:

Dear Minh D

Do you already know what you intend to do with the inner syntax ASTs?

I have some experience in analyzing them to build AI tools for Isabelle. Perhaps I am familiar with the APIs you are searching for.

Dear Yutaka Nagashima

Could I get some help from you as well, please?
I have similar questions of needing a parser for Isabelle to do some source-to-source transformation.
For instance, some search-and-replace cannot be intuitively implemented using regex only, such as replacing occurrences of variables that are not inside a lambda expression.
Could I DM you about the details?
Thanks in advance!

Best wishes,
Chengsong

Sure, you can send me direct messages. It might take a bit longer to respond since I have a full-time job.

However, if you are working on a non-confidential project, I think it's better to have our conversation in public. That way, others can also read the messages. They might learn something new or may know something that I don't.

Yes sure, I am happy to discuss in public. Do you know of any existing Isabelle parsers? I essentially want to do some source-to-source transformations in Isabelle, and to do that I need to be able to parse the code such that I can refer to the constructs like a lemma, a conjunct in a logic formula, a subgoal, a named fact inside a lemma and etc. It would be great if there's already something that can be used out of the box. Or if you have any experience in hand-rolling such a parser and know the caveats it would be very helpful.

Thanks a lot,
Chengsong

view this post on Zulip Yutaka Nagashima (Apr 07 2024 at 16:43):

@Chengsong Tan

Hi Chengsong,

Sorry for this late reply.

I do not have a monolithic framework for this.

However, in my projects, I worked on source-to-source transformation, partly using a domain-specific language (DSL).

The ML code I used for this resides here: https://github.com/data61/PSL/tree/master/SeLFiE.
The DSL is described in this paper: https://arxiv.org/abs/2010.10296.

need to be able to parse the code such that I can refer to the constructs like a lemma, a conjunct in a logic formula, a subgoal, a named fact inside a lemma and etc.

I am familiar with referring to constructs like lemmas, conjuncts in a logic formula, subgoals, and named facts inside a lemma, but in the context of Isabelle/ML.

Would you prefer to work on Isabelle/ML?

Regards,
Yutaka

view this post on Zulip Chengsong Tan (Apr 25 2024 at 22:40):

Yutaka Nagashima said:

Chengsong Tan

Hi Chengsong,

Sorry for this late reply.

I do not have a monolithic framework for this.

However, in my projects, I worked on source-to-source transformation, partly using a domain-specific language (DSL).

The ML code I used for this resides here: https://github.com/data61/PSL/tree/master/SeLFiE.
The DSL is described in this paper: https://arxiv.org/abs/2010.10296.

need to be able to parse the code such that I can refer to the constructs like a lemma, a conjunct in a logic formula, a subgoal, a named fact inside a lemma and etc.

I am familiar with referring to constructs like lemmas, conjuncts in a logic formula, subgoals, and named facts inside a lemma, but in the context of Isabelle/ML.

Would you prefer to work on Isabelle/ML?

Regards,
Yutaka

Hi Yukata,

Thanks a lot for replying!

Yes I can work with Isabelle/ML, any tool that gets the job done would be great (I haven't used it though).

So if I have an Isabelle theory file, what's the quickest way to set up a parser and play with it by manipulating the .thy file?

I am now starting to read your paper, please feel free to tell me the best way to get started on this.

Best wishes,
Chengsong

view this post on Zulip Yutaka Nagashima (May 02 2024 at 22:11):

Hi @Chengsong Tan ,

(Sorry for the late reply.)

This might sound strange, but I believe you don't need a new parser. You can use Isabelle's default parser and manipulate the abstract (inner-)syntax tree.

You previously mentioned wanting to perform source-to-source transformations, specifically "replacing occurrences of variables that are not inside a lambda expression."

For most cases, manipulating the inner-syntax trees should suffice.

A good first step might be to familiarize yourself with the abstract syntax tree by using the ML functions in this file: src/Pure/term.ML.

I personally found that I had to transform this datatype into a format more amenable for analysis and developed this file for that purpose: https://github.com/data61/PSL/blob/master/SeLFiE/SeLFiE.thy.

view this post on Zulip Chengsong Tan (May 06 2024 at 23:09):

Yutaka Nagashima said:

Hi Chengsong Tan ,

(Sorry for the late reply.)

This might sound strange, but I believe you don't need a new parser. You can use Isabelle's default parser and manipulate the abstract (inner-)syntax tree.

You previously mentioned wanting to perform source-to-source transformations, specifically "replacing occurrences of variables that are not inside a lambda expression."

For most cases, manipulating the inner-syntax trees should suffice.

A good first step might be to familiarize yourself with the abstract syntax tree by using the ML functions in this file: src/Pure/term.ML.

I personally found that I had to transform this datatype into a format more amenable for analysis and developed this file for that purpose: https://github.com/data61/PSL/blob/master/SeLFiE/SeLFiE.thy.

Hi Yukata,

Thanks a lot for your reply!
I have been looking at the file term.ML as well as parse.ML, but hvae not yet found a function which turns a string into a theory AST, as to my understanding, there should be some sort of parser in Isaebelle that parses an entire .thy file, and returns a list of statements/code blocks (or AST trees), but where can I find them?

Thanks a lot,
Chengsong

view this post on Zulip Yutaka Nagashima (May 07 2024 at 01:56):

@Chengsong Tan

as to my understanding, there should be some sort of parser in Isaebelle that parses an entire .thy file

:smiling_face_with_tear:

Maybe the previous comments from @Fabian Huch and @Manuel Eberl could be helpful.

If I understand correctly, the entire Isabelle framework does not function as you, I, and other ML enthusiasts would wish.

I presume that the simple parser-AST mechanism has been traded off for a flexible user interface and ubiquitous parallelism. :thinking:

On a fundamental level, I have the impression that the original Isabelle framework was built with a focus on proof (script) checking rather than on proof construction in the 1980s.

And this design choice seems to permeate from low-level inference to the Isar language.

On the bright side, creating a proof obligation of type 'thm' or 'Proof.state' from a string describing only the inner syntax is relatively straightforward. :smiling_face:

view this post on Zulip Fabian Huch (May 10 2024 at 16:25):

Chengsong Tan said:

I have been looking at the file term.ML as well as parse.ML, but hvae not yet found a function which turns a string into a theory AST, as to my understanding, there should be some sort of parser in Isaebelle that parses an entire .thy file, and returns a list of statements/code blocks (or AST trees), but where can I find them?

There is a parser that turns it into a list of statements (we call them commands). An AST does not exist conceptually, as outer syntax parsers are shallowly embedded. You can find the parser code in src/Pure/Isar/token.ML and src/Pure/Isar/token.scala.

view this post on Zulip Fabian Huch (May 10 2024 at 16:33):

Yutaka Nagashima said:

If I understand correctly, the entire Isabelle framework does not function as you, I, and other ML enthusiasts would wish.

I presume that the simple parser-AST mechanism has been traded off for a flexible user interface and ubiquitous parallelism. :thinking:

Mostly for a flexible user interface. With a shallow embedding for parsers, one still could gain that, but then you still have the problem that there is no closed set of syntax.

Yutaka Nagashima said:

On a fundamental level, I have the impression that the original Isabelle framework was built with a focus on proof (script) checking rather than on proof construction in the 1980s.

Isar was developed much later (2004 ish). The focus was human-readable proofs. For writing proofs, one usually does not need source text transformations (especially at the time).

view this post on Zulip Jamie (May 11 2024 at 15:20):

Hello everyone,
I am trying to implement a pluraity voting system and i have problem when adding the ballots...
I have the isabelle code below with which I want to take a sorted Preference-List (most favoured at top, least at bottom) and add a 1 which was encrypted with aencrypt from "Game_Based_Crypto.Elgamal" to another list. The other list has Pairs which consist of one element like the ones in the Preference-List and a second one which holds the encrypted numeric value of how many times someone voted for it:

theory verifyPlurality imports
"Game_Based_Crypto.Elgamal"
"...lot of folders.../Preference_List"
begin
context elgamal_base
begin

(*The Preference_List is already encrypted,
 this function takes the encrypted Preference_List finds its first element in
 the Pair-List representing the Ballot where the votes are accumulated and adds 1
 (encrypted 1 because of the homomorphic adding)*)
fun add_plurality_ballot :: "'grp pub_key ⇒'a Preference_List ⇒ ('a × 'b) list ⇒ ('a × 'b) list"
    where
    "add_plurality_ballot _ [] (s) = s"|
    "add_plurality_ballot pk (x # xs) (s) =
    (case find (λ(y,c). (x = y)) s of
      None ⇒ add_plurality_ballot pk xs (s) |
      Some (y, c) ⇒ (y,c + (aencrypt pk 1)) # remove1 (λ(a,_).a = y) s)"

end
end

A Preference_List is just this with some functions: type_synonym 'a Preference_List = " 'a list"
I get this error:

Type unification failed: Variable 'grp::type not of sort one

Type error in application: incompatible operand type

Operator:  aencrypt pk :: 'grp  ('grp × 'grp) spmf
Operand:   1 :: ??'a

Coercion Inference:

Local coercion insertion on the operand failed:
Variable 'grp::type not of sort one

Now trying to infer coercions globally.

Coercion inference failed:
weak unification of subtype constraints fails
Clash of types "_ × _" and "_ ⇒ _"

I mean, I get these are not the same types but is there a way to convert one or is there an altogether different/better solution?

Thank you for the help :)

view this post on Zulip Yutaka Nagashima (May 12 2024 at 01:34):

Fabian Huch said:

Isar was developed much later (2004 ish). The focus was human-readable proofs. For writing proofs, one usually does not need source text transformations (especially at the time).

Yes, Isar was developed in the 2000s, later than the original Isabelle framework.

When writing the comment, I assumed that recently many people have been trying to obtain abstract-syntax trees of 'proofs' from Isabelle for machine learning.

In that sense, the problem stems from the original design of Isabelle, and Isar adds another level of complexity to that. Of course, this does not imply that the designs of Isabelle or Isar were bad; they were good for allowing users to check their statements by interacting with Isabelle.

However, Isabelle is not designed to produce what machine learning enthusiasts are seeking these days.

(I do not know about @Chengsong Tan 's motives, though. :sweat_smile: )

view this post on Zulip Yutaka Nagashima (May 12 2024 at 01:39):

Jamie said:

Hello everyone,
I am trying to implement a pluraity voting system and i have problem when adding the ballots...
I have the isabelle code below with which I want to take a sorted Preference-List (most favoured at top, least at bottom) and add a 1 which was encrypted with aencrypt from "Game_Based_Crypto.Elgamal" to another list. The other list has Pairs which consist of one element like the ones in the Preference-List and a second one which holds the encrypted numeric value of how many times someone voted for it:

theory verifyPlurality imports
"Game_Based_Crypto.Elgamal"
"...lot of folders.../Preference_List"
begin
context elgamal_base
begin

(*The Preference_List is already encrypted,
 this function takes the encrypted Preference_List finds its first element in
 the Pair-List representing the Ballot where the votes are accumulated and adds 1
 (encrypted 1 because of the homomorphic adding)*)
fun add_plurality_ballot :: "'grp pub_key ⇒'a Preference_List ⇒ ('a × 'b) list ⇒ ('a × 'b) list"
    where
    "add_plurality_ballot _ [] (s) = s"|
    "add_plurality_ballot pk (x # xs) (s) =
    (case find (λ(y,c). (x = y)) s of
      None ⇒ add_plurality_ballot pk xs (s) |
      Some (y, c) ⇒ (y,c + (aencrypt pk 1)) # remove1 (λ(a,_).a = y) s)"

end
end

A Preference_List is just this with some functions: type_synonym 'a Preference_List = " 'a list"
I get this error:

Type unification failed: Variable 'grp::type not of sort one

Type error in application: incompatible operand type

Operator:  aencrypt pk :: 'grp  ('grp × 'grp) spmf
Operand:   1 :: ??'a

Coercion Inference:

Local coercion insertion on the operand failed:
Variable 'grp::type not of sort one

Now trying to infer coercions globally.

Coercion inference failed:
weak unification of subtype constraints fails
Clash of types "_ × _" and "_ ⇒ _"

I mean, I get these are not the same types but is there a way to convert one or is there an altogether different/better solution?

Thank you for the help :)

Hi @Jamie ,

I guess this question is not so much about the ‘Parser for Isabelle.’
So, I suggest you open a separate question in ‘Beginner Questions’ for maximum visibility.

:smile:


Last updated: May 19 2024 at 01:13 UTC