0. How to read this file ======================== There are three directories. - Bench : benchmark files with the axiomatisation from the TPTP archive - Bench2 : benchmark files with an axiomatisation in TPTP format by Hoefner et al - Prover9_tests : benchmark files for Prover9, with an hand-crafted axiomatisation (Should be similar to Benc2) - KLE.v : a single file containing the relevant tests in coq. The main difference between these axiomatisation seems to be the status of the inequalities. You should create a directory to hold the installations of the various provers. We used: - Prover9, version 2009-11A - SPASS, version 3.5 - Vampire10, from CASC-22 http://www.cs.miami.edu/~tptp/CASC/22/Systems.tgz 1.-a Tests of various ATP ======================= For each directory, we provide a makefile that will launch all computations. To test the various systems, we use a single entry in TPTP format, and use the makefile to launch all computations, one at a time. To configure the systems used, please modify the header of the makefile to point to the locations of Spass, Prover9, Vampire on your system. $ make all then launch the computations, and collect the results in a summary. Some tests are from the TPTP libary, some are our typical use cases. The + indicates a failure to prove the theorem in a given amount of time (400 or 600 seconds) 1.-b Tests in Coq ================= Open the Coq file, with a suitable instance of either Coqide or Proofgeneral, and run interactively the examples. Set the path to the correct location of the Benchmark module. 2. Results ========== +------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ |Prover9 | Axiomatisation from Bench | Directory Bench2 | | ad-hoc | | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | | Coq |Prover9 | Prover9 | Vampire10 | Spass | Vam10 3 | P9 3 | Spass 3 | | | | ad-hoc | | | | | | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE001 | algebra | | 0.02 | 0 | 0.05 | 0 | 0.02 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE002 | algebra | | 0.02 | 0 | 0.24 | 0 | 0.02 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE036 | 0.01 | | 171.63 | 0 | + | 0 | 0.02 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE037 | 0 | 0.02 | 0.06 | 0 | 0.07 | 0 | 0.02 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE038 | 0.004 | 0.16 | 0.11 | 9.8 | 11.71 | 0.1 | 0.04 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE040 | 0 | 0.13 | 20.03/0.08 | 0.4 / 0.4 | 5.25 / 1.56 | 0 | 0.05 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE041 | star_incr | | 12.64 | 9 | 103.94 | 8.1 | 0.30 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE043 | 0.004 | 0.02 | + / 14.26 | + | + | 0 | 0.43 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE044 | 0 | 27.92 | 20.52 / 10.66 | 9.9 / 9.6 | + | 2.2 | 6.46 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE045 |semi_comm_iter| | 25.57 | + | + | | 12.99 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ |KLE046 / Church-Rosser| | 2.38 | + | + | + | | + | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | KLE047 / BubbleSort | | 373.48 | + | + | + | | + | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | Hindley-Rosen | | + | + | + | + | | + | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | Staridem | 0 | 0.16 | 20.05 | 9.8 | 02.70 | 1.7 | 0.04 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | T1 / Stardistr | 0.044 | + / 1.84 | + / + | + | + | + | + | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ |T2 / Sliding / KLE042 | 0.044 | + / 24.90 | + | + | + | + | + | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | T3 | 0.008 | 29.88 | 20.15 / 10.38 | 9.9 / 9.6 | + | 2 | 4.65 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | T4 | 0.008 | 0.14 | 4.74 | 1.2 | 06.37 | 0.1 | 0.10 | 5.34 | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | T5 | 0.008 | 0.58 | 43.28 | 89.3 | + | 74.6 | 4.44 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | T6 | 0.012 | 341.96 | + | + | + | | | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | T7 | 0.008 | 64.63 | + | 91.2 | + | 49.6 | 5.50 | | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ | T8 | 0.008 | 0.38 | 0.12 | 0.6 | 12.11 | 0.1 | 0.04 | 1.56 | +----------------------+--------------+------------+---------------+-------------+-----------------+-----------+--------------+-----------------+ KLE001: x<=y -> z+x <= z+x (the typo comes from TPTP) KLE002: x<=y -> z*x <= z*y KLE036: a# <= 1+a*a# KLE037: 1 <= a# KLE038: a <= a# KLE039: a ## = a# KLE040: a # * a# = a# KLE041: x <= y -> x# <= y# : star_incr KLE043: x# * y = y + x*x#*y KLE044: (1+x)# = x# KLE045: x *z <= z * y -> x# * z <= z * y # KLE046: ChurchRosser KLE047: Bubble Sort T1: (x+y)# = x# * (y* x#)# T2: x * (y*x)# = (x * y)# * x T3: x# = (x+1)# T4: (1+x*x#*y#+x#*y#*y#) <= x# * y# T5: y#*x# <= (x+y)# T6: y*y#*x# <= (x+y)*(x+y)# T7: (x+y)# = (x# + y#)# T8: x+x#*(y+1) <= x#*y+x# 4. Paper. ========= In the paper, we have shown the results of the best ATPs only. Therefore, we omitted the data for SPASS, and chose the best column for Vampire and Prover9.