With this ends the seventh week of the official coding period. During the end of 6th week and the beginning of 7th week, I was mostly travelling, so I was not able to write a blog for the sixth week. Instead, I will try to summarize my work during the last two weeks here.
For the last few weeks, I have been focused on optimizing the code of new assumptions to enhance its performance. Most of my work has been exploratory, as Aaron says 😅. Indeed I have dry-run, backtracked, profiled, and ran the same code with a debugger too many times to understand the slow parts and the improvements I can make here and there. Mostly the code is optimized given the class structure of SymPy. But it is also the class structure that is adding up to the performance issues. Already noted in my last blog, classes like And
and Or
sorts their args, hence take a great amount of time. But other SymPy class constructors also take significant time.
With the success of fifth week’s attempt, I have been desperate to bring down the execution time 😅. Some of the attempts I have made, which are included in #17144, are:
- I have modified CNF class, which essentially is a low-level implementation for the cnf of any boolean expression. CNF object holds a set of clauses. These clauses are themselves
frozenset
of Literal objects. Literal class is being implemented just to reduce the unnecessary creation ofNot
objects (It takes significant execution time and is called many times). - I have also modified the code of sympify(), it appeared to take more time than expected when the argument is a SymPy object already. Consider this, almost one-third execution time.
1
2
3
4
5
6
7
In [2]: x = Symbol('x')
In [3]: timeit sympify(x) # before change
601 ns ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [3]: timeit sympify(x) # after change
239 ns ± 11.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
- Finally, I rewrote
to_cnf()
for CNF objects. By using mostly Python’s built-ins and removing any SymPy object construction during its execution. The performance gain is quite subtle 😎.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
In [1]: from sympy import *
In [2]: from sympy.abc import a,b,c,x,y,z
In [3]: k = (x & y & z) | ( a & b & c)
In [4]: from sympy.logic.boolalg import CNF
# Before
In [5]: timeit CNF.from_prop(k) # It is using to_cnf()
1.41 ms ± 18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# after
In [5]: timeit CNF.from_prop(k) # It is using the new to_CNF()
31.5 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
There is definitely a limit to performance we can get with Python. But implementing most of the things in Python built-ins we can definitely make things much faster.
For the upcoming week, I will try to complete the following:
- The CNF objects are still not simplified. I have to implement simplification to reduce the number of clauses. These have to be fed into
satisfiable
at the end, which can benefit from lesser number of clauses. rcall
used with sathandlers is another major portion having high execution time. I will try to work it out.- Clean up the code and make it ready for reviewing.
Apart from that, I will also try to shift my focus towards the other part of my project and write some theory solvers. After all, I need to enhance the assumptions not just make it faster 😎.
Peace