Function calls in Python are expensive

Method Call Performance Analysis: A Comparison of C++, C#, Java, JavaScript, PHP, and Python

On its own terms, Python should be reasonably efficient for an interpreted language. Yet one reads again and again that calling functions and methods in Python is quite expensive. But how expensive, exactly? Let's find out!

A comparison is only possible with equivalent programs in the different languages, and these programs must also be designed correctly and measure time precisely.

A number of such programs in different programming languages are listed below. Each program defines either a class or a function, whose method or function is then called repeatedly in a loop.

To measure time, largely equivalent timing routines were used in all programs: the time is taken immediately before the loop starts and again immediately after it has finished. Inside the loop, an empty function or method is called, so that exactly this call overhead can be estimated.
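A minimal Python sketch of this timing scheme follows. It is not the author's original program: the names `empty_function`, `Empty`, and `ns_per_call` are hypothetical, and `time.perf_counter_ns` (available since Python 3.7) is assumed as the high-resolution clock.

```python
import time


def empty_function():
    """Does nothing: only the cost of calling it is measured."""


class Empty:
    def empty_method(self):
        """Does nothing: only the cost of calling it is measured."""


def ns_per_call(calls: int, use_method: bool = False) -> float:
    """Run `calls` empty calls and return the average time per call in ns.

    The clock is read immediately before and immediately after the loop,
    so the result also includes the overhead of the loop itself.
    """
    obj = Empty()
    if use_method:
        start = time.perf_counter_ns()
        for _ in range(calls):
            obj.empty_method()
        end = time.perf_counter_ns()
    else:
        start = time.perf_counter_ns()
        for _ in range(calls):
            empty_function()
        end = time.perf_counter_ns()
    return (end - start) / calls


if __name__ == "__main__":
    n = 10_000_000  # increase until each loop runs for several seconds
    print(f"function call: {ns_per_call(n):.1f} ns")
    print(f"method call:   {ns_per_call(n, use_method=True):.1f} ns")
```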

In almost every language, time can be measured in microseconds or nanoseconds, even in PHP. Only for Java did I use the standard function that returns the time in milliseconds. This is not a problem, however, because every loop was sized to run for between 5 and 15 seconds on an average computer. Even with only millisecond resolution, the accuracy is therefore high enough for a reasonably useful assessment.
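The accuracy claim is easy to check with a little arithmetic. The sketch below (the helper `relative_timer_error` is hypothetical) computes the worst-case relative error of a millisecond clock over a 5-second loop, and also prints the resolution Python reports for `time.perf_counter` via `time.get_clock_info`; that value depends on the platform.

```python
import time


def relative_timer_error(resolution_s: float, runtime_s: float) -> float:
    """Worst-case relative error of an interval measured with a coarse clock.

    Both clock readings are quantized to whole ticks, so the measured
    interval is off by less than one tick (`resolution_s`).
    """
    return resolution_s / runtime_s


if __name__ == "__main__":
    # Millisecond clock and the shortest loop duration from the text (5 s).
    print(f"{relative_timer_error(0.001, 5.0):.4%}")  # 0.0200%
    # Resolution that Python itself reports for its high-resolution clock.
    print(time.get_clock_info("perf_counter").resolution)
```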

But how fast is fast? To get at least a rough idea, we need a reference point. C would be very suitable for this. Note, however, that a C compiler might optimize away an empty function call entirely. For that reason, largely unoptimized C++ was used as the reference here. This gives a rough impression of how well such a call performs in each of the other languages by comparison.
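Python itself is not affected by this problem: CPython's bytecode compiler does not remove an empty call. The following sketch, assuming CPython and using the standard `dis` module, counts call instructions in two otherwise identical loops; the function names are hypothetical. The loop containing `empty_function()` reports more call instructions than the empty loop.

```python
import dis


def empty_function():
    pass


def loop_with_call(n):
    for _ in range(n):
        empty_function()


def loop_without_call(n):
    for _ in range(n):
        pass


def call_instructions(func) -> int:
    """Count the call instructions CPython compiled into `func`'s bytecode.

    Opcode names vary by version (CALL_FUNCTION up to 3.10, CALL from 3.11),
    so every opcode whose name starts with "CALL" is counted.
    """
    return sum(ins.opname.startswith("CALL") for ins in dis.get_instructions(func))


if __name__ == "__main__":
    dis.dis(loop_with_call)  # the call to empty_function appears in the loop body
    print(call_instructions(loop_with_call), call_instructions(loop_without_call))
```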

Here are the programs in the various languages that I used to measure the performance:

[Listings of the six benchmark programs: C++, C#, Java, JavaScript, PHP, and Python]

But how should the data be recorded and evaluated? The easiest way is for each program to save its own performance results as JSON, which means writing JSON output in every language. A comparison reveals a surprising fact: even though the calls, the timing, and the output are relatively similar across languages, JSON generation differs completely. It could hardly be more different.
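For Python, writing such a result needs only the standard `json` module. The sketch below is one possible shape, not the author's format: the field names (`language`, `calls`, `elapsed_ns`, `ns_per_call`) and the helpers `result_record` and `write_result` are hypothetical.

```python
import json
import time


def result_record(language: str, calls: int, elapsed_ns: int) -> dict:
    """Build one benchmark result; all field names are hypothetical."""
    return {
        "language": language,
        "calls": calls,
        "elapsed_ns": elapsed_ns,
        "ns_per_call": elapsed_ns / calls,
    }


def write_result(path: str, record: dict) -> None:
    """Save one result as a JSON file for a separate evaluation tool."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)


def empty_function():
    pass


if __name__ == "__main__":
    calls = 1_000_000
    start = time.perf_counter_ns()
    for _ in range(calls):
        empty_function()
    record = result_record("Python", calls, time.perf_counter_ns() - start)
    # write_result("python.json", record) would save it to disk instead.
    print(json.dumps(record, indent=2))
```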

Once all the JSON data has been written, another tool can evaluate the information and combine it into diagrams. Simple bar charts are sufficient here.
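Such an evaluation tool could look roughly like the following Python sketch. It assumes that every result file contains the hypothetical fields `language` and `ns_per_call`, and it draws plain-text bars instead of using a plotting library, so it needs only the standard library. The sample values in the `__main__` block are the approximate rates quoted in this article, not new measurements.

```python
import json
from pathlib import Path


def load_results(directory: str) -> dict:
    """Read every *.json file in `directory` and map language -> ns per call."""
    results = {}
    for path in sorted(Path(directory).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        results[record["language"]] = record["ns_per_call"]
    return results


def text_bar_chart(values: dict, width: int = 40) -> str:
    """Render a horizontal bar chart, scaled so the largest value fills `width`."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * max(1, round(width * value / largest))  # at least one mark
        lines.append(f"{label:<{label_width}} {bar} {value:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Approximate rates quoted in this article, in millions of calls per second.
    print(text_bar_chart({"C++": 500, "Java": 1800, "JavaScript": 950, "Python": 15}))
```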

Comparative evaluation

The evaluation shows the following performance:

Figure 1: Time required per call, in nanoseconds

What can be seen in this evaluation?

  • Calls in C# and unoptimized C++ are similarly efficient.
  • Calls in Java take remarkably little time, well below the time in C++.
  • Calls in JavaScript are also remarkably fast: roughly twice as fast as unoptimized machine code.
  • As expected, calls in PHP are more expensive.
  • Method calls in Python (the bar on the far right) are extremely expensive.

In order to better illustrate this, I have chosen a different representation of the performance data in the following graphic:

Figure 2: Number of calls per second, in millions

This representation shows the number of calls measured per second, in millions. The measurements were taken on a standard quad-core system clocked at 3.3 GHz.
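The two representations are related by a simple conversion: one second has 10^9 nanoseconds, so a call costing t ns allows 10^9 / t calls per second, or 1000 / t million. A small sketch with hypothetical helper names, applied to the approximate rates quoted in this article:

```python
NS_PER_SECOND = 1e9


def million_calls_per_second(ns_per_call: float) -> float:
    """Convert the average cost of one call (ns) into millions of calls per second."""
    return NS_PER_SECOND / ns_per_call / 1e6


def ns_per_call(million_per_second: float) -> float:
    """Inverse conversion: millions of calls per second -> ns per call."""
    return NS_PER_SECOND / (million_per_second * 1e6)


if __name__ == "__main__":
    # Approximate rates quoted in this article.
    for name, rate in [("C++", 500), ("Java", 1800), ("JavaScript", 950), ("Python", 15)]:
        print(f"{name:<10} {ns_per_call(rate):6.2f} ns per call")
```

For example, 500 million calls per second correspond to 2 ns per call.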

The exact values, however, are irrelevant. They can vary from CPU to CPU, from implementation to implementation, and from system to system. Only the order of magnitude matters. As a reference value, so to speak, the chart shows that in C++ around 500 million method calls per second are possible on such a system.

But what do 500 million calls per second mean? 500 million calls per second correspond to a frequency of 500 MHz, i.e. 0.5 GHz. On a 3.3 GHz CPU, this means that with unoptimized machine code a call takes at most about 6 to 7 clock cycles. Probably a little less, because the time measurements here do not exclude time lost to interruptions caused by the multitasking operating system.

Java stands out with almost 1,800 million calls per second. That works out to just under 2 clock cycles per call, an extremely good value. It is achieved through excellent optimization of the machine code by the Java JIT compiler, perhaps even by inlining the empty method call. (OpenJDK was used for this measurement; other JDKs may achieve even better values.)
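The cycle estimates follow from dividing the clock rate by the call rate. The sketch below (hypothetical helper `cycles_per_call`) assumes that the loop runs on a single core at a constant 3.3 GHz, ignoring turbo and power-saving clock changes, and uses the approximate rates quoted in this article.

```python
CLOCK_HZ = 3.3e9  # clock rate of the test system stated in the text


def cycles_per_call(calls_per_second: float, clock_hz: float = CLOCK_HZ) -> float:
    """Clock cycles available per call when one core runs the loop.

    Assumes a constant clock rate. Wall-clock timing also counts time lost
    to OS interruptions, so the cycles spent on the call itself are at most this.
    """
    return clock_hz / calls_per_second


if __name__ == "__main__":
    # Approximate rates quoted in this article, in calls per second.
    for name, rate in [("C++ (unoptimized)", 500e6), ("Java", 1800e6),
                       ("JavaScript", 950e6), ("Python", 15e6)]:
        print(f"{name:<18} {cycles_per_call(rate):6.1f} cycles per call")
```

With these inputs, C++ comes out at 6.6 cycles, Java at about 1.8, and Python at about 220 cycles per call.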

With around 950 million calls per second, JavaScript beats unoptimized machine code and apparently also has an excellent JIT compiler.

PHP and Python are beaten in this measurement: both are interpreted and achieve significantly lower performance. Python in particular manages only about 15 million calls per second. Still, 15 million calls is not an inconsiderable number.

These performance differences are shown even better in the following diagram:

Figure 3: Number of calls compared to Python

Basically this is the same graph, but this time the number of calls is given relative to Python. Here you can see that Python is around 35 times slower at calls than unoptimized machine code. You can also see that JIT compilers do amazing things. If Python had a JIT compiler, the values would certainly be considerably better.
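The normalization behind this figure divides each rate by the Python rate. A sketch with a hypothetical helper `relative_to`, fed with the rounded rates quoted in this article:

```python
def relative_to(baseline: str, rates: dict) -> dict:
    """Express every call rate as a multiple of the baseline's rate."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


if __name__ == "__main__":
    # Approximate rates quoted in this article, in millions of calls per second.
    rates = {"C++": 500, "Java": 1800, "JavaScript": 950, "Python": 15}
    for name, factor in relative_to("Python", rates).items():
        print(f"{name:<10} {factor:6.1f}x")
```

With these rounded inputs, C++ comes out at about 33 times the Python rate; the exact factor depends on the unrounded measurements behind the chart.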

The poor performance of Python

But why does Python fall so far behind, even compared to PHP? One of the main reasons is Python's flexibility. In Python, the names of attributes and methods are managed on objects in very different ways: in dictionaries, but also in so-called slots. The call of a method itself can potentially pass through various other methods that are able to intervene in the processing. This opens up interesting possibilities when developing Python programs, but in all likelihood it is precisely these features that cause the poor performance.
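The mechanisms mentioned here can be made visible in a few lines. The sketch below measures nothing; it only illustrates that instances normally keep their attributes in a `__dict__`, that `__slots__` removes it, and that overriding `__getattribute__` puts Python code in front of every `obj.method` lookup. The class names are hypothetical.

```python
class WithDict:
    def method(self):
        return "called"


class WithSlots:
    __slots__ = ("value",)  # fixed attribute slots, no per-instance __dict__

    def method(self):
        return "called"


class Intercepting:
    """Every attribute lookup, including obj.method, runs through Python code."""

    def __init__(self):
        self.lookups = 0

    def __getattribute__(self, name):
        if name != "lookups":
            # Use object's implementation directly to avoid infinite recursion.
            count = object.__getattribute__(self, "lookups")
            object.__setattr__(self, "lookups", count + 1)
        return object.__getattribute__(self, name)

    def method(self):
        return "called"


if __name__ == "__main__":
    print(hasattr(WithDict(), "__dict__"))   # True
    print(hasattr(WithSlots(), "__dict__"))  # False
    obj = Intercepting()
    for _ in range(3):
        obj.method()
    print(obj.lookups)  # 3
```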

However, this hardly matters: if a Python program has to make 15 million calls per second, something is usually fundamentally wrong. Python is not meant for performing highly efficient calculations in Python code itself; it takes on controlling tasks. In machine learning and in numerical computing, for example, it serves precisely this purpose: the actual, computationally intensive work runs in optimized machine code, and only the (controlling) logic of the program is written in Python. 15 million calls per second are fast enough for this: sensibly written Python programs that use such machine-code components need only a fraction of these 15 million.
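How little Python-level work remains when the heavy lifting is delegated can be shown by counting calls. The following sketch assumes CPython, where `sum`, `map`, and `operator.mul` are implemented in C; it uses `sys.setprofile`, which reports Python functions as `"call"` events and C functions as `"c_call"` events. The helper names are hypothetical. Both functions compute the same sum of squares, but only the first executes one Python function call per element.

```python
import operator
import sys


def count_python_calls(func, *args) -> int:
    """Run func(*args) and count how many Python-level functions were entered.

    sys.setprofile reports "call" for Python functions and "c_call" for
    functions implemented in C; only "call" events are counted.
    """
    count = 0

    def profiler(frame, event, arg):
        nonlocal count
        if event == "call":
            count += 1

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return count


def square(x):
    return x * x


def python_per_element(data):
    """Sum of squares with one Python function call per element."""
    return sum(map(square, data))


def delegated_to_c(data):
    """Same sum of squares; the per-element work stays inside C code."""
    return sum(map(operator.mul, data, data))


if __name__ == "__main__":
    data = list(range(100_000))
    print("Python-level calls, per element:", count_python_calls(python_per_element, data))
    print("Python-level calls, delegated:  ", count_python_calls(delegated_to_c, data))
```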

One more thing must not be forgotten: this article does not compare the performance of the programming languages. It only compares how expensive method calls (or function calls) are. A meaningful overall assessment would require a completely different series of additional performance tests.

Perhaps I will tackle that one day, but for the specific goal described here (understanding the price of method calls) this is sufficient for the time being.

With that in mind: happy coding!