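I know that "crossing the boundary" when making a JNI call in Java is slow.

But I would like to know what it is that makes it slow. What does the underlying JVM implementation do when making a JNI call that makes the call so slow?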


Current answer

When talking about JNI, there are two directions: Java calling C++, and C++ calling Java. Java calling C++ (or C) via the "native" keyword is very fast, around 50 clock cycles. However, C++ calling Java is somewhat slow. We do a great deal of Java/C++ integration, and my rule of thumb is 1000 clock cycles per call, so you can get around 2M calls/second (a figure that works out for a core running at roughly 2 GHz). I cannot answer your actual question of "why is it slow", but I'll hazard a guess that a lot of work has to be done to transfer arguments from the native C++ stack, using varargs, onto the Java stack, validate whatever conformance is needed, and do the reverse for the return value.

However, also remember that once you make a call into a Java method from C++, if that method returns a complex data structure, you'll need to make JNI calls for every access into the result as well. The same applies when converting a complex C++ structure to Java. We've found in practice, for example, that it is much faster to serialize a C++ std::map<string,string> to JSON, hand the string across JNI, and have Java deserialize it into a Map<String,String>, assuming you want the entire map converted to Java.
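The Java half of that batching idea can be sketched as follows. This is only an illustration under stated assumptions: to stay dependency-free it swaps the JSON mentioned above for a hypothetical line-oriented `key=value` encoding, and the class name `FlatMapCodec` is made up for the example. The point is that the whole map crosses the boundary as one string and is decoded in a single pass on the Java side, instead of one JNI call per entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a stand-in for the JSON round trip described above.
class FlatMapCodec {

    // One "key=value" entry per line.
    // Assumption: keys contain no '=', and neither keys nor values contain '\n'.
    static String encode(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Rebuilds the whole map from a single string in one pass.
    static Map<String, String> decode(String payload) {
        Map<String, String> out = new LinkedHashMap<>();
        if (payload.isEmpty()) {
            return out;
        }
        for (String line : payload.split("\n")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("malformed entry: " + line);
            }
            out.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // In real use this payload would be the one string handed across JNI.
        String payload = "host=example.org\nport=8080\n";
        System.out.println(decode(payload));
    }
}
```

A real integration would more likely use an existing JSON library on both sides, as the answer describes; the encoding here is just the smallest thing that shows the shape.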

Other answers

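First, it's worth noting that by "slow" we're talking about something that can take tens of nanoseconds. For trivial native methods, in 2010 I measured calls at an average of 40 ns on my Windows desktop and 11 ns on my Mac desktop. Unless you're making a lot of calls, you're not going to notice.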
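For anyone who wants to get a rough number on their own machine, here is a minimal timing sketch. It is not the harness used for the figures above; `CallTimer` and its parameters are invented for the example, and a hand-rolled loop like this is easily distorted by JIT warm-up and dead-code elimination, so treat the result as an order-of-magnitude hint only. `System.nanoTime` is used as the timed call purely so the code runs without a native library; you would pass your own native method instead.

```java
import java.util.function.LongSupplier;

// Hypothetical micro-timer: average nanoseconds per invocation of a call.
class CallTimer {

    // Written after timing so the JIT cannot discard the timed calls.
    static volatile long sink;

    static double nanosPerCall(LongSupplier call, int warmup, int iterations) {
        long acc = 0;
        // Warm-up phase: give the JIT a chance to compile the loop and the call.
        for (int i = 0; i < warmup; i++) {
            acc += call.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += call.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with a reference to your own native method.
        double ns = nanosPerCall(System::nanoTime, 100_000, 1_000_000);
        System.out.printf("about %.1f ns per call%n", ns);
    }
}
```

For anything you intend to rely on, a dedicated benchmarking harness is a safer choice than a loop like this.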

That said, calling a native method can be slower than making a normal Java method call. Causes include:

- Native methods will not be inlined by the JVM. Nor will they be just-in-time compiled for this specific machine; they're already compiled.
- A Java array may be copied for access in native code, and later copied back. The cost can be linear in the size of the array. I measured JNI copying of a 100,000-element array to average about 75 microseconds on my Windows desktop, and 82 microseconds on Mac. Fortunately, direct access may be obtained via GetPrimitiveArrayCritical or NewDirectByteBuffer.
- If the method is passed an object, or needs to make a callback, then the native method will likely be making its own calls to the JVM. Accessing Java fields, methods and types from native code requires something similar to reflection. Signatures are specified in strings and queried from the JVM. This is both slow and error-prone.
- Java Strings are objects, have a length and are encoded. Accessing or creating a string may require an O(n) copy.
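To make the direct-buffer route concrete, here is a small sketch of the Java side. It uses `ByteBuffer.allocateDirect`, the Java-side counterpart of creating the buffer natively with NewDirectByteBuffer; native code can then obtain the address of the same memory with the JNI function `GetDirectBufferAddress` rather than receiving a copy. The class name and the commented-out `sum` method are hypothetical, and the native half is omitted so that the example runs on its own.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical example class: shares off-heap memory instead of copying an int[].
class DirectBufferDemo {

    // Allocates off-heap memory sized for `ints` 32-bit values, in the
    // platform's byte order so native code can read the values as plain ints.
    static ByteBuffer newSharedBuffer(int ints) {
        return ByteBuffer.allocateDirect(ints * Integer.BYTES)
                         .order(ByteOrder.nativeOrder());
    }

    // Hypothetical native method that would read the buffer in place.
    // Commented out so this sketch runs without a native library.
    // static native long sum(ByteBuffer buf, int count);

    public static void main(String[] args) {
        int count = 100_000;
        ByteBuffer buf = newSharedBuffer(count);
        for (int i = 0; i < count; i++) {
            buf.putInt(i);
        }
        buf.flip(); // switch from writing to reading
        System.out.println("direct=" + buf.isDirect() + ", bytes=" + buf.remaining());
    }
}
```

Direct buffers are comparatively expensive to allocate, so this tends to pay off when a buffer is allocated once and reused across many calls.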

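Some additional discussion, possibly dated, can be found in "Java(tm) Platform Performance: Strategies and Tactics" (2000) by Steve Wilson and Jeff Kesselman, in section "9.2: Examining JNI costs". It's about a third of the way down this page, provided in the comment by @Philip below.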

The 2009 IBM developerWorks paper "Best practices for using the Java Native Interface" provides some suggestions on avoiding performance pitfalls with JNI.

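It is worth mentioning that not all Java methods marked native are "slow". Some of them are intrinsics, which makes them extremely fast. To check which ones are intrinsic and which are not, you can look for do_intrinsic in vmSymbols.hpp.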
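As a starting point for that lookup, the following sketch lists the methods a class declares as native, using plain reflection. It does not tell you which of them are intrinsics; that is decided inside the JVM, so the names it prints still have to be checked against vmSymbols.hpp for your JVM version. The class name `NativeMethodLister` is made up for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: collects the names of methods declared with the native modifier.
class NativeMethodLister {

    static Set<String> nativeMethodNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // java.lang.System declares several native methods, e.g. nanoTime and arraycopy.
        System.out.println(nativeMethodNames(System.class));
    }
}
```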

Basically, the JVM interpretively constructs the C parameters for each JNI call, and that code is not optimized.

Many more details are outlined in this paper.

If you are interested in benchmarking JNI against native code, this project has code for running benchmarks.