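Is there a maximum length for an array in C++?
Is it a C++ limit, or does it depend on my machine? Is it tweakable? Does it depend on the type the array is made of?
Can I break that limit somehow, or do I have to look for a better way of storing the information? What would be the simplest way?
What I have to do is store long long int values in an array, and I'm working in a Linux environment. My question is: what do I have to do if I need to store an array of N long long integers with N > 10 digits?
I need this because I'm writing some cryptographic algorithms (for example, p-Pollard) for school, and I have hit this wall of integer and array-length representation.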
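Current answer
I'm surprised the max_size() member function of std::vector has not been mentioned here.
"Returns the maximum number of elements the container is able to hold due to system or library implementation limitations, i.e. std::distance(begin(), end()) for the largest container."
We know that std::vector is implemented as a dynamic array under the hood, so max_size() should give a very close approximation of the maximum length of a dynamic array on your machine.
The following program builds a table of approximate maximum array lengths for various data types.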
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <limits>
#include <string>
#include <vector>

// mx: max_size() of std::vector<T> as a string; the argument only selects T
template <typename T>
std::string mx(T) {
    std::vector<T> v;
    return std::to_string(v.max_size());
}

// Width of the widest string in v, plus padding
std::size_t maxColWidth(const std::vector<std::string> &v) {
    std::size_t maxWidth = 0;
    for (const auto &s : v)
        if (s.length() > maxWidth)
            maxWidth = s.length();
    // Add 2 for space on each side
    return maxWidth + 2;
}

constexpr long double maxStdSize_t = std::numeric_limits<std::size_t>::max();

// cs stands for compared to std::size_t
template <typename T>
std::string cs(T) {
    std::vector<T> v;
    long double maxSize = v.max_size();
    long double quotient = maxStdSize_t / maxSize;
    return std::to_string(quotient);
}

int main() {
    bool v0 = 0;
    char v1 = 0;
    std::int8_t v2 = 0;
    std::int16_t v3 = 0;
    std::int32_t v4 = 0;
    std::int64_t v5 = 0;
    std::uint8_t v6 = 0;
    std::uint16_t v7 = 0;
    std::uint32_t v8 = 0;
    std::uint64_t v9 = 0;
    std::size_t v10 = 0;
    double v11 = 0;
    long double v12 = 0;

    std::vector<std::string> types = {"data types", "bool", "char", "int8_t", "int16_t",
                                      "int32_t", "int64_t", "uint8_t", "uint16_t",
                                      "uint32_t", "uint64_t", "size_t", "double",
                                      "long double"};
    std::vector<std::string> sizes = {"approx max array length", mx(v0), mx(v1), mx(v2),
                                      mx(v3), mx(v4), mx(v5), mx(v6), mx(v7), mx(v8),
                                      mx(v9), mx(v10), mx(v11), mx(v12)};
    std::vector<std::string> quotients = {"max std::size_t / max array size", cs(v0),
                                          cs(v1), cs(v2), cs(v3), cs(v4), cs(v5), cs(v6),
                                          cs(v7), cs(v8), cs(v9), cs(v10), cs(v11), cs(v12)};

    std::size_t max1 = maxColWidth(types);
    std::size_t max2 = maxColWidth(sizes);
    std::size_t max3 = maxColWidth(quotients);

    for (std::size_t i = 0; i < types.size(); ++i) {
        // Right-align the type name, then add one trailing space
        while (types[i].length() < (max1 - 1)) {
            types[i] = " " + types[i];
        }
        types[i] += " ";
        // Center the other two columns by padding left and right alternately
        for (int j = 0; sizes[i].length() < max2; ++j)
            sizes[i] = (j % 2 == 0) ? " " + sizes[i] : sizes[i] + " ";
        for (int j = 0; quotients[i].length() < max3; ++j)
            quotients[i] = (j % 2 == 0) ? " " + quotients[i] : quotients[i] + " ";
        std::cout << "|" << types[i] << "|" << sizes[i] << "|" << quotients[i] << "|\n";
    }

    std::cout << std::endl;
    std::cout << "N.B. max std::size_t is: " <<
        std::numeric_limits<std::size_t>::max() << std::endl;
    return 0;
}
On my macOS (clang version 5.0.1), I get the following:
| data types | approx max array length | max std::size_t / max array size |
| bool | 9223372036854775807 | 2.000000 |
| char | 9223372036854775807 | 2.000000 |
| int8_t | 9223372036854775807 | 2.000000 |
| int16_t | 9223372036854775807 | 2.000000 |
| int32_t | 4611686018427387903 | 4.000000 |
| int64_t | 2305843009213693951 | 8.000000 |
| uint8_t | 9223372036854775807 | 2.000000 |
| uint16_t | 9223372036854775807 | 2.000000 |
| uint32_t | 4611686018427387903 | 4.000000 |
| uint64_t | 2305843009213693951 | 8.000000 |
| size_t | 2305843009213693951 | 8.000000 |
| double | 2305843009213693951 | 8.000000 |
| long double | 1152921504606846975 | 16.000000 |
N.B. max std::size_t is: 18446744073709551615
On ideone (gcc 8.3) I get:
| data types | approx max array length | max std::size_t / max array size |
| bool | 9223372036854775744 | 2.000000 |
| char | 18446744073709551615 | 1.000000 |
| int8_t | 18446744073709551615 | 1.000000 |
| int16_t | 9223372036854775807 | 2.000000 |
| int32_t | 4611686018427387903 | 4.000000 |
| int64_t | 2305843009213693951 | 8.000000 |
| uint8_t | 18446744073709551615 | 1.000000 |
| uint16_t | 9223372036854775807 | 2.000000 |
| uint32_t | 4611686018427387903 | 4.000000 |
| uint64_t | 2305843009213693951 | 8.000000 |
| size_t | 2305843009213693951 | 8.000000 |
| double | 2305843009213693951 | 8.000000 |
| long double | 1152921504606846975 | 16.000000 |
N.B. max std::size_t is: 18446744073709551615
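It should be noted that this is a theoretical limit, and that on most computers you will run out of memory long before you reach it. For example, for type char on gcc the maximum number of elements equals the maximum of std::size_t. Trying this, we get the error: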
prog.cpp: In function ‘int main()’:
prog.cpp:5:61: error: size of array is too large
char* a1 = new char[std::numeric_limits<std::size_t>::max()];
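Lastly, as @MartinYork points out, for static arrays (fixed-size arrays declared as local variables) the maximum size is limited by the size of your stack.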
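The gap between the theoretical limit and what a machine can actually supply can be observed at run time. The following is a minimal sketch, not part of the original answer: the helper name tryReserve is hypothetical, and it relies only on the standard guarantee that std::vector::reserve throws std::length_error when the request exceeds max_size(), and on std::bad_alloc being thrown when the allocator cannot supply the memory.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: tries to reserve room for `count` long long values
// and reports what happened instead of letting the exception escape.
std::string tryReserve(std::size_t count) {
    std::vector<long long> v;
    try {
        v.reserve(count);
        return "ok";
    } catch (const std::length_error&) {
        // count > v.max_size(): the request exceeds the theoretical limit
        return "length_error";
    } catch (const std::bad_alloc&) {
        // count <= v.max_size(), but the memory could not be obtained
        return "bad_alloc";
    }
}
```

A request one past max_size() is reported as "length_error", while a request below max_size() either succeeds or fails with "bad_alloc", depending on the memory available to the process.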
Other answers
To summarize the other responses, extend them, and answer your question directly: no, C++ does not impose any limit on the dimensions of an array. But the array has to be stored somewhere in memory, so memory-related limits imposed by other parts of the computer system apply. Note that these limits do not relate directly to the dimensions (the number of elements) of the array, but rather to its size (the amount of memory it takes). The dimensions (D) and the in-memory size (S) of an array are not the same; they are related through the memory taken by a single element (E): S = D * E. Now E depends on:
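the type of the array elements (elements can be smaller or bigger);
memory alignment (to increase performance, elements are placed at addresses that are multiples of some value, which introduces "wasted space" (padding) between elements);
the size of the static parts of objects (in object-oriented programming, static components of objects of the same type are stored only once, independent of the number of such same-type objects).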
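The relation S = D * E can be sketched in code. This is an illustration only: the struct Padded and the helper arrayBytes are hypothetical names, not from the original answer. E is taken to be sizeof(T), which already includes any alignment padding, and the helper refuses to compute a product that would not fit in std::size_t.

```cpp
#include <cstddef>
#include <limits>

// Illustrative element type: the compiler usually inserts padding after
// `tag` so that `value` is properly aligned, which makes E larger than
// the sum of the member sizes.
struct Padded {
    char tag;
    long long value;
};

// Computes S = D * E for element type T, with E = sizeof(T).
// Returns false if the product would not fit in std::size_t; in that
// case `bytes` is left untouched.
template <typename T>
bool arrayBytes(std::size_t count, std::size_t& bytes) {
    if (count > std::numeric_limits<std::size_t>::max() / sizeof(T))
        return false;
    bytes = count * sizeof(T);
    return true;
}
```

For example, arrayBytes<Padded>(n, s) typically reports more bytes per element than sizeof(char) + sizeof(long long) would suggest; the exact figure depends on the platform's alignment rules.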
Also note that you generally get different memory-related limits depending on whether you allocate the array data on the stack (as an automatic variable: int t[N]), on the heap (dynamic allocation with malloc()/new or through STL mechanisms), or in the static part of process memory (as a static variable: static int t[N]). Even when allocating on the heap, you still need a tiny amount of memory on the stack to store references to the heap-allocated blocks (but this is usually negligible). The size of the size_t type is not a constraint for the programmer (I assume the programmer uses size_t for indexing, as it is designed for that), because the compiler provider has to typedef it to an integer type big enough to address the maximum amount of memory possible on the given platform architecture. The memory-size limitations stem from:
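the amount of memory available to the process (which is limited to 2^32 bytes for 32-bit applications, even on 64-bit OS kernels);
the division of process memory (e.g. the amount of process memory designated for the stack or the heap);
the fragmentation of physical memory (many scattered small free memory fragments are not suitable for storing one monolithic structure);
the amount of physical memory;
and the amount of virtual memory.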
They cannot be "tweaked" at the application level, but you are free to use a different compiler (to change stack size limits), port your application to 64 bits, port it to another OS, or change the physical/virtual memory configuration of the (virtual? physical?) machine. It is not uncommon (and even advisable) to treat all of the above factors as external disturbances, and thus as possible sources of runtime errors, and to carefully check for and react to memory-allocation errors in your program code. So finally: while C++ does not impose any limits, you still have to check for adverse memory-related conditions when running your code... :-)
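There is one thing I don't think has been mentioned in the previous answers.
I always sense a "bad smell", in the refactoring sense, when people use such things in their design.
That is a huge array, and possibly not the best way to represent your data from both an efficiency and a performance point of view.
Cheers,
Rob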
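Looking at it from a practical rather than theoretical standpoint, on a 32-bit Windows system the maximum total amount of memory available to a single process is 2 GB. You can break the limit by going to a 64-bit operating system with much more physical memory, but whether to do this or look for alternatives depends very much on your intended users and their budgets. You can also extend it somewhat using PAE.
The type of the array is very important, as default structure alignment on many compilers is 8 bytes, which is very wasteful if memory usage is an issue. If you are using Visual C++ to target Windows, check out the #pragma pack directive as a way of overcoming this.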
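As a rough illustration of the effect of #pragma pack, the sketch below declares the same record twice, once with the default layout and once packed to 1-byte alignment. The struct names are made up for this example. The push/pop form shown is accepted by Visual C++, GCC, and Clang; the amount of padding actually saved depends on the platform.

```cpp
#include <cstddef>

// Illustrative record with the compiler's default alignment.
struct DefaultLayout {
    char tag;
    long long value;
};

#pragma pack(push, 1)  // pack the following structs with 1-byte alignment
struct PackedLayout {
    char tag;
    long long value;
};
#pragma pack(pop)      // restore the previous packing setting

// Bytes saved per array element by packing (platform dependent).
constexpr std::size_t savedPerElement =
    sizeof(DefaultLayout) - sizeof(PackedLayout);
```

The trade-off is that members of a packed struct may be misaligned, and unaligned access can be slower or unsupported on some architectures, so packing is worth measuring before it is adopted.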
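Another thing to do is look at which in-memory compression techniques might help you, such as sparse matrices, on-the-fly compression, and so on. Again, this is highly application dependent. If you edit your post to give more information about what is actually in your arrays, you might get more useful answers.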
Edit: Given a bit more information on your exact requirements, your storage needs appear to be between 7.6 GB and 76 GB uncompressed, which would require a rather expensive 64-bit box to store as an in-memory array in C++. That raises the question of why you want to keep the data in memory, presumably for speed of access and to allow random access. The best way to store this data outside of an array depends largely on how you want to access it. If you need to access array members randomly, most applications offer ways of grouping clumps of data that tend to be accessed at the same time. For example, in large GIS and spatial databases, data often gets tiled by geographic area. In C++ programming terms, you can overload the [] array operator to fetch portions of your data from external storage as required.
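The idea of an overloaded [] operator that fetches data from external storage can be sketched as follows. Everything here is an assumption made for illustration: the class name ChunkedFileArray, the file layout (raw std::int64_t values in the machine's native byte order), and the policy of caching a single chunk are not from the original answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical read-only view of a binary file of int64_t values that keeps
// only one fixed-size chunk in memory at a time.
class ChunkedFileArray {
public:
    ChunkedFileArray(const std::string& path, std::size_t chunkElems)
        : file_(path, std::ios::binary), chunkElems_(chunkElems) {
        if (!file_) throw std::runtime_error("cannot open " + path);
        if (chunkElems_ == 0) throw std::invalid_argument("chunk size must be positive");
        file_.seekg(0, std::ios::end);
        const std::streamoff bytes = file_.tellg();
        size_ = static_cast<std::size_t>(bytes) / sizeof(std::int64_t);
    }

    std::size_t size() const { return size_; }

    // Returns element i by value, loading its chunk from disk if needed.
    std::int64_t operator[](std::size_t i) {
        if (i >= size_) throw std::out_of_range("index out of range");
        const std::size_t chunk = i / chunkElems_;
        if (!loaded_ || chunk != loadedChunk_) load(chunk);
        return buffer_[i % chunkElems_];
    }

private:
    // Replaces the cached chunk with chunk number `chunk` read from the file.
    void load(std::size_t chunk) {
        const std::size_t first = chunk * chunkElems_;
        const std::size_t count = std::min(chunkElems_, size_ - first);
        buffer_.resize(count);
        file_.clear();
        file_.seekg(static_cast<std::streamoff>(first * sizeof(std::int64_t)));
        file_.read(reinterpret_cast<char*>(buffer_.data()),
                   static_cast<std::streamsize>(count * sizeof(std::int64_t)));
        if (!file_) throw std::runtime_error("read failed");
        loadedChunk_ = chunk;
        loaded_ = true;
    }

    std::ifstream file_;
    std::size_t chunkElems_;
    std::size_t size_ = 0;
    std::vector<std::int64_t> buffer_;
    std::size_t loadedChunk_ = 0;
    bool loaded_ = false;
};
```

Memory use is bounded by the chunk size rather than by the file size. Access patterns that jump between chunks will reload from disk on every jump, so a real design would likely cache several chunks and choose the chunk size to match how the data is accessed.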
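There are two limits, both enforced not by C++ but by the hardware.
The first limit (which should never be reached) is set by the restrictions of the size type used to describe an index in the array (and the size thereof). It is given by the maximum value the system's std::size_t can take. This data type is large enough to contain the size in bytes of any object.
The other limit is a physical memory limit. The larger the objects in the array are, the sooner this limit is reached, because memory is full. For example, a vector<int> of a given size n typically takes several times as much memory as a vector<char> of the same size (minus a small constant value), since int is usually bigger than char. Therefore, a vector<char> may contain more items than a vector<int> before memory is full. The same holds for raw C-style arrays such as int[] and char[].
Additionally, this upper limit may be influenced by the type of allocator used to construct the vector, because an allocator is free to manage memory any way it wants. A very odd but nonetheless conceivable allocator could pool memory in such a way that identical instances of an object share resources. This way, you could insert a lot of identical objects into a container that would otherwise use up all the available memory.
Apart from that, C++ doesn't enforce any limits.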