Create a list with an initial capacity in Python [duplicate]


Problem description:

Code like this comes up often:

l = []
while foo:
    # baz
    l.append(bar)
    # qux

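If you are adding thousands of elements to the list, this can be very slow, because the list has to be resized constantly to fit the new elements.

In Java, you can create an ArrayList with an initial capacity. If you know how big the list will be, this is more efficient.

I know code like this can often be refactored into a list comprehension. But if the for/while loop is very complicated, that isn't feasible. Is there an equivalent for us Python programmers?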

Solution 1:

Warning: this answer is disputed. See the comments.

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

Results (each function evaluated 144 times, durations averaged):

simple append 0.0102
pre-allocate  0.0098

Conclusion: it barely matters.

Premature optimization is the root of all evil.
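A hedged sketch of how the comparison above might be re-run with the standard timeit module instead of a hand-rolled average. The repeat and call counts (REPEAT, NUMBER) are arbitrary choices kept small so it finishes quickly, and taking the minimum of the repeats is a common noise-reduction convention, not something the original answer did.

```python
import timeit

REPEAT = 3   # arbitrary: number of independent timing runs
NUMBER = 20  # arbitrary: calls per run

def do_append(size=10000):
    result = []
    for i in range(size):
        result.append("some unique object %d" % (i,))
    return result

def do_allocate(size=10000):
    result = size * [None]
    for i in range(size):
        result[i] = "some unique object %d" % (i,)
    return result

# Both variants must build the same list, otherwise comparing them is meaningless.
assert do_append() == do_allocate()

for fn in (do_append, do_allocate):
    # The minimum over the repeats is the least noisy estimate of the true cost.
    best = min(timeit.repeat(fn, repeat=REPEAT, number=NUMBER))
    print("%-12s %.4f s per call" % (fn.__name__, best / NUMBER))
```

Because both functions spend most of their time formatting strings, any difference between them is expected to be small relative to the total.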

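Solution 2:

Python lists have no built-in pre-allocation. If you really do need to build a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this: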

l = [None] * 1000  # Make a list of 1000 None's
for i in range(1000):  # xrange() in Python 2
    # baz
    l[i] = bar
    # qux

Perhaps you could avoid the list altogether by using a generator:

def my_things():
    while foo:
        #baz
        yield bar
        #qux

for thing in my_things():
    # do something with thing
    pass

This way, the list is never stored in memory all at once; items are generated only as they are needed.
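To make the generator idea concrete, here is a sketch in which the placeholder loop (foo, bar, baz, qux) is replaced by a hypothetical, deliberately stateful loop: a running total that skips negative values and stops once it passes a limit. The name running_totals and its parameters are invented for illustration. The loop body can stay as complicated as needed, while the caller either iterates lazily or gets a real list with a single list(...) call.

```python
def running_totals(values, limit):
    """Yield the running sum of `values`, stopping once it exceeds `limit`.

    Hypothetical example: the state handling stands in for a loop body
    too complicated to express as a list comprehension.
    """
    total = 0
    for v in values:
        if v < 0:          # skip invalid entries (arbitrary rule for the sketch)
            continue
        total += v
        if total > limit:  # stop condition that depends on accumulated state
            break
        yield total

# Lazily, one item at a time, without keeping a list in memory:
for t in running_totals([3, -1, 4, 1, 5, 9], limit=20):
    print(t)

# Or, when a list is really needed, let list() consume the generator:
totals = list(running_totals([3, -1, 4, 1, 5, 9], limit=20))
print(totals)  # [3, 7, 8, 13]
```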

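Solution 3:

Short version: use

pre_allocated_list = [None] * size

to pre-allocate a list (that is, to make it able to hold "size" elements, rather than growing the list gradually by appending). This operation is very fast, even on large lists. Allocating the new objects that will later be assigned to the list elements takes much longer, and that will be the bottleneck in your program, performance-wise.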

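Long version:

I think initialization time should be taken into account.

Since everything in Python is a reference, it doesn't matter whether you set each element to None or to some string: either way it is just a reference. It does take longer, though, if you want to create a new object for each element to refer to.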
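A small check of the "it's just a reference" point, assuming CPython: sys.getsizeof counts the list object and its pointer array, not the objects it refers to, so a list of 1000 Nones and a list of 1000 references to one long string should report the same size. Other Python implementations may report sizes differently.

```python
import sys

n = 1000
nones = [None] * n
long_text = "x" * 1_000
strings = [long_text] * n

# Same number of slots, same size: each slot is just a pointer.
print(sys.getsizeof(nones), sys.getsizeof(strings))

# Every slot refers to the very same string; nothing was copied.
print(all(s is long_text for s in strings))

# Creating a distinct object per slot is what actually costs time and memory.
distinct = [long_text + str(i) for i in range(n)]
print(len({id(s) for s in distinct}))  # 1000: one object per slot
```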

For Python 3.2:

import time
import copy

def print_timing (func):
  def wrapper (*arg):
    t1 = time.time()
    res = func (*arg)
    t2 = time.time ()
    print ("{} took {} ms".format (func.__name__, (t2 - t1) * 1000.0))
    return res

  return wrapper

@print_timing
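def prealloc_array (size, init = None, cp = True, cpmethod = copy.deepcopy, cpargs = (), use_num = False):
  # Start with `size` references to the single None object.
  result = [None] * size
  if init is not None:
    if cp:
      # cp=True: put the same `init` reference in every slot (no new objects).
      for i in range (size):
          result[i] = init
    else:
      if use_num:
        # Call cpmethod once per slot with the index, e.g. "some object {}".format(i).
        for i in range (size):
            result[i] = cpmethod (i)
      else:
        # Call cpmethod once per slot with cpargs, e.g. copy.deepcopy(cpargs).
        for i in range (size):
            result[i] = cpmethod (cpargs)
  return result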

@print_timing
def prealloc_array_by_appending (size):
  result = []
  for i in range (size):
    result.append (None)
  return result

@print_timing
def prealloc_array_by_extending (size):
  result = []
  none_list = [None]
  for i in range (size):
    result.extend (none_list)
  return result

def main ():
  n = 1000000
  x = prealloc_array_by_appending(n)
  y = prealloc_array_by_extending(n)
  a = prealloc_array(n, None)
  b = prealloc_array(n, "content", True)
  c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)
  d = prealloc_array(n, "content", False, "some object {}".format, None, True)
  e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
  f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
  g = prealloc_array(n, "content", False, copy.deepcopy, [], False)

  print ("x[5] = {}".format (x[5]))
  print ("y[5] = {}".format (y[5]))
  print ("a[5] = {}".format (a[5]))
  print ("b[5] = {}".format (b[5]))
  print ("c[5] = {}".format (c[5]))
  print ("d[5] = {}".format (d[5]))
  print ("e[5] = {}".format (e[5]))
  print ("f[5] = {}".format (f[5]))
  print ("g[5] = {}".format (g[5]))

if __name__ == '__main__':
  main()

Output:

prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = ()
g[5] = []

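As you can see, just creating a big list of references to the same None object takes very little time.

Appending or extending takes longer (I didn't average the timings, but after running this a few times I can tell you that extending and appending take roughly the same time).

Allocating a new object for each element is what takes the most time, and S.Lott's answer does exactly that: it formats a new string every time. That isn't strictly required; if you want to pre-allocate some space, just create a list of None and then assign data to the list elements as you like. Either way, generating the data takes more time than appending to or extending the list, whether you generate it while creating the list or afterwards. But if you want a sparsely populated list, starting with a list of None is definitely faster.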
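A sketch of the sparse case just mentioned: when only some positions will ever be filled and their indices are known, a None-filled list lets you assign by index directly, with no appends and no resizing. The size and the contents of `updates` are made up for illustration.

```python
size = 10
slots = [None] * size  # every position exists up front, all referring to None

# Hypothetical sparse data: only a few positions are known.
updates = {2: "two", 7: "seven", 9: "nine"}
for index, value in updates.items():
    slots[index] = value  # plain index assignment; the list never resizes

filled = [i for i, v in enumerate(slots) if v is not None]
print(slots)
print(filled)  # [2, 7, 9]
```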

Solution 4:

The Pythonic way to do this is:

x = [None] * numElements

or whatever default value you want to pre-populate with, e.g.

bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche

(Caveat emptor: the [Beer()] * 99 syntax creates one Beer and then fills a list with 99 references to that same single instance.)
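The caveat above is easy to trip over with mutable objects. A sketch using a hypothetical Beer class with one mutable attribute: multiplying a one-element list shares a single instance, while a list comprehension calls the constructor once per slot.

```python
class Beer:
    """Hypothetical stand-in for the Beer() in the example above."""
    def __init__(self):
        self.sips = 0

shared = [Beer()] * 3              # one Beer, three references to it
shared[0].sips += 1
print([b.sips for b in shared])    # [1, 1, 1]: the change shows up everywhere

separate = [Beer() for _ in range(3)]  # three independent Beer objects
separate[0].sips += 1
print([b.sips for b in separate])  # [1, 0, 0]
```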

Python's default approach can be quite efficient, although that efficiency decays as the number of elements grows.

Compare

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # Millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

and

#include <vector>
typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

#include <iostream>
#include <chrono>
using namespace std;

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";

    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();

    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';

    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}

On my Windows 7 Core i7, 64-bit Python gives

Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms

while C++ gives (built with Microsoft Visual C++, 64-bit, optimizations enabled)

Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms

A C++ debug build produces:

Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms

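The point here is that with Python you can get a worthwhile improvement from pre-allocating (about 25% between doAppend and doAllocate in the run above). That is not to be sniffed at if you think you are writing a high-performance application (or something used in a web service, etc.), but in that case you may need to rethink your choice of language.

Also, the Python code here isn't really Pythonic. Switching to genuinely Pythonic code gives better performance: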

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

This gives

Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms

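(In 32-bit, doGenerator does better than doAllocate.)

Here the gap between doAppend and doAllocate is significantly larger.

Obviously, the differences here only really matter if you are doing this more than a handful of times, on a heavily loaded system where those numbers get scaled up by orders of magnitude, or with considerably larger lists.

The takeaway: doing it the Pythonic way gives the best performance.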
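A hedged sketch of what "the Pythonic way" can look like for the integer example in these benchmarks: when each element can be computed from its index, a list comprehension or list(range(...)) removes the explicit index loop altogether. The value of N and the number of timing calls are arbitrary, and the printed timings depend entirely on the machine and Python version.

```python
import timeit

N = 100_000

def with_append():
    result = []
    for i in range(N):
        result.append(i)
    return result

def with_prealloc():
    result = [None] * N
    for i in range(N):
        result[i] = i
    return result

def with_comprehension():
    return [i for i in range(N)]

def with_list_range():
    return list(range(N))  # the iteration happens in C, not in a Python loop

candidates = [with_append, with_prealloc, with_comprehension, with_list_range]
expected = with_append()
for fn in candidates:
    assert fn() == expected  # all four build the same list
    ms = timeit.timeit(fn, number=20) / 20 * 1000
    print("%-20s %.2f ms" % (fn.__name__, ms))
```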

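But if you are worried about general, high-level performance, Python is the wrong language. The most fundamental problem is that Python function calls have traditionally been up to 300 times slower than in other languages, because of Python features such as decorators. (PythonSpeed/PerformanceTips, Data Aggregation)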

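Solution 5:

As others have mentioned, the simplest way to pre-seed a list is with None (NoneType) objects.

That said, you should understand how Python lists actually work before deciding this is necessary.

In CPython's list implementation, the underlying array is always allocated with spare room, in progressively larger sizes (4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, etc.; the exact sequence differs between CPython versions), so the list is resized far less often than one might expect.

Because of this, most list.append() calls are O(1) (amortized O(1) overall); the cost only rises when one of these boundaries is crossed, at which point that single call is O(n). This is why pre-allocating made so little difference to the execution time in S.Lott's answer.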
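The over-allocation can be observed from Python itself. A sketch, assuming CPython, that appends one item at a time and counts how often sys.getsizeof(lst) changes; in CPython that size includes the allocated pointer array, so each change corresponds to a reallocation. The exact counts differ between CPython versions, and sys.getsizeof may not be meaningful on other implementations. The helper name count_resizes is made up for this sketch.

```python
import sys

def count_resizes(n):
    """Append n items one by one and count how often the list is reallocated."""
    lst = []
    last = sys.getsizeof(lst)
    resizes = 0
    for i in range(n):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != last:  # allocated capacity changed, so CPython reallocated
            resizes += 1
            last = size
    return resizes

for n in (10, 1_000, 100_000):
    print(n, "appends ->", count_resizes(n), "reallocations")
```

The number of reallocations grows roughly logarithmically with n, which is what makes append amortized O(1).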

Source: Python list implementation

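Solution 6:

Concerns about pre-allocation in Python arise if you are working with NumPy, which has more C-like arrays. In that case, pre-allocation is about the shape of the data and its default value.

Consider NumPy if you are doing numerical computation on massive lists and want better performance.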
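A sketch of what pre-allocation means in NumPy terms, assuming NumPy is installed: the shape, dtype and initial contents are fixed when the array is created with np.zeros, np.full or np.empty. The particular shapes and values below are arbitrary.

```python
import numpy as np

# Pre-allocation in NumPy means choosing shape, dtype and initial value up front.
zeros = np.zeros((3, 4), dtype=np.float64)    # 3x4 array filled with 0.0
filled = np.full((3, 4), -1, dtype=np.int32)  # same shape, every element -1
scratch = np.empty(5, dtype=np.int64)         # uninitialized: contents arbitrary until written

# Fill the pre-allocated buffer in place instead of appending.
for i in range(scratch.shape[0]):
    scratch[i] = i * i

# Often better still: build the whole array in one vectorized operation.
squares = np.arange(5, dtype=np.int64) ** 2
print(scratch, squares)
```

Note that np.empty does not clear memory, so every element must be written before it is read.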

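Solution 7:

I ran S.Lott's code and got the same 10% improvement from pre-allocating. I tried Ned Batchelder's idea of using a generator and saw the generator perform better than doAllocate. For my project the 10% improvement matters, so thanks, everyone; this helped a lot.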

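import time

def print_timing(func):
    # Assumed timing decorator: the answer uses print_timing without defining
    # it; this minimal version matches the output format shown below.
    def wrapper(*args):
        t1 = time.time()
        res = func(*args)
        t2 = time.time()
        print("%s took %0.3fms" % (func.__name__, (t2 - t1) * 1000.0))
        return res
    return wrapper

def doAppend(size=10000):
    result = []
    for i in range(size):
        message = "some unique object %d" % (i,)
        result.append(message)
    return result

def doAllocate(size=10000):
    result = size * [None]
    for i in range(size):
        message = "some unique object %d" % (i,)
        result[i] = message
    return result

def doGen(size=10000):
    # Generator expression consumed by list(): no explicit loop or indexing.
    return list("some unique object %d" % (i,) for i in range(size))

size = 1000

@print_timing
def testAppend():
    for i in range(size):  # xrange() in the original Python 2 code
        doAppend()

@print_timing
def testAlloc():
    for i in range(size):
        doAllocate()

@print_timing
def testGen():
    for i in range(size):
        doGen()


testAppend()
testAlloc()
testGen()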

Output:

testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms

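Solution 8:

Python lists have no built-in pre-allocation. NumPy lets you pre-allocate memory, but in practice it doesn't seem to be worth it if your goal is to speed up your program.

This test just writes an integer into the list, but in a real application you would likely do more complicated work per iteration, which makes memory allocation matter even less.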

import timeit
import numpy as np

def list_append(size=1_000_000):
    result = []
    for i in range(size):
        result.append(i)
    return result

def list_prealloc(size=1_000_000):
    result = [None] * size
    for i in range(size):
        result[i] = i
    return result

def numpy_prealloc(size=1_000_000):
    result = np.empty(size, np.int32)
    for i in range(size):
        result[i] = i
    return result

setup = 'from __main__ import list_append, list_prealloc, numpy_prealloc'
print(timeit.timeit('list_append()', setup=setup, number=10))     # 0.79
print(timeit.timeit('list_prealloc()', setup=setup, number=10))   # 0.62
print(timeit.timeit('numpy_prealloc()', setup=setup, number=10))  # 0.73

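Solution 9:

For some applications, a dictionary may be exactly what you are looking for. For example, in the find_totients function I found it more convenient to use a dictionary, since I had no zero index.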

import math

def totient(n):
    totient = 0

    if n == 1:
        totient = 1
    else:
        for i in range(1, n):
            if math.gcd(i, n) == 1:
                totient += 1
    return totient

def find_totients(max):
    totients = dict()
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

The problem could also be solved with a pre-allocated list:

def find_totients(max):
    totients = [None] * (max + 1)
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

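I feel this is less elegant and more error-prone: I am storing None values, which could raise an exception if I accidentally misuse them, and I have to think about edge cases that the dictionary lets me avoid.

It's true that the dictionary won't be as efficient, but as others have commented, small differences in speed are not always worth significant maintenance hazards.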
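A sketch of the pitfall described above, using a compact equivalent of the answer's totient(): with a pre-allocated list, the unused slot 0 still holds None, so an innocent-looking aggregate such as sum() fails, whereas the dictionary only contains the keys that were actually filled. The bound max_n = 10 is an arbitrary choice.

```python
import math

def totient(n):
    # Same count as the answer's totient(): 1 <= i <= n with gcd(i, n) == 1.
    return sum(1 for i in range(1, n + 1) if math.gcd(i, n) == 1)

max_n = 10
as_list = [None] * (max_n + 1)  # index 0 exists but is never filled
as_dict = {}
for i in range(1, max_n + 1):
    as_list[i] = totient(i)
    as_dict[i] = totient(i)

dict_total = sum(as_dict.values())  # only real entries are stored

try:
    list_total = sum(as_list)       # fails: as_list[0] is still None
except TypeError as exc:
    list_total = None
    print("list version failed:", exc)

list_total_fixed = sum(as_list[1:])  # the caller must remember to skip index 0
print(dict_total, list_total_fixed)
```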

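Solution 10:

The fastest way: use something like list1 = [False] * 1_000_000

Comparing all the common approaches (appending vs. pre-allocating, for vs. while loops), I found that using * gives the fastest execution time.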

import time

large_int = 10_000_000
start_time = time.time()

# Test 1: List comprehension
l1 = [False for _ in range(large_int)]
end_time_1 = time.time()

# Test 2: Using *
l2 = [False] * large_int
end_time_2 = time.time()

# Test 3: Using append with for loop & range
l3 = []
for _ in range(large_int):
    l3.append(False)
end_time_3 = time.time()

# Test 4: Using append with while loop
l4, i = [], 0
while i < large_int:
    l4.append(False)
    i += 1
end_time_4 = time.time()

# Results
diff_1 = end_time_1 - start_time
diff_2 = end_time_2 - end_time_1
diff_3 = end_time_3 - end_time_2
diff_4 = end_time_4 - end_time_3
print(f"Test 1. {diff_1:.4f} seconds")
print(f"Test 2. {diff_2:.4f} seconds")
print(f"Test 3. {diff_3:.4f} seconds")
print(f"Test 4. {diff_4:.4f} seconds")

print("\nTest 2 is faster than - ")
print(f"            Test 1 by - {(diff_1 / diff_2 - 1) * 100:,.0f}%")
print(f"            Test 3 by - {(diff_3 / diff_2 - 1) * 100:,.0f}%")
print(f"            Test 4 by - {(diff_4 / diff_2 - 1) * 100:,.0f}%")
