This article explores how block-based parallelization improves the efficiency of probabilistic circuits by reducing both IO and computation overhead. Starting with fully connected sum layers, it explains how assigning indices, grouping node blocks, and padding with pseudo-nodes enable optimized kernel launches. Using dynamic programming for partitioning ensures minimal overhead while maximizing speed. Results show that larger block sizes cut IO operations dramatically, achieving up to 50x faster performance without significant cost from padded edges.

How Block-Based Parallelization Cuts IO and Computation Overhead


Abstract and 1. Introduction

  2. Preliminaries and Related Work

  3. Key Bottlenecks in PC Parallelization

  4. Harnessing Block-Based PC Parallelization

    4.1. Fully Connected Sum Layers

    4.2. Generalizing To Practical Sum Layers

    4.3. Efficient Implementations by Compiling PC Layers

    4.4. Analysis: IO and Computation Overhead

  5. Optimizing Backpropagation with PC Flows

  6. Experiments

    6.1. Faster Models with PyJuice

    6.2. Better PCs At Scale

    6.3. Benchmarking Existing PCs

  7. Conclusion, Acknowledgements, Impact Statement, and References

A. Algorithm Details

B. Additional Technical Details

C. Experimental Details

D. Additional Experiments


4. Harnessing Block-Based PC Parallelization

This section takes gradual steps toward demonstrating how we can reduce both the IO and the computation overhead using block-based parallelization. Specifically, we first use a fully connected sum layer to sketch the high-level idea (Sec. 4.1). We then move on to the general case, providing further details of the algorithm (Secs. 4.2, 4.3).

4.1. Fully Connected Sum Layers

Consider a fully connected sum layer comprised of M sum nodes, each connected to the same set of N product nodes as inputs. Under the parallelization strategy described in Section 3, with a single sample, we have M processors, each computing the output of one sum node. Since the layer is fully connected, every processor loads all N input log-probabilities, which results in every input being reloaded M times.

Figure 3. Illustration of block-based parallelization. A processor computes the output of 2 sum nodes by iterating through blocks of 2 input product nodes and accumulating partial results.
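To make Figure 3's scheme concrete, here is a minimal sketch in plain PyTorch (not PyJuice's actual kernels; the names M, N, B, log_child, and log_param are illustrative) of processors each evaluating a block of B sum nodes by iterating over blocks of B inputs:

```python
import torch

M, N, B = 4, 8, 2                # M sum nodes, N product nodes, block size B

log_child = torch.randn(N)       # log-probabilities of the N product nodes
log_param = torch.randn(M, N)    # log-parameters of the fully connected layer

out = torch.full((M,), float("-inf"))
for m0 in range(0, M, B):        # one "processor" per block of B sum nodes
    acc = torch.full((B,), float("-inf"))
    for n0 in range(0, N, B):    # stream blocks of B inputs, accumulating
        blk = log_param[m0:m0+B, n0:n0+B] + log_child[n0:n0+B]   # (B, B)
        acc = torch.logaddexp(acc, torch.logsumexp(blk, dim=1))  # partial sums
    out[m0:m0+B] = acc
```

Because each block of inputs is loaded once per sum-node block rather than once per sum node, every input is reloaded M/B times instead of M times.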


4.2. Generalizing To Practical Sum Layers

Figure 4. A sum layer (left) with a block-sparse parameter matrix (middle) is compiled into two kernels (right), each with a balanced workload. During execution, each kernel uses the compiled sum/prod/param indices to compute the outputs of m0, . . . , m5.
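The mechanism in Figure 4 can be emulated as follows. This is a simplified sketch in plain PyTorch, where the index layouts, names, and sizes are assumptions for illustration (the real kernels accumulate in a numerically stable log-space form):

```python
import torch

B = 2                                    # node block size
log_prod = torch.randn(4, B)             # log-probs of 4 product node blocks
params = torch.rand(5, B, B)             # parameter blocks; block 0 is reserved
params[0] = 0.0                          # zero parameters for padded edges

# Compiled per sum-node block: ids of child product blocks and parameter blocks.
prod_ids = torch.tensor([[0, 1], [2, 3], [1, 0]])
param_ids = torch.tensor([[1, 2], [3, 4], [2, 0]])

out = torch.zeros(3, B)
for k in range(prod_ids.shape[1]):       # same #children for every block
    p = params[param_ids[:, k]]          # (3, B, B) gathered parameter blocks
    ch = log_prod[prod_ids[:, k]].exp()  # (3, B); .exp() only for brevity
    out += torch.einsum("gij,gj->gi", p, ch)
log_out = out.log()                      # outputs of the 3 sum node blocks
```

Because every sum node block in a group indexes the same number of child slots, the loop over k is free of divergence, and the index gathers are the only irregular memory accesses.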

4.3. Efficient Implementations by Compiling PC Layers

We address both problems through a compilation process, in which we assign every node an index and precompute index tensors that enable efficient block-based parallelization. The first step is to partition the sum node blocks into groups such that every node block within a group has a similar number of connected child node blocks. We then pad the children with pseudo-product node blocks of probability 0, so that all sum node blocks in a group have the same number of children. The partition is generated by a dynamic programming algorithm that divides the layer into the smallest possible number of groups while ensuring that the fraction of added pseudo-node blocks does not exceed a predefined threshold. Due to space constraints, we elaborate on the node block partitioning algorithm, including its optimality and time/memory efficiency, in Appendix A.1.
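As a rough illustration of the grouping-and-padding step, the greedy heuristic below stands in for the paper's dynamic program (Appendix A.1); the function name and the default threshold are assumptions for exposition:

```python
from typing import List

def partition_node_blocks(n_chs: List[int], max_pad_frac: float = 0.25) -> List[List[int]]:
    """n_chs[i] = number of child product node blocks of sum node block i.
    Within a group, every block is padded up to the group's maximum child
    count with zero-probability pseudo-node blocks; a new group is opened
    whenever the padded fraction would exceed max_pad_frac."""
    order = sorted(range(len(n_chs)), key=lambda i: n_chs[i])  # ascending
    groups: List[List[int]] = []
    cur: List[int] = []
    real = 0  # real (non-padded) children in the current group
    for i in order:
        slots = n_chs[i] * (len(cur) + 1)  # group padded to the current max
        if cur and (slots - real - n_chs[i]) / slots > max_pad_frac:
            groups.append(cur)
            cur, real = [i], n_chs[i]
        else:
            cur.append(i)
            real += n_chs[i]
    if cur:
        groups.append(cur)
    return groups

# Blocks with 2, 2, 3, 8, 8 children -> [[0, 1, 2], [3, 4]]: blocks 0 and 1
# each receive one pseudo-child so the first group has 3 children per block.
print(partition_node_blocks([2, 2, 3, 8, 8]))
```

Unlike this greedy sketch, the paper's dynamic program is designed to minimize the number of groups subject to the padding threshold; its optimality is discussed in Appendix A.1.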


Partitioning a layer into groups whose node blocks have the same number of children allows us to use different kernel launch hyperparameters tailored to the specific setup of each node group (e.g., its number of nodes), achieving better performance.
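For instance, a compiled layer might pick launch hyperparameters per group along the following lines; this is a hypothetical heuristic, not the paper's actual selection rules:

```python
def launch_config(num_node_blocks: int, num_ch_blocks: int, block_size: int) -> dict:
    """Choose per-group tile sizes and grid size (illustrative heuristic)."""
    # Cap tile sizes so each processor's working set stays in fast on-chip
    # memory; small groups get proportionally smaller tiles.
    tile_m = min(64, num_node_blocks * block_size)  # sum nodes per processor
    tile_k = min(64, num_ch_blocks * block_size)    # children per inner step
    grid = -(-num_node_blocks * block_size // tile_m)  # ceil division
    return {"TILE_M": tile_m, "TILE_K": tile_k, "grid": grid}

# Each group gets its own configuration, since all of its node blocks share
# the same (padded) number of children:
for g, (nb, cb) in enumerate([(6, 2), (2, 8)]):
    print(f"group {g}: {launch_config(nb, cb, block_size=32)}")
```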


4.4. Analysis: IO and Computation Overhead


Figure 5. Runtime and IO overhead of a sum layer from the PD structure (with 29K nodes and 30M edges). The results demonstrate significant performance gains from our block-based parallelization, even with small block sizes.

Results are shown in Figure 5. As the block size increases, both the forward and the backward pass become significantly faster, accompanied by a substantial drop in IO overhead. Specifically, with a large block size, the kernel performs 2x fewer reads/writes between the L2 cache and the HBM, and 25-50x fewer IO operations between the L1 and L2 caches. This corroborates the hypothesis stated in Section 3 that extensive value reloads significantly slow down the computation.

The speedup obtained from larger block sizes outpaces the overhead incurred by padded edges with zero parameters, leading to a net performance gain.

:::info Authors:

(1) Anji Liu, Department of Computer Science, University of California, Los Angeles, USA (liuanji@cs.ucla.edu);

(2) Kareem Ahmed, Department of Computer Science, University of California, Los Angeles, USA;

(3) Guy Van den Broeck, Department of Computer Science, University of California, Los Angeles, USA.

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

