06:03PM EDT - Cerebras did the wafer scale - a single chip the size of a wafer

06:03PM EDT - Here's WSE1

06:04PM EDT - Uses standard tools like TensorFlow and PyTorch with the Cerebras compiler

06:04PM EDT - CS-1 fits in standard 15U rack

06:04PM EDT - 400k cores

06:04PM EDT - (Costs a few $mil each)

06:05PM EDT - No DRAM, full on-chip SRAM

06:05PM EDT - 2D mesh network

06:06PM EDT - Allows all 400k cores to work on the same problem

06:06PM EDT - Linear perf scaling

06:07PM EDT - Cerebras graph compiler

06:07PM EDT - Extract the compute graph, convert it to the WSE format, route kernels on the fabric, then create an executable
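The four compile stages described above can be sketched roughly as follows. This is a hypothetical illustration only - the function names, kernel records, and fabric layout are invented for the sketch, not Cerebras's actual toolchain:

```python
def compile_for_wse(ops, fabric_width=4):
    """Illustrative pipeline: graph -> WSE kernels -> placement -> executable."""
    # 1. Extract the compute graph (here just an ordered list of ops).
    graph = list(ops)
    # 2. Lower each op to a kernel record in a "WSE format".
    kernels = [{"op": op} for op in graph]
    # 3. Route/place each kernel onto the fabric as a (row, col) tile.
    for i, k in enumerate(kernels):
        k["tile"] = (i // fabric_width, i % fabric_width)
    # 4. Emit the "executable": the placed kernel list.
    return {"kernels": kernels}

exe = compile_for_wse(["conv1", "relu", "matmul", "softmax"])
```

The real placement step is far more involved (see the Q&A on compile times below), but the stage ordering matches what was presented.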

06:09PM EDT - Graph matching for matrix multiply loops

06:09PM EDT - Supports hand optimized kernels

06:10PM EDT - Model-parallel and data-parallel optimization with the compiler

06:11PM EDT - Trade-off of resources vs compute for each kernel

06:12PM EDT - All kernels can be resized as needed

06:12PM EDT - All functionally identical

06:13PM EDT - Global optimization function to maximize throughput and utilization
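Since all kernels are functionally identical at any size, one plausible form of that global optimization is to size each kernel in proportion to its compute load, so no single stage bottlenecks the pipeline. A toy heuristic, assuming made-up kernel names and per-kernel FLOP counts (not Cerebras's actual optimizer):

```python
def allocate_cores(kernel_flops, total_cores=400_000):
    """Give each kernel cores proportional to its compute so that
    pipeline stages take roughly equal time (illustrative only)."""
    total = sum(kernel_flops.values())
    return {k: max(1, round(total_cores * f / total))
            for k, f in kernel_flops.items()}

# Hypothetical per-kernel relative compute loads
alloc = allocate_cores({"embed": 100, "attention": 300, "ffn": 100})
```

Here "attention" would get three times the cores of "embed", which is the basic intuition behind resizing kernels to maximize throughput and utilization.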

06:14PM EDT - 3 key benefits

06:15PM EDT - Flexible parallelism

06:16PM EDT - Enough fabric performance to connect everything at scale

06:16PM EDT - Otherwise this would be slow across GPUs or a cluster

06:16PM EDT - Small batch sizes still get very high utilization

06:16PM EDT - no weight sync overhead
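A back-of-envelope model of why that matters: in multi-device data-parallel training, each step pays a gradient all-reduce on top of compute, while a single wafer skips that term. The numbers and the ring all-reduce cost model below are illustrative assumptions, not measured figures:

```python
def step_time(compute_ms, weight_mb, link_gbs, workers):
    """Illustrative data-parallel step time: compute plus a ring
    all-reduce of gradients; one worker has no sync term at all."""
    if workers == 1:
        return compute_ms
    # A ring all-reduce moves about 2*(w-1)/w of the gradient bytes
    # per worker; MB * 8 bits / (Gb/s) conveniently yields milliseconds.
    sync_ms = 2 * (workers - 1) / workers * weight_mb * 8 / link_gbs
    return compute_ms + sync_ms

single = step_time(10, 100, 100, 1)   # no weight sync overhead
eight = step_time(10, 100, 100, 8)    # compute + all-reduce
```

With these assumed numbers the 8-worker cluster spends more time syncing weights than computing, while the single-wafer case pays nothing.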

06:19PM EDT - Core is designed for sparsity

06:20PM EDT - Intrinsic sparsity harvesting

06:20PM EDT - filters out all zeros
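The zero-filtering idea can be modeled in a few lines: operand pairs with a zero on either side never reach the multiplier. This is a software sketch of the behavior, not the hardware dataflow:

```python
def sparse_dot(weights, activations):
    """Dot product that 'harvests' sparsity: zero operands are
    filtered out before any multiply is issued (illustrative model)."""
    acc = 0.0
    macs = 0
    for w, a in zip(weights, activations):
        if w == 0.0 or a == 0.0:
            continue          # zero filtered: no multiply-accumulate
        acc += w * a
        macs += 1
    return acc, macs

total, macs = sparse_dot([0.5, 0.0, 2.0, 0.0], [4.0, 3.0, 0.0, 1.0])
```

Of four operand pairs, only one is non-zero on both sides, so only one MAC is performed - the work scales with the non-zeros, not the tensor size.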

06:23PM EDT - ML user has full control over full range of sparse techniques

06:23PM EDT - WSE is MIMD, each core can be independent

06:24PM EDT - True variable sequence length support

06:24PM EDT - No padding required

06:24PM EDT - Higher utilization for irregular models

06:25PM EDT - Dynamic depth networks

06:26PM EDT - Only needs to process the exact length of each sequence
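A quick arithmetic sketch of the padding cost being avoided, using invented sequence lengths: padding every sequence in a batch to the longest one wastes work in proportion to how ragged the batch is.

```python
def padded_utilization(seq_lens):
    """Fraction of useful work when every sequence is padded to the
    batch max, vs processing exact lengths (illustrative estimate)."""
    padded = max(seq_lens) * len(seq_lens)   # work with padding
    exact = sum(seq_lens)                    # work at exact lengths
    return exact / padded

util = padded_utilization([128, 37, 64, 12])
```

For this assumed batch, a padded approach does useful work less than half the time, while exact-length processing keeps utilization at 100% - the "higher utilization for irregular models" point above.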

06:26PM EDT - World's most powerful AI computer

06:27PM EDT - Complete flexibility due to the size of the wafer scale engine

06:28PM EDT - Working in the lab today

06:29PM EDT - More info later this year

06:29PM EDT - Q&A time

06:31PM EDT - Q: What's the main benefit of WSE? A: Bypassing the issues with workloads that need multiple GPUs/TPUs/DPUs. Opens up novel techniques that wouldn't run on traditional hardware at any reasonable speed

06:32PM EDT - Q: How do you feed the beast? A: In a traditional server the bottleneck might be the GPU/TPU; the next one is IO. We are a systems company - our product is the full system, and we control all aspects of it. We have a 1.2 Tb/s Ethernet interconnect to feed the engine and keep up with the compute

06:34PM EDT - Q: How long does it take to compile a model across 400k cores? A: It's an algorithmically complex search-space problem. Annealing and heuristics bring that down - we borrow many ideas from the EDA industry. Our problem is simpler than billions of LEs on FPGAs, so compile times are in the minutes

06:34PM EDT - That's a wrap. Next talk is the 4096-core RISC-V chip


