Podcast: Accelerating FPGA Adoption for AI Inference with the Inspur TF2
Earlier this year Inspur’s GM of Sales for Strategic Accounts Bob Anderson sat down with Intel’s Kyle Ambert and Emily Hudson on their Intel on AI podcast to talk about the TF2 compute acceleration engine and how it lowers the barrier of entry to using FPGAs for AI inference. You can read it transcribed below or listen to the original podcast here.
Emily and Kyle: Hi this is Emily Hudson and Kyle Ambert and this is Intel on AI. Today we’re talking with Bob Anderson, the General Manager of Sales for Strategic Accounts at Inspur. Welcome Bob.
Bob: Thank you for having me on. Appreciate it.
Emily: Can you tell us a little bit about your work with Inspur?
Bob: Absolutely. I currently work with strategic accounts in the US. I’ve been with Inspur coming up on 4 years now. I’ve been in the data center server business for over a decade, and I’ve been in the technology arena since the late 80s.
Kyle: Can you tell us a little bit about what got you interested in working in data center technologies?
Bob: I kind of got into data center technology by accident. I was working in consumer technologies and as we saw what was happening in the data center business and how the server business was evolving, we saw an opportunity to be able to provide solutions that were more customized to especially internet infrastructure customers, and the rest is history.
Kyle: I was reading a little bit about Inspur, it sounds like a really interesting company. One of the offerings that I saw you had was a full-stack AI platform and thought that was really cool. I’ve seen these starting to come up more recently with other companies as well, so I was curious how do you differentiate yourselves from your competition, and who would you say your target customer is?
Bob: Inspur is really the world’s leading AI computing platform provider. How we differentiate is that we offer a four-layer AI stack consisting of computing hardware, a management suite, framework optimization, and application acceleration. Together, this stack delivers agile, efficient, and optimized AI infrastructure. As for our target customer, Inspur has become a very important AI server and solutions supplier for many top-tier CSP customers.
Kyle: You mentioned framework optimization; that’s something I spend a lot of time doing in my day-to-day role at Intel. So I’m curious whether, at Inspur, you handle that in an automated way, or if it requires manual optimization by the engineers?
Bob: Currently, a lot of AI companies take the technical route of using FPGA technology to achieve customizable, low-latency, high-performance AI inference with a high performance-per-watt ratio. However, there are many challenges before FPGA technology can move into large-scale AI deployment, such as the high barrier to software development, limited performance optimization, and difficult power control. The goal of Inspur’s TF2 compute acceleration engine is to solve these challenges for those customers. The TF2 compute acceleration engine, which supports TensorFlow, helps AI customers quickly deploy deep neural network (DNN) models trained with mainstream AI software onto FPGAs for inference. Our TF2 engine delivers high performance and low latency for AI applications through the world’s first DNN shift-computing technology on FPGAs.
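As a rough illustration of the starting point for a flow like this, a trained TensorFlow model is first exported in a standard format; the sketch below uses the SavedModel format with a stand-in Keras model. The model choice and the "transform" step mentioned in the comments are illustrative assumptions, not Inspur's actual tooling.

```python
# Illustrative only: export a trained TensorFlow model as the artifact an
# offline FPGA optimizer (e.g. a transform kit) would start from.
import tensorflow as tf

# Stand-in for a customer's trained network; weights=None just builds the graph.
model = tf.keras.applications.MobileNetV2(weights=None)
tf.saved_model.save(model, "exported_model")
# The "exported_model" directory would then be handed to an offline
# optimization step before deployment to the FPGA (hypothetical flow).
```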
Emily: Kyle and I were talking before the call about how you don’t often hear FPGAs and TensorFlow in the same sentence, so it seems like you’re breaking new ground here. I’d love to hear a little bit about the AI applications that TF2 is really well suited for.
Bob: Yeah, sure. One example of an AI application powered by TF2 and our FPGA, the F10A: the SqueezeNet model running on the F10A FPGA card shows excellent computational performance for the TF2 acceleration engine. If you’re not familiar with the F10A, it is the world’s first half-height, half-length FPGA accelerator card to support the Intel Arria 10 chip. SqueezeNet is a typical convolutional neural network architecture; it is a streamlined model, but its accuracy is comparable to AlexNet’s. It’s especially suitable for image-based AI applications with tight real-time requirements. Running the SqueezeNet model optimized by the TF2 engine on the FPGA, the computation time for a single image is 0.674 milliseconds while maintaining the original accuracy. So image processing is certainly one of those applications.
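SqueezeNet is not bundled with TensorFlow's built-in application models, so as a rough, CPU/GPU-only illustration of the kind of model and measurement being discussed, the sketch below builds a small SqueezeNet-style stack of "fire" modules in tf.keras and times a single-image forward pass. It is not the TF2/F10A pipeline, and the layer sizes are assumptions based on the original SqueezeNet paper.

```python
# Minimal sketch (not Inspur's TF2 code): SqueezeNet-style "fire" modules in
# tf.keras plus a crude single-image latency measurement on whatever
# hardware TensorFlow finds.
import time
import tensorflow as tf
from tensorflow.keras import layers

def fire_module(x, squeeze_filters, expand_filters):
    # 1x1 "squeeze" followed by parallel 1x1 and 3x3 "expand" convolutions
    s = layers.Conv2D(squeeze_filters, 1, activation="relu")(x)
    e1 = layers.Conv2D(expand_filters, 1, activation="relu")(s)
    e3 = layers.Conv2D(expand_filters, 3, padding="same", activation="relu")(s)
    return layers.Concatenate()([e1, e3])

inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(96, 7, strides=2, activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, 16, 64)
x = fire_module(x, 16, 64)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1000, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

image = tf.random.uniform((1, 224, 224, 3))
model(image)                      # warm-up run
start = time.perf_counter()
model(image)
print(f"single-image latency: {(time.perf_counter() - start) * 1e3:.3f} ms")
```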
Emily: So, Inspur’s TensorFlow FPGA compute acceleration engine, TF2, can you give us a little bit more detail about that?
Bob: Yeah, I’d be happy to. The TF2 computing acceleration engine consists of two parts. The first part is the model optimization and conversion tool, the TF2 Transform Kit, which optimizes and transforms the deep neural network model data trained by a framework such as TensorFlow, greatly reducing the size of the model data file. It can compress 32-bit floating-point data into a 4-bit integer model, making the model data file less than 1/8 of its original size while essentially preserving the structure of the original model data. The second part is the FPGA intelligent running engine, the TF2 Runtime Engine. It can automatically convert the previously optimized model file into an FPGA target running file. In order to eliminate the dependence of deep neural networks, such as CNNs, on FPGA floating-point computing power, Inspur designed an innovative shift-computing technology that quantizes 32-bit floating-point data into 8-bit integer data. Combined with the aforementioned 4-bit integer model data, the floating-point multiplications in the converted convolution operations are computed as 8-bit integer shift operations. This greatly improves the FPGA’s inference calculation performance and effectively reduces its actual operating power consumption. It’s also the world’s first case of implementing the shift operation of a deep neural network on an FPGA while maintaining the accuracy of the original model.
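To make the shift-computing idea concrete, here is a minimal NumPy sketch of the general technique: weights are quantized to signed powers of two, so each multiply against an 8-bit integer activation becomes an arithmetic shift. This illustrates the principle only; the bit widths, rounding scheme, and function names are assumptions and do not describe Inspur's actual TF2 implementation.

```python
# Illustrative sketch only (not the TF2 transform kit): quantize weights to
# powers of two so each multiply can be replaced by an integer bit shift.
import numpy as np

def quantize_to_shifts(weights, n_bits=4):
    """Map each weight to sign * 2**exponent, storing only sign and exponent."""
    signs = np.sign(weights).astype(np.int8)
    # Round |w| to the nearest power of two, clamp exponents to a small range
    exponents = np.clip(np.round(np.log2(np.abs(weights) + 1e-12)),
                        -(2 ** n_bits - 1), 0).astype(np.int8)
    return signs, exponents

def shift_multiply(activations_int8, signs, exponents):
    """Approximate activation * weight using arithmetic shifts instead of multiplies."""
    # 2**exponent with exponent <= 0 is a right shift of the integer activation
    return signs * (activations_int8.astype(np.int32) >> (-exponents.astype(np.int32)))

weights = np.array([0.24, -0.51, 0.12, 0.97], dtype=np.float32)
acts = np.array([64, 32, 16, 127], dtype=np.int8)   # already-quantized activations
signs, exps = quantize_to_shifts(weights)
print(shift_multiply(acts, signs, exps))             # approx. acts * weights
```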
Kyle: Nice. So, I’m curious what sorts of challenges does TF2 enable customers to solve on FPGA?
Bob: Another great question. Our Inspur TF2 computing acceleration engine improves AI calculation performance on the FPGA through technical innovations like shift calculation and model optimization. This lowers the AI software implementation threshold of the FPGA and enables the FPGA to be widely used in the AI ecosystem, supporting more AI applications. We plan to open the TF2 engine to our customers. We will continue to upgrade and develop optimization technologies that support multiple models, the latest deep neural network models, and FPGA accelerator cards based on the latest chips. We do expect that the performance of our next-generation high-performance FPGA accelerator card will be 3 times that of the F10A.
Kyle: So, what you’re describing with TF2 on FPGA reminds me a lot of an Intel product we have, OpenVINO, in which we essentially try to do something similar. We target FPGAs as a deployment device, though it’s most frequently written about in conjunction with data-center-type deployments, wherein we take a TensorFlow model, compile it into an intermediate representation, and then run it using optimizations that let it run faster on CPU, GPU, etc. So can you tell us a little bit about how Inspur is working with Intel and what sorts of solutions your partnership with Intel has made available for your customers?
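For reference, a minimal sketch of the OpenVINO-style flow Kyle describes might look like the following: reading an already-converted intermediate representation (IR) and running it on a chosen device. The exact Python API varies between OpenVINO releases, and the file name and input shape here are placeholders.

```python
# Minimal sketch of an OpenVINO-style deployment flow (API details vary by release).
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")          # IR previously converted from a TensorFlow model
compiled = core.compile_model(model, "CPU")   # device could also be "GPU" or an FPGA target
request = compiled.create_infer_request()
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
results = request.infer({0: dummy_input})     # inputs keyed by index, name, or port
print(next(iter(results.values())).shape)
```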
Bob: Inspur has been partnering with Intel for over 20 years, delivering IA (Intel architecture) based products. Specifically in the AI space, we have both Intel Xeon and Intel Altera based solutions, such as our TensorFlow-supported FPGA compute acceleration engine, also known as TF2.
Emily: So Bob, for our listeners out there who want to learn more about Inspur’s FPGA solutions, where should they go for more information?
Bob: They can get information on our website – inspursystems.com.
Emily and Kyle: Great, thank you so much for joining us today, Bob. This is Emily Hudson, and this is Kyle Ambert, and we are Intel on AI.