Channel: Intel Developer Zone Blogs

The Best of Modern Code | October


Cryptographic Performance

Intel® Xeon® Scalable Processor Cryptographic Performance

Security doesn’t have to be slow. Learn about the cryptographic performance enhancements seen in the Intel® Xeon® Scalable processors.


Cells in the Cloud

Cells in the Cloud: Thoughts on the Distributed Architecture

Read about the simulation models being used in the BioDynaMo project, which is part of CERN* openlab.


Future Proof Code Optimizations

Modernizing Software with Future-Proof Code Optimizations

Learn how to exploit the full performance potential of Intel processors using the new Intel® Parallel Studio XE.


DeepMask Installation

DeepMask Installation and Annotation Format for a Satellite Imagery Project

The process of training our computers to recognize different objects in given images is complex. Abu Bakr describes the process of transfer learning.


Mode Collapse in GANs

Mode Collapse in GANs

Read about mode collapse: what it is, how it happens, and what effect it has on Generative Adversarial Network (GAN) architectures.


HPC DevCon

Register NOW for the Intel® HPC Developer Conference

Don't miss out on this free conference. You will experience two days of shared learning from industry luminaries, covering real-code optimization techniques and best practices, including hands-on experience and networking. Register now.
 
 

The Parallel Universe Issue #27: The Changing HPC Landscape Still Looks the Same


Celebrating 20 Years of OpenMP*

The OpenMP* application programming interface turns 20 this year. To celebrate, we tapped Michael Klemm (the current CEO of the OpenMP Architecture Review Board, or ARB) and some of his colleagues to give an overview of the newest features in the specification―particularly, enhancements to task-based parallelism and offloading computations to specialized accelerators.

Our feature article covers The Present and Future of the OpenMP API Specification, so I’ll say a little about its past. I half-jokingly refer to the early to mid-1990s as the bad old days of high-performance computing (HPC). There were many, many different parallel programming models and parallel architectures dotting a fast-changing HPC landscape. For distributed-memory architectures, there were low-level message-passing methods like SHMEM, higher-level methods like PVM or MPI, and even higher levels of abstraction with High Performance Fortran and Unified Parallel C. For shared-memory architectures, there were low-level threading methods like Pthreads or higher-level compiler-directed threading. One thing was clear: There were no magic compilers that could automatically parallelize real applications. Parallel compiler directives were the next best thing.

For those of us who remember parallel compiler directives before OpenMP, there were many vendor-specific sets to choose from (e.g., Cray, SGI, Intel, Kuck and Associates, Inc.), each doing the same thing but with different syntaxes. In exasperation, several large governmental HPC facilities demanded a unified syntax for parallel compiler directives.

OpenMP was born in 1997. Most of the original vendors are still on the ARB, and many more members have been added since (the ARB currently has 29 members). It remains the gold standard for portable, vendor-neutral parallel programming directives because it never lost sight of its original purpose.

Today, MPI and OpenMP cover most application requirements in HPC. There are still challenges. Memory subsystems are as unbalanced as ever, different processor architectures now commonly exist within the same system, and keeping data coherent among these different processing elements is an additional burden on the programmer. But MPI and OpenMP continue to evolve with these challenges, so the HPC future looks bright.

New Tools for Tuning Serial Performance

Parallelism is great, but would you parallelize code that has not been properly tuned? No, you wouldn’t. So this issue of The Parallel Universe also looks at tuning serial performance. My first supercomputer was a Cray X-MP, so I learned early the importance of vectorization. Vectorization Opportunities for Improved Performance with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) gives a good overview of tuning code with the new Intel AVX-512 instruction set and shows how to use these instructions to expose vectorization opportunities that were not previously possible. The new Intel® Advisor Roofline and Intel® VTune™ Amplifier Memory Analysis features help visualize performance optimization tradeoffs and how memory access is affecting an application’s performance. These features are demonstrated in Intel® Advisor Roofline Analysis and Identify Scalability Problems in Parallel Applications. We round out this issue with tips for optimizing general matrix-matrix multiplication operations in the Intel® Math Kernel Library (Reducing Packing Overhead in Matrix-Matrix Multiplication) and a brief overview of Intel software support for machine learning (Intel-Powered Deep Learning Frameworks).

Hello, I’m New Here

Finally, I’d like to introduce myself as the new editor of The Parallel Universe. I’ve been doing HPC since about 1990, but I was originally doing research in computational life science. Each successive research project required more computing power. To stay relevant, I had to learn about performance tuning and parallel programming. My academic background is in biochemistry and genetics, so I resented the intrusion of computer science into my scientific domain. But my initial resistance gave way to fascination when I saw how HPC could change my research and make it possible to answer new and bigger research questions. Hardware and software advances allow me to quickly run simulations on my laptop that once took days on a circa 1995 supercomputer. I used to dread the heterogeneous parallel computing future. Now, I welcome it with the same fascination I had as a young graduate student.

Read it >

Subscribe >

The Parallel Universe Issue #28: Parallel Languages, Language Extensions, and Application Frameworks


Back in the days of nonstandard programming languages and immature compilers, parallel computing as we know it today was still far over the horizon. It was still a niche topic, so practitioners were content with language extensions and libraries to express parallelism (e.g., OpenMP*, Intel® Threading Building Blocks (Intel® TBB), MPI*, pthreads*). Programming language design and parallel programming models were separate problems, so they continued along distinct research tracks for many years. These tracks would occasionally cross with varying degrees of success (e.g., High-Performance Fortran*, Unified Parallel C*), and there were frequent debates about whether the memory models of popular languages even allowed parallelism to be implemented safely. However, much was learned during this time of debate and experimentation.

Today, parallel computing is so ubiquitous that we’re beginning to see parallelism become a standard part of mainstream programming languages. This issue’s feature article, Parallel STL: Boosting Performance of C++ STL Code, gives an overview of the Parallel Standard Template Library in the upcoming C++ standard (C++17) and provides code samples illustrating its use.

Though it’s not a parallel language in and of itself, we’re still celebrating 20 years of OpenMP, the gold standard for portable, vendor-neutral parallel programming directives. In the last issue of The Parallel Universe, Michael Klemm (the current CEO of the OpenMP Architecture Review Board) gave an overview of the newest OpenMP features. In this issue, industry insider Rob Farber gives a retrospective look at OpenMP’s development and its modern usage in Happy 20th Birthday, OpenMP.

I rely on R for certain tasks, but I won’t lie to you: it’s not my favorite programming language. I would never have thought to use R for high-performance computing (HPC), but Drew Schmidt from the University of Tennessee, Knoxville makes the case for using this popular statistics language in HPC with R: The Basics. Drew’s article is helping to make an R believer out of me.

New Software for Machine Learning

There’s no denying that machine learning, and its perhaps-more-glamorous nephew, deep learning, are consuming a lot of computing cycles these days. Intel continues to add solutions to its already robust machine learning portfolio. The latest offering, BigDL, is designed to facilitate deep learning within big data environments. BigDL: Optimized Deep Learning on Apache Spark* will help you get started using this new framework. Solving Real-World Machine Learning Problems with the Intel® Data Analytics Acceleration Library (Intel® DAAL) walks through classification and clustering using this library. Two problems taken from the Kaggle predictive modeling and analytics platform are used to illustrate, and comparisons to Python* and R alternatives are shown.

Coming Attractions

Future issues of The Parallel Universe will contain articles on a wide range of topics. Stay tuned for articles on the Julia* programming language, working with containers in HPC, fast data compression for cloud and IoT applications, Intel® Cluster Checker, and much more.

Read it >

Subscribe >

The Parallel Universe #29: Old and New


Back in 1993, the institute where I was doing my postdoctoral research got access to a Cray C90* supercomputer. Competition for time on this system was so fierce that we were told―in no uncertain terms―that if our programs didn’t take advantage of the architecture, they should run elsewhere. The C90 was a vector processor, so we had to vectorize our code. Those of us who took the time to read the compiler reports and make the necessary code modifications saw striking performance gains. Though vectorization would eventually take a backseat to parallelization in the multicore era, this optimization technique remains important. In Vectorization Becomes Important―Again, Robert H. Dodds Jr. (professor emeritus at the University of Illinois) shows how to vectorize a real application using modern programming tools.

We continue our celebration of OpenMP*’s 20th birthday with a guest editorial from Bronis R. de Supinski, chief technology officer of Livermore Computing and the current chair of the OpenMP Language Committee. Bronis gives his take on the conception and evolution of OpenMP as well as its future direction in OpenMP Is Turning 20! Though 20 years old, OpenMP continues to evolve with modern architectures.

I used to be a Fortran zealot, before becoming a Perl* zealot, and now a Python* zealot. Recently, I had occasion to experiment with a new productivity language called Julia*. I recoded some of my time-consuming data wrangling applications from Python to Julia, maintaining a line-for-line translation as much as possible. The performance gains were startling, especially because these were not numerically intensive applications, where Julia is known to shine. They were string manipulation applications to prepare data sets for text mining. I’m not ready to forsake Python and its vast ecosystem just yet, but Julia definitely has my attention. Take a look at Julia*: A High-Level Language for Supercomputing for an overview of the language and its features.

This issue’s feature article, Tuning Autonomous Driving Using Intel® System Studio, illustrates how the tools in Intel System Studio give embedded systems and connected device developers an integrated development environment to build, debug, and tune performance and power usage. Continuing the theme of tuning edge applications, Building Fast Data Compression Code for Cloud and Edge Applications shows how to use the Intel® Integrated Performance Primitives (Intel® IPP) to speed data compression.

As I mentioned in the last issue of The Parallel Universe, R* is not my favorite language. It is useful, however, and Accelerating Linear Regression in R* with Intel® Data Analytics Acceleration Library (Intel® DAAL) shows how data analytics applications in R can take advantage of the Intel® Data Analytics Acceleration Library. Finally, in MySQL* Optimization with Intel® C++ Compiler, we round out this issue with a demonstration of how Interprocedural Optimization significantly improves the performance of another important application for data scientists, the MySQL database.

Coming Attractions

Future issues of The Parallel Universe will contain articles on a wide range of topics, including persistent memory, IoT development, the Intel® Advanced Vector Extensions (Intel® AVX) instruction set, and much more. Stay tuned!

Read it >

Subscribe >

DEVFEST: SEVEN SMART REASONS TO REGISTER


IoT Skill-Building, Mentoring, and More for Developers – Attend Intel Global IoT DevFest II, Nov. 7-8

The return of Intel’s premier online forum for IoT developers worldwide is only weeks away  – which means it’s time to reserve your place at this virtual conference now.

Intel Global IoT DevFest II will serve up a two-day feast of commercial IoT knowledge and inspiration, featuring more than 100 speakers from Intel and its technology partners, as well as other IoT visionaries. These IoT superstars will cover a wide range of commercial IoT-related topics that fall into four main subject tracks.

DevFest Tracks

The raw IoT expertise headlining DevFest is reason enough to attend.  But if you’re still on the fence, consider these seven additional reasons to register now:

Reasons to Attend  Intel Global IoT DevFest II

1.  DevFest keeps you current with all things IoT.  In case you missed the inaugural event in June, Intel Global IoT DevFest II will help you get current on the latest developments and trends.  DevFest provides a worldwide platform for industry thought leaders, innovators, professional developers, and enthusiasts to contribute their knowledge and insights, engage in deep-dive training, and showcase real-world usages of commercial IoT solutions in action.

2.  DevFest II is bigger and better.  Each day’s activities are extended to 16 hours, for a total of 32 hours – 8 a.m. to 12 a.m. UTC, November 7 and 8.  And that means more keynotes, more presenters, more demos, more participants, and more mentoring opportunities.  More of everything you expect from Intel and its developer program.  Our debut event last June drew participants from 94 countries to hear 46 talks from leading IoT experts; this one will be bigger yet.

Speakers

3.  Engage in 1:1 mentoring with the leaders in IoT development.  DevFest brings together star IoT innovators who lend their expertise, experience and skills for personal engagement with up-and-coming developers.  These mentoring relationships enrich our development gene pool to improve the overall quality of commercial IoT solutions coming to market.

4.  You don’t have to come to DevFest – it’s coming to you.  To be more inclusive for the worldwide IoT developer community, this next DevFest forum is expanded to accommodate more time zones.  Whatever your working hours, DevFest will provide the IoT expertise and content you need, when you need it.

5.  Connect and collaborate with peers.  DevFest is your opportunity to engage with fellow IoT developers in your areas of interest.  Share tips and tricks, compare ideas and methods, and grow collectively as an IoT development community.

6.  Intel hosts DevFest at no cost to you.  This commercial IoT-focused event is for passionate developers who share a love of creating  their own unique contribution to the Internet of Things.  Through online forums, as well as a vast array of tools, resources and training, Intel strives to nurture your IoT developer skills and talents to their fullest potential.  DevFest is our investment in you.

7.  DevFest is your voice in the IoT community.  DevFest provides you and other IoT developers a voice and opportunity to share projects, be inspired and perhaps even influence others.  Plan to join us as we share knowledge, tools and training to connect the unconnected and build a software-defined autonomous world.

Plan to Attend Intel® Global IoT DevFest II 

It’s almost here: Intel’s premier online forum for all things IoT returns Nov. 7 and 8.  Learn why you should plan to connect with your IoT developer peers worldwide, and meet some of the 100+ industry leaders who will be on hand to share their knowledge and vision. 

 Register now for Intel Global IoT DevFest II

Intel Game Developer Showcase comes back to Austin for year two!


About three weeks ago, Austin had its second Game Developer Showcase. This Intel-sponsored event took place at Brazos Hall and is one of the keystone events that kick off a week of game-dev activity centered around the Austin Games Conference (AGC). The atmosphere was electric with curiosity from the more than 200 developers onsite to network and play local PC games in various stages of development.

If you love good tacos you were also in luck!


Once the showcase opened, attendees picked up their AGC badges from us, allowing them to forgo the long lines at the convention center. What followed was an hour of networking and drinks (on Intel, of course) before the showcase officially started.

Our game dev community is the best


The point of the showcase is to give local and regional indie developers an opportunity to pitch their games to a panel of industry judges and compete for prizes and industry cred. This year we added a little something extra for the winners: each of them earned a spot in the Intel booth at AGC, allowing them to demo their game and increase its exposure and visibility. We had 8 developers give quick 5-minute pitches; then the judges and showcase attendees got to play each game. After the gameplay sessions, our evening concluded with the prize awards:


Our winning team: The Outpost Delta Team by Hidden Achievement

For those who couldn’t attend, we streamed the whole event on our Twitch channel, https://www.twitch.tv/collections/pNYwOu2b6BTULg. We hope to see you at our annual Intel Buzz Workshop next September in Austin!

For those in attendance, thank you for joining our Indie game dev event. We had just as much fun as you did and can’t wait to see what next year brings!

For more information on Intel Buzz Workshops and the Game Developer Showcase please check out https://software.intel.com/buzzworkshop


The Intel Team: Stephanie, David, Cindi, Phil, and Josh

The Fab Five: Game Developer Content | October


StudioCloud

Announcing the Partnership between Scalar and Intel

StudioCloud* by Scalar is a production pipeline for the media and entertainment industry, offering clients more flexibility and shorter-term solutions.


Primal Carnage

More Than Just a Pretty Game with Dinosaurs

It wasn’t just the dinosaur fights that attracted Pubgames* to Extinction but the opportunity to figure out new methods for making the game development economy work.


 CPU Vectorization

Use the Intel® SPMD Program Compiler for CPU Vectorization in Games

Find out how you can migrate highly vectorized GPU compute kernels to vectorized CPU code by using the Intel® SPMD Program Compiler.


Share Your Gameplay with Swing

Share Gameplay Your Way with Swing

Making the streaming process more cost-effective and efficient, Swing allows you to share your gameplay to any device, live or on-demand, as well as across social networks.


Amazing PC Game Bundles

The Intel® Software Distribution Hub Fuels Enthusiast PC Sales with Amazing PC Game Bundles

The new Intel® Software Distribution Hub allows hardware partners to create unique bundles and promotions and helps to reach more customers via Intel hardware channels.


Like this content?  Join the Intel Game Dev Program.  Miss the Fab Five last month?  Read it here. 

AR/VR Tools and Tech Meetup, Austin Sept 2017: Apple* ARKit and Google* ARCore


The AR/VR Tools and Tech meetup was formed in July in Austin, Texas and is focused on exploring all the newest tech tricks on a monthly basis. The September edition was all about Augmented Reality (AR). Apple and Google are both advancing strong into the next generation of AR development, and their new tools unlock a huge potential for creating cutting-edge experiences!

This September was a great opportunity to get involved in the Austin, Texas virtual reality (VR) scene. Our community loves to come together over pizza and beer to share tips, tricks, and hacks for emerging tech creation. Austin's recognition as one of the major tech cities in the country has spawned a lot of interest in creating gaming and non-gaming applications alike. For us, as Innovators, we get to share our experiences and our integral knowledge of cutting-edge tools for developers.

Intel® Software Innovator, Tim Porter, co-hosted the event. Tim is co-founder of Underminer Studios which has been in the emerging technology space for two years. His passionate and skilled team is focused on changing the perspective of how technology can solve real problems. His clients seek leading edge tech solutions with content creation, consulting, and ideation for entertainment, medical, enterprise, education, training, and other burgeoning fields. They are building the paradigms that are creating the future.

This was the first time Tim Porter had hosted a meetup, and he spoke to the crowd about what Intel is doing in the VR field as well as about the Intel® Software Innovator program. He talked about how the program has given him resources to push further into leading-edge technologies, letting him work toward solving problems through early innovation projects and through technical articles on an MR Configuration Tool and VR Optimization that spread that knowledge to other developers and push their capabilities too. He encouraged any developer to use Intel as a resource, especially those who want to learn from innovators and experts in their fields in a creative and supportive developer-focused environment.

The bulk of the event was dedicated to breaking down the differences, similarities, and changes that Apple's* ARKit and Google's* ARCore are bringing to the alternate reality space. The release of Apple's product has spurred a new wave of interest in how to use AR in enterprise, medical, education, and other ventures. There was also a speaker from Banjo who talked about a measuring tool they developed for ARKit and its applications.

ARKit and ARCore are the new competing platforms in the world of AR. These are mobile solutions that promise to do away with the $500-$1000 licenses that plague the industry at this point. They are fully embedded and functional solutions that take into account the hardware limitations and device-specific quirks inherent in a diversified marketplace.

This meetup taught the audience to utilize AR tools that are not quite ready for prime time within a production environment, showing the Austin area that beta tools can be usable. For a look at the presentation, check out this link. This monthly meeting is held at Capital Factory, and the latest topic is posted on the group's Meetup page toward the middle of each month so that the most relevant and interesting tools and tech get shared in good time.

At Underminer Studios, we are currently working on using AR/VR and other leading-edge tech in a new product that expands on IDEGO: Virtual Engagements and Moment AR and includes elements from REEF and the SenceBand wearable to open up new possibilities with a cohesive digital ecosystem that showcases an even better way to improve mental health. Leveraging tech will help users LEARN + FIX + SUSTAIN whole health, using alternative realities, empathy training, and biofeedback to identify emotions, educate caregivers, and treat each unique user with an evolving and personalized ecosystem.


Question: Does Software Actually Use New Instruction Sets?

[Figure: cover image for Jakob's blog]

Over time, Intel and other processor vendors add more and more instructions to the processors that power our phones, tablets, laptops, workstations, servers, and other computing devices. Adding instructions for particular compute tasks is a good way to gain processing efficiency without the need to make processor pipelines more complex or drive clock frequencies into unattainable territories. If a new instruction can replace a few old ones, you can easily get many times better performance on particular tasks. New instructions also provide entirely new functionality – for example, Intel® Software Guard Extensions (Intel® SGX) and Intel® Control-flow Enforcement Technology (Intel® CET).

A good question is how quickly and easily new instructions added to the Instruction-Set Architecture (ISA) reach users. Considering that our operating systems and programs are generally backwards-compatible and run on all kinds of hardware, can they actually take advantage of new instructions? In the old days, you did this by recompiling your software for a new architecture and adding checks to avoid running on an old machine where the software would break (“sorry, this program is not supported on this hardware”). 
I played around with my favorite virtual platform tool, Wind River® Simics®, to find out to what extent software today is capable of making use of newer instructions, while remaining compatible with older hardware. 

The Experimental Setup

To investigate whether software can adapt dynamically to newer hardware, I took our “generic PC” platform in Simics and ran it with two different processor models. One model was of an Intel® Core™ i7 first-generation processor (codename “Nehalem”), and the other model was of an Intel Core i7 sixth-generation processor (codename “Skylake”). 
The first-gen processor was launched in late 2008 – I actually got myself a PC with an Intel Core i7-970 processor back in early 2009—a wonderful machine fitted with three memory channels so I could cram 9GB of RAM in it. The sixth-gen processor was launched in mid-2015, roughly seven years later. 
I booted three different Linux* operating system (OS) setups on both hardware configurations:
  • Ubuntu* 16.04 with kernel 4.4, released in early 2016
  • Yocto* 1.8 with kernel 3.14, released in early 2014
  • Busybox* with kernel 2.6.39, released in 2011
The same disk image was used on old and new hardware – nothing was changed in the software stack between the runs. Only the virtual hardware configuration differed. I expected the newer Linux operating systems would make use of newer instructions on the newer hardware. Spoiler: they did, but sometimes in surprising ways. 
On each setup, Simics instrumentation counted the dynamic frequency of instructions (executed instruction count), grouped by opcode (assembly instruction name as used by Simics processors).  The Simics instrumentation is non-intrusive and does not introduce any changes to the execution of the target system. Since it operates at the hardware level, we can instrument right through the BIOS and OS kernel boot. The software running on the target system cannot tell the difference between instrumented and non-instrumented execution. Each setup was run for 60 virtual seconds, which was enough to get through BIOS and OS boot, and to a desktop or user prompt in all cases. At the end of each run, the 100 most common instructions were listed along with their frequency, and the data was exported to Excel for further processing and analysis. 
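The post-processing step is simple to sketch. Here is a minimal, hypothetical Python version, assuming the instrumentation yields a plain stream of opcode names (the real Simics instrumentation API is richer than this):

```python
from collections import Counter

def top_instructions(opcode_stream, n=100):
    """Count dynamic instruction frequency grouped by opcode and
    return the n most common, as in the experiment described above."""
    counts = Counter(opcode_stream)
    total = sum(counts.values())
    # Report each opcode with its share of all executed instructions.
    return [(op, cnt, cnt / total) for op, cnt in counts.most_common(n)]

# Toy trace standing in for a real instrumented run.
trace = ["mov", "mov", "add", "jmp", "mov", "cmp", "jne", "mov"]
for op, cnt, share in top_instructions(trace, n=3):
    print(f"{op:5s} {cnt:3d} {share:.0%}")
```

The same most-common-n table is what was exported to Excel for analysis.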

Investigating the Nature of the Hardware

The underlying assumption of this work is that software stacks can dynamically adapt the code that runs, based on the nature of the hardware. Thus, a single binary setup will potentially use different instructions on different hardware platforms. 
The key to such dynamic adaptation is to detect the nature of the hardware that the software is running on. Back when processor releases were few and far between, software might check if you had an Intel 80386 or 80486 processor, or a Motorola* 68020 or 68030 and adapt accordingly. Processors came out every few years, and the variety was limited. Today, there is more diversity, especially considering the huge installed base of a large variety of systems.  To deal with this, IA processors have the CPUID instruction. CPUID is a system in its own right, where numerous aspects of the hardware can be queried. 
You have probably seen information from the CPUID instruction without thinking about the source; every time a program tells you the type of your processor, it is based on CPUID output. For example, the Microsoft* Windows* 8.1 task manager shows the processor type and some of its characteristics – all of which it gets from CPUID:
[Figure: Microsoft Windows 8.1 Task Manager displaying processor information derived from CPUID]
On Linux, doing “cat /proc/cpuinfo” will show CPUID information for a much rawer insight into the processor, including flags showing the available processor features and instruction sets. Each instruction set addition gets its own flag or flags that software can use to determine the availability of features. For example, here is what you see with a 4th generation Core i5 processor:
[Figure: /proc/cpuinfo output for a 4th generation Intel Core i5 processor, including the feature flags]
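Those flags are easy to read programmatically. A small sketch that parses a captured /proc/cpuinfo fragment (the sample text is hypothetical and abbreviated, used instead of the live file so the snippet runs anywhere):

```python
def cpu_flags(cpuinfo_text):
    """Extract the feature-flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# Abbreviated, made-up fragment of the kind of output shown above.
sample = """\
model name : Intel(R) Core(TM) i5 CPU
flags      : fpu msr cx8 sep sse sse2 ssse3 sse4_1 sse4_2 avx avx2
"""
flags = cpu_flags(sample)
print("avx2" in flags)    # AVX2 is listed in the sample
print("sha_ni" in flags)  # SHA extensions are not
```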
CPUID will tell the software about the various instruction set extensions and hardware features available, but how does software actually use the flags to choose different code depending on the host? It seems unreasonable to pepper code with conditionals like “if instruction set X is available, then do this…” The code has to avoid checking the same information over and over again, since it is not going to change during a run. 
In the Linux kernel, the most common way to do this is to set up function pointers for different implementations of the same function, based on the available instruction sets.  A good example is found in arch/x86/crypto/sha1_ssse3_glue.c in the Linux kernel 4.13.5 (as indexed by http://elixir.free-electrons.com/linux/v4.13.5/source): 
[Figure: implementation-registration functions from arch/x86/crypto/sha1_ssse3_glue.c]
These functions check if the boot processor supports a particular instruction set, and they register the appropriate hashing functions accordingly. They are called in order of priority, to make sure the most efficient solution is used. The best solution for this particular case is apparently to have a processor that supports specialized SHA instructions, but if that is not available, the kernel falls back to AVX or SSE. 
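Outside the kernel, the same check-once, register-in-priority-order pattern is easy to mirror. A hypothetical Python analogue (the function names and flag strings are illustrative, not the kernel's actual API):

```python
def sha1_generic(data): return "generic"
def sha1_avx(data): return "avx"
def sha1_ni(data): return "sha-ni"

def register_sha1(flags):
    """Pick the best implementation once, in priority order,
    mirroring the kernel's function-pointer registration."""
    if "sha_ni" in flags:   # specialized SHA instructions win
        return sha1_ni
    if "avx" in flags:      # otherwise fall back to AVX
        return sha1_avx
    return sha1_generic     # last resort: plain code

# Bind once at startup; callers never re-check the flags.
sha1 = register_sha1({"sse2", "avx"})
print(sha1(b"hello"))  # -> avx
```

Because the binding happens once, callers pay no per-call cost for the feature check, which is exactly why the kernel prefers function pointers over repeated conditionals.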
With that in mind, it is time to run the code and see which instructions get used. 

Results

The graph below summarizes the results from six different runs (two types of processor cores each with three different operating system variants). It shows all instructions that occur more than 1% in any of the runs. “v1” means the software stack was run on a 1st generation Core i7, and “v6” means it was run on a 6th generation Core i7. 
[Figure: dynamic frequency of the most common instructions across the six runs]
The first conclusion is that most instructions are not particularly new, but rather the basics harkening back to the Intel 8086: moves, compares, jumps, and adds. Newer instructions are marked with the instruction set extension that they belong to, and there are only six of them in the top 28 instructions shown in the graph.
It is clear that there is a lot of variety between the software stacks, in addition to variation across processor generations. For example, the old Busybox setup uses the LEAVE instruction, which the others generally do not, and it uses far fewer POPs. However, that does not address whether software stacks take advantage of new instructions when they are available. The most interesting variation is between processor generations for the same software stack. 
Different runs could do different things. In this case, even though all runs are Linux boots, there are variations in the kernel configuration and the contents of the root file system.  Different distributions and kernel versions will be built with different compiler versions and different compiler settings. Thus, the executable code generation for the same source code might be different. There are many ways to generate machine code to achieve the same effects, and the way that is done changes over time, and it changes with different target machine optimization settings in the compiler. 
We do see some of that in the data: the Yocto setup is unique in using the ADCX, MULX, and ADOX instructions (from the ADX and BMI2 sets of instructions). This also shows the speed at which new instructions can get into software - these instructions were added in the fifth generation of Intel Core processors, which were released about the same time as that Linux kernel. When the processors came on the market, the software support was already there. Indeed, specs for new instructions usually come out well in advance of the hardware implementation (see an overview in this paper), and thus software support can be added early (quite often tested on virtual platforms modeling future hardware variants, so when the hardware comes out, it just works). 
However, the newer Ubuntu 16.04 setup does not use the ADX and BMI2 instructions, which indicates that it was built in a different way. It could be the compiler version, compiler flags, kernel flags, or the particular packages present on the disk image.  

Control Transfers

Another thing I checked was how common control transfer instructions were. The classic rule-of-thumb from Hennessy and Patterson was that one instruction in six was a jump. However, it seems more common in the Linux code base I tried here—about one instruction in five was a control transfer of some kind.  But for the Yocto stacks, it was more like one in six. Once again, there was more variation than one might expect. 
Dynamic ISA Control Transfers with Caption
Vector Instructions
When talking about new instruction sets, the most well-known category is probably single-instruction multiple-data (SIMD) or vector instructions. SIMD instructions have been around since Intel released the MMX instruction set with the Intel® Pentium® processor with MMX in 1997. Today, their presence is taken for granted. An MMX instruction almost made it into the “most popular” instructions graph presented above – just below the threshold was the PXOR instruction. After MMX came Intel® Streaming SIMD Extensions (SSE) in various generations, and most recently Intel® Advanced Vector Extensions (AVX), AVX2, and AVX-512. 
Since I did my investigation on operating system boots, I would not expect much usage of vector instructions. However, some five to six percent of the executed instructions were vector instructions. Grouping them by the particular instruction set variant they belong to, I got the following data:
Dynamic ISA SIMD
The first thing to note is that the Busybox build hardly uses vector instructions at all. The next is that as we go from v1 to v6 processors, the use of older instruction sets goes down and the use of newer ones goes up. In particular, there is a move toward AVX from the older SSE instruction sets. The 6th generation Core i7 supports AVX2, but it is not being used by these software stacks at all.

Simics Technology

As stated earlier, doing this in the Simics virtual platform was easy. Simics trivially accesses all instructions executed, across all processors in the target system (I used dual-core setups for the experiment, but it turned out the second core did nothing during the boot and at OS idle).  The boots were all fully automated, including selecting boot devices and logging into the target system at the end of the boot, so there was no manual intervention.  
I ran each test only once, as running it again would give the exact same results (since we are looking at a repeatable scenario, starting from the same point each time). Everything is repeatable unless intentionally changed, including aspects like the real-time clock setting for the target system (it is a parameter to the simulation and not taken from the outside world). 

Summary

It was enlightening to see how software stacks adapt to use newer instructions in newer processors. Today, we have software that is adaptive and will behave differently depending on the hardware it is running on, without changing any binaries; the adaptiveness is built right in. In all the cases I tried here, I used the same software stack for two different types of target systems, and saw them use different instructions depending on what was available on each target system (except for the case when the software stack was so old that it did not know about the features of the newer hardware). The study is an example of the kind of data collection that is easy to do in a simulator, but tricky to carry out in hardware. 
 

MeshCentral2 - Upcoming Beta 2


Just a quick note to mention that, likely within the next week or so, I will be posting MeshCentral2 Beta 2, the latest version of the web-based remote management site. I have not posted an update in a while because Beta 2 has some core changes that make it incompatible with Beta 1, so servers and agents will need to be completely re-installed when the new version comes out. To avoid breaking any existing Beta 1 servers before Beta 2 is ready, I am holding off on releasing updates on NPM. However, you can see the latest code changes on GitHub.

As you will see, some security improvements have been made, so the way certificates are created has changed and some of the data in the database is stored completely differently. Watch for an update about this within the next week or so.

Thanks,
Ylian
meshcommander.com

Intel® Black Belt Software Developers, Intel® Software Innovators, & Intel® Student Ambassadors: October 2017


Intel® Developers and Innovators were busy over the last month! Here’s an update on what the Intel® Software Innovators, Intel® Black Belt Software Developers, and Intel® Student Ambassadors were up to around the globe.

BLACK BELTS

Abhishek Nandy spoke about the basics of AI at AI 101 in Kolkata and Birbhum. He also wrote an article on The World of AI and Deep Learning, and an article on Setting up the New Intel DLSDK Tools Beta. Abhishek also presented at several Intel® IoT Commercial workshops across India and at the Intel® Nervana™ AI Academy seminar in Durgapur.

Gaston Hillar provided a private training first in Mendoza, Argentina and then in Santiago, Chile with 60 developers on the MQTT Protocols on Intel® Joule™ platform. Javier Caceres published an article reviewing artificial intelligence, machine learning, and deep learning documentation for new entrants to these topics.

Martin Foertsch & Thomas Endres showcased their Avatar telepresence system using the Nao* Robot, Oculus* Rift, Intel® RealSense™ camera, and Intel® IoT Gateway as well as Genuino 101* technology at the Javazone Conference. Marco Dal Pino and Massimo Bonanni supported the Intel® IoT Commercial Gateway workshop in Milan. 

INNOVATORS

Asia Pacific

BEKRAF Day Surabaya: Adrianus Yoza Aprilio talked about updates from Google* I/O including Android*, Android* Things, and Tensorflow and also gave a workshop on Smartcity. Frida Dwi Iswantoro shared how to start game development and created a workshop to build a game prototype.

Frida Dwi Iswantoro held a workshop on virtual reality at Camp 0274, a 2-day game developer camp. Adam Ardisasmita gave a keynote about VR and a workshop on how to create a VR game and port it to the Oculus* Rift and Google* Daydream at IPB IT Today and the Google* Developer Group. Achal Shah built an MNIST classifier and deployed it as a web application. At the Chatbot Roadshow in Medan and Jakarta, Adrianus Yoza Aprilio gave the introduction and spoke on how the intelligence behind smart chatbots works. At the Intel® Commercial IoT Workshop, Avirup Basu focused on commercial and industrial Internet of Things.

As Google* Women Techmakers Kolkata Chapter Lead, Manisha Biswas spoke at the Technology Awareness Programme organized by the Kolkata section of IEEE, starting with the basics of machine learning and following with a working demo on TensorFlow using the Intel® AI Cluster.

Mythili Vutukuru published and presented a paper on her libVNF project to develop high-performance scalable network functions using a DPDK-based network stack. Pablo Farias Navarro's Intro to VR Development course, which covers the development of a simple VR game with Unity, had 56 new enrollments, and his Virtual Reality Mini-Degree course, which covers the development of 15 VR games with Unity, had 94 new enrollments.

Pooja Baraskar gave a webinar on IoT Solutions on Microsoft* Azure and also hosted the Intel® Commercial IoT Meetup in Chennai where they introduced gateways to developers. Prajyot Mainkar talked about the Intel® Edison platform at the September Monthly Meetup in Panaji. Sanju Mathew is mentoring 15-year-olds through a Neural Networks and AI program that uses 7th-grade mathematics to show how relationships between numbers led to the concept of artificial neural networks. 

South America

Paulo Gurgel Pinheiro, the CEO of HOOBOX Robotics, was selected to represent Brazil at the University Startup World Cup 2017 in Copenhagen, Denmark. While there, Paulo demoed Wheelie at the High Tech Summit. They were one of 8 finalists selected from 20 countries to participate at the ‘Tesla Pitch Copenhagen’ event. Each finalist got to ride inside a Tesla with Joo Runge, the Head of Investor Relations of HippoCorn, and during the 10-minute ride had the chance to pitch their ideas to the investor.

At the Hyper VR Conference, Pedro Kayatt hosted a demo area with an HTC* Vive showing how virtual reality can expand from games to industrial training. Pedro showed how VR can be used in industry training at a full-day event at Votorantim Demo Day. At EducaSao 360, Pedro had a booth showing how VR can revolutionize the educational field, showcasing Dinos do Brasil as well as several new projects in this line, such as their “Future Lab”. At Semana de Biologia at UFPI, the biggest federal university in the region, Pedro was asked to talk about his Dinos do Brasil project and how technology can help spread paleontologists' research.

GDG DevFest Maceai: Nelson Glauber gave a presentation explaining how to get started with Kotlin for Android development and the main differences between Kotlin and Java. Suelen Goularte Carvalho spoke about Android* Instant Apps, and Ubiratan Soares gave a talk about testing automation practices for Android apps, using a pragmatic approach guided by a well-suited architecture. 

United States

Anthony Chow presented a demo at the Out of the Box Developer meetup in Santa Clara on a gRPC framework to support applications utilizing the Cache Allocation Technology of Intel's RDT. Chris Matthieu introduced his Computes IoT supercomputer project at the Phoenix Mobile Festival and demonstrated how an Arduino 101* could be controlled via a Raspberry* Pi while working on other machine learning computations.

Harsh Verma gave a lightning talk and demo on IoT and pedestrian safety at USDOT workshop. He also organized a joint talk on “What is Data Science and How is it Changing the Workplace” at the ACM Meetup in Sacramento. Harsh is also mentoring at the Sacramento New Technology School and provided a critique on the IoT sensors used and how to improve their applications in robotics.

Justin Lassen worked with the Intel® Software Video team to create the first interview episode of Innovators of Tomorrow. Kosta Popov introduced his Cappasity project at TechCrunch Disrupt SF. Macy Kuang spoke about VR design at GDG Berlin and also hosted the AndroidTO Conference. Nicolas St-Pierre spoke on Pay-as-you-Protect Network Protection: How On-Demand Virtualized Infrastructure Can Enhance Network Security at a joint webinar with Intel, Sandvine and Heavy Reading. As part of the Intel® Commercial IoT Workshops, Paul Langdon built a gateway using 3 different voice services (Amazon* Alexa SDK, Google* Assistant SDK, and Mycroft) on an Intel® NUC.

Peter Ma posted his Doctor Hazel project on Mesh; it was built at the TechCrunch Hackathon as an AI that detects skin cancer using the Intel® Movidius™ Neural Compute Stick (NCS), and received media coverage from Mobile Health News and TechCrunch. At the O'Reilly AI Conference SanFran DevJam, Peter gave a demo on Joule, RealSense, Walabot, and the NCS, with training and a basic understanding of AI for everyone around Vehicle Rear Vision. At the Strata Data Conference, Peter gave a basic demonstration of how he uses the Intel® Xeon Phi™ processor to train the AI and the NCS to display the results in real time, showcasing both the Vehicle Rear Vision and Doctor Hazel projects.

Rosemarie Day continued work on the Home Fingerprint project, starting to incorporate cameras that identify when someone has walked into a room, allowing data to be collected for each person as well as knowing whether someone is in the room. At the West Hartford Toastmasters meetup, Rose introduced the group to data mining their social media, and at the Commercial IoT Meetup they discussed Intel® IoT Gateway Technology with voice recognition and IoT automation using Amazon's* Alexa skills, Google, and Mycroft, all on an Intel® NUC or Raspberry* Pi. 

Europe

Alejandro Alcade created Hugo Similar Posts, which computes similar posts for his blog using scikit-learn k-means. Fabrizio Lapiello did an inspirational session on creating digital products for developers and startups at 012 Academy. He also gave a talk on cloud computing and IoT at the python.pizza meetup in Napoli, explaining different processes and applying them to the creation of pizza, such as temperature control, humidity, water pH checks, and more. Johnny Chan wrote articles on how to set up PyTorch and TensorFlow with Jupyter Notebook on the Intel® Nervana™ AI Academy cluster for deep learning.

Lorenzo Karavania helped build something focused on benefiting the local community, improving the living environment using an Arduino 101* and the Intel® IoT Developer Kit at ACM WomENcourage. Marco Spaziani Brunella gave a presentation on networking at the Intel NDZ in San Jose. Matteo Valoriani gave a demo of a mixed-reality experience where users can interact with fishes, turtles, and dolphins at Salone Nautica, and also supported the Intel® Commercial IoT Alliance workshop in Milan.

Michel Schloh gave an interview with a journalist from BTCManager, answering questions regarding hardware acceleration of cryptographic signatures, of particular relevance to cryptocurrency users, and especially to a hardware design project he is leading that builds on Intel designs like the TinyTILE, which integrates the Intel® Curie™ platform, in turn combining Quark SE, Bosch, and Nordic Semiconductor circuit logic.

Michele Tameni gave a demo on the Intel® IoT Gateway and Genuino 101* at the Intel® Commercial IoT Alliance Workshop. Roberto Diaz Morales' research paper, LIBIRWLS: A parallel IRWLS library for full and budgeted SVMs, has been accepted for publication in the international journal “Knowledge-Based Systems”. He also gave an overview of device and cross-device tracking using machine learning detection at the Barcelona TensorFlow Meetup. Salvino Fidacaro spoke at GDG Europe Extended Messina. Vu Pham spoke about AI at the Berlin Machine Learning Meetup. Zayen Chagra posted his Eko System project, a collection of IoT-based products that monitor an environment so that you can spend your time doing other things, knowing that the environment is safe and maintained. 

China

Hongbo Xiao spoke at both IN Time & IoT AI as well as Intel® AI ETE Tech Seminar. WeiHua Liu spoke at VR salon Wuhan 2017. Yaguang Wu spoke at VR solution forum in Shanghai. 

STUDENT AMBASSADORS

Intel® Student Ambassador Forum and DevJam in San Francisco: Andy Roslaes Elias, Nikhil Murthy, Pallab Paul, Panuwat Janwattanapong, and Srivgnessh Pss gave presentations during the student ambassador forum, took part in a panel formed of student ambassadors, and hosted a mini-poster session.

Alfred Ongere gave an interactive, hands-on session at Unravelling Artificial Intelligence. Carlos Paradis created a project called How Devs Mesh?. Chris Barsolai spoke at three university workshops: Intel® Deep Learning SDK Workshop at Taita Taveta University, Intel® AI Day at Kenyatta University, and AI Demystified 2.0 at Multimedia University. Christian Gabor met with fellow computer science students at OSU to create a machine learning club; they devised the club objectives, discussed projects, and mapped out a plan to develop applications for the upcoming terms. Daniel Theisges dos Santos submitted two projects: Obstacle Detection and Employee Voluntary Turnover.

David Ojika’s project, Edge Neural Network (ENN), proposes to design and build an AI software and hardware system with real-time inference capabilities for edge applications that exhibit high-speed, massive dataset characteristics wherein communicating with the cloud directly would be impractical or too expensive. Karandeep Singh Dhillon’s Cats vs. Dogs project uses deep learning binary classification for cats and dogs on a Tesla K80. Kaustav Tamuly’s Beating Atari using Autoencoder-Augmented NeuroEvolution draws a bit on the recent popularity of OpenAI beating DeepMind’s Atari AI, but focuses on evolutionary/genetic algorithms from a broader learning perspective.

Kshitiz Rimal’s Ka Learn project uses a deep neural network retrained from the Inception v3 model using TensorFlow, which classifies the Devanagari alphabet ‘Ka’ from other images or alphabets with 90% accuracy. He also wrote a blog post on Faster AI, a blog series on the tl;dr version of the Fast.ai part 1 course on deep learning. Kshitiz also gave a brief session on the usage and importance of AI in astronomy at the National Academy of Geo & Space Science (NAGSS) workshop on Amateur Astronomy, and gave training on getting started with deep learning at a 2-day Developer Sessions – Deep Learning Prerequisites Workshop.

Maaz Khan’s SMS Spam Classification project is a basic implementation of the bag-of-words model, normalizing the dataset and then applying the Naïve Bayes classifier algorithm. Maria Kaglinskaya shared the results of her sea lion counting project and information about Intel-optimized Caffe and the Student Ambassador program at the Russian Supercomputing Days conference. Peter Szinger wrote a blog post with an introduction to clustering and the K-means algorithm. Pallab Paul created an application, AI Calorie Counter, which uses machine learning to detect the amount of calories in a user’s food and subtracts that amount from the user’s food log.

Prajjwal Bhargava made Covnets, a digit recognition program using convolutional neural networks. Prajjwal also created some videos covering the nuts and bolts of deep learning and a blog post on batch normalization. Rashik Kotwal posted his DeepMammo project, which discusses CNNs and their application in breast tumor classification. Rashik also created a video demo of a GUI developed in Python for the ongoing research project at AMIIL. Rouzbeh Asghari Shirvani posted a project for traffic sign classification that will be used as one of the building blocks for self-driving cars. Shaury Baranwal has posted a stock price prediction project. Siddharth Nayak has written the code for a one-wheeled balancing robot as well as created the body of the robot, which included manufacturing all of the parts of the chassis.

Soubhik Das is developing a medical diagnostic engine wherein a patient will be told about the steps to undertake at each stage of the disease, including precautions. Srivgnessh Pss created a Deep Learning Facebook page which has over 2000 followers online. Suprabhat Das is working on his project, Carbon Footprint Analysis for a Better Tomorrow, which will help individual car owners track their emissions and make adjustments to their vehicle or their drives to reduce emissions. Ujjwal Upadhyay’s Identify Product Category project has trained a model covering 25 categories of food and beverages in his dataset. Vaibhav Patel posted his project on rice type classification using a CNN. Yash Akhauri wrote a blog post updating the progress on his early innovation project Art’Em, which explains the beginning phase of his artistic style transfer to virtual reality. Gilbert Kiprotich Kigen spoke at Intel AI Day at Kenyatta University, and also joined Ngesa Marvin in hosting the Intel® Deep Learning SDK Workshop in Kenya.

Want to learn more?

You can read about our innovator updates, get the full Innovator program overview, meet the innovators and learn more about innovator benefits. We also encourage you to learn more about our Black Belt Software Developer program as well as our Student Ambassador program. Also check out Developer Mesh to learn more about the various projects that our community of innovators are working on.

Interested in more information? Contact  Wendy Boswell on Twitter.

Detecting Video Memory Budget with Dynamic Video Memory Technology* (DVMT*)


Many graphics-intensive applications (especially games) require a minimum amount of video memory to run correctly, or to run at all. Dynamic Video Memory Technology* (DVMT*) is a method that dynamically allocates system memory for use as graphics memory. DVMT balances 2D and/or 3D graphics and system performance. Graphics memory is allocated based on system requirements and application demands (up to the configured maximum amount). When an application no longer needs memory, the dynamically allocated portion is returned to the operating system for other uses.

Intel recommends that developers use the following methods to detect the video memory budget that’s available to your application.

For Windows® 10, use the QueryVideoMemoryInfo() approach.

  • We recommend applications detect the video memory budget using a query to the QueryVideoMemoryInfo() method. This is the most accurate method and also gives the operating system the chance to limit this to what you’re actually budgeted for: 2 GB for all 4th generation and earlier Intel® Core™ processors, and roughly 90 percent of total memory divided by two for 5th generation and later Intel® Core™ processors.

For pre-Windows 10, use the DXGI_ADAPTER_DESC approach:

  • We recommend adding the DedicatedVideoMemory to the SharedSystemMemory (50 percent of the total memory) within the DXGI_ADAPTER_DESC structure. On discrete graphics cards, DedicatedVideoMemory is the amount of onboard video RAM (VRAM). For Intel integrated graphics processing units (GPUs), this represents 128 MB DVMT pre-allocation set in BIOS. Consult other vendors for how they handle this.

The QueryVideoMemoryInfo() approach explicitly defines the Budget value as a UINT64 in the DXGI_QUERY_VIDEO_MEMORY_INFO structure. With the DXGI_ADAPTER_DESC approach, Microsoft* stores the system memory information as a SIZE_T (defined under the hood as an unsigned long) in the DXGI_ADAPTER_DESC structure. This can be problematic. Depending on how the compiler treats UINT64 and SIZE_T, and whether your app is 32-bit or 64-bit, the difference between what the video memory budget appears to be and what it actually is could be upwards of 50 to 90 percent.

For more details from Microsoft on calculating graphics memory, see Calculating Graphics Memory.

Students at the Intel® HPC Developer Conference


Register Now for the HPC Dev Con
Developers: The Next Generation

We are making an investment in the next generation of developers. Amazing individuals with diverse interests are our future. From SnapChat* to CERN,* to SONAR,* young developers are already changing the world. Creating an environment where innovation and creation are first and foremost is key to nurturing these great minds. This is why I am asking you to join me at the Intel® HPC Developer Conference happening November 11-12 in Denver, Colorado. We are excited to be able to provide information, training, networking, resources, and support for the next generation of developers. 

Multiple Sessions for Learning

We are offering a wide variety of sessions and talks geared toward Parallel Programming, High Productivity Languages, Artificial Intelligence, Systems, Enterprise, Visualization Development, and more at the conference. As part of our ongoing support of the worldwide student developer community and the advancement of science, Intel has partnered with CERN through CERN openlab to sponsor the Intel® Modern Code Developer Challenge. Intel's partnership with CERN openlab around modern code is part of our continued commitment to education and to building the next generation of scientific coders who are using HPC, AI & IoT technologies to have a positive impact on people's lives across the world. We will announce the winner of the challenge at the Intel booth at the SC17 supercomputing conference.

Cern Students
In addition, Intel has several programs geared exclusively toward students. Our Intel® Student Ambassador program provides students with training, technology, and networking opportunities. Several of our ambassadors will be on-site at the conference to talk about their experiences and current projects. 

Don’t Miss Out

Register now for this free technical conference and great networking opportunity. You can catch me at “Meet the Experts” on Saturday, 5:30-7:30 p.m., to learn more about our opportunities and programs, or to share what inspires you.

About the Author

Michelle Chuaprasert
Michelle Chuaprasert is Director of Marketing in Intel’s Developer Relations Division, Software & Services Group. She holds an EE degree from Cornell University and an MBA in Marketing. Michelle’s passion is to stitch together the technical inspiration from our worldwide industry of unique individuals into stories and proof points that drive business results and make our world better.

Michelle has previous experience across rotation programs, Design Engineering, Platform Marketing, Applications Engineering, and as Tech Marketing Manager and Engineering Manager. She is the Diversity & Inclusion Champion for her Division and cherishes the innovation we all achieve by sharing our experience and unique points of view. Key philosophies include Management by Strengths and the principle that if you’re happy at work, you’ll do your best work.

  1. Registration subject to Terms and Conditions stated on the registration form. One free admission per unique registration.  

Trending on IoT: Our Most Popular Developer Stories for October


IoT Triangle

Global IoT DevFest Returns – Bigger & Better

This two-day virtual conference brings together IoT developers of all experience levels. They share their IoT journey, teach and learn in a variety of session topics, and increase their developer skills through one-to-one mentoring opportunities.


Face Recognition Application

How-to Build a Face Access Control Solution

Use this IoT reference implementation to learn to create a facial recognition application.


Computer Vision for IoT

Intel® Computer Vision SDK Developer Guide

Learn about the new SDK from Intel for development and optimization of computer vision and image processing pipelines for system-on-chips (SoC) from Intel.


 

Remote Device Authentication

How to Authenticate Remote Devices with the DE10-Nano Kit

This beginner-level tutorial shows you how to prepare and authenticate the DE10-Nano board, and then adapt the sample code for your own use.


Intel® System Studio 2018

Intel® System Studio 2018 Beta - Invitation and Product Overview

Move from prototype to product faster using optimizing compilers, highly tuned libraries, analyzers, and debug tools, along with custom workflows and code samples.


Intel System Studio 2018

Intel® System Studio 2018 Beta User Guide for C and C++

Take advantage of example projects to create IoT projects in C and C++ using Intel® System Studio.


IoT Java Development

Intel® System Studio 2018 Beta User Guide for Java*

Use this guide to create IoT projects in Java* using plugins for Eclipse* that allow you to connect to, update, and program on a compatible board.


Encryption

Create Key and Certificate Files for Encryption and Authentication

Create a set of key files and certificates that can be used to set up encryption and authentication for MQTT-TLS and HTTP-TLS connections.


OSI 7 layer

A Brief Exploration of the OSI 7 Layer Model

Learn about the traditional Open System Interconnection (OSI) 7 layer network model and how the TCP-IP family of IoT protocols work with the model.


MRAA and UPM

Video: Demonstrating MRAA and UPM Examples

Walk through examples that demonstrate the capabilities of the MRAA Library.


 

Intel® Developer Zone experts, Intel® Software Innovators, and Intel® Black Belt Software Developers contribute hundreds of helpful articles and blog posts every month. From code samples to how-to guides, we gather the most popular software developer stories in one place each month so you don’t miss a thing.  Miss last month?  Read it here. 

Intel IoT

Austin Unity Group Meetup, October 2017: Unity Engine and VR


With the Unite 2017 conference being held in Austin, Texas this year, it only made sense for the Austin Unity Meetup Group to hold a mixer the night before. The mixer was held at Capital Factory and drew a large group of developers from international locales. Artists, developers, educators, filmmakers, researchers, storytellers – anyone and everyone interested in creating with Unity was able to gain valuable insight and inspiration at this event. It also created some connections and excitement for the actual conference that took place at the Austin Convention Center downtown over the next couple of days.

Tim Porter and I, Alex Porter, of Underminer Studios are both Intel® Software Innovators, and we presented one of the featured VR demos at the event, showcasing our newest iteration of IDEGO: Virtual Engagements, Confronting Fear of Heights. IDEGO was envisioned to create a treatment option for those looking for alternatives to traditional therapy and medication for mental health. Virtual Engagements are a self-led process that uses the fundamental building blocks of cognitive therapies, supercharged by combining them with tech, gamification, and unprecedented accessibility. The process is a gradual progression through four levels, with challenges that become more realistic at each stage and a safeguard to escape to a calm world, allowing engagement at the user's pace.

The commercial availability and low cost of tech equipment now allows for access worldwide in the consumer market, so by targeting any device on all major platforms we can allow people to access treatment in private for free. Our content addresses issues such as fear and phobias to begin, pushing toward more comprehensive options including PTSD, Autism, Depression, Anxiety, and aging diseases and others geared more toward overall wellness.

The demo shown was “Confronting Fear of Heights”, which sends users through four level progressions. We used the MSI* VR One backpack and talked about the collaboration with Intel to create the backpack PC that stops everyone in their tracks. With the backpack, the user is essentially “wire free” and tethered only to themselves, so they can walk more freely to explore the environment, which enhances the impact on the vestibular system.

Level 1 is very cartoonish and the task is simply to look around; Level 2 is illustrative and the user is asked to look over the edge of the building; Level 3 is stylized and the user is on a plank between two buildings, again asked to look over the edge; and lastly, in Level 4 the user is on the roof of a very realistic high rise and is ultimately asked to step off the building. The goal is to let the user build coping mechanisms and overcome their fears, with a discernible moment that allows them to separate the real world from the virtual world and gain self-awareness of their own abilities.

We spoke directly to the developer attendees about how Intel and Unity are interfacing and pushing both gaming and non-gaming uses further. Unity and Intel are a strong collaborative force for developers: both companies want to promote and encourage use of the tools and the tech that they bring to the table. Unity has engaged Intel especially to push VR development and create a vibrant developer community.

Everyone that attended had a great time getting to know more about the Austin developer scene. There was a great reception to a non-gaming use for VR using Unity. The crowd was really excited to get to attend the conference in the following days and learn more about the latest updates and upcoming features from Unity.

Learn more about the Austin Unity Group:

Visit the meetup site for info on the next Austin Unity Group event.

Learn More About Underminer Studios: 

Based in Austin, Texas, Underminer Studios has been in the emerging technology space for 2 years. Our passionate and skilled team is focused on changing the perspective of how technology can solve real problems. Our clients seek leading edge tech solutions with content creation, consulting, and ideation for entertainment, medical, enterprise, education, training, and other burgeoning fields. We are building the paradigms that are creating the future. 


Art’Em – Artistic Style Transfer to Virtual Reality Week 7 Update


Art’Em is an application that hopes to bring artistic style transfer to virtual reality. It aims to increase stylization speed by using low precision networks.

In the last article, I delved into the basic proof of concept of multiplication by XNOR (Exclusive NOR) and the population count operation, along with some very elementary benchmarks. The rough potential of the binarized network's speed was established, but its actual implementation was not examined there. Today we shall delve into how I plan to implement the convolution function effectively, and the roadblocks expected along the way.

Let us begin by looking at the Intel® Xeon Phi™ x200 processor family, which supports Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions. These allow applications to pack 32 double precision and 64 single precision floating point operations per clock cycle within the 512-bit vectors! We do not need to look at the Fused Multiply Add (FMA) units here, because we will be utilizing intrinsics such as bitwise logical operators and population count. In the image below, one can clearly see how the Intel Instruction Set Architecture (ISA) has evolved to vectorize operations.
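Those per-cycle figures follow from the vector width combined with the FMA units; a quick sanity check of the arithmetic, with the two-port and two-ops-per-FMA figures being my assumptions matching public descriptions of the x200 core:

```python
VECTOR_BITS = 512   # width of one AVX-512 register
FMA_PORTS = 2       # assumed: two 512-bit FMA units per core
OPS_PER_FMA = 2     # one fused multiply-add counts as two FLOPs

def flops_per_cycle(element_bits):
    """Peak floating point operations per core per clock cycle."""
    lanes = VECTOR_BITS // element_bits
    return lanes * FMA_PORTS * OPS_PER_FMA

print(flops_per_cycle(64))  # double precision: 8 lanes -> 32
print(flops_per_cycle(32))  # single precision: 16 lanes -> 64
```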

Source: https://www.intel.in/content/www/in/en/architecture-and-technology/avx-512-animation.html

It is important to visualize how a convolution operation is done. We can then delve into the vectorization of code and strategies to parallelize feed forward propagation.

Note above that the X operator indicates normal convolution, whereas the encircled X indicates binary convolution. Here, a and b are the full precision floating point coefficients by which the binarized matrices are multiplied to recover the original matrices, while Ab and Bb are the binarized matrices themselves. The color coding represents the different coefficients associated with each kernel and weight.

This should sufficiently explain the basic idea behind convolution. While it is fairly easy to parallelize classical matrix multiply operations when lowering the precision of the network, things get a little tougher with convolution because of the overhead cost of packing every submatrix into a data type suitable for bitwise operations. We know from the last article that we need to take the XNOR and do a population count in order to avoid a full precision dot product.
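The XNOR-plus-popcount replacement for the +1/-1 dot product can be sketched in a few lines of Python; bit 1 encodes +1, bit 0 encodes -1, and the function names are mine:

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors packed as bit masks.
    XNOR marks the positions where the vectors agree, so
    dot = agreements - disagreements = 2 * popcount(xnor) - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # keep only n bits
    return 2 * popcount(xnor) - n

# (+1, +1, -1, +1) . (+1, -1, +1, +1) = 1 - 1 - 1 + 1 = 0
print(binary_dot(0b1011, 0b1101, 4))  # 0
```

On real hardware the same idea runs over wide registers with intrinsic popcount; this sketch only shows the arithmetic equivalence.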

Now we must undertake some architectural changes to maximize throughput. The most important requirement for this network is the ability to pack submatrices into data types for which the AVX-512 ISA provides bitwise logical intrinsics; Int32 and Int64 are feasible candidates. We will only consider square kernels here, so it is feasible to use 2^n x 2^n kernels where n is greater than 1. I say this because the smaller n is, the more kernel repetition can happen. When making layers with high depth, we must scale n accordingly.

The Intel Xeon Phi x200 processor supports Intel AVX-512 instructions for a wide variety of operations on 32- and 64-bit integer and floating-point data. This may be extended to 8- and 16-bit integers in future Intel Xeon processors. Thus we can expect to see much better support for XNOR-nets in the future.

Now that the kernel sizes have been discussed, let us see how a loop is vectorized across different ISAs.

Intel AVX-512 shows great potential for vectorization. I hope to pack 8 submatrices into one _m512i data type and run bitwise logical operators to speed up the convolution operation. One roadblock I am currently facing is that the Intel Xeon Phi x200 processors do not support the AVX-512 Vector Population Count Doubleword and Quadword (AVX512VPOPCNTDQ) instruction set, so the intrinsic _mm512_popcnt_epi32 cannot be used on the Xeon Phi. While I will try to implement another popcount function, it will be an observable bottleneck until the Knights Mill or Ice Lake processors are released. Another bottleneck would be parallelizing the bit packing of submatrices while the network is running.

The image above depicts the basic idea behind how the dot product of every binarized submatrix from the incoming weights will be vectorized. Notice that the depth is 8, which gives us 8x8x8 = 512 values, all of which are either 1 or -1. These will be packed into _m512 data type represented by Asub and Bd. We will then take the XOR of Asub and Bd and do a population count (PC).

Here, two things must be taken into consideration. We took the XNOR of the matrices in the example in the last article; however, there is an XOR intrinsic directly available to us. Thus we shall take the Exclusive OR (XOR) of Asub and Bd and adjust our PC (population count) function accordingly. Also keep in mind that the PC function here is not technically the true count of set bits, but rather the number of zero bits minus the number of set bits. Every submatrix is packed into an Int64 value, and the bitwise operation is vectorized by the intrinsics available to us.
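The adjusted PC just described (zeros minus ones of the XOR word) yields the same dot product without an XNOR step; a minimal sketch, with illustrative names:

```python
def adjusted_pc(xor_word, n):
    """Adjusted population count over the low n bits of an XOR word:
    (# zero bits) - (# set bits), which equals the +1/-1 dot product,
    since XOR marks exactly the disagreeing positions."""
    ones = bin(xor_word & ((1 << n) - 1)).count("1")
    return n - 2 * ones

# Identical vectors XOR to zero: every position agrees, dot = n.
print(adjusted_pc(0b0000, 4))  # 4
# Two disagreements out of four positions: dot = 2 - 2 = 0.
print(adjusted_pc(0b0110, 4))  # 0
```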

This works for a kernel size of 8x8; however, for a kernel size of 4x4 we will be loading the 16 bits into an Int32 data type. The other half of the 32 bits will have to be padded with values chosen so that they have no effect on the final result, and the PC (population count) function will have to be adjusted accordingly. These adjustments are very simple to make. We lose out on about half the speed-up potential by utilizing only 16 bits of the data type, but as mentioned before, we may have support for 8- and 16-bit integers in future Intel Xeon processors. This leaves greater potential for varying kernel sizes in the future.
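One way the 4x4 padding can be arranged, as an assumption of mine rather than the article's stated scheme, is to zero the unused high half in both operands: zero XOR zero contributes no set bits, so only the window length in the adjusted count changes. A sketch under that assumption:

```python
def dot_4x4(a16, b16):
    """+1/-1 dot product of two 4x4 kernels packed into the low 16 bits
    of a 32-bit word. The high 16 bits are zero in both operands, so
    they XOR to zero and drop out; only the window length (16) matters."""
    xor = (a16 ^ b16) & 0xFFFF
    ones = bin(xor).count("1")
    return 16 - 2 * ones

a = 0b1010101010101010
b = 0b1010101010101011  # differs from a in exactly one position
print(dot_4x4(a, b))    # 15 agreements - 1 disagreement = 14
```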

Parallelization of the matrix multiplication would be done similarly on GPUs, except that there is no need to load an aligned memory space for the _m512 data type. I am optimistic about a CUDA implementation because population count intrinsics are available there. Thus the only bottleneck would be packing submatrices into a data type for bitwise logical operations.

Now that we have discussed how parallelization is planned on the Intel Xeon Phi lineup and delved into the GPU implementation of convolution, we must think about a custom network suited for parallelization. In training the network on the Xeon Phi cluster, we will mostly aim for 4x4 and 8x8 kernel sizes. Training the network will be the last phase of the project, because XNOR-nets have been shown to reach a top-1 accuracy of about 43.3% on AlexNet, a relatively simple image recognition model. I am led to believe that image semantics will be understood reasonably well by such a network.

I hope that these considerations will allow me to create well vectorized code for convolutions and general matrix-to-matrix multiplication operations. If all goes well, I should have a well optimized convolutional kernel ready in a few weeks. Then integration with the backend of a framework such as Neon can be attempted. I am looking forward to implementing these strategies.

Intel® Developer Mesh: Editor’s Picks October 2017


Every month I pick out 5 projects from Intel® Developer Mesh that I find interesting and share them with you. There is a diverse array of projects on the site, so narrowing it down to just five can be difficult! I hope you’ll take a few minutes to find out why each of these projects caught my eye and then hop over to mesh to see what other projects interest you.

Driving Buddy

As people age they become concerned about their own driving; they want to keep their independence and continue being safe drivers. Intel® Software Innovator Geeta Chauan has designed Driving Buddy, which will help the elderly drive better during challenging conditions such as night time. By using computer vision and deep learning, along with the Intel® Movidius™ Neural Compute Stick (NCS), Geeta will help the elderly feel young again and empower them to take control of their lifestyle. Intel's NCS and edge devices will do the inferencing on the edge in real time, and by harnessing the power of deep learning on the edge, the elderly will be able to drive safely to all the places they need to go.

Cash Recognition for Visually Impaired

As an initiative for the Nepalese visually impaired community, Intel® Student Ambassador Kshitiz Rimal has designed an app that will help them recognize bank notes without any hassle. As you can imagine, it is quite difficult and stressful for visually impaired people to handle day-to-day monetary transactions, since they can't easily recognize bank notes. With this app, powered by deep learning technology, a user can hover their smartphone over a note and the application will recognize it and play audio, enabling the user to hear the value of the note. The current version will be for Nepalese currency and will be bilingual, with Nepali and English audio playback.

Eko System

Time is valuable, and people can have a hard time finding time to both work and live, especially with the added effort of taking care of children or pets. Intel® Software Innovator Zayen Chagra and his team came up with a solution they call Eko System, which has four categories of monitoring: Eko Aquarium, Eko Cage, Eko Cradle, and Eko Babycare. The idea is to create a collection of IoT-based products that monitor an ecosystem. For example, with the aquarium category, the device can monitor the temperature, oxygen level, feeding, water cleanliness, etc., and alert the user if anything is off. This way the user can spend less time worrying about their fish and only step in if the sensors show a need. Obviously the complexity goes up as the category changes from fish, to animal, to child, but the user can be less mentally burdened knowing that the temperature, air quality, humidity, and gas levels of the child's room are normal.

Facial Features and Voice Correlation Research

Is there a relationship between someone's facial features and their voice? This is the question that Intel® Student Ambassador Alfred Ongere is researching in his latest project. The aim of the research project is to determine whether there is a direct relationship between a human being's face and their voice, and if so, whether we can develop a system that could predict someone's facial features with a degree of accuracy based on their voice. Alfred plans to use deep learning in his research to understand the connection and its predictability.

Monolight

Intel® Software Innovator Michael Schloh proposes that, just as holding your hands under an automatic faucet becomes second nature, semi-autonomous home lighting will too, yielding comfort while conserving energy. Using a 5G sub-gigahertz connected IoT microcontroller for ambient light sensing, you can conserve energy and keep intruders away. The device can sense ambient light levels in a home or office, identify whether people are away or sleeping, and adjust the light level as necessary. With low cost for the device parts and manufacturing, low maintenance, and the additional benefits of vision health, comfortable living, and less energy waste, Monolight technology could soon become a normal and expected feature in our homes and offices.

Become a Member

Interested in getting your project featured? Join us at Intel® Developer Mesh today and become a member of our amazing community of developers.

If you want to know more about Intel® Developer Mesh or the Intel® Software Innovator Program, contact Wendy Boswell.

Intel At Money20/20: Designing The Future Of Secure Cyber Transactions With Hardware-based Security


Cybersecurity is fundamental to digital commerce; Intel hardware is a ubiquitous ingredient that powers more secure solutions.

Today is one of the most exciting times to be at the crossroads of technology and financial services. Leaders in the space are developing innovations that will positively shape and impact the global market, increasing the reach and flexibility of payments and transactions in general.

But a key challenge threatens these exciting developments: cybersecurity. It’s no secret that the future of money will depend on security, scalability, and convenience at every level — from the leaders in the financial industry to the consumers using these innovations.

To be in front of the most advanced challenges, Intel is building hardware-enabled protection capabilities directly into our silicon to help raise the security posture of every layer of the compute stack, reaching from the chip to the cloud. Integrated with partner solutions, Intel hardware helps deliver integrity, reliability, and consistency for billions of connected users and devices. This year, in collaboration with ecosystem partners, we’re focused on bringing innovative solutions to life for the financial sector.

Lenovo* and Intel deliver simpler and safer online authentication experiences: Today Lenovo announced that it is the first PC company to offer fingerprint readers integrated directly into Windows PCs for online authentication. This means that people now have a safer way of logging into websites like PayPal, Google, Stripe, Dropbox and Facebook, and it’s simpler, too — with the touch of a finger or a quick click of a button, in real-time. This is enabled through Intel® Online Connect, available on 7th and 8th Gen Intel® Core™ processors, along with integrated FIDO authenticators that support both Universal Authentication Framework (UAF) and Universal 2nd Factor (U2F).

Lenovo brings consumers a simpler, safer way to authenticate with new encrypted fingerprint readers on their latest devices. (Credit: Lenovo)

Touch-based banking with Bank of America: Bank of America announced today that it would begin implementing Intel® Online Connect technology into its online banking platform, giving customers added security when they bank online. Bank of America plans to incorporate the security feature into its online banking authentication process in 2018, and it will be the first financial services company to offer the technology to customers.

 "As online and mobile banking usage continues to grow, we’re focused on implementing the latest technologies that will give our customers the best possible user experience. Biometrics can help us achieve that goal, and we’re excited to work with Intel to bring added convenience to our more than 34 million digital banking customers." -- Michelle Moore, BofA Head of Digital Banking  

Bank of America will use Intel Online Connect to give consumers better authentication security when they bank online.  (Credit: Intel Corporation)

Intel collaborates with ecosystem partners to bring the power of blockchain to multiple use cases: SecureKey announced that it will enable consumers to access its next-generation, blockchain-based digital identity and attribute-sharing technology via traditional web browsers. The technology will make it easier for consumers to securely and privately verify identity using Intel® SGX-enabled laptop and desktop computers.

Owners of any cryptocurrency must protect their private keys against unauthorized usage in order to safeguard their digital assets. With Ledger’s unique solution, sensitive information will be secured within an Intel® SGX enclave rather than on the application, thus mitigating possible software attacks.

Additionally, Intel is helping address the enterprise requirements of blockchain. Intel is collaborating with AlphaPoint who is using Intel® SGX to include traditionally non-liquid assets in financial services use cases and increase the security of related transactions. 

Intel’s hardware-based technology is improving the privacy, scalability, and trust of blockchain technology.  (Credit: Intel Corporation)

Finally, Intel and Hewlett Packard Enterprise (HPE) are collaborating to deliver HPE platforms that support enterprise-grade blockchain workloads, using Intel® SGX for enhanced privacy, security, and resilience.

At Money20/20, Intel will demonstrate these and other hardware-based innovations for the financial industry. Visit us in Booth 1663 to see them for yourself, and join us for the following sessions where we will challenge the status quo in digital identity, authentication, and artificial intelligence:

Visit these sites to learn more about Intel Online Connect authentication and Intel SGX application and data protection.

 

You are welcome to follow Rick on LinkedIn and Twitter (@RJEche) for future insights, industry best practices, and discussions.

MeshCentral2 - Improved Crypto & ClickOnce


Today, MeshCentral2 is going Beta2 with many more improvements, new features, and improved stability. MeshCentral2 is a lightweight open source remote computer management web site. Marking this version as Beta2 broke compatibility with Beta1, so everyone will need to create new user accounts, create new meshes, and re-install MeshAgents. The compatibility break will be annoying for existing users, but it was necessary to move MeshCentral2 to the latest cryptographic algorithms. With improvements in both general computing and possibly quantum computers in the years to come, it's important that any product intended for long-term use employ strong cryptography.

Starting with MeshCentral2 Beta2, all hashing is done using SHA384 instead of SHA256. This means that all node identifiers, certificate signatures, binary update hashes, password hashes and more now use the longer and stronger hash function. This has a wide-ranging impact on MeshCentral2: pretty much everything in the database is now different, so it's best to make a clean break. In addition to hashing, certificates created by MeshCentral2 now use RSA3072 instead of RSA2048. You will notice a longer startup time for the server and agent on first run, as these stronger certificates take much longer to create. Lastly, browser cookies are now encrypted and integrity checked using AES256-GCM instead of AES128-CBC/HMAC-SHA256. Long term, these updates should make MeshCentral2 more resistant to the computers of the future.
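Because node identifiers are derived from hashes, the digest-size change alone explains why every identifier differs between Beta1 and Beta2. An illustrative Python sketch; the input bytes are made up and do not reflect MeshCentral's actual certificate encoding:

```python
import hashlib

# Illustrative input only; MeshCentral derives identifiers from
# certificate data, and the exact encoding is not shown here.
material = b"example-agent-certificate-bytes"

beta1_style = hashlib.sha256(material).hexdigest()  # 256-bit digest
beta2_style = hashlib.sha384(material).hexdigest()  # 384-bit digest

print(len(beta1_style), len(beta2_style))  # 64 96 (hex characters)
```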

Also this week, MeshCentral2 gains Microsoft ClickOnce support for RDP, PuTTY and WinSCP. Using this new feature, you can, in the right situation, launch a native application on your computer and connect to another computer over the Internet. MeshCentral2 relays all the traffic, even through routers and proxies. For example, when you click the new "RDP" link on the web site, a ClickOnce routing application is installed and launched. That routing application acts as a relay between the RDP client and MeshCentral2, which then relays the traffic to the right agent. Take a look at the new YouTube demonstration video on this topic.

Many thanks to Bryan Roe, who this week has been working like crazy on MeshAgent2; the changes are impressive and significant. MeshCentral2 is still in beta and should not be used in production environments.

Enjoy!
Ylian
http://www.meshcommander.com/meshcentral2

Microsoft ClickOnce demonstration: https://www.youtube.com/watch?v=--RCkWqJ-gI

 

MeshCommander - Firmware Loader


MeshCommander is a web-based Intel® AMT management console that is available in many versions, including as a standalone tool, as part of MicroLMS, and built into MeshCentral2. However, one of the most intriguing versions of MeshCommander is the one that can be loaded directly into the flash storage of Intel® AMT 11.5 and higher. This version of MeshCommander allows remote hardware management of a computer with nothing but a browser, making it super convenient for many applications. Never has it been easier to make use of Intel® AMT when you need it.

Today, I released the new MeshCommander firmware loader, which comes as a single Windows executable. You can get it on MeshCommander.com; it's super easy to use, and in less than a minute your Intel® AMT 11.5+ system will be upgraded with a powerful management console built right into the computer. Just log in using your favorite browser and start remotely managing your computer.

As if it could not already be easier, I have a YouTube video demonstration of this new tool. You can download the new tool here.

Enjoy!
Ylian

MeshCommander firmware loader is an easy to use Windows application.
It’s a single executable and in a few steps, you are done.

Using the MeshCommander firmware loader, you can replace the basic
Intel® AMT default web page with the powerful MeshCommander web application.

Even if it’s less than 60k, MeshCommander loaded into Intel® AMT packs quite a punch.
From hardware remote desktop to power control, all the basic features are present.

 
