Trying to get the Phi running under Ubuntu 12.04 LTS with kernel 3.5.0-26

As I reported in [1], I was able to get the Xeon Phi running on Ubuntu 11.10 with MPSS version 2.1.5889-14 and the latest kernel available for 11.10. After a complete crash of our cluster node we took the opportunity to upgrade to 12.04 LTS with kernel 3.5, and I spent two days trying to get the Phi running on this configuration.

The first problem when following my description in [1] is that the compilation in step 9

For compiling the kernel module go to /root/rpmbuild/SPECS. Here execute the command rpmbuild -bb intel-mic-kmod.spec. The sources are compiled and an RPM file is created. It is placed in /root/rpmbuild/RPMS/x86_64.

fails with several compile errors. They are caused by three changes in the Linux kernel header files:

  1. The tty_driver structure no longer has a minor_num field. It was removed because it was never used. The Intel software writes it at one place in linvcon.c but never reads the value, so I commented out this assignment.
  2. The field key of the poll_table structure was renamed to _key, so I changed every use of it in the files micscif_api.c and micscif_nm.c.
  3. The SYSTEM_SUSPEND_DISK state was removed. It is only checked in micscif_nm.c, and the only statement that follows it is a break, so I commented out those two lines.
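Before patching, it can help to check which of these three changes actually apply to the kernel you are building against. The following bash sketch greps a kernel header tree for the affected symbols. The function names are hypothetical, and the header locations (include/linux/tty_driver.h, poll.h and kernel.h) are assumptions for 3.x kernels; a textual grep is only a heuristic, not a compile test.

```shell
# report_symbol FILE SYMBOL LABEL: say whether SYMBOL occurs as a whole word
# in FILE (a rough textual check, not a compile test).
report_symbol() {
    local file="$1" symbol="$2" label="$3"
    if [ ! -f "$file" ]; then
        echo "$label: unknown ($file not found)"
    elif grep -qw -- "$symbol" "$file"; then
        echo "$label: defined"
    else
        echo "$label: not defined"
    fi
}

# check_kernel_api HEADERS: inspect the top of a kernel header tree
# (hypothetical helper; header paths assumed for 3.x kernels).
# The patch is needed where minor_num and SYSTEM_SUSPEND_DISK are
# "not defined" and _key is "defined".
check_kernel_api() {
    local hdr="$1/include/linux"
    report_symbol "$hdr/tty_driver.h" minor_num "tty_driver.minor_num"
    report_symbol "$hdr/poll.h" _key "poll_table._key"
    report_symbol "$hdr/kernel.h" SYSTEM_SUSPEND_DISK "SYSTEM_SUSPEND_DISK"
}
```

For example, check_kernel_api /lib/modules/$(uname -r)/build (assuming that symlink points at the installed headers) should, on the 3.5 kernel, report minor_num and SYSTEM_SUSPEND_DISK as not defined and _key as defined, i.e. all three changes are needed. A missing header file is reported as unknown rather than guessed.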

So I extracted the driver code from the tar archive into a directory named dirorg and copied these files to a new directory named dirpatch. There I applied the code changes described above. After that I created a second patch file with the diff command:
diff -uNr dirorg dirpatch > kernel35.patch
and added a step that applies it to the spec file. I also fixed an error in the original spec file: the name of the first (original) patch file was wrong. The files are available in Kernel35 Patches. If you use this spec file to install MPSS version 2.1.5889-14 on Ubuntu 11.10 as described in [1], you can skip step 7 there and ignore the original patch for the *.spec file (intel-mic-mpss21up1-kmodspecfile.patch).
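The patch round trip can be sketched in bash as follows, assuming GNU diff and patch. It must be run from the directory that contains both trees, so that the paths inside the patch begin with dirorg/ and dirpatch/; -p1 then strips that first component, which is the stripping a "%patchN -p1" line in the %prep section of a spec file requests. The function names are hypothetical, and the exact patch number used in the modified intel-mic-kmod.spec is not reproduced here.

```shell
# make_patch ORIG PATCHED OUT: write a unified, recursive diff that also
# covers added or removed files (-N). Run it next to both directories.
make_patch() {
    local rc=0
    diff -uNr "$1" "$2" > "$3" || rc=$?
    # diff returns 1 when the trees differ (the normal case) and 2 on errors.
    [ "$rc" -le 1 ]
}

# apply_patch SRC PATCH: apply PATCH inside the unpacked source tree SRC,
# stripping the leading dirorg/ or dirpatch/ path component (-p1).
apply_patch() {
    patch -d "$1" -p1 < "$2"
}
```

With both trees in the current directory, make_patch dirorg dirpatch kernel35.patch reproduces the command above; apply_patch on a fresh copy of the original sources is a quick way to confirm the patch applies cleanly before handing it to rpmbuild.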

With kernel35.patch, the original intel-mic-mpss21up1-kmod-2.1.4982.patch and the modified intel-mic-kmod.spec, I was able to compile the kmod kernel module and create the *.deb file following steps 8-10 from [1]. After finishing the installation with this self-made package, micctrl --status reports the Phi as ready, micctrl --initdefaults can be executed, and the ssh keys are copied.

But then came the disappointment. When starting the mpss daemon, everything behaves normally and the Phi reports online. After 14 seconds (time taken from the log file) the status changes to lost and the Phi reboots until it reaches the ready state. It stays there for a while, and then the whole system crashes. This behaviour is reproducible.

Nothing I tried changed this: not warm or cold restarts, and not a complete reinstallation of the software (including recompiling). To uninstall the Intel software I generated a list of all installed packages with:
dpkg --get-selections > installed-software
and then removed the Intel packages one after the other with
dpkg -r PACKAGE
dpkg --purge PACKAGE
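The manual removal can be scripted against the saved installed-software list. This bash sketch assumes that all Intel packages share the intel-mic prefix (check the list before relying on it); the function names and the DRY_RUN switch are hypothetical.

```shell
# intel_packages FILE: print the names of packages marked "install" in a
# "dpkg --get-selections" dump whose names start with intel-mic
# (the prefix is an assumption).
intel_packages() {
    awk '$2 == "install" && $1 ~ /^intel-mic/ { print $1 }' "$1"
}

# purge_intel_packages FILE: remove and then purge each package in turn,
# mirroring the manual dpkg -r / dpkg --purge sequence.
# With DRY_RUN=1 the commands are only printed, not executed.
purge_intel_packages() {
    local pkg
    for pkg in $(intel_packages "$1"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "dpkg -r $pkg && dpkg --purge $pkg"
        else
            dpkg -r "$pkg" && dpkg --purge "$pkg"
        fi
    done
}
```

Running DRY_RUN=1 purge_intel_packages installed-software first shows what would be removed. Removing packages one at a time can fail when another Intel package still depends on the one being removed; passing all names to a single dpkg --purge call should let dpkg sort out the order itself.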

As a last resort I tried the older Intel software version 2.1.4982-15, but with the same end result. Along the way there was also a conflict between the intel-mic-gdb and intel-mic-gpl packages, which write files to the same locations; I had to force dpkg to install them anyway.

After that I gave up on the newer Linux for the moment and went back to 11.10 with kernel 3.0.0-32, where it took only about 30 minutes to get the Phi running by following my own step-by-step manual from [1].

Sources:
[1] http://www.theismus.de/HPCBlog/?p=1
[2] http://lkml.indiana.edu/hypermail/linux/kernel/1203.0/01593.html
[3] http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/02974.html