Overclock.net banner

1 - 20 of 44 Posts

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #1
I swapped GPUs from AMD (in the X58 NAS) to my GTX 580. Nouveau works fine, but after installing nvidia, X won't start. I can get a prompt and log in (with ALT+F2) and try startx, but it complains "no screens found" and suggests I check the log file at /var/log/Xorg.0.log (paraphrased and cut down):

Code:

…
Module "ramdac" already built-in
NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the system's kernel log for additional error messages and consult the NVIDIA README for details.
No devices detected
fatal error: no screens found
Anyone seen errors like this before? If anyone has any suggestions, I'd really appreciate it. I'm running out of ideas and really don't want to reinstall the OS at this point.

More Details:
I installed nvidia and lib32-nvidia-utils, and after initial loading, I get a blinking cursor. So I tried chrooting into the installation from Arch install media (I'm running Antergos, which is just a GUI installer for Arch Linux) and reinstalling the drivers, then running nvidia-xconfig, and the issue persists.

So I tried nvidia-340xx and running nvidia-xconfig again, and I can get to "OK Reached target Graphical Interface" with a blinking cursor beneath. I chrooted and did the same again, same issue. I chrooted and reinstalled xorg, same issue.

I discovered I can get a basic login with ALT+F2 and so ran the diagnostics I mentioned above, trying to start X and so forth. I then tried reinstalling 340xx while logged in and restarted; same issue.

I then tried disabling the grub framebuffer by uncommenting GRUB_TERMINAL_OUTPUT=console. No change, though I didn't regenerate the GRUB config afterwards, so the setting may never have been applied.
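A change in /etc/default/grub only takes effect once the GRUB config is regenerated. A minimal sketch, assuming the usual Arch layout where the generated config lives at /boot/grub/grub.cfg; the helper name grub_console_set is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if GRUB_TERMINAL_OUTPUT=console is set
# (i.e. present and not commented out) in the given file.
# Defaults to /etc/default/grub when no file is passed.
grub_console_set() {
    grep -Eq '^[[:space:]]*GRUB_TERMINAL_OUTPUT="?console"?' "${1:-/etc/default/grub}"
}

# On the real system (as root), after editing /etc/default/grub:
#   grub_console_set && grub-mkconfig -o /boot/grub/grub.cfg
# The output path above is an assumption; adjust it if /boot is laid out differently.
```

The new setting is only picked up on the next boot after the config has been regenerated.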

When I run lspci -k | grep -A 2 -E "(VGA|3D)" it lists both nouveau and nvidia as kernel modules, so I wonder if nouveau is not being blacklisted as it should? I'm running out of ideas.

I guess I'll uninstall the AMD drivers, but I've read that's usually not necessary, for what it's worth.
 

·
 
Joined
·
29,532 Posts
You need to ensure that nouveau is blacklisted. If it shows up in lsmod it's not being blacklisted correctly.
It is likely that you'll need to include the blacklist file in /etc/mkinitcpio.conf, as the module may be loaded early in the boot process.
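A rough sketch of what that could look like on Arch. The file name blacklist-nouveau.conf and both helper names are assumptions for illustration; the FILES array is the one in /etc/mkinitcpio.conf:

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe blacklist file for nouveau.
# The target directory defaults to /etc/modprobe.d; the file name is an
# assumption, any *.conf name in that directory is read by modprobe.
write_nouveau_blacklist() {
    dir="${1:-/etc/modprobe.d}"
    printf 'blacklist nouveau\n' > "$dir/blacklist-nouveau.conf"
}

# Hypothetical helper: succeeds if nouveau appears as a module name in
# lsmod-style output read from stdin.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# On the real system (as root):
#   write_nouveau_blacklist
#   # add /etc/modprobe.d/blacklist-nouveau.conf to FILES=(...) in /etc/mkinitcpio.conf
#   mkinitcpio -p linux
#   # after a reboot:
#   lsmod | nouveau_loaded && echo "nouveau is still being loaded"
```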
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #3
Quote:
Originally Posted by gonX View Post

You need to ensure that nouveau is blacklisted. If it shows up in lsmod it's not being blacklisted correctly.
It is likely that you'll need to include the blacklist file in /etc/mkinitcpio.conf, as the module may be loaded early in the boot process.
Thank you for the suggestion (+rep).

I'll see if I can figure that out and edit this post with results.

Edit:
lsmod doesn't list nouveau, as far as I can tell (no nouveau, mesa, xf86-video-nouveau, lib32-mesa, or even nvidia). FWIW both nouveau and nvidia are listed when running mkinitcpio -M. Trying to load nvidia with modprobe nvidia-340xx yields a familiar error:

Code:

modprobe: FATAL: Module nvidia-340xx not found in directory /lib/modules/4.13.3-1-ARCH
Confirmed nouveau is blacklisted in .../modprobe.d/nvidia.conf.

Added nvidia to (empty) modules list in /etc/mkinitcpio.conf and ran mkinitcpio -p linux. No change.

I'm going to try blacklisting it anyway (when I figure it out), and will update again.
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #4
Double post, sorry.
 

·
 
Joined
·
29,532 Posts
The closed-source driver's module name is just nvidia. What happens if you modprobe that?

If you can't modprobe it please post the output of 'pacman -Qs nvidia'
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #6
Quote:
Originally Posted by gonX View Post

The closed-source driver's module name is just nvidia. What happens if you modprobe that?

If you can't modprobe it please post the output of 'pacman -Qs nvidia'
Code:

sudo modprobe nvidia
ERROR: could not insert 'nvidia': Exec format error
Code:

pacman -Qs nvidia
local/lib32-libvdpau 1.1.1-2
     Nvidia VDPAU library
local/lib32-nvidia-340xx-utils 340.104-2
     NVIDIA drivers utilities (32-bit)
local/libvdpau 1.1.1+3+ga21bf7a-1
     NVIDIA VDPAU library
local/nvidia-340xx 340.104-4
     NVIDIA drivers for linux, 340xx legacy branch
local/nvidia-340xx-utils 340.104-1
     NVIDIA drivers utilities
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #8
Quote:
Originally Posted by ltpdttcdft View Post

Use the current driver version for the Fermi and newer cards such as the GTX580. (version 384.xx)
340.xx is the legacy version for Tesla and older.
I was getting similar errors with the latest version, and so switched to 340 (reasoning that if the 400 series was legacy, perhaps Fermi in general has issues with the latest; Fermi is not legacy, my mistake).

Thank you. I'll try the latest drivers again if I can't get anywhere with this.

Edit: Switched back to the latest driver. Same behavior as before; after initial loading, black screen with blinking cursor. X errors etc. appear to be the same as with 340. Thank you for the idea nonetheless.
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #10
Quote:
Originally Posted by thestraw0039 View Post

Any chance you have an Intel cpu with an integrated igpu? https://wiki.archlinux.org/index.php/NVIDIA/Troubleshooting#X_fails_with_.22no_screens_found.22_with_Intel_iGPU
Nope. The i7 920 has no iGPU, I believe. I applied the fix anyway and reinstalled xorg. Now it no longer shows the same error when starting X, but it still refuses to work (the error message is odd, like the display is showing it incorrectly):

Code:

...
Log file: "/var/log/Xorg.0.log" Time: (snip)
Using config file: "/etc/X11/xorg.conf"
Using system config directory "/usr/share/X11/xorg.conf.d"
he log file at "/var/log/Xorg.0.log" for additional information. (EE) (EE) Server terminated with error (1). Closing log file. for help. (EE) Please also check t
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: server error
Rebooted and the "no screens found" error is back.

Xorg.0.log snippet:

Code:

"[168.031] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the system's kernel log for additional error messages and consult the NVIDIA README for details"
[168.031] (EE) No devices detected.
[168.031] (EE) 
Fatal server error:
[168.031] (EE) no screens found (EE)
[168.031] (EE)
Tried to add the nouveau blacklist to mkinitcpio.
Not sure if this is the correct procedure:
1. sudo nano /etc/modprobe.d/modprobe.conf
created a file that only reads:
blacklist nouveau

2. sudo nano /etc/mkinitcpio.conf
added to the FILES line:
/etc/modprobe.d/modprobe.conf

3. sudo mkinitcpio -p linux

No change.
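One way to check whether the file actually ended up in the image is to list the initramfs contents with lsinitcpio. A small sketch, assuming the default image path /boot/initramfs-linux.img for the linux preset; initramfs_has_file is a hypothetical helper, not part of mkinitcpio:

```shell
#!/bin/sh
# Hypothetical helper: reads a file listing on stdin (one path per line, as
# printed by lsinitcpio) and succeeds if it contains the given path.
# A leading "/" or "./" is ignored on both sides of the comparison.
initramfs_has_file() {
    want="${1#/}"
    sed 's|^\./||; s|^/||' | grep -Fxq -- "$want"
}

# On the real system (image path is an assumption):
#   lsinitcpio /boot/initramfs-linux.img \
#       | initramfs_has_file /etc/modprobe.d/modprobe.conf \
#       && echo "blacklist file is in the initramfs"
```

If the file is listed and nouveau still loads, the blacklist itself is probably not the problem.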
 

·
Registered
Joined
·
302 Posts
Some people have had success totally deleting the file /etc/X11/xorg.conf and letting X detect all your settings.

Just to be safe you could backup the file:

sudo cp /etc/X11/xorg.conf xorg.conf_backup
sudo rm /etc/X11/xorg.conf
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #12
Quote:
Originally Posted by thestraw0039 View Post

Some people have had success totally deleting the file /etc/X11/xorg.conf and letting X detect all your settings.

Just to be safe you could backup the file:

sudo cp /etc/X11/xorg.conf xorg.conf_backup
sudo rm /etc/X11/xorg.conf
Thank you for the suggestion (+rep). No change, unfortunately.

dmesg and journalctl both have frequent errors that look like this:
...nvidia: disagrees about version of symbol module_layout
After seeing this old post, I have to wonder if it's related.

Also tempted to try this.

Edit:
I unplugged the monitor and switched to a TV, then reinstalled nvidia. Now I get as far as "OK reached target Graphical Interface" before the blinking cursor. No change otherwise.

Apparently the symbol error is common:
https://bbs.archlinux.org/viewtopic.php?id=231401

I've also read references to it being a Linux header mismatch. Scratching my head as to how to proceed.
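The "disagrees about version of symbol module_layout" message generally means the module was built against a different kernel than the one currently running. A quick way to compare the two, sketched with a hypothetical helper check_vermagic; it assumes the first word of the module's vermagic string is the kernel release it was built for:

```shell
#!/bin/sh
# Hypothetical helper: prints "match" if the first word of a vermagic
# string equals the given kernel release, otherwise "mismatch".
check_vermagic() {
    running="$1"
    vermagic="$2"
    built_for="${vermagic%% *}"   # strip everything after the first space
    if [ "$built_for" = "$running" ]; then
        echo match
    else
        echo mismatch
    fi
}

# On the real system:
#   check_vermagic "$(uname -r)" "$(modinfo -F vermagic nvidia)"
```

A "mismatch" here would point to the driver package having been built for a newer (or older) kernel than the one that is booted.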
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #14
Quote:
Originally Posted by thestraw0039 View Post

Can you start xorg from terminal? If not I don't think that the second link will help.

I really feel like the system is trying to use something else instead of the Nvidia card. Read this and see if it helps.

https://www.x.org/wiki/FAQErrorMessages/#index8h2
No, but thank you for the suggestion. "sudo startx" produces the same error (from a previous post): (EE) Server terminated with error (1). Xorg.0.log shows "(EE) NVIDIA: Failed to initialize the NVIDIA kernel module."

That makes sense. Reading the site you linked now and will try the fixes. Thank you.
 

·
 
Joined
·
29,532 Posts
Check 'dmesg', it'll tell you more about the exec format error. It's likely that the module isn't compiled for your current kernel.

/edit: didn't see you already checked dmesg. Use nvidia-dkms and enable dkms.
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #16
Quote:
Originally Posted by thestraw0039 View Post

Can you start xorg from terminal? If not I don't think that the second link will help.

I really feel like the system is trying to use something else instead of the Nvidia card. Read this and see if it helps.

https://www.x.org/wiki/FAQErrorMessages/#index8h2
Nothing in the xorg FAQ helped (from that section anyway), but it was worth a shot. I checked for additional PCI devices and there were none, ran xorgcfg ("command not found"), and X -configure errors out:
"Number of created screens does not match number of detected devices.
Configuration failed.
(EE) Server terminated with error (2). Closing the log file."
Quote:
Originally Posted by gonX View Post

Check 'dmesg', it'll tell you more about the exec format error. It's likely that the module isn't compiled for your current kernel.

/edit: didn't see you already checked dmesg. Use nvidia-dkms and enable dkms.
Will check dmesg again and then try dkms. Thank you.

Edit: dmesg makes no mention of the exec format error, just lots of symbol errors, as far as I can tell.

After installing nvidia-dkms I get this error:
No kernel 4.13.3-1-ARCH headers. You must install them to use DKMS!
No kernel 4.13.12-1-ARCH modules. You must install them to use DKMS!

On reboot, "dkms status" shows nothing. Further, it looks like only nouveau is loaded.

I believe I'm using updated headers on the 4.13.3-1 kernel, but never could figure out how to update the kernel because of a ZFS dependency issue. Maybe I should roll back the headers to 4.13.3-1. Not sure. DKMS is hard to wrap my head around, to be honest.

I'm thinking at this point perhaps I should back up some important files, figure out how to update zfs-linux (not the DKMS version) and the rest of the system, then reinstall the GPU drivers when I'm fully up to date. But I'm open to suggestions.

Updating ZFS seems poorly documented, and it seems cyclical dependencies won't allow it: eg: the latest ZFS works with the latest kernel but it can't be updated because my current version of ZFS relies on the current kernel. The only option I'm seeing as far as updating ZFS is telling Pacman to ignore dependencies, but if someone has any familiarity with it, please advise.
 

·
 
Joined
·
29,532 Posts
You should probably use 'zfs-dkms' instead of a precompiled kernel. Using anything but 'linux' or 'linux-lts' is generally not recommended because it causes odd issues like these that can be hard to debug.

If the latest version of the kernel (package 'linux') doesn't work well with ZFS I suggest using 'linux-lts' instead.

It's important that the version of whatever kernel package and headers are matching, otherwise you will get exec format errors.
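A quick check for that, assuming the stock linux and linux-headers packages; same_versions is a hypothetical helper that reads pacman -Q style "name version" lines:

```shell
#!/bin/sh
# Hypothetical helper: reads "name version" lines on stdin (the format
# printed by `pacman -Q`) and succeeds only if at least one package is
# listed and all listed packages share exactly one version string.
same_versions() {
    awk 'NF >= 2 { seen[$2] = 1 }
         END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# On the real system:
#   pacman -Q linux linux-headers | same_versions \
#       || echo "kernel and headers versions differ"
#   uname -r    # the running kernel should also match after a reboot
```

For linux-lts the package names would be linux-lts and linux-lts-headers instead.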
 

·
Registered
Joined
·
996 Posts
Quote:
Updating ZFS seems poorly documented, and it seems cyclical dependencies won't allow it: eg: the latest ZFS works with the latest kernel but it can't be updated because my current version of ZFS relies on the current kernel. The only option I'm seeing as far as updating ZFS is telling Pacman to ignore dependencies, but if someone has any familiarity with it, please advise.
This issue is explained here.

So... I'm thinking you should apply the ZFS-related patch first, then install the nvidia-dkms package dependencies and then try installing it again
 

·
Linux Gamer
Joined
·
961 Posts
Discussion Starter #19
Quote:
Originally Posted by gonX View Post

You should probably use 'zfs-dkms' instead of a precompiled kernel. Using anything but 'linux' or 'linux-lts' is generally not recommended because it causes odd issues like these that can be hard to debug.

If the latest version of the kernel (package 'linux') doesn't work well with ZFS I suggest using 'linux-lts' instead.

It's important that the version of whatever kernel package and headers are matching, otherwise you will get exec format errors.
After I finish backing up, I'll switch to dkms. The git version looks upgradeable as well. Evidently 'zfs-linux' was not a great choice.

Thank you for clarifying. Still a lot I don't know about Linux.
Quote:
Originally Posted by Petrol View Post

This issue is explained here.

So... I'm thinking you should apply the ZFS-related patch first, then install the nvidia-dkms package dependencies and then try installing it again
By patch, do you mean zfs-dkms, or maybe I missed something? Thank you, I'll do that and update this post afterwards. A few more hours backing up and I should be able to try it.

Edit: backing up has taken a lot longer than expected. Won't be done until late tonight, and so may not be able to try zfs-dkms until tomorrow. I'm paranoid switching zfs versions could make some files corrupt, permanently inaccessible, etc., so need to be fully backed up.
 

·
 
Joined
·
29,532 Posts
The issue explained in Petrol's post seems to apply only when you're not using the official linux or linux-lts packages.
 