UNSOLVED veye_raspistill hangs up
-
I am working on an allsky camera using the Raspberry Pi with the imx327e module. I initially had this working for a couple of nights without error at the beginning of February which was fantastic and spurred me on. Recently though it invariably locks up after a few minutes or hours.
The allsky code runs veye_raspistill in a loop as fast as it can, which is about once per second because the exposure time is about 1 second.
When it fails I get:
$ ps -el | grep veye
0 S 1000 30964 30961 0 90 10 - 19941 futex_ ? 00:00:00 veye_raspistill
$ ps -aux | grep veye
pi 30961 0.0 0.0 1940 376 ? S 05:16 0:00 sh -c nice veye_raspistill --nopreview --output image.jpg --burst --mode 3 --quality 95
pi 30964 0.0 0.2 79764 2316 ? SNl 05:16 0:00 veye_raspistill --nopreview --output image.jpg --burst --mode 3 --quality 95
I have established that this semaphore wait in main() does complete:
vcos_semaphore_wait(&callback_data.complete_semaphore);
but the main() function does not return.
I cannot see why this is locking up, or whether it is unique to veye_raspistill or also happens with the standard raspistill.
Has anyone else seen this issue and can offer any assistance? -
Hi Roger,
This question has been accepted, and there is a customer in China who has a similar problem to yours.
We will arrange to solve it, but it will take some time because the problem is slow to recur. Do you have a faster way to reproduce it?
Thank you for your patience. -
@veye_xumm
It may be worse on the latest updated Buster release, but I am currently working from the December 2020 release and it still happens. If it helps, my code is at
https://github.com/bleara/allsky
This is still in development so some of the configuration is still to be fully done but it might help you with your replication of the problem, although it might also be a distraction. It needs the camera set in config.sh as CAMERA="RPi_VEYE" as auto will not work.
Some nights it will only run for an hour or two; on others, like last night, it ran for 12 hours.
I am now using a timeout like "timeout 10s veye_raspistill ...." but last night that hung too. -
I wonder if this could have anything to do with a file write error not being handled properly? In my ideal configuration the image file would be written directly to a mounted NFS drive, but currently it is written to the SD card first and then copied across, so I can't see why this would affect completion of the veye_raspistill command, which appears to lock on a 'futex'.
The system seems to be more reliable if no NFS drive is mapped and copied to but I have insufficient evidence to be sure.
I have also experimented with a timelapse option for veye_raspistill, but this stopped after 9999 files had been written (even using %08 in the filename), whereas I may want to write 30K to 40K files overnight.
I was expecting at least a 64K file limit and could have understood a 32K limit.
I need to try this again for repeatability and try with the official raspberry camera too. -
Hi, last night I had a failure using just the 32Gbyte SD card, which has 15G available. I got this syslog trace.
Is there a way of sharing the whole file with you as it gets flagged as spam if I put much here?
Here is a short extract:
Apr 7 04:15:58 allsky allsky.sh[661]: mmal: Splitter has 4 output port,you could use num 2,3 for extend
Apr 7 04:15:59 allsky allsky.sh[661]: Opening output file image.jpg
Apr 7 04:15:59 allsky allsky.sh[661]: mmal: camera_buffer_callback data len is 931322 file handle is ba5718
Apr 7 04:15:59 allsky allsky.sh[661]: mmal: camera_buffer_callback data len is 931074 file handle is ba5718
Apr 7 04:15:59 allsky allsky.sh[661]: mmal: Unable to write buffer to file - aborting 0 vs 931074
Apr 7 04:16:01 allsky kernel: [51999.788089] ------------[ cut here ]------------
Apr 7 04:16:01 allsky kernel: [51999.788135] WARNING: CPU: 2 PID: 8580 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
Apr 7 04:16:01 allsky kernel: [51999.788147] Firmware transaction timeout
Apr 7 04:16:01 allsky kernel: [51999.788158] Modules linked in: cmac bnep hci_uart btbcm bluetooth ecdh_generic ecc 8021q garp stp llc spidev brcmfmac brcmutil raspberrypi_hwmon sha256_generic i2c_mux_pinctrl i2c_mux cfg80211 bcm2835_codec(C) rfkill v4l2_mem2mem bcm2835_v4l2(C) videobuf2_vmalloc bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 snd_bcm2835(C) videobuf2_common snd_pcm snd_timer videodev snd mc vc_sm_cma(C) spi_bcm2835 i2c_bcm2835 uio_pdrv_genirq uio fixed i2c_dev ip_tables x_tables ipv6 -
camera_buffer_callback() should always post via vcos_semaphore_post(), even on a disk write error or a full disk, so the "complete = 1" line that is currently commented out in the code should not be.
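To illustrate the point, here is a minimal sketch of the pattern, not the actual veye_raspistill source. It assumes a POSIX semaphore (sem_t) as a stand-in for the VCOS one, and callback_data_t, buffer_callback() and simulate_capture() are hypothetical names made up for the example. The idea is that the write-error path sets complete just like the end-of-frame path does, so whoever waits on the semaphore is always woken.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the callback state; not the real VEYE struct. */
typedef struct {
    FILE *file_handle;        /* output file; NULL models a failed open */
    sem_t complete_semaphore; /* POSIX stand-in for the VCOS semaphore */
    int write_failed;         /* lets the waiter report the error */
} callback_data_t;

/* Handle one encoded buffer; frame_end is nonzero for the last buffer. */
void buffer_callback(callback_data_t *data, const unsigned char *buf,
                     size_t len, int frame_end)
{
    int complete = 0;
    size_t written = 0;

    assert(data != NULL);

    if (data->file_handle != NULL)
        written = fwrite(buf, 1, len, data->file_handle);

    if (written != len) {
        data->write_failed = 1;
        complete = 1; /* the assignment that must not be commented out */
    }
    if (frame_end)
        complete = 1;

    /* Wake the waiter on success and on failure alike. */
    if (complete)
        sem_post(&data->complete_semaphore);
}

/* Feed one buffer through the callback and report whether a waiter
 * would have been woken: 1 if yes, 0 if no, -1 if setup failed. */
int simulate_capture(FILE *out, int frame_end)
{
    callback_data_t data;
    unsigned char frame[16] = {0};
    int woke;

    data.file_handle = out;
    data.write_failed = 0;
    if (sem_init(&data.complete_semaphore, 0, 0) != 0)
        return -1;

    buffer_callback(&data, frame, sizeof frame, frame_end);

    /* sem_trywait succeeds only if the callback posted. */
    woke = (sem_trywait(&data.complete_semaphore) == 0);
    sem_destroy(&data.complete_semaphore);
    return woke;
}
```

Calling simulate_capture(NULL, 0) models a failed write in the middle of a frame. With the complete = 1 assignment in place the waiter is woken; with it commented out, nothing posts and the waiter blocks forever.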
-
@roger said in veye_raspistill hangs up:
vcos_semaphore_post
Thanks for sharing, and I'm really sorry about this bug. Thanks again.
-
With this fix it seems more reliable, but it did crash again last night, leaving a veye_raspistill zombie. Short stack trace:
Apr 11 23:31:21 allsky kernel: [22053.109872] WARNING: CPU: 3 PID: 20358 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
Apr 11 23:31:21 allsky kernel: [22053.109884] Firmware transaction timeout
Apr 11 23:31:21 allsky kernel: [22053.109895] Modules linked in: rpcsec_gss_krb5 8021q garp stp llc spidev brcmfmac brcmutil sha256_generic i2c_mux_pinctrl i2c_mux raspberrypi_hwmon cfg80211 rfkill bcm2835_codec(C) v4l2_mem2mem bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_vmalloc snd_bcm2835(C) videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common snd_pcm snd_timer i2c_bcm2835 snd spi_bcm2835 videodev mc vc_sm_cma(C) uio_pdrv_genirq uio fixed i2c_dev ip_tables x_tables ipv6
Apr 11 23:31:21 allsky kernel: [22053.110362] CPU: 3 PID: 20358 Comm: kworker/3:0 Tainted: G C 5.10.17-v7+ #1403
Apr 11 23:31:21 allsky kernel: [22053.110369] Hardware name: BCM2835
Apr 11 23:31:21 allsky kernel: [22053.110386] Workqueue: events dbs_work_handler