44 lines
		
	
	
		
			1.3 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
		
		
			
		
	
	
			44 lines
		
	
	
		
			1.3 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| 
								 | 
							
								TODO before FFTW-$2\pi$:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* figure out how to autodetect NEON at runtime
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* figure out the arm cycle counter business
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* Wisdom: make it clear that it is specific to the exact fftw version
							 | 
						||
| 
								 | 
							
								  and configuration.  Report error codes when reading wisdom.  Maybe
							 | 
						||
| 
								 | 
							
								  have multiple system wisdom files, one per version?
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* DCT/DST codelets?  which kinds?
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* investigate the addition-chain trig computation
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* I can't believe that there isn't a closed form for the omega
							 | 
						||
| 
								 | 
							
								  array in Rader.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* convolution problem type(s)
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* Explore the idea of having n < 0 in tensors, possibly to mean
							 | 
						||
| 
								 | 
							
								  inverse DFT.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* better estimator: possibly, let "other" cost be coef * n, where
							 | 
						||
| 
								 | 
							
								  coef is a per-solver constant determined via some big numerical
							 | 
						||
| 
								 | 
							
								  optimization/fit.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* vector radix, multidimensional codelets
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* it may be a good idea to unify all those little loops that do
							 | 
						||
| 
								 | 
							
								  copying, (X[i], X[n-i]) <- (X[i] + X[n-i], X[i] - X[n-i]),
							 | 
						||
| 
								 | 
							
								  and multiplication of vectors by twiddle factors.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* Pruned FFTs (basically, a vecloop that skips zeros).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* Try FFTPACK-style back-and-forth (Stockham) FFT.  (We tried this a
							 | 
						||
| 
								 | 
							
								  few years ago and it was slower, but perhaps matters have changed.)
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* Generate assembly directly for more processors, or maybe fork gcc.  =)
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* ensure that threaded solvers generate (block_size % 4 == 0)
							 | 
						||
| 
								 | 
							
								  to allow SIMD to be used.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* memoize triggen.
							 |