Digital filter implementation suitable for execution, together with application code, on a same processor
Abstract
A filter is implemented in software on a general purpose processor in a manner which reduces the number of memory accesses as compared to conventional methods. In some realizations, both application code and filter code are executed on a same general purpose processor. The filter code incrementally loads respective portions of input and coefficient vector data from addressable storage into respective registers of the processor and performs successive operations thereupon to accumulate output vector data into other respective registers of the processor. The filter code typically exhibits an execution ratio of less than two input and coefficient data loads per operation to accumulate. In some realizations, the filter code is callable from the application code and provides the application code with a signal processing facility without use of a digital signal processor (DSP).
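The load-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the patented schedule: it assumes a simplified sliding-window scheme in which a block of accumulators and a window of input samples stand in for registers, and each loaded coefficient and input sample is reused across the whole block. The function names, the `block` parameter, and the correlation-style indexing (coefficients assumed stored so that y[k] = sum over n of c[n] * x[k + n]) are all illustrative assumptions.

```python
def fir_naive(x, c, num_out):
    """Direct form: each multiply-accumulate loads one input and one coefficient."""
    loads = 0
    y = []
    for k in range(num_out):
        acc = 0
        for n in range(len(c)):
            xv = x[k + n]   # memory load (input)
            cv = c[n]       # memory load (coefficient)
            loads += 2
            acc += cv * xv
        y.append(acc)
    return y, loads


def fir_blocked(x, c, num_out, block=4):
    """Assumed schedule: `block` accumulators share each loaded value."""
    loads = 0
    y = []
    for k0 in range(0, num_out, block):
        b = min(block, num_out - k0)        # size of this output block
        acc = [0] * b                       # stands in for accumulator registers
        window = list(x[k0:k0 + b - 1])     # preload b - 1 inputs
        loads += b - 1
        for n in range(len(c)):
            cv = c[n]                       # one coefficient load per tap
            window.append(x[k0 + n + b - 1])  # one new input load per tap
            loads += 2
            for j in range(b):              # b multiply-accumulates reuse both loads
                acc[j] += cv * window[j]
            window.pop(0)                   # oldest input is no longer needed
        y.extend(acc)
    return y, loads
```

With 16 outputs, 4 taps and a block of 4, the naive loop performs 128 loads for 64 multiply-accumulate operations, while the blocked loop performs 44 loads for the same 64 operations, which is below two loads per operation. The input vector must hold at least `num_out + len(c) - 1` samples.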
36 Claims
1. An apparatus comprising:
a general purpose processor having general purpose registers;
addressable memory coupled to the general purpose processor for storing input, coefficient and output vector data; and
software instructions executable on the general purpose processor and including a discrete-time filter implementation to incrementally load respective portions of the input and coefficient vector data into first and second sets of the general purpose registers and operate thereupon to accumulate the output vector data into a third set of the general purpose registers without use of a digital signal processor.
2. The apparatus of claim 1,
wherein memory access overhead for any single one of the incremental loads is amortized over multiple of the accumulations of the output vector data.
3. The apparatus of claim 1,
wherein the discrete-time filter implementation exhibits an execution ratio of less than two of the incremental loads per operation to accumulate.
4. The apparatus of claim 1,
wherein the discrete-time filter includes a Finite Impulse Response (FIR) filter.
5. The apparatus of claim 1,
wherein the operation upon respective portions of the input and coefficient vector data in first and second sets of the general purpose registers includes execution of successive multiply-accumulate operations.
6. The apparatus of claim 1,
wherein the operation upon respective portions of the input and coefficient vector data in first and second sets of the general purpose registers includes execution of successive multiply and accumulate operations.
7. The apparatus of claim 1,
wherein the signal processing functions at least partially implement a modem.
8. The apparatus of claim 1,
wherein the general purpose processor is a RISC processor.
9. The apparatus of claim 1,
wherein the general purpose processor provides a scalar multiply-accumulate instruction.
10. The apparatus of claim 1,
wherein only a partial portion of the input vector data is represented in the general purpose registers at any given time; and
wherein additional portions of at least the input vector data are loaded from the addressable memory into respective ones of the general purpose registers under control of the discrete-time filter implementation.
11. The apparatus of claim 1,
wherein the general purpose registers of the first, second and third sets are all allocated from an architecturally-defined set of registers available to a computational thread that executes the software instructions.
12. The apparatus of claim 11,
wherein the first, second and third sets each number 8 and the architecturally-defined set of available registers number at least 24.
13. The apparatus of claim 1,
wherein the software instructions of the discrete-time filter implementation are executable on the general purpose processor without use of an instruction that performs a multiply-accumulate operation and delay line shift in a single-cycle.
14. The apparatus of claim 1,
wherein the software instructions of the discrete-time filter implementation are executable on the general purpose processor without use of a vector multiply-accumulate operation.
15. The apparatus of claim 1,
wherein the software instructions of the discrete-time filter implementation are executable on the general purpose processor without use of a vector addressing facility.
16. The apparatus of claim 1,
wherein the general purpose processor does not provide an instruction that performs a multiply-accumulate operation and delay line shift in a single-cycle.
17. The apparatus of claim 1,
wherein the general purpose processor does not provide a vector multiply-accumulate operation.
18. The apparatus of claim 1,
wherein the general purpose processor does not provide a vector addressing facility.
19. The apparatus of claim 1,
wherein application software instructions are also executable on the general purpose processor.
20. The apparatus of claim 1, embodied as one or more of:
a personal digital assistant;
a portable computer; and
a phone.
21. The apparatus of claim 1, wherein the software instructions implement:
signal processing functions based on the discrete-time filter implementation; and
application functions, wherein the software instructions that implement the signal processing functions and those that implement the application functions are both executable on the general purpose processor.
22. The apparatus of claim 21,
wherein the signal processing functions at least partially implement a modem.
23. The apparatus of claim 21, further comprising:
a communications interface coupled between a communications medium and the general purpose processor, wherein the input vector data corresponds to a signal received via the communications interface.
24. The apparatus of claim 23,
wherein the communications medium includes a phone line and the communications interface includes an analog-to-digital conversion.
25. The apparatus of claim 1, further comprising:
signal processing structures at least partially implemented by the discrete-time filter implementation, the signal processing structures including one or more of an interpolator, an echo canceller, and an equalizer.
26. The apparatus of claim 25,
wherein the signal processing structures at least partially implement one or more of a telephony feature, modem feature, answering machine feature, voice or data compression feature, and speech recognition system feature of the apparatus.
27. The apparatus of claim 1, further comprising:
receive path and transmit path signal processing structures at least partially implemented by the discrete-time filter implementation.
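As a hedged illustration of how one of the signal processing structures named in claim 25 might sit on top of a discrete-time filter, the sketch below builds a factor-of-two linear interpolator from a three-tap FIR filter. The function names, the zero-stuffing approach, and the triangular kernel [0.5, 1.0, 0.5] are assumptions chosen for the example and do not come from the claims.

```python
def fir(x, c):
    """Plain FIR over the fully overlapping region: y[k] = sum_n c[n] * x[k + n]."""
    n_taps = len(c)
    return [sum(c[n] * x[k + n] for n in range(n_taps))
            for k in range(len(x) - n_taps + 1)]


def interpolate_by_2(samples):
    """Upsample by two: insert zeros between samples, then smooth with the FIR."""
    stuffed = []
    for s in samples:
        stuffed.extend([s, 0.0])   # zero-stuffing
    stuffed = stuffed[:-1]         # no new point is needed after the last sample
    padded = [0.0] + stuffed + [0.0]
    # A triangular kernel turns each inserted zero into the mean of its neighbours.
    return fir(padded, [0.5, 1.0, 0.5])
```

For example, `interpolate_by_2([1.0, 3.0, 5.0])` returns `[1.0, 2.0, 3.0, 4.0, 5.0]`. Application code running on the same processor could call such a routine directly, in the way claim 21 describes signal processing and application functions sharing one general purpose processor.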
28. A method of providing a signal processing facility in a computing device without use of a digital signal processor (DSP), the method comprising:
executing both application code and FIR filter code on a same processor;
the FIR filter code incrementally loading respective portions of input and coefficient vector data into respective registers of the processor and performing successive operations thereupon to accumulate output vector data into other respective registers of the processor;
the FIR filter code exhibiting an execution ratio of less than two input and coefficient data loads per operation to accumulate.
29. The method of claim 28,
wherein the operations to accumulate include successive scalar multiply-accumulate operations.
30. The method of claim 28,
wherein L1 of the registers are allocated to the respective portions of the output vector data, L2 of the registers are allocated to the respective portions of the input vector data, and L2 of the registers are allocated to the respective portions of the coefficient vector data;
wherein the input and coefficient vector data loads number no more than approximately
per KN scalar multiply-accumulate operations, where K is the number of elements in the output vector and N is the number of taps of the FIR filter.
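The bound recited in claim 30 is not reproduced in this text, so the sketch below does not attempt to restate it. Instead it counts loads for one assumed schedule, using the claim's symbols K (output elements) and N (taps) and treating L1 as the number of accumulator registers. The assumed schedule preloads L1 - 1 inputs per block of L1 outputs, then loads one coefficient and one new input per tap. The function name and the schedule are illustrative assumptions only.

```python
def blocked_load_count(K, N, L1):
    """Return (loads, macs) for the assumed sliding-window schedule."""
    full_blocks, remainder = divmod(K, L1)
    # Each full block: L1 - 1 preloaded inputs, then 2 loads for each of N taps.
    loads = full_blocks * ((L1 - 1) + 2 * N)
    if remainder:
        # A final partial block of `remainder` outputs follows the same pattern.
        loads += (remainder - 1) + 2 * N
    return loads, K * N
```

Under these assumptions, K = 64, N = 32 and L1 = 8 give 568 loads for 2048 multiply-accumulate operations, roughly 0.28 loads per operation, against 2 loads per operation when every operand is reloaded.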
31. An apparatus comprising:
a processor;
application code including instructions executable by the processor;
FIR filter code including instructions executable by the processor to load respective portions of input and coefficient vector data from addressable storage into respective registers of the processor and to operate thereupon to accumulate output vector data into other respective registers of the processor, wherein the FIR filter code is callable from the application code and provides the application code with a signal processing facility without use of a digital signal processor (DSP).
32. The apparatus of claim 31,
wherein memory access overhead for any single one of the loads is amortized over multiple of the accumulations of the output vector data.
33. The apparatus of claim 31,
wherein the FIR filter code exhibits an execution ratio of less than two of the loads per operation to accumulate.
34. The apparatus of claim 31,
wherein the operation upon the input and coefficient vector data includes execution of successive multiply-accumulate operations.
35. The apparatus of claim 31,
wherein the signal processing facility includes a modem.
36. The apparatus of claim 31, configured as a personal digital assistant.
Specification