For a while now, an Intel engineer has been independently building a dedicated decoder inside ffdshow that takes advantage of the Intel QuickSync technology found in the integrated graphics of Sandy Bridge Celeron/Pentium/i3/i5/i7 processors.
More here:
Intel SandyBridge hardware accelerated FFDShow decoder (H264/VC1/MPEG2) (doom9)
You download the new modded ffdshow, install it, and in the Codecs tab you select Intel QuickSync.
Anandtech: Intel Engineer Ports QuickSync Video Decoding to FFDShow &
Intel Quick Sync Technology
An Intel engineer by the name of Eric Gur started an AVSForum thread indicating he had begun work on enabling Quick Sync support in FFDShow's video decoder. Quick Sync is typically known as Intel's hardware accelerated transcoding engine found in Sandy Bridge, however there are both encode and decode aspects to the engine. Gur's work focuses on the latter.
To access Intel's hardware video decode acceleration, application developers typically turn to the DirectX Video Acceleration (DXVA) API. Sandy Bridge's hardware decode engine interfaces with DXVA and can return decoded frames without involving the x86 CPU cores. As we've lamented in the past, open source DXVA decoders haven't typically worked all that well with Sandy Bridge (or previous-generation Intel GPUs, for that matter). FFDShow users have often avoided DXVA solutions because they can't be used with custom FFDShow post-processing filters.
Gur's Quick Sync filter for FFDShow gets around all of this. By accessing SNB's video decoder through Quick Sync, FFDShow gets full hardware acceleration by going through the Intel Media SDK rather than through DXVA directly. It can also be used on non-Sandy Bridge systems, but with higher CPU usage. The filter is obviously unsupported software, but head on over to AVSForum if you're interested in checking it out. If you want more technical details, check out the related thread on the Doom9 Forums.
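For readers curious what "going through the Intel Media SDK instead of DXVA" looks like at the API level, here is a minimal, hypothetical sketch of opening a hardware H.264 decode session with the Media SDK (mfxvideo.h). This is not Gur's actual filter code: error handling is trimmed, the compressed bitstream is assumed to be supplied by the caller, and the asynchronous decode loop is only hinted at in the comments.

// Minimal sketch (not the actual FFDShow filter): opening an H.264 hardware
// decode session through the Intel Media SDK (mfxvideo.h) instead of DXVA.
#include <mfxvideo.h>

bool OpenQuickSyncDecoder(mfxBitstream* bitstream)  // bitstream: caller-filled compressed data
{
    mfxVersion ver = {};
    ver.Major = 1;                         // request API 1.x
    ver.Minor = 0;

    // Ask for the hardware (GPU) implementation first; the dispatcher can fall
    // back to the software library on non-Sandy Bridge systems, which is why
    // the filter still works there, just with higher CPU usage.
    mfxSession session = nullptr;
    mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE, &ver, &session);
    if (sts != MFX_ERR_NONE)
        sts = MFXInit(MFX_IMPL_SOFTWARE, &ver, &session);
    if (sts != MFX_ERR_NONE)
        return false;

    // Let the SDK parse the stream header and fill in the decode parameters.
    mfxVideoParam par = {};
    par.mfx.CodecId = MFX_CODEC_AVC;       // MFX_CODEC_VC1 / MFX_CODEC_MPEG2 for the other formats
    par.IOPattern   = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;  // decoded frames returned to system memory
    sts = MFXVideoDECODE_DecodeHeader(session, bitstream, &par);
    if (sts == MFX_ERR_NONE)
        sts = MFXVideoDECODE_Init(session, &par);

    // A real filter would now loop on MFXVideoDECODE_DecodeFrameAsync() and hand
    // the decoded surfaces to ffdshow's post-processing chain and the renderer.
    bool ok = (sts == MFX_ERR_NONE);
    if (ok)
        MFXVideoDECODE_Close(session);
    MFXClose(session);
    return ok;
}

The same session setup applies to the VC-1 and MPEG-2 codec IDs, which matches the three formats listed in the doom9 thread title.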
The measurements and results from the beta builds are highly encouraging.
AVSforum: Official Sandy Bridge / LGA1155 for HTPCs Thread
Intel QuickSync video decoder also works with Celeron/Pentium SNB. When playing back 1080p24 AVC on a Pentium G840:
|CPU usage|System power
libavcodec+madVR|40%|55W
Intel QuickSync+madVR|20%|52W
Playing back 1080i60 AVC is more interesting. A weaker SNB processor can't handle decoding and deinterlacing simultaneously with libavcodec, but it can with Intel QuickSync.
- Celeron G530/Pentium G620/Pentium G840 SNB processor
- ASRock Z68 Pro3-M
- DDR3-2133 2 x 2GB (@2133MHz)
- Intel HD Graphics
- MPC HomeCinema, LAV Source/Splitter, ffdshow Video Decoder (libavcodec/Intel QuickSync, yadif on), madVR (softcubic 100); LAV Audio Decoder, ReClock (media adaptation + WASAPI exclusive mode)
- La Traviata (2010) [2 min HD i video MKV].mkv (1080i60, AVC, DTS-HD MA)
libavcodec
| Celeron G530 2.4GHz| Pentium G620 2.6GHz| Pentium G840 2.8GHz
Dropped frames | 4129 | 910 | 0
CPU usage (average) | 90% | 90% | 87%
GPU usage (average) |44%|84%| 84%
Rendering time (average) |16.37 ms|14.10 ms|13.62 ms
Intel QuickSync
| Celeron G530 2.4GHz| Pentium G620 2.6GHz| Pentium G840 2.8GHz
Dropped frames | 2 | 0 | 0
CPU usage (average) | 69% | 61% | 61%
GPU usage (average) |93%|86%| 86%
Rendering time (average) |14.96 ms|13.81 ms|13.78 ms
Unfortunately, 6 EUs are not enough even for Bicubic 50 in luma upsampling (rendering time ~18 ms for SD video-based content [rendering time must be less than 1/59.94 s = 16.68 ms]). Celeron G530/Pentium G620 HD Graphics (+ DDR3-2133) is good enough for every kind of content with Intel QuickSync + madVR Low Quality (bilinear/bilinear/bilinear). I still prefer madVR LQ to EVR because of smoother playback.
Update
madVR at medium quality settings (Bilinear/Bicubic 50/Bicubic 50) is also possible by overclocking the GPU slightly. I observed that the average rendering time during playback of various SD video-based content with the default GPU clock of 1100MHz is at most 18ms. To reduce it to 16.68ms, overclocking the GPU to 18/16.68 x 1100MHz = 1187MHz should be enough.
Details: Celeron G530 under madVR (medium quality)
If you are interested in using madVR with Celeron/Pentium + Intel HD Graphics or Core i3-2100/2120/2130/Core i5-2300-2500 + Intel HD Graphics 2000, here is a successful configuration.
- Processor: Any SNB processor with Intel HD Graphics (every Celeron/Pentium SNB) except for G440 single core, or Intel HD Graphics 2000 (Core i3-2100/2120/2130/Core i5-2300-2500). In the test below I used Celeron G530 2C/2T 2.4GHz, the lowest-end dual-core SNB processor.
- Memory: DDR3-2133 2 x 2GB @2133MHz. For example, G.SKILL F3-17000CL9D-4GBXL, $55.
- Z68 chipset motherboard. For example, ASRock Z68M/USB3, $95. H61/H67 is no good because it supports only up to DDR3-1333 SDRAM.
- Intel HD Graphics @1350MHz. The default clock is 1100MHz. GPU of every SNB processor should be able to run at this clock without problem.
- Player: Any that supports madVR. I used MPC HomeCinema.
- Video decoder: Intel QuickSync. Right now this is part of ffdshow. If you use Pentium G840 or higher, you can also use libavcodec.
- Deinterlacer: yadif (frame doubling) in ffdshow for interlaced content. In the future Intel QuickSync may support the GPU's hardware deinterlacer.
- Video renderer: madVR, medium quality (Bilinear/Bicubic 50/Bicubic 50), in full screen exclusive mode. For high quality settings, you will need Intel HD Graphics 3000 such as Core i3-2105/2125.
I used the following test clips:
- SD film: Ratatouille (2007) [2 min SD film MKV].mkv (480i60, MPEG-2, AC3)
- SD video: Die Zauberflote (2003) [2 min SD video MKV].mkv (480i60, MPEG-2, AC3)
- HD film: Iron Man (2008) [2 min HD film MKV].mkv (1080p60, AVC, TrueHD)
- HD i video: La Traviata (2010) [2 min HD i video MKV].mkv (1080i60, AVC, DTS-HD MA)
- HD p video: La Traviata (2010) [2 min HD p video MKV].mkv (1080p60, AVC, DTS-HD MA)
To be clear,
|Origin|Format|Output to renderer|Frame interval
SD film|film-based|480i60|480p24|41.708 ms
SD video|video-based|480i60|480p60|16.683 ms
HD film|film-based|1080p24/1080i60(broadcast)|1080p24|41.708 ms
HD i(nterlaced) video|video-based|1080i60|1080p60|16.683 ms
HD p(rogressive) video|video-based|1080p60|1080p60|16.683 ms
These five are the major video formats found in NTSC countries (well, except for 720p60 in broadcast, which is much easier to play back than 1080p60).
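As a quick sanity check on the frame intervals above (nothing new, just the arithmetic behind the table), the two NTSC output rates work out as follows:

// Frame intervals implied by the NTSC rates quoted above: interval (ms) = 1000 / fps.
#include <cstdio>

int main()
{
    const double film_fps  = 24000.0 / 1001.0;  // 23.976 fps: film-based output (24p)
    const double video_fps = 60000.0 / 1001.0;  // 59.94 fps: video-based output (60p after deinterlacing)

    std::printf("film  interval: %.3f ms\n", 1000.0 / film_fps);   // ~41.708 ms
    std::printf("video interval: %.3f ms\n", 1000.0 / video_fps);  // ~16.683 ms
    return 0;
}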
Results (with Celeron G530)
| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 0 | 0 | 0 | 0
CPU usage (average) | 11% | 21% | 23% | 74% | 55%
GPU usage (average) |54%|88%|25%|60%|59%
Rendering time (average) |20.93 ms|13.90 ms|8.87 ms|8.88 ms|8.90 ms
Why Intel QuickSync video decoder?
With libavcodec (ffdshow's default decoder), I got:
| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 0 | 0 | 4017 | 3715
CPU usage (average) | 11% | 18% | 50% | 93% | 97%
GPU usage (average) | 55% | 88% | 25% | 30% | 31%
Rendering time (average) |20.88 ms|13.89 ms|8.61 ms|9.46 ms|13.56 ms
There are lots of dropped frames at HD i/p video playback. The bottleneck is the weak CPU that can't handle AVC decode (libavcodec) and deinterlacing (yadif) simultaneously. Intel QuickSync supports hardware AVC/VC-1/MPEG-2 decode under madVR, hence offloads CPU. If you use Pentium G840 or higher, there should be no such problem with libavcodec.
Why overclock GPU?
With the default GPU clock 1100MHz, I got:
| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 687 | 0 | 0 | 0
CPU usage (average) | 13% | 21% | 25% | 70% | 54%
GPU usage (average) |54%| 96% |25%|60%|59%
Rendering time (average) |20.85 ms| 17.40 ms |9.13 ms|8.96 ms|8.89 ms
The GPU struggles with upsampling SD content at the rate of 60 fps. To reduce 17.40 ms to, say, 14.00 ms, safely below the threshold value of 16.68 ms, it needs to be overclocked by a factor of 17.40/14.00 = 1.24, resulting in 1367 MHz. Every SNB processor's GPU should be able to run @1350MHz with no problem. (Note that 1350MHz is the default clock of Core i7-2600K, the highest-end SNB processor.)
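The overclocking estimate is just proportional scaling under the assumption that rendering time is inversely proportional to GPU clock; written out with the same numbers as the paragraph above:

// GPU clock estimate assuming rendering time scales inversely with GPU clock.
#include <cstdio>

int main()
{
    const double stock_clock_mhz = 1100.0;  // default Intel HD Graphics clock on these CPUs
    const double measured_ms     = 17.40;   // worst-case average rendering time at the stock clock
    const double target_ms       = 14.00;   // chosen margin below the 16.68 ms frame interval

    // required clock = stock clock * (measured rendering time / target rendering time)
    const double needed_mhz = stock_clock_mhz * measured_ms / target_ms;
    std::printf("needed GPU clock: ~%.0f MHz\n", needed_mhz);  // ~1367 MHz, as in the text above
    return 0;
}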
Why DDR3-2133?
With DDR3-1066, all the other settings remaining the same as the first one, I got:
| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 627 | 0 | 2004 | 0
CPU usage (average) | 14% | 25% | 29% | 91% | 76%
GPU usage (average) |59%| 95% |27%|59%|65%
Rendering time (average) |21.74 ms| 16.41 ms |9.98 ms|12.17 ms|10.13 ms
With DDR3-1600 (supported only by Z68 chipset), all the other settings remaining the same as the first one, I got:
| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 162 | 0 | 0 | 0
CPU usage (average) | 12% | 23% | 28% | 77% | 61%
GPU usage (average) |55%| 94% |26%|64%|62%
Rendering time (average) |21.17 ms| 15.52 ms |9.57 ms|9.71 ms|9.66 ms
Without enough memory bandwidth, the CPU struggles with deinterlacing and/or the GPU struggles with upsampling SD content at the rate of 60 fps.