Intel QuickSync FFDShow decoder (H264/VC1/MPEG2)

Alexandros_Wallace

Supreme Member
3 June 2007
4,003
Dystopia
For some time now, an Intel engineer has been independently developing a dedicated decoder inside ffdshow that takes advantage of
the Intel QuickSync technology found in the integrated graphics of SandyBridge Celeron/Pentium/i3/i5/i7 processors.



More here:
Intel SandyBridge hardware accelerated FFDShow decoder (H264/VC1/MPEG2) (doom9)

Download the new modified ffdshow, install it, and in the Codecs tab select Intel QuickSync.




Anandtech: Intel Engineer Ports QuickSync Video Decoding to FFDShow &
Intel Quick Sync Technology




An Intel engineer by the name of Eric Gur started an AVSForum thread indicating he had begun work on enabling Quick Sync support in FFDShow's video decoder. Quick Sync is typically known as Intel's hardware accelerated transcoding engine found in Sandy Bridge; however, there are both encode and decode aspects to the engine. Gur's work focuses on the latter.

To access Intel's hardware video decode acceleration, application developers typically turn to the DirectX Video Acceleration (DXVA) API. Sandy Bridge's hardware decode engine interfaces with DXVA and can return decoded frames without running the decode on the x86 CPU cores. As we've lamented in the past, open source DXVA decoders haven't typically worked all that well for Sandy Bridge (or previous generation Intel GPUs, for that matter). FFDShow users have often avoided DXVA solutions as they can't be used with any custom post-processing FFDShow filters.

Gur's Quick Sync filter for FFDShow gets around all of this. By accessing SNB's video decoder through Quick Sync, FFDShow gets full hardware acceleration by going through the Intel Media SDK and not through DXVA directly. It can also be used on non-Sandy Bridge systems, but with higher CPU usage. The filter is obviously unsupported software, but head on over to AVSForum if you're interested in checking it out. If you want more technical details, check out the related thread on the Doom9 Forums.


The measurements and results from the beta versions are extremely encouraging.




AVSforum: Official Sandy Bridge / LGA1155 for HTPCs Thread

Intel QuickSync video decoder also works with Celeron/Pentium SNB processors. During playback of 1080p24 AVC on a Pentium G840:

|CPU usage|System power
libavcodec+madVR|40%|55W
Intel QuickSync+madVR|20%|52W

Playing back 1080i60 AVC is more interesting. A weaker SNB processor can't handle decoding and deinterlacing simultaneously with libavcodec, but it can with Intel QuickSync.

- Celeron G530/Pentium G620/Pentium G840 SNB processor
- ASRock Z68 Pro3-M
- DDR3-2133 2 x 2GB (@2133MHz)
- Intel HD Graphics
- MPC HomeCinema, LAV Source/Splitter, ffdshow Video Decoder (libavcodec/Intel QuickSync, yadif on), madVR (softcubic 100); LAV Audio Decoder, ReClock (media adaptation + WASAPI exclusive mode)
- La Traviata (2010) [2 min HD i video MKV].mkv (1080i60, AVC, DTS-HD MA)

libavcodec

| Celeron G530 2.4GHz | Pentium G620 2.6GHz | Pentium G840 2.8GHz
Dropped frames | 4129 | 910 | 0
CPU usage (average) | 90% | 90% | 87%
GPU usage (average) | 44% | 84% | 84%
Rendering time (average) | 16.37 ms | 14.10 ms | 13.62 ms

Intel QuickSync

| Celeron G530 2.4GHz | Pentium G620 2.6GHz | Pentium G840 2.8GHz
Dropped frames | 2 | 0 | 0
CPU usage (average) | 69% | 61% | 61%
GPU usage (average) | 93% | 86% | 86%
Rendering time (average) | 14.96 ms | 13.81 ms | 13.78 ms
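To put the dropped-frame counts in perspective, the sketch below converts them into a share of the frames the renderer should have shown. It assumes the clip is exactly 120 s long (the file name only says "2 min") and that yadif frame doubling outputs 60000/1001 frames per second; both are assumptions, so treat the percentages as approximate.

```python
from fractions import Fraction

# Assumptions, not measured values: the test clip lasts exactly 120 s and
# yadif frame doubling turns 1080i60 into 60000/1001 progressive frames/s.
CLIP_SECONDS = 120
OUTPUT_FPS = Fraction(60000, 1001)


def expected_frames(seconds: int, fps: Fraction) -> int:
    """Number of frames the renderer should present during playback."""
    return round(seconds * fps)


def dropped_share(dropped: int, total: int) -> float:
    """Dropped frames as a percentage of the expected total."""
    return 100.0 * dropped / total


# Dropped-frame counts copied from the libavcodec and Intel QuickSync tables.
DROPPED = {
    "libavcodec": {"Celeron G530": 4129, "Pentium G620": 910, "Pentium G840": 0},
    "Intel QuickSync": {"Celeron G530": 2, "Pentium G620": 0, "Pentium G840": 0},
}

total = expected_frames(CLIP_SECONDS, OUTPUT_FPS)
for decoder, per_cpu in DROPPED.items():
    for cpu, dropped in per_cpu.items():
        print(f"{decoder:<16} {cpu:<13} {dropped_share(dropped, total):5.1f}% dropped")
```

Under these assumptions, the Celeron G530 with libavcodec loses more than half of its output frames, while with Intel QuickSync it loses almost none.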

Unfortunately, 6 EUs are not enough even for Bicubic 50 luma upsampling (rendering time is ~18ms for SD video-based content, while it must be less than 1/59.94 s = 16.68ms). Celeron G530/Pentium G620 HD Graphics (+ DDR3-2133) is good enough for every kind of content with Intel QuickSync + madVR Low Quality (bilinear/bilinear/bilinear). I still prefer madVR LQ to EVR because of its smoother playback.

Update

madVR in medium quality settings (Bilinear/Bicubic 50/Bicubic 50) is also possible by overclocking the GPU slightly. I observed that the average rendering time when playing back various SD video-based content at the default GPU clock of 1100MHz is at most 18ms. To reduce it to 16.68ms, overclocking the GPU to 18/16.68 x 1100MHz = 1187MHz should be enough.
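The overclock estimate above can be written as a small calculation. This is a sketch under the same implicit assumption the estimate makes: rendering time is inversely proportional to GPU clock, which only holds while the GPU itself is the bottleneck. The function name is illustrative.

```python
def required_clock_mhz(measured_ms: float, target_ms: float, base_clock_mhz: float) -> float:
    """GPU clock needed to bring a rendering time down to target_ms.

    Assumes rendering time is inversely proportional to GPU clock, which
    only holds while the GPU itself (not memory or CPU) is the bottleneck.
    """
    return measured_ms / target_ms * base_clock_mhz


# Frame interval for 59.94 fps (60000/1001) output, in milliseconds.
BUDGET_MS = 1000 * 1001 / 60000  # about 16.683 ms

clock = required_clock_mhz(18.0, 16.68, 1100)
print(f"budget {BUDGET_MS:.3f} ms, need roughly {clock:.0f} MHz")
```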

Details: Celeron G530 under madVR (medium quality)


If you are interested in using madVR with Celeron/Pentium + Intel HD Graphics or Core i3-2100/2120/2130/Core i5-2300-2500 + Intel HD Graphics 2000, here is a successful configuration.

- Processor: Any SNB processor with Intel HD Graphics (every Celeron/Pentium SNB) except the single-core G440, or with Intel HD Graphics 2000 (Core i3-2100/2120/2130/Core i5-2300-2500). In the test below I used a Celeron G530 2C/2T 2.4GHz, the lowest-end dual-core SNB processor.
- Memory: DDR3-2133 2 x 2GB @2133MHz. For example, G.SKILL F3-17000CL9D-4GBXL, $55.
- Z68 chipset motherboard. For example, ASRock Z68M/USB3, $95. H61/H67 is no good because they support only up to DDR3-1333 SDRAM.
- Intel HD Graphics @1350MHz. The default clock is 1100MHz. The GPU of every SNB processor should be able to run at this clock without problems.
- Player: Any that supports madVR. I used MPC HomeCinema.
- Video decoder: Intel QuickSync. Right now this is part of ffdshow. If you use a Pentium G840 or higher, you can also use libavcodec.
- Deinterlacer: yadif (frame doubling) in ffdshow for interlaced content. In the future, Intel QuickSync may support the GPU's hardware deinterlacer.
- Video renderer: madVR, medium quality (Bilinear/Bicubic 50/Bicubic 50), in full screen exclusive mode. For high quality settings, you will need Intel HD Graphics 3000 such as Core i3-2105/2125.

I used the following test clips:

- SD film: Ratatouille (2007) [2 min SD film MKV].mkv (480i60, MPEG-2, AC3)
- SD video: Die Zauberflote (2003) [2 min SD video MKV].mkv (480i60, MPEG-2, AC3)
- HD film: Iron Man (2008) [2 min HD film MKV].mkv (1080p24, AVC, TrueHD)
- HD i video: La Traviata (2010) [2 min HD i video MKV].mkv (1080i60, AVC, DTS-HD MA)
- HD p video: La Traviata (2010) [2 min HD p video MKV].mkv (1080p60, AVC, DTS-HD MA)

To be clear,

|Origin|Format|Output to renderer|Frame interval
SD film|film-based|480i60|480p24|41.708 ms
SD video|video-based|480i60|480p60|16.683 ms
HD film|film-based|1080p24/1080i60(broadcast)|1080p24|41.708 ms
HD i(nterlaced) video|video-based|1080i60|1080p60|16.683 ms
HD p(rogressive) video|video-based|1080p60|1080p60|16.683 ms
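The frame intervals in the table follow from the exact NTSC rates, 24000/1001 and 60000/1001 frames per second. A small Python sketch to reproduce them:

```python
from fractions import Fraction

# NTSC-family rates: the nominal rate scaled by 1000/1001.
OUTPUT_RATES = {
    "480p24 / 1080p24 (film)": Fraction(24000, 1001),
    "480p60 / 1080p60 (video)": Fraction(60000, 1001),
}


def frame_interval_ms(fps: Fraction) -> float:
    """Time available to render one frame, in milliseconds."""
    return float(1000 / fps)


for label, fps in OUTPUT_RATES.items():
    print(f"{label}: {float(fps):.3f} fps -> {frame_interval_ms(fps):.3f} ms")
```

Any average rendering time below the interval for its row leaves headroom; the video-based rows only leave about 16.7 ms per frame.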

These five are the major video formats found in NTSC countries (well, except for 720p60 in broadcast, which is much easier to play back than 1080p60).

Results (with Celeron G530)

| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 0 | 0 | 0 | 0
CPU usage (average) | 11% | 21% | 23% | 74% | 55%
GPU usage (average) | 54% | 88% | 25% | 60% | 59%
Rendering time (average) | 20.93 ms | 13.90 ms | 8.87 ms | 8.88 ms | 8.90 ms

Why Intel QuickSync video decoder?

With libavcodec (ffdshow's default decoder), I got:

| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 0 | 0 | 4017 | 3715
CPU usage (average) | 11% | 18% | 50% | 93% | 97%
GPU usage (average) | 55% | 88% | 25% | 30% | 31%
Rendering time (average) | 20.88 ms | 13.89 ms | 8.61 ms | 9.46 ms | 13.56 ms

There are lots of dropped frames during HD i/p video playback. The bottleneck is the weak CPU, which can't handle AVC decoding (libavcodec) and deinterlacing (yadif) simultaneously. Intel QuickSync provides hardware AVC/VC-1/MPEG-2 decoding under madVR and hence offloads the CPU. With a Pentium G840 or higher, there should be no such problem with libavcodec.

Why overclock GPU?

With the default GPU clock 1100MHz, I got:

| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 687 | 0 | 0 | 0
CPU usage (average) | 13% | 21% | 25% | 70% | 54%
GPU usage (average) | 54% | 96% | 25% | 60% | 59%
Rendering time (average) | 20.85 ms | 17.40 ms | 9.13 ms | 8.96 ms | 8.89 ms

The GPU struggles to upsample SD at 60 fps. To reduce 17.40 ms to, say, 14.00 ms, safely below the threshold of 16.68 ms, it needs to be overclocked by a factor of 17.40/14.00 = 1.24, resulting in 1367 MHz. Every SNB processor's GPU should be able to run @1350MHz with no problem. (Note that 1350MHz is the default clock of the Core i7-2600K, the highest-end SNB processor.)
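The same reasoning can be run in the other direction: project the SD video rendering time at a given GPU clock and check it against the 59.94 fps frame interval. This sketch assumes a purely GPU-bound workload (time scales with base clock / new clock); the function name is illustrative.

```python
def projected_render_ms(measured_ms: float, base_mhz: float, new_mhz: float) -> float:
    """Estimate rendering time after changing the GPU clock.

    Assumes a purely GPU-bound workload, so time scales by base/new clock.
    Memory-bandwidth limits would make the real improvement smaller.
    """
    return measured_ms * base_mhz / new_mhz


BUDGET_MS = 1000 * 1001 / 60000  # 59.94 fps frame interval, about 16.68 ms
factor = 17.40 / 14.00           # overclock factor used in the text

for clock_mhz in (1100, round(1100 * factor), 1350):
    t = projected_render_ms(17.40, 1100, clock_mhz)
    verdict = "fits" if t < BUDGET_MS else "too slow"
    print(f"{clock_mhz} MHz: about {t:.2f} ms ({verdict})")
```

At 1350 MHz the projection is about 14.2 ms; the SD video run in the Celeron G530 results table measured 13.90 ms at that clock, so the linear model is in the right ballpark.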

Why DDR3-2133?

With DDR3-1066, all other settings remaining the same as in the first test, I got:

| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 627 | 0 | 2004 | 0
CPU usage (average) | 14% | 25% | 29% | 91% | 76%
GPU usage (average) | 59% | 95% | 27% | 59% | 65%
Rendering time (average) | 21.74 ms | 16.41 ms | 9.98 ms | 12.17 ms | 10.13 ms

With DDR3-1600 (supported only by the Z68 chipset), all other settings remaining the same as in the first test, I got:

| SD film | SD video | HD film | HD i video | HD p video
Dropped frames | 0 | 162 | 0 | 0 | 0
CPU usage (average) | 12% | 23% | 28% | 77% | 61%
GPU usage (average) | 55% | 94% | 26% | 64% | 62%
Rendering time (average) | 21.17 ms | 15.52 ms | 9.57 ms | 9.71 ms | 9.66 ms

Without enough memory bandwidth, the CPU struggles with deinterlacing and/or the GPU struggles to upsample SD at 60 fps.
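Sandy Bridge's integrated graphics has no dedicated video memory and shares system RAM with the CPU, which is why memory speed shows up in both the CPU and GPU numbers. As a rough reference, here is the theoretical peak bandwidth of the three configurations, assuming dual-channel operation (the setup uses 2 x 2GB modules) and the standard 64-bit DDR3 channel; sustained bandwidth in practice is lower.

```python
def peak_bandwidth_gbs(transfers_mts: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DDR3 bandwidth in GB/s (10**9 bytes per second).

    transfers_mts: data rate in megatransfers per second (e.g. 2133).
    bus_bytes: width of one channel; DDR3 channels are 64 bits = 8 bytes.
    """
    return transfers_mts * 1e6 * bus_bytes * channels / 1e9


for speed in (1066, 1600, 2133):
    print(f"DDR3-{speed} dual channel: {peak_bandwidth_gbs(speed):.1f} GB/s")
```

Going from DDR3-1066 to DDR3-2133 doubles the ceiling, which matches the direction of the dropped-frame and rendering-time changes in the three tables.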
 

tmjuju

Administration Team
Staff member
21 January 2007
21,629
Intel still doesn't do 23.976 fps…
Their drivers are the worst of all. The graphics core may come from an IHV, but the driver is developed by an Intel department that doesn't exactly get flattering reviews.
Recent [edited] *rumors* say they have cancelled DX11 support (someone somewhere said 11.1 :D) on the upcoming Ivy Bridge, their next generation of processors. Ivy Bridge would ship with DX9 support only :D even though the hardware supports 11.
As for 23.976 fps, I haven't seen any announcement even for their next generation of processors…
Forgive me, but I won't trust them until I see the implementation, and the *quality* of that implementation compared with the competition.
 

tmjuju

Cedar Trail (the next-generation Atom): we still don't know whether it will support DX10.1 or DX9.
 

Alexandros_Wallace

Am I the one who's confused?
What's going on, guys?


Yes, it is strange; even the people running the experiments were surprised.
The point is that even the Celeron and the Pentium have hardware acceleration.
In any case, a lot of people are working on Intel, both testing (anandtech/avsforum) and programming (lav, coreAVC, openelec).


Forgive me, but I won't trust them until I see the implementation, and the *quality* of that implementation compared with the competition.

One more thing, to avoid misunderstandings.
When we talk about integrated graphics, we compare against integrated graphics.
Right now we have two choices: AMD & Intel.
Anyone who wants to can read up and choose...
 

tmjuju

Before everyone reads up… and decides… since there is plenty of material…
Is there even one reason to choose Intel graphics?
I haven't found one in all these years.
 

naxian

AVClub Enthusiast
30 October 2007
1,091

Alexandros_Wallace

tmjuju, for the 95% of users who play with VLC and the K-Lite Codec Pack,
on TVs and monitors that won't lock to anything other than 50/60 Hz respectively,
with color conversions left at God's mercy,
the cost, power consumption and noise of iGPUs are a big advantage.

For the more "enthusiast" crowd we obviously still have a long way to go. I just see AMD's "thorns" as bigger than Intel's.
A €42 processor with no graphics card playing 1080i60 & 1080p60 with ReClock + madVR (even at low/medium settings) is not something you can just brush aside.

At some point, presumably when someone is looking for a cheap HTPC, we'll go through all the pros and cons one by one.
We're talking about a platform that came out in February with a lot of attention on it
from all sides.
 

Alexandros_Wallace

Updated Oct. 9th 2011


Known Issues:
Please read the documentation attached to the installation files for known issues and limitations.

To install the new filter:
1. An ffdshow installer is supplied.
2. Open FFDShow configuration dialog and select 'Intel Quicksync' from the codec page for the desired formats (H264/VC1/MPEG2/WMV3).

Download version 0.15 alpha:
32 bit http://www.multiupload.com/SW88AXIEAR
64 bit http://www.multiupload.com/3QH5R6N6CD
Source code http://www.multiupload.com/GQBEQ161DB



Revision highlights:
v0.15:
* Rewrote time stamp handling code. The decoder now calculates the frame rate if it is missing and corrects for splitters reporting double the frame rate for interlaced content. Handles PTS and DTS time stamps. Broken streams that alternate frequently between telecined and interlaced frames are not handled perfectly (yet!).
* Handled unsupported H264 formats by silently reverting to libavcodec within ffdshow. HW acceleration is limited to H264 simple, main and high profiles. The previous version would crash on unsupported formats.
* Added support for WMV3 (part of the VC1 HW decoder).
* Various bug fixes and better decoder error handling, as reported by various users for the 0.14 release.
* Cleaned up minor memory leaks.

Eric Gur, Processor Client Application Engineer
Intel Corp.
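The reworked time stamp handling in this release is only described in prose. As a rough illustration of the idea, not the decoder's actual code, the sketch below estimates a missing frame rate from time stamps and halves a rate that an interlaced stream reports as double. The function names and the 1% tolerance are hypothetical; the only external fact relied on is that DirectShow time stamps (REFERENCE_TIME) count 100-nanosecond units.

```python
from statistics import median
from typing import List, Optional

# DirectShow REFERENCE_TIME values count 100-nanosecond units.
UNITS_PER_SECOND = 10_000_000


def estimate_fps(timestamps: List[int]) -> Optional[float]:
    """Estimate the frame rate from presentation time stamps.

    The stamps are sorted first because frames may arrive in decode order,
    and the median of the positive deltas is used so that a few gaps or
    duplicates do not skew the result. Returns None without enough data.
    """
    stamps = sorted(timestamps)
    deltas = [b - a for a, b in zip(stamps, stamps[1:]) if b > a]
    if not deltas:
        return None
    return UNITS_PER_SECOND / median(deltas)


def fix_doubled_rate(reported_fps: float, measured_fps: float, interlaced: bool) -> float:
    """Hypothetical correction for splitters that report the field rate.

    If an interlaced stream claims about twice the measured frame rate,
    halve the reported value; otherwise keep it unchanged.
    """
    if interlaced and abs(reported_fps - 2 * measured_fps) < 0.01 * reported_fps:
        return reported_fps / 2
    return reported_fps


# Example: 29.97 fps stamps delivered out of order (e.g. in decode order).
step = round(UNITS_PER_SECOND * 1001 / 30000)  # 333667 units per frame
stamps = [i * step for i in (0, 2, 1, 4, 3, 5, 7, 6)]
fps = estimate_fps(stamps)
print(f"{fps:.3f} fps, corrected report: {fix_doubled_rate(59.94, fps, True):.2f}")
```

The median makes the estimate robust to an occasional missing or repeated stamp, which a simple average over the whole stream would not be.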
 

lykman

Supreme Member
29 June 2006
5,798
Aθήνα

Alexandros_Wallace

Version 0.19 is out with the following changes:
  • Added limited support for WMC full screen exclusive mode:
    - Renderer must be connected to the decoder directly - no intermediate filters.
    - The screen must be connected to the Intel GPU (the decoder shares its device with the renderer).
    - Might only work on single-monitor setups.
  • The decoder now exposes its configuration options via GetConfig/SetConfig; these must be called before it is initialized.
  • Padding the image to mod16 width is now off by default. Works with vobsub.
  • The decoder can be tested for compatibility with media types via the TestMediaType method.
  • FFDShow rev4126

Download from SourceForge home page:
http://sourceforge.net/p/qsdecoder
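For reference, "mod16 width" means a picture dimension rounded up to a multiple of 16 pixels. A minimal Python sketch of that rounding (an illustration, not the decoder's code): with padding enabled, a 1366-pixel-wide picture would be padded to 1376, while with it off the output keeps the visible width.

```python
def pad_to_mod16(size: int) -> int:
    """Round a picture dimension up to the next multiple of 16.

    16 is the macroblock size used by H.264, VC-1 and MPEG-2, so coded
    frame buffers are commonly allocated in multiples of 16 pixels.
    """
    return (size + 15) & ~15


for size in (720, 1366, 1920, 1080):
    print(size, "->", pad_to_mod16(size))
```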
 

Alexandros_Wallace

A nice comment by the same person, from here:


DXVA is the most efficient way to play video, but Microsoft's decoder doesn't work 100% of the time, and since it involves a different video pipeline, adding subtitles and possibly some image processing can't work like in the SW pipeline.
One of the reasons I started this project is to provide a decoder that works as well as SW decoders but with less CPU utilization.
BTW, looking at the CPU utilization in Task Manager isn't good enough, as SandyBridge modifies its operating frequency all the time. The CPU's frequency can be lowered to 800MHz (mobile) or 1600MHz (desktop) if there's little activity (called LFM, or low frequency mode). On the other hand, it can raise the frequency above normal when it needs more performance (turbo). The change in frequency also implies an automatic change in voltage, which is strongly linked to power consumption.
CPU-Z can show you the operating frequency; there's also Intel's Turbo Boost widget, which does the same. Other tools like CoreTemp can also show you the power draw and core temperatures. These are all free tools, BTW.
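The point about Task Manager can be made concrete: the same busy percentage represents less work when the CPU has clocked down. A rough sketch, assuming work done scales linearly with clock frequency (an approximation that ignores memory stalls) and using a 2.8 GHz reference clock like the Pentium G840's; the function name is hypothetical.

```python
def normalized_cpu_usage(busy_pct: float, current_mhz: float, reference_mhz: float) -> float:
    """Rescale a Task Manager busy percentage to a fixed reference clock.

    Rough assumption: work done per busy second scales linearly with clock
    speed. This ignores memory stalls, so treat the result as an estimate.
    """
    return busy_pct * current_mhz / reference_mhz


# Same 50% reading on a 2.8 GHz desktop part, in LFM (1600 MHz) vs. at full clock.
for mhz in (1600, 2800):
    print(f"50% at {mhz} MHz is about {normalized_cpu_usage(50, mhz, 2800):.0f}% at 2800 MHz")
```

This is why comparing decoders by raw Task Manager percentages can understate the difference: a decoder that lets the CPU drop into LFM may show a similar percentage while doing much less work.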
 

Alexandros_Wallace

Version 0.32 beta is out with the following changes:
* Added HW deinterlacing (can't be disabled in this build). Works on content marked as interlaced. If this version gives you problems, please report them and revert to the previous version.
* Added support for HW Detail and Denoise filters. Disabled in this build; these features need ffdshow GUI support.
* FFDShow rev4453

Downloads
* For the latest cutting-edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page
* FFDShow-tryout site
* LAV Splitter builds