Hello everyone, I am Oleksandr Karpov. This is the first article I have published, and I hope you like it.

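In this article I will show and explain how to copy data very quickly in C# and .NET by using assembly. In my case, I use it in an application that creates video from images, video, and sound.
If you ever need to call assembly from C#, this approach also gives you a quick and simple way to do it.
To follow this article, you should ideally be familiar with assembly language, memory alignment, C#, Windows, and advanced .NET techniques.
To make copying (copy-paste) data faster, the memory addresses need to be aligned to 16 bytes. Otherwise the speed does not change noticeably (in my case it was about 1.02 times faster).

The code in this article uses SSE instructions, which are supported by Pentium III+ (KNI/MMX2) and AMD Athlon (AMD EMMX) processors.
I tested on a computer with a Pentium Dual-Core E5800 at 3.2 GHz and 4 GB of dual-channel RAM. With 16-byte-aligned addresses the copy was about 1.5 times faster than the standard method, while with unaligned addresses the speed was almost unchanged (1.02 times).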
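As a small illustration of what "aligned to 16 bytes" means for an address, here is a minimal sketch. The class and method names (`AlignmentHelper`, `IsAligned16`, `AlignUp16`) are hypothetical and not part of the original code; they only mirror the `and eax,0Fh` test that the assembly routine performs.

```csharp
using System;

public static class AlignmentHelper
{
    // True when the address is a multiple of 16, i.e. its low four bits are zero.
    // This is the same check the assembly routine does with "and reg,0Fh".
    public static bool IsAligned16(IntPtr address)
    {
        return (address.ToInt64() & 0xF) == 0;
    }

    // Rounds an address up to the next multiple of 16.
    // An address that is already aligned is returned unchanged.
    public static IntPtr AlignUp16(IntPtr address)
    {
        long value = address.ToInt64();
        return new IntPtr((value + 15) & ~15L);
    }
}
```

For example, `AlignmentHelper.IsAligned16(bmpd.Scan0)` could be checked before timing a copy, to see which of the two results above (1.5 times or 1.02 times) you should expect.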
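This is a complete demo that shows the performance test and how to use the code.
The FastMemCopy class contains all the logic for the fast memory copy.
First, create a default Windows Forms application project and put two buttons and a PictureBox control on the form, because we will test with images.
Declare a few fields first:
    // Requires: using System.Diagnostics; using System.Drawing; using System.Drawing.Imaging;
    //           using System.Runtime.InteropServices; using System.Windows.Forms;
    string bitmapPath;
    Bitmap bmp, bmp2;
    BitmapData bmpd, bmpd2;
    byte[] buffer = null;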
 
Now create two methods to handle the button click events.
The standard method:
    private void btnStandard_Click(object sender, EventArgs e)
    {
        using (OpenFileDialog ofd = new OpenFileDialog())
        {
            if (ofd.ShowDialog() != System.Windows.Forms.DialogResult.OK)
                return;
            bitmapPath = ofd.FileName;
        }
        //open the selected image and create an empty image of the same size
        OpenImage();
        //lock the bits of both images for reading and writing
        //(the helper is named UnlockBitmap, although it calls LockBits)
        UnlockBitmap();
        //copy data from one image to the other with the standard method
        CopyImage();
        //release the bits so the images can be displayed
        LockBitmap();
        //let's see what we have
        pictureBox1.Image = bmp2;
    }
 
The fast method:
    private void btnFast_Click(object sender, EventArgs e)
    {
        using (OpenFileDialog ofd = new OpenFileDialog())
        {
            if (ofd.ShowDialog() != System.Windows.Forms.DialogResult.OK)
                return;
            bitmapPath = ofd.FileName;
        }
        //open the selected image and create an empty image of the same size
        OpenImage();
        //lock the bits of both images for reading and writing
        //(the helper is named UnlockBitmap, although it calls LockBits)
        UnlockBitmap();
        //copy data from one image to the other with our fast method
        FastCopyImage();
        //release the bits so the images can be displayed
        LockBitmap();
        //let's see what we have
        pictureBox1.Image = bmp2;
    }
 
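Good, now we have the buttons and their event handlers. Next come the methods that open an image, lock and unlock the bitmaps, and do the standard copy.
Opening an image:
    void OpenImage()
    {
        pictureBox1.Image = null;
        buffer = null;
        if (bmp != null)
        {
            bmp.Dispose();
            bmp = null;
        }
        if (bmp2 != null)
        {
            bmp2.Dispose();
            bmp2 = null;
        }
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        bmp = (Bitmap)Bitmap.FromFile(bitmapPath);
        //32 bits per pixel, so each row takes Width * 4 bytes
        buffer = new byte[bmp.Width * 4 * bmp.Height];
        //bmp2 uses 'buffer' as its pixel storage.
        //NOTE: the array is not pinned here; pinning it (for example with GCHandle)
        //would be safer, because the GC may otherwise move it.
        bmp2 = new Bitmap(bmp.Width, bmp.Height, bmp.Width * 4, PixelFormat.Format32bppArgb,
            Marshal.UnsafeAddrOfPinnedArrayElement(buffer, 0));
    }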
 
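Locking and unlocking the bitmaps:
    //Despite its name, this method calls LockBits: it locks the pixel data
    //in memory so that it can be read and written through Scan0.
    void UnlockBitmap()
    {
        bmpd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadWrite,
            PixelFormat.Format32bppArgb);
        bmpd2 = bmp2.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadWrite,
            PixelFormat.Format32bppArgb);
    }
    //Despite its name, this method calls UnlockBits: it releases the pixel data
    //so that the images can be drawn again.
    void LockBitmap()
    {
        bmp.UnlockBits(bmpd);
        bmp2.UnlockBits(bmpd2);
    }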
 
Copying data from one image to the other and showing the measured time:
    void CopyImage()
    {
        //start stopwatch
        Stopwatch sw = new Stopwatch();
        sw.Start();
        //copy-paste the data 10 times
        for (int i = 0; i < 10; i++)
        {
            System.Runtime.InteropServices.Marshal.Copy(bmpd.Scan0, buffer, 0, buffer.Length);
        }
        //stop stopwatch
        sw.Stop();
        //show measured time (in Stopwatch ticks)
        MessageBox.Show(sw.ElapsedTicks.ToString());
    }
 
That is the standard copy method. It is not complicated at all: we use the good old System.Runtime.InteropServices.Marshal.Copy method.
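The demo reports raw `Stopwatch.ElapsedTicks`, whose unit depends on `Stopwatch.Frequency`, so the numbers are only comparable on the same machine. The following is a minimal sketch of how the measurement could be converted to milliseconds and averaged. The `CopyTimer` class and its methods are hypothetical helpers, not part of the original article.

```csharp
using System;
using System.Diagnostics;

public static class CopyTimer
{
    // Converts raw Stopwatch ticks to milliseconds using the timer frequency
    // (Stopwatch.Frequency is the number of ticks per second).
    public static double TicksToMilliseconds(long ticks)
    {
        return ticks * 1000.0 / Stopwatch.Frequency;
    }

    // Runs the copy action 'iterations' times and returns the average
    // time per run in milliseconds.
    public static double MeasureAverageMs(Action copy, int iterations)
    {
        copy(); // warm-up run, so that JIT compilation is not part of the timing
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            copy();
        sw.Stop();
        return TicksToMilliseconds(sw.ElapsedTicks) / iterations;
    }
}
```

With the fields from this article, a call might look like `CopyTimer.MeasureAverageMs(() => Marshal.Copy(bmpd.Scan0, buffer, 0, buffer.Length), 10)`.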
And one more "middle method" that forwards to the fast copy logic:
    void FastCopyImage()
    {
        FastMemCopy.FastMemoryCopy(bmpd.Scan0, bmpd2.Scan0, buffer.Length);
    }
 
Now let's implement the FastMemCopy class. Here is the class declaration and some types we will use inside it:
    internal static class FastMemCopy
    {
        //flags for the flAllocationType parameter of VirtualAlloc
        [Flags]
        private enum AllocationTypes : uint
        {
            Commit = 0x1000, Reserve = 0x2000,
            Reset = 0x80000, LargePages = 0x20000000,
            Physical = 0x400000, TopDown = 0x100000,
            WriteWatch = 0x200000
        }
        //memory protection constants for the flProtect parameter of VirtualAlloc
        [Flags]
        private enum MemoryProtections : uint
        {
            Execute = 0x10, ExecuteRead = 0x20,
            ExecuteReadWrite = 0x40, ExecuteWriteCopy = 0x80,
            NoAccess = 0x01, ReadOnly = 0x02,
            ReadWrite = 0x04, WriteCopy = 0x08,
            GuardModifierflag = 0x100, NoCacheModifierflag = 0x200,
            WriteCombineModifierflag = 0x400
        }
        //flags for the dwFreeType parameter of VirtualFree
        [Flags]
        private enum FreeTypes : uint
        {
            Decommit = 0x4000, Release = 0x8000
        }
        //delegate type used to call the native code block
        [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        private unsafe delegate void FastMemCopyDelegate();
        private static class NativeMethods
        {
            [DllImport("kernel32.dll", SetLastError = true)]
            internal static extern IntPtr VirtualAlloc(
                IntPtr lpAddress,
                UIntPtr dwSize,
                AllocationTypes flAllocationType,
                MemoryProtections flProtect);
            [DllImport("kernel32")]
            [return: MarshalAs(UnmanagedType.Bool)]
            internal static extern bool VirtualFree(
                IntPtr lpAddress,
                uint dwSize,
                FreeTypes flFreeType);
        }
 
Now declare the method itself:
    public static unsafe void FastMemoryCopy(IntPtr src, IntPtr dst, int nBytes)
    {
        if (IntPtr.Size == 4)
        {
            //we are in 32 bit mode
            //allocate executable memory for our asm method
            IntPtr p = NativeMethods.VirtualAlloc(
                IntPtr.Zero,
                new UIntPtr((uint)x86_FastMemCopy_New.Length),
                AllocationTypes.Commit | AllocationTypes.Reserve,
                MemoryProtections.ExecuteReadWrite);
            if (p == IntPtr.Zero)
                throw new OutOfMemoryException("VirtualAlloc failed.");
            try
            {
                //copy our method bytes to allocated memory
                Marshal.Copy(x86_FastMemCopy_New, 0, p, x86_FastMemCopy_New.Length);
                //make a delegate to our method
                FastMemCopyDelegate _fastmemcopy =
                    (FastMemCopyDelegate)Marshal.GetDelegateForFunctionPointer(p,
                        typeof(FastMemCopyDelegate));
                //the parameters live in the last 24 bytes of the block (three 8-byte slots);
                //p itself is left unchanged, because VirtualFree needs the base address
                int end = x86_FastMemCopy_New.Length;
                //store length param
                Marshal.WriteInt32(p, end - 8, nBytes);
                //store destination address param
                Marshal.WriteInt32(p, end - 16, dst.ToInt32());
                //store source address param
                Marshal.WriteInt32(p, end - 24, src.ToInt32());
                //start stopwatch
                Stopwatch sw = new Stopwatch();
                sw.Start();
                //copy-paste all data 10 times
                //(the routine expects eax to hold its own address on entry)
                for (int i = 0; i < 10; i++)
                    _fastmemcopy();
                //stop stopwatch
                sw.Stop();
                //show message with measured time
                System.Windows.Forms.MessageBox.Show(sw.ElapsedTicks.ToString());
            }
            catch (Exception ex)
            {
                //if any exception
                System.Windows.Forms.MessageBox.Show(ex.Message);
            }
            finally
            {
                //free allocated memory; with Release the size must be 0
                //and the address must be the one returned by VirtualAlloc
                NativeMethods.VirtualFree(p, 0, FreeTypes.Release);
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            }
        }
        else if (IntPtr.Size == 8)
        {
            throw new ApplicationException("x64 is not supported yet!");
        }
    }
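The way the three parameters are stored at the end of the code block can be shown on a plain managed array. This is only a sketch under stated assumptions: `ParameterPatcher` and `WriteSlot` are hypothetical names, and the slot layout (three 8-byte slots counted from the end, of which only the low 4 bytes are used in 32-bit mode) is taken from the `p -= 8` steps in the article's code.

```csharp
using System;

public static class ParameterPatcher
{
    // Each parameter occupies an 8-byte slot at the end of the code block.
    public const int SlotSize = 8;

    // Writes a 32-bit value into the slot counted from the end of the array:
    // 1 = number of bytes, 2 = destination address, 3 = source address.
    // Returns the offset at which the value was written.
    public static int WriteSlot(byte[] code, int slotFromEnd, int value)
    {
        int offset = code.Length - slotFromEnd * SlotSize;
        byte[] bytes = BitConverter.GetBytes(value); // 4 bytes in machine byte order
        Buffer.BlockCopy(bytes, 0, code, offset, bytes.Length);
        return offset;
    }
}
```

For an array of 0x172 (370) bytes, the three slots land at offsets 0x16A, 0x162, and 0x15A, which are the displacements the assembly code reads relative to `ebp`.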
 
The assembly code is represented as a commented byte array:
    private static byte[] x86_FastMemCopy_New = new byte[]
    {
        0x90, //nop do nothing
        0x60, //pushad store all general-purpose registers on stack
        0x95, //xchg ebp, eax eax contains memory address of our method
        0x8B, 0xB5, 0x5A, 0x01, 0x00, 0x00, //mov esi,[ebp][00000015A] get source buffer address
        0x89, 0xF0, //mov eax,esi
        0x83, 0xE0, 0x0F, //and eax,00F will check if it is 16 byte aligned
        0x8B, 0xBD, 0x62, 0x01, 0x00, 0x00, //mov edi,[ebp][000000162] get destination address
        0x89, 0xFB, //mov ebx,edi
        0x83, 0xE3, 0x0F, //and ebx,00F will check if it is 16 byte aligned
        0x8B, 0x8D, 0x6A, 0x01, 0x00, 0x00, //mov ecx,[ebp][00000016A] get number of bytes to copy
        0xC1, 0xE9, 0x07, //shr ecx,7 divide length by 128
        0x85, 0xC9, //test ecx,ecx check if zero
        0x0F, 0x84, 0x1C, 0x01, 0x00, 0x00, //jz 000000146 ↓ copy the rest
        0x0F, 0x18, 0x06, //prefetchnta [esi] pre-fetch non-temporal source data for reading
        0x85, 0xC0, //test eax,eax check if source address is 16 byte aligned
        0x0F, 0x84, 0x8B, 0x00, 0x00, 0x00, //jz 0000000C0 ↓ go to copy if aligned
        0x0F, 0x18, 0x86, 0x80, 0x02, 0x00, 0x00, //prefetchnta [esi][000000280] pre-fetch more source data
        0x0F, 0x10, 0x06, //movups xmm0,[esi] copy 16 bytes of source data
        0x0F, 0x10, 0x4E, 0x10, //movups xmm1,[esi][010] copy 16 more bytes
        0x0F, 0x10, 0x56, 0x20, //movups xmm2,[esi][020] copy more
        0x0F, 0x18, 0x86, 0xC0, 0x02, 0x00, 0x00, //prefetchnta [esi][0000002C0] pre-fetch more
        0x0F, 0x10, 0x5E, 0x30, //movups xmm3,[esi][030]
        0x0F, 0x10, 0x66, 0x40, //movups xmm4,[esi][040]
        0x0F, 0x10, 0x6E, 0x50, //movups xmm5,[esi][050]
        0x0F, 0x10, 0x76, 0x60, //movups xmm6,[esi][060]
        0x0F, 0x10, 0x7E, 0x70, //movups xmm7,[esi][070] we've copied 128 bytes of source data
        0x85, 0xDB, //test ebx,ebx check if destination address is 16 byte aligned
        0x74, 0x21, //jz 000000087 ↓ go to paste if aligned
        0x0F, 0x11, 0x07, //movups [edi],xmm0 paste first 16 bytes to non-aligned destination address
        0x0F, 0x11, 0x4F, 0x10, //movups [edi][010],xmm1 paste more
        0x0F, 0x11, 0x57, 0x20, //movups [edi][020],xmm2
        0x0F, 0x11, 0x5F, 0x30, //movups [edi][030],xmm3
        0x0F, 0x11, 0x67, 0x40, //movups [edi][040],xmm4
        0x0F, 0x11, 0x6F, 0x50, //movups [edi][050],xmm5
        0x0F, 0x11, 0x77, 0x60, //movups [edi][060],xmm6
        0x0F, 0x11, 0x7F, 0x70, //movups [edi][070],xmm7 we've pasted 128 bytes of source data
        0xEB, 0x1F, //jmp short 0000000A6 ↓ continue
        0x0F, 0x2B, 0x07, //movntps [edi],xmm0 paste first 16 bytes to aligned destination address
        0x0F, 0x2B, 0x4F, 0x10, //movntps [edi][010],xmm1 paste more
        0x0F, 0x2B, 0x57, 0x20, //movntps [edi][020],xmm2
        0x0F, 0x2B, 0x5F, 0x30, //movntps [edi][030],xmm3
        0x0F, 0x2B, 0x67, 0x40, //movntps [edi][040],xmm4
        0x0F, 0x2B, 0x6F, 0x50, //movntps [edi][050],xmm5
        0x0F, 0x2B, 0x77, 0x60, //movntps [edi][060],xmm6
        0x0F, 0x2B, 0x7F, 0x70, //movntps [edi][070],xmm7 we've pasted 128 bytes of source data
        0x81, 0xC6, 0x80, 0x00, 0x00, 0x00, //add esi,000000080 increment source address by 128
        0x81, 0xC7, 0x80, 0x00, 0x00, 0x00, //add edi,000000080 increment destination address by 128
        0x83, 0xE9, 0x01, //sub ecx,1 decrement counter
        0x0F, 0x85, 0x7A, 0xFF, 0xFF, 0xFF, //jnz 000000035 ↑ continue if not zero
        0xE9, 0x86, 0x00, 0x00, 0x00, //jmp 000000146 ↓ go to copy the rest of data
        0x0F, 0x18, 0x86, 0x80, 0x02, 0x00, 0x00, //prefetchnta [esi][000000280] pre-fetch source data
        0x0F, 0x28, 0x06, //movaps xmm0,[esi] copy 128 bytes from aligned source address
        0x0F, 0x28, 0x4E, 0x10, //movaps xmm1,[esi][010] copy more
        0x0F, 0x28, 0x56, 0x20, //movaps xmm2,[esi][020]
        0x0F, 0x18, 0x86, 0xC0, 0x02, 0x00, 0x00, //prefetchnta [esi][0000002C0] pre-fetch more data
        0x0F, 0x28, 0x5E, 0x30, //movaps xmm3,[esi][030]
        0x0F, 0x28, 0x66, 0x40, //movaps xmm4,[esi][040]
        0x0F, 0x28, 0x6E, 0x50, //movaps xmm5,[esi][050]
        0x0F, 0x28, 0x76, 0x60, //movaps xmm6,[esi][060]
        0x0F, 0x28, 0x7E, 0x70, //movaps xmm7,[esi][070] we've copied 128 bytes of source data
        0x85, 0xDB, //test ebx,ebx check if destination address is 16 byte aligned
        0x74, 0x21, //jz 000000112 ↓ go to paste if aligned
        0x0F, 0x11, 0x07, //movups [edi],xmm0 paste 16 bytes to non-aligned destination address
        0x0F, 0x11, 0x4F, 0x10, //movups [edi][010],xmm1 paste more
        0x0F, 0x11, 0x57, 0x20, //movups [edi][020],xmm2
        0x0F, 0x11, 0x5F, 0x30, //movups [edi][030],xmm3
        0x0F, 0x11, 0x67, 0x40, //movups [edi][040],xmm4
        0x0F, 0x11, 0x6F, 0x50, //movups [edi][050],xmm5
        0x0F, 0x11, 0x77, 0x60, //movups [edi][060],xmm6
        0x0F, 0x11, 0x7F, 0x70, //movups [edi][070],xmm7 we've pasted 128 bytes of data
        0xEB, 0x1F, //jmp short 000000131 ↓ continue copy-paste
        0x0F, 0x2B, 0x07, //movntps [edi],xmm0 paste 16 bytes to aligned destination address
        0x0F, 0x2B, 0x4F, 0x10, //movntps [edi][010],xmm1 paste more
        0x0F, 0x2B, 0x57, 0x20, //movntps [edi][020],xmm2
        0x0F, 0x2B, 0x5F, 0x30, //movntps [edi][030],xmm3
        0x0F, 0x2B, 0x67, 0x40, //movntps [edi][040],xmm4
        0x0F, 0x2B, 0x6F, 0x50, //movntps [edi][050],xmm5
        0x0F, 0x2B, 0x77, 0x60, //movntps [edi][060],xmm6
        0x0F, 0x2B, 0x7F, 0x70, //movntps [edi][070],xmm7 we've pasted 128 bytes of data
        0x81, 0xC6, 0x80, 0x00, 0x00, 0x00, //add esi,000000080 increment source address by 128
        0x81, 0xC7, 0x80, 0x00, 0x00, 0x00, //add edi,000000080 increment destination address by 128
        0x83, 0xE9, 0x01, //sub ecx,1 decrement counter
        0x0F, 0x85, 0x7A, 0xFF, 0xFF, 0xFF, //jnz 0000000C0 ↑ continue copy-paste if non-zero
        0x8B, 0x8D, 0x6A, 0x01, 0x00, 0x00, //mov ecx,[ebp][00000016A] get number of bytes to copy
        0x83, 0xE1, 0x7F, //and ecx,07F get remaining number of bytes
        0x85, 0xC9, //test ecx,ecx check if there are bytes
        0x74, 0x02, //jz 000000155 ↓ exit if there are no more bytes
        0xF3, 0xA4, //rep movsb copy rest of bytes
        0x0F, 0xAE, 0xF8, //sfence performs a serializing operation on all store-to-memory instructions
        0x61, //popad restore general-purpose registers
        0xC3, //retn return from our method to C#
        0x00, 0x00, 0x00, 0x00, //source buffer address
        0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, //destination buffer address
        0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, //number of bytes to copy-paste
        0x00, 0x00, 0x00, 0x00
    };
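The overall control flow of the routine (a loop over 128-byte blocks, followed by a byte-wise tail) can be mirrored in managed code to make the arithmetic easier to follow. This is a sketch only: `BlockSplit` is a hypothetical class, it assumes a non-negative `nBytes`, and it uses `Buffer.BlockCopy` where the assembly uses eight XMM registers, so it illustrates the structure rather than the speed.

```csharp
using System;

public static class BlockSplit
{
    // Number of full 128-byte blocks, as computed by "shr ecx,7".
    public static int FullBlocks(int nBytes)
    {
        return nBytes >> 7;
    }

    // Number of bytes left after the full blocks, as computed by "and ecx,07F".
    public static int TailBytes(int nBytes)
    {
        return nBytes & 0x7F;
    }

    // Copies nBytes from src to dst with the same structure as the assembly routine:
    // first the 128-byte blocks, then the remaining bytes (the "rep movsb" part).
    public static void Copy(byte[] src, byte[] dst, int nBytes)
    {
        int offset = 0;
        for (int block = FullBlocks(nBytes); block > 0; block--)
        {
            Buffer.BlockCopy(src, offset, dst, offset, 128);
            offset += 128; // corresponds to "add esi,080" and "add edi,080"
        }
        int tail = TailBytes(nBytes);
        if (tail > 0)
            Buffer.BlockCopy(src, offset, dst, offset, tail);
    }
}
```

For a 1000-byte buffer this gives 7 full blocks (896 bytes) and a tail of 104 bytes.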
 
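We call the assembly routine through the delegate created earlier.
The method currently works only in 32-bit mode; I will implement 64-bit mode in the future.
Anyone who is interested can add it to the source code (the article contains almost all of the code).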
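Until a 64-bit version exists, a caller may prefer to fall back to a managed copy instead of getting an exception. The following is a minimal sketch of such a guard for byte arrays. `CopyDispatcher` and its `fastCopy` parameter are hypothetical and not part of the original code, and checking only `IntPtr.Size` is an assumption, since it does not by itself prove that the process runs on an x86 processor.

```csharp
using System;

public static class CopyDispatcher
{
    // The article's routine is 32-bit code, so it can only be considered
    // when pointers are 4 bytes wide.
    public static bool CanUseX86Routine
    {
        get { return IntPtr.Size == 4; }
    }

    // Copies src into dst. Uses the supplied fast routine when it is available
    // and the process is 32-bit, otherwise falls back to Buffer.BlockCopy.
    // Returns the name of the path that was taken.
    public static string Copy(byte[] src, byte[] dst, Action<byte[], byte[]> fastCopy)
    {
        if (src == null) throw new ArgumentNullException("src");
        if (dst == null) throw new ArgumentNullException("dst");
        if (dst.Length < src.Length)
            throw new ArgumentException("Destination is smaller than source.", "dst");

        if (CanUseX86Routine && fastCopy != null)
        {
            fastCopy(src, dst);
            return "fast";
        }
        Buffer.BlockCopy(src, 0, dst, 0, src.Length);
        return "managed";
    }
}
```

Passing `null` for `fastCopy` always selects the managed path, which can be handy for comparing results between the two.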
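While implementing and testing this method, I found that the prefetchnta instruction is not described very clearly, even in Intel's manuals, so I tried to work it out myself and with the help of Google. Also pay attention to the descriptions of movntps and movaps: they work only with memory addresses that are aligned to 16 bytes.
Original English article: C# - Fast memory copy method with x86 assembly usage
Translation source: http://www.oschina.net/translate/csharp-fast-memory-copy-method-with-x-assembly-usa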
Article title: Fast memory copy in C# using x86 assembly
                