Friday, April 15, 2011

Is there a method in Python that's like os.path.split for other delimiters?

I want to use something like this:

os.path.split("C:\\a\\b\\c")

With this kind of output:

('C:\a\b', 'c')


However I want it to work on other delimiters like this:

method ('a_b_c_d')

With this kind of output:

('a_b_c', 'd')

From stackoverflow
  • >>> 'a_b_c_d'.rsplit('_', 1)
    ['a_b_c', 'd']
    

    Help on built-in function rsplit:

    rsplit(...) S.rsplit([sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the delimiter string, starting at the end of the string and working to the front. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator.

    S.Lott : +1: Quote the documentation.
  • string.split(separator)
    
    recursive : that produces ["a", "b", "c", "d"]
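For an os.path.split-style result (a tuple that always has two parts), here is a small sketch using str.rpartition, which splits at the last occurrence of the separator. The helper name split_last is made up for illustration.

```python
def split_last(s, sep):
    """Split s at the LAST occurrence of sep, like os.path.split does for paths.

    Returns a (head, tail) tuple. If sep does not occur at all, head is ''
    and tail is the whole string, mirroring os.path.split('c') == ('', 'c').
    """
    head, _, tail = s.rpartition(sep)  # rpartition always returns 3 parts
    return head, tail


print(split_last('a_b_c_d', '_'))   # ('a_b_c', 'd')
print(split_last('abcd', '_'))      # ('', 'abcd')
print('a_b_c_d'.rsplit('_', 1))     # ['a_b_c', 'd']
print('abcd'.rsplit('_', 1))        # ['abcd']
```

Unlike rsplit('_', 1), which returns a one-element list when the separator is missing, rpartition always gives you two parts to unpack.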

Passing functions which have multiple return values as arguments in Python

So, Python functions can return multiple values. It struck me that it would be convenient (though a bit less readable) if the following were possible.

a = [[1,2],[3,4]]

def cord():
    return 1, 1

def printa(y,x):
    print a[y][x]

printa(cord())

...but it's not. I'm aware that you can do the same thing by dumping both return values into temporary variables, but it doesn't seem as elegant. I could also rewrite the last line as "printa(cord()[0], cord()[1])", but that would execute cord() twice.

Is there an elegant, efficient way to do this? Or should I just see that quote about premature optimization and forget about this?

From stackoverflow
  • printa(*cord())
    

    The * here is an argument expansion operator... well I forget what it's technically called, but in this context it takes a list or tuple and expands it out so the function sees each list/tuple element as a separate argument.

    It's basically the reverse of the * you might use to capture all non-keyword arguments in a function definition:

    def fn(*args):
        # args is now a tuple of the non-keyworded arguments
        print args
    
    fn(1, 2, 3, 4, 5)
    

    prints (1, 2, 3, 4, 5)

    fn(*[1, 2, 3, 4, 5])
    

    does the same.

    Jason Baker : It's a tuple. :-)
    regan : And the documentation: http://docs.python.org/tutorial/controlflow.html#tut-unpacking-arguments
    David Zaslavsky : Thanks Jason, I edited that in.
    Miles : # args is now a tuple...
  • Try this:

    >>> def cord():
    ...     return (1, 1)
    ...
    >>> def printa(y, x):
    ...     print a[y][x]
    ...
    >>> a=[[1,2],[3,4]]
    >>> printa(*cord())
    4
    

    The star basically says "use the elements of this collection as positional arguments." You can do the same with a dict for keyword arguments using two stars:

    >>> a = {'a' : 2, 'b' : 3}
    >>> def foo(a, b):
    ...    print a, b
    ...
    >>> foo(**a)
    2 3
    
  • Actually, Python doesn't really return multiple values; it returns one value, which can be several values packed into a tuple. This means you need to "unpack" the returned value in order to get the individual values. A statement like

    x,y = cord()
    

    does that, but directly using the return value as you did in

    printa(cord())
    

    doesn't; that's why you need to use the asterisk. Perhaps a nice term for it might be "implicit tuple unpacking" or "tuple unpacking without assignment".
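The sketch below makes two points from the answers above concrete: cord() returns a single tuple, and cord() runs only once with *. It counts how often cord() is called. The calls list and the lookup function (a print-free variant of printa) are additions for illustration.

```python
calls = []  # records each call to cord() so we can count them

def cord():
    calls.append(1)
    return 1, 1  # the comma builds ONE value: the tuple (1, 1)

a = [[1, 2], [3, 4]]

def lookup(y, x):
    return a[y][x]

print(isinstance(cord(), tuple))     # True: a single tuple comes back

del calls[:]
print(lookup(cord()[0], cord()[1]))  # 4, but cord() ran twice
print(len(calls))                    # 2

del calls[:]
print(lookup(*cord()))               # 4, and cord() ran only once
print(len(calls))                    # 1

y, x = cord()                        # explicit unpacking also calls it once
print(lookup(y, x))                  # 4
```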

Identifying 2 same images using Java

Hi all

I have a problem in my web crawler where I am trying to retrieve images from a particular website. The problem is that I often see images that are exactly the same but have different URLs.

Is there any Java library or utility that can identify whether 2 images are exactly the same in their content (i.e. at pixel level)?

My input will be URLs for the images where I can download them.

From stackoverflow
  • I would think you don't need an image library to do this - simply fetching the URL content and comparing the two streams as byte arrays should do it.

    Unless of course you are interested in identifying similar images as well.

    Alex Reynolds : What if the image is compressed with a lossy algorithm? You could have two images that are the same but with different bytes.
    levik : You could - but you're not likely to. Such images wouldn't be identical, but would be similar to each other. Mostly such pixel-perfect clones don't exist on the web - it will either be a byte-for-byte copy, or will have some pixels differ from the original. Perhaps it will have a badge in the corner.
  • Why not write one yourself? It does not seem difficult. Check this out.

  • Depending on how detailed you want to get with it:

    • download the image
    • as you download it generate a hash for it
    • make a directory where the directory name is the hash value (if the directory does not exist)
    • if directory contains 2 or more files then compare the file sizes
    • if the file sizes are the same then do a byte-by-byte comparison of the image against the bytes of the images in the directory
    • if the bytes are unique then you have a new image

    Regardless of whether you want to do all that or not, you need to:

    • download the images
    • do a byte-by-byte comparison of the images

    No need to rely on any special imaging libraries, images are just bytes.

    Neil Coffey : You really only need to do this if you use a weak hash function. Honestly, you really can just use fairly strong hash function and "trust the hash".
  • Look at the MessageDigest class. Essentially, you create an instance of it, then pass it a series of bytes. The bytes could be the bytes directly loaded from the URL if you know that two images that are the "same" will be the selfsame file/stream of bytes. Or if necessary, you could create a BufferedImage from the stream, then pull out pixel values, something like:

      MessageDigest md = MessageDigest.getInstance("MD5");
      ByteBuffer bb = ByteBuffer.allocate(4 * bimg.getWidth());
      for (int y = bimg.getHeight()-1; y >= 0; y--) {
        bb.clear();
        for (int x = bimg.getWidth()-1; x >= 0; x--) {
          bb.putInt(bimg.getRGB(x, y));
        }
        md.update(bb.array());
      }
      byte[] digBytes = md.digest();
    

    Either way, MessageDigest.digest() eventually gives you a byte array which is the "signature" of the image. You could convert this to a hex string if it's helpful, e.g. for putting in a HashMap or database table, e.g.:

    StringBuilder sb = new StringBuilder();
    for (byte b : digBytes) {
      sb.append(String.format("%02X", b & 0xff));
    }
    String signature = sb.toString();
    

    If the content/image from two URLs gives you the same signature, then they're the same image.

    Edit: I forgot to mention that if you were hashing pixel values, you'd probably want to include the dimensions of the image in the hash too. (Just do a similar thing: write two ints to an 8-byte ByteBuffer, then update the MessageDigest with the corresponding 8-byte array.)

    The other thing somebody mentioned is that MD5 is not collision-resistant. In other words, there is a technique for constructing multiple byte sequences with the same MD5 hash without having to use the "brute force" method of trial and error (where on average, you'd expect to have to try about 2^64, or roughly 18 billion billion, files before hitting on a collision). That makes MD5 unsuitable where you're trying to protect against this threat model. If you're not concerned about the case where somebody might deliberately try to fool your duplicate identification, and you're just worried about the chances of a duplicate hash "by chance", then MD5 is absolutely fine. Actually, it's not only fine, it's a bit over the top: as I say, on average, you'd expect one "false duplicate" after about 18 billion billion files. Or put another way, you could have, say, a billion files and the chance of a collision would be extremely close to zero.

    If you are worried about the threat model outlined (i.e. you think somebody could be deliberately dedicating processor time to constructing files to fool your system), then the solution is to use a stronger hash. Java supports SHA-1 out of the box (just replace "MD5" with "SHA-1"). This gives you longer hashes (160 bits instead of 128 bits), and at the time of writing, finding a SHA-1 collision was considered infeasible. (Practical SHA-1 collisions have since been demonstrated, so "SHA-256", also supported out of the box, is the safer choice today.)

    Personally for this purpose, I would even consider just using a decent 64-bit hash function. That'll still allow tens of millions of images to be compared with close-to-zero chance of a false positive.

    TofuBeer : That will not work, MD5 is not collision resistant (two different files can have the same MD5 hash), but it is a good start since the odds of a collision are low (you still need to do the byte-by-byte comparison if two MD5 hashes are the same though).
    Neil Coffey : See my edit-- I don't think the purpose in this case is to protect against that threat model. All we're probably looking for is a "good wide hash function", and MD5 is adequate for that.
    TofuBeer : Are you saying that it is impossible for two files to have the same hash except by attack? There are an infinite possible number of files that are being mapped to a finite number of hashes... (improbable and impossible are two very different things).
    Neil Coffey : They are different, but in this case, "improbable" means something in the order of "so improbable that there's more chance of a meteorite landing on top of you while you're writing the code".
    Neil Coffey : P.S. I know this is counter-intuitive. But the "finite number" of hashes is 2^128, and that's a veeeeery big number!
  • You could also generate an MD5 signature of the file and ignore duplicate entries. This won't help you find similar images, though.

  • calculate MD5s using something like this:

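    // imports: java.math.BigInteger, java.security.MessageDigest
    // imageBytes (hypothetical name) holds the raw bytes downloaded from the image URL
    MessageDigest m = MessageDigest.getInstance("MD5");
    m.update(imageBytes, 0, imageBytes.length);
    // toString(16) drops leading zeros, which is fine for use as a map key
    System.out.println("MD5: " + new BigInteger(1, m.digest()).toString(16));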
    

    Put them in a hashmap.

  • I've done something very similar to this before in Java and I found that the PixelGrabber class inside the java.awt.image package of the api is extremely helpful (if not downright necessary).

    Additionally, you would definitely want to check out the ColorConvertOp class, which performs a pixel-by-pixel color conversion of the data in the source image, with the resulting color values scaled to the precision of the destination image. The documentation goes on to say that the source and destination can even be the same image, in which case it would be quite simple to detect whether they are identical.

    If you were detecting similarity, you would need to use some form of averaging method, as mentioned in the answer to this question.

    If you can, also check out Volume 2, chapter 7 of Horstmann's Core Java (8th ed.), because there are a whole bunch of examples on image transformations and the like. But again, make sure to poke around the java.awt.image package, because you should find you have almost everything prepared for you :)

    G'luck!

  • Hashing has already been suggested, and recognizing whether two files are identical is very easy, but you said pixel level. If you want to recognize two images even when they are in different formats (.png/.jpg/.gif/...) and even if they have been scaled, I suggest the following (using an image library, and assuming the images are medium or large, not 16x16 icons):

    1. Scale both images to some fixed size; the right size depends on your samples.
    2. Transform them to greyscale, for example using the RGB-to-YUV conversion and keeping only Y (very easy).
    3. Compute a distance between the two greyscale images and set a threshold to decide whether they are the same or not.

    For the distance, sum the (absolute) differences of all corresponding grey pixels of both images. If the resulting number is below a threshold T, you consider both images identical.

  • Inspect the response headers and check the HTTP ETag header value, if present (RFC 2616: ETag). It may be the same for identical images coming from your target web server, because the ETag value is sometimes a message digest such as MD5, which would allow you to take advantage of the web server's already completed computations.

    This may potentially allow you to not even download the image!

    for each imageUrl in myList
        Perform HTTP HEAD imageUrl
        Pull ETag value from response
        If ETag is in my map of known ETags
           move on to next image
        Else
           Download image
           Store ETag in map
    

    Of course the ETag has to be present and if not, well the idea is toast. But maybe you have pull with the web server admins?

  • Maybe you could check this one. http://www.java-forums.org/java-2d/22880-identify-pixel-image.html
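The "hash first, then confirm byte by byte" approach from the answers above can be sketched in a few lines. This is a Python sketch, not Java, and it makes some assumptions: the images are already downloaded as byte strings, hashlib's SHA-256 replaces MD5, and a dict replaces the one-directory-per-hash layout. The function name dedupe is hypothetical.

```python
import hashlib


def dedupe(images):
    """Return the distinct images from an iterable of byte strings, in order.

    Images are bucketed by SHA-256 digest; within a bucket, a full byte
    comparison decides whether an image is really a duplicate, so even an
    (extremely unlikely) hash collision cannot cause a false match.
    """
    buckets = {}  # hex digest -> list of distinct images with that digest
    unique = []
    for data in images:
        bucket = buckets.setdefault(hashlib.sha256(data).hexdigest(), [])
        # == on bytes compares length and content, i.e. the size check
        # and the byte-by-byte comparison from the steps above.
        if any(seen == data for seen in bucket):
            continue  # exact duplicate already recorded
        bucket.append(data)
        unique.append(data)
    return unique


downloaded = [b'\x89PNG...a', b'GIF89a...b', b'\x89PNG...a']
print(len(dedupe(downloaded)))  # 2
```

With a strong hash, the byte comparison only ever runs when two digests already match, so it costs almost nothing.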

How to run automatic "jobs" in asp.net?

I want my website to do a number of calculations every 10 minutes and then update a database with the results. How exactly do I set up such a timer? I am assuming it would be in global.asax.

From stackoverflow
  • Check this out.

    Charlie Flowers : Now that is downright clever!
    Neil N : ha, clever indeed. gotta love some of the solutions that come up on Code Project
    Jonathan Allen : I have to say I distrust clever things.
    Neil N : true, this is more for people who are on hosted/shared servers who may not have the option to use SQL agent or windows task scheduler.
    çağdaş : @Neil N, you're right. This is a work around and not a great solution. But considering there's a lot of people working on ASP.NET && can't use task scheduler or SQL agent, it's worth mentioning.
  • You'd be better off writing a separate non-UI application and then running that as a scheduled task.

    Michael Todd : I agree. A web server is meant to serve a page, not perform a function at a scheduled time. Unless there's a reason (lack of access to the server, etc.), it should be a service.
    marc_s : Or a Windows service, if it has to run at all times, around the clock, regardless of whether someone is logged in or not.
    ChrisF : @marc_s - you can set up scheduled tasks to run when no-one's logged into the machine.
  • The Cache solution in cagdas' answer works. I've used it. Its only downside is that it's difficult to turn off if you need to suspend the timer for some reason. Alternative, but not quite identical, solutions we've used:

    1. Scheduled tasks in SQL Server
    2. Scheduled windows tasks.
  • If it's strictly database calculations, keep it in the database. Create a stored proc that does what you want, then have SQL Server Agent run that proc on a schedule.

  • I really don't like scheduled tasks. I would rather put this function in a Windows service and put a timer in it. With Windows services you can handle stop events very nicely. I do agree with everyone else: the web site is not the place for this.

    Neil N : a service seems more of a hassle than a scheduled task any day. A service is meant for something that "waits".
  • Doing something like that in a web application ranges from difficult and unstable to impossible. Web applications are simply not meant to run non-stop, only to reply to requests.

    Do you really need to do the calculations every ten minutes? I have found that in most cases when someone asks a question like this, they really just need the appearance of something running at an interval; as long as no one is visiting the page to see the results, the results don't really need to be calculated.

    If this is true in your case also, then you just need to keep track of when the calculations were done the last time, and for every request check if enough time has gone by to recalculate.

  • Aside from the (correct) statements about web applications being unstable hosts for scheduled tasks, here's a strategy you could implement:

    In global.asax, handle the Application_Start event and create a timer in it:

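    // Timer and TimerCallback come from System.Threading;
    // MyClass.MyTaskProc must match TimerCallback: static void MyTaskProc(object state)
    var dueTime = 5000;   // ms before the first run
    var period = 5000;    // ms between runs (10 minutes would be 600000)
    var MyTimer = new Timer(new TimerCallback(MyClass.MyTaskProc), null, dueTime, period);
    // Keep a reference so the timer is not garbage-collected
    Application["MyTaskTimer"] = MyTimer;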
    

    This pretty much takes care of creating the task, and of recreating it whenever the application restarts after exiting.
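The idea from one of the answers above can be sketched independently of ASP.NET: recalculate only when a request arrives and the last result is stale. This Python sketch is illustrative only. The class name LazyResult, the 600-second default (10 minutes, matching the question) and the injectable clock are assumptions. It is also not thread-safe, which a real web application would need to handle.

```python
import time


class LazyResult(object):
    """Cache an expensive result and recompute it on demand once it is stale."""

    def __init__(self, compute, max_age=600, clock=time.time):
        self._compute = compute      # function doing the expensive calculation
        self._max_age = max_age      # seconds a result stays fresh
        self._clock = clock          # injectable so the logic is easy to test
        self._value = None
        self._computed_at = None     # None means "never computed yet"

    def get(self):
        """Call this from each request handler."""
        now = self._clock()
        stale = self._computed_at is None or now - self._computed_at >= self._max_age
        if stale:
            self._value = self._compute()
            self._computed_at = now
        return self._value
```

A request handler would call get() and render whatever it returns. If nobody visits for an hour, no calculations run during that hour.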

MVC way of handling data input

I have a data input module where I add the information about my product and its sub-information, such as:

product basic info, product price info, and product price details.

Price info and price details are related to the product and are lists.

In my Web Forms approach, I would store my main product object in the view state and populate its pricing info and details while doing Ajax postbacks. This way I can create a compact module that is very user-friendly for defining a lot of data from one place, without the need to enter this data in separate modules. When I am done, I do one product.save() and that persists all the data to the respective tables in the DB.

Now I am building a similar app on the ASP.NET MVC framework and pondering what would be a good way of handling this in MVC.

I'm not keen on storing all this on the client side until I click save. And saving to the DB after each action reminds me of the days I was coding in classic ASP.

I would appreciate your input on ways to approach this with the MVC framework.

From stackoverflow
  • I believe the best way of doing this is storing the data on the client side. It reduces unnecessary postbacks and improves responsiveness of your application. If you really want to store it on the server, you can use SessionState.

    If you really want to store it in something like ViewState, you can go with a solution like this: http://stackoverflow.com/questions/669492/asp-net-mvc-is-there-a-way-to-simulate-a-viewstate/669495#669495. However, I recommend against it as it will make things more complicated. Doing it client-side is probably the most elegant way and storing it in SessionState is the easiest.

    Remember that you can always escape the MVC pattern and use a simple Web form for that specific page (which will give you ViewState where you need it): http://stackoverflow.com/questions/594898/asp-net-mvc-controller-actions-design-question/598021#598021

    kaivalya : Please see my comments above. I am not clear why it would be the best way. Small postbacks on a data input module do not create much noticeable delay from the user's perspective, but client-side storage adds a high-maintenance layer to the app.
  • Store your Product list in the Model of the view, and each time you change a value you can do an Ajax post to the controller and save the changes to the DB. Use partial views to display each item in your product list.

  • You can try to integrate http://www.castleproject.org/ActiveRecord/ for easy saving and updating. That way you can map your Model onto your database using ORM (Object-Relational Mapping). It takes a bit more work in the beginning, but you will end up with simple commands like product.Update() and product.Create().

Cut off the filename and extension of a given string.

I built a little script that parses a directory for files of a given filetype and stores their locations (including the filename) in an array. It looks like this:

def getFiles(directory)
  arr = Dir[directory + '/**/*.plt']
  arr.each do |k|
    puts "#{k}"
  end
end

The output is the path and the files. But I want only the path.

Instead of /foo/bar.txt I want only the /foo/

My first thought was a regexp but I am not sure how to do that.

From stackoverflow
  • I would split it into an array by the slashes, then remove the last element (the filename), then join it into a string again.

    path = '/foo/bar.txt'
    
    path = path.split '/'
    path.pop
    path = path.join '/'
    
    # path is now '/foo'
    
  • The following code should work (tested in the ruby console):

    >> path = "/foo/bar/file.txt"
    => "/foo/bar/file.txt"
    >> path[0..path.rindex('/')]
    => "/foo/bar/"
    

    rindex finds the index of the last occurrence of substring. Here is the documentation http://docs.huihoo.com/api/ruby/core/1.8.4/classes/String.html#M001461

    Good luck!

  • Not sure what language you're in, but here is the regex for everything from the last / to the end of the string:

    /[^\/]*+$/
    

    This matches all the characters that are not '/' up to the end of the string.

  • You don't need a regex or split.

    File.dirname("/foo/bar/baz.txt")
    # => "/foo/bar"
    
  • Could File.dirname be of any use?

    File.dirname(file_name ) → dir_name

    Returns all components of the filename given in file_name except the last one. The filename must be formed using forward slashes ("/") regardless of the separator used on the local file system.

    File.dirname("/home/gumby/work/ruby.rb") #=> "/home/gumby/work"
    
  • For a regular expression, this should work, since * is greedy:

    .*/
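The regex answers are language-neutral, so here is a quick runnable check of both patterns using Python's re module. Python's posixpath.dirname is shown for comparison with Ruby's File.dirname. The possessive *+ in the earlier pattern is not supported by Python before 3.11, so this sketch uses a plain *.

```python
import posixpath
import re

path = '/foo/bar/file.txt'

# Greedy '.*/' runs to the LAST slash and keeps it.
# re.match returns None if the string contains no slash at all.
print(re.match(r'.*/', path).group(0))         # /foo/bar/

# Delete the final component: everything after the last '/'.
print(re.sub(r'[^/]*$', '', path, count=1))    # /foo/bar/

# Library route, like Ruby's File.dirname (no trailing slash).
print(posixpath.dirname(path))                  # /foo/bar
```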
    

Dropdownlist Checkbox

If you have ever used SQL Server Reporting Services, there are some dropdown lists that pop down with a checkbox list in them. They let a user select multiple items very nicely. Does anyone know of a free user control, or an example of this implemented? OK, I know I can do this with some elbow grease and HTML; I was just trying to see if there was already something out there. I'm using ASP.NET C#.

Thanks,

Joe

From stackoverflow

GXT KeyListener.componentKeyDown() immediately closes MessageBox.alert()

In GXT, MessageBox methods are asynchronous, meaning that the application does not "lock up" while the message box is displayed.

I am using a KeyListener to process Enter key presses in a form (to increase usability, i.e., allowing the form to be submitted with the Enter key) and then disabling the form fields while the application processes the user's credentials. If they are incorrect, I show a MessageBox.alert() and then re-enable the form fields. However, since alert() returns immediately, the form fields are made available again right away, allowing the user to input data without closing the alert.

The solution is to use a callback in alert(); however, the enter keypress not only causes the form to submit, but also causes the alert to immediately dismiss (as if both the form and the message box are processing the enter key). How do I keep the alert box open until the user presses enter a second time or clicks the "Ok" button?

From stackoverflow
  • The key is DeferredCommand provided by GWT:

    This class allows you to execute code after all currently pending event handlers have completed, using the addCommand(Command) or addCommand(IncrementalCommand) methods. This is useful when you need to execute code outside of the context of the current stack.

    if(!validate())
    {
        DeferredCommand.addCommand(new Command() {
            public void execute() {
                MessageBox.alert("Error", "You must enter a username and password.", alertListener);
                return;
            }
        });
    }
    

How to read the data returned from WebRequest in asp.net ?

Hi,

I am trying to integrate 3D Secure into my customer's e-shop. I need to post some data to 3DGate and get the returned result from it.

I have used WebRequest for this. I have posted the data successfully, but the returned data is HTML text containing a form with some inputs in it. I need to read these values like Request.Form.Get("HashParams"), but because the response is just a string, I couldn't do it.

Is there any way I can get these form values?

I am doing this WebRequest in the btnPayment_Click event.

Thanks

From stackoverflow
  • I believe madcolor is thinking of a different scenario; you're making a completely new web request on the server, which means there are no request parameters; you're dealing with a response. Essentially, you've become the web browser, and you have to do the parsing yourself.

    Since the e-store you're using is an app that's designed for browsers, you'll have to deal with the limitations inherent to that format. You're essentially bound to "screen scraping" techniques, because your server doesn't see the text from the response as anything other than what it is: plain text.

    If you're dealing with valid XHTML, you can load it into an XmlDocument, and use XPath/XQuery to pull out the values.

    If you're dealing with standard crappy HTML, you're going to have to resort to some parsing; I'd suggest a regex for this one.

    Ideally, there would be a non-HTML based version of the e-shop, so you would know you were working with valid XML/JSON/whatever, but if there is no alternative, you're stuck ripping the data out yourself.

    Barbaros Alp : I've posted the XHTML, could you please take a look?
  • I can't see a way around having to parse the HTML that comes back from the WebRequest. If you're lucky it might be valid XML. Otherwise you'll have to do your own string parsing or use one of the other HTML parsers.

  • This is the returned data...

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <!-- gateerr_en.htm -->
    <html>
    <head>
    <script type="text/javascript" language="javascript">
    function moveWindow() {
      document.returnform.submit();
    }
    </script>  
    </head>
    
    <body onLoad="javascript:moveWindow()">
    <form action="urlHere" method="post" name="returnform">
    
        <input type="hidden" name="clientid" value="xxx">
        <input type="hidden" name="oid" value="">
    
        <input type="hidden" name="mdStatus" value="7">
        <input type="hidden" name="mdErrorMsg" value="Tanimlanamayan">
    
        <input type="hidden" name="ErrMsg" value="Tanimlanamayan">
        <input type="hidden" name="Response" value="Error">
        <input type="hidden" name="ProcReturnCode" value="99"> 
    
    
        <!-- To support javascript unaware/disabled browsers -->
        <noscript>
         <center>
         An Error Occurred, Please Click to continue.<br>
         <input type="submit"  value="Submit"></center>
        </noscript> 
    </form>
    </body>
    </html>
    

    I need to get those hidden inputs

    jvenema : Use this regex: That'll give you only the inputs with type hidden, and the first group will have the value.
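The hidden fields can also be read with an HTML parser instead of a regex, which copes better with attribute order, quoting and letter case. A sketch using Python 3's standard html.parser module follows. The class and function names are made up, and the same approach carries over to C#.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list
        # of (name, value) pairs, with value None for bare attributes.
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_fields(html):
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


page = '''<form action="urlHere" method="post" name="returnform">
  <input type="hidden" name="mdStatus" value="7">
  <input type="hidden" name="oid" value="">
  <input type="submit" value="Submit">
</form>'''
print(hidden_fields(page))  # {'mdStatus': '7', 'oid': ''}
```

Fed the full response shown above, the result would also include entries such as 'ProcReturnCode': '99'.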

What is ADS realm?

I am trying to set up a share on my SAN system and it is asking for the ADS realm of my domain controller.

What is it? Where can I find this value?

I am using Active Directory as the domain controller.

From stackoverflow
  • You know, that's the ADS realm you have to enter.

What's the Scala syntax for a function taking any subtype of Ordered[A]?

I want to write a function that works on any Scala type with a total ordering (i.e. I can use '<' on it). What's the syntax for that? The best I've come up with is

def lessThan[T <: Ordered[T]](x: T, y: T) = x < y

That doesn't work, though, when I try using it from the REPL:

scala> lessThan(1, 2)
<console>:8: error: inferred type arguments [Int] do not conform to method lessThan's type parameter bounds [T <: Ordered[T]]
       lessThan(1, 2)
       ^

scala> import runtime._
import runtime._

scala> lessThan(new RichInt(1), new RichInt(2))
<console>:8: error: inferred type arguments [scala.runtime.RichInt] do not conform to method lessThan's type parameter bounds [T <: Ordered[T]]
       lessThan(new RichInt(1), new RichInt(2))

Essentially, I believe I want the equivalent of this Haskell code:

lessThan :: (Ord a) => a -> a -> Bool
lessThan x y = x < y

I'm using Scala 2.7.3 on a Debian system.

What am I missing, and where?

From stackoverflow
  • The equivalent of Haskell's type classes in Scala is done via implicits. There are two ways to do what you want.

    The first is with view bounds

    scala> def lessThan[T <% Ordered[T]](x : T, y : T) = x < y
    lessThan: [T](T,T)(implicit (T) => Ordered[T])Boolean
    
    scala> lessThan(1,2)
    res0: Boolean = true
    

    The second is with an implicit parameter

    scala> def lessThan[T](x : T, y : T)(implicit f : T => Ordered[T]) = x < y      
    lessThan: [T](T,T)(implicit (T) => Ordered[T])Boolean
    
    scala> lessThan(4,3)
    res1: Boolean = false
    

    The former is syntactic sugar for the latter. The latter allows more flexibility.

Data Connection Wizard/Dialog

I'm looking to add a feature to one of my apps that allows the user to select a data source similar to how Microsoft Word mail merge does. I'm wondering if there is something out there which I could use in a commercial application. If not, any suggestions on how to roll my own (i.e., how to look at the machine's data source list)?

Also, I do know of the Microsoft.Data.ConnectionUI.DataConnectionDialog, however this is not usable (the dll is not free to distribute AFAIK).

Much obliged!

From stackoverflow
  • Hello, I have just released a Data Connection Dialog which is designed to work and look like the Microsoft version. It works with the built-in data sources in .NET and is extensible! It comes with sample MySQL C# source code that demonstrates extending the functionality for custom data sources. Please check it out at www.mjmeans.com/dcd.aspx.

C#: HtmlDocument object has no constructor?

What's up with that? It seems the only way to get a working HtmlDocument object is copying the Document property of an mshtml/WebBrowser control. But spawning that is sloooooooooooow. I'd like to avoid writing my own HTML parser, and HtmlAgilityPack is copyleft.

Are there other sources of getting an instantiated HtmlDocument that I can dump HTML from a string into?

Or, is there a way to override HtmlElement's annoying habit of throwing a fit when using InnerHtml/OuterHtml with img tags and tr elements?

Edit: I'm referring to System.Windows.Forms.HtmlDocument. My apologies, I'm still new to C# and .Net and know very little about COM and some of the other things this topic brings up.

From stackoverflow
  • It has no constructor because it's just a wrapper class around an unmanaged object.

    Reference: http://msdn.microsoft.com/en-us/library/system.windows.forms.htmldocument.aspx

    HtmlDocument provides a managed wrapper around Internet Explorer's document object, also known as the HTML Document Object Model (DOM). You obtain an instance of HtmlDocument through the Document property of the WebBrowser control.

    Depending on what you want it for, you may want to look at SGMLReader or the up-to-date community version.

    Tom the Junglist : Thanks for the tip on SGMLReader. I was able to work around this by reading my HTML into SGMLReader, converting it to an XML document, and then injecting that code into the mshtml.HTMLDocument. Thank you!
  • Robust Programming?

    When using the DOM through the WebBrowser control, you should always wait until the DocumentCompleted event occurs before attempting to access the Document property of the WebBrowser control. The DocumentCompleted event is raised after the entire document has loaded; if you use the DOM before then, you risk causing a run-time exception in your application.

    http://msdn.microsoft.com/en-us/library/ms171712.aspx

Debugging scripts added via jQuery getScript function

I have a page that dynamically adds script references via jQuery's $.getScript function. The scripts load and execute fine, so I know the references are correct. However, when I add a "debugger" statement to any of the scripts to allow me to step through the code in a debugger (such as VS.Net, Firebug, etc.), it doesn't work. It appears that something about the way jQuery loads the scripts is preventing debuggers from finding the files.

Does anybody have a work-around for this?

From stackoverflow
  • Ok, so it turns out that the default implementation of the $.getScript() function works differently depending on whether the referenced script file is on the same domain or not. External references such as:

    $.getScript("http://www.someothersite.com/script.js")
    

    will cause jQuery to create an external script reference, which can be debugged with no problems.

    <script type="text/javascript" src="http://www.someothersite.com/script.js"></script>
    

    However, if you reference a local script file such as any of the following:

    $.getScript("http://www.mysite.com/script.js");
    $.getScript("script.js");
    $.getScript("/Scripts/script.js");
    

    then jQuery will download the script content asynchronously and then add it as inline content:

    <script type="text/javascript">{your script here}</script>
    

    This latter approach does not work with any debugger that I tested (Visual Studio .NET, Firebug, IE8 Debugger).

    The workaround is to override the $.getScript() function so that it always creates an external reference rather than inline content. Here is the script to do that. I have tested this in Firefox, Opera, Safari, and IE 8.

    <script type="text/javascript">
    // Replace the normal jQuery getScript function with one that supports
    // debugging and which references the script files as external resources
    // rather than inline.
    jQuery.extend({
       getScript: function(url, callback) {
          var head = document.getElementsByTagName("head")[0];
          var script = document.createElement("script");
          script.src = url;
    
          // Handle Script loading
          {
             var done = false;
    
             // Attach handlers for all browsers
             script.onload = script.onreadystatechange = function(){
                if ( !done && (!this.readyState ||
                      this.readyState == "loaded" || this.readyState == "complete") ) {
                   done = true;
                   if (callback)
                      callback();
    
                   // Handle memory leak in IE
                   script.onload = script.onreadystatechange = null;
                }
             };
          }
    
          head.appendChild(script);
    
          // We handle everything using the script element injection
          return undefined;
       }
    });
    </script>
    
    Andrew Matthews : nice. Have you experienced any issues that I should be aware of since you posted this answer? ;)
    Shrikant Sharat : I don't think this method makes use of the global ajax events.. not sure.
  • Is there a way around this without hacking jQuery? I'm hitting exactly the same problem, but I can't see a workable solution without overriding getScript. Thanks

    James Messinger : I don't recommend using my workaround in a production environment. This is just a workaround for your dev environment for debugging purposes. In our applications, we put this "hack" in a file that is only referenced in DEBUG builds. Our release builds use the normal jQuery getScript method, since we don't have any need to debug our production environments.
  • The script somewhat works, but it throws unexpected errors when using eval(). Thanks for posting, though; it's a pretty nifty workaround.

Is there a standard #include convention for C++?

This is a rather basic question, but it's one that's bugged me for a while.

My project has a bunch of .cpp (Implementation) and .hpp (Definition) files.

I find that as I add additional classes and more class inter-dependencies, I have to #include other header files. After a week or two, I end up with #include directives in lots of places. Later, I'll try removing some of the #includes and discover that everything still works because some OTHER included class is also #including what I just removed.

Is there a simple, easy rule for putting in #includes that will stop this ugly mess from happening in the first place? What is the best practice?

For example, I've worked on projects where the Implementation .cpp file ONLY includes the corresponding Definition .hpp file, and nothing else. If there are any other .hpp files that need to be used by the Implementation .cpp, they are all referenced by the Definition .hpp file.

From stackoverflow
  • Some best practices:

    • Every .cpp or .C file includes all headers it needs and does not rely on headers including other related headers
    • Every .hpp or .h file includes all its dependencies and does not rely on the included headers including other related headers
    • Every header is wrapped with:

      #ifndef HEADER_XXX_INCLUDED
      #define HEADER_XXX_INCLUDED
      ...
      #endif /* HEADER_XXX_INCLUDED */
      
    • Headers do not include each other in cycles

    • Often: there is a single "project-wide header file" like "config.h" or ".h" which is always included first by any .cpp or .C file. Typically this has platform related configuration data, project-wide constants and macros etc.

    These are not necessarily "best practice", but rules which I usually follow also:

    • Project-specific headers are included as #include "..." and before the system-wide headers, which are included as #include <...>
    • Project-specific headers are included in alphabetical order as a way to ensure that there is no accidental, hidden requirement on the order in which they are included. As every header should include its dependencies and be protected against multiple inclusion, you should be able to include them in any order you wish.
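    A minimal sketch of the include-guard rule, assuming a hypothetical header point.h with the guard macro PROJECT_POINT_H_INCLUDED: the listing shows roughly what one translation unit looks like after the preprocessor has pasted the same guarded header in twice (once directly, once via another header). The Point type and manhattan function are illustrative only.

```cpp
// Sketch: a hypothetical "point.h" as it would appear after the preprocessor
// has pasted it into one .cpp file twice.

// ---- first inclusion of point.h ----
#ifndef PROJECT_POINT_H_INCLUDED
#define PROJECT_POINT_H_INCLUDED

// The header declares everything it uses itself and relies on nothing
// included before it, so it compiles in any include position.
struct Point {
    int x;
    int y;
};

// Grid (Manhattan) distance; constexpr so it can be checked at compile time.
constexpr int manhattan(Point a, Point b) {
    return (a.x > b.x ? a.x - b.x : b.x - a.x) +
           (a.y > b.y ? a.y - b.y : b.y - a.y);
}

#endif  // PROJECT_POINT_H_INCLUDED

// ---- second inclusion of point.h (e.g. pulled in by another header) ----
// The guard macro is already defined, so everything up to #endif is skipped.
// Without the guard, this second definition of Point would not compile.
#ifndef PROJECT_POINT_H_INCLUDED
#define PROJECT_POINT_H_INCLUDED
struct Point {
    int x;
    int y;
};
#endif  // PROJECT_POINT_H_INCLUDED
```

    Because such a header is self-contained and guarded, it can be included first, last, or twice in any .cpp file, which is what the alphabetical-order rule relies on.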
    Scott Chamberlain : I also do #endif //HEADER_XXX_INCLUDED so to see what the endif matches up with.
    Runcible : Interesting. Doesn't this mean that the .cpp file will have the same (possibly more) #includes as its corresponding .hpp file?
    rmeador : I'm confused by your second bullet. IMHO, you should always include the headers your class relies on in the .h file for your class. If every class follows this practice, you should be able to rely on them to include whatever they need. You're advocating including all of their dependencies as well?
    Ferruccio : I like to create unit tests that do nothing but #include one header file. Unintended dependencies are quickly weeded out when the tests are compiled.
    RobH : If you're using Visual Studio 2005 or later, and portability is not an issue, you could replace the '#ifndef HEADER_XXX' / '#define HEADER_XXX' / '#endif // HEADER_XXX' lines with a simple '#pragma once' line where your #ifndef line would go.
    Greg Rogers : You should emphasize that system (and other external library) headers should be included last. This is probably the biggest source of include-order dependencies in our code.
    RobH : To save compile time, I typically include big system headers, like windows.h, as the first includes in every .cpp file and don't include them in the headers. If a header depends on the system header, you can do a #ifndef on something the system header defines...
    RobH : ... and put a #error line inside that block that tells the developer to include that particular system header before including your header. The #error line would be something like: #error You must include windows.h before including foobar.h
    Dan Olson : A common "best-practice" is to always include the header that corresponds to a cpp file first in the list of headers. This helps to enforce that all the dependencies of the header are included properly. Might be worth adding.
    X-Istence : I disagree with the .cc/.cpp/.C file including all of its header files and stuff it requires. That is what the corresponding .h file is for. In the Foo.cc I #include "Foo.h", nothing else. Foo.h should contain all of the #includes to make sure that my .cc compiles and can be properly linked against.
    antti.huima : X-Istence, if Foo.cc is implementation of class Foo and Foo.h its declaration, there can be headers which you only need to compile the unit Foo.cc but which are not needed for using the class.
  • Use only the minimum amount of includes needed. Useless including slows down compiling.

    Also, you don't have to include a header if you just need a pointer to a class. In this case you can just use a forward declaration like:

    class BogoFactory;
    

    edit: Just to make it clear. When I said minimum amount, I didn't mean building include chains like:

    a.h
    #include "b.h"
    
    b.h
    #include "c.h"
    

    If a.h needs c.h, it needs to be included in a.h of course to prevent maintenance problems.
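    A sketch of the forward-declaration point, assuming hypothetical files widget.h, bogofactory.h, and widget.cpp collapsed into one listing: the class that only holds a BogoFactory* compiles against the forward declaration, and only the code that calls a member function needs the full definition. Widget, build, and make are made up for illustration.

```cpp
#include <cassert>  // for the precondition check in build()

// ---- hypothetical widget.h ----
// Widget only stores and returns a pointer, so a forward declaration is
// enough; widget.h would not need to #include "bogofactory.h".
class BogoFactory;

class Widget {
public:
    explicit Widget(BogoFactory* factory) : factory_(factory) {}
    BogoFactory* factory() const { return factory_; }
private:
    BogoFactory* factory_;  // a pointer to an incomplete type is allowed
};

// ---- hypothetical bogofactory.h ----
class BogoFactory {
public:
    int make() const { return 42; }
};

// ---- hypothetical widget.cpp ----
// Calling a member function requires the complete type, so this is the
// file that would #include "bogofactory.h".
int build(const Widget& w) {
    assert(w.factory() != nullptr);  // precondition: a factory was supplied
    return w.factory()->make();
}
```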

    RobH : Typically, unless I can get away with just a forward declaration or two, if a file uses something that's defined in a header, I include that header, even if it's already included by one of the headers that I'm already including. That prevents the problem that Joel Coehoorn pointed out.
    abababa22 : Yep, I realized that my post can be interpreted this way, which I of course didn't mean. Edited.
    RobH : The exception to the above is when I'm including system headers. Then I'll include as few of those as I can get away with. (I.e., if system Header A includes system header B, and I'm including system header A, then I won't also include system header B.)
    X-Istence : @RobH: However you can't always rely on that. On certain systems certain system header files will not include things they rely on, and you may need to re-order the headers just to get the whole damn thing to compile! Annoying!
  • I always use the principle of least coupling. I only include a file if the current file actually needs it; if I can get away with a forward declaration instead of a full definition, I'll use that instead. My .cpp files always have a pile of #includes at the top.

    Bar.h:

    class Foo;
    
    class Bar
    {
        Foo * m_foo;
    };
    

    Bar.cpp:

    #include "Foo.h"
    #include "Bar.h"
    
    Runcible : I see. So it sounds like your .h file will typically have very few (if any) #includes.
    Mark Ransom : Yes - most often it will be for the standard library containers, e.g. #include .
    dmckee : In one (moderately complicated) code base I've seen a marked improvement in rebuild times using this approach.
    Runcible : I actually like this answer best because it's so simple.
  • Building on what antti.huima said:

    Let's say you have classes A, B, and C. A depends on (includes) B, and both A and B depend on C. One day you discover you no longer need to include C in A, because B does it for you, and so you remove that #include statement.

    Now what happens if at some point in the future you update B to no longer use C? All of a sudden A is broken for no good reason.

  • There are several problems with the #include model used in C/C++, the main one being that it doesn't express the actual dependency graph. Instead it just concatenates a bunch of definitions in a certain order, often resulting in definitions coming in a different order in each source file.

    In general, the include file hierarchy of your software is something you need to know in the same way as you know your data structures; you have to know which files are included from where. Read your source code and know which files are high up in the hierarchy, so you can avoid accidentally adding an include that ends up being included "from everywhere". Think hard when you add a new include: do I really need to include this here? What other files will be drawn in when I do this?

    Two conventions (apart from those already mentioned) which can help out:

    • One class == one source file + one header file, consistently named. Class A goes in A.cpp and A.h. Code templates and snippets are good here to reduce the amount of typing needed to declare each class in a separate file.
    • Use the Impl pattern to avoid exposing internal members in a header file. The Impl pattern means putting all internal members in a struct defined in the .cpp file, and keeping only a private pointer to it (using a forward declaration) in the class. This means the header file only needs to include the header files needed for its public interface, and any definitions needed for its internal members are kept out of the header file.
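    The Impl pattern from the last bullet might look like this, assuming a hypothetical ContactList class whose header and .cpp contents are shown in one listing. Holding the internals through std::unique_ptr is one common choice, not something the answer prescribes; the destructor has to be defined in the .cpp, after Impl is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// ---- hypothetical contact_list.h ----
// Split into real files, this header would need only <cstddef>, <memory>
// and <string>; <vector> would be included only by contact_list.cpp.
class ContactList {
public:
    ContactList();
    ~ContactList();  // defined in the .cpp, where Impl is a complete type
    void add(const std::string& name);
    std::size_t size() const;
private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // the private pointer to the internals
};

// ---- hypothetical contact_list.cpp ----
struct ContactList::Impl {
    std::vector<std::string> names;  // internal detail hidden from users
};

ContactList::ContactList() : impl_(new Impl) {}
ContactList::~ContactList() = default;

void ContactList::add(const std::string& name) {
    assert(impl_ != nullptr);  // always allocated by the constructor
    impl_->names.push_back(name);
}

std::size_t ContactList::size() const {
    return impl_->names.size();
}
```

    Changing the members of Impl then only forces contact_list.cpp to recompile, not every file that includes the header.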
  • In A.cpp, always include A.h first, to ensure that A.h has no additional dependencies. Include all local (same module) files before all project files before all system files, again to ensure that nothing depends on pre-included system files. Use forward declarations as much as possible. Use the #ifndef/#define/#endif pattern. If a header is included in A.h, you don't need to include it in A.cpp. Any other headers A.cpp needs must be explicitly included, even if they happen to be provided by other .h files.

  • Organizing code files in C and C++:

  • Check out John Lakos's Large-Scale C++ Software Design. Here's what I follow (written as an example):

    Interface

    // foo.h
    // 1) standard include guards.  DO NOT prefix with underscores.
    #ifndef PROJECT_FOO_H
    #define PROJECT_FOO_H
    
    // 2) include all dependencies necessary for compilation
    #include <vector>
    
    // 3) prefer forward declaration to #include
    class Bar;
    class Baz;
    #include <iosfwd> // the standard library's way to forward-declare istream, ostream
    
    class Foo { ... };
    #endif
    

    Implementation

    // foo.cxx
    // 1) precompiled header, if your build environment supports it
    #include "stdafx.h"
    
    // 2) always include your own header file first
    #include "foo.h"
    
    // 3) include other project-local dependencies
    #include "bar.h"
    #include "baz.h"
    
    // 4) include third-party dependencies
    #include <mysql.h>
    #include <dlfcn.h>
    #include <boost/lexical_cast.hpp>
    #include <iostream>
    

    Precompiled Header

    // stdafx.h
    // 1) make this easy to disable, for testing
    #ifdef USE_PCH
    
    // 2) include all third-party dependencies.  Do not reference any project-local headers.
    #include <mysql.h>
    #include <dlfcn.h>
    #include <boost/lexical_cast.hpp>
    #include <iosfwd>
    #include <iostream>
    #include <vector>
    #endif
    

what is alignment css style for table layout?

I have a table in HTML and I'd like it to be laid out centered. What is the CSS style for this layout? Also, are there any online CSS tools for seeing style changes interactively?

From stackoverflow
  • If I remember correctly, it's the text-align property. Or do you mean center the table in the html page?

  • To center the table in the page, use auto left and right margins.

    table {
        margin: 0 auto;
    }
    
  • Given this HTML:

    <div>
      <table>
        <!-- contents -->
      </table>
    </div>
    

    You can use this CSS:

    div{
      margin:0 auto;
      text-align:center;
    }
    div table{
      margin:0 auto;
      text-align:left;
    }
    

    Most browsers will already work if you apply margin: 0 auto to the table.

    However, the text-align CSS makes it work for pickier browsers. Specifically, text-align: center will center the table, but since it would have a side effect of also centering each cell's contents, you need to apply text-align: left to the table to reset that property.

registering DirectShow filter on Windows Mobile 6

I'm trying to register my DirectShow filter on Windows Mobile. My project has Linker/General/Register Output set to Yes. However, nothing is getting registered and I'm getting the following error: Project : error PRJ0050: Failed to register output. Please try to register the component from a command prompt with elevated permissions.

I'm running Vista and UAC is disabled.

Any ideas?

From stackoverflow
  • This is a guess, but I suspect it's trying to register the WM DLL on the host system (Vista) rather than on the WM device. I don't think Visual Studio has a way to register the DLL on the WM device.

    You can either create a CAB file, in which you can indicate which DLLs are self-registering, or write a quick tool to register the DLL for you. Both are pretty simple to do.

Integrating custom membership provider in existing solution

I have the source for the sqlMembershipProvider and the sqlRolesProvider that MS ships, and I want to modify them to use my own tables and schema.

I have an existing solution that will use this provider and I'd like to debug the provider code within that solution until I'm sure it works.

How do I set up my provider code in a project within that solution so I can reference my custom provider in the solution's web project's web.config?

From stackoverflow
  • It doesn't really matter as long as you provide a fully qualified type name in the config:

    <membership defaultProvider="YourProvider">
        <providers>
            <add name="YourProvider" type="FullTypeName, YourAssembly"/>
        </providers>
    </membership>
    

    It can be a type in a separate assembly, or in a web application. It won't work in case of a "Web Site" project though, because all code in App_Code folder is compiled on the fly, so you can't tell its type name.

How can I convert code from Fortran77 to Visual Fortran?

I need to convert code from Fortran77 to Compaq Visual Fortran. Is it possible? If so, is it also possible to save the results in a form that can be imported into EXCEL 2003?

From stackoverflow
  • CVF is a Fortran 95 compiler, and Fortran 77 is more or less a subset of F95, so yes, it's certainly possible. What are your actual problems, or what exactly are you trying to do?

    And yes, you can certainly output data in a format that excel can import.

    Also note that CVF was discontinued many years ago.

  • There are several Fortran standards: Fortran 77, 90, 95, 2003, with 2008 on the way. "Visual Fortran" is not the name of a standard but purely a commercial name for Compaq's (and now Intel's) line of compilers; since they added an IDE, they named it "Visual". Because Fortran is backward compatible, Fortran 77 was made a subset of the Fortran 90 standard (meaning Fortran 90 includes the whole F77 standard). F95 was a small extension to the standard that kept that backward compatibility.

    So there is no need to change anything, apart from perhaps "modernizing" the code syntax itself. Since most of the F77 code I've seen runs very efficiently, I have rarely seen the need for rewriting.

    Compaq's compiler was part of the line: Microsoft Fortran Powerstation 1.0 --> then 4.0 --> Digital's version 5 --> Compaq's and now Intel's Visual Fortran, which is currently at version 11. It is a relatively stable, high-quality line of compilers, popular among Fortran users.

    Regarding the last question, MS Excel can import text files, and those can be written from Fortran. If you're thinking of writing .xls files directly, I have not seen a library that can do that so far (if you know of any, please supply me with a link).

dynamic string localization in WPF

I have a C#/WPF application and I am attempting to add multi-language support. From looking around, I've found that the way to do this is to use a resource file of string values for each language and then update the CultureInfo. This seems to work fine if the culture info is set at or before the creation of the window, but I want the ability to change the culture dynamically. How can I do this? I've tried playing around with binding and dynamicResource, but couldn't figure out how to get either one to work. I should add that I'm pretty much a beginner with WPF.

From stackoverflow
  • Take a look at this Codeplex project. It provides a dynamic localization system that blends well with WPF's binding system.

    mmr : hard to trust a language package that misspells 'really' at the top of the page.
    Denis Troller : Granted, the guy should have someone proof-read his page, it's full of spelling/grammatical mistakes... But don't be too harsh on the guy, nobody said he is a native English speaker.
  • Oops, I will remove the "really". Maybe you can correct the text for me?

    English isn't my native language; it's German.

    Best regards, SeriousM