Code Question: 03/23/11

Wednesday, March 23, 2011

LINQ to SQL query fails using two Contains statements

I have two tables, DH_MASTER and DH_ALIAS. DH_MASTER contains information about a person, including their name. DH_ALIAS contains AKA records about the person. The tables are linked by the Operator field, which is the primary key in DH_MASTER.

The users want to search by the name stored in DH_MASTER as well as through all of the person's known aliases. If any matches are found in either DH_MASTER or DH_ALIAS, the DH_MASTER entity should be returned.

I created the query below which should give the results I described (return any DH_MASTER rows where the DH_MASTER.Name == name or DH_MASTER.DH_ALIAs(n).Name == name).

It works fine if I use only ONE of the .Contains lines. It doesn't matter which one I use. But the execution fails when I try to use BOTH at the same time.


    qry = From m In Context.DH_MASTERs _
          Where (m.Name.Contains(name)) _
              OrElse ((From a In m.DH_ALIAs _
                       Where a.Name.Contains(name)).Count() > 0) _
          Select m

The LINQ to SQL query evaluates to the following SQL code (as displayed in the SQL Server Query Visualizer):


SELECT [t0].[Operator], [t0].[Name], [t0].[Version]
FROM [DHOWNER].[DH_MASTER] AS [t0]
WHERE ([t0].[Name] LIKE %smith%) OR (((
    SELECT COUNT(*)
    FROM [DHOWNER].[DH_ALIAS] AS [t1]
    WHERE ([t1].[Name] LIKE %smith%) AND ([t1].[Operator] = [t0].[Operator])
    )) > 0)

EDIT: Checking the "Show Original" box in the Query Visualizer reveals the parameterized query as expected so this block of text below should be ignored.

I don't know if this is a problem or not but the `.Contains` evaluates to a `LIKE` expression (which is what I expect to happen) but the parameter is not encapsulated in apostrophes.

The interesting thing is that if I copy/paste the SQL Query into SQL 2005 Query Analyzer and add the apostrophes around the LIKE parameters, it runs just fine. In fact, it's lightning quick (blink of an eye) even with more than 2 million rows.

But when the LINQ query runs, the web app locks up for about 31 seconds before it finally fails with this error on gv.DataBind: Exception has been thrown by the target of an invocation.

With this innerException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

Does anyone know why this happens and how the behavior can be worked around? It's driving me nuts because the LinqToSql-generated SQL runs fine in query analyzer!

Update:

I have refactored my code based on the techniques in the answer. This works!

    qry = From m In qry _
          Where m.Name.Contains(name) OrElse _
                m.DH_ALIAs.Any(Function(aliasRec) aliasRec.Name.Contains(name)) _
          Select m
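For comparison, LINQ to SQL typically translates Any() into an EXISTS subquery rather than COUNT(*) > 0. The sketch below shows that query shape with the search text sent as a bound parameter, the way LINQ to SQL sends it. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server (an assumption: it illustrates the SQL and cannot reproduce the timeout); table and column names come from the question, and the sample rows are made up.

```python
import sqlite3

def search_masters(conn, name):
    """Return DH_MASTER rows whose Name, or any alias Name, contains `name`."""
    pattern = f"%{name}%"  # Contains(name) becomes LIKE '%name%' (this sketch does not escape % or _)
    sql = """
        SELECT m.Operator, m.Name
        FROM DH_MASTER AS m
        WHERE m.Name LIKE ?
           OR EXISTS (SELECT 1 FROM DH_ALIAS AS a
                      WHERE a.Operator = m.Operator AND a.Name LIKE ?)
        ORDER BY m.Operator
    """
    return conn.execute(sql, (pattern, pattern)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DH_MASTER (Operator INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DH_ALIAS (Operator INTEGER REFERENCES DH_MASTER, Name TEXT);
    INSERT INTO DH_MASTER VALUES (1, 'John Smith'), (2, 'Jane Doe'), (3, 'Bob Jones');
    INSERT INTO DH_ALIAS VALUES (2, 'Janie Smithers'), (3, 'Robert Jones');
""")
print(search_masters(conn, "smith"))  # [(1, 'John Smith'), (2, 'Jane Doe')]
```

EXISTS lets the database stop at the first matching alias instead of counting all of them.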

From stackoverflow

LINQ to SQL doesn't put values directly into the query; it uses parameters. Are you sure the SQL had the Contains parameter value directly in it?

Anyway, the timeout is likely caused by blocking: the query wants to read a row in a table that is locked by another (insert/update) query or transaction, and that query apparently takes longer to complete.

Jeff Robinson : You're right about the first part - when I click the "Show Original" checkbox in the visualizer it shows the parameterized query. However, these tables are never inserted/updated other than through a nightly SSIS package.
This might not be appropriate, since your problem is different, but I remember a problem in one of my programs: Contains() would not work (in my case it would throw an Exception when evaluating), so maybe the Contains-method is a bit broken.

I replaced
```
result.Contains( x )
```
with
```
result.Any( p => p == x )
```
which did work.

Can you try if that works? At least it might be a step in the right direction.

Jeff Robinson : Using your idea, I was able to refactor my code a bit. This works better than my original code but as you can see, it does not eliminate the need for the .Contains call because I still want to return "fuzzy" matches. For some reason, calling .Contains a second time inside a lambda expression works, but calling it a second time outside the lambda expression does not work!

SQL 2008 Mirroring, in "restoring" state

I'm testing SQL Server 2008 mirroring with a principal, a mirror and a witness. I've gone through the examples, restoring the mirror WITH NORECOVERY. The principal says it is synchronized. But the mirror still shows it is in a "restoring..." state. Am I supposed to just leave it like that? Should I run a RESTORE WITH RECOVERY?

From stackoverflow

Yes, it stays in a restoring state.

The mirror isn't directly accessible; it only goes "live" when the principal goes down and a failover occurs.

Ordering the results of a query from a temp table.

I have a SQL query where I am going to be transferring a fair amount of response data down the wire, but I want to get the total rowcount as quickly as possible to facilitate binding in the UI. Basically I need to get a snapshot of all of the rows that meet a certain criteria, and then be able to page through all of the resulting rows.

Here's what I currently have:

SELECT --primary key column
INTO #tempTable 
FROM --some table 
--some filter clause
ORDER BY --primary key column

SELECT @@ROWCOUNT

SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table

DROP TABLE #tempTable

Every once in a while, the query results end up out of order (presumably because I am doing an unordered select from the temp table).

As I see it, I have a couple of options:

1. Add a second order by clause to the select from the temp table.
2. Move the order by clause to the second select and let the first select be unordered.
3. Create the temporary table with a primary key column to force the ordering of the temp table.

What is the best way to do this?

From stackoverflow

Personally I would select out the data in the order you want to eventually have it. So in your first select, have your order by. That way it can take advantage of any existing indexes and access paths.
Move the order by from the first select to the second select.

A database isn't a spreadsheet. You don't put the data into a table in a particular order.

Just make sure you order it properly when you get it back out.
Use number 2. Just because you have a primary key on the table does not mean that the result set from a select statement will be ordered (even if what you see actually is).

There's no need to order the data when putting it in the temp table, so take that one out. You'll get the same @@ROWCOUNT value either way.

So do this:
```
SELECT --primary key column
INTO #tempTable 
FROM --some table 
--some filter clause


SELECT @@ROWCOUNT

SELECT --the primary key column and some others
FROM #tempTable
JOIN -- some table
ORDER BY --primary key column

DROP TABLE #tempTable
```
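The same three steps can be sketched with Python's sqlite3 module as a stand-in for SQL Server (an assumption: `SELECT ... INTO #tempTable` becomes `CREATE TEMP TABLE` plus `INSERT ... SELECT`, and `COUNT(*)` stands in for `@@ROWCOUNT`). The table and column names are hypothetical; the point is that only the final, row-returning query carries the ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, title TEXT);
    INSERT INTO items VALUES (5, 'a', 'five'), (2, 'a', 'two'), (9, 'b', 'nine'), (7, 'a', 'seven');
""")

# 1. Snapshot the matching keys. No ORDER BY here: row order in a table is not guaranteed.
conn.execute("CREATE TEMP TABLE matches (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO matches SELECT id FROM items WHERE category = ?", ("a",))

# 2. Row count for the UI (COUNT(*) stands in for SQL Server's @@ROWCOUNT).
(total,) = conn.execute("SELECT COUNT(*) FROM matches").fetchone()

# 3. Only the query that returns rows to the caller is ordered.
rows = conn.execute("""
    SELECT m.id, i.title
    FROM matches AS m
    JOIN items AS i ON i.id = m.id
    ORDER BY m.id
""").fetchall()

conn.execute("DROP TABLE matches")
print(total, rows)  # 3 [(2, 'two'), (5, 'five'), (7, 'seven')]
```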

Changing SharePoint Site Collection Title

I need to change the title of a site collection in SharePoint (MOSS 2007). I found one post saying it can be done in SharePoint Designer, but I wasn't seeing the specified menus, and haven't been able to find it anywhere else. I'm assuming I can do it programmatically if necessary, but I'd like to think they made it easier than that (silly me).

UPDATE: I actually didn't follow your advice entirely. I simply changed the XML file you located the heading in, and that worked perfectly. Thanks!

From stackoverflow

When you said changing the "title of a site collection", do you mean the title of the site collection's top-level-site?

If so, non-programmatically, go to the top-level site: Site Settings > Title, Description, and Icon.

And programmatically, you can new up the SPWeb of the top-level site, set its Title property, and call Update() to save the change.

orthod0ks : No, I mean the entire collection. The collection name is displayed on each page. By default it says 'Home - ' on each page before the site name. I want to change the 'Home' portion.
OK, I think I know what you mean now. You want to change the title for the html page and not the title of a site. I did a little digging and here's what I found.

The text "Home" comes from this xml file (on my rig):

C:\Inetpub\wwwroot\wss\VirtualDirectories\80\App_GlobalResources\wss.en-US.resx
```
<data name="multipages_homelink_text">
    <value>Home</value>
</data>
```
By default, SharePoint creates the title text by concatenating "Home - " + the site's title. So if you want totally custom text, do the following:

Open this file for edit:

C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\SiteTemplates\sts\default.aspx

Next, replace the SharePoint title with your custom text:

Before:
```
<asp:Content ContentPlaceHolderId="PlaceHolderPageTitle" runat="server">
    <SharePoint:EncodedLiteral runat="server" text="<%$Resources:wss,multipages_homelink_text%>" EncodeMethod="HtmlEncode"/> - <SharePoint:ProjectProperty Property="Title" runat="server"/>
</asp:Content>
```
After:
```
<asp:Content ContentPlaceHolderId="PlaceHolderPageTitle" runat="server">
    Your custom text goes here...
</asp:Content>
```
Hope this is what you're looking for!!!
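If you want to confirm which text a resource key such as multipages_homelink_text resolves to before editing anything, the .resx file is plain XML. A minimal read-only sketch using Python's standard xml.etree.ElementTree, assuming the `<data name="..."><value>...</value></data>` layout shown above (the function name resx_value is hypothetical):

```python
import xml.etree.ElementTree as ET

def resx_value(root, key):
    """Return the <value> text of the <data name=key> entry, or None if absent."""
    for data in root.iter("data"):
        if data.get("name") == key:
            value = data.find("value")
            return None if value is None else value.text
    return None

sample = """<root>
  <data name="multipages_homelink_text">
    <value>Home</value>
  </data>
</root>"""
print(resx_value(ET.fromstring(sample), "multipages_homelink_text"))  # Home
```

For the real file, pass `ET.parse(path).getroot()` using the wss.en-US.resx path above. The sketch only reads; it does not rewrite the file.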

Jason : note that modifying the SiteTemplate default.aspx file is an unsupported customization. it may get overwritten in subsequent releases/hotfixes/service packs. the "blessed" way to do this, is to create your own site definition based on an out of the box one (http://tinyurl.com/6uqaez).

barneytron : Thanks Jason. That's good to know!!!
In your case, I think the best way is to do it through code. I think barneytron's answer can solve your problem!
@barneytron Perfect, that's exactly right. Thanks!

Accessing Oracle DB through SQL Server using OPENROWSET

I'm trying to access a large Oracle database through SQL Server using OPENROWSET in client-side Javascript, and not having much luck. Here are the particulars:

• A SQL Server view that accesses the Oracle database using OPENROWSET works perfectly, so I know I have valid connection string parameters. However, the new requirement is for extremely dynamic Oracle queries that depend on client-side selections, and I haven't been able to get dynamic (or even parameterized) Oracle queries to work from SQL Server views or stored procedures.
• Client-side access to the SQL Server database works perfectly with dynamic and parameterized queries.
• I cannot count on clients having any Oracle client software. Therefore, access to the Oracle database has to be through the SQL Server database, using views, stored procedures, or dynamic queries using OPENROWSET.
• Because the SQL Server database is on a shared server, I'm not allowed to use globally linked databases.

My idea was to define a function that would take my own version of a parameterized Oracle query, make the parameter substitutions, wrap the query in an OPENROWSET, and execute it in SQL Server, returning the resulting recordset. Here's sample code:

// db is a global variable containing an ADODB.Connection opened to the SQL Server DB
// rs is a global variable containing an ADODB.Recordset
. . .
ss = "SELECT myfield FROM mytable WHERE {param0} ORDER BY myfield;";
OracleQuery(ss,["somefield='" + somevalue + "'"]);
. . .
function OracleQuery(sql,params) {
  var s = sql;
  var i;
  for (i = 0; i < params.length; i++) s = s.replace("{param" + i + "}",params[i]);
  var e = "SELECT * FROM OPENROWSET('MSDAORA','(connect-string-values)';"
    + "'user';'pass','" + s.split("'").join("''") + "') q";
  try {
    rs.Open("EXEC ('" + e.split("'").join("''") + "')",db);
  } catch (eobj) {
    alert("SQL ERROR: " + eobj.description + "\nSQL: " + e);
  }
}

The SQL error that I'm getting is "Ad hoc access to OLE DB provider 'MSDAORA' has been denied. You must access this provider through a linked server.", which makes no sense to me. The Microsoft explanation for this error relates to a registry setting (DisallowAdhocAccess). This is set correctly on my PC, but surely this relates to the DB server and not the client PC, and I would expect that the setting there is correct since the view mentioned above works.

One alternative that I've tried is to eliminate the enclosing EXEC in the Open statement:

rs.Open(e,db);

but this generates the same error.

I also tried putting the OPENROWSET in a stored procedure. This works perfectly when executed from within SQL Server Management Studio, but fails with the same error message when the stored procedure is called from Javascript.

Is what I'm trying to do possible? If so, can you recommend how to fix my code? Or is a completely different approach necessary?

Any hints or related information will be welcome. Thanks in advance.

From stackoverflow

I'm answering this myself. I found the answer, and I'm not happy with the results. The functions that have worked are being run under my personal user id, and I have db-owner privileges. For the ad hoc access to work, I need to either set the DisallowAdhocAccess registry setting to 0, or give db-owner privileges to the user id used in the web access. Because this is a shared server with tight security, I won't be allowed to change the registry setting, which would affect much more than my database. And I consider the second option to be equally dangerous.

As a result, I'm apparently stuck with forcing users to have the Oracle Instant Client installed so I can open an ADO connection to the Oracle database directly in client-side Javascript.

I will still welcome any alternative thoughts on this.
Opening client-side ADO connections to a database is a huge security no-no. You’re essentially giving the user the connection credentials to your database and daring them to find a hole in your database security. Even if your audience is internal to your company, you can run into problems with users not having the Oracle drivers installed (or having an older version). Why not just hide the Oracle query behind a JSON call? This way you can sanitize your database input on the server side, and your users never receive the database connection credentials in plain text.
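A rough sketch of that server-side idea, with Python's sqlite3 standing in for the real database connection (an assumption; the HTTP layer that would serve the JSON is left out). Column names come from a hypothetical whitelist and values are passed as bound parameters, so the client never sends SQL text or sees credentials, and no manual quote-doubling is needed for values like O'Brien.

```python
import json
import sqlite3

# Hypothetical whitelist of columns the client is allowed to filter on.
ALLOWED_FILTERS = {"somefield", "otherfield"}


def query_as_json(conn, filters):
    """Run a parameterized query built from client selections; return JSON text."""
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_FILTERS:
            raise ValueError(f"filter not allowed: {column}")
        clauses.append(f"{column} = ?")  # column name is whitelisted; the value is bound, never spliced in
        params.append(value)
    where = " AND ".join(clauses) or "1 = 1"
    sql = f"SELECT myfield FROM mytable WHERE {where} ORDER BY myfield"
    rows = conn.execute(sql, params).fetchall()
    return json.dumps([row[0] for row in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (myfield TEXT, somefield TEXT, otherfield TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [("b", "O'Brien", "x"), ("a", "O'Brien", "y"), ("c", "Smith", "x")])
print(query_as_json(conn, {"somefield": "O'Brien"}))  # ["a", "b"]
```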

Ken Paul : I agree. If I could get the query to work server-side I would do that. The problem is that I can't get the query to work on the server at all, due to the restrictions of OPENROWSET syntax.
Try setting up a linked server on your SQL Server box (http://msdn.microsoft.com/en-us/library/ms188279.aspx) and use it to get the Oracle data.

Why does my C program not free memory as it should?

I have written a program in C and wanted to see how much memory it uses. I noticed that the memory usage grows during normal use (at launch it uses about 250 KB, and now it's at 1.5 MB). As far as I know, I free all the unused memory, and after some time (hours) the app uses less memory. Could it be that freed memory just moves from 'active' memory to 'wired' or something, so it's released when free space is needed? By the way, my machine runs Mac OS X, if that matters.

From stackoverflow

How do you determine the memory usage? Have you tried using valgrind to locate potential memory leaks? It's really easy. Just start your application with valgrind, run it, and look at the well-structured output.

Nino : eeehm .. i just started activity monitor.app and watched how much memory it uses.

Konrad Rudolph : The reason I asked is that monitor.app and similar applications don't show the real memory usage reliably. Zan has pointed out one of the reasons for that.

Aaron Maenpaa : Valgrind is an awesome tool +1.
What kind of application is it?

Third Party stuff also tends to have memory leaks. (GUIs and whatnot).

I second Konrad Rudolph's proposal.
If you're looking at the memory usage from the OS, you are likely to see this behavior. Freed memory is not automatically returned to the OS, but normally stays with the process, and can be malloced later. What you see is usually the high-water mark of memory use.

As Konrad Rudolph suggested, use something that examines the memory from inside the process to look for memory leaks.
The C library does not usually return "small" allocations to the OS. Instead it keeps the memory around for the next time you use malloc.

However, many C libraries will release large blocks, so you could try doing a malloc of several megabytes and then freeing it.
On OSX you should be able to use MallocDebug.app if you have installed the Developer Tools from OSX (as you might have trouble finding a port of valgrind for OSX).

/Developer/Applications/PerformanceTools/MallocDebug.app

Nino : mallocdebug.app doesn't seem to allow network connections, and as my application handles network requests, it's difficult to see how much memory it allocates after a request
I agree with what everyone has already said, but I do want to add just a few clarifying remarks specific to os x:

First, the operating system actually allocates memory using vm_allocate, which allocates entire pages at a time. Because there is a cost associated with this, like others have stated, the C library does not just deallocate the page when you return memory via free(3). Specifically, if there are other allocations within the memory page, it will not be released. Currently memory pages are 4096 bytes in Mac OS X. The number of bytes in a page can be determined programmatically with sysctl(2) or, more easily, with getpagesize(2). You can use this information to optimize your memory usage.
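As a quick illustration of the page arithmetic, here is a Python sketch rather than C (`mmap.PAGESIZE` reports the system page size, much like getpagesize()). It rounds a request up to whole pages; the function names are hypothetical.

```python
import mmap

def pages_spanned(nbytes, page_size=mmap.PAGESIZE):
    """Number of whole VM pages needed to hold nbytes (rounded up)."""
    return -(-nbytes // page_size)  # ceiling division on integers

def round_to_pages(nbytes, page_size=mmap.PAGESIZE):
    """Round a buffer size up to a whole number of pages."""
    return pages_spanned(nbytes, page_size) * page_size

print("page size on this machine:", mmap.PAGESIZE)
print(pages_spanned(10000, 4096), round_to_pages(10000, 4096))  # 3 12288
```

Sizing large buffers to whole pages avoids paying for a partly used trailing page.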

Secondly, user-space applications do not wire memory. Generally the kernel wires memory for critical data structures. Wired memory is basically memory that can never be swapped out and will never generate a page fault. If, for some reason, a page fault is generated in a wired memory page, the kernel will panic and your computer will crash. If your application is increasing your computer's wired memory by a noticeable amount, it is a very bad sign. It generally means that your application is doing something that significantly grows kernel data structures, like allocating and not reaping hundreds of threads or child processes. (Of course, this is a general statement... in some cases, this growth is expected, like when developing a virtual host or something like that.)
In addition to what the others have already written:

malloc() allocates bigger chunks from the OS and hands them out in smaller pieces as you malloc() them. When you free() a piece, it first goes onto a free list, for quick reuse by another malloc() if the size fits. It may at this time be merged with another free item to form bigger free blocks and avoid fragmentation (a whole bunch of different algorithms exist here, from free lists to binary-sized fragments to hashing and whatnot). When freed pieces arrive so that multiple fragments can be joined, free() usually does this, but sometimes fragments remain, depending on the size and order of the malloc() and free() calls. Also, only when a big free block like this has been created will it (sometimes) be returned to the OS. But usually, malloc() keeps things in its pocket, depending on the allocated/free ratio (many heuristics, and often compile-time or flag options, are available).

Notice that there is not ONE malloc/free algorithm. There are a whole bunch of different implementations (and literature). It is highly system-, OS- and library-dependent.
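To make the free-list idea concrete, here is a deliberately simplified toy model in Python. It is not how any real malloc works: first-fit reuse, coalescing of neighbours, and never returning memory to the OS are simplifying assumptions. It shows why an external monitor keeps seeing the high-water mark after every block has been freed.

```python
class ToyHeap:
    """Toy first-fit allocator that never gives memory back to the 'OS'."""

    def __init__(self):
        self.os_bytes = 0      # high-water mark: bytes ever requested from the OS
        self.free_list = []    # (offset, size) pairs, kept sorted by offset
        self.allocated = {}    # offset -> size of live blocks

    def malloc(self, size):
        for i, (off, blk) in enumerate(self.free_list):
            if blk >= size:                          # first fit: reuse a freed block
                if blk == size:
                    del self.free_list[i]
                else:
                    self.free_list[i] = (off + size, blk - size)  # split the block
                self.allocated[off] = size
                return off
        off = self.os_bytes                          # nothing fits: grow the heap
        self.os_bytes += size
        self.allocated[off] = size
        return off

    def free(self, off):
        size = self.allocated.pop(off)
        self.free_list.append((off, size))
        self.free_list.sort()
        merged = []
        for o, s in self.free_list:                  # coalesce adjacent free blocks
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + s)
            else:
                merged.append((o, s))
        self.free_list = merged


heap = ToyHeap()
blocks = [heap.malloc(100) for _ in range(10)]
for b in blocks:
    heap.free(b)
print(heap.os_bytes)  # 1000: still held although every block was freed
heap.malloc(500)
print(heap.os_bytes)  # 1000: the request was served from the free list
```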

Why does C# List<T>.Find seemingly return NullReferenceException?

First off, according to http://msdn.microsoft.com/en-us/library/x0b5b5bc.aspx, the List.Find method is only listed as throwing ArgumentNullException. However I have the following test code which, when using Find with an anonymous delegate, throws a NullReferenceException when the object being searched for is not found.

namespace MyTestNS
{
  class MyTestClass
  {
    [TestMethod()]
    public void ArrayMatchTest()
    {
        List<A> objArray = new List<A>();
        objArray.Add(new A("1","one"));
        objArray.Add(new A("2", "two"));

        string findStr = "3";
        string foundVal;
        // Find using an anonymous delegate:
        foundVal = objArray.Find(delegate(A a) // <- System.NullReferenceException: Object reference not set to an instance of an object..
        {
            if (a.name == findStr)
                return true;
            else return false;
        }).value;
    }
  }
}

I don't understand why I'm getting a NullReferenceException instead of the Find just not finding the item and returning null. I'm 90% sure it's some subtle coding error on my part that I just haven't seen, but this has been bugging me all day, please help!

EDIT: I should mention I inherited this convoluted code from someone else, so the twisty code you see above is a somewhat simplified version of what's failing in my real code.

From stackoverflow

Find is returning null. But then you are dereferencing that result. That is, you're invoking:

Find(...).value

Ogre Psalm33 : You are exactly correct! Sometimes it stinks maintaining someone else's convoluted code!
You have a lot of stuff going on in one place - I would suggest that you simplify it a bit so that you can see exactly what is going wrong.

A simpler version of your Find invocation is this:
```
A a1 = objArray.Find(a => a.name == findStr);
```
What happens if your Predicate<A> returns false for every element? Find returns a null reference (the default value for A), and a1 is assigned that null. Now it is clear to see that using a1 will cause a NullReferenceException.
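The same fix, checking for the "not found" result before touching a member, translated to Python for illustration. The class A mirrors the question's type; its field names are assumptions based on a.name and .value. `next(..., None)` plays the role of Find returning null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class A:
    name: str
    value: str


def find_value(items, find_str) -> Optional[str]:
    # Like List<T>.Find, this yields None when nothing matches.
    found = next((a for a in items if a.name == find_str), None)
    # Check before dereferencing: found.value on None would raise AttributeError,
    # Python's counterpart of the NullReferenceException in the question.
    return None if found is None else found.value


items = [A("1", "one"), A("2", "two")]
print(find_value(items, "3"))  # None
```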

Ogre Psalm33 : I bumped your answer too because this is a step towards samuel's solution. I inherited this convoluted code from someone else, so I should have simplified it to start with, as you suggested.

Hide/Show controls with AJAX

I am using an ASP.NET AJAX-Enabled Web application (ASP.NET 2.0 and AJAX Toolkit 1.0) that contains one button and 2 UpdatePanels (UpdatePanel_1 and UpdatePanel_2)

The button is registered with RegisterAsyncPostBackControl in the ScriptManager object. UpdatePanel_1 is in "Conditional" update mode and contains a TextBox.

UpdatePanel_2 is in "Always" update mode and contains another TextBox

When the button is pressed, its handler calls UpdatePanel_1.Update(), which updates the value of the TextBox based on a randomly selected value in a list. The TextBox in UpdatePanel_2 is also updated automatically, again without a page refresh.

Based on the value of a boolean ViewState variable, I would also like to show/hide the UpdatePanels alternately, but I get the error:

"Sys.InvalidOperationException: Could not find UpdatePanel with ID 'UpdatePanel_2' (or UpdatePanel_1). If it is being updated dynamically then it must be inside another UpdatePanel."

How can it be done without adding extra wrapping UpdatePanels?

Thanks,

arunganu

protected void Page_Load(object sender, EventArgs e)
{
    ScriptManager1.RegisterAsyncPostBackControl(Button1); 


    if (!IsPostBack)   
    {

        Visibility = true;
    }

    UpdatePanel_1.Visible = !Visibility;
    UpdatePanel_2.Visible = Visibility;

    Visibility = !Visibility;        
}


protected void Button1_Click(object sender, EventArgs e)
{
        if (Panel1.Visible)
                 UpdatePanel_1.Update();    
}

protected bool Visibility
{
    get
    {
        return (bool)(ViewState["Visibility"] ?? true);
    }
    set
    {
        ViewState["Visibility"] = value;
    }
}

From stackoverflow

The problem is that invisible controls aren't rendered to the client. So then trying to make them visible isn't going to work because as far as the client is concerned, they don't exist.

Try using style="display:none", or use different CSS classes and styles for visible and invisible panels, rather than setting visible=false;
You can hide or show the controls that are children of the UpdatePanel, rather than hiding or showing the UpdatePanel itself. I tried using UpdateMode = Conditional but got an error; then I added the controls I wanted to show/hide to the UpdatePanel. Hope this helps. Thanks, everybody, for posting.

Delete or update a dataset in HDF5?

I would like to programmatically change the data associated with a dataset in an HDF5 file. I can't seem to find a way to either delete a dataset by name (allowing me to add it again with the modified data) or update a dataset by name. I'm using the C API for HDF5 1.6.x but pointers towards any HDF5 API would be useful.

From stackoverflow

According to the user guide (section 5.2, you'll need to scroll down some):

The size of the dataset cannot be reduced after it is created. The dataset can be expanded by extending one or more dimensions, with H5Dextend. It is not possible to contract a dataspace, or to reclaim allocated space.

HDF5 does not at this time provide a mechanism to remove a dataset from a file, or to reclaim the storage from deleted objects. Through the H5Gunlink function one can remove links to a dataset from the file structure. Once all links to a dataset have been removed, that dataset becomes inaccessible to any application and is effectively removed from the file. But this does not recover the space the dataset occupies.

The only way to recover the space is to write all the objects of the file into a new file. Any unlinked object is inaccessible to the application and will not be included in the new file.

So deleting appears to be out of the question. On the other hand modifying the dataset in place is supported.

Barry Wark : Thanks. Any idea how PyTables (a python engine built on top of HDF5) handles this?

Max Lybbert : The documentation for "altering" a table in PyTables is at http://www.pytables.org/moin/HintsForSQLUsers#Alteringatable , but note "(adding a column) is currently not supported in PyTables."

model (3ds) stats & snapshot in linux

I want to write an app that takes a model filename on the command line, creates a list of stats (poly count, scaling, as much as possible, or maybe just the stats I'd like), loads the model with its textures (and anything else), and draws it from multiple positions, saving the images as PNGs.

How would I do this? Are there utilities I can use to extract data from models? What about drawing the models? My server does not have a desktop or video card; would the lack of video hardware be a problem?

From stackoverflow

Have a look at the 3DS specifications at wotsit.org. AFAIK, there are no official specs. Another possibility would be to look for open source 3DS libraries/tools and use them.
You can use http://www.lib3ds.org/ for reading the data, IIRC it also comes with a sample program that you could base your code on.
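To give a feel for what extracting a stat such as poly count involves, here is a Python sketch that counts faces by walking the chunk tree of a 3DS file. It assumes the commonly documented, unofficial layout: every chunk starts with a little-endian 2-byte ID and a 4-byte length that includes the 6-byte header; 0x4D4D is the main chunk, 0x3D3D the editor, 0x4000 a named object, 0x4100 a triangle mesh and 0x4120 its face list, whose body begins with a 2-byte face count. A library such as lib3ds handles the many other chunks and is the more robust choice.

```python
import struct

# Chunk IDs as commonly documented for the (unofficial) 3DS format.
MAIN, EDITOR, OBJECT, TRIMESH, FACES = 0x4D4D, 0x3D3D, 0x4000, 0x4100, 0x4120
CONTAINERS = {MAIN, EDITOR, TRIMESH}  # chunks whose body is only more chunks


def count_faces(data, start=0, end=None):
    """Sum the face counts of every face-list chunk found in data[start:end]."""
    end = len(data) if end is None else end
    total = 0
    pos = start
    while pos + 6 <= end:
        chunk_id, length = struct.unpack_from("<HI", data, pos)
        if length < 6 or pos + length > end:
            break  # malformed or truncated chunk: stop instead of looping forever
        body, chunk_end = pos + 6, pos + length
        if chunk_id in CONTAINERS:
            total += count_faces(data, body, chunk_end)
        elif chunk_id == OBJECT:
            # An object chunk starts with a NUL-terminated name, then sub-chunks.
            name_end = data.index(b"\x00", body, chunk_end)
            total += count_faces(data, name_end + 1, chunk_end)
        elif chunk_id == FACES:
            (n_faces,) = struct.unpack_from("<H", data, body)
            total += n_faces
        pos = chunk_end  # skip anything else (materials, keyframes, ...)
    return total
```

Usage would be along the lines of `with open("model.3ds", "rb") as f: print(count_faces(f.read()))`, where "model.3ds" is a placeholder path. Rendering the snapshots is a separate problem that this sketch does not address.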

Access x86 COM from x64 .NET

I have an x64 server on which my libraries, since they are compiled as AnyCPU, run under x64. We need to access a COM component that is registered under x86. I don't know enough about COM, and my Google searches are leading me nowhere.

Question: Can I use a symbolic registry link from x64 back to x86 for the COM component? Do I need to register the COM component under x64 as well? Can I (any statement here...) ?

Thanks.

From stackoverflow

If a component is running x64-native, it can't load a 32-bit COM server in-process, because it's the wrong sort of process. There are a couple of solutions possible:
1. If you can, build a 64-bit version of the COM code (which would of course register itself in the 64-bit registry). This is the cleanest solution, but may not be possible if you don't have the code for the COM server.
2. Run your .NET component as 32-bit x86, instead of x64. I assume you've already considered and rejected this one for some reason.
3. Host the COM component out-of-process using the COM surrogate DLLhost.exe. This will make calls to the COM server much, much slower (they will now be interprocess Windows messages instead of native function calls), but is otherwise transparent (you don't have to do anything special).
  
  This probably won't be an option if the server requires a custom proxy-stub instead of using the normal oleaut32 one (very rare, though), since there won't be a 64-bit version of the proxy available. As long as it can use the ordinary OLE marshalling, you can just register it for surrogate activation.
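A 32-bit in-process COM server can only be loaded into a 32-bit process, so the first thing worth checking is the bitness of the process itself, not of the OS. Sketched in Python for illustration (in a .NET process the equivalent check is IntPtr.Size, which is 4 or 8); the function name is hypothetical.

```python
import struct


def process_bitness():
    """Pointer width, in bits, of the current process (not of the OS)."""
    # "P" is the native void* format: 4 bytes in a 32-bit process, 8 in a 64-bit one.
    return struct.calcsize("P") * 8


print(f"running as a {process_bitness()}-bit process")
```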
Craig Wilson : #1 is not possible as there is no x64 version. #2 defeats the purpose of running on x64. #3 worked great. We can live with the performance hits here until we get a new version of the library. Thanks for your help.
If your COM component is housed in an out-of-process COM server (i.e. a separate process), then you won't need to do anything special, as the COM subsystem will remote your calls from your x64 app to the x86 app and back again.

If your component is an in-process COM component, then you'll have to rethink things, as a 64-bit process can't use 32-bit in-process COM components. You could force your server to run under x86 so that you can access the components (they'll both be 32-bit). If you don't want to do this, then you'll have to see if there is an x64 version of the COM components you're using.
I found this solution in the article "Dealing with Legacy 32-bit Components in 64-bit Windows", which covers:
• Converting a project type from in-process to out-of-process
• Using COM+ as a host (this worked for me)
• Using dllhost as a surrogate host

System.Runtime.InteropServices.COMException (0x82DA0002): Exception from HRESULT: 0x82DA0002

Does anyone know what this means?

System.Runtime.InteropServices.COMException (0x82DA0002): Exception from HRESULT: 0x82DA0002
   at System.Windows.Forms.Control.MarshaledInvoke(Control caller, Delegate method, Object[] args, Boolean synchronous)
   at System.Windows.Forms.Control.Invoke(Delegate method, Object[] args)
   at Midden.cMidden.OnFileChanged(Object sender, FileSystemEventArgs e)
   at System.IO.FileSystemWatcher.OnDeleted(FileSystemEventArgs e)
   at System.IO.FileSystemWatcher.NotifyFileSystemEventArgs(Int32 action, String name)
   at System.IO.FileSystemWatcher.CompletionStatusChanged(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* overlappedPointer)
   at System.Threading.IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped pOVERLAP)

From stackoverflow

From MSDN: (hope it's helpful)

The exception that is thrown when an unrecognized HRESULT is returned from a COM method call.

The common language runtime transforms well-known HRESULTs to .NET exceptions, enabling COM objects to return meaningful error information to managed clients. The HRESULT to exception mapping also works in the other direction by returning specific HRESULTs to unmanaged clients. For mapping details, see How to: Map HRESULTs and Exceptions.

When the runtime encounters an unfamiliar HRESULT (an HRESULT that lacks a specific, corresponding exception), it throws an instance of the COMException class. This all-purpose exception exposes the same members as any exception, and includes a public ErrorCode property that contains the HRESULT returned by the callee. If an error message is available to the runtime (obtained from the IErrorInfo interface or the Err object in Visual Basic, or in some cases from the operating system), the message is returned to the caller. However, if the COM component developer fails to include an error message, the runtime returns the eight-digit HRESULT in place of a message string. Having an HRESULT allows the caller to determine the cause of the generic exception.

Although you can use the COMException class to return specific HRESULTs to unmanaged clients, throwing a specific .NET exception is better than using a generic exception. Consider that managed clients as well as unmanaged clients can use your .NET object, and throwing an HRESULT to a managed caller is less comprehendible than throwing an exception.
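Since the HRESULT is all you get here, it can help to split it into its bit fields. A small Python sketch, assuming the standard HRESULT layout (severity in bit 31, customer flag in bit 29, an 11-bit facility in bits 16-26, a 16-bit code in bits 0-15). For 0x82DA0002 it reports a failure with facility 0x2DA (730) and code 2; which component owns that facility is not stated in the thread.

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into its standard bit fields."""
    hr &= 0xFFFFFFFF                      # also accept negative ints, e.g. COMException.ErrorCode
    return {
        "failure": bool(hr >> 31),        # S bit: 1 means failure
        "customer": bool((hr >> 29) & 1), # C bit: 1 means a customer-defined code
        "facility": (hr >> 16) & 0x7FF,   # 11-bit facility
        "code": hr & 0xFFFF,              # 16-bit code within the facility
    }


print(decode_hresult(0x82DA0002))  # {'failure': True, 'customer': False, 'facility': 730, 'code': 2}
```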

Python's re module - saving state?

Hi,

One of the biggest annoyances I find in Python is the inability of the re module to save its state without explicitly doing it in a match object. Often, one needs to parse lines and, if they comply with a certain regex, take values out of them using the same regex. I would like to write code like this:

if re.match('foo (\w+) bar (\d+)', line):
  # do stuff with .group(1) and .group(2)
elif re.match('baz whoo_(\d+)', line):
  # do stuff with .group(1)
# etc.

But unfortunately there's no way to get at the match object from the previous call to re.match, so this has to be written like this:

m = re.match('foo (\w+) bar (\d+)', line)
if m:
  # do stuff with m.group(1) and m.group(2)
else:
  m = re.match('baz whoo_(\d+)', line)
  if m:
    # do stuff with m.group(1)

Which is rather less convenient and gets really unwieldy as the list of elifs grows longer.

A hackish solution would be to wrap the re.match and re.search in my own objects that keep state somewhere. Has anyone used this? Are you aware of semi-standard implementations (in large frameworks or something)?

What other workarounds can you recommend? Or perhaps, am I just misusing the module and could achieve my needs in a cleaner way?

Thanks in advance

From stackoverflow

You might like this module which implements the wrapper you are looking for.

Eli Bendersky : Thanks, this is what I had in mind

Peter Rowell : Thanks for the pointer! I like the basic concept of the recipe linked to, but it could be improved if you are using a more recent version of Python. I'm strictly 2.5+ so I'm going to go play hacky-hack now.

You could write a utility class to do the "save state and return result" operation. I don't think this is that hackish. It's fairly trivial to implement:

class Var(object):
    def __init__(self, val=None): self.val = val

    def set(self, result):
        self.val = result
        return result

And then use it as:

lastMatch = Var()

if lastMatch.set(re.match('foo (\w+) bar (\d+)', line)):
    print lastMatch.val.groups()

elif lastMatch.set(re.match('baz whoo_(\d+)', line)):
    print lastMatch.val.groups()

Eli Bendersky : This is an interesting concept. Hmm, it can handle a lot of the cases where Python's inability to use assignment in an expression hurts.

Trying out some ideas...

It looks like you would ideally want an expression with side effects. If this were allowed in Python:

if m = re.match('foo (\w+) bar (\d+)', line):
  # do stuff with m.group(1) and m.group(2)
elif m = re.match('baz whoo_(\d+)', line):
  # do stuff with m.group(1)
elif ...

... then you would clearly and cleanly be expressing your intent. But it isn't. If nested functions could rebind variables in the enclosing scope, you could:

m = None
def assign_m(x):
  m = x
  return x

if assign_m(re.match('foo (\w+) bar (\d+)', line)):
  # do stuff with m.group(1) and m.group(2)
elif assign_m(re.match('baz whoo_(\d+)', line)):
  # do stuff with m.group(1)
elif ...

Now, not only is that getting ugly, but it still doesn't work: the assignment inside assign_m creates a new local variable m instead of rebinding the m in the outer scope, so the outer m stays None. The best I can come up with is really ugly, using a class instance, whose attributes can be modified from anywhere:

# per Brian's suggestion, a wrapper that is stateful
class m_(object):
  def match(self, *args):
    self.inner_ = re.match(*args)
    return self.inner_
  def group(self, *args):
    return self.inner_.group(*args)
m = m_()

# now 'm' is a stateful regex
if m.match('foo (\w+) bar (\d+)', line):
  # do stuff with m.group(1) and m.group(2)
elif m.match('baz whoo_(\d+)', line):
  # do stuff with m.group(1)
elif ...

But that is clearly overkill.

You might consider using an inner function with early returns, which lets you remove the else nesting:

def find_the_right_match():
  # here 'm' is just a local variable; each return exits early
  m = re.match('foo (\w+) bar (\d+)', line)
  if m:
    # do stuff with m.group(1) and m.group(2)
    return # <== exit nested function only
  m = re.match('baz whoo_(\d+)', line)
  if m:
    # do stuff with m.group(1)
    return

find_the_right_match()

This lets you flatten nesting=(2*N-1) to nesting=1, but you may have just moved the side-effects problem around, and the nested functions are very likely to confuse most Python programmers.

Lastly, there are side-effect-free ways of dealing with this:

def cond_with(*phrases):
  """For each (condition, action) pair, call condition(); for the first
  pair whose condition returns a truthy value, return action(value).
  Works like an if-elif-elif chain; returns None if nothing matches."""
  for (cond_lambda, then_lambda) in phrases:
    c = cond_lambda()
    if c:
      return then_lambda(c) 
  return None


cond_with( 
  ((lambda: re.match('foo (\w+) bar (\d+)', line)), 
      (lambda m: 
          ... # do stuff with m.group(1) and m.group(2)
          )),
  ((lambda: re.match('baz whoo_(\d+)', line)),
      (lambda m:
          ... # do stuff with m.group(1)
          )),
  ...)

And now the code barely even looks like Python, let alone being understandable to Python programmers (is that Lisp?).

I think the moral of this story is that Python is not optimized for this sort of idiom. You really need to just be a little verbose and live with a large nesting factor of else conditions.
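This answer predates Python 3.8. Since that release, PEP 572 assignment expressions (the `:=` operator) allow almost exactly the `if m = re.match(...)` form imagined above. Here is a minimal sketch assuming Python 3.8 or later. The function name `describe` and its return strings are only illustrative.

```python
import re

def describe(line: str) -> str:
    # Assignment expression: bind the match and test it in one step
    if m := re.match(r'foo (\w+) bar (\d+)', line):
        return f"foo/bar: {m.group(1)}, {m.group(2)}"
    elif m := re.match(r'baz whoo_(\d+)', line):
        return f"baz: {m.group(1)}"
    return "no match"

print(describe('foo kwa bar 12'))   # foo/bar: kwa, 12
print(describe('baz whoo_7'))       # baz: 7
```

The elif chain stays flat no matter how many patterns are added, with no wrapper object.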

Eli Bendersky : LOL and ++ about the Lispy code. Well thought out :-) But I would be strongly unhappy with any programmer who writes real code like this ;-)

Aaron : @eliben - heh, thanks. could have been worse.. at least I didn't try to use call/cc!

class last(object):
  def __init__(self, wrapped, initial=None):
    self.last = initial
    self.func = wrapped

  def __call__(self, *args, **kwds):
    self.last = self.func(*args, **kwds)
    return self.last

def test():
  """
  >>> test()
  crude, but effective: (oYo)
  """
  import re
  m = last(re.compile("(oYo)").match)
  if m("abc"):
    print("oops")
  elif m("oYo"): #A
    print("crude, but effective: (%s)" % m.last.group(1)) #B
  else:
    print("mark")

if __name__ == "__main__":
  import doctest
  doctest.testmod()

last is also suitable as a decorator.
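The answer mentions decorator use without showing it. Here is a minimal sketch of what that might look like. `match_foo` is a hypothetical name, and the class is repeated so the snippet stands alone. Decorating a function replaces it with a `last` instance, so the function's most recent return value is always available as `.last`.

```python
import re

class last(object):
    # Same wrapper as above: remembers the most recent return value.
    def __init__(self, wrapped, initial=None):
        self.last = initial
        self.func = wrapped

    def __call__(self, *args, **kwds):
        self.last = self.func(*args, **kwds)
        return self.last

@last  # match_foo is now a `last` instance wrapping the function below
def match_foo(line):
    return re.match(r"foo (\w+) bar (\d+)", line)

if match_foo("foo kwa bar 12"):
    print(match_foo.last.group(1), match_foo.last.group(2))  # kwa 12
```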

I realized that in my effort to make it self-testing and working in 2.5, 2.6, and 3.0, I obscured the real solution somewhat. The important lines are marked #A and #B above, where you use the same object to test (name it match or is_somename) and to retrieve its last value. It's easy to abuse, but also easy to tweak and, if not pushed too far, it gives surprisingly clear code.

Based on the great answers to this question, I've concocted the following mechanism. It appears to be a general way to get around Python's "no assignment in conditions" limitation. The focus is transparency, implemented by silent delegation:

class Var(object):
    def __init__(self, val=None):
        self._val = val

    def __getattr__(self, attr):
        return getattr(self._val, attr)

    def __call__(self, arg):
        self._val = arg
        return self._val


if __name__ == "__main__":
    import re

    var = Var()

    line = 'foo kwa bar 12'

    if var(re.match('foo (\w+) bar (\d+)', line)):
        print var.group(1), var.group(2)
    elif var(re.match('baz whoo_(\d+)', line)):
        print var.group(1)

In the general case, this solution is thread-safe as long as each thread creates and uses its own Var instance. For more ease of use when threading is not an issue, a default Var object can be imported and used. Here's a module holding the Var class:

class Var(object):
    def __init__(self, val=None):
        self._val = val

    def __getattr__(self, attr):
        return getattr(self._val, attr)

    def __call__(self, arg):
        self._val = arg
        return self._val

var = Var()

And here's the user's code:

from var import Var, var
import re

line = 'foo kwa bar 12'

if var(re.match('foo (\w+) bar (\d+)', line)):
    print var.group(1), var.group(2)
elif var(re.match('baz whoo_(\d+)', line)):
    print var.group(1)

While not thread-safe, for a lot of simple scripts this provides a useful shortcut.
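If the shared default object is the only concern, one possible variation is to subclass threading.local, so that the module-level var keeps a separate stored value per thread. This was not part of the original answer. It is a sketch assuming Python 3, and the two worker threads at the end exist only to show that each thread reads back its own match.

```python
import re
import threading

class Var(threading.local):
    # Subclassing threading.local gives every thread its own _val;
    # __init__ re-runs (with the same arguments) the first time a new thread uses it.
    def __init__(self, val=None):
        self._val = val

    def __getattr__(self, attr):
        # Only called when normal lookup fails: delegate to the stored value.
        return getattr(self._val, attr)

    def __call__(self, arg):
        self._val = arg
        return self._val

var = Var()  # shared default, but each thread sees its own stored value

results = {}
barrier = threading.Barrier(2)

def worker(line):
    var(re.match(r'foo (\w+) bar (\d+)', line))
    barrier.wait()                # both threads have stored their match
    results[line] = var.group(1)  # each thread still reads its own match

threads = [threading.Thread(target=worker, args=(s,))
           for s in ('foo kwa bar 12', 'foo xyz bar 3')]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```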

probability interview question, random sampling

This is a good one because it's so counter-intuitive:

Imagine an urn filled with balls, two-thirds of which are of one color and one-third of which are of another. One individual has drawn 5 balls from the urn and found that 4 are red and 1 is white. Another individual has drawn 20 balls and found that 12 are red and 8 are white. Which of the two individuals should feel more confident that the urn contains two-thirds red balls and one-third white balls, rather than vice-versa? What odds should each individual give?

I know the right answer, but maybe I don't quite get the odds calculation. Can anyone explain?

From stackoverflow

I assume that the 'a priori' probability of one hypothesis versus the other is 1/2, and moreover that both individuals put each ball back after drawing it (so the draws are independent of each other).

The correct answer is that the second observer should be more confident than the first. My previous answer was wrong due to a trivial error in computations, many thanks and +1 to Adam Rosenfield for his correction.

Let 2/3R 1/3W denote the event "the urn contains 2/3 of red balls and 1/3 white balls", and let 4R,1W denote the event "4 red balls and 1 white ball get extracted". Then, using Bayes's rule,

P[2/3R 1/3W | 4R,1W] = P[4R,1W | 2/3R 1/3W] P[2/3R 1/3W] / P[4R,1W] = (2/3)⁴ (1/3)¹ (1/2) / P[4R, 1W]

Now, since 2/3R 1/3W and 1/3R 2/3W are complementary by hypothesis,

P[4R,1W] = P[4R,1W | 2/3R 1/3W] P[2/3R 1/3W] + P[4R,1W | 1/3R 2/3W] P[1/3R 2/3W] = (2/3)⁴ (1/3)¹ (1/2) + (1/3)⁴ (2/3)¹ (1/2)

Thus,

P[2/3R 1/3W | 4R,1W] = (2/3)⁴ (1/3)¹ (1/2) / { (2/3)⁴ (1/3)¹ (1/2) + (1/3)⁴ (2/3)¹ (1/2) } = 2^4 / (2^4 + 2) = 8/9

The same calculation for P[2/3R 1/3W | 12R,8W] (i.e. having (2/3)¹² (1/3)⁸ instead of (2/3)⁴ (1/3)¹) yields now 16/17, hence the confidence of the second observer is greater than that of the first.
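Here is the same Bayes' rule computation as a small Python sketch, under the answer's assumptions: a 1/2 prior and independent draws with replacement. It uses exact fractions to avoid rounding. The binomial coefficient is left out, as in the formula above, because it is identical under both hypotheses and cancels. The function name is hypothetical.

```python
from fractions import Fraction

def posterior_two_thirds_red(red: int, white: int) -> Fraction:
    """P[2/3R 1/3W | red, white] with a 1/2 prior and independent draws."""
    two, one, half = Fraction(2, 3), Fraction(1, 3), Fraction(1, 2)
    like_a = two**red * one**white   # urn is 2/3 red
    like_b = one**red * two**white   # urn is 1/3 red
    return like_a * half / (like_a * half + like_b * half)

print(posterior_two_thirds_red(4, 1))    # 8/9
print(posterior_two_thirds_red(12, 8))   # 16/17
```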

Jason S : re: the reinsertion -- not necessary if the # of balls is large (probably an equally valid assumption)

FryGuy : shouldn't P[4R, 1W | 2/3R 1/3W] = (2/3)^4 * (1/3)^1 * (5 choose 4)? Also, I'm not sure how you came up with a 50% a priori distribution

Daniel Daranas : @FryGuy the 50% (or any other known number!) a priori is a necessary precondition for making a decision... If I tell you a priori "100% sure that there are 2/3 red balls" then the problem is trivial: both people can be equally confident... too much data missing here, I think

Adam Rosenfield : Check your arithmetic - your reasoning is sound, but if you plug in your numbers you should get 8/9 for the first observer and 16/17 for the second observer.

Federico Ramponi : @Adam Rosenfield: AAARGH! there is a 2^1 that magically becomes 1. Correcting in a minute. Thank you very much!
P[2/3R 1/3W | 4R, 1W] = (2/3)^4 * (1/3)^1 * (1/2) / { (2/3)^4 * (1/3)^1 * (1/2) + (1/3)^4 * (2/3)^1 * (1/2) } = 2^4 / (2^4 + 1) = 16/17

er,
```
= ⅔^4*⅓ / (⅔^4*⅓ + ⅓^4*⅔)
= 16/243 / (16/243 + 2/243)
= 16/18
```
P(⅔R⅓W | 12R8W) does, however, indeed equal 16/17, so the 12R8W observer can be more confident.

yx : if that is the case, then how is this problem counter-intuitive? More sampling = more confidence, especially when your sample agrees with what you expect

yx : btw, my comment was more directed at the "This is a good one because it's so counter-intuitive:" line the topic creator said.

Daniel Daranas : I don't see how anyone should "intuit" _anything_ from the statement of the problem. One has taken more balls, the other has a stronger red percentage, so both have arguments in favour of being more confident. You have to calculate and find the result; you can't guess anything.

bobince : Yeah, I dunno, unless there's another sneaky arithmetic error caused by my gin intake. I would have guessed 12R8W to be more likely, although I'd not have been at all sure about it...

A. Rex : @Daneil Daranas: Your comments on the "prime factor of 3*10^11" question were hilarious. Unfortunately, this problem requires *no* calculation and is easy if you know the theory. You're right it's a poor programming question, but it isn't "too long and tedious" and you *can* intuit the answer.

Daniel Daranas : @A. Rex Who is right then, Federico (same probability) or bobince (more one than the other)? And what is the "reasoning" without any calculation?

A. Rex : @Daniel Daranas (pardon my misspelling last time): bobince and Adam Rosenfield are correct because their arithmetic is correct. Please see my explanation for reasoning without calculation, as well as calculation without mistakes.

Federico Ramponi : @bobince: (2/3)^4 * (1/3)^1 * (1/2) / { (2/3)^4 * (1/3)^1 * (1/2) + (1/3)^4 * (2/3)^1 * (1/2) }, you can simplify 1/2 and (1/3)^5 in both numerator and denominator, you are left with 2^4/(2^4 + 2) = 8/9. The (2^4 + 1) was a mistake while I was typing my answer, sorry for that.

bobince : Indeed, correct - I left the denominator as it was just to make clear that 16/17>16/18.

Federico Ramponi : @yx: I suppose that the counter-intuition stems from the fact that the first observer has a sample which is "more biased" toward the 2/3-red-hypothesis (he has a larger fraction of red balls). But remember that the second has a **larger sample**, and both these facts must be taken into account.

Daniel Daranas : What I said. There are two conflicting "reasons for confidence", one for each person, and you can't guess _anything_. Just do the math, either in the normal way or with @A. Rex's interesting shorthand method.
Eliezer Yudkowsky has a (really, really long, but good) explanation of Bayes' Theorem. About 70% down, there's a paragraph beginning "In front of you is a bookbag" which explains the core of this problem.

The punchline is that all that matters is the difference between how many red and white balls have been drawn. Thus, contrary to what others have been saying, you don't have to do any calculations. (This is making either of the reasonable assumptions (a) that the balls are drawn with replacement, or (b) the urn has a lot of balls. Then the number of balls doesn't matter.) Here's the argument:

Recall Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B). (A note on terminology: P(A) is the prior and P(A|B) is the posterior. B is some observation you made, and the terminology reflects your confidence before and after your observation.) This form of the theorem is fine, and @bobince and @Adam Rosenfield correctly applied it. However, using this form directly makes you susceptible to arithmetic errors and it doesn't really convey the heart of Bayes' theorem. Adam mentioned in his post (and I mention above) that all that matters is the difference between how many red and white balls have been drawn, because "everything else cancels out in the equations". How can we see this without doing any calculations?

We can use the concepts of odds ratio and likelihood ratio. What is an odds ratio? Well, instead of thinking about P(A) and P(¬A), we will think about their ratio P(A) : P(¬A). Either is recoverable from the other, but the arithmetic works out nicer with odds ratios because we don't have to normalize. Furthermore, it's easier to "get" Bayes' theorem in its alternate form.

What do I mean we don't have to normalize, and what is the alternate form? Well, let's compute. Bayes' theorem says that the posterior odds are

P(A|B) : P(¬A|B) = (P(B|A) * P(A) / P(B)) : (P(B|¬A) * P(¬A) / P(B)).

The P(B) is a normalizing factor to make the probabilities sum to one; however, we're working with ratios, where 2 : 1 and 4 : 2 odds are the same thing, so the P(B) cancels. We're left with an easy expression which happens to factor:

P(A|B) : P(¬A|B) = (P(B|A) * P(A)) : (P(B|¬A) * P(¬A)) = (P(B|A) : P(B|¬A)) * (P(A) : P(¬A))

We've already heard of the second term there; it's the prior odds ratio. What is P(B|A) : P(B|¬A)? That's called the likelihood ratio. So our final expression is

posterior odds = likelihood ratio * prior odds.

How do we apply it in this situation? Well, suppose we have some prior odds x : y for the contents of the urn, with x representing 2/3rds red and y representing 2/3rds white. Suppose we draw a single red ball. The likelihood ratio is P(drew red ball | urn is 2/3rds red) : P(drew red ball | urn is 2/3rds white) = (2/3) : (1/3) = 2 : 1. So the posterior odds are 2x : y; had we drawn a white ball, the posterior odds would be x : 2y by similar reasoning. Now we do this for every ball in sequence; if the draws are independent, then we just multiply all the odds ratios. So we get that if we start with an odds ratio of x : y and draw r red balls and w white balls, we get a final odds ratio of

(x : y) * (2 : 1)^r * (1 : 2)^w = (x * 2^r) : (y * 2^w) = (x : y) * (2^(r-w) : 1).

So we see that all that matters is the difference between r and w. It also lets us easily solve the problem. For the first question ("who should be more confident?"), the prior odds don't matter, as long as they're not 1 : 0 or 0 : 1 and both people have identical priors. Indeed, if their identical prior was x : y, the first person's posterior would be (2^3 * x) : y, while the second person's posterior would be (2^4 * x) : y, so the second person is more sure.

Suppose moreover that the prior odds were uniform, that is 1 : 1. Then the first person's posterior would be 8 : 1, while the second person's would be 16 : 1. We can easily translate these into probabilities of 8/9 and 16/17, confirming the other calculations.
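The odds form translates almost directly into code. In this sketch, assuming the two hypotheses and independent draws described above, odds x : y are stored as the single number x/y. Each red ball multiplies them by 2 and each white ball divides them by 2. The function names are hypothetical.

```python
from fractions import Fraction

def posterior_odds(prior_odds: Fraction, red: int, white: int) -> Fraction:
    """Odds (2/3 red : 2/3 white) after drawing `red` red and `white` white balls."""
    # (2 : 1)^red * (1 : 2)^white collapses to 2^(red - white)
    return prior_odds * Fraction(2) ** (red - white)

def odds_to_probability(odds: Fraction) -> Fraction:
    return odds / (1 + odds)

for red, white in [(4, 1), (12, 8)]:
    odds = posterior_odds(Fraction(1), red, white)
    print(red, white, odds, odds_to_probability(odds))
# 4 1 8 8/9
# 12 8 16 16/17
```

Only `red - white` enters the calculation, so any sample with the same difference gives the same posterior.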

The point here is that if you understand the odds form of Bayes' theorem (posterior odds = likelihood ratio * prior odds), this problem is really easy. Just as importantly, you can be sure you didn't mess up any arithmetic, because there is so little of it.

So this is a bad programming question, but it is a good test of that equation. Just for practice, let's apply it to two more problems:

I randomly choose one of two coins, a fair coin or a fake, double-headed coin, each with 50% probability. I flip it three times and it comes up heads all three times. What's the probability it's the real coin?

The prior odds are real : fake = 1 : 1, as stated in the problem. The probability of seeing three heads with the real coin is 1/8, but it's 1 with the fake coin, so the likelihood ratio is 1 : 8. The posterior odds are therefore prior * likelihood = 1 : 8, and the probability that it's the real coin is 1/9.

This problem also brings up an important caveat: each possible observation can have a different likelihood ratio. This is because the likelihood ratio for B is P(B|A) : P(B|¬A), which is not necessarily related to the likelihood ratio for ¬B, which is P(¬B|A) : P(¬B|¬A). In all the examples above, the two happened to be inverses of each other, but here they're not.

Indeed, suppose I flip the coin once and get tails. What's the probability it's the real coin? Obviously one. How does Bayes' theorem check out? Well, the likelihood ratio for this observation is the probability of seeing this outcome with the real coin versus the fake coin, which is 1/2 : 0 = 1 : 0. That is, seeing a single tails kills the probability of the coin's being fake, which checks out with our intuition.
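To make the caveat concrete, here is a Python sketch in which every possible observation has its own pair of likelihoods, P(obs | A) and P(obs | ¬A). The odds are kept as a (numerator, denominator) pair instead of a single fraction, so that 1 : 0 odds like the tails case don't cause a division by zero. The function name and the "H"/"T" encoding are hypothetical.

```python
from fractions import Fraction

def update_odds(prior, observations, likelihood):
    """Posterior odds A : not-A as a (num, den) pair.

    `likelihood` maps each observation to (P(obs | A), P(obs | not A))."""
    num, den = prior
    for obs in observations:
        p_a, p_not_a = likelihood[obs]
        num, den = num * p_a, den * p_not_a
    return num, den

# A = "real coin", not-A = "double-headed fake"
coin = {"H": (Fraction(1, 2), Fraction(1)), "T": (Fraction(1, 2), Fraction(0))}

num, den = update_odds((1, 1), "HHH", coin)
print(num, ":", den, "->", num / (num + den))   # 1/8 : 1 -> 1/9

num, den = update_odds((1, 1), "T", coin)
print(num, ":", den, "->", num / (num + den))   # 1/2 : 0 -> 1
```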

Here's the problem I mentioned from Eliezer's page:

In front of you is a bookbag containing 1,000 poker chips. I started out with two such bookbags, one containing 700 red and 300 blue chips, the other containing 300 red and 700 blue. I flipped a fair coin to determine which bookbag to use, so your prior probability that the bookbag in front of you is the red bookbag is 50%. Now, you sample randomly, with replacement after each chip. In 12 samples, you get 8 reds and 4 blues. What is the probability that this is the predominantly red bag? (You don't need to be exact - a rough estimate is good enough.)

The prior odds are red : blue = 1 : 1. The likelihood ratios are 7 : 3 and 3 : 7, so the posterior odds are (7 : 3)^8 * (3 : 7)^4 = 7^4 : 3^4. At this point we just estimate 7 : 3 as, say, 2 : 1, and get 2^4 : 1 = 16 : 1. Since 7 : 3 is actually more than 2 : 1, the true odds are even greater, so the probability is definitely above 16/17, or about 94%. The exact answer, 2401/2482, is around 96.7%. Compare this with most people's answers, which are in the 70-80% range.
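The exact figure can be checked in a couple of lines, assuming the 1 : 1 prior and sampling with replacement from the quoted problem:

```python
from fractions import Fraction

# Likelihood ratio per chip: red 7 : 3, blue 3 : 7 (the 700/300 vs 300/700 bags).
odds = Fraction(7, 3) ** 8 * Fraction(3, 7) ** 4   # 8 reds, 4 blues, prior 1 : 1
prob = odds / (1 + odds)
print(odds, prob, round(float(prob), 3))           # 2401/81 2401/2482 0.967
```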

I hope you agree that problems become really easy, and intuitive, when viewed in this light.

A. Rex : PS. I think for the "who should feel more confident" part, it doesn't actually matter if you're drawing with replacement. It does, of course, matter for the probability calculations.

Daniel Daranas : I had to read it a couple times, but I think I get it... :)

A. Rex : Great! Thanks for reading, Daniel.
Let A be the event that 2/3 of the balls are red, and then ¬A is the event that 2/3 of the balls are white. Let B be the event that the first observer sees 4 red balls out of 5, and let C be the event that the second observer sees 12 red balls out of 20.

Applying some simple combinatorics, we get that
- P(B|A) = (5 choose 4)(2/3)⁴(1/3)¹ = 80/243
- P(B|¬A) = (5 choose 4)(1/3)⁴(2/3)¹ = 10/243
Therefore, from Bayes' Law, observer 1 has a confidence level of 80/(80+10) = 8/9 that A is true.

For the second observer:
- P(C|A) = (20 choose 12)(2/3)¹²(1/3)⁸ = 125970 * 2¹²/3²⁰
- P(C|¬A) = (20 choose 12)(1/3)¹²(2/3)⁸ = 125970 * 2⁸/3²⁰
So again from Bayes' Law, observer 2 has a confidence level of 2¹²/(2¹² + 2⁸) = 16/17 that A is true.

Therefore, observer two has a higher confidence level that 2/3 of the balls are red. The key is to understand how Bayes' Law works. In fact, all that matters is the difference in the number of red and white balls observed. Everything else (specifically the total number of balls drawn) cancels out in the equations.
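Here is a quick check of this answer's intermediate values and its cancellation claim, as a sketch assuming Python 3.8+ (for `math.comb`), equal priors, and draws with replacement. The binomial coefficient appears in both likelihoods and drops out of the ratio. As a result, samples of very different sizes with the same red-minus-white difference give identical posteriors.

```python
from fractions import Fraction
from math import comb  # Python 3.8+

def likelihoods(red: int, white: int):
    """(P(obs | 2/3 red), P(obs | 2/3 white)) including the binomial coefficient."""
    n = red + white
    two, one = Fraction(2, 3), Fraction(1, 3)
    return (comb(n, red) * two**red * one**white,
            comb(n, red) * one**red * two**white)

def posterior(red: int, white: int) -> Fraction:
    p_a, p_not_a = likelihoods(red, white)
    return p_a / (p_a + p_not_a)

print(likelihoods(4, 1))                    # (Fraction(80, 243), Fraction(10, 243))
print(posterior(4, 1), posterior(12, 8))    # 8/9 16/17
print(posterior(5, 2), posterior(103, 100)) # 8/9 8/9
```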

A. Rex : Adam, if you haven't seen this calculation done with odds and likelihood ratios, take a look at my post. I hope you enjoy it.

urlsafe_b64encode always ends in '='?

I think this must be a stupid question, but why do the results of urlsafe_b64encode() always end with '=' for me? Isn't '=' unsafe in URLs?

from random import getrandbits
from base64 import urlsafe_b64encode
from hashlib import sha256
from time import sleep

def genKey():
   keyLenBits = 64
   a = str(getrandbits(keyLenBits))
   b = urlsafe_b64encode(sha256(a).digest())
   print b

while 1:
   genKey()
   sleep(1)

output :

DxFOVxWvvzGdOSh2ARkK-2XPXNavnpiCkD6RuKLffvA=
xvA99ZLBrLvtf9-k0-YUFcLsiKl8Q8KmkD7ahIqPZ5Y=
jYbNK7j62KCBA5gnoiSpM2AGOPxmyQTIJIl_wWdOwoY=
CPIKkXPfIX4bd8lQtUj1dYG3ZOBxmZTMkVpmR7Uvu4s=
HlTs0tBW805gaxfMrq3OPOa6Crg7MsLSLnqe-eX0JEA=
FKRu0ePZEppHsvACWYssL1b2uZhjy9UU5LI8sWIqHe8=
aY_kVaT8kjB4RRfp3S6xG2vJaL0vAwQPifsBcN1LYvo=
6Us3XsewqnEcovMb5EEPtf4Fp4ucWfjPVso-UkRuaRc=
_vAI943yOWs3t2F6suUGy47LJjQsgi_XLiMKhYZnm9M=
CcUSXVqPNT_eb8VXasFXhvNosPOWQQWjGlipQp_68aY=

From stackoverflow

Base64 uses '=' for padding. A SHA-256 digest is 32 bytes (256 bits), and 256 isn't divisible by 24, so the output is padded with '='. By the way, '=' should be URL safe, as it's often used for parameters in URLs.

See this discussion, too.
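To make the arithmetic concrete, here is a Python 3 sketch. Base64 turns every 3 input bytes into 4 characters, so a 32-byte digest becomes 10 full groups plus 2 leftover bytes. Those 2 bytes encode to 3 characters followed by one '=', for 44 characters in total. The strip/restore helpers are hypothetical and not part of the base64 module. They show one common convention for dropping the padding in URLs and re-adding it before decoding.

```python
import hashlib
from base64 import urlsafe_b64encode, urlsafe_b64decode

digest = hashlib.sha256(b"example").digest()   # always 32 bytes
token = urlsafe_b64encode(digest)              # bytes in Python 3
print(len(digest), len(token), token[-1:])     # 32 44 b'='

def strip_padding(encoded: bytes) -> bytes:
    """Drop trailing '='; the remaining length tells you how many were removed."""
    return encoded.rstrip(b"=")

def restore_padding(encoded: bytes) -> bytes:
    """Re-add '=' until the length is a multiple of 4, as decoding requires."""
    return encoded + b"=" * (-len(encoded) % 4)

short = strip_padding(token)
assert urlsafe_b64decode(restore_padding(short)) == digest
print(len(short))  # 43
```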

Tomalak : You mean "divisible by 4".

Ant P. : No, divisible by 24, i.e. 3 input bytes per 4 out.

Tomalak : Oh, you said "the *bit* length". Then you are right of course. But since when does Base64 operate on bits?
The '=' is for padding. If you want to pass the output as the value of a URL parameter, you'll want to escape it first, so that the padding doesn't get lost when later reading in the value.
```
import urllib
param_value = urllib.quote_plus(b64_data)
```
Python is just following RFC3548 by allowing the '=' for padding, even though it seems like a more suitable character should replace it.
I would expect a URI parser to ignore a "=" in the value part of a parameter.

Query parameters are laid out as "&", [name], "=", [value], then the next parameter, so an equals sign in the value part is harmless. An unescaped ampersand has more potential to break the parser.

Parsing XML into a SQL table WITHOUT predefining structure. Possible?

So using the code below... can I parse @xml_data into a table structure without predefining the structure?

DECLARE @receiveTable TABLE(xml_data XML) DECLARE @xml_data XML
DECLARE @strSQL NVARCHAR(2000)
SET @strSQL = 'SELECT * INTO #tmp1 FROM sysobjects;
DECLARE @tbl TABLE(xml_data xml);
DECLARE @xml xml;    
Set @xml = (Select * from #tmp1 FOR XML AUTO);
INSERT INTO @tbl(xml_data) SELECT @xml;
SELECT * FROM @tbl'

INSERT INTO @receiveTable EXEC (@strSQL)    
SET @xml_data = (SELECT * FROM @receiveTable)    
SELECT @xml_data

From stackoverflow

If, as in your @xml_data, /element[1] has the same number of attributes as /element[n] and they're in the same left-to-right order, you can.

It's not pretty, but you can:

declare @tbl_xml xml 
set @tbl_xml = (
  select @xml_data.query('
      <table>
        {for $elem in /descendant::node()[local-name() != ""] 
        return <row name="{local-name($elem)}">
          {for $attr in $elem/@*
            return <col name="{local-name($attr)}" value="{$attr}" />}
        </row>}
      </table>'
  )
)

declare @sql_def_tbl varchar(max)
select @sql_def_tbl = 
  coalesce(@sql_def_tbl,'')
    +'declare @tbl table ('+substring(csv,1,len(csv)-1)+') '
  from (
    select (
      select ''+col.value('@name','varchar(max)')+' varchar(max),'
      from row.nodes('col') r(col) for xml path('')
    ) csv from @tbl_xml.nodes('//row[1]') n(row)
  ) x

declare @sql_ins_rows varchar(max)
select @sql_ins_rows = 
  coalesce(@sql_ins_rows,'')
    +'insert @tbl values ('+substring(colcsv,1,len(colcsv)-1)+') '
  from (
    select (
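      -- double any apostrophes so values containing them can't break the generated INSERT
      select ''''+replace(col.value('@value','varchar(max)'),'''','''''')+''','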
      from row.nodes('col') r(col) for xml path('')
    ) colcsv from @tbl_xml.nodes('//row') t(row)
  ) x

exec (@sql_def_tbl + @sql_ins_rows + 'select * from @tbl')
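For comparison outside SQL Server, here is a sketch of the same idea in Python. It uses only the standard library, with xml.etree.ElementTree and sqlite3 standing in for SQL Server's XML methods and dynamic SQL. As in the answer above, the columns come from the first element's attributes, so every element is assumed to carry the same attributes. The sample XML and the table name `tbl` are hypothetical. Values go in as query parameters, so apostrophes need no escaping.

```python
import sqlite3
import xml.etree.ElementTree as ET

def quote_ident(name: str) -> str:
    # Double-quote identifiers so odd attribute names can't break the SQL.
    return '"' + name.replace('"', '""') + '"'

def xml_to_table(conn, xml_text, table="tbl"):
    """Create `table` from the attributes of the root's child elements (all TEXT)."""
    rows = list(ET.fromstring(xml_text))
    if not rows:
        return []
    columns = list(rows[0].attrib)  # structure taken from the first element
    col_defs = ", ".join(f"{quote_ident(c)} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})",
        ([row.get(c) for c in columns] for row in rows),
    )
    return columns

xml_text = """<root>
  <obj name="sysobjects" id="1" xtype="S" />
  <obj name="O'Brien" id="2" xtype="U" />
</root>"""

conn = sqlite3.connect(":memory:")
columns = xml_to_table(conn, xml_text)
print(columns)
print(conn.execute('SELECT * FROM "tbl"').fetchall())
```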
