In ENVI and ERDAS you can use an endmember-based classifier, either spectral matching or sub-pixel (e.g. SAM, the Spectral Angle Mapper).
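To give an idea of what SAM does, here is a minimal sketch in plain Python. It is not the ENVI or ERDAS implementation, only the basic idea: treat each spectrum as a vector and assign the pixel to the endmember with the smallest angle. The function names, the endmember values and the 0.1 radian threshold are all made up for illustration.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        return math.pi / 2  # no usable signal: treat as maximally different
    # Clamp to [-1, 1] to guard against floating point rounding.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)

def classify_sam(pixel, endmembers, max_angle=0.1):
    """Return the name of the closest endmember, or None if none is within max_angle."""
    best_name, best_angle = None, max_angle
    for name, ref in endmembers.items():
        angle = spectral_angle(pixel, ref)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name

# Hypothetical 3-band endmember spectra (illustrative values, not real data).
endmembers = {
    "roof": [0.10, 0.20, 0.30],
    "vegetation": [0.05, 0.04, 0.50],
}

bright_roof = [0.20, 0.40, 0.60]   # same spectral shape as "roof", twice as bright
print(classify_sam(bright_roof, endmembers))
print(classify_sam([1.0, 0.0, 0.0], endmembers))
```

Because only the angle is compared, a brighter or darker pixel with the same spectral shape still matches its endmember, which is why SAM is fairly tolerant of illumination differences.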
Yes, it is difficult, probably impossible, to separate some classes in urban areas based only on spectral values.
An OBIA (object-based image analysis) approach using eCognition can produce better classification results than traditional per-pixel methods (e.g. maximum likelihood and ISODATA). This is because eCognition segments the image into objects, so you can use the shape, texture or position of the objects in your satellite image, in addition to their spectral values, to improve the classification.
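A toy sketch of why object features help, in plain Python. It assumes the image has already been segmented (the `labels` grid below is a made-up segmentation, and this is not how eCognition works internally). Two objects can have the same mean spectral value but a very different shape.

```python
from collections import defaultdict

def object_features(labels, band):
    """Per-object area, mean band value and bounding-box elongation."""
    acc = defaultdict(lambda: {"n": 0, "total": 0.0, "rows": set(), "cols": set()})
    for r, (label_row, band_row) in enumerate(zip(labels, band)):
        for c, (seg, value) in enumerate(zip(label_row, band_row)):
            a = acc[seg]
            a["n"] += 1
            a["total"] += value
            a["rows"].add(r)
            a["cols"].add(c)
    features = {}
    for seg, a in acc.items():
        height = max(a["rows"]) - min(a["rows"]) + 1
        width = max(a["cols"]) - min(a["cols"]) + 1
        features[seg] = {
            "area": a["n"],
            "mean": a["total"] / a["n"],
            # Ratio of long side to short side of the bounding box.
            "elongation": max(height, width) / min(height, width),
        }
    return features

# Hypothetical segmentation: object 1 is a long strip, objects 2 and 3 are square blocks.
labels = [
    [1, 1, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
# Hypothetical single-band values: objects 1 and 2 are spectrally identical.
band = [
    [10, 10, 10, 10],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]

features = object_features(labels, band)
for seg, f in sorted(features.items()):
    print(seg, f)
```

Objects 1 and 2 cannot be told apart by their mean value, but the elongation separates the strip (road-like) from the compact block (building-like). A per-pixel classifier has no access to this kind of information.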
Conclusion: first, try a sub-pixel method, along with a neural network method; if the results are not satisfactory, try eCognition.
eCognition is commercial software from Trimble, so check with the vendor about licensing or trial options.